John MacIntyre has written a lovely
blog about his first experience with "Clean Code". The change in the structure of his code, and in his attitude about code, is dramatic. As an author, there is little that can be more gratifying and inspiring than to see this kind of direct result. So, thank you John!
On twitter a few weeks ago, John suggested that I review the code that he wrote. I was happy to agree to do this. There's little I like better than reading other people's code. I learn much more from other people's code than I learn by writing my own. So again, thank you John for letting me pick a few nits.
The first thing I did was to download the zip file from his blog and load up all the source files in to TextMate. I didn't want to compile or run anything just yet, I simply wanted to look at the difference between the two batches of code. John had organized it very nicely into two folders named, appropriately enough,
DirtyDal and
CleanDal.
The
DirtyDal consisted of a single 300 line class that implemented all the CRUD methods for a
Comment table. The code was simple enough, but was laced with lots of distracting comments and was laden with considerable duplication. It was the typical kind of CRUD class that built up SQL commands from arguments, executed those commands, and then extracted the results. Code like this often has a repeating structure since every command follows the same basic form. 1. Construct SQL. 2. Execute SQL. 3. Process results.
The
CleanDal consisted of many very small classes arranged in an pleasant inheritance hierarchy that eliminated much of the duplication. The base classes provided the basic framework for a SQL operation, and the derivatives implemented them simply and directly.
So my overall impression was that this was a significant improvement in structure and cleanliness. John made the point in his blog that this new structure took more lines of code, and was more complicated. While it's true that more lines are used, most of those lines are boilerplate required by the statically typed nature of C#. There are fewer actual executable lines, and that's the important measure. As to whether the new code is more complicated, it certainly has a more complex structure. But the code within that structure is much simpler. The structure is visible, and it's contents are simple. To me, that's a rather significant improvement.
So then I loaded the solution into VS. Everything came up nicely, and the project built without trouble. Good! I hate compile and path errors in downloaded software.
Next, I ran the tests. John was good enough to include tests! Indeed, it was from reading those tests that I learned how he intended his inheritance hierarchy to work. So, well done John!
Unfortunately the tests did not run. They failed because they tried to connect to a database that did not exist on my computer. Why do I need to create a dabase just to run these tests? The database is irrelevant to the tests!
"Wait!" you say. "This is a CRUD application, how could the database be irrelevant to the tests?" Simple. I
believe the database works, so there's no need to test
it. All I care about is the code that John wrote. I just want to be sure that the SQL commands care created properly, and that the returned data is processed appropriately. The database is a complication that I'd rather not deal with.
My next complaint is that John did not use one of the standard test frameworks, like NUnit. Instead he just wrote his own little main program that called his functions and made sure they worked. Worse, those tests were written at a fairly high level. For example, he tested that
InsertComment worked by calling
InsertComment, and then by calling
SelectComment to make sure the data was inserted. For my money this test touches way too much code! It does not prove to me that the SQL is generated properly in each situation, and I am not convinced that complementary bugs in
Insert and
Select aren't canceling each other out and generating false positives.
To fix these tests I tried to create a mock derivative of
SqlConnection but, of course, it was sealed
(grumble). I thought about wrapping
SqlConnection in an
IConnection interface, but unfortunately
SqlConnection returns
SqlCommand which returns
SqlParameter, all of which are sealed (
double grumble!)
I thought about downloading one of the mocking tools, like TypeMock, but my time is limitted, and I'm not an expert .NET programmer.
So rather than fiddle around with trying to get a mocking framework or a database up and running
(triple grumble) I decided just to refactor without running the tests.
(Gasp, Horror) What can you do when the test environment isn't supplied...
So I went back to the test and started to read. The first thing it does is create a
CommentData object, so I looked at the
CommentData class. Here's a small snippet.
public class CommentData { /// <summary> /// Id of the comment /// </summary> public Int32? CommentId { get; set; }
I really hate these kinds of comments. The comment tells you nothing more than the property name, so why is it there? So I deleted it and it's ilk.
To be fair to Jonathan, this class is not in his "clean" folder. It's one of the classes he didn't refactor. Still, I think the point is an important one so I chose to show it anyway.
Next is this lovely function:
public bool Equals(CommentData obj) { if (!CommentId.Equals(obj.CommentId)) return false; if (!Comment.Equals(obj.Comment)) return false; if (!CommentorId.Equals(obj.CommentorId)) return false; return true; }
Wouldn't this read better if it was written like this:
public bool Equals(CommentData obj) { return CommentId == obj.CommentId && Comment.Equals(obj.Comment) && CommentorId == obj.CommentorId; }
And doesn't this need a
null check for the
Comment property?
Next the test calls the
static Execute method of
CommentInsertCommand.
public static void Execute(CommentData commentData,
int userId,
SqlConnection cn)
{
ThrowExceptionIfExecuteMethodCommentParameterIsInvalid(commentData);
List<CommentData> comments = new List<CommentData>(1);
comments.Add(commentData);
Execute(comments, userId, cn);
}
This function is nice and small. However, the creation of the list comes out of nowhere. It make sense when you look at the
Execute statement two lines down, but I dislike the fact that the reader has to look two lines down to understand what's going on. So this might be better:
public static void Execute(CommentData commentData,
int userId,
SqlConnection cn) {
ThrowExceptionIfExecuteMethodCommentParameterIsInvalid(commentData);
Execute(toList(commentData), userId, cn);
}
private static List<CommentData> toList(CommentData commentData) {
List<CommentData> comments = new List<CommentData>(1);
comments.Add(commentData);
return comments;
}
The function that checks the exceptions looks like this:
protected static void ThrowExceptionIfExecuteMethodCommentParameterIsInvalid(
CommentData commentData) {
if (null == commentData)
throw new ArgumentNullException("commentData");
if (commentData.CommentId.HasValue)
throw new ArgumentNullException(
"commentData",
"CommentInsertCommand is only for adding new data.");
}
I understand the old C programmer's trick of inverting equality statements. It prevents the inadvertent omission of a
= from being silent. But I don't like it. It doesn't read right. I prefer:
if (commentData == null)
CommentData is the subject of that sentence, and
null is the direct object. Inverting them doesn't jive well with the way we think. I dislike any statement in code that causes the reader to do a double-take. Code that protects the author at the expense of the reader is flawed code.
There is another function in this class that has the following name:
ThrowExceptionIfExecuteMethodCommentsParameterIsInvalid
Can you tell the difference between that function name, and the one we just looked at? It bothers me that these two function have names that are so very similar. What's worse is that they use two different conventions. The latter refers directly to the
comments argument of the function that calls it. However, the former does not use the precise name of the argument it is checking. Rather it abbreviates
CommentData to
Comment.
Fixing this changes the method names as follows:
ThrowExceptionIfExecuteMethodCommentDataParameterIsInvalid
ThrowExceptionIfExecuteMethodCommentsParameterIsInvalid
That's really not that much better. So how about this:
ThrowIfCommentDataIsInvalid(CommentData)
ThrowIfListOfCommentsIsInvalid(IEnumerable<CommentData>)
Or maybe even this:
ThrowIfInvalid(CommentData)
ThrowIfInvalid(IEnumerable<CommentData>)
Usually I prefer the longer names. But in this case the argument type is easily seen as part of the function name. So I think I like it better.
The implementation of the second function is:
protected static void ThrowIfInvalid(IEnumerable<CommentData> comments) {
if (comments == null)
throw new ArgumentNullException("comments");
if (AreAllCommentsNew(comments))
throw new ArgumentNullException(
"comments", "CommentInsertCommand is only for saving new data.");
}
protected static bool AreAllCommentsNew(IEnumerable<CommentData> Comments) {
return (0 == Comments.Count(x => !x.CommentId.HasValue));
}
Notice that the implementations of these two functions have some redundancy. They both perform the following check:
commentData.CommentId.HasValue
And they both emit the same (or very similar) exception message. It seems to me that the check in the first function need not exist. It can be shortened to:
protected static void ThrowIfInvalid(CommentData commentData) {
if (commentData == null)
throw new ArgumentNullException("commentData");
}
That's about all I have time for today. The dog is barking at my door, and my wife has fish cooking upstairs.