The Clean Coder: September 2010

Friday, September 24, 2010

The Hacker, The Novice, The Artist, and The Craftsman.

In my last blog "Too Small to Refactor", I made the statement:

"Clean code has always been about money, and has never been about art."

Apparently this took a few people by surprise. One person comented:

"...I thought we were talking about craft, and the cost-cutting as a subproduct..."

So what is the difference between a craftsman and an artist? And just to add some spice, how do they differ from a hacker and a novice?

I realize that I am making a Myers-Briggs type error. People cannot truly be classified using binary attributes. A person classified by MBTI as an introvert, certainly has some extrovert characteristics. By the same token a programmer who shows some attributes of a hacker, probably also shows some attributes of a craftsman.

Still, I think the definitions of these terms can be useful as a way to classify programmer behaviors.

What is the difference between the Hacker, the Novice, the Artist, and the Craftsman? It's all about their personal definition of "Done".

The Artist is done when the structure and behavior of the artist's code has achieved perfection in the artist's eyes. There is no consideration of money spent, or money to be earned. The artist does not care about ROI. The artist does not care how long it takes. The artist cares only about the final result.

An artist will spend hours, days, even weeks, on a single line of code. An artist will throw away whole functions and modules because they somehow don't feel right. An artist polishes, and polishes, and polishes in pursuit of some elusive goal of perfection.

The Hacker is done when the behavior of the code achieves some personal goal. The Hacker is not concerned with ROI. The Hacker does not care about the code at all. The Hacker does not care about how much, or how little, is spent creating the code. The Hacker does not care if anyone else ever uses the code. The Hacker only cares about making it work -- once. After that, the Hacker loses interest.

The Novice is done as soon as the code works "well enough". The Novice strives to minimize initial coding time. The Novice is not concerned about ROI. The future cost of the code is of no concern to the Novice. Nor does the Novice care about the number of hidden and/or subtle defects left in the code. The Novice simply wants to get to the next task as soon as possible. The Novice is driven by schedule; or rather, the Novice is driven by pleasing managers who are driven by schedule.

The Craftsman is done when ROI has been maximized. The Craftsman strives to be a good steward of the monies being spent. The Craftsman want to make sure that every dollar goes as far as it can, and earns as much as it can in return. Therefore the Craftsman makes sure the code works, and can be kept working with minimum extra cost.

The Craftsman understands that most defects in behavior and structure will be very expensive to repair later and very inexpensive to eliminate now. So the Craftsman pushes towards a very clean implementation. But the Craftsman also recognizes that some rare defects in behavior and structure are going to cost more to eliminate than to tolerate; and so the Craftsman uses judgment, acquired over years, to maximize ROI.

Saturday, September 18, 2010

The Uncertainty Principle, and the Quantum Discontinuity.

What is the uncertainty principle? Some people think of it as the "Observer Effect", the fact that any time you measure an object, you change something about it. For example, to see where an object is you have to bounce photons of light off of it, and those photons change the object.

But that's not what the uncertainty principle really is. The Heisenberg Uncertainty Principle (HUP) is defined as follows:

That is, the uncertainty in position times the uncertainty in momentum is greater than or equal to Plank's constant over two. Plank's constant being 1.054571628(53)×10⁻³⁴ J·s

What this means is that you cannot know both the position and momentum of something. To the extent you know one, the other is uncertain. Of course Plank's constant is a very small number, so most of the time the uncertainty of our knowledge does not matter. It only becomes significant over very short distances and very small changes in momentum.

To understand where this uncertainty comes from, we need to understand that all matter and all energy is composed of waves. I know this is difficult to envision, but take it on faith for the moment because once you accept this fact the uncertainty principle suddenly makes a great deal of sense.

Imagine a circular pond of very still water. The surface is like a mirror in all directions. Hold a stick in your hand, and insert the tip into the water. Move the stick up and down at a fixed frequency. Waves will ripple away from the stick in all directions, completely filling the pond. The waves have no certain position. They are everywhere on the surface of the lake. However, because the frequency is fixed, the waves have a very well defined energy (momentum). That's the first half of the HUP, we know the energy but we cannot determine any position.

Now change the way you move the stick. Shake it up and down randomly -- perfectly randomly! Make sure you incorporate every frequency into your motion. The surface of the lake will return to it's mirror sheen because all frequencies will integrate out. You will know where all the energy is, it's in the stick, but you won't know how much energy there is because it's all random. And that's the second half of the HUP. You know the position of the energy, but the quantity is purely random.

That's how photons work. Photons are "particles" of light. But does light really move as a spray of particles? It sometimes seems to. However, it also seems to move in waves like the waves of a pond.

Plank showed that the energy of a photon is equal to Plank's constant times the wavelength of the photon's frequency:

What this means is that we cannot measure energies that are not integral multiples of this quantity. Indeed, no interaction between two particles can take place unless there is an exchange of an integral multiple of that quantity. For example, an interaction that would require half that energy simply could not occur.

So now, imagine a light source that emits light at a fixed frequency at a rate exactly equal to one photon per second. Is this light source emitting one photon per second? Or is it filling the space around it with light waves. The answer is both. The space around the light source is filled with a field of waves. However, those waves cannot interact with any other matter more than once per second. And the position of that interaction is, by the Uncertainty Principle, random. So if you set up a screen around the light source, you'd see tiny little sparks of light in random positions at roughly one second intervals.

The waves are all there, filling space like the waves on the pond, but the energy of those waves can only be deposited in fixed quantities, and the position of each one of those deposits is random. If you put such a light source in the center of a room, it would "illuminate" that room. However, your eyes would only register the photons that managed to randomly deposit their energy on your retina via a pathway that passed through your pupil after reflecting off the objects in the room. And that would happen at a rate much less than once per second because most of the photons emitted by the light source would deposit their energy somewhere other than in your eye.

What is bouncing off the furniture in the room? Photons? No, it is the waves that are reflecting off the objects in the room and that are passing through your pupil, refracting through your cornea and lense, and "striking" your retina. And then those waves deposit their energies as photons at uncertain locations. The waves determine the probability that the energy will be deposited at one uncertain place or another. So in some very real sense the waves are waves of probability. If an area of the room is in shadow, no waves will be present in that part of the room, and so the probability that the waves will deposit their energy as photons in that area is zero.

If you put a camera in the room and left the shutter open for a very long time, the camera would record a perfectly normal image of an illuminated room. Over time the field of waves would deposit some of it's energy as photons in the camera's receptor. The probability information carried by those waves would cause those photons to build up an image of the room.

Indeed, that's what's going on right now as you are looking at these words. The field of waves leaving your computer screen carries probability information to your retina, causing photon's to be randomly deposited there. It's just that the flux of photons is so huge, that we do not notice their randomness.

What does this have to do with software?

Nothing directly.

Friday, September 17, 2010

Too Small to Refactor?

In John MacIntyre's second blog about Clean Code he presented a very simple little payroll calculator, refactored it, and then asked whether it was truly worth refactoring in an ROI sense. His conclusion was that he would probably not refactor it in a "real" project, but probably would refactor it if it were his own project. His reasoning was that the refactoring was worth doing for artistic reasons, but not for financial reasons.

This argument suggests that Clean Code is about art rather than money. This is a fundamental flaw in logic. Clean code has always been about money, and has never been about art. Craftsmen keep their code clean because they know that clean code costs less. They know that cleaning code has a one-time cost, but that leaving it unclean has a long-term repeating chronic cost that increases with time. Craftsmen understand that the best way to reduce cost and increase ROI is to keep their code very clean.

Here was the code that John began with (I've translated it from C# to Java for my own sanity.)

If we are going to refactor this, we're going to need some tests. So the first thing I did was to write enough tests to cover the code.

The algorithm was a little bit wordy, so I shorted it up a bit and made the two sections of the if statement independent of each other.

Next I got rid of that boolean argument. Boolean arguments are always troublesome beasts. Some poor schmuck is bound to call it with the wrong value, and all the poor people reading this code will wonder whether they should look up the argument to see what it means. Boolean arguments loudly declare that this function does two things instead of one thing.

This had a profound effect on the tests. The tests look almost like they are using two derivatives rather than two instances of the same class. Indeed, we should probably continue pushing the refactoring in that direction. Creating two derivatives is simple enough. First I changed the tests to create instances of the derivatives, and then I wrote the derivaties themselves.

That sets things up nicely. Now I just need to push the calculate method down into the two derivatives.

Nice! Now all I need to do is refactor the two derivatives.

Now that's nice! Nearly the same number of lines of code as the original, and so much cleaner! But was it worth it?

Of course it was! The two business rules have completely decoupled from each other. They are in totally different files, and know nothing about each other. If someone adds a new kind of pay calculation, like a SalariedCalculator, none of these files will need to change! (We call that the Open Closed Principle by the way.) Think about what we'd have had to do with the old implementation! Boolean's don't split into three very well.

Yes, this was worth it. It was worth it because we've all been impeded by bad code before. We all know that bad code slows down everyone who reads it, every time they read it! Bad code is like Herpes. It's the gift that keeps on giving.

Thursday, September 9, 2010

John MacIntyre's Clean Code Experience #1.

John MacIntyre has written a lovely blog about his first experience with "Clean Code".   The change in the structure of his code, and in his attitude about code, is dramatic. As an author, there is little that can be more gratifying and inspiring than to see this kind of direct result. So, thank you John!

On twitter a few weeks ago, John suggested that I review the code that he wrote. I was happy to agree to do this. There's little I like better than reading other people's code. I learn much more from other people's code than I learn by writing my own. So again, thank you John for letting me pick a few nits.

The first thing I did was to download the zip file from his blog and load up all the source files in to TextMate. I didn't want to compile or run anything just yet, I simply wanted to look at the difference between the two batches of code. John had organized it very nicely into two folders named, appropriately enough, DirtyDal and CleanDal.

The DirtyDal consisted of a single 300 line class that implemented all the CRUD methods for a Comment table.   The code was simple enough, but was laced with lots of distracting comments and was laden with considerable duplication. It was the typical kind of CRUD class that built up SQL commands from arguments, executed those commands, and then extracted the results. Code like this often has a repeating structure since every command follows the same basic form. 1. Construct SQL. 2. Execute SQL. 3. Process results.

The CleanDal consisted of many very small classes arranged in an pleasant inheritance hierarchy that eliminated much of the duplication. The base classes provided the basic framework for a SQL operation, and the derivatives implemented them simply and directly.

So my overall impression was that this was a significant improvement in structure and cleanliness. John made the point in his blog that this new structure took more lines of code, and was more complicated. While it's true that more lines are used, most of those lines are boilerplate required by the statically typed nature of C#. There are fewer actual executable lines, and that's the important measure. As to whether the new code is more complicated, it certainly has a more complex structure. But the code within that structure is much simpler. The structure is visible, and it's contents are simple. To me, that's a rather significant improvement.

So then I loaded the solution into VS. Everything came up nicely, and the project built without trouble. Good! I hate compile and path errors in downloaded software.

Next, I ran the tests. John was good enough to include tests! Indeed, it was from reading those tests that I learned how he intended his inheritance hierarchy to work. So, well done John!

Unfortunately the tests did not run. They failed because they tried to connect to a database that did not exist on my computer. Why do I need to create a dabase just to run these tests? The database is irrelevant to the tests!

"Wait!" you say. "This is a CRUD application, how could the database be irrelevant to the tests?" Simple. I believe the database works, so there's no need to test it. All I care about is the code that John wrote. I just want to be sure that the SQL commands care created properly, and that the returned data is processed appropriately.   The database is a complication that I'd rather not deal with.

My next complaint is that John did not use one of the standard test frameworks, like NUnit. Instead he just wrote his own little main program that called his functions and made sure they worked. Worse, those tests were written at a fairly high level. For example, he tested that InsertComment worked by calling InsertComment, and then by calling SelectComment to make sure the data was inserted. For my money this test touches way too much code! It does not prove to me that the SQL is generated properly in each situation, and I am not convinced that complementary bugs in Insert and Select aren't canceling each other out and generating false positives.

To fix these tests I tried to create a mock derivative of SqlConnection but, of course, it was sealed (grumble). I thought about wrapping SqlConnection in an IConnection interface, but unfortunately SqlConnection returns SqlCommand which returns SqlParameter, all of which are sealed (double grumble!)

I thought about downloading one of the mocking tools, like TypeMock, but my time is limitted, and I'm not an expert .NET programmer.

So rather than fiddle around with trying to get a mocking framework or a database up and running (triple grumble) I decided just to refactor without running the tests. (Gasp, Horror) What can you do when the test environment isn't supplied...

So I went back to the test and started to read. The first thing it does is create a CommentData object, so I looked at the CommentData class. Here's a small snippet.

public class CommentData {
    /// <summary>
    /// Id of the comment
    /// </summary>
    public Int32? CommentId { get; set; }

I really hate these kinds of comments. The comment tells you nothing more than the property name, so why is it there? So I deleted it and it's ilk.

To be fair to Jonathan, this class is not in his "clean" folder. It's one of the classes he didn't refactor. Still, I think the point is an important one so I chose to show it anyway.

Next is this lovely function:

    public bool Equals(CommentData obj) {
      if (!CommentId.Equals(obj.CommentId)) return false;
      if (!Comment.Equals(obj.Comment)) return false;
      if (!CommentorId.Equals(obj.CommentorId)) return false;
      return true;
    }

Wouldn't this read better if it was written like this:

    public bool Equals(CommentData obj) {
      return CommentId == obj.CommentId &&
             Comment.Equals(obj.Comment) &&
             CommentorId == obj.CommentorId;
    }

And doesn't this need a null check for the Comment property?

Next the test calls the static Execute method of CommentInsertCommand.

public static void Execute(CommentData commentData,
                             int userId,
                             SqlConnection cn)
{
      ThrowExceptionIfExecuteMethodCommentParameterIsInvalid(commentData);
      List<CommentData> comments = new List<CommentData>(1);
      comments.Add(commentData);
      Execute(comments, userId, cn);
}

This function is nice and small. However, the creation of the list comes out of nowhere. It make sense when you look at the Execute statement two lines down, but I dislike the fact that the reader has to look two lines down to understand what's going on. So this might be better:

    public static void Execute(CommentData commentData,
                               int userId,
                               SqlConnection cn) {
      ThrowExceptionIfExecuteMethodCommentParameterIsInvalid(commentData);
      Execute(toList(commentData), userId, cn);
    }

    private static List<CommentData> toList(CommentData commentData) {
      List<CommentData> comments = new List<CommentData>(1);
      comments.Add(commentData);
      return comments;
    }

The function that checks the exceptions looks like this:

    protected static void ThrowExceptionIfExecuteMethodCommentParameterIsInvalid(
                          CommentData commentData) {
      if (null == commentData)
        throw new ArgumentNullException("commentData");
      if (commentData.CommentId.HasValue)
        throw new ArgumentNullException(
          "commentData",
          "CommentInsertCommand is only for adding new data.");
    }

I understand the old C programmer's trick of inverting equality statements. It prevents the inadvertent omission of a = from being silent. But I don't like it. It doesn't read right. I prefer:

      if (commentData == null)

CommentData is the subject of that sentence, and null is the direct object. Inverting them doesn't jive well with the way we think. I dislike any statement in code that causes the reader to do a double-take. Code that protects the author at the expense of the reader is flawed code.

There is another function in this class that has the following name:

ThrowExceptionIfExecuteMethodCommentsParameterIsInvalid

Can you tell the difference between that function name, and the one we just looked at? It bothers me that these two function have names that are so very similar. What's worse is that they use two different conventions. The latter refers directly to the comments argument of the function that calls it. However, the former does not use the precise name of the argument it is checking. Rather it abbreviates CommentData to Comment.

Fixing this changes the method names as follows:

ThrowExceptionIfExecuteMethodCommentDataParameterIsInvalid

ThrowExceptionIfExecuteMethodCommentsParameterIsInvalid

That's really not that much better. So how about this:

ThrowIfCommentDataIsInvalid(CommentData)
ThrowIfListOfCommentsIsInvalid(IEnumerable<CommentData>)

Or maybe even this:

ThrowIfInvalid(CommentData)

ThrowIfInvalid(IEnumerable<CommentData>)

Usually I prefer the longer names. But in this case the argument type is easily seen as part of the function name. So I think I like it better.

The implementation of the second function is:

    protected static void ThrowIfInvalid(IEnumerable<CommentData> comments) {

      if (comments == null)

        throw new ArgumentNullException("comments");

      if (AreAllCommentsNew(comments))

        throw new ArgumentNullException(

          "comments", "CommentInsertCommand is only for saving new data.");

    }

    protected static bool AreAllCommentsNew(IEnumerable<CommentData> Comments) {

      return (0 == Comments.Count(x => !x.CommentId.HasValue));

    }

Notice that the implementations of these two functions have some redundancy. They both perform the following check:

commentData.CommentId.HasValue

And they both emit the same (or very similar) exception message. It seems to me that the check in the first function need not exist. It can be shortened to:

    protected static void ThrowIfInvalid(CommentData commentData) {
      if (commentData == null)
        throw new ArgumentNullException("commentData");
    }

That's about all I have time for today. The dog is barking at my door, and my wife has fish cooking upstairs.