The laws of TDD

Since the introduction of the TDD methodology by Kent Beck, there has been an effort to define a simple framework that serves as a guide for its application in practice.

Initially, Kent Beck proposed two very basic rules:

Don’t write a line of new code unless you first have a failing automated test.
Eliminate duplication.

That is, to be able to write production code, first we must have a test that fails and that requires us to write that code, precisely because that’s what’s needed to pass the test.

Once we’ve written it and checked that the test passes, our effort goes towards reviewing the written code and eliminating duplication as much as possible. This is very generic, because on the one hand it refers to refactoring, and in the other hand, to the coupling between the test and the production code. And being so generic, it’s hard to translate to practical actions.

On top of that, these rules don’t tell us anything about how big the jumps in the code involved in each cycle should be. In his book, Beck suggests that the steps -or baby steps- can be as small or as big as we find them useful. in general, he recommends using small steps when we’re unsure or have little knowledge about the algorithm, while allowing larger steps if we have enough experience and knowledge to be sure about what to do next.

With time, and starting from the methodology learnt from Beck himself, Robert C. Martin established the “three laws”, which not only define the action cycle in TDD, but also provide some criteria about how large the steps should be in each cycle.

It’s not allowed to write any production code unless it passes a failing unit test
It’s not allowed to write more than the one unit test that’s sufficient to fail; and compilation errors are failures
It’s not allowed to write more production code than necessary to pass the one failing unit test

The three laws are what make TDD different to simply writing tests before code.

These three laws impose a series of restrictions whose objective is to force us to follow a specific order and workflow. The define several conditions that, if they’re fulfilled, generate a cycle and guide our decision-making. Understanding how they work will help us to make the most out of TDD to help us produce quality code that we’re able to maintain.

Theses laws have to be fulfilled all at the same time, because they work together.

The laws in detail

It’s not allowed to write any production code unless it passes a failing unit test

The first law states that we can’t write any production code unless it passes an existing unit test that is currently failing. This implies the following:

There has to be a test that describes a new aspect about the behavior of the unit that we’re describing.
This test must fail because there isn’t anything that makes it pass in the production code.

In short, the first law forces us to write a test that defines the behavior that we’re about to implement in the unit of software that we want to develop, all before having to consider how to do it.

Now, how should this test be?

It’s not allowed to write more than the one unit test that’s sufficient to fail; and compilation errors are failures

The second law tells us that the test must be sufficient to fail, and that we have to consider compilation errors as failures (or their equivalent in interpreted languages). For examples, among these errors there would be some as obvious as that the class or function doesn’t exist or hasn’t been defined yet.

We must avoid the temptation of writing and skeleton of the class or function before writing the first test. Remember that we’re talking about Test Driven Development. Therefore, it’s the tests that tell us what production code to write and when to write it, and not the opposite.

For the test, “being sufficient to fail” means that it must be very small in several ways, and this is something that is quite difficult to define at first. Frequently we talk about the “simplest” test, the simplest case, but it’s not exactly this way.

What conditions must a TDD test meet, especially the first one?

Well, basically it should force us to write the minimum possible amount of code that can be executed. That minimum, in OOP, would be to instantiate the class that we want to develop without worrying about any further details, for now. The specific test will vary a little depending on the language and testing framework that we’re using.

Let’s take a look at this example from the Leap Year kata, where we write code to find out if a year is a leap year or not. In the example, my intention is to create a Year object to which I can ask if it’s a leap year by sending it the message IsLeap. I’ve come upon this exercise in several kata compilations. For this chapter the examples will be written in C#.

The rules are:

The years not divisible by 4 aren’t leap years (for example, 1997).
The years divisible by 4 are leap years (1999), unless:
If they’re divisible by 100 they aren’t leap years (1900).
If they’re divisible by 400 they are leap years (2000).

Our goal would be to be able to use Year objects in this manner:

1 var year = new Year(1996);
2 
3 if (year.IsLeap()) {
4 	// do something
5 }

The usual impulse is to try and start in the following way, because it looks like it’s the example of the simplest possible case.

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void ShouldNotBeLeapYear()
11         {
12             var year = new Year(1997);
13             Assert.False(year.IsLeap())
14         }
15     }
16 }

However, it’s not the simplest test that could fail for only one reason. Actually, it can fail for at least five different reasons:

The Year class doesn’t exist yet.
It also wouldn’t accept parameters passed by the constructor.
It doesn’t answer to the IsLeap message.
It could return nothing.
It could return an incorrect response.

That is, we can expect the test to fail for those five causes, and only the fifth is the one that’s actually being described by the test. We have to reduce them to just one.

In this case, it’s very easy to see that there’s a dependency between the various causes of failure, in such a way that for one to surface, the previous ones have to be solved. Obviously, it’s necessary to have a class that we can instantiate. Therefore, our first test should be much more modest and just expect that the class can be instantiated:

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanInstantiate()
11         {
12             new Year();
13         }
14     }
15 }

If we ran this test, we would see it fail for obvious reasons: the class that we try to instantiate isn’t anywhere to be found. The test is failing by a compilation -or equivalent- problem. Therefore, it could be a test that’s sufficient to fail.

Throughout the process we’ll see that this test is redundant and that we can do without it, but let’s not get ahead of ourselves. We still have to make it pass.

It’s not allowed to write more production code than necessary to pass the one failing unit test

The first and the second laws state tha we have to write a test and tell us how that test should be. The third law tells us how production code should be: the condition is that it must pass the test that we’ve written.

It’s very important to understand that it’s the test the one that tells us what code we need to implement, and therefore, even though we have the certainty that it’s going to fail because we don’t even have a file with the necessary code to define the class, we still must execute the test and expect its error message.

That is: we must see that the test, indeed, fails.

The first thing that it’ll tell us when trying to run it is that the class doesn’t exist. In TDD, that’s not a problem, but rather an indication about what we have to do: add a file that contains the definition of the class. It’s possible that we’re able to generate that code automatically using the IDE tools, and it would be advisable to do it that way.

In our example, the message of the test says:

1 The type or namespace name 'Year' could not be found 
2 (are you missing a using directive or an assembly reference?)

And we would simply have to create the Year class.

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanInstantiate()
11         {
12             new Year();
13         }
14     }
15 
16     public class Year
17     {
18     }
19 }

At this point we run the test again to make sure it passes from red to green. In many languages this code will be enough. In some cases you might need something more.

If this is so and the test passes, the first cycle is complete and we can move on to the next behavior, unless we think that we have the chance to refactor the existing code. For example, the usual thing to do here would be to move the Year class to its own file.

1 namespace LeapYear
2 {
3     public class Year
4     {
5     }
6 }

If the test hasn’t passed, we’ll look at the message of the failed test and we’ll act accordingly, adding the minimum code necessary for it to finally pass and turn green.

The second test and the three laws

When we’ve managed to pass the first test by applying the three laws, we might think that we haven’t really achieved anything. We haven’t even tackled the possible parameters that the class might need in order to be constructed, be them data, collaborators -in the case of services-, or use cases. Even the IDE is complaining that we’re not assigning the instantiated object to any variable.

However, it’s important to adhere to the methodology, especially in these first stages. With practice and the help of a good IDE, the first cycle will have taken a few seconds at most. In these few seconds we’ll have written a piece of code that, while certainly very small, is completely backed by a test.

Our goal still is to let the tests dictate what code we must write to implement each new behavior. Since our first test already passes, we have to write the second.

Applying the three laws, what comes next is:

Write a new failing test that defines a behavior
That this test is the smallest possible one that still forces us to make a change in the production code
Write the minimum and sufficient production code to pass the test

Which could be the next behavior that we need to define? If in the first test we’ve forced ourselves to write the minimum necessary code to instantiate the class, the second test can lead us through two paths:

Force us to write the necessary code to validate constructor parameters and, therefore, be able to instantiate an object with everything that it needs.
Forcing us to introduce the method that executes the desired behavior.

This way, in our example, we could simply make sure that Year is able to answer the IsLeap message.

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanInstantiate()
11         {
12             new Year();
13         }        
14         
15         [Test]
16         public void CanRespondToIsLeapMessage()
17         {
18             var year = new Year();
19             year.IsLeap();
20         }
21     }
22 }

The test will throw this error message:

1 'Year' does not contain a definition for 'IsLeap' and no accessible extension...

Which tells us that the next step should be to introduce the method that answers that message:

1 namespace LeapYear
2 {
3     public class Year
4     {
5         public void IsLeap()
6         {
7         }
8     }
9 }

The test passes, indicating that the objects of type Year can now attend to the IsLeap message.

Having arrived at this point, we could ask ourselves: what if we don’t obey the three laws of TDD?

Violations of the three laws and their consequences

Disregarding the easy joke that we’ll end up fined or in jail for not following the laws of TDD, the truth is that we’d really have to suffer some consequences.

First law: writing production code without having a test

The most immediate consequence is that we break the red-green cycle. The code that we write is no longer guided or covered by tests. In fact, if we wish to have that part tested, we’ll have to write “a posteriori” tests (QA tests).

Imagine that we do this:

 1 using System;
 2 
 3 namespace LeapYear
 4 {
 5     public class Year
 6     {
 7         private readonly int _aYear;
 8 
 9         public Year(int aYear)
10         {
11             _aYear = aYear;
12         }
13 
14         public bool? IsLeap()
15         {
16             if (_aYear % 4 == 0)
17             {
18                 return true;
19             }
20             
21             return false;
22         }
23     }
24 }

The existing tests will fail because we need to pass a parameter to the constructor, and also we don’t have any test that’s in charge of verifying the behavior that we’ve just implemented. We’d have to add tests to cover the functionality that we’ve introduced, but they’re no longer driving the development.

Second law: writing more that one test that fails

We can interpret this in two ways: writing multiple tests, or writing one test that involves too big a jump in behavior.

Writing multiple tests would cause several problems. To pass them all we would need to implement a large amount of code, and the guide that those tests could be providing gets so blurry that it’s almost like having none at all. We wouldn’t have clear and concrete indications to solve by implementing new code.

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanInstantiate()
11         {
12             new Year();
13         }        
14         
15         [Test]
16         public void CanRespondToIsLeapMessage()
17         {
18             var year = new Year();
19             year.IsLeap();
20         }
21 
22         [Test]
23         public void CommonYearsShouldNotBeLeap()
24         {
25             var year = new Year(1997);
26             Assert.False(year.IsLeap());
27         }
28         
29         [Test]
30         public void YearDivisibleBy4ShouldBeLeap()
31         {
32             var year = new Year(1996);
33             Assert.True(year.IsLeap());
34         }
35     }
36 }

Here we’ve added two tests. To make them pass we’d have to define two behaviors. Also, they’re too large. We haven’t yet established, for example, that we’ll have to pass a parameter to the constructor, nor that the response will be of type bool. These tests mix various responsibilities and try to test too many things at once. We’d have too write too much production code in only one go, leading to insecurity and room for errors to occur.

Instead, we need to make tests for smaller increments of functionality. We can see several possibilities:

To introduce that the answer is a bool we can assume that, by default, the years aren’t leap years, so we’ll expect a false response:

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanInstantiate()
11         {
12             new Year();
13         }        
14         
15         [Test]
16         public void CanRespondToIsLeapMessage()
17         {
18             var year = new Year();
19             year.IsLeap();
20         }
21 
22         [Test]
23         public void ByDefaultYearsAreNotLeapYears()
24         {
25             var year = new Year();
26             Assert.False(year.IsLeap());
27         }
28     }
29 }

The error is:

1 Argument 1: cannot convert from 'void' to 'bool?'

Can be solved with:

 1 using System;
 2 
 3 namespace LeapYear
 4 {
 5     public class Year
 6     {
 7         public Year()
 8         {
 9 
10         }
11 
12         public bool IsLeap()
13         {
14             return false;
15         }
16     }
17 }

However, we have another way to do it. Since the language is strongly typed, we can use the type system as a test. Thus, instead of creating a new test:

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanInstantiate()
11         {
12             new Year();
13         }        
14         
15         [Test]
16         public void CanRespondToIsLeapMessage()
17         {
18             var year = new Year();
19             year.IsLeap();
20         }
21     }
22 }

We change the return type of IsLeap:

 1 using System;
 2 
 3 namespace LeapYear
 4 {
 5     public class Year
 6     {
 7         public Year()
 8         {
 9 
10         }
11 
12         public bool IsLeap()
13         {
14             
15         }
16     }
17 }

When we run the test it will indicate that there’s a problem, as the function isn’t returning anything:

1 'Year.IsLeap()': not all code paths return a value

And finally, we just have to add a default response, which will be false:

 1 using System;
 2 
 3 namespace LeapYear
 4 {
 5     public class Year
 6     {
 7         public Year()
 8         {
 9 
10         }
11 
12         public bool IsLeap()
13         {
14             return false;
15         }
16     }
17 }

To introduce the construction parameter we could use a refactoring. In this step we could be conditioned by the programming language, leading us to different solutions.

The refactoring path is straightforward. We just have to incorporate the parameter, although we won’t use it for now. In C# and other languages we could do it by introducing an alternative constructor, and in this way the tests would continue to pass. In other languages, we could mark the parameter as optional.

 1 using System;
 2 
 3 namespace LeapYear
 4 {
 5     public class Year
 6     {
 7         private readonly int _aYear;
 8 
 9         public Year()
10         {
11             
12         }
13 
14         public Year(int aYear)
15         {
16             _aYear = aYear;
17         }
18         
19         public bool IsLeap()
20         {
21             return false;
22         }
23     }
24 }

Since a parameterless constructor doesn’t make sense for us, now we could remove it, but first we’d have to refactor the tests so that they use the version with a parameter:

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanInstantiate()
11         {
12             new Year(1997);
13         }        
14         
15         [Test]
16         public void CanRespondToIsLeapMessage()
17         {
18             var year = new Year(1997);
19             year.IsLeap();
20         }
21     }
22 }

The truth is that we don’t need the first test anymore, since it’s implicit in the other one.

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanRespondToIsLeapMessage()
11         {
12             var year = new Year(1997);
13             year.IsLeap();
14         }
15 
16     }
17 }

And now we can remove the parameterless constructor, as it won’t be used again in any case:

 1 using System;
 2 
 3 namespace LeapYear
 4 {
 5     public class Year
 6     {
 7         private readonly int _aYear;
 8 
 9         public Year(int aYear)
10         {
11             _aYear = aYear;
12         }
13         
14         public bool IsLeap()
15         {
16             return false;
17         }
18     }
19 }

Third law: writing more production code than necessary to pass the test

This one is probably the most common violation of the three. We often come to a point where we “see” the algorithm so clearly that we feel tempted to write it now and finish the process for good. However, this can lead us to miss some situations. For example, we could “see” the general algorithm of an application and implement it. But, this could have distracted us and prevented us from considering one or several particular cases. This possibly incomplete algorithm, once incorporated to the application and deployed, could lead us to errors in production and even to economic losses.

For example, if we add a test to check that we control non-leap years:

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanRespondToIsLeapMessage()
11         {
12             var year = new Year(1997);
13             year.IsLeap();
14         }
15 
16         [Test]
17         public void CommonYearsAreNoLeapYears()
18         {
19             var year = new Year(1997);
20             Assert.False(year.IsLeap());
21         }
22     }
23 }

In the current state of our exercise, an example of excess of code would be this:

 1 using System;
 2 
 3 namespace LeapYear
 4 {
 5     public class Year
 6     {
 7         private readonly int _aYear;
 8 
 9         public Year(int aYear)
10         {
11             _aYear = aYear;
12         }
13         
14         public bool IsLeap()
15         {
16             if (_aYear % 4 == 0)
17             {
18                 if (_aYear % 400 == 0)
19                 {
20                     return false;
21                 }
22 
23                 if (_aYear % 100 == 0)
24                 {
25                     return true;
26                 }
27 
28                 return true;
29             }
30             return false;
31         }
32     }
33 }

This code passes the test, but as you can see, we’ve introduced much more than it was necessary to achieve the behavior defined by the test, adding code to control leap years and special cases. So, apparently, everything is fine.

If we try a leap year we’ll see that the code works, which reinforces our impression that all’s good.

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanRespondToIsLeapMessage()
11         {
12             var year = new Year(1997);
13             year.IsLeap();
14         }
15 
16         [Test]
17         public void CommonYearsAreNoLeapYears()
18         {
19             var year = new Year(1997);
20             Assert.False(year.IsLeap());
21         }
22 
23         [Test]
24         public void ShouldIdentifyStandardLeapYears()
25         {
26             var year = new Year(1996);
27             Assert.True(year.IsLeap());
28         }
29     }
30 }

But, a new test fails. Years divisible by 100 should not be leap years (unless they are also divisible by 400), and this error has been in our program for a while, but until now we didn’t have any test that executed that part of the code.

 1 using System;
 2 using System.Runtime.Remoting.Metadata.W3cXsd2001;
 3 using NUnit.Framework;
 4 
 5 namespace LeapYear
 6 {
 7     public class Tests
 8     {
 9         [Test]
10         public void CanRespondToIsLeapMessage()
11         {
12             var year = new Year(1997);
13             year.IsLeap();
14         }
15 
16         [Test]
17         public void CommonYearsAreNoLeapYears()
18         {
19             var year = new Year(1997);
20             Assert.False(year.IsLeap());
21         }
22 
23         [Test]
24         public void ShouldIdentifyStandardLeapYears()
25         {
26             var year = new Year(1996);
27             Assert.True(year.IsLeap());
28         }
29         
30         [Test]
31         public void ShouldIdentifyCenturyExceptionOfLeapYears()
32         {
33             var year = new Year(1900);
34             Assert.False(year.IsLeap());
35         }
36     }
37 }

This is the kind of problem that can go unnoticed when we add too much code at once in order to pass a test. The code excess probably doesn’t affect the test that we were working on, so we won’t know if it hides any kind of problem, and we won’t know it unless we happen to build a test that makes it surface. Or even worse, we won’t know it until the bug explodes in production.

The solution is pretty simple: only add the code that’s strictly necessary to pass the test, even if it’s just returning the value expected by the test itself. Don’t add any behavior if there’s not a failing test forcing you to do it.

In our case, it was the test which verified the handling of the non-leap years. In fact, the next test, which aimed to introduce the detection of standard leap years (years divisible by 4), passed without the need for adding any new code. This leads us to the next point.

What does it mean if a test passes just after writing it?

When we write a test and it passes without us needing to add any production code, it can be due to any of these reasons:

The algorithm that we’ve written is general enough to cover all of the possible cases: we’ve completed the development.
The example that we’ve chosen isn’t qualitatively different from others that we had already used, and therefore it’s not forcing us to write production code. We have to find a different example.
We’ve added too much code, which is what we’ve just talked about in the previous bit.

In this leap year kata, for example, we’ll arrive at a point in which there’s no way of writing a test that fails because the algorithm supports all possible cases: regular non-leap years, leap years, non-leap years every 100 years, and leap years every 400.

The other possibility is that the chosen example doesn’t really represent a new behavior, which can be a symptom of a bad definition of the task, or of not having properly analyzed the possible scenarios.

The red-green-refactor cycle

The three laws establish a framework which we could call a “low level” one. Martin Fowler, for his part, defines the TDD cycle in these three phases which would be at a higher level of abstraction:

Write a test for the next piece of functionality that you wish to add.
Write the production code necessary to make the test pass.
Refactor the code, both the new and the old, so that all’s well structured.

These three stages define what’s usually known as the “red-green-refactor” cycle, named like this in relation to the state of the tests in each of the phases of the cycle:

Red: the creation of a failing test (it’s red) that describes the functionality or behavior that we want to introduce in the production software.
Green: the writing of the necessary production code to pass the test (making it green), with which it’s verified that the specified behavior has been added.
Refactor: while keeping the tests green, reorganizing the code in order to improve its structure, making it more readable and sustainable without losing the functionality that we’ve developed up until this point.

In practice, refactoring cycles only arise after a certain number of cycles of the three laws. The small changes driven by these start to accumulate, until they arrive at a point in which code smells start to appear, and with them the need for refactoring.

References

Up next

Fizz Buzz