The choice of the first test

In TDD, the choice of the first test is an interesting problem. In some papers and tutorials about TDD they tend to talk about “the simplest case” and don’t elaborate further. But in reality, we should get used to looking for the smallest test that can fail, which is usually a very different thing.

All in all, it doesn’t seem like a very practical definition, so it probably deserves a more thorough explanation. Is there any somewhat objective way to decide which is the minimum test that can fail?

In search of the simplest test that can fail

Suppose the Roman Numerals kata. It consists in creating a converter between decimal and roman numbers. Suppose that the class is going to be RomanNumeralsConverter, and that the function is called toRoman, so that it would be used more or less like this:

1 $roman = $romanNumeralsConverter->toRoman(54)

According to the “simplest case” approach, we could write a test not unlike this one:

1 public function testShouldConvertOne(): void
2 {
3     $converter = new RomanNumeralsConverter();
4     
5     $this->assertEquals("I", $converter->toRoman(1));
6 }

Looks right, doesn’t it? However, this is not the simplest test that can fail. Actually, there are at least two simpler test that could fail, and both of them will force us to create production code.

– Let’s take a moment to think about the test that we’ve just written: what will happen if we execute it?

– It’s going to fail.

– But, why will it fail?

– Obvious: because we haven’t even defined the class. When we try to execute the test it cannot find the class.

– Can we say that the test fails for the reason that we expect it to fail?

– Hmmm. What do you mean?

– I mean that the test establishes that we expect it to able to convert the decimal 1 to the roman I. It should fail because it can’t do the conversion, not because it can’t find the class. Actually, the test can fail for at least three causes: that the class doesn’t exist, that the class doesn’t have the toRoman method, and that it doesn’t return the result “I”. And it should only fail because of one of them.

– Are you telling me that the first test should be just instantiating the class?

– Yes.

– And what’s the point of that?

– That the test, when it fails, can only fail for the reason that we expect it to fail.

– I have to think about that for a moment.

– No problem. I’ll wait for you at the next paragraph.

That is the question. In spite of it being the simplest case, this first test can fail for three different reasons that make us consider the test as not-passing (remember the second law: not compiling is failing), therefore, we should reduce it so that it fails for just one cause.

As a side note: it’s true that it could fail for many other causes, such as typing the wrong name, putting the class in the wrong namespace, etc. We assume that those errors are unintentional. Also, running the test will tell us the error. Hence the importance of running the tests, seeing how they fail, and making sure they fail properly.

Let’s go a bit slower, then.

The first test should just ask that the class exists and can be instantiated.

1 public function testShouldInstantiate(): void
2 {
3     $this->expectNotToPerformAssertions();
4     
5     new RomanNumeralsConverter();
6 }

In PhpUnit, a test without assertions fails or is at least considered risky. In order to make it pass clearly we specify that we’re not going to make assertions. In other languages this is unnecessary.

To pass the test I have to create the class. Once created, I will see the test pass and then I’ll be able to face the next step.

The second test should force me to define the desired class method, although we still don’t know what to do with it or what parameters it will need.

1 public function testShouldBeAbleToConvertToRoman(): void
2 {
3     $this->expectNotToPerformAssertions();
4     
5     $converter = new RomanNumeralsConverter();
6     $converter->toRoman();
7 }

Again, this test is a little special in PHP. For example, in PHP and other languages we can ignore the return value of a method if it’s not typed. Other languages will require us to explicitly type the method, which in this step could be done using void so that it doesn’t return anything at all. Other strategy would be to return an empty value with the correct type (string in this case). There are languages, on the other hand, that require the result of the method to be used -if it’s even returned-, but they also allow you to ignore it.

An interesting issue is that once we’ve passes this second test, the first one becomes redundant: the case is already covered by this second test, and if a change in code made it fail, the second test would fail as well. You can delete it in the refactoring phase.

And now is when the “simplest case” makes sense, because this test, after the others, will fail for the right reason:

1 public function testShouldConvertOne(): void
2 {
3     $converter = new RomanNumeralsConverter();
4     
5     $this->assertEquals("I", $converter->toRoman(1));
6 }

This is already a test that would fail for the expected reason: the class doesn’t have anything defined to do the conversion.

Again, once you make this test pass and you’re in the refactoring phase, you can delete the previous one. It has fulfilled its tasks of forcing us to add a change to the code. Additionally, in case that second test fails due to a change in the code, our current test will also fail. Thereby, to can also delete it.

I guess now you’re asking yourself two questions:

Why write three tests to end up keeping the one that I had though at the beginning
Why can I delete those tests

Let’s do this bit by bit, then.

Why start with such small steps

A test should have only one reason to fail. Imagine it as an application of the Single Responsibility Principle. If a test has more than one reason to fail, chances are that we’re trying to make a test provoke many changes in the code at once.

It’s true that in testing there’s a technique called triangulation, in which, precisely, several possible aspects that must occur together are verified in order to consider that the test passes or fails. But, as we’ve said at the beginning of the book, TDD is not QA, so the same techniques are not applicable.

What we want in TDD is that the tests tell us what is the change that we have to make in the software, and this change should be as small and unambiguous as possible.

When we don’t have any written production code, the smallest thing we can do is creating a file in which we define the function or class that we’re developing. It’s the first step. And even then, there are chances that we don’t do it correctly:

we make a mistake in the name in the file name
we make a mistake in its location in the project
we mistype the class’s or the function’s name
we make a mistake when locating it in a name space
…

We have to prevent all of those problems just to be able to instantiate a class or be able to use a function, but this minimal test will fail if either of them happen. When correcting all the thing that can occur, we’ll make the test pass.

However, if the test can fail for more reasons, the potential sources of error will multiply, as there are more things that we need to do to make it pass. Also, some of them can depend and mix between themselves. In general, the necessary change in production code will be too large with a red test, and therefore, making it become green will be more costly and less obvious.

Why delete these first tests

In TDD many tests are redundant. From the point of view of QA, we test too much in TDD. In the first place, because many times we use several examples of the same class, precisely to find the general rule that characterizes that class. On the other hand, there are tests that we do in TDD that are already included in other ones.

This is the case of these first tests that we’ve just shown.

The test that forces us to define the software unit for the first time is included in any other test we can imagine, for the simple reason that we need the class in order to be able to execute those other tests. Put into a different way, if the first test fails, then all of the rest will fail.

In this situation, the test is redundant and we can delete it.

It’s not always easy to identify redundant tests. In some stages of TDD we use examples of the same class to move the development, so we may reach a point in which some of those tests are redundant and we can delete them as they’ve become unnecessary.

On the other hand, a different possibility is to refactor the test using data providers or other similar techniques with which to cheapen the creation of new examples.

The happiness of the paths

Happy path testing

We call the flow a program happy path when there aren’t any problems and it’s able to execute the entire algorithm. The happy path occurs when no errors are generated during the process because all of the handled data are valid and their values lie within their acceptable ranges, nor are there any other failures that can affect the software unit that we’re developing.

In TDD, happy path testing consists in choosing examples that must return predictable values as a result, which we can use to test. For example, in the kata Roman Numerals, one possible happy path test would be:

1 public function testShouldConvertFour(): void
2 {
3     $converter = new RomanNumeralsConverter();
4     
5     $this->assertEquals("IV", $converter->toRoman(4));
6 }

Very frequently we work with happy path tests in the kata. This is so because we’re interested in focusing in the development of the algorithm and that the exercise doesn’t last too long.

Sad path testing

On the contrary, sad paths are those program flows that end badly. When we say that they end badly, we mean that an error occurs and the algorithm cannot finish executing.

However, the errors and the ways in which production code deals with them are a part of the behavior of the software, and in real work they deserve to be considered when using the TDD methodology.

In that sense, sad path testing would be precisely the choice of test cases that describe situations in which the production code has to handle wrong input data or responses from their collaborators which we also have to manage. An example of this would be something like this:

1 public function testShouldFailWithInvalidInput(): void
2 {
3     $converter = new RomanNumeralsConverter();
4     
5     $this->expectException(InvalidInput::class);
6     $converter->toRoman(-12.34);
7 }

That is: our roman numeral converter cannot handle negative numbers nor numbers with decimal digits, and therefore, in a real program we’d have to handle this situation. In the example, the consequence is throwing an exception. But it could be any other form of reaction that suits the purposes of the application.

1 public function testShouldFailWithInvalidInput(): void
2 {
3     $converter = new RomanNumeralsConverter();
4     
5     $this->assertEquals('Non possum hic numerus converto', $converter->toRoman(-12.3\
6 4));
7 }

Up next

NIF