There is no correct (or incorrect, for that matter) way of reading this book. It all comes down to your interests.
To begin with, it’s structured in four main parts that may be read separately:
The first introduces the basic concepts of TDD, as well as some strategies to learn to use and introduce this discipline in your practice.
In the second, a selection of kata, or code exercises, is introduced, with which the Test Driven Development concepts and techniques are explained in depth in their classic definition. They range from very well-known ones to some of my own devising.
Each of the kata is organized as follows:
A theoretical chapter dedicated to a relevant aspect of TDD, highlighted by that kata, on which I have put special emphasis while solving it.
An introduction to the kata, its origin if it's known, its problem statement, and a series of recommendations and points of interest about it.
A solution developed in a different programming language and explained in detail. There’s a repository with solutions to the kata in various languages.
The third part introduces the outside-in TDD methodology. Outside-in TDD is an approach that seeks to boost the design phase, and can be applied to real project development.
The fourth part is oriented to showcasing an example of a realistic project and how TDD can be incorporated in the various stages of development and maintenance, from the creation of a minimum viable product (MVP) to defect resolution and new feature incorporation.
If you're looking for a manual to learn TDD from scratch, my advice would be to read it in order. The code exercises are laid out so as to introduce the concepts in a specific progression, one that I have reached through personal experience and through teaching other people to use TDD.
In the beginning, you might think that the TDD exercises are too trivial and not very realistic. Keep in mind that the name kata is not coincidental. A kata, in martial arts, is a repetitive exercise that is practiced until its movements are automated and beyond. If you practice any sport you'll have done dozens of exercises aimed at increasing your flexibility, strength, mobility and automatisms, without them having a direct application to that sport. TDD kata have that same function: they prepare your brain to automate certain routines, generate specific habits, and be able to detect particular patterns in the development process.
The outside-in approach will possibly look more applicable to your daily work. In fact, it's a way of developing projects using TDD. However, a solid base in classic TDD is fundamental to be successful using it. Outside-in is very close to Behavior Driven Development.
As it has been mentioned before, the various parts and exercises are relatively independent. If you already have some experience with the Test Driven Development discipline, you can jump directly to the sections or exercises that you’re interested in. Oftentimes you’ll discover something new. One of the things I’ve found out is that even though you have practiced the same exercise dozens of times, new ideas always manage to come out.
If you’re looking to introduce TDD in your or in your team’s workflow, it’s possible that you skip directly to the part about TDD in real life. It’s the one that has, to put it that way, the most dependency on previous knowledge and experience. In that case, if you think you are laking fluency in TDD, it’s possible that you must first take a look at other parts of the book.
To reach a good level of TDD performance you should practice the exercises many times. I'm not talking three or four, I'm talking dozens of times, in different moments of your professional life and, ideally, in different languages. There exist several kata repositories in which to find exercises, and you can also invent or discover your own.
It’s also advisable to look at how other people do these exercises. In the web there are available lots of kata examples done in a variety of programming languages, and it’s a great way of comparing your solutions and process.
And last but not least, one of the best ways to learn is practicing with other people. Be it in work projects, trainings, or practice communities. Live discussion of the solutions, the size of the steps, the behavior to test… will contribute to the honing and strengthening of your development process.
Assumptions
For this book some assumptions are made:
That you have a certain experience with some programming language and with a testing environment for that language. In other words: you know how to write and run tests. It doesn't matter that your favorite language isn't covered in this book.
This book’s examples are written in various languages, and as far as possible the usage of language-specific qualities is avoided. In fact, I am inexperienced in many of them, and therefore the code may appear very simple. On the other hand, this is something desirable in TDD, as you’ll see throughout the book.
It’s clear to you that the objetive of the code exercise is not so much solving the problem as its proposed, which is eventually solved, but the path through which we arrive at that solution.
You understand that there's neither a unique solution nor a precise path in the resolution of the kata. If your solution doesn't perfectly match the one showcased in this book, it's not a problem.
Disclaimer
The proposed solutions to the kata are provided as explained examples of the reasoning processes that might be followed. They are not ideal solutions. When you do your own version you could follow a completely different process that could be as valid as (or even more valid than) the one presented here.
On the other hand, successive executions of the same kata by the same person could lead them to different solutions and paths. That is one of its benefits: by getting used to and automating certain thinking patterns, we can focus on more details each time and find new and more interesting points of intervention.
Likewise, as our fluency in a programming language increases, the implementations we achieve can be better and more elegant.
While preparing the kata presented in this book I have worked through several versions, in different languages, in order to find the most interesting routes and even purposefully causing some problems that I was interested in highlighting. The solution that I’ve finally decided to publish, in each case, is oriented towards some point that I wanted to accentuate about the TDD process, so it may not always be the optimal one.
That is to say, in a way, the kata come with a catch: things are forced up to the point where they best achieve a didactic objective.
On another note, I have taken advantage of this project to force myself to experiment with different programming languages. In some cases, they are new to me or I have very little experience working with them, so it's possible that the implementations are especially rough or don't take advantage of some of their more specific and optimal features.
Basic TDD concepts
In this first part we'll introduce the basic concepts to understand what Test Driven Development is and how it differs from other disciplines and methodologies that use tests. We'll also talk about how you can learn TDD, be it individually, in a team, or in a practice community.
The first chapter has a general introduction to the process of Test Driven Development.
The chapter about basic concepts is a glossary that we’ll make use of throughout the book.
Finally, the chapter about coding-dojo and kata proposes some simple ideas to start practicing with a team or by oneself.
What is TDD and why should I care about it?
Test Driven Development is a software development methodology in which tests are written in order to guide the structure of production code.
The tests specify -in a formal, executable and exemplified manner- the behaviors that the software we’re working on should have, defining small objectives that, after being achieved, allow us to build the software in a progressive, safe and structured way.
Despite we’re talking about tests, we’re not referring to Quality Assurance (from now on: QA), even though by working with TDD methodology we achieve the secondary effect of obtaining a unitary test suite that is valid and has the maximum possible coverage. In fact, typically part of the tests created during TDD are unnecessary for a comprehensive battery of regression tests, and therefore end up being removed as new tests make them redundant.
That is to say: both TDD and QA are based on the use of tests as tools, but this use differs in several respects. Specifically, in TDD:
Tests are written before the software that they execute even exists.
The tests are very small and their objective is to force writing the minimum amount of production code needed to pass the test, which has the effect of implementing the behavior defined by the test.
The tests guide the development of the code, and the process contributes to the design of the system.
In TDD, the tests are defined as executable specifications of the behavior of a given unit of software, while in QA, tests are tools for verification of that same behavior. Put in simpler words:
When we do QA, we try to verify that the software that we’ve written behaves according to the defined requirements.
When we do TDD, we write software to fulfill the defined requirements, one by one, so that we end up with a product that complies with them.
The Test Driven Development methodology
Although we will expand on this topic in depth throughout the book, we will briefly present the essentials of the methodology.
In TDD, tests are written in a way that we could think of as a dialogue with production code. This dialogue, the rules that regulate it, and the cycles that are generated by this way of interacting with code will be practiced in the first kata of the book: FizzBuzz.
Basically, it consists in:
Writing a failing test
Writing code that passes the test
Improving the code’s (and the test’s) structure
Writing a failing test
Once we are clear about the piece of software which we're going to work on and the functionality that we want to implement, the first thing to do is to define a very small first test that will inevitably fail because the file containing the production code that it needs to run doesn't even exist. While this is something that we'll deal with in all of the kata, in the NIF kata we will delve into strategies that help us decide on the first tests.
Here’s an example in Go:
Although we can predict that the test won't even compile or be interpreted, we'll try to run it nonetheless. In TDD it's fundamental to see the tests fail; assuming that they will isn't enough. Our job is making the test fail for the right reason, and then making it pass by writing production code.
The error message will tell us what to do next. Our short-term goal is to make that error message disappear, as well as those that might come after, one by one.
For instance, after introducing the decToRoman function, the error will change. Now it’s telling us that it should return a value:
It could even happen that we get an unexpected message, for example that we've tried to load the Book class and it turns out that we had mistakenly created a file named brok. That's why it's so important to run the test and see whether it fails and how exactly it fails.
Suppose, for example, that we have misspelled the name of the function when creating it. The error message will tell us exactly that, so we start by correcting the name.
And we can continue. Since the test states that it expects the function to return "I" when we pass it 1 as input, the failing test should tell us that the actual result doesn't match the expected one. However, at the moment, the test is telling us that the function doesn't return anything. It's still a compilation error, and still not the correct reason to fail.
To make the test fail for the reason that we expect it to, we have to make the function return a string, even if it’s an empty one.
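In Go, for instance, that minimal change could be:

```go
// The function exists and compiles, but returns nothing useful yet.
func decToRoman(number int) string {
	return ""
}
```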
So, this change turns the error into one related to the test's assertion, as it's not getting the result that it expects. This is the correct reason for failure, the one that will force us to write the production code that passes the test.
And so we would be ready to take the next step:
Writing code that passes the test
As a response to the previous result, we write the production code that is needed for the test to pass, but nothing else. Continuing with our example:
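In our example, it's enough to return exactly the value that the test expects, something like:

```go
// Minimal production code: return the literal the test asks for, nothing more.
func decToRoman(number int) string {
	return "I"
}
```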
Once we have made the test fail, we can start creating the file that'll contain the unit under test. We could even rerun the test now, which probably would cause the compiler or the interpreter to throw a different error message. At this point everything depends a bit on circumstances, such as conventions in the language we're using, the IDE we're working with, etc.
In any case, it’s a matter of taking small steps until the compiler or interpreter is satisfied and can run the test. In principle, the test should run and fail indicating that the result received from the unit of software doesn’t match the expected one.
At this point there’s a caveat, because depending on the language, the framework, and some testing practices, the concrete manner of doing this first test may vary. For example, there are test frameworks that just require for the test to not throw any errors or exceptions to succeed, so a test that simply instantiates an object or invokes any of its methods is enough. In other cases it’s necessary that the test includes an assertion, and if none is made it’s considered as not passing.
In any case, this phase’s objective is making the test run successfully.
With the Prime Factors kata we'll study the way in which production code can change in order to implement new functionality.
Improve the structure of the code (and tests)
When every test passes, we should examine the work done so far and check if it’s possible to refactor both the production and test code. Here we apply the usual principles: if we detect any smell, difficulty in understanding what’s happening, knowledge duplication, etc. we must refactor the code to make it better before continuing.
Ultimately, the questions at this point are:
Is there a better way to organize the code that I’ve just written?
Is there a better way to express what this code does and make it easier to understand?
Can I find any regularity and make the algorithm more general?
While we do this, we must keep every test that we've written passing. If any of them turns red, we would have a regression on our hands and we would have spoiled, so to speak, the functionality that was already implemented.
It’s usual not to find many refactoring opportunities after the first cycle, but don’t get comfortable just yet: there’s always another way of seeing and doing things.
As a general rule, the earlier you spot opportunities to reorganize and clean up your code, and act on them, the easier the development will be.
For instance, we’ve created the function under test in the same file as the test.
Turns out there’s a better way to organize this code, and it is creating a new file to contain the function. In fact, it’s a recommended practice in almost every programming language. However, we may have skipped it at first.
And, in the case of Go, we can convert it in an exportable function if its name is capitalized.
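A sketch of that refactoring, with the function moved to its own file and renamed so that it's exported (the file and package names are just an assumption):

```go
// roman.go
package roman

// DecToRoman is exported because its name starts with a capital letter.
func DecToRoman(number int) string {
	return "I"
}
```

The test would then be updated to call DecToRoman accordingly.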
To delve further into everything that has to do with refactoring while working, we'll have the Bowling Game kata.
Repeat the cycle until finishing
Once the production code passes the test and is as nicely organized as it can be at this stage, it's time to choose another aspect of the functionality and create a new failing test that describes it.
This new test fails because the existing code doesn’t cover the desired functionality and introducing a change is necessary. Therefore, our mission now is to turn this new test green by making the necessary transformations in the code, which will be small if we’ve been able to size our previous tests properly.
After making this new test pass, we search for refactoring opportunities to achieve a better code design. As we advance in the development of the piece of software, we’ll see that the possible refactorings become more and more significant.
In the first cycles we’ll begin with name changes, constant and variable extraction, etc. Then we’ll advance to introducing private methods or extracting certain aspects as functions. At some point we’ll discover the necessity of extracting functionality to helper classes, etc.
When we’re satisfied with the code’s state, we keep on repeating the loop as long has we have remaining functionality to add.
When does development end in TDD?
The obvious answer to this question could be: when all the functionality is implemented.
But, how do we know this?
Kent Beck suggested making a list of all of the aspects that would have to be fulfilled to consider the functionality as complete. Every time any one of them is attained it’s crossed off the list. Sometimes, while advancing in the development, we realize that we need to add, remove or change some elements in the list. It’s good advice.
There is a more formal way of making sure that a piece of functionality is complete. Basically, it consists in not being able to create a new failing test. Indeed, if an algorithm is implemented completely, it will be impossible to create a new test that can fail.
What is not Test Driven Development
The result of Test Driven Development is not creating flawless software free of any defect, although many defects are indeed prevented; nor is it generating a suite of unit tests, although in practice we do obtain one, with coverage that can even reach 100% (with the tradeoff that it may contain redundancy). None of these is TDD's objective; in any case, they're beneficial side effects.
TDD is not Quality Assurance
Even though we use the same tools (tests), we use them for different purposes. In TDD, the tests guide development, setting specific objectives that are reached by adding or changing code. The result of TDD is a suite of tests that can be used in QA as regression tests, although it's frequent that we have to retouch those tests in one way or another: in some cases to delete redundant tests, and in others to make sure that all the relevant cases are well covered.
In any case, TDD helps enormously in the QA process because it prevents many of the most common flaws and contributes to building well structured and loosely coupled code, aspects that increase software reliability, our ability to intervene in case of errors, and even the possibility of creating new tests in the future.
TDD doesn’t replace design
TDD is a tool to aid in software design, but it doesn’t replace it.
When we develop small units with some very well defined functionality, TDD helps us establish the algorithm design thanks to the safety net provided by our tests.
But when considering a larger unit, a prior analysis that leads us to a "sketch" of the main elements of the solution gives us a frame for development.
The outside-in approach tries to integrate the design process within the development one, using what Sandro Mancuso tags as Just-in-time design: we start from a general idea about how the system will be structured and how it will work, and we design within the context of the iteration that we’re in.
How TDD helps us
What TDD provides us is a tool that:
Guides the software development in a systematic and progressive way.
Allows us to make verifiable claims about whether the required functionality has been implemented or not.
Helps us avoid the need to design all of the implementation details in advance, since it's a tool that contributes to the design of the software component itself.
Allows us to postpone decisions at various levels.
Allows us to focus on very concrete problems, advancing in small steps that are easy to reverse if we introduce errors.
Benefits
Several studies have shown evidence that suggests that the application of TDD has benefits in development teams. It’s not conclusive evidence, but research tends to agree that with TDD:
More tests are written
The software has fewer flaws
The productivity is not diminished, it can even increase
It’s quite difficult to quantify the advantages of using TDD in terms of productivity or speed, but subjectively, many benefits can be experienced.
One of them is that the TDD methodology can lower the cognitive load of development. This is so because it favors the division of the problem in small tasks with a very defined focus, which allows us to save the limited capacity of our working memory.
Anecdotal evidence suggests that developers and teams introducing TDD reduce defects, diminish time spent on bugs, increase deployment confidence, and productivity is not adversely affected.
Next we’ll define some of the concepts that are used throughout the book. They must be understood within the context of Test Driven Development.
Test
A test is a small piece of software, usually a function, which runs another piece of code and verifies if it produces an expected result or effect. A test is, basically, an example of usage of the unit under test in which a scenario is defined and the tested unit is executed to see if the result matches what we had in mind.
Many languages use the notion of TestCase, a class that groups several related tests together. In this approach, each method is a test, although it’s usual to refer to the test case as just “test”.
Test as specification
A test as specification utilizes usage examples from the tested piece of software in order to describe how it should work. Significant examples are used, above all, but it’s not always done in a formal way.
It’s opposite to the test as verification, typical of QA, in which the piece of software is tested by choosing the test cases in a systematic manner to verify that it fulfills what’s expected of it.
Failing test
A failing test is a specification that cannot be fulfilled yet because the production code that lets it pass hasn’t been added. Testing frameworks typically picture them with a red color.
Passing test
A passing test is a specification that runs production code which generates an expected result or response. Testing frameworks typically give them a green color.
Types of tests
Unit tests
They are tests that exercise an isolated unit of software; its dependencies are replaced by test doubles so that their influence on the result is kept under control.
Integration tests
Integration tests usually test groups of software units, so that we can verify their communication and combined action.
Acceptance tests
Acceptance tests are integration tests that exercise a software system as if we were yet another of its consumers. We normally write them in terms of the business's interests.
Test Case
It’s a class that groups several tests together.
Test Suite
It’s a set of test and/or test cases that can usually be executed together.
Production code
In TDD we use the name “production code” to refer to the code that we write to make tests pass and which, eventually, will end up being executed in a production system.
Software unit
Software unit is a quite flexible concept that we have to interpret within a context, but it usually refers to a piece of software that can be executed in a unitary and isolated manner, even when it's composed of many elements.
Subject under test
The software unit that is exercised in a test. There’s a discussion about what is the scope of a unit. At one extreme there are those who consider that a unit is a function, a method, or even a class. However, we can also consider as unit under test a set of functions or classes that are tested through the public interface of one of them.
Refactoring
A refactoring is a change in code that doesn't alter its behavior or its interface. The best way to ensure this is having at least one test that exercises the piece of code being modified, so that after each change we make sure that the test keeps passing. This proves that the behavior hasn't changed even though the implementation has been modified.
Some techniques or refactoring patterns are described in compilations such as these, from Refactoring Guru, or the classic book from Martin Fowler.
Automatic refactoring
Precisely because some refactorings are very well identified and characterized, it’s been possible to develop tools that can execute them automatically. These tools are available in the IDE.
Coding-dojo and kata
Kata
In the software world we call kata those design and programming exercises that pose relatively simple and limited problems, which we can use to practice development methodologies.
This term is a borrowing from the Japanese word that refers to the training exercises typical of martial arts. Its introduction is attributed to Dave Thomas (The Pragmatic Programmer), referring to the completion of small code exercises, repeated over and over again until achieving a high degree of fluency or automation.
Applied to TDD, kata seek to train the test-production-refactoring cycle, as well as the ability to add behavior by means of small code increments. These exercises will help you divide functionality in small parts, choose examples, advance in the project step by step, switch priorities depending on the information provided by the tests, etc.
The idea is to repeat the same kata many times. On top of gaining fluency in applying the process, each repetition offers the possibility of discovering new strategies. With repeated practice, we'll favor the development of certain habits and pattern recognition, automating our development process to a certain extent.
You can train with kata by yourself or with others. A systematic way of doing this is through a Coding Dojo.
Coding-dojo
A coding-dojo is a workshop in which a group of people, regardless of their level of knowledge, perform a kata in a collaborative and non-competitive way.
The idea of Coding Dojo or Coder's Dojo was introduced at the XP2005 conference by Laurent Bossavit and Emmanuel Gaillot.
The basic structure of a coding-dojo is pretty simple:
Presentation of the problem, explanation of the exercise (5-10 min)
Coding session (30-40 min)
Sharing of the status of the exercise (5-10 min)
The coding session continues (30-40 min)
Sharing and review of the achieved solutions
The coding session can be structured in several ways:
Prepared kata. A presenter explains how to solve the exercise, but relying on the feedback from the attendants. No progress is made until consensus is reached. It’s a very suitable way of working when the group is just starting out and few people are familiar with the methodology.
Randori kata. The kata is done in pairing, using some system to switch between a driver (at the keyboard) and a co-pilot. The rest of the attendants collaborate by making suggestions.
Hands-on workshop. One alternative is to have the participants form pairs and work on the kata collaboratively. Halfway through the exercise, a few minutes' break is taken in order to discuss the work that has been done. At the end of the session, all of the different solutions are presented (at whatever point of the assignment each team has arrived). Participants can choose their preferred programming language, so it's a great opportunity for those who are looking to get started with a new one. It can also be a good approach for beginners if the pair members have different levels of experience.
Advice for completing the kata individually
In the beginning it may be a good idea to attend directed kata. Essentially, it’s a kata performed by an expert in the shape of a live coding session where they explain or comment the different steps with the audience, in such a way that you can easily see the dynamic in action. If you don’t have this possibility, which may be the most common scenario, it’s a good idea to watch some kata on video. You will find some links in the chapters dedicated to each kata.
Above all, the goal of the kata is to exercise the TDD discipline, the application of the three laws, and the red-green-refactor cycle. Production code is actually less important, in the sense that it’s not the main objective of the learning, although it will always be correct as long as the tests pass. However, every execution of the kata can lead us to discover new details and new ways of facing each phase.
Namely, the kata are designed to learn to develop software using tests as a guide, and to train the mindset, the reasonings, and the analysis that help us perform this task. In general, developing a good TDD methodology will help us write better software thanks to the constraints it imposes.
Obviously, the first tries will take their time, you will go down paths with no apparent return, or you will straight up skip some of the steps of the cycle. When this happens, you just have to go back or start over from scratch. These are exercises that don't have a single correct answer.
In fact, every programming language, approach, or test environment could favor some solutions over the others. You can perform a kata several times, trying to assume different starting assumptions in each try, or applying different paradigms or conditions.
If you reach points in which you can choose between different courses of actions, take note of them in order to repeat the exercise, and try a different path later to see where it leads you.
In TDD it’s really important to focus on the here and the now that each failing tests defines, and not get anxious about reaching the final objective. This doesn’t mean putting it aside or dedicating ourselves to something else. It simply means that we have to tread that path step by step, and doing it like that will take us to the finish line almost without us realizing, with much less effort and more solidity. Acquiring this mindset, dealing only with the problem in front of us, will help us reduce stress and think more clearly.
If possible, try repeating the same kata using different languages, even different testing frameworks. The two best known families are:
xSpec, which are oriented to TDD and tend to favor testing by example, providing specific syntax and utilities. Their handicap is that they don’t usually work well for QA.
xUnit, which are the most generic testing frameworks, albeit more QA oriented. Nevertheless, they can be used in TDD without any problems.
How to introduce TDD in development teams
Introducing the TDD methodology in development teams is a complex process. Above all, it's important to contribute to generating a culture that's open to innovation, to quality, and to learning. The greatest reluctance often comes from a fear that TDD will slow down the development, or from not seeing, at first, direct applications to everyday problems.
I personally believe that using both formal and informal channels can be of interest. Here are some ideas.
Establishing a weekly time, one or two hours, for a coding-dojo open to the whole team. Depending on the level of expertise, it could start with directed kata, hands-on type sessions, or whatever format seems the most appropriate to us. Ideally, several people would be able to facilitate these sessions.
Bringing experienced people into the teams who could help us introduce TDD in pairing or mob-programming work sessions, guiding their coworkers.
Organizing specific trainings, with external help if people with enough experience aren’t available.
Introducing (if there isn’t one already) a technical blog in which to publish articles, exercises, and examples about the topic.
In this part we present a series of code exercises with which we’ll explore in depth how Test Driven Development is done.
We’ll use the discipline’s classic style or approach. TDD is a software development methodology re-discovered by Kent Beck, based on the way that the first computer programs used to be built. Then, calculations were first carried out by hand, so as to have the reference of the expected the result that would have to be reproduced in the computer. In TDD, we write a very simple program that tests that the result of other program matches the expected one. The key here is that this program hasn’t been written yet. It’s that simple.
The methodology was presented by Beck in his book TDD by Example, in which, among other things, he teaches how to build a testing framework using TDD. Subsequently, various authors have contributed to the refinement and systematization of the model.
The laws of TDD
Since the introduction of the TDD methodology by Kent Beck, there has been an effort to define a simple framework that serves as a guide for its application in practice.
Initially, Kent Beck proposed two very basic rules:
Don’t write a line of new code unless you first have a failing automated test.
Eliminate duplication.
That is, to be able to write production code, first we must have a test that fails and that requires us to write that code, precisely because that’s what’s needed to pass the test.
Once we’ve written it and checked that the test passes, our effort goes towards reviewing the written code and eliminating duplication as much as possible. This is very generic, because on the one hand it refers to refactoring, and in the other hand, to the coupling between the test and the production code. And being so generic, it’s hard to translate to practical actions.
On top of that, these rules don't tell us anything about how big the jumps in the code involved in each cycle should be. In his book, Beck suggests that the steps (or baby steps) can be as small or as big as we find them useful. In general, he recommends using small steps when we're unsure or have little knowledge about the algorithm, while allowing larger steps if we have enough experience and knowledge to be sure about what to do next.
With time, and starting from the methodology learnt from Beck himself, Robert C. Martin established the “three laws”, which not only define the action cycle in TDD, but also provide some criteria about how large the steps should be in each cycle.
It’s not allowed to write any production code unless it passes a failing unit test
It’s not allowed to write more than the one unit test that’s sufficient to fail; and compilation errors are failures
It’s not allowed to write more production code than necessary to pass the one failing unit test
The three laws are what make TDD different to simply writing tests before code.
These three laws impose a series of restrictions whose objective is to force us to follow a specific order and workflow. They define several conditions that, if fulfilled, generate a cycle and guide our decision-making. Understanding how they work will help us make the most out of TDD and produce quality code that we're able to maintain.
These laws have to be fulfilled all at the same time, because they work together.
The laws in detail
It’s not allowed to write any production code unless it passes a failing unit test
The first law states that we can’t write any production code unless it passes an existing unit test that is currently failing. This implies the following:
There has to be a test that describes a new aspect of the behavior of the unit that we're developing.
This test must fail because there isn’t anything that makes it pass in the production code.
In short, the first law forces us to write a test that defines the behavior that we’re about to implement in the unit of software that we want to develop, all before having to consider how to do it.
Now, how should this test be?
It’s not allowed to write more than the one unit test that’s sufficient to fail; and compilation errors are failures
The second law tells us that the test must be sufficient to fail, and that we have to consider compilation errors as failures (or their equivalent in interpreted languages). For example, among these errors there would be some as obvious as the class or function not existing or not having been defined yet.
We must avoid the temptation of writing a skeleton of the class or function before writing the first test. Remember that we're talking about Test Driven Development. Therefore, it's the tests that tell us what production code to write and when to write it, and not the other way around.
For the test, “being sufficient to fail” means that it must be very small in several ways, and this is something that is quite difficult to define at first. Frequently we talk about the “simplest” test, the simplest case, but it’s not exactly this way.
What conditions must a TDD test meet, especially the first one?
Well, basically it should force us to write the minimum possible amount of code that can be executed. That minimum, in OOP, would be to instantiate the class that we want to develop without worrying about any further details, for now. The specific test will vary a little depending on the language and testing framework that we’re using.
Let’s take a look at this example from the Leap Year kata, where we write code to find out if a year is a leap year or not. In the example, my intention is to create a Year object to which I can ask if it’s a leap year by sending it the message IsLeap. I’ve come upon this exercise in several kata compilations. For this chapter the examples will be written in C#.
The rules are:
The years not divisible by 4 aren’t leap years (for example, 1997).
The years divisible by 4 are leap years (1996), unless:
If they’re divisible by 100 they aren’t leap years (1900).
If they’re divisible by 400 they are leap years (2000).
Our goal would be to be able to use Year objects in this manner:
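Something along these lines, purely as an illustration of the intended interface:

```csharp
var leapYear = new Year(1996);
leapYear.IsLeap();    // should be true

var commonYear = new Year(1997);
commonYear.IsLeap();  // should be false
```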
The usual impulse is to try and start in the following way, because it looks like it’s the example of the simplest possible case.
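That first impulse usually translates into something like the following test (written here with NUnit-style syntax; the concrete year and assertion are only an illustration):

```csharp
using NUnit.Framework;

[TestFixture]
public class YearTests
{
    [Test]
    public void ShouldIdentifyCommonYears()
    {
        var year = new Year(1997);
        Assert.False(year.IsLeap());
    }
}
```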
However, it’s not the simplest test that could fail for only one reason. Actually, it can fail for at least five different reasons:
The Year class doesn’t exist yet.
It also wouldn’t accept parameters passed by the constructor.
It doesn’t answer to the IsLeap message.
It could return nothing.
It could return an incorrect response.
That is, we can expect the test to fail for those five causes, and only the fifth is the one that’s actually being described by the test. We have to reduce them to just one.
In this case, it’s very easy to see that there’s a dependency between the various causes of failure, in such a way that for one to surface, the previous ones have to be solved. Obviously, it’s necessary to have a class that we can instantiate. Therefore, our first test should be much more modest and just expect that the class can be instantiated:
If we ran this test, we would see it fail for obvious reasons: the class that we try to instantiate isn’t anywhere to be found. The test is failing by a compilation -or equivalent- problem. Therefore, it could be a test that’s sufficient to fail.
Throughout the process we’ll see that this test is redundant and that we can do without it, but let’s not get ahead of ourselves. We still have to make it pass.
It’s not allowed to write more production code than necessary to pass the one failing unit test
The first and second laws state that we have to write a test and tell us how that test should be. The third law tells us how the production code should be: the condition is that it must pass the test that we've written.
It’s very important to understand that it’s the test the one that tells us what code we need to implement, and therefore, even though we have the certainty that it’s going to fail because we don’t even have a file with the necessary code to define the class, we still must execute the test and expect its error message.
That is: we must see that the test, indeed, fails.
The first thing that it’ll tell us when trying to run it is that the class doesn’t exist. In TDD, that’s not a problem, but rather an indication about what we have to do: add a file that contains the definition of the class. It’s possible that we’re able to generate that code automatically using the IDE tools, and it would be advisable to do it that way.
In our example, the message of the test says:
And we would simply have to create the Year class.
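That is, adding nothing more than an empty class:

```csharp
public class Year
{
}
```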
At this point we run the test again to make sure it passes from red to green. In many languages this code will be enough. In some cases you might need something more.
If this is so and the test passes, the first cycle is complete and we can move on to the next behavior, unless we think that we have the chance to refactor the existing code. For example, the usual thing to do here would be to move the Year class to its own file.
If the test hasn’t passed, we’ll look at the message of the failed test and we’ll act accordingly, adding the minimum code necessary for it to finally pass and turn green.
The second test and the three laws
When we’ve managed to pass the first test by applying the three laws, we might think that we haven’t really achieved anything. We haven’t even tackled the possible parameters that the class might need in order to be constructed, be them data, collaborators -in the case of services-, or use cases. Even the IDE is complaining that we’re not assigning the instantiated object to any variable.
However, it’s important to adhere to the methodology, especially in these first stages. With practice and the help of a good IDE, the first cycle will have taken a few seconds at most. In these few seconds we’ll have written a piece of code that, while certainly very small, is completely backed by a test.
Our goal still is to let the tests dictate what code we must write to implement each new behavior. Since our first test already passes, we have to write the second.
Applying the three laws, what comes next is:
Write a new failing test that defines a behavior
That this test is the smallest possible one that still forces us to make a change in the production code
Write the minimum and sufficient production code to pass the test
What could be the next behavior that we need to define? If in the first test we've forced ourselves to write the minimum necessary code to instantiate the class, the second test can lead us down two paths:
Force us to write the necessary code to validate constructor parameters and, therefore, be able to instantiate an object with everything that it needs.
Force us to introduce the method that executes the desired behavior.
This way, in our example, we could simply make sure that Year is able to answer the IsLeap message.
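A test as small as this would do; note that it doesn't assert anything yet, its only job is to force the IsLeap method into existence:

```csharp
[Test]
public void RespondsToIsLeapMessage()
{
    var year = new Year();
    year.IsLeap();
}
```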
The test will throw this error message:
Which tells us that the next step should be to introduce the method that answers that message:
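For now, an empty method is all the test demands:

```csharp
public class Year
{
    public void IsLeap()
    {
    }
}
```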
The test passes, indicating that the objects of type Year can now attend to the IsLeap message.
Having arrived at this point, we could ask ourselves: what if we don’t obey the three laws of TDD?
Violations of the three laws and their consequences
Disregarding the easy joke that we’ll end up fined or in jail for not following the laws of TDD, the truth is that we’d really have to suffer some consequences.
First law: writing production code without having a test
The most immediate consequence is that we break the red-green cycle. The code that we write is no longer guided or covered by tests. In fact, if we wish to have that part tested, we’ll have to write “a posteriori” tests (QA tests).
Imagine that we do this:
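For instance, something along these lines, jumping straight to a full implementation without any test asking for it (the details are only an illustration):

```csharp
public class Year
{
    private readonly int _number;

    public Year(int number)
    {
        _number = number;
    }

    public bool IsLeap()
    {
        if (_number % 400 == 0) return true;
        if (_number % 100 == 0) return false;

        return _number % 4 == 0;
    }
}
```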
The existing tests will fail because we need to pass a parameter to the constructor, and also we don’t have any test that’s in charge of verifying the behavior that we’ve just implemented. We’d have to add tests to cover the functionality that we’ve introduced, but they’re no longer driving the development.
Second law: writing more than one test that fails
We can interpret this in two ways: writing multiple tests, or writing one test that involves too big a jump in behavior.
Writing multiple tests would cause several problems. To pass them all we would need to implement a large amount of code, and the guide that those tests could be providing gets so blurry that it’s almost like having none at all. We wouldn’t have clear and concrete indications to solve by implementing new code.
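Imagine, for instance, that we write these two tests at once (illustrative NUnit-style examples):

```csharp
[Test]
public void ShouldIdentifyCommonYears()
{
    var year = new Year(1997);
    Assert.False(year.IsLeap());
}

[Test]
public void ShouldIdentifyStandardLeapYears()
{
    var year = new Year(1996);
    Assert.True(year.IsLeap());
}
```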
Here we’ve added two tests. To make them pass we’d have to define two behaviors. Also, they’re too large. We haven’t yet established, for example, that we’ll have to pass a parameter to the constructor, nor that the response will be of type bool. These tests mix various responsibilities and try to test too many things at once. We’d have too write too much production code in only one go, leading to insecurity and room for errors to occur.
Instead, we need to make tests for smaller increments of functionality. We can see several possibilities:
To introduce that the answer is a bool we can assume that, by default, the years aren’t leap years, so we’ll expect a false response:
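For example, a test like this, still using the parameterless constructor since nothing has forced us to change it yet:

```csharp
[Test]
public void ShouldNotBeLeapByDefault()
{
    var year = new Year();
    Assert.False(year.IsLeap());
}
```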
The error is:
Can be solved with:
However, we have another way to do it. Since the language is strongly typed, we can use the type system as a test. Thus, instead of creating a new test:
We change the return type of IsLeap:
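That is, something like:

```csharp
public bool IsLeap()
{
    // No return statement yet: the compiler will complain,
    // and that complaint plays the role of our failing test.
}
```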
When we run the test it will indicate that there’s a problem, as the function isn’t returning anything:
And finally, we just have to add a default response, which will be false:
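Which leaves the method like this:

```csharp
public bool IsLeap()
{
    return false;
}
```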
To introduce the construction parameter we could use a refactoring. In this step we could be conditioned by the programming language, leading us to different solutions.
The refactoring path is straightforward. We just have to incorporate the parameter, although we won’t use it for now. In C# and other languages we could do it by introducing an alternative constructor, and in this way the tests would continue to pass. In other languages, we could mark the parameter as optional.
Since a parameterless constructor doesn’t make sense for us, now we could remove it, but first we’d have to refactor the tests so that they use the version with a parameter:
The truth is that we don’t need the first test anymore, since it’s implicit in the other one.
And now we can remove the parameterless constructor, as it won’t be used again in any case:
Third law: writing more production code than necessary to pass the test
This one is probably the most common violation of the three. We often come to a point where we "see" the algorithm so clearly that we feel tempted to write it right away and finish the process for good. However, this can lead us to miss some situations. For example, we could "see" the general algorithm of an application and implement it, but this could distract us and prevent us from considering one or several particular cases. This possibly incomplete algorithm, once incorporated into the application and deployed, could lead to errors in production and even to economic losses.
For example, if we add a test to check that we control non-leap years:
In the current state of our exercise, an example of excess of code would be this:
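For example, something like the following, which already tries to resolve leap years and the divisible-by-400 special case even though the failing test only asked about non-leap years (the omission of the divisible-by-100 rule is deliberate here, to illustrate the problem described below):

```csharp
public bool IsLeap()
{
    if (_number % 400 == 0) return true;

    return _number % 4 == 0;
}
```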
This code passes the test but, as you can see, we've introduced much more than was necessary to achieve the behavior defined by the test, adding code to handle leap years and special cases. So, apparently, everything is fine.
If we try a leap year we’ll see that the code works, which reinforces our impression that all’s good.
But, a new test fails. Years divisible by 100 should not be leap years (unless they are also divisible by 400), and this error has been in our program for a while, but until now we didn’t have any test that executed that part of the code.
This is the kind of problem that can go unnoticed when we add too much code at once in order to pass a test. The code excess probably doesn’t affect the test that we were working on, so we won’t know if it hides any kind of problem, and we won’t know it unless we happen to build a test that makes it surface. Or even worse, we won’t know it until the bug explodes in production.
The solution is pretty simple: only add the code that’s strictly necessary to pass the test, even if it’s just returning the value expected by the test itself. Don’t add any behavior if there’s not a failing test forcing you to do it.
In our case, it was the test which verified the handling of the non-leap years. In fact, the next test, which aimed to introduce the detection of standard leap years (years divisible by 4), passed without the need for adding any new code. This leads us to the next point.
What does it mean if a test passes just after writing it?
When we write a test and it passes without us needing to add any production code, it can be due to any of these reasons:
The algorithm that we’ve written is general enough to cover all of the possible cases: we’ve completed the development.
The example that we’ve chosen isn’t qualitatively different from others that we had already used, and therefore it’s not forcing us to write production code. We have to find a different example.
We’ve added too much code, which is what we’ve just talked about in the previous bit.
In this leap year kata, for example, we’ll arrive at a point in which there’s no way of writing a test that fails because the algorithm supports all possible cases: regular non-leap years, leap years, non-leap years every 100 years, and leap years every 400.
The other possibility is that the chosen example doesn’t really represent a new behavior, which can be a symptom of a bad definition of the task, or of not having properly analyzed the possible scenarios.
The red-green-refactor cycle
The three laws establish a framework which we could call a “low level” one. Martin Fowler, for his part, defines the TDD cycle in these three phases which would be at a higher level of abstraction:
Write a test for the next piece of functionality that you wish to add.
Write the production code necessary to make the test pass.
Refactor the code, both the new and the old, so that all’s well structured.
These three stages define what’s usually known as the “red-green-refactor” cycle, named like this in relation to the state of the tests in each of the phases of the cycle:
Red: the creation of a failing test (it’s red) that describes the functionality or behavior that we want to introduce in the production software.
Green: the writing of the necessary production code to pass the test (making it green), with which it’s verified that the specified behavior has been added.
Refactor: while keeping the tests green, reorganizing the code in order to improve its structure, making it more readable and sustainable without losing the functionality that we’ve developed up until this point.
In practice, refactoring cycles only arise after a certain number of cycles of the three laws. The small changes driven by these start to accumulate, until they arrive at a point in which code smells start to appear, and with them the need for refactoring.
The FizzBuzz kata is one of the easiest kata to start practicing TDD with. It poses a very simple and well-defined problem, so in a first phase it's very easy to solve it completely in a session of one or two hours. But its requirements can also be expanded: adding the requirement that the size of the list should be configurable, that new rules can be added, etc., bumps up the difficulty a bit and leads us to more complex developments.
In this case, as it's our first kata, we'll follow the simplest version.
History
According to Coding Dojo, the authorship of the kata is unknown, but it's commonly considered that it was introduced to society by Michael Feathers and Emily Bache in 2008, in the framework of the Agile2008 conference.
Problem statement
FizzBuzz is a game related to learning division in which a group of students takes turns counting incrementally, each saying a number, replacing any number divisible by three with the word "Fizz", and any number divisible by five with the word "Buzz". If the number is divisible by both three and five, they say "FizzBuzz".
So, our objective shall be to write a program that prints the numbers from 1 to 100 in such a way that:
if the number is divisible by 3 it returns Fizz.
if the number is divisible by 5 it returns Buzz.
if the number is divisible by 3 and 5 it returns FizzBuzz.
Hints to solve it
The FizzBuzz kata is going to help us understand and start applying the Red-Green-Refactor cycle and the three laws of TDD.
The first thing we should do is to consider the problem and get a general idea about how we’re going to solve it. TDD is a strategy that helps us avoid the necessity of having to make a detailed analysis and an exhaustive design prior to the solution, but that doesn’t mean that we shouldn’t first understand the problem and consider how we’re going to tackle it.
This is also necessary to avoid getting carried away by the literal statement of the kata, which can lead us to dead ends.
The first thing we’re going to do, once we have that general idea on how to approach the objective, is to apply the first law and write a test that fails.
This test should define the first behavior that we need to implement.
Writing a test that fails means, at this time, writing a test that won’t work because there isn’t any code to run, a fact that will be pointed out to us by the error messages. Even though you might find it absurd, you must try to run the test and confirm that it doesn’t pass. It’s the test error messages that will tell you what to do next.
To get the test to fail we have to apply the second law, which says that we can’t write more tests than necessary to fail. The smallest possible test should force us to define the class by instantiating it, and little more.
Last, to make the test pass, we’ll apply the third law, which says that we mustn’t write any more production code than necessary to pass the test. That is: define the class, the method that we’re going to exercise (if applicable), and make it return some response that will finally make the test pass.
The two first steps of this stage are pretty obvious, but the third one, not so much.
With the first two steps we try to make the test fail for the right reasons. That is, first it fails because we haven’t written the class, so we define it. Then, it will fail because the method that we’re calling is missing, so we define it as well. Finally, it will fail because it doesn’t return the response that we expect, which is what we’re testing in itself.
And what response should we be returning? Well, no more no less than the one that the test expects.
Once we have a first test and a first piece of production code that makes it pass, we’ll ask ourselves this question: what will be the next behavior that I should implement?
Our objective will be to write a program that prints the numbers from 1 to 100 in such a way that:
if the number is divisible by 3 it returns Fizz.
if the number is divisible by 5 it returns Buzz.
if the number is divisible by 3 and 5 it returns FizzBuzz.
Language and focus
We’re going to solve this kata in Python with unittest as our testing environment. The task consists in creating a FizzBuzz class which will have a generate method to create the list, so it will be used more or less like this:
To do so, I create a folder called fizzbuzzkata and to it I add the fizzbuzz_test.py file.
Define the class
What the exercise asks for is a list of the numbers from 1 to 100, replacing some of them with the words "Fizz", "Buzz", or both, when certain conditions are fulfilled.
Note that it doesn’t ask for a list of any amount of numbers, but rather specifically from 1 to 100. We’ll come back to this in a moment.
Now we’re going to focus in that first test. The less we can do is make it possible to instantiate a FizzBuzz type object. Here’s a possible first test:
It may look weird. This test is just limited to trying to instantiate the class and nothing else.
This first test should be enough to fail, which is what the second law states, and force us to define the class so the test can pass, fulfilling the third law. In some environments it would be necessary to add an assertion, given that they consider that the test hasn’t passed if it hasn’t been explicitly verified, but it’s not the case in Python.
So, we launch it to see if it really fails. The result, as expected, is that the test doesn't pass, displaying the following error:
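The exact output depends on how we run the test, but the relevant part is a NameError along these lines:

```
NameError: name 'FizzBuzz' is not defined
```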
To pass the test we’ll have to define the FizzBuzz class, something we’ll do in the test file itself.
And with this, the test will pass. Now that we're green we can think about refactoring. The class doesn't have any code yet, but we could change the name of the test to a more suitable one:
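For instance, test_can_instantiate, which is the name we'll keep using from here on:

```python
class FizzBuzzTestCase(unittest.TestCase):
    def test_can_instantiate(self):
        fizzbuzz = FizzBuzz()
```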
Usually it’s better that the classes live in their own file (or Python module) because it makes it easier to manage the code and keep everything located. So, we create a fizzbuzz.py file and we move the class to it.
And in the test, we import it:
When we introduce this change and run the test, we can verify that it passes and that we’re in green.
We’ve fulfilled the three laws and closed our first test-code-refactor cycle. There’s not much else to do here, except for moving on to the next test.
Define the generate method
The FizzBuzz class not only doesn’t do anything, it doesn’t even have any methods! We’ve said that we want it to have a generate method, which is the one that will return the list of numbers from 1 to 100.
To force us to write the generate method, we have to write a test that calls it. The method will have to return something, right? No, not really. It’s not always necessary to return something. It’s enough if nothing breaks when we call it.
When we run the test, it tells us that the object doesn’t have any generate method:
Of course it doesn’t, we have to add it:
Now we already have a class capable of answering to the generate message. Can we do any refactoring here?
Well, yes, but not in the production code, but in the tests. It turns out that the test that we’ve just written overlaps the previous one. That is, the test_responds_to_generate_message test covers the test_can_instantiate test, making it redundant. Therefore, we can remove it:
Perhaps this surprises you. This is what we talked about at the beginning of the book: some of the tests that we use to drive the development stop being useful for one reason or another. Generally, they end up becoming redundant and don't provide any information that we're not already getting from other tests.
Define a behavior for generate
Specifically, we want it to return a list of numbers. But it doesn’t need to have the multiples of 3 and 5 converted just yet.
The test should verify this, but it must keep passing when we have developed the complete algorithm. What we could verify would be that it returns a 100 element list, without paying any attention to what it contains exactly.
This test will force us to give it a behavior in response to the generate message:
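A possible version of this test; the test name mentions the number 100 on purpose, and we'll generalize it later, as discussed below:

```python
class FizzBuzzTestCase(unittest.TestCase):
    # ... previous test omitted ...

    def test_generates_list_of_100_elements(self):
        fizzbuzz = FizzBuzz()
        num_list = fizzbuzz.generate()
        self.assertEqual(100, len(num_list))
```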
Of course, the test fails:
Right now, the method returns None. We want a list:
When we change generate so that it returns a list, the test fails because our condition isn’t met: that the list has a certain number of elements.
This one is finally a genuine failure of the test itself. The previous ones were basically equivalent to compilation errors (syntax errors, missing definitions, and so on). That's why it's so important to see the tests fail, to use the feedback that the error messages provide us.
Making the test pass is quite easy:
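A minimal implementation that satisfies it could be the following; it blindly returns a list of 100 None elements, which is all the test demands so far:

```python
# fizzbuzz.py
class FizzBuzz:
    def generate(self):
        return [None] * 100
```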
With the test in green, let’s think a little.
In the first place, it could be argued that in this test we’ve asked generate to return a response that meets two conditions:
be of type list (or array, or collection)
have exactly 100 elements
We could have forced this same thing with two even smaller tests.
These tiny steps are often called baby steps, and the truth is that they don't have a fixed length; they depend on our practice and experience instead.
Thus, for example, the test that we’ve created is small enough to not generate a big leap in the production code, although it’s capable of verifying both conditions at once.
In the second place, note that we’ve just written the necessary code to fulfill the test. In fact, we return a list of 100 None elements, which may seem a little pointless, but it’s enough to achieve this test’s objective. Remember: don’t write more code than necessary to pass the test.
In the third place, we have written enough code, between test and production, to be able to examine it and see if there’s any opportunity for refactoring.
The clearest refactoring opportunity that we have right now is the magic number 100, which we could store in a class variable. Again, each language will have its own options:
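In Python, one option is a class-level constant; the name is just a suggestion:

```python
class FizzBuzz:
    _NUMBERS_IN_LIST = 100

    def generate(self):
        return [None] * self._NUMBERS_IN_LIST
```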
And we have a few more opportunities in the test code. Once again, the new test that we've added overlaps and includes the old one, which we can remove.
In the same way, the name of the test could improve. Instead of referencing the specific number, we could simply indicate something more general, that doesn’t tie the test to a specific implementation detail.
Last but not least, we still have a magic number 100, which we will name:
And with this, we’ll have finished a new cycle in which we have already introduced the refactoring phase.
Generate a list of numbers
Our FizzBuzz can already generate a list with 100 elements, but at the moment each of them is literally nothing. It’s time to write a test that forces us to put some elements inside that list.
To do this, we could expect the generated list to contain the numbers from 1 to 100. However, we have a problem: at the end of the development process, the list will contain the numbers, but some of them will be represented by the words Fizz, Buzz, or FizzBuzz. If I don't take this into account, this third test will start failing as soon as I start implementing the algorithm that converts the numbers. It doesn't seem like a good solution.
A more promising approach would be: what numbers won’t be affected by the algorithm? Well, those that aren’t multiples of 3 or 5. Thereby, we could choose some of them to verify that they’re included in the untransformed list.
The simplest of them all is 1, which should occupy the first position of the list. For symmetry reasons we’re going to generate the numbers as strings.
The test is very small and fails:
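A sketch of that test, assuming the naming style used so far:

```python
class FizzBuzzTestCase(unittest.TestCase):
    # ... previous test omitted ...

    def test_first_element_is_number_one(self):
        fizzbuzz = FizzBuzz()
        num_list = fizzbuzz.generate()
        self.assertEqual('1', num_list[0])
```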
At this point, what change could we introduce in the production code to make the test pass? The most obvious one could be the following:
It’s enough to pass the test, so it suits us.
One problem that we have here is that the number '1' doesn't appear as such in the test. What does appear is its representation, but we access it through its position in num_list, which is a 0-indexed array. We're going to make explicit the fact that we're testing against the representation of a number. First, we introduce the concept of position:
And now the concept of number, as well as its relationship with position:
Now we don’t need to refer to the position at all, just to the number.
We could make the test easier to read. First, we separate the verification:
We extract the representation as a parameter in the assertion, and we make an inline of number, to make the reading more fluent:
As you can see, we've done a lot of work on the test. Now introducing new examples will be very inexpensive, which will help us write more tests and make the process more pleasant and convenient.
We keep generating numbers
Actually, we haven’t yet verified whether the generate method is returning a list of numbers, so we need to keep writing new tests that force us to create that code.
Let’s make sure that the second position is occupied by the number two, which is the next simplest number that’s not a multiple of 3 or 5.
We have a new test which fails, so we're going to add some code to production so that the test passes. However, we have some problems with this implementation:
To intervene in it, we’d need to refactor it a little first. At least, extract the response to a variable that we could manipulate before returning it.
But, since the test is failing right now, we can’t refactor. Before that we have to cancel or delete the test that we’ve just created. The easiest would be to comment it out to prevent its execution. Remember, to do any refactorings it’s compulsory that the tests are passing:
Now we can work:
And we activate the test again, which now fails because the number 2 is represented by a ‘1’. The simplest change that I can come up with, right now, is this one. So silly:
The truth is that the test is green. We know that this is not the implementation that will solve the full problem, but our production code is only obligated to satisfy the existing tests and nothing more. So, let’s not get ahead of ourselves. Let’s see what we can do.
To start, the name of the test is obsolete, let’s generalize it:
Now that this has been solved, let’s remember that previously we saw that the concepts of “number” and “representation” were necessary to better define the expected behavior in the tests. We can now introduce them in our production code:
It's a first step. We can see the limitations of the current solution. For example, why does the 1 have a special treatment? And what will happen if we want to verify another number? There are several problems.
As for the number 1, the key lies in the list of numbers idea. Right now we're generating a list of constants, but each of the elements of the list should be a consecutive number, beginning with 1 until completing the desired number of elements.
And then we’d have to replace each number by its representation. Something like this:
This structure keeps passing the test, but it doesn't seem very practical. However, we can see a pattern. We need to iterate over the list to provide a solution:
With the information that we have, we could simply assume that it’s enough to convert the number into a string and put it in its place:
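A possible implementation at this point, keeping the loop explicit and naive:

```python
class FizzBuzz:
    _NUMBERS_IN_LIST = 100

    def generate(self):
        num_list = list(range(1, self._NUMBERS_IN_LIST + 1))
        # Replace each number with its (string) representation.
        for index, number in enumerate(num_list):
            num_list[index] = str(number)
        return num_list
```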
Of course, there are more compact and pythonic ways, such as this one:
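For example, something like this list comprehension would also pass the tests:

```python
    # in fizzbuzz.py, inside the FizzBuzz class
    def generate(self):
        return [str(number) for number in range(1, self._NUMBERS_IN_LIST + 1)]
```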
But we should be careful, we’re probably getting too ahead of ourselves with this refactoring, and it’ll surely become a source of problems further down the line. For this reason, it’s preferable to keep a more direct and naive implementation, and leave the optimizations and more advanced structures for later, when the behavior of the method is completely defined. So, I would advise you to avoid this kind of approach.
All of this refactoring is done while the tests are green. This means that:
With the test, we describe the behavior that we want to develop
We make the test pass by writing the simplest possible code, as stupidly simple as it looks, with the intent of implementing that behavior
We use the green tests as a safety net to restructure the code until we find a better design: easy to understand, maintain, and extend.
Points 2 and 3 are built on these principles:
KISS: Keep it simple, stupid, which means keeping the system as simple as possible, that is, not trying to add intelligence prematurely. The more mechanical and simple, the better, as long as it meets its needs. This is exactly our first approach.
Gall’s law: every working complex system has evolved from a simpler system that also worked. Therefore, we start with a very simple implementation that works (KISS), and we make it evolve towards a more complex one that works as well, something that we’re sure about because the test keeps passing.
YAGNI: You aren’t gonna need it, which prevents us from implementing more code than strictly necessary to pass the current tests.
But now we have to implement new behaviors.
The test that doesn’t fail
The next number which is not a multiple of 3, 5, or 15 is 4, so we add an example for it:
And the test passes. Good news? It depends. A test that passes just after its creation is always a reason for suspicion, at least from a TDD point of view. Remember: writing a failing test is always the first thing to do. If the test doesn’t fail, it means that:
The behavior is already implemented
It’s not the test we were looking for
In our case, the last refactoring has already produced the general behavior for the numbers that don't need any transformation. In fact, we can categorize the numbers in these classes:
Numbers that are represented as themselves
Multiples of three, represented as ‘Fizz’
Multiples of five, represented as ‘Buzz’
Multiples of both three and five, represented as ‘FizzBuzz’
Numbers 1 and 2 belong to the first class, so they’re more than enough, since any of the numbers in that class would serve as an example. In TDD we need them both, because they’ve helped us to introduce the idea that we would have to iterate through the number list. However, just one of them would be sufficient for a QA test. For this reason, when we introduce the example of the number 4, we don’t have to add any additional code: the behavior is already implemented.
It’s time to move on to the other classes of numbers.
Learning to say “Fizz”
It’s time for our FizzBuzz to be able to convert the 3 into “Fizz”. A minimal test to specify this would be the following:
Having a failing test, let’s see what minimal production code we could add to pass it:
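One minimal option is to treat the number 3 as a literal special case, along these lines:

```python
class FizzBuzz:
    _NUMBERS_IN_LIST = 100

    def generate(self):
        num_list = list(range(1, self._NUMBERS_IN_LIST + 1))
        for index, number in enumerate(num_list):
            if number == 3:
                num_list[index] = 'Fizz'
            else:
                num_list[index] = str(number)
        return num_list
```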
We’ve added an if that makes this particular case pass. For the time being, with the information that we have, there isn’t any other better way. Remember KISS, Gall and YAGNI to avoid advancing faster than you should.
Regarding the code, there may be a better way to populate the list. Instead of generating a list of numbers and changing it later, perhaps we could initialize an empty list and append the representations of the numbers one by one.
This works. Now num_list becomes kind of pointless as a list. We can make a change:
And remove the temporary variable:
Everything continues to work correctly, as the tests attest.
Saying “Fizz” at the right time
Now we want it to add a “Fizz” when the corresponding number is a multiple of 3, and not just when it’s exactly 3. Of course, we have to add a test to specify this. This time we use the number 6, which is the closest multiple of 3 (and not of 5) that we have.
To pass the test we just have to make a pretty small change. We have to modify the condition to expand it to all of the multiples of three. But we’re going to do it incrementally.
First, we establish the behavior:
With this, the test passes. Now let’s change the code so that it uses the concept of multiple of:
The test keeps passing, which indicates that our hypothesis is correct. Now we can remove the redundant part of the code:
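After these steps, and using the append-based construction mentioned earlier, the method might look roughly like this:

```python
    # in fizzbuzz.py, inside the FizzBuzz class
    def generate(self):
        num_list = []
        for number in range(1, self._NUMBERS_IN_LIST + 1):
            if number % 3 == 0:
                num_list.append('Fizz')
            else:
                num_list.append(str(number))
        return num_list
```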
At this point you may want to try other examples from the same class, although it’s not really necessary since any multiple of three is an adequate representative. For this reason, we’ll move on to the next behavior.
Learning to say “Buzz”
This test lets us specify the new behavior:
So, we modify the production code to make the test pass. Same as we did before, we treat the particular case in a particular manner.
Yes, we already know how we should handle the general case of the multiples of five, but it’s preferable to force ourselves to go slowly. Remember that the main objective of the exercise isn’t to solve the list generation, but rather do it guided by tests. Our main interest now is to internalize this slow step cycle.
There’s not much else that we can do now, except for continuing to the next test.
Saying “Buzz” at the right time
At this point, the test is quite obvious, the next multiple of 5 is 10:
And, again, the change in the production code is simple at first:
Next, we perform the refactoring step by step, now that we’ve ensured the behavior:
And then:
And with this refactoring, we can proceed to the next -and last- class of numbers.
Learning to say “FizzBuzz”
The structure is exactly the same. Let’s start with the simplest case: 15 should return FizzBuzz, since 15 is the first number that is a multiple of 3 and 5 at the same time.
The new test fails. Let’s make it pass:
Saying “FizzBuzz” at the right time
And, again, we introduce a test for another case of the “multiples of 3 and 5” class, which will be 30.
This time I’ll jump directly to the final implementation, but you get the idea:
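A possible final shape of the class, with the three rules in place:

```python
class FizzBuzz:
    _NUMBERS_IN_LIST = 100

    def generate(self):
        num_list = []
        for number in range(1, self._NUMBERS_IN_LIST + 1):
            if number % 3 == 0 and number % 5 == 0:
                num_list.append('FizzBuzz')
            elif number % 3 == 0:
                num_list.append('Fizz')
            elif number % 5 == 0:
                num_list.append('Buzz')
            else:
                num_list.append(str(number))
        return num_list
```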
And we have our “FizzBuzz”!
Wrapping up
We’ve completed the development of the specified behavior of the FizzBuzz class. In fact, any other test that we could add now would confirm that the algorithm is general enough to cover all cases. That is, there isn’t any conceivable test that could force us to add more production code: there’s nothing else we must do.
In a real work case, this code would be functional and deliverable. But we can certainly still improve it. The fact that all of the tests are passing indicates that the desired behavior is fully implemented, so we could fearlessly refactor and try to find a more flexible solution. For example, with the following solution it would be easier to add extra rules:
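A sketch of that more flexible version; the rules live in a dictionary, and the names (as well as the default-rules mechanism) are one possible choice among many:

```python
class FizzBuzz:
    _NUMBERS_IN_LIST = 100
    _DEFAULT_RULES = {3: 'Fizz', 5: 'Buzz'}

    def __init__(self, rules=None):
        self._rules = rules if rules is not None else self._DEFAULT_RULES

    def generate(self):
        num_list = []
        for number in range(1, self._NUMBERS_IN_LIST + 1):
            representation = ''
            # Concatenate the word of every rule the number fulfills,
            # so multiples of both 3 and 5 become 'FizzBuzz'.
            for divisor, word in self._rules.items():
                if number % divisor == 0:
                    representation += word
            num_list.append(representation or str(number))
        return num_list
```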
And if you look closely, you can see that it would be relatively easy to modify the class so we could introduce the rules from the outside, as it would be enough to pass the rule dictionary at the moment of instantiating the class, fulfilling the Open/Closed principle (open for extension, closed for modification). In this case, we've allowed the original rules to be used unless others are specifically indicated, so the tests continue to pass in exactly the same manner as before.
What have we learned in this kata
The laws of TDD
The red->green->refactor cycle
To use minimal tests to make the production code advance
To change the production code as minimally as possible to achieve the desired behavior
To use the refactor phase to improve the code design
Selection of examples and finalization criteria
One of the most frequent questions when you start doing TDD is how many tests do you have to write until you can consider the development to be finished. The short answer is: you’ll have to do all of the necessary tests, and not one more. The long answer is this chapter.
Checklist driven testing
A good technique would be to follow Kent Beck’s advice and write a control list or check-list in which to annotate all of the behaviors that we want to implement. Obviously, as we complete each behavior, we cross it off the list.
It’s also possible that, during the work, we discover that we need to test some other behavior, that we can remove some of the elements of the list, or that we’re interested in changing the order that we had planned. Of course, we can do all of this as convenient.
The list is nothing more than a tool so that we don't have to rely on our memory so much during the process. After all, one of the benefits of doing Test Driven Development is to reduce the amount of information and knowledge that we have to hold in our heads at each phase of the development process. Each TDD cycle involves a very small problem, which we can solve with pretty little effort. Small steps that end up carrying us very far.
Let’s see an example with the Leap Year kata, in which we have to create a function to calculate if a year is a leap year or not. A possible control list would be this one:
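The original list isn't reproduced here, but based on the leap year rules used throughout this chapter, it could look like this:
Years not divisible by 4 are regular years.
Years divisible by 4 are leap years.
Except years divisible by 100, which are regular years.
Except years divisible by 400, which are leap years.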
Another example for the Prime Factors kata, in which the exercise consists in developing a function that returns the prime factors of a number:
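Again as a reconstruction, based on the categories discussed later in the book, it could include items such as:
Number 1 has no prime factors.
Prime numbers have themselves as their only prime factor.
Non-prime numbers decompose as a product of prime factors.
Some prime factors may appear more than once.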
Example selection
For each behavior that we want to implement, we'll need a certain amount of examples with which to write the tests. In the following chapter we'll see that TDD has two principal moments: one related to the establishment of the interface of the unit that we're creating, and another in which we develop the behavior itself. It's at this moment when we need examples that question the current implementation and force us to introduce code that produces the desired behavior.
A good idea is, therefore, to take note of several possible examples with which to test each of the items of the control list.
But, how many examples are necessary? In QA there are various techniques to choose representative examples with which to generate the tests, and they have the goal of optimizing the relationship between the number of tests and their ability to cover the possible scenarios.
We can use some of them in TDD, although in a slightly different manner, as we’ll see next. Keep in mind that we use TDD to develop an algorithm, and in many cases, we discover it as we go. For that, we’ll need several examples related to the same behavior, in such a way that we can identify patterns and discover how to generalize it.
The techniques that we’re going to look at are:
Partition by equivalence class
This technique relies on one idea: that the set of all possible conceivable cases can be divided into classes according to some criterion. All of the examples in a class would be equivalent to one another, so it would suffice to test with only one example from each class, as all are equally representative.
Limit analysis
This technique is similar to the previous one, but paying attention to the limits or boundaries between classes. We choose two examples from each class: precisely those that lie at its limits. Both examples are representatives of the class, but they let us study what happens at the extremes of the interval.
It's mainly used when the examples are continuous data, or when we care especially about the change that occurs when passing from one class to another. Specifically, it's the kind of situation where the result depends on whether the value being considered is greater than or equal to a limit, strictly greater, and so on.
Decision table
The decision table is nothing more than the result of combining the possible values, grouped by classes, of the parameters that are passed to the unit under test.
Let's take a look at the selection of examples in the case of Leap Year. For this, we begin with the control list from the previous section.
Let’s see the first item. We could use any number that’s not divisible by 4:
In the second item, the examples should meet the condition of being divisible by 4:
Let’s pay attention to the next element of the list. The condition of being divisible by 100 overlaps the previous condition. Therefore, we have to remove some of the examples from the previous item.
And the same thing happens with the last of the elements of the list. The examples for this item are the numbers that are divisible by 400. It also overlaps the previous example:
This way, the example list ends up like this:
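For example (the specific years are illustrative choices, one or two per class):
Not divisible by 4 (regular years): 1997, 2021.
Divisible by 4 (leap years): 1996, 2008.
Divisible by 100 (regular years): 1900, 2100.
Divisible by 400 (leap years): 2000, 2400.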
On the other hand, the example selection for Prime Factors could be this one:
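A hedged reconstruction, following the same logic:
1, which has no prime factors.
2, 3, 5: prime numbers.
4, 8, 9: powers of a single prime factor.
6, 10, 12, 30: products of several prime factors, repeated or not.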
Using many examples to generalize an algorithm
In simple code exercises such as the Leap Year kata, it’s relatively simple to anticipate the algorithm, so we don’t need to use too many examples to make it evolve and implement it. Actually, it would suffice to have an example from each class, as we’ve seen when we talked about the partition by equivalence classes, and in a few minutes we would be done with the problem.
However, if we're just starting to learn TDD, it's a good idea to go step by step. The same applies when we have to face complex behaviors. It's preferable to take really small baby steps, introduce several examples, and wait until we have enough information before trying to generalize. Having some amount of code duplication is preferable to choosing the wrong abstraction and keep building on top of it.
A heuristic that you may apply is the rule of three. This rule tells us that we shouldn’t try to generalize code until we have at least three repetitions of it. To do it, we’ll have to identify the parts that are fixed and the parts that change.
Consider this example, taken from an exercise from the Leap Year kata. At this point the tests are passing, but we haven’t generated an algorithm yet.
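The original snippet isn't shown here; a reconstruction, with three arbitrary example years of the "divisible by 4" class, could look like this:

```python
def leap_year(year):
    # Each example year handled as its own special case.
    if year == 1992:
        return True
    if year == 1996:
        return True
    if year == 2004:
        return True
    return False
```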
There we have our three repetitions. What do they have in common apart from the if/then structure? Let's force a small change:
Clearly, the three years are divisible by 4. So we could express it in a different way:
Which is now an obvious repetition and can be removed:
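Once each comparison is rewritten in terms of divisibility by 4, the three branches become identical and collapse into a single one, roughly like this:

```python
def leap_year(year):
    if year % 4 == 0:
        return True
    return False
```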
This has been very obvious, of course. However, things won’t always be this easy.
In summary, if we don’t know the problem very well, it can be useful to wait until the rule of three is fulfilled before we start thinking about code generalizations. This implies that, at least, we’ll introduce three examples that represent the same class before we refactor the solution to a more general one.
Let’s see another example from the same kata:
The duplication that isn’t
The divisible by concept is pretty obvious on this occasion, and we don't really need a third case to evaluate the possibility of extracting it. But the main thing here isn't actually the duplication. In fact, one example would have been enough. We have come across the idea that the condition being evaluated is that the year number must be divisible by a certain factor. With this refactor, we make it explicit.
This gets clearer if we advance a bit further.
We find the same structure repeated three times, but we cannot really extract a common concept from here. Two of the repetitions represent the same concept (leap year), but the third one represents exceptional regular duration years.
In search of the wrong abstraction
Let’s try another approach:
If we divide the year by 4, we could propose another idea, since that could help us tell apart the parts that are common from the parts that are different.
It’s weird, but it works. Simpler:
It’s still working. But, what use is it to us?
On the one hand, we still haven’t found a way to reconcile the three if/then structures.
On the other hand, we’ve made the domain rules unrecognizable.
In other words: trying to find an abstraction relying only on the existence of code repetition can lead us to a dead end.
The correct abstraction
As we’ve pointed out before, the concept in which we’re interested is leap years and the rules that determine them. Can we make the code less repetitive? Maybe. Let’s do it again, from the top:
The question is that the “divisible by 400” is an exception to the “divisible by 100” rule:
Which lets us do this and compact the solution a little bit:
Maybe we could make it more explicit:
But now it looks a bit weird, we need to be more explicit here:
At this point, I wonder if this solution hasn’t become too unnatural. On the one hand, the abstraction is correct, but by taking it this far, we’re probably being guilty of a certain amount of over-engineering. The domain of the problem is very small and the rules are very simple and clear. If you compare this:
To this:
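The two versions aren't reproduced here, but the contrast would be along these lines (the helper and variable names in the second version are illustrative):

```python
# Plain, rule-by-rule version: short and reads top to bottom.
def leap_year(year):
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0


# More explicit version: the same rules, with the concepts named.
def leap_year_with_named_rules(year):
    def divisible_by(divisor):
        return year % divisor == 0

    is_exceptional_leap_year = divisible_by(400)
    is_exceptional_common_year = divisible_by(100) and not is_exceptional_leap_year
    is_common_leap_year = divisible_by(4)

    return is_exceptional_leap_year or (is_common_leap_year and not is_exceptional_common_year)
```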
I think I would stick with the first solution. That said, in a more complex and harder to understand problem, the second solution might be a lot more appropriate, precisely because it would help us make the involved concepts explicit.
The moral of the story is that we mustn’t strive and struggle to find the perfect abstraction, but rather the one that’s sufficient at that particular moment.
Evolution of the behavior through tests
The TDD methodology is based on work cycles in which we define a desired behavior in the form of a test, we make changes in the production code to implement it, and we refactor the solution once we know that it works.
While we have specific tooling to detect situations in need of refactoring, and even well-defined methods to carry it out, we don't have specific resources that guide the necessary code transformations in a similar manner. That is, is there any process that helps us decide what kind of change to apply to the code in order to implement a behavior?
The Transformation Priority Premise is an article that suggests a useful framework in this sense. Starting from the idea that as the tests become more specific the code becomes more general, it proposes a sequence of the types of transformations that we can apply every time that we're in the implementation phase, in the transition from red to green.
The development through TDD would have two main parts:
In the first one we build the class's public interface, defining how we're going to communicate with it, and how it's going to answer us. We analyze this question in its most generic form, which would be the data type that it returns.
In the second part we develop the behavior, starting from the most general cases, and introducing the more specific ones later.
Let's see this with a practical example. We'll perform the Roman Numerals kata, paying attention to how the tests help us guide these two parts.
Constructing the public interface of a test driven class
We’ll always start with a test that forces us to define the class, as for now we don’t need anything else than an object with which to interact.
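A sketch of such a test in Kotlin, assuming JUnit as the test framework:

```kotlin
import org.junit.jupiter.api.Test

class RomanNumeralsTest {

    @Test
    fun `can instantiate RomanNumerals`() {
        RomanNumerals()
    }
}
```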
We run the test to see it fail, and then, we write the empty class definition, the minimum necessary to pass the test.
If we’ve created it in the same file as the test, now we can move it to its place during the refactoring phase.
We can already think about the second test, which we need to define the public interface, that is: how we're going to communicate with the object once it's instantiated, and what messages it's going to be able to understand:
We're modifying the first test. Now that we have some fluency we can afford this kind of license, so writing new tests becomes cheaper. Well, we check that it fails for the reason that it has to fail (the toRoman message is not defined), and next we write the necessary code to make it pass. The compiler helps us: if we run the test we'll see that it throws an exception that tells us that the method exists but it's not implemented. And probably the IDE tells us something about it too, one way or another. Kotlin, which is the language that we're working with here, asks us directly to implement it:
For now, we remove these indications introduced by the IDE:
And this passes the test. We already have the message with which we're going to ask RomanNumerals to do the conversion. The next step can be to define that the response we expect should be a String. If we work with dynamic typing or Duck Typing we'll need a specific test. However, in Kotlin we can do it without tests. It's enough to specify the return type of the function:
This won’t compile and our current test will fail, so the way to make it pass would be to return any String. Even an empty one.
We may consider this as a refactoring up to a certain point, but we can apply it as if it were a test.
Now we’re going to think about how to use this code to convert arabic numbers to roman notation. Since there is no zero in the latter, we have to start with number 1.
When we run the test, we can see that it fails because the function doesn’t expect an argument, so we add it:
And this passes the test. The public interface has been defined, but we still don’t have any behavior.
Drive the development of a behavior through examples
Once we’ve established the public interface of the class that we’re developing, we’ll want to start implementing its behavior. We need a first example, which for this exercise will be to convert the 1 into I.
To do this, we already need to assign the value to a variable and use an assertion. The test will end up like this:
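Something along these lines, again assuming JUnit:

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

class RomanNumeralsTest {

    @Test
    fun `converts 1 to I`() {
        val roman = RomanNumerals().toRoman(1)

        assertEquals("I", roman)
    }
}
```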
From null to constant
Right now, RomanNumerals().toRoman(1) returns "", which for all intents and purposes is equivalent to returning null.
What is the simplest transformation that we can make to make the test pass? In few words, we go from not returning anything to returning something, and to pass the test, that something ought to be the value “I”. That is, a constant:
The test passes. This solution might shock you if it’s your first time peeking at TDD, although if you’re reading this book you’ll have already seen more examples of this. But this solution is not stupid.
In fact, this is the best solution for the current state of the test. We may know that we want to build an arabic to roman numeral converter, but what the test specifies here and now is just that we expect our code to convert the integer 1 to the String I. And that’s exactly what it does.
Therefore, the implementation has exactly the necessary complexity and specificity level. What we’re going to do next will be to question it with another example.
But first, we should do a refactoring.
We’ll do it to prepare for what comes next. When we change the example, the response will have to change as well. So, we’re going to do two things: use the parameter that we receive, and, at the same time, ensure that this test will always pass:
We run the test, which should pass without any issues. Moreover, we’ll make a small adjustment to the test itself:
The test continues to pass and we are already left with nothing to do, so we’re going to introduce a new example (something that is now easier to do):
When we run the test we check that it fails because it doesn’t return the expected II. A way to make it pass is the following:
Note that, for now, we’re returning constants in all cases.
Let’s refactor, as we’re in green. First we refactor the test to make it even more compact, and easier to read and extend with examples.
We add yet another test. Now it’s even easier:
We see it fail, and, to make it pass, we add a new constant:
And now, expressing the same, but in a different manner and using only one constant:
We could extract it:
And now it’s easy to see how we could introduce a new transformation.
From constant to variable
This transformation consists in using a variable to generate the response. That is, now instead of returning a fixed value for each example, we’re going to calculate the appropriate response. Basically, we’ve started to build an algorithm.
This transformation makes it clear that the algorithm consists in piling up as many I as number indicates. A way of seeing it:
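A possible version, with the i constant mentioned just below:

```kotlin
class RomanNumerals {
    fun toRoman(number: Int): String {
        val i = "I"
        var roman = ""
        // Pile up one "I" per unit in number.
        for (step in 1..number) {
            roman += i
        }
        return roman
    }
}
```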
This for loop could be better expressed as a while, but first we have to make a change. It should be noted that the parameters in Kotlin are final, so we can’t modify them. For this reason, we’ve had to introduce a variable and initialize it to the value of the parameter.
On the other hand, since the i constant is only used once and its meaning is pretty evident, we’re going to remove it.
This way we've started to build a more general solution to the algorithm, at least up to the point that's currently defined by the tests. As we know, it's not "legal" to accumulate more than 3 equal symbols in roman notation, so in its current state, the algorithm will generate the wrong roman representations if we use it on any number larger than 3.
This indicates that we need a new test to be able to incorporate a new behavior and develop the algorithm further, which is still very specific.
But, what is the next example that we could implement?
From unconditional to conditional
In the first first place we got number 4, which in roman notation is expressed as IV. It introduces a new symbol, which is a combination of symbols in itself. For all we know it’s just a particular case, so we introduce a conditional to separate the flow into two branches: one for the behavior that we already know, and other for the new one.
The test will fail because it tries to convert the number 4 to IIII. We introduce the conditional to handle this particular case.
Oops. The test fails because we have forgotten to subtract the consumed value. We fix it like this, and we leave a note for our future selves:
We advance to a new number:
We check that the test fails for the expected reasons and we get IIIII as a result. To make it pass we’ll take another path, introducing a new conditional because it’s a new different case. This time we don’t forget to subtract the value of 5.
The truth is that we had already used conditionals before, when our responses were constant, to choose “which constant to return”, so to speak. Now we introduce the conditional in order to be able to handle new case families, as we've already exhausted the capabilities of the existing code to solve the new cases that we're introducing. And within that execution branch that didn't exist before, we resort to a constant again in order to solve it.
We introduce a new failing test to force another algorithm advance:
This case is especially interesting to see fail:
We need to include the “V” symbol, something that we can do in a very simple way by changing the == for a >=.
A minimal change has sufficed to make the test pass. The next two examples pass without any extra effort:
This happens because our current algorithm is already general enough to handle these cases. However, when we introduce the 9, we face a different kind of case:
The result is:
We need a specific treatment, so we add a conditional for the new case:
We keep running through the examples:
As it's a new symbol, we handle it in a special manner:
If we take a look at the production code we can identify structures that are similar between them, but we can’t clearly see a pattern that we could refactor and generalize. Maybe we need more information. Let’s proceed to the next case:
This test results in:
To begin, we need to enter the “X” symbol’s conditional, so we make this change:
And this is enough to make the test pass. With numbers 12 and 13 the test continues to pass, but when we reach 14, something happens:
The result is:
This happens because we're not accumulating the roman notation in the return variable, so in some cases we overwrite the existing result. Let's change from a simple assignment to an expression:
This discovery hints that we could try some specific examples with which to manifest this problem and solve it for other numbers, such as 15.
And we apply the same change:
19 also has the same solution. But if we try 20, we’ll see a new error, a rather curious one:
This is the result:
The problem is that we need to replace all of the 10 that are contained in the number by X.
Changing from if to while
To handle this case, the simplest thing to do is to change the if to a while. while is a structure that is both a conditional and a loop at the same time. if executes the conditioned branch only once, but while does it as long as the condition continues to be met.
Could we use while in all cases? Now that we're in green, we'll try to change all of the conditions from if to while. And the tests prove that it's possible to do so:
This is interesting, we can see that the structures get more similar each time. Let's try changing the cases in which we use an equality to see if we can use >= in its place.
And the tests keep on passing. This indicates a possible refactoring to unify the code.
Introducing arrays (or collections)
It’s a big refactoring, the one that we’re going to do here in just one step. Basically, it consists in introducing a dictionary structure (Map, in Kotlin), that contains the various conversion rules:
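A sketch of that refactoring at this point in the kata; the map holds the symbols discovered so far (the exact contents and order are assumptions consistent with the examples used up to now):

```kotlin
class RomanNumerals {
    fun toRoman(number: Int): String {
        val symbols = mapOf(
            10 to "X",
            9 to "IX",
            5 to "V",
            4 to "IV",
            1 to "I"
        )

        var remaining = number
        var roman = ""
        // For each rule, consume its value as many times as it fits.
        for ((value, symbol) in symbols) {
            while (remaining >= value) {
                roman += symbol
                remaining -= value
            }
        }
        return roman
    }
}
```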
The tests continue to pass, indicating that our refactoring is correct. In fact, we wouldn't have any error until reaching number 39. Something to be expected, as we introduce a new symbol:
The implementation is simple now:
And now that we’ve checked that it’s working properly, we move it to a better place:
We could keep adding examples that are not yet covered in order to add the remaining transformation rules, but essentially, this algorithm isn't going to change anymore, so we've reached a general solution to convert any natural number to roman notation. In fact, this is how it would end up. The necessary tests, first:
And the implementation:
We could try several acceptance tests to verify that it’s possible to generate any roman number:
Small production code transformations can result in big behavioral changes, although to do that we'll need to spend some time on the refactoring, so that the introduction of changes is as simple as possible.
This kata demonstrates that, as the tests get more specific, the algorithm becomes more general. But, besides that, it’s a wonderful kata to reflect on example selection, and why the tests that pass as soon as we write them aren’t really useful.
On the other hand, the kata reveals a much more intriguing concept: the premise of the priority of the transformations, according to which, in the same way that there are refactorings (which are changes in the structure of a code that don’t alter its behavior), there would also be transformations (that are changes in the code that do produce changes in its behavior).
These transformations would have an order, from simplest to the most complex, and a priority in their application that dictates that we should apply the simpler ones first.
History
The kata was created by Robert C. Martin when he was writing a program for his son to calculate the prime factors of a number. Thinking about its development, his attention was caught by the way in which the algorithm evolved and became simpler as it became more general.
Problem statement
Write a class with a generate method that returns a list of the prime factors of an integer. If you prefer a more procedural -or even functional- approach, try to write a primefactors function.
To not overcomplicate the exercise, the result may be expressed as an array, list or collection, without having to group the factors as powers. For example:
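For instance (the function name anticipates the one used in the next section):

```javascript
primefactors(2);  // [2]
primefactors(12); // [2, 2, 3]
primefactors(20); // [2, 2, 5]
```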
Hints to solve it
This is a very simple kata, as well as a very powerful one: you don't need many cycles to complete it, but nevertheless, it highlights some especially important features of TDD.
To begin, we can analyze the examples that we could try. In principle, the arguments will be natural numbers. We have three main categories:
Numbers that don’t have any prime factors, the only case is 1.
Numbers that are prime, such as 2, 3, or 5.
Numbers that are product of several prime numbers, such as 4, 6, 8, or 9.
Besides, among non-prime numbers, we find those that are the product of 2, 3, or n factors, repeated or not.
Applying the laws of TDD that we’ve already seen, we’ll start with the smallest possible failing test. Then, we’ll write the necessary production code to make the test pass.
We’ll go through the different cases by writing the test first, and then, the production code that makes it pass without breaking any of the previous tests.
One of the curiosities of this kata is that we can just go through the list of natural numbers in order, taking examples as we go until we consider that we can stop. However, is this the best strategy? Can it lead us to selecting unhelpful examples?
Our objective will be to write a program that decomposes a natural number into its prime factors. For the sake of simplicity, we won't group the factors as powers. We'll leave that as a later exercise, if you wish to advance a bit further.
Language and approach
We’re going to solve this kata in Javascript, using the Jest testing framework. We’ll create a primefactors function to which we’ll pass the number that we wish to decompose, obtaining as a response an array of its prime factors sorted from lowest to highest.
Define the function
Our first test expects the primefactors function to exist:
Which, as we already know, hasn’t been defined yet:
We introduce it without further ado. For now, in the test file itself:
We haven't yet communicated with the function in the test, so we're going to introduce that idea, passing it a first example of a number to decompose, along with the result that we expect. The first thing that should draw our attention is that, due to the peculiarities of the definition and distribution of primes among the natural numbers, we have a very intuitive way of organizing the examples and writing the tests. It's almost enough to start with number one and advance incrementally.
Number one, moreover, is a particular case (it doesn’t have any prime factor), so it suits us especially well as a first test.
To pass the test we need a minimal implementation of the function:
Note that we don't even implement the function's need for a parameter. We're going to let the test be the one that asks for it first. Meanwhile, we delete the first test, given that it has become redundant.
Define the function’s signature
The second test should help us define the function’s signature. To do so, we need a case in which we expect a response different than [], something we’ll be able to do if we receive a parameter that introduces the necessary variation. Number 2 is a good example with which to achieve this:
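A sketch of that test with Jest:

```javascript
it('should decompose 2 into [2]', () => {
    expect(primefactors(2)).toEqual([2]);
});
```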
To solve this case we need to take into account the parameter defined by the function, which forces us to introduce and use it. In our solution, we handle the case that the previous test states, and we make an obvious implementation to pass the test that we’ve just introduced. We’re postponing the implementation of the algorithm until we have more information:
Obtaining more information about the problem
The next case that we’re going to try is decomposing number 3, which is prime like number 2. This test will help us to better understand how to handle these cases:
Now that we have this failing test we’ll make an obvious implementation, such as returning the passed number itself. Since it’s a prime number, this is perfectly correct. There’s not much else to do here.
Introducing a test that doesn’t fail
In the presentation of the kata we’ve divided the cases into categories. Let’s review:
Edge or special cases, such as 1
Prime numbers, like 2, 3, or 5
Non-prime numbers, like 4, 6, or 8
We’ve already covered the first category, since there are no more edge cases to consider.
We haven’t begun to treat the third category yet, and we haven’t done any test with any of its examples.
The second category is the one that we’ve been testing until now. At this point, we could keep selecting examples from this category and trying new cases. But, what would happen? Let’s see it.
The test passes without implementing anything!
It was pretty obvious, wasn't it? At this moment, the so-called algorithm doesn't do anything other than consider all numbers as primes. For this reason, if we keep using prime numbers as examples, nothing will force us to make changes to the implementation.
When we add a test that doesn’t fail, it means that the algorithm that we’re developing is already general enough to solve every case from that category, and therefore, it’s time to move on to a different category that we cannot yet successfully handle. Or, if we’ve already discovered all of the possible categories, it means that we’ve already finished!
We're going to start using examples from the non-prime category. But, we're also going to refactor the solution to be able to see these categories in a more explicit manner:
Questioning our algorithm
The first non-prime that we have is number 4, and it’s the simplest of them all for many reasons. So, this time we write a test that will fail:
There are many ways of approaching this implementation. For example, we have this one that, while especially naive, is effective:
In spite of its simplicity, it's interesting. It helps us understand that we have to distinguish between primes and non-primes in order to develop the algorithm.
Nevertheless, it has a very unkempt look. Let's try to organize it a bit more neatly:
It basically says: if a number is higher than 1 we try to decompose it. If it’s 4, we return its factorization. And if it’s not, we return the same number, for it will be prime. Which is true for all of our current examples.
Discovering the multiples of 2
The next number that we can decompose is 6. A nice thing about this kata is that every non-prime number gives us a different response, and that means that every test is going to provide us with some information. Here it is:
Let’s begin with the naive implementation:
There’s nothing wrong with doing it this way. On the contrary, this way of solving the problem starts highlighting regularities. 4 and 6 are multiples of 2, so we want to introduce this knowledge in the shape of a refactoring. And we could do this thanks to our tests, that demonstrate that the function already decomposes them correctly. So, we’re going to modify the code without changing the behavior that we’ve already defined through tests.
Our first try relies on the fact that the first factor is 2, and it's common to both. That is, we can design an algorithm that processes multiples of 2 and, for now, assume that the remainder of that first division by 2 is the second of its factors, whichever it is.
To do so we have to introduce an array-type variable with which to deliver the response, to which we’ll be adding the factors as we discover them.
This has been a first step, now it’s clearer how it would work, and we can generalize it by expressing it like this:
This refactoring almost works, but the test for number 2 has stopped passing. We fix it, and we advance a step further.
This new implementation passes all the tests, and we’re ready to force a new change.
Introducing more factors
Among the non-prime numbers we could consider several groupings to the effect of selecting examples. There are some cases in which the numbers are decomposed as the product of two prime factors, and others in which they’re decomposed as the product of three or more. This is relevant because our next examples are 8 and 9. 8 is 2 * 2 * 2, while 9 is 3 * 3. The 8 forces us to consider the cases in which we can decompose a number in more than two factors, while the 9, those in which new divisors are introduced.
In principle, we don’t care much about which case to start with. Maybe the key is to pick the case that seems the easiest to manage. Here we’ll start by decomposing the number 8. This way we keep working with the 2 as the only divisor, which at the moment looks a little easier to approach.
Let’s write a test:
To implement this, we have to change an if into a while. That is, we have to keep dividing the number by 2 until we can't do it anymore.
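A possible state of the function after that change (the variable names are my own):

```javascript
function primefactors(numberToDecompose) {
    const factors = [];

    let remainder = numberToDecompose;
    // Keep extracting 2 as a factor while the number is even.
    while (remainder % 2 === 0) {
        factors.push(2);
        remainder = remainder / 2;
    }

    // Whatever is left greater than 1 is, for now, assumed to be prime.
    if (remainder > 1) {
        factors.push(remainder);
    }

    return factors;
}
```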
This change is quite spectacular because, while being very small, is also very powerful. By applying this, we can decompose any number that is a power of 2, nothing more and nothing less. But this is not the final goal, we want to be able to decompose any number, and to do so we must introduce new divisors.
New divisors
At this point, we need an example to force us to introduce new divisors. Earlier we had left number 9 for later, and now the time has come to pick it back up. 9 is a good example because it's a multiple of 3 without being a multiple of 2. Let's write a test that we're sure will fail.
Again, let’s start with an implementation that’s very naive but works. The important thing is to pass the test, proof that we’ve implemented the specified behavior.
With the previous code, all tests are green. At this point it’s made obvious that each new divisor that we wish to introduce, such as 5, will need a repetition of the block, so let’s refactor it into a general solution.
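A hedged sketch of that generalization:

```javascript
function primefactors(numberToDecompose) {
    const factors = [];

    let remainder = numberToDecompose;
    let divisor = 2;
    // Try every divisor in ascending order; only primes will ever divide
    // the remainder, because their smaller factors were removed before.
    while (remainder > 1) {
        while (remainder % divisor === 0) {
            factors.push(divisor);
            remainder = remainder / divisor;
        }
        divisor++;
    }

    return factors;
}
```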
This algorithm looks pretty general. Let’s test a couple of cases:
We've added two tests that pass. It seems that we've solved the problem, but… don't you have the feeling of having leaped too far with this last step?
The shortest path isn’t always the fastest
The development road in TDD isn't always easy. The next test is sometimes very obvious, but other times we are faced with several alternatives. Choosing the wrong path can lead us to a dead end or, like it has happened here, to a point where we have to implement too much in one go. And as we've already discussed, the changes that we add to the production code should always be as small as possible.
In the sixth test, we decided to explore the path of “repetitions of the same factor” instead of forcing other prime factors to appear. Would it have been better to follow the other branch of the problem? Let's try it: let's rewind and go back to the situation as it was before that sixth test.
Introducing new factors, second try
This is the version of the production code that we had when we arrived at the sixth test:
Now, let’s go down the other route:
The following production code lets us pass the new test, and all the previous ones:
Now we could refactor:
More than two factors
To introduce more than two factors we need a test:
The necessary change is a simple one:
And we can rid ourselves of that last if, as it’s covered by the while that we’ve just introduced:
If we add new tests we'll see that we can decompose any number without problems. That is to say, after this last change and its refactoring, we've finished the development of the class. Has this path been any better? Partly, yes. We've come up with an almost identical algorithm, but I'd say that the journey has been smoother, the jumps in production code have been less steep, and everything has gone better.
Do we have any criteria to select new examples?
From the traditional QA point of view, there is a series of methods to choose the test cases. However, these methods are not necessarily applicable in TDD. Remember how we started this book: QA and TDD are not the same despite using the same tools and overlapping a lot. TDD is a methodology to drive software development, and the most adequate tests to do it can be slightly different to those that we would use to verify the behavior of the finished software.
For example, our categorization of the numbers into primes and non-primes may be more than enough in QA, but in TDD, the case of non-prime numbers could be further subdivided:
Powers of a prime factor, such as 4, 8 or 9, which involve just one prime number multiplied several times by itself.
Products of different primes, such as 6 or 10, which involve more than one prime number.
Products of n prime factors, with n larger than two.
Each of these categories forces us to implement different parts of the algorithm, which can pose problems that are more or less difficult to solve. A bad choice could even lead us to a dead end.
Nevertheless, nothing prevents us from rewinding and going back if we get stuck. When we are faced with reasonable doubt about going one way or another in TDD, it's best to take note of the state of the development, and mark that point as a point of return in case we get ourselves into some code swamp. Just go back and think again. Making mistakes is also information.
What have we learned in this kata
With this kata we’ve learned how, as we add tests and they become more specific, the algorithm becomes more general
We’ve also seen that we get better results when we prioritize the simplest transformations (changes in the production code)
The choice of the first test
In TDD, the choice of the first test is an interesting problem. In some papers and tutorials about TDD they tend to talk about “the simplest case” and don’t elaborate further. But in reality, we should get used to looking for the smallest test that can fail, which is usually a very different thing.
All in all, it doesn’t seem like a very practical definition, so it probably deserves a more thorough explanation. Is there any somewhat objective way to decide which is the minimum test that can fail?
In search of the simplest test that can fail
Suppose the Roman Numerals kata. It consists in creating a converter between decimal and roman numbers. Suppose that the class is going to be RomanNumeralsConverter, and that the function is called toRoman, so that it would be used more or less like this:
According to the “simplest case” approach, we could write a test not unlike this one:
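For example, something like this (using PhpUnit, and the class and method names from the statement above):

```php
use PHPUnit\Framework\TestCase;

class RomanNumeralsConverterTest extends TestCase
{
    public function testShouldConvertOneToRomanI(): void
    {
        $converter = new RomanNumeralsConverter();

        $this->assertEquals("I", $converter->toRoman(1));
    }
}
```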
Looks right, doesn't it? However, this is not the simplest test that can fail. Actually, there are at least two simpler tests that could fail, and both of them would force us to create production code.
– Let’s take a moment to think about the test that we’ve just written: what will happen if we execute it?
– It’s going to fail.
– But, why will it fail?
– Obvious: because we haven’t even defined the class. When we try to execute the test it cannot find the class.
– Can we say that the test fails for the reason that we expect it to fail?
– Hmmm. What do you mean?
– I mean that the test establishes that we expect it to be able to convert the decimal 1 to the roman I. It should fail because it can't do the conversion, not because it can't find the class. Actually, the test can fail for at least three causes: that the class doesn't exist, that the class doesn't have the toRoman method, and that it doesn't return the result “I”. And it should only fail because of one of them.
– Are you telling me that the first test should be just instantiating the class?
– Yes.
– And what’s the point of that?
– That the test, when it fails, can only fail for the reason that we expect it to fail.
– I have to think about that for a moment.
– No problem. I’ll wait for you at the next paragraph.
That is the question. In spite of it being the simplest case, this first test can fail for three different reasons that make us consider the test as not-passing (remember the second law: not compiling is failing), therefore, we should reduce it so that it fails for just one cause.
As a side note: it’s true that it could fail for many other causes, such as typing the wrong name, putting the class in the wrong namespace, etc. We assume that those errors are unintentional. Also, running the test will tell us the error. Hence the importance of running the tests, seeing how they fail, and making sure they fail properly.
Let’s go a bit slower, then.
The first test should just ask that the class exists and can be instantiated.
In PhpUnit, a test without assertions fails or is at least considered risky. In order to make it pass clearly we specify that we’re not going to make assertions. In other languages this is unnecessary.
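In PhpUnit that could be expressed like this (the test name is illustrative):

```php
use PHPUnit\Framework\TestCase;

class RomanNumeralsConverterTest extends TestCase
{
    public function testShouldInstantiate(): void
    {
        // Tell PhpUnit that this test intentionally makes no assertions.
        $this->expectNotToPerformAssertions();

        new RomanNumeralsConverter();
    }
}
```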
To pass the test I have to create the class. Once created, I will see the test pass and then I’ll be able to face the next step.
The second test should force me to define the desired class method, although we still don’t know what to do with it or what parameters it will need.
Again, this test is a little special in PHP. For example, in PHP and other languages we can ignore the return value of a method if it's not typed. Other languages will require us to explicitly type the method, which in this step could be done using void so that it doesn't return anything at all. Another strategy would be to return an empty value with the correct type (string in this case). There are languages, on the other hand, that require the result of the method to be used -if it's even returned-, but they also allow you to ignore it.
An interesting issue is that once we’ve passed this second test, the first one becomes redundant: the case is already covered by this second test, and if a change in the code made it fail, the second test would fail as well. You can delete it in the refactoring phase.
And now is when the “simplest case” makes sense, because this test, after the others, will fail for the right reason:
This is already a test that would fail for the expected reason: the class doesn’t have anything defined to do the conversion.
Again, once you make this test pass and you’re in the refactoring phase, you can delete the previous one. It has fulfilled its task of forcing us to make a change to the code. Additionally, if that second test were to fail due to a change in the code, our current test would also fail, so you can safely delete it too.
I guess now you’re asking yourself two questions:
Why write three tests only to end up keeping the one that I had thought of at the beginning?
Why can I delete those tests?
Let’s do this bit by bit, then.
Why start with such small steps
A test should have only one reason to fail. Imagine it as an application of the Single Responsibility Principle. If a test has more than one reason to fail, chances are that we’re trying to make a test provoke many changes in the code at once.
It’s true that in testing there’s a technique called triangulation, in which, precisely, several possible aspects that must occur together are verified in order to consider that the test passes or fails. But, as we’ve said at the beginning of the book, TDD is not QA, so the same techniques are not applicable.
What we want in TDD is for the tests to tell us which change we have to make in the software, and this change should be as small and unambiguous as possible.
When we don’t have any written production code, the smallest thing we can do is creating a file in which we define the function or class that we’re developing. It’s the first step. And even then, there are chances that we don’t do it correctly:
we make a mistake in the file name
we make a mistake in its location in the project
we mistype the class’s or the function’s name
we make a mistake when placing it in a namespace
…
We have to avoid all of those problems just to be able to instantiate a class or use a function, and this minimal test will fail if any of them happens. By correcting everything that can go wrong, we’ll make the test pass.
However, if the test can fail for more reasons, the potential sources of error multiply, as there are more things that we need to do to make it pass. Also, some of them can depend on and interact with each other. In general, the necessary change in the production code will be too large while the test is red, and therefore making it turn green will be more costly and less obvious.
Why delete these first tests
In TDD many tests are redundant. From the point of view of QA, we test too much in TDD. In the first place, because many times we use several examples of the same class, precisely to find the general rule that characterizes that class. On the other hand, there are tests that we do in TDD that are already included in other ones.
This is the case of these first tests that we’ve just shown.
The test that forces us to define the software unit for the first time is included in any other test we can imagine, for the simple reason that we need the class in order to be able to execute those other tests. Put another way, if the first test fails, then all of the rest will fail.
In this situation, the test is redundant and we can delete it.
It’s not always easy to identify redundant tests. In some stages of TDD we use examples of the same class to move the development, so we may reach a point in which some of those tests are redundant and we can delete them as they’ve become unnecessary.
On the other hand, a different possibility is to refactor the tests using data providers or other similar techniques that make adding new examples cheaper.
The happiness of the paths
Happy path testing
We call a program flow a happy path when there aren’t any problems and it’s able to execute the entire algorithm. The happy path occurs when no errors are generated during the process, because all of the handled data are valid and lie within their acceptable ranges, and there aren’t any other failures that can affect the software unit that we’re developing.
In TDD, happy path testing consists in choosing examples that must return predictable values as a result, which we can use to test. For example, in the kata Roman Numerals, one possible happy path test would be:
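For instance, something along these lines (the concrete example is illustrative):

```php
/** @test */
public function shouldConvertFourToIV(): void
{
    $converter = new RomanNumeralsConverter();

    $this->assertEquals('IV', $converter->toRoman(4));
}
```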
Very frequently we work with happy path tests in the kata. This is because we’re interested in focusing on the development of the algorithm and in keeping the exercise from running too long.
Sad path testing
On the contrary, sad paths are those program flows that end badly. When we say that they end badly, we mean that an error occurs and the algorithm cannot finish executing.
However, the errors and the ways in which production code deals with them are a part of the behavior of the software, and in real work they deserve to be considered when using the TDD methodology.
In that sense, sad path testing would be precisely the choice of test cases that describe situations in which the production code has to handle wrong input data, or wrong responses from its collaborators, which we also have to manage. An example would be something like this:
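A sketch of such a test, assuming the converter reacts by throwing an exception (the specific exception class is an assumption made for illustration):

```php
/** @test */
public function shouldFailWithNegativeNumbers(): void
{
    $converter = new RomanNumeralsConverter();

    // We expect the converter to refuse numbers it cannot represent.
    $this->expectException(InvalidArgumentException::class);

    $converter->toRoman(-1);
}
```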
That is: our roman numeral converter cannot handle negative numbers nor numbers with decimal digits, and therefore, in a real program we’d have to handle this situation. In the example, the consequence is throwing an exception. But it could be any other form of reaction that suits the purposes of the application.
NIF
Start with the sad paths and postpone the solutions
This kata consists, originally, in creating a Value Object to represent the NIF, or Spanish Tax Identification Number. Its usual form is a string of eight numeric characters and a control letter, which helps us ensure its validity.
As it’s a Value Object, we want to be able to instantiate it from a string and guarantee that it’s valid in order to use it without problems anywhere else in the code of an application.
One of the difficulties of developing these kinds of objects with TDD is that sometimes they don’t need to implement significant behaviors, and it’s more important to make sure that they are created in a consistent state.
The algorithm to validate them is relatively simple, as we’ll see, but we’re mostly interested in how to rid ourselves of all of the strings of characters that can’t form a valid NIF.
History
This kata is original, and it came about by chance while I was preparing a small workshop about introducing TDD with live coding, and about the benefits of using the methodology in day-to-day work.
While I was delving into this example, two very interesting ideas became apparent:
Starting with tests that discard the invalid cases allows us to avoid dealing with the development of the algorithm right from the start, getting those cases out of the way and reducing the problem space. The consequence is that we end up designing more resilient objects with cleaner algorithms, which contributes to preventing the appearance of bugs in the production code.
The mechanism of postponing the solution of each problem until the next test becomes apparent. That is: to make each new test pass, we introduce an inflexible implementation that is just enough to pass that test, but in order for the previous ones to keep passing, we are forced to refactor the code that we already had.
Problem statement
Create a Nif class, which will be a Value Object, to represent the Spanish Tax Identification Number. This number is a string of eight numeric characters and a final letter that acts as a control character.
This control letter is obtained by calculating the remainder of dividing the numeric part of the NIF by 23 (mod 23). The result indicates the row of the following table in which to look up the control letter.
| Remainder | Letter |
|-----------|--------|
| 0 | T |
| 1 | R |
| 2 | W |
| 3 | A |
| 4 | G |
| 5 | M |
| 6 | Y |
| 7 | F |
| 8 | P |
| 9 | D |
| 10 | X |
| 11 | B |
| 12 | N |
| 13 | J |
| 14 | Z |
| 15 | S |
| 16 | Q |
| 17 | V |
| 18 | H |
| 19 | L |
| 20 | C |
| 21 | K |
| 22 | E |
There’s a special case of NIF, which is the NIE, or Foreigners’ Identification Number. In this case, the first character will be one of the letters X, Y and Z. For the calculation of mod 23, they are replaced by the values 0, 1 and 2, respectively.
Hints to solve it
This kata can help us learn several things, both about TDD and about data types and validation.
In kata exercises, it’s common to ignore issues such as data validation in order to simplify the exercise and focus on the development of the algorithm. In a real development we cannot do this: we should actually put a lot of emphasis on validating the data at different levels, both for security reasons and to avoid errors in the calculations.
So we’ve included this kata precisely to practice how to use TDD to develop algorithms that first handle all of the values that they cannot manage, both from the structural point of view and from the domain one.
Specifically, this example is based on the fact that the effective behavior of the constructor that we’re going to create is assigning the value that we pass to it. Everything else it does is check that the value is suitable for that, so it acts as a barrier against unwanted values.
Being a Value Object, we’ll try to create a class to which we pass the candidate string in the constructor. If the string turns out to be invalid for a NIF, the constructor will throw an exception, preventing the instantiation of objects with inadequate values. Fundamentally, our first tests will expect exceptions or errors.
From the infinite number of strings that this constructor could receive, only a few of them will be valid NIF, so our first goal could be to discard the most obvious ones: those that could never fit because they have the wrong number of characters.
In a second stage, we’ll try to control those that could never be NIF due to their structure, basically due to them not following the “eight numeric characters plus one final letter” pattern (taking into account the exception of the NIE, which can indeed have a letter at the beginning).
With this, we’d have everything we need to implement the validation algorithm, as we’d only have to handle strings that could be NIF from a structural point of view.
One thing that the previous steps guarantee is that those tests won’t start failing when we introduce the algorithm, as their examples could never be valid. If we had used strings with a valid NIF structure, even if we’d written them randomly, we could have run into one that was valid by chance, and when implementing the corresponding part of the algorithm, that test would start failing for the wrong reason.
In this kata we’re going to follow an approach that tackles the sad paths first, that is, we’re going to handle the cases that would cause an error first. Thus, we’ll first develop the validation of the input structure, and then move on to the algorithm.
It’s usual for kata to ignore issues such as validation, but in this case we’ve decided to go for a more realistic example, in the sense that it’s a situation we have to deal with quite often. In the code of a project in production, the validation of input data is essential, and it’s worth practicing with an exercise that focuses almost exclusively on it.
Besides, we’ll see a couple of interesting techniques to transform a public interface without breaking the tests.
Statement of the kata
Create a Nif class, which will be a Value Object to represent the Spanish Tax Identification Number. It’s a string of eight numeric characters, with a final letter that acts as a control character.
This control letter is obtained by calculating the remainder of dividing the numeric part of the NIF by 23 (mod 23). The result indicates the row of the following table in which to look up the control letter. In this table I’ve also included some simple examples of valid NIF so you can use them in the tests.
| Numeric part | Remainder | Letter | Valid NIF example |
|--------------|-----------|--------|-------------------|
| 00000023 | 0 | T | 00000023T |
| 00000024 | 1 | R | 00000024R |
| 00000025 | 2 | W | 00000025W |
| 00000026 | 3 | A | 00000026A |
| 00000027 | 4 | G | 00000027G |
| 00000028 | 5 | M | 00000028M |
| 00000029 | 6 | Y | 00000029Y |
| 00000030 | 7 | F | 00000030F |
| 00000031 | 8 | P | 00000031P |
| 00000032 | 9 | D | 00000032D |
| 00000033 | 10 | X | 00000033X |
| 00000034 | 11 | B | 00000034B |
| 00000035 | 12 | N | 00000035N |
| 00000036 | 13 | J | 00000036J |
| 00000037 | 14 | Z | 00000037Z |
| 00000038 | 15 | S | 00000038S |
| 00000039 | 16 | Q | 00000039Q |
| 00000040 | 17 | V | 00000040V |
| 00000041 | 18 | H | 00000041H |
| 00000042 | 19 | L | 00000042L |
| 00000043 | 20 | C | 00000043C |
| 00000044 | 21 | K | 00000044K |
| 00000045 | 22 | E | 00000045E |
You can create invalid NIF simply by choosing a numeric part and adding a letter that doesn’t correspond to it.
| Invalid example |
|-----------------|
| 00000000S |
| 00000001M |
| 00000002H |
| 00000003Q |
| 00000004E |
There’s an exception: the NIF for foreigners (or NIE) may start with one of the letters X, Y or Z, which for the purposes of the calculation are replaced by the numbers 0, 1 and 2, respectively. In this case, X0000000T is equivalent to 00000000T.
To prevent confusion we’ve excluded the letters U, I, O and Ñ.
A string that starts with a letter other than X, Y, Z, or that contains alphabetic characters in the central positions is also invalid.
Language and focus
We’re going to solve this kata using Go, so let’s clarify what the result will look like. In this example we’re going to create a data type Nif, which will basically be a string, and a factory function NewNif that will allow us to build validated NIF starting from an input string.
On the other hand, testing in Go is also a bit peculiar. Even though the language includes support for testing as a standard feature, it doesn’t include common utilities such as asserts.
Disclaimer
To solve this kata I’m going to take advantage of the way in which Go handles errors. These can be returned as one of the responses of a function, which forces you to always handle them explicitly.
Designing tests based on error messages is not a good practice, as they can easily change, making tests fail even when there hasn’t really been an alteration of the functionality. However, in this kata we’re going to use the error messages as a sort of temporary wildcard on which to rely, making them go from more specific to more general. By the end of the exercise, we’ll be handling only two possible errors.
Create the constructor function
In this kata, we want to start by focusing on the sad paths, the cases in which we won’t be able to use the argument that’s been passed to the constructor function. From all the innumerable string combinations that the function could receive, let’s first give an answer to those that we know won’t be of use because they don’t meet the requirements. This answer will be an error.
We’ll start by rejecting the strings that are too long, those that have more than nine characters. We can describe this with the following test:
In the nif/nif_test.go file
For now we’ll ignore the function’s responses, just to force ourselves to implement the minimum amount of code.
As expected, the test will fail because it doesn’t compile. So we’ll implement the minimum necessary code, which can be as small as this:
nif/nif.go file
With this, we get a foundation on which to build.
Now we can go a step further. The function should accept a parameter:
We make the test pass again with:
And finally return:
the NIF, when the one we’ve passed is valid.
an error in the case it’s not possible.
In Go, a function can return multiple values and, by convention, errors are also returned as the last return value.
This provides a flexibility that is not common to find in other languages, and let us play with some ideas that are at least curious. For example, for now we’re going to ignore the response on the function and focus exclusively on the errors. Our next test is going to ask the function to return only the error without doing anything with it. The if is there, for now, to keep the compiler from complaining.
This test tells us that we must return something, so for now we indicate that we’re going to return an error, which can be nil.
Let’s go a step further by expecting a specific error when the condition defined by the test is met: the string is too long. With this, we’ll have a proper first test:
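A sketch of what that first complete test might look like (the error message “too long” and the test name are assumptions used for illustration):

```go
// nif/nif_test.go (sketch)
package nif

import "testing"

func TestShouldFailWhenCandidateIsTooLong(t *testing.T) {
	err := NewNif("01234567890123")

	if err.Error() != "too long" {
		t.Errorf("expected error 'too long', got %v", err)
	}
}
```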
Again, the test will fail, and to make it pass we return the error unconditionally:
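Which could be as simple as this (again a sketch, with the same assumed error message):

```go
// nif/nif.go (sketch)
package nif

import "errors"

func NewNif(candidate string) error {
	// We postpone the actual validation: the error is returned unconditionally.
	return errors.New("too long")
}
```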
And with this, we’ve already completed our first test and made it pass. We could be a little more strict in the handling of the response to contemplate the case in which err is nil, but it’s something that doesn’t have to affect us for the time being.
At this point, I’d like to draw your attention to the fact that we’re not solving anything yet: the error is returned unconditionally, so we’re postponing this validation for later.
Implement the first validation
Our second test has the goal of forcing the implementation of the validation we’ve just postponed. It may sound a little weird, but it showcases that one the great benefits of TDD is the ability to postpone decisions. By doing so we’ll have a little more information, which is always an advantage.
This test is very similar to the previous one:
This test already forces us to act differently in each case, so we’re going to implement the validation that limits the strings that are too long:
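A sketch of that change: the “too long” case is now genuinely validated, while the new “too short” error is still returned unconditionally, postponing that validation to a later cycle (error messages are assumptions for illustration):

```go
func NewNif(candidate string) error {
	if len(candidate) > 9 {
		return errors.New("too long")
	}

	// Postponed: a later test will force us to check this length properly.
	return errors.New("too short")
}
```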
Again, I point out that at the moment we’re not implementing what the test says. We’ll do it in the next cycle, but the test is fulfilled by returning the expected error unconditionally.
There’s not much else we can do in the production code, but looking at the tests we can see that it would be possible to unify their structure a bit. After all, we’re going to make a series of them to which we pass a value and expect a specific error in response.
A test to rule them all
In Go there is a test structure similar to the one provided by the use of Data Providers in other languages: Table Test.
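A table test groups the examples in a slice of cases and runs the same assertion over each of them. A sketch covering our current two tests could be:

```go
func TestShouldRejectInvalidCandidates(t *testing.T) {
	tests := []struct {
		name      string
		candidate string
		expected  string
	}{
		{"should fail when too long", "01234567890123", "too long"},
		{"should fail when too short", "01234", "too short"},
	}

	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			err := NewNif(test.candidate)
			if err.Error() != test.expected {
				t.Errorf("expected error %q, got %q", test.expected, err.Error())
			}
		})
	}
}
```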
With this, it’s now very easy and fast to add tests, especially if they are from the same family, like in this case in which we pass invalid candidate strings and check for the error. Also, if we make changes to the constructor’s interface, we only have a place in which to apply them.
With this, we’d have everything ready to continue developing.
Complete the validation of the length and start examining the structure
With the two previous tests we verify that the string that we’re examining meets the specification of having exactly nine characters, although that’s not implemented yet. We’ll do it now.
However, you may be asking yourself why don’t we simply test that the function rejects the strings that don’t fulfill it, something that we could do in just one test.
The reason is that there are actually two possible ways in which the specification may not be met: the string has more than nine characters, or the string has less. If we do a single test, we’ll have to choose one of the two cases, so we cannot guarantee that the other is fulfilled.
In this specific example, in which we’re interested in just one value, we could pose the dichotomy between strings with length nine and strings with lengths other than nine. However, it’s common for us to have to work with intervals of values that, moreover, can be open or closed. In that situation, the strategy of having two or even more tests is far safer.
In any case, at the point we’re at, we need to add another requirement in the form of a test in order to drive the development. The two existing tests define the string’s valid length. The next test asks about its structure.
And with the refactoring that we’ve just made, adding a test is extremely simple.
We’ll start at the beginning. Valid NIF begin with a number, except for a subset of them that begin with one of the letters X, Y or Z. One way of defining the test is the following:
To pass the test, we first solve the pending problem of the previous one:
Here we have a pretty clear refactoring opportunity that would consist in joining the conditionals that evaluate the lengths of the string. However, that will cause the test to fail since we would at least have to change an error message.
The not very clean way of changing the test and production code at the same time
One possibility is to temporarily “skip” our self-imposed condition of only doing refactorings with all tests in green, and making changes in both production and test code at the same time. Let’s see what happens.
The first thing would be to change the test so it expects a different error message, which will be more generic and the same for all of the cases that we want to consolidate in this step:
This will cause the test to fail. An issue that can be solved by changing the production code in the same way:
The test passes again and we’d be ready to refactor. But we’re not going to do it this way.
The safe way
Another option is to make a temporary refactoring in the test in order to make it more tolerant: we simply make it accept a more generic error in addition to the specific one.
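In the table test, that tolerance could be sketched like this: the assertion temporarily accepts a more generic “bad format” error (an assumed message) besides the specific one each case expects:

```go
// inside TestShouldRejectInvalidCandidates (sketch)
for _, test := range tests {
	t.Run(test.name, func(t *testing.T) {
		err := NewNif(test.candidate)

		// Temporary tolerance: either the specific error or the generic one is fine.
		if err.Error() != test.expected && err.Error() != "bad format" {
			t.Errorf("expected error %q, got %q", test.expected, err.Error())
		}
	})
}
```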
This change allows us to make the change in production without breaking anything:
The test keeps passing, and now we can perform the refactoring.
Unify the string length validation
Unifying the conditionals is easy now. This is the first step, which I include here to have a reference of how to do this in case we were working with an interval of valid lengths.
But it can be done better:
And a little more expressive:
Finally, a new refactoring of the test to contemplate these changes. We remove our temporary change, although it’s possible that we need it again in the future.
Note that we’ve been able to make all these changes without the tests failing.
Moving forward with the structure
The code is pretty compact, so we’re going to add a new test that lets us move forward with the validity of the structure. The central fragment of the NIF is composed only of numbers, exactly seven:
We run it to make sure that it fails for the right reason. To pass the test, we have to solve the previous test first, so we’ll add code to verify that the first symbol is either a number or a letter in the X, Y and Z set. We’ll do it with a regular expression:
This code is enough to pass the test, but we’re going to make a refactoring.
Invert the conditional
It makes sense that, instead of matching a regular expression that excludes the non-valid strings, we match an expression that detects them. If we do that, we’ll have to invert the conditional. To be honest, the change is pretty small:
The end of the structure
We’re reaching the end of the structural validation of the NIF. We need a test that tells us which candidates to reject depending on their last symbol, which leads us to solve the pending problem from the previous test:
From the four non-valid letters we take the U as an example, but it could be I, Ñ, or O.
However, to make the test pass, what we do is make sure that the previous one is fulfilled. It’s easier to implement that separately:
Compacting the algorithm
This passes the test, and we are met by a familiar situation which we’ve already solved before: we have to make the errors more generic with the temporary help of some extra control in the test:
We change the error messages in the production code:
Now we unify the regular expression and the conditionals:
We can still make a small but important change. The last part of the regular expression, .*, is there to fulfill the requirement of matching the whole string. However, we don’t really need the quantifier as one character is enough:
And this reveals a detail: the regular expression only matches strings that have exactly nine characters, so the initial length validation is unnecessary:
We’ve walked so far… only to retrace our steps. However, we didn’t know this in the beginning, and that’s where the value of the process lies.
Lastly, we change the test to reflect the changes and, again, remove our temporary support:
Finishing the structural validation
We need a new test to finish the structural validation part. The existing tests guarantee that the strings are correct, but the following validation already involves the algorithm that calculates the control letter.
This test should ensure that we can’t use a structurally valid NIF with an incorrect control letter. When we presented the kata we gave some examples, such as 00000000S. This is the test:
And here is the code that passes it:
And now, of course, it’s time to refactor.
Compacting the validation
This refactoring is pretty obvious, but we have to temporarily protect the test again:
We make the error more general to be able to unify the regular expressions and the conditionals:
And now we join them while the tests keep passing:
With this we finish the structural validation, and we’re left with the implementation of the mod 23 algorithm. But to do that we need a little change of approach.
Let’s look on the bright side
The algorithm is, in fact, very simple: calculate the remainder (of dividing by 23), and use it as an index to look up the corresponding letter in a table. Implementing it in only one iteration would be easy. However, we’re going to do it more slowly.
Until now, our tests have been pessimistic: they expected incorrect NIF in order to pass. But now, our new tests must be optimistic, that is, they’re going to expect us to pass them valid NIF examples.
At this point, we’ll introduce a change. You may remember that for now we’re only returning the error, but the final interface of the function will return the validated string as a NIF type that we shall create for the occasion.
That is, we must change the code so that it returns something, and that something has to be of a type that doesn’t exist yet.
To make this change without breaking the test, we’re going to make use of a somewhat contrived refactoring technique.
Changing the public interface
In the first place, we extract the body of NewNif to another function:
The tests keep on passing. Now, we introduce a variable:
With this, we can make it so that FullNewNif returns the string without affecting the test, because it stays encapsulated within NewNif.
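The intermediate state could be sketched like this: NewNif keeps its current signature, so the existing tests don’t notice anything, while FullNewNif already has the signature we want (the validation body is elided):

```go
func NewNif(candidate string) error {
	// The returned string is ignored here, so the current tests stay untouched.
	_, err := FullNewNif(candidate)

	return err
}

func FullNewNif(candidate string) (string, error) {
	var err error
	// ... the validation logic previously living in NewNif goes here,
	// assigning err when the candidate has to be rejected ...
	return candidate, err
}
```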
The tests still pass, and we’re almost finished. In the test, we change the usage of NewNif by FullNewNif.
And they’re still passing. Now, the function returns the two values that we wanted, and the tests remain unbroken. We can proceed to remove the original NewNif function.
And use the IDE tools to change the function name from FullNewNif to NewNif.
NOW it’s time
Our goal now is to push the implementation of the mod23 algorithm. This time, the tests expect the string to be valid. Also, we want to force the return of Nif type objects instead of strings.
As a first step, we change the production code to introduce and use the Nif type:
Now the test will fail because we haven’t validated anything yet. To make it pass we add a conditional:
A note about Go: custom types can’t have nil value, they should have an empty value instead. For this reason, we return an empty string in the case of an error.
Moving forward with the algorithm
For now we don’t have many reasons to refactor, so we’re going to introduce a test that should help us move forward a bit. In principle, we want it to drive us to separate the numeric part from the control letter.
One possibility would be to test another NIF that ends with the letter T, such as 00000046T.
To pass the test, we could do this simple implementation:
And now we start refactoring.
More refactoring
In the production code, we can take a look at what’s different and what’s common between the examples. Both of them have T as their control letter, and their numeric part is divisible by 23. Therefore, their mod23 will be 0.
Now we can perform the refactoring. A first step.
And, after seeing the tests pass, the second step:
With this change, the tests pass, and the function accepts all of the valid NIF that end with a T.
Validating more control letters
In this kind of algorithm there isn’t much of a point in trying to validate all of the control letters, but we can introduce another one to force ourselves to understand how the code should evolve. We’ll try a new one:
This test is already failing, so let’s make a very simple implementation:
This already gives us an idea about what we’re getting at: a map between letters and the remainder after dividing by 23. However, in many languages strings can work as arrays, so it would be sufficient to have a string with all the control letters properly sorted, and access the letter that’s in the position indicated by the modulus.
A refactoring, for even more simplicity
First we implement a simple version of this idea:
We have our first version! We’ll add the full letter list later, but for now we can try to fix up the current code a little. First, we make controlMap constant:
Actually, we could extract all of the modulus calculation part to another function. First we rearrange the code to better control the extraction:
Remember to verify that the tests keep passing. Now we extract the function:
And we can compact the code a little bit further while we add the rest of the control letters. At first sight it could look like “cheating”, but in the end it’s nothing more than generalizing an algorithm that could be enunciated as “take the letter that’s in the position given by the mod23 of the numeric part”.
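As an illustration of that generalization, the heart of the calculation might end up looking something like this (the helper’s name is an assumption; the conversion error is still being ignored at this point):

```go
package nif

import "strconv"

const controlMap = "TRWAGMYFPDXBNJZSQVHLCKE"

func controlLetter(numericPart string) string {
	// The conversion error is ignored for now; a later test will force us to handle it.
	numeric, _ := strconv.Atoi(numericPart)

	// Take the letter sitting at the position given by the mod 23 of the numeric part.
	return string(controlMap[numeric%23])
}
```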
With this we can already validate all of the NIF except the NIE, which begin with the letters X, Y or Z.
NIE support
Now that we’ve implemented the general algorithm, let’s try to handle its exceptions, which aren’t all that many. NIE begin with a letter that, for the purposes of the calculation, gets replaced by a number.
The test that seems to be the most obvious at this point is the following:
The NIE X0000023T is equivalent to 00000023T. Will this affect the result of the test?
We run the test and… surprise? The test passes. This happens because the conversion that we do in this line generates an error that we’re currently ignoring, but causes the numeric part to still be equivalent to 23 (whose mod23 is 0 and should be paired with the letter T).
In other languages the conversion doesn’t fail, but assumes the X as 0.
In any case, this opens up two possible paths:
remove this test, refactor the production code to handle the error, and check that it fails when we put it back
test another example that we know will fail (Y0000000Z), and make the change later
Possibly, in this case the second option would be more than enough, since our structural validations would ensure that the error couldn’t appear once the function was completely developed.
However, it could be interesting to introduce the handling of the error. Managing errors, including those that could never happen, is always good practice.
So, let’s cancel the test and introduce a refactoring to handle the error:
Here’s the refactor. In this case, I handle the error by causing a panic, which is not the best way of managing an error, but it allows us to make the test fail and force ourselves to implement the solution.
If we run the tests we can check that they’re still green. But, if we reactivate the last test, we can see it fail:
And this already forces us to introduce a special treatment for these cases. It’s basically replacing the X with a 0:
It can be refactored by using a Replacer:
At this point, we could make a test to force us to introduce the rest of the replacements. It’s cheap, although it’s ultimately not very necessary for the reason we discussed earlier: we could interpret this part of the algorithm as “replacing the initial letters X, Y and Z with the numbers 0, 1 and 2, respectively”.
We only need to add the corresponding pairs:
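With strings.NewReplacer, the three replacements can be declared together, something like this (the variable and function names are assumptions made for the sketch):

```go
package nif

import "strings"

// Replace the NIE initial letters with their numeric equivalents
// before calculating mod 23.
var nieReplacer = strings.NewReplacer("X", "0", "Y", "1", "Z", "2")

func numericPart(candidate string) string {
	return nieReplacer.Replace(candidate[0:8])
}
```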
After a short while of refactoring, this would be a possible solution:
What have we learned in this kata
Use sad paths to drive the development
Use table tests in Go to reduce the cost of adding new tests
A technique to replace the returned errors with a more general one without breaking the tests
A technique to change the public interface of the production code without breaking the tests
In earlier chapters we mentioned the laws of TDD. Originally these laws were two, in Kent Beck’s formulation:
Don’t write a line of new code unless you first have a failing automated test.
Eliminate duplication.
Essentially, what Kent Beck proposed was to first define a small part of the specification through a test, implement a very small algorithm that satisfies it, and then revise the code in search of duplication to refactor into a more general and flexible algorithm.
And this is, more or less, the way Martin Fowler defined the Red-Green-Refactor cycle:
Write a test for the next piece of functionality that you wish to add.
Write the production code necessary to make the test pass.
Refactor the code, both the new and the old, so that it’s all well structured.
This statement seems to assume that the refactoring is, so to speak, the end of each stage of the process. But, paradoxically, if we interpret the cycle literally, we’ll fall into a bad practice.
The function of refactoring in TDD
In general, in Test Driven Development it’s favored that both tests and changes in production code are as small as possible. This minimalist approach is beneficial because it allows us to work with a light cognitive load in each cycle, while we learn and reach a more extensive and deeper comprehension of the problem, postponing decisions to a moment at which we’re knowledgeable enough to face them.
Usually, our small TDD steps let us make very simple code changes every time. Many times these changes are obvious and lead us to implementations that we could consider naive. However, as simple or rough as they might seem, these implementations do pass the tests, and therefore meet the specifications. We could ship this code if we needed to, because the behavior has been developed.
And once we make the last test pass and all of them are green, we’re in good condition to refactor. This green state gives us freedom to change the shape of the implementation, assuring that we’re not accidentally changing the achieved functionality.
The refactoring phase is there, precisely, to evolve those naive implementations and turn them into better designs, making use of the safety net provided by the passing tests.
Which refactorings to do
In each cycle there are many possible refactorings. Obviously, during the first phases they will be smaller, and we might even think that they’re unnecessary. However, it’s wise to take the opportunity when it presents itself.
We can perform many types of refactorings, such as:
Replace magic numbers with constants.
Change variable and parameter names to better reflect their intentions.
Extract private methods.
Extract conditionals to methods when they become complex.
Flatten nested conditional structures.
Extract conditional branches to private methods.
Extract functionality to collaborators.
Refactoring limits
Sometimes, an excess of refactoring can lead us to an implementation that’s too complicated and prevents us from advancing the TDD process. This happens when we introduce patterns prematurely, without having finished the development first. It would be a premature refactoring, similar to premature optimization, generating code that’s hard to maintain.
We could say that there are two kinds of refactoring involved:
One kind with limited reach, applicable in each red-green-refactor cycle, whose function is to make the algorithm more legible, sustainable, and capable of evolving.
The other kind, which will take place once we’ve completed all of the functionality, and whose objective is to introduce a more evolved and pattern-oriented design.
Another interesting question is the introduction of language-exclusive features, which in principle we’d also like to leave until that final phase. Why leave them for that moment? Precisely because they can limit our ability to refactor the code if we’re not yet sure about where it could evolve.
For example, this construction in Ruby:
It could be refactored -in fact it’s recommended to do so- in this way. I think it’s really beautiful:
In this case, the structure represents the idea of assigning a default value to the variable, something that we could also achieve in this way, which is common in other languages:
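Since the original snippets aren’t reproduced here, a plausible reconstruction of the three variants being discussed could be something like this, using the Greeting example that appears later in this chapter:

```ruby
# First variant: an explicit conditional assigns the default value
def greet(name = nil)
  if name.nil?
    name = 'my friend'
  end

  "Hello, #{name}."
end

# Second variant: the same idea as a one-line modifier
def greet(name = nil)
  name = 'my friend' if name.nil?

  "Hello, #{name}."
end

# Third variant: a default parameter value, common in other languages
def greet(name = 'my friend')
  "Hello, #{name}."
end
```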
The three variations make the tests pass, but each of them puts us in a slightly different spot regarding future requirements.
For example, let’s assume that our next requirement is to be able to introduce several names. One possible solution would be to use splat parameters, that is, to let the function admit an undefined number of arguments that are later presented inside the method as an array. In Ruby this is expressed like this:
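A sketch of the splat declaration:

```ruby
def greet(*names)
  # names arrives as an array: [] when nothing is passed,
  # ['Jill', 'Jane'] for greet('Jill', 'Jane')
end
```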
This declaration, for example, is incompatible with the third variant, as the splat operator doesn’t admit a default value and we’d have to re-implement that step, which would lead us back to using one of the other variants.
In principle this doesn’t seem like that big of an inconvenience, but it means undoing all of the logic determined by that structure, and depending on the development stage that we’re in, it can even lead us to dead ends.
The other options are a little less inconvenient. Apart from changing the signature, the only thing we have to change is the question (nil? by empty?) and the default value, which instead of a string, becomes an array of strings. Of course, to finish we have to join the collection in order to show it in the greeting.
Or the rubified version:
Apart from that, at this point it would be necessary to refactor the name of the parameter so it more clearly reflects its new meaning:
So, as a general recommendation, it’s convenient to seek a balance between the refactorings that help us keep the code clean and legible, and those that we could consider over-engineering. An implementation that’s not the most refined could turn out to be easier to change, as more and more tests get introduced, than a highly evolved one.
Don’t over-refactor ahead of time.
When is the right moment to refactor
To refactor, the sine qua non condition is that all of the existing tests are passing. At this moment we’re interested in analyzing the state of the implementation and applying the most appropriate refactorings.
If a test is red, it’s telling us that a part of the specification hasn’t been achieved yet, and therefore, we should be working on doing that instead of refactoring.
But there’s a special case: when we add a new failing test and we realize that we need to do some previous refactoring to be able to implement the most obvious or simple solution.
What do we do in this case? Well, we have to take a step back.
The step back in the Red-Green-Refactor cycle
Let’s look at a simple example: we’re going to start the Greeting kata from TestDouble. We begin with a test with which to define the interface:
Our next step is to create the simplest implementation that passes the test, which we could achieve like this:
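A sketch in Ruby of both steps, assuming RSpec and the example from the kata’s requirements:

```ruby
RSpec.describe 'greet' do
  it 'interpolates the name in a simple greeting' do
    expect(greet('Bob')).to eq('Hello, Bob.')
  end
end
```

And the simplest implementation that passes it could be to return the expected literal, ignoring the parameter for now:

```ruby
def greet(name)
  'Hello, Bob.'
end
```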
The next requirement is to handle the situation where a name is not provided, in which case it should offer some sort of anonymous formula such as the one we use as an example in this test:
In the first place, the test fails because the argument isn’t optional. But on top of that, it’s not even used in the current implementation, and we need to use it to fulfill this test’s most obvious requirement. We have to execute several preparatory steps before we’re able to carry out the implementation:
Make the name parameter optional
Use the parameter in the return value
The thing is that the new requirement provides us with new information that would be useful for refactoring what’s already been developed under the first test. However, as we have a failing test, we shouldn’t be doing any refactoring. For this reason we delete or cancel this new test, for example, by commenting it out:
By doing this, we’re back to having all tests in green and we can apply the necessary changes, which don’t alter the behavior that’s been implemented so far.
We make the name parameter optional.
And here, we start using the parameter:
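After both preparatory steps, the implementation could look like this sketch, and it still passes the first test:

```ruby
# The parameter is now optional and actually used in the result.
def greet(name = nil)
  "Hello, #{name}."
end
```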
This has allowed us to advance from our first rough implementation to another that’s flexible enough to still pass the first test, while setting up some better conditions to reintroduce the next one:
Obviously the test fails, but this time the reason for failure is, precisely, that we’re missing the code that solves the requirement. Now, the only thing we have to do is check if we’re receiving a name or not, and act consequently.
In a sense, it turns out that the information from the future, that is, the new test that we design to introduce the next piece of functionality, affects the past, that is, the state that the code needs to be in for us to continue. This forces us to consider the depth of the necessary refactoring before facing the new cycle. In this situation, it’s best to go back to the last passing test, cancelling the new one, and work on the refactoring until we’re better prepared to keep going.
Bowling game
The refactoring phase
In the previous kata, in general, the TDD cycles were executed in quite a fluid manner.
However, you may have noticed that sometimes, making a new test pass involved doing a certain refactoring in the production code before we were able to face the necessary changes to make the test pass.
The kata that we’re about to practice, apart from being one of the classic ones, has a peculiarity: almost every new piece of functionality that we add, every new test, requires a relatively large refactoring of the algorithm. This creates a dilemma: we can’t refactor if the test is not green.
Or, put in another way: sometimes we’ll run into a situation where a new test provides us with some new information that we didn’t have before, and shows us a refactoring opportunity that we have to take before implementing the new piece of functionality.
For this reason, with the Bowling Game kata we’ll learn how to handle this situation and take a step back to refactor the production code using what we learn when we think about the new test.
In a sense, the information from the future will help us change the past.
History
The Bowling kata is very well known. We owe it to Robert C. Martin, although there’s a very popular version by Ron Jeffries in the book Adventures in C#.
Problem statement
The kata consists in creating a program to calculate the scores of a Bowling game, although to avoid complicating it too much, only the final result is calculated without performing any validations.
If you’re not familiar with the game and its scoring system, here are the rules that you need to know:
In each game, the player has 10 turns called frames.
In each frame, the player has two tries, or rolls, to knock down the 10 pins (which results in a total of 20 ball rolls throughout the whole game).
In each roll, the knocked down pins are counted.
If no pin was knocked down, that’s a Gutter.
If the player hasn’t knocked all of the bowling pins by the end of their second roll, the score of the frame is just the sum of both rolls. For example 3 + 5 = 8 points in the frame.
If the player knocks down all 10 pins in the frame (for example 4 + 6), that’s called spare, and grants a bonus equal to the score of the next roll, the first one of the next frame (10 from the current frame, plus 3 from the next throw, for example, equalling 13). That is, the final score of a spare is calculated after the following roll, and in a sense, that roll is counted twice (once as a bonus, and a second time as a regular roll).
If the player knocks down all 10 pins in a single roll, that’s a strike. In that case, the bonus is equal to the score of the whole next frame (for example, 10 + (3 + 4) = 17). After a strike, the frame ends without a second roll.
In the case that this happens in the tenth and last frame, there may be one or two extra rolls as necessary.
Hints to solve it
The Bowling Game is an interesting kata because of the challenge of handling the spares and the strikes. When we detect one of these cases we have to look up the result of the following rolls, and therefore we need to keep track of the history of the match.
This will force us to change the algorithm several times in quite a radical way, which leads us to the problem of how to manage the changes without breaking the TDD cycles, that is, refactoring the production code while keeping the tests green.
To better understand what we’re talking about, a situation in which we might find ourselves would be the following:
After a couple of cycles, we start testing the spare case. At this point, we realize that we need to make a relatively large change to the way that we were calculating the total score. Ultimately, what happens is that we have to refactor while having a test that’s not passing. But this contradicts the definition of the refactoring phase, which requires all tests to be passing.
The solution, fortunately, is very simple: take a step back.
Once we know that we want to refactor the algorithm, it’s enough to comment out the new test to deactivate it and, keeping the last test green, refactor the production code. When we’re done, we bring the new test back to life and develop the new behavior.
The kata consists in creating a program to calculate the scores of a Bowling game, although to avoid complicating it too much, only the final result is calculated without performing any validations.
A brief reminder of the rules:
Each game has 10 frames, each one with 2 rolls.
In each turn, the knocked down pins are counted, and that number is the score.
0 points is a gutter.
If all the pins are knocked down in two rolls, it’s called a spare, and the score of the next roll is added as a bonus.
If all the pins are knocked down in just one roll, it’s called a strike, and the score of the next two rolls is added as a bonus.
If a strike or spare are achieved in the last frame, there are extra rolls.
Language and approach
To do this kata, I’ve chosen Ruby and RSpec. You may notice that I have a certain preference for the *Spec family of testing frameworks. The thing is that they have been designed with TDD in mind, considering the tests as specifications, which helps a lot to escape the mindset of thinking about the tests as QA.
Having said that, there’s no problem in using any other testing framework, such as those from the *Unit family.
On the other hand, we’ll use object oriented programming.
Starting the game
At this point, the first test should be enough to force us to define and instantiate the class:
The test will fail, forcing us to write the minimum production code necessary to make it pass.
And once we’ve made the test pass, we move the class to its own file, and make the test require it:
We’re ready for the next test.
Let’s throw the ball
For our BowlingGame to be useful, we’ll need at least two things:
A way to indicate the result of a roll, passing the number of knocked down pins, which would be a command. A command produces an effect on the state of an object, but doesn’t return anything, so we’ll need an alternative way to observe that effect.
A way to obtain the score at a given moment, which would be a query. A query returns an answer, so we can verify that it’s the one that we expect.
You may be wondering: which of the two should we tackle first?
There isn’t a fixed rule, but one way of seeing it could be the following:
Query methods return a result, so their effect can be tested, but we have to make sure that the returned responses won’t make it harder for us to create new failing tests.
On the other hand, command methods are easy to introduce with a minimum amount of code without having to worry about their effect in future tests, except from making sure that the parameters that they receive are valid.
So, we’re going to start by introducing a method to throw the ball, which simply expects to receive the number of knocked down pins, which can be 0. But to force that, we must first write a test:
And the minimum necessary code to make the test pass is, simply, the definition of the method. Basically, we can now communicate to BowlingGame that we’ve thrown the ball.
Time to refactor
In this kata, we’ll pay special attention to the refactoring phase. We have to strike a balance so that certain refactorings don’t limit our chances of making the code evolve. In the same way that premature optimization is a smell, so is premature over-engineering.
The production code doesn’t offer any refactoring opportunity yet, but the tests start showing a pattern. The game object could live as an instance variable, and be initialized in a setup method of the specification or test case. Here, we use before.
And this makes the first test redundant:
With this, the specification will be more manageable.
Counting the points
It’s time to introduce a method that lets us check the game scoreboard. We call it from a failing test:
The test will fail, since the score method doesn’t exist.
And it will keep failing because it has to return 0. The minimum to make it pass is this:
The world’s worst thrower
Many of the solutions of this kata jump directly to the point where they start to define the behavior of BowlingGame after the 20 rolls. We’ve chosen a path of smaller steps, and we’re going to see what it entails.
Our next test will try to make it possible to obtain a scoreboard after 20 throws. A way to do it is to simulate them, and the simplest simulation would be to consider all of them as failed rolls, that is, not a single pin would be knocked down and the final score would be 0.
This seems like a good test to start with:
But it’s not. We run it, and it passes in the first try.
This test doesn’t force us to introduce any changes in the production code because it doesn’t fail. Ultimately, it’s the same test that we had before. However, in a way it’s a better test, since our objective is to make score return the results after all of the rolls.
Organizing the code
We just remove the previous test for being redundant, as that behavior would already be implicitly contained in the one that we’ve just defined.
As the test hasn’t required us to write any production code, we need a test that does fail.
Teaching our game to count
We need to expect a result different than zero in score to be forced to implement new production code. From all the possible results of a complete bowling game, maybe the simplest one to test is the case where every throw knocks down exactly one pin. This way, we expect the final score to be 20, and there isn’t any chance for extra points or throws to be generated.
This test already fails because there’s nothing counting and accumulating the points of each roll. Therefore, we need to set a variable, which initializes as zero and accumulates the results.
But, hold on a second… Aren’t these too many things?
A step back to reach further
Let’s review: to pass the current failing test, we need to:
Add a variable to the class to store the scores
Initialize it to 0
Accumulate the results in it
Those are many things to add in a single cycle while having a failing test.
The thing is, actually, we could forget about this test for a moment, and go back to the previous state, when we were still in green. To do so, we comment out the new test so it doesn’t get executed.
And now we proceed to the refactoring. We start by replacing the constant 0 with a variable:
We can improve this code, storing the points that were obtained in the throw in the variable. This code still passes the test and involves a minimal change:
Recovering a cancelled test
Now we do run the fourth test, observing that it fails again:
The necessary change in the code is smaller now. We have to initialize the variable in construction, so that each game starts at 0 and then accumulates the points. Note that apart from the constructor, it will be enough to add a + sign.
Again in green, knowing that we’re already accumulating points.
Getting more comfortable
If we observe the tests, we see that it might be useful to have a method to roll the ball several times with the same result. So we extract it, and of course, we use it:
How to handle a spare
Now that we’re sure that our BowlingGame is able to accumulate the points achieved in each throw, it’s time to keep going. We can start to handle special cases like, for example, how to process a spare, that is, knocking down the ten pins with the two rolls that are in a frame.
So, we write a test to simulate this situation. The simplest would be to imagine that the spare occurs in the first frame, and that the result of the third roll is the bonus. To make things easier, the rest of the rolls in the game are 0, so we don’t introduce any strange scores.
Here’s a possible test:
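A sketch of such a test, where roll_many is the helper extracted earlier for repeating identical rolls (its name and signature are assumptions):

```ruby
it 'adds the next roll as a bonus after a spare' do
  @game.roll(5)
  @game.roll(5) # spare: the ten pins fall in two rolls
  @game.roll(3) # this roll counts twice: as a regular roll and as the bonus
  roll_many(17, 0)

  expect(@game.score).to eq(16)
end
```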
The test fails because score returns 13 points when it should be 16. Right now, there isn’t any mechanism that counts the throw after the spare as a bonus.
The problem is that we shouldn’t be counting the points by roll, but rather by frame, in order to know whether a frame has resulted in a spare or not, and to act accordingly. Moreover, now it’s not enough to simply add the points. Instead, we have to pass the counting responsibility to the score method, so that roll is limited to storing the partials, leaving the logic of calculating the points at the frame level to score.
Introducing the concept of frame
First, we go back to the previous test, temporarily cancelling the one that’s failing right now:
Let’s refactor. In the first place, we change the name of the variable:
The tests continue to pass. Now we change its meaning, and move the sum to score:
We check that the tests keep passing. It could be a good time to introduce the concept of frame. We know that there’s a maximum of 10 frames.
With this change, the tests still pass, and now we have access to the score by frame. It looks like we’re ready to reintroduce the previous test.
Continue handling spare
We reactivate the failing test.
Now we’re better set to introduce the desired behavior through a pretty small change:
Adding an if block is sufficient to make the test pass.
Removing magic numbers and other refactorings
At this point, with all tests in green, we can make several improvements to the code. Let’s go bit by bit:
Let’s give meaning to some magic numbers that are in the production code:
The calculation of the frame score could be extracted to a method, which would save us the temporary variable:
We can give meaning to the sum of the points of each of the frame’s throws, as well as to the question of it being a spare or not. Also, we can rubify the code a little:
The truth is that this is crying out to be extracted to a Frame class, but we’re not going to do it right now, as we could be committing a smell due to an excess of design.
On the other hand, looking at the test, we can see some points of improvement. Such as being more explicit in the example:
And with this, we finish the refactoring. Next, we want to handle the strike case.
Strike!
A strike implies knocking down all of the pins in a single throw. In this case, the bonus equals the sum of the points obtained in the two following rolls. The next test sets up an example:
This time the test fails because the production code calculates a total of 17 points (10 from the strike, plus 7 more from the two next throws). However, it should be counting that 7 twice: the bonus and the normal score.
Now, we have everything that we need in the production code, and in principle, we shouldn’t have to go back. Just introduce the necessary changes. Fundamentally, we’re interested in detecting that the strike has happened.
Reorganizing the game knowledge
Our current production code lets us pass the tests, so we’re ready to fix its structure.
Let’s start by making some things about the strike more explicit:
The structure of the frame’s score calculation is not very clear, so we’re going to go back and change it so it’s more expressive:
This refactor makes it clear that strike and spare have a different structure, which makes them harder to understand and manage. We change spare to make them the same, and while we’re at it, we also remove the magic numbers.
Now we can extract methods to make the calculations more explicit:
The world’s best player
In principle, our current development is sufficient. However, it’s convenient to have a test that certifies it. For example, this new test corresponds to a perfect game: all of the rolls are strikes:
When we run it, the test passes, which confirms that BowlingGame is working as expected.
With all of the tests passing and the functionality completely implemented, we could try to evolve the code towards a better design. In the following example, we’ve extracted a Rolls class which is basically an array that also has all of the score calculation methods that we’ve been extracting so far:
What have we learned in this kata
Refactoring is the design stage in classic TDD: it’s the moment at which, once we’ve implemented a behavior, we reorganize the code so that it’s clearer and better expressed.
We must take the refactoring opportunities right when we detect them
We refactor both the test and the production code
Greetings
A functional kata to rule them all
The concept of pure function is of great interest to Test Driven Development as it forces us to think about a behavior that must evolve while the only things we can know about it -from the point of view of the test- are its current inputs and outputs. This is common to every classic TDD development, since it relies on black box tests. That is, we don’t take into account how the implementation of our unit under development works, we just interact with it through its public interface.
This is why I propose it as the final exercise of this series, because it helps train everything that we’ve learned in the previous ones, adding an extra restriction that forces us not to use the resources we’d have in an object-oriented approach, such as keeping state or extracting behavior to dependencies.
Moreover, given that the requirements change in each iteration, it forces us to refactor constantly to be able to introduce the necessary behavioral changes.
History
This kata isn’t very well known. I found it at TestDouble, where Nick Gauthier is mentioned as its author.
Problem statement
The formulation of this kata is very simple. We have to create a pure function greet() that returns a string with a greeting. As a parameter, it must accept the name of the person that we want to greet.
Next, we’ll add requirements that will force us to extend the algorithm to support them, just through the function’s input and output. Each requirement will be accompanied by an example. They are the following:
Requirements
1. Interpolate name in a simple greeting. Input: “Bob” → Output: Hello, Bob.
2. If no name is passed, return some generic formula. Input: null → Output: Hello, my friend.
3. If we’re yelled at, yell back. Input: “JERRY” → Output: HELLO, JERRY!
4. Handle two names. Input: “Jill”, “Jane” → Output: Hello, Jill and Jane.
5. Handle any number of names, using Oxford commas. Input: “Amy”, “Brian”, “Charlotte” → Output: Hello, Amy, Brian, and Charlotte.
6. Allow mixing regular and yelled names, but separate the answers. Input: “Amy”, “BRIAN”, “Charlotte” → Output: Hello, Amy and Charlotte. AND HELLO BRIAN!
7. If a name contains a comma, split it. Input: “Bob”, “Charlie, Dianne” → Output: Hello, Bob, Charlie, and Dianne.
8. Allow escaping the commas of #7. Input: “Bob”, “\“Charlie, Dianne\”” → Output: Hello, Bob and Charlie, Dianne.
Hints to solve it
Part of the interest of this kata lies in working on one requirement at a time, so it’s important to not get ahead of ourselves and go one by one.
The difficulty: solving it without creating any extra units, only through the greet() interface. Each of the requirements lets us build a test that forces us to extend the behavior, although we can create as many tests as we see fit.
On the other hand, that “step back” that we talked about in the Bowling kata becomes very important. When we solve a requirement, making the corresponding test pass, we’ll find out that we need to pave the way for the implementation of the next one, while keeping all current tests green.
Summarizing:
Focus on a requirement each time, in the proposed order.
Once achieved, refactor to make the next requisite easier: make it so that the change is easy (this might be hard), and then make the easy change, as Kent Beck would say.
Language and approach
We’re going to solve this kata with Scala and the FunSuite framework. We’ll write it using a functional approach.
Basic greeting
The way this kata is presented provides us with practically all the test cases we might need. At this point, I believe that we can start with a relatively long jump.
This is our first test, in which we assume that the function will be a method from the Greetings class in the greetings package.
In any case, when using very strongly typed languages, many times we won’t be able to begin with smaller tests, as the compiler itself would force us to introduce more code. But, on the other hand, the strict typing lets us safely ignore those very same tests. In fact, you may consider that the strongly typed system is, in a certain way, a testing system.
The test will fail, as expected. In this case, we’ll create the minimum necessary code to make it pass in one go:
Scala doesn’t let us define a function without arguments and then call it with one, so we’re forced to include the parameter in the signature. Beyond that, we return the string expected by the test so that it turns green.
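As a reference, a minimal sketch of this first step (assuming munit’s FunSuite; the package layout and names are illustrative, and the Greetings container is modeled here as an object for simplicity):

```scala
package greetings

class GreetingsTest extends munit.FunSuite {
  test("should greet by name") {
    assertEquals(Greetings.greet("Bob"), "Hello, Bob.")
  }
}

object Greetings {
  // The parameter is forced by the signature, but the answer is still hard-coded.
  def greet(name: String): String = "Hello, Bob."
}
```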
Generic greeting
The second case consists in handling the situation where no name is passed, so the greeting should be a generic one.
The first thing we do is observe that the test fails due to the fact that greet expects a parameter that we’re not passing. This indicates that it should be optional.
Our first impulse would be to correct that and allow an optional parameter to be passed. But, we have to take into account that if we do that the test will continue to fail.
Therefore, what we’re going to do, is to discard this last test momentarily, and refactor our current code while we keep the first test passing.
Use the parameter
We deactivate the test:
And we do the refactoring. In Scala it’s possible to set default values, eliminating the need to pass a parameter.
The only thing left would be to make effective use of the parameter, this time, through an interpolation.
Back to the generic greeting
We turn the second test on again to be able to implement requirement number two, which consists in a generic greeting when no names are passed.
The test won’t pass, but the necessary change to make it do so is very simple:
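One way the function could look after this change, combining the default value with the interpolation introduced earlier (a sketch; the concrete default value is the assumption here):

```scala
object Greetings {
  def greet(name: String = "my friend"): String = s"Hello, $name."
}
```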
It’s very important to take note of this detail. We’ve made a very small change, but in order for it to be small, we performed a refactoring while protecting ourselves with the previous test. It’s very common to go and try to do that refactoring while the new test isn’t passing, but that’s bad practice because that way we can never be sure about what we’re doing and why it could be failing.
Answering with a yell
This third test introduces the new requirement of responding in a different manner to those names expressed in all capitals:
We make sure that the test fails for the right reason before starting to write the production code. This is one possible approach:
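A sketch of one such approach: we detect the shout by comparing the name with its upper-cased version.

```scala
object Greetings {
  def greet(name: String = "my friend"): String =
    if (name.toUpperCase == name) s"HELLO, $name!"
    else s"Hello, $name."
}
```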
Having arrived at this point, let’s see what refactoring opportunities we have. This leads us to this very simple solution:
For the time-being there’s not much else that we can do with the information that we have, so let’s move on and examine the next requisite.
Be able to greet two people
Requisite number four asks us to handle two names, which changes the greeting string slightly. Of course, it provides us with an example with which to design a test.
It’s possible that, while you’re writing the test, the IDE itself warns you that it’s not right to pass two arguments when the function’s signature only allows one, which on top of that is optional. In any case, the execution of the test will fail due to a compilation error.
As we’ve already seen on other occasions, the best way to tackle this is to go back to the previous test and do a refactoring with which to prevent the problem. So, we temporarily cancel the test that we’ve just introduced.
Getting ready for several names
And we refactor towards an implementation that allows us to introduce two parameters. The easiest way to do so is to use splat parameters. However, this will force us to change the algorithm, as the parameters will be presented as a Seq of Strings object. On top of that, we change the name of the parameter.
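A naive sketch of that reimplementation (the parameter name people is an illustration):

```scala
object Greetings {
  def greet(people: String*): String =
    if (people.isEmpty) "Hello, my friend."
    else {
      // For now we only look at one of the names.
      val name = people.last
      if (name.toUpperCase == name) s"HELLO, $name!"
      else s"Hello, $name."
    }
}
```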
This is a naive reimplementation, it’s sufficient to let us pass the test, but it could be further developed to better match the style of the language. One of the best things about TDD is precisely this, a great facility to sketch out functional implementations, that might be rough but help us reflect on the problem and experiment with alternative solutions.
To improve it a bit, first we’re going to extract the if condition to a nested function, which is not only more expressive, but also easier to reuse if needed:
The question now is, should we reintroduce the fourth test, or should we keep on refactoring to support the changes that we need?
A refactoring before proceeding
The last piece of refactoring has allowed us to support lists of names, but we’d need to change the approach to be able to handle lists of shouted names.
Up until now, we don’t decide if we need to yell until we’re building the greeting. However, it’s possible that we’re interested in separating the names first by whether they have to be shouted or not.
So, what we do is split the name list in two -yelled and regular names- and adapt the rest of the code to fit this.
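A sketch of that separation using partition, with the shouting check as a nested function (names are illustrative):

```scala
object Greetings {
  def greet(people: String*): String = {
    def isShouting(name: String): Boolean = name.toUpperCase == name

    if (people.isEmpty) "Hello, my friend."
    else {
      val (shout, normal) = people.partition(isShouting)
      if (normal.nonEmpty) s"Hello, ${normal.last}."
      else s"HELLO, ${shout.last}!"
    }
  }
}
```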
With this, we should be better prepared to tackle the fourth test, so we reactivate it.
Reintroducing a test
When we turn the fourth test back on again, what we could have predicted happens: the greeting will be directed towards just one person, which will be precisely the last of the two.
The result is:
That is, the test fails for the correct reason, indicating that we have to introduce a new change in the code to process and concatenate the list of names. Thanks to the previous refactorings, it’s easy to introduce:
It’s important to note that, at this point, we’re not trying to get ahead of the next requirements; we’re just solving the current problem. Only when we introduce the next test and learn new things about the behavior that we have to implement will we consider going back and refactoring the previous changes as needed.
Handle an indeterminate number of names
The fifth requirement consists in handling an arbitrary number of names, with a small change in the greeting format. We introduce a new test to specify it:
The result of the test is:
We can start from the next change:
This breaks the previous test and doesn’t pass the new one, which tells us that the last element of the list requires special treatment:
Let’s do this literally, that is: let’s separate the last element:
However, this change makes the last test pass while breaking the first one and the previous one. The problem is that the normal greeting and the two-people greeting can’t follow the same pattern. We’re robbing Peter to pay Paul.
Since we’re starting to break tests that were already green, it’s best to go back to the point in the code at which the four previous tests were passing.
What this back and forth tells us is that there are two kinds of cases that require different treatment:
Lists of two or fewer names.
Lists of more than two names.
It’s simplest to acknowledge and embrace this in the code itself:
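A rough sketch of that acknowledgement, treating lists of two or fewer names and longer lists as separate cases:

```scala
object Greetings {
  def greet(people: String*): String = {
    def isShouting(name: String): Boolean = name.toUpperCase == name

    if (people.isEmpty) "Hello, my friend."
    else {
      val (shout, normal) = people.partition(isShouting)
      if (normal.isEmpty) s"HELLO, ${shout.last}!"
      else if (normal.size <= 2) s"Hello, ${normal.mkString(" and ")}."
      else s"Hello, ${normal.init.mkString(", ")}, and ${normal.last}."
    }
  }
}
```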
Again, a rough and naive implementation lets us pass all of the tests by means of a mechanism as simple as postponing generalization. It’s now, after having achieved the desired behavior, that we can start to analyze the problem and search for a more general algorithm.
As we want to focus on the part of the algorithm that concatenates the names inside the greeting, we’ll first do the following refactoring, extracting the target block of code to an inner function:
The most interesting part is to have specifically isolated the name concatenation. Let’s do a couple more changes. Right now we’re directly acting on the normal sequence that’s in the greet function’s namespace, and therefore, is global within the inner concatenate function:
After having made sure that the tests keep passing, let’s make the different cases that they handle explicit. Right now, the idea of “only one name” is covered implicitly by the two-name case. Our objective here is to better understand the regularities in the three situations:
Let’s take a small step further in the case of the two names:
In Scala this can be expressed more succinctly using match... case:
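As an illustration, the concatenation could be expressed with pattern matching like this (the helper name is an assumption):

```scala
def concatenate(names: Seq[String]): String = names match {
  case Seq()            => ""
  case Seq(only)        => only
  case Seq(first, last) => s"$first and $last"
  case _                => s"${names.init.mkString(", ")}, and ${names.last}"
}
```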
And a little bit more:
Shout to the shouters, but only to them
In the previous test we’ve tackled the problem of generalizing the algorithm for any number of names and make it more expressive without breaking the achieved functionality. It’s time to introduce a new requirement through a new test:
This test fails, as we could expect. It’s interesting to note that we were already prepared for this case, since we were already treating the “screaming” greetings separately. According to what we deduce from the example, we can treat them the same way as the “non-screaming” ones, taking into account that the two cases may appear simultaneously. After a couple of tries, we arrive at this:
Separate names that contain commas
The next requirement asks us to separate the names that contain commas. To put it another way, it’s basically like allowing the user to pass the names as an indeterminate number of strings, as well as in the shape of one long string that contains several names. This doesn’t actually alter the way in which we generate the greeting, but rather the way in which we prepare the input data.
Therefore, it’s time to add a test to exemplify this new requisite:
We run the test to check that it doesn’t pass, and we wonder about how to solve this new case.
In principle, we could go through the list of people and split each of them at the commas. As this will generate a collection of collections, we then flatten it. There are methods in Scala to do both things:
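For example, flatMap combines both steps; the trim takes care of the space that follows each comma (a sketch of the preparation step inside greet):

```scala
// Split names that contain commas and flatten the result into a single list.
val expandedPeople = people.flatMap(_.split(",")).map(_.trim)
```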
And here’s the test, that passes without a problem.
Once we’ve checked that the solution works, we refactor the code a little:
Escaping commas
The eighth requisite is to allow the previous behavior to be avoided if the input text is escaped. Let’s see the case in the shape of a test:
Again, this affects the preparation of the data before the assembly of the greeting. The solution that comes to mind is to first detect the situation in which the string comes escaped, and then replace the comma with an arbitrary character before doing the split. Once done, we restore the original comma.
In this case, we achieve it with a regular expression, replacing the comma with the # symbol and restoring it afterwards.
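A sketch of that idea; the # placeholder and the concrete regular expression are illustrative choices:

```scala
// Remove the escaping quotes and protect their commas with a placeholder.
def protectEscapedCommas(person: String): String =
  "^\"(.*)\"$".r.replaceAllIn(person, m => m.group(1).replace(",", "#"))

val expandedPeople = people
  .map(protectEscapedCommas)
  .flatMap(_.split(","))
  .map(_.trim)
  .map(_.replace("#", ","))   // restore the protected commas
```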
With this, we complete all of the requirements. We can do a small refactoring:
One of the things that stands out in this kata is that the functional approach allows us to achieve relatively large behavioral changes with comparatively small changes in the production code.
What have we learned in this kata
In this kata we’ve learned to postpone the generalization until we have more information about the algorithm that we’re developing
We’ve applied the techniques that we’ve learned in previous kata
We’ve checked that a strong type system can save us a few tests
Outside-in TDD
The outside-in TDD methodology tries to boost the communicative nature of object-oriented programming, stressing the messages between collaborating objects and paying attention to the design of the system.
To do this, it starts from the outside, creating an acceptance test that describes what is about to be developed, and establishing a double cycle in which we alternate between the acceptance and the unit levels. At the unit level we design the collaboration between objects, deciding how responsibilities are attributed in each phase of the iteration. For this purpose, test doubles (mocks) are used, setting certain expectations on them.
The most renowned author about this approach is Sandro Mancuso, who introduces it in several publications and conferences1.
Outside-in TDD doesn’t contradict the classic approach, but introduces a methodology that’s more applicable to real software development and provides it with context, emphasizing the design necessities.
On the other hand, it’s possible to perform outside-in following the classic rules, seeking the design during the refactoring phases. It’s not usual to find examples of this. One of them is this one from Sandro Mancuso himself with the Rover kata2, although it’s not a comprehensive application.
TDD approaches
The Test Driven Development methodology is based on a relatively small set of rules or principles. But an aspect that isn’t explicitly defined is the way in which this can be applied to different development situations.
Thus, for example, the way in which we can direct the development of a class or a function through tests is very evident. A good deal of the kata in this book and, in general, of entry-level TDD kata, do exactly that. The problem arrives when we jump to the real world, a point at which many people fail to find benefits in introducing TDD into their development process.
The key issue here is that a user story doesn’t usually consist of developing a class and integrating it into the existing code; rather, what’s usual is to develop features that involve a set of components, including some kind of interface to the outside world (UI, API), as well as use cases, entities and domain services, among others.
This leads to a very simple question: where to start?
The different ways of answering this question can be reduced to three, not as far apart from each other as one might think. In fact, they’re not mutually exclusive.
Classic TDD or Detroit School
This approach goes by both names because it is, so to speak, the original TDD model proposed by the founders of the Extreme Programming paradigm (Kent Beck, Ward Cunningham, Ron Jeffries), born in the context of the Chrysler Comprehensive Compensation System project in Detroit.
Following this philosophy, a complex project would usually be approached by defining the necessary software units and creating each one of them through a standard TDD process.
Taking a very simplistic example: imagine that our task is designing an API endpoint.
This would mean creating, at least, a controller, a use case, one or two entities, and their corresponding repositories.
In this classic TDD approach, once the necessary components have been determined, we would begin creating them in dependency order, starting from the domain entities and advancing outwards. That is to say, if in order to build one unit I need another unit, I will build the latter first. Since the dependencies point towards the domain, it makes sense to start solving the problem in the domain layer and advance towards the outermost layers.
Some of the features that characterize this model are:
It’s tested against the units’ public APIs, using black box testing. This implies that we don’t make assumptions in the test about the way the unit is implemented.
Special emphasis on the refactoring phase, in which the design is introduced. We must refactor as soon as we have green tests, as small as the opportunity may look.
Keeps the use of test doubles to a minimum, essentially limiting them to architectural boundaries.
Development goes from the inside and outwards. Prioritizes the identification and development of domain logic.
It focuses on the state and outcomes of objects and their methods.
This approach provides TDD’s expected benefits:
Work in small and manageable increments.
Generate a safety net with many regression tests.
Possibility of refactoring the implementation with great safety.
As for drawbacks, it should be noted:
Tests don’t really help drive the design, but rather the implementation of the units. The design is done during the refactoring phase and can lead to the extraction of unit collaborators that are tested through the unit’s public interface.
We run the risk of creating software units that are too large, something that can be addressed by applying refactoring intensively, especially by extracting private methods and collaborators when possible.
Also, we run the risk of creating unnecessary functionality in the innermost units by not being clear about the requirements of the components that depend on them. This somewhat contradicts the interface segregation principle, which precisely promotes defining interfaces according to their consumers’ needs.
Problems may arise when integrating the components.
Outside-in, London School or mockist
Its origin also lies within the extreme programming community, but in this case, the London one. It owes its name to the fact that it favors a methodology based on starting from the needs of the consumers of a system.
In general, the outside-in methodology states that a complex project would be approached by defining its outermost interface and working inwards, discovering and defining the necessary units on the way with the help of doubles.
Some features that characterize this model are:
The interactions between the units are tested, also called white box testing. That is, the assertions verify the messages that some components send to others.
The refactoring phase is less important, and the design is done while tests are red.
Test doubles are heavily used: at each moment we have to decide which collaborators a unit handles, and doubles are created in order to discover and establish their interfaces. The real classes are implemented later, using a classic TDD process in which the dependencies are doubled first and implemented afterwards. For this reason it’s also known as Mockist TDD.
Development goes from the outside and inward, protected by an acceptance test.
It focuses on communication between objects, so it may even be considered more of an OOP approach, in Alan Kay’s original sense.
Benefits:
It provides us with a work approach that fits specially well in multidisciplinary teams and is more business-oriented.
Reduces or eliminates the final product’s integration problems.
Lowers the chance of writing unnecessary code, the interfaces are more compact.
Introduces consideration for design early on in the development process.
We pay more attention to interactions between objects. Having to use doubles first in order to design their interfaces helps us make them more concise and easy to manage.
Fits Behavior Driven Development very well.
Drawbacks:
The refactoring cost is higher because of its focus on interactions, and the tests tend to be more fragile due to their coupling to the implementation. However, we have to think that these interactions are necessary, and above all, they have been designed and decided by us, so they’re reasonably stable implementations.
Behavior Driven Development
It could be said that if we begin outside-in development from a more external step, we arrive at Behavior Driven Development.
In its main two schools, TDD is a methodology focused on the technical process of developing software. But BDD goes a step further by integrating business into development.
Schematically it’s still TDD. It begins with a test and the development is driven by new tests. The difference is that in BDD we ask ourselves about behaviors or features in which we’re interested, and we describe them in business language through examples. In fact, there’s a structured language with this very same purpose: Gherkin.
These descriptions are translated into acceptance tests, and development proceeds from there through a methodology that’s quite similar to outside-in, which in turn can use TDD’s classic approach when it’s time to implement the specific software units. All in all, the kind of unit tests that BDD favors tends toward a “specification through examples” style, as opposed to assertions.
In practice, BDD is outside-in TDD but taking the people that are interested in the software and their needs as a starting point, not the contracts or the implementation’s technical requirements.
There exist specific tools for this approach, the best known being Cucumber, in Ruby, which has ports for other languages. These tools are used to convert Gherkin documents into executable tests. But from this point on, we enter outside-in methodology.
So, what approach should we follow? And how do we learn TDD under the light of these approaches?
As mentioned at the beginning of the chapter, learning TDD through kata can be difficult to transfer to everyday practice in a real development problem. However, it’s a necessary learning before entering the outside-in approach, which is much more realistic in several respects.
Outside-in doesn’t exclude the classical approach, but rather puts it in context, while providing us with a test-driven design focus to which you can roughly apply the same principles of TDD: start with a test, write the minimal production code to make it pass, and refactor the solution if there’s an opportunity.
After all these are tools, and their point is to have them lying around nearby in order to use them when they come in handy. In real work, I’d say the important thing is to be able to mix styles conveniently. In a specific task we may start from a classic style, but after reaching a certain point we might introduce Mocks so as to not lose focus from a certain flow and be able to sort out the details later.
It’s harder to find kata in which to use an outside-in approach. In general they are longer and more complex, although it’s also possible to adapt some of the classic kata in order to practice it.
A TDD training plan could be structured as follows:
Introductory training with classic kata
Advanced training with kata in the form of agile-kata
In this outside-in development section, we will carry out a small project which consists in an API for a to-do list application.
Essentially, we want to implement the following functionalities:
US 1
As a User
I want to add tasks to a to-do list
So that I can organize my tasks
US 2
As a User
I want to see the tasks in my to-do list
So that I can know what I have to do next
US 3
As a User
I want to check a task when it is done
So that I can see my progress
Test examples
Write a test that fails (done)
Write Production code that makes the test pass
Refactor if there is opportunity
Endpoints, payloads and responses
For simplicity, the expected to-do list will be an array of strings with formatted task data.
Design
In order to develop outside-in, it’s necessary to do prior design to a certain extent. Of course, it’s not a matter of generating all of the components’ specifications down to the last detail, but rather of setting a general idea about the architecture model that we’re going to follow, and the large components that we expect to develop.
This will help us place the various elements and understand their relationships and dependencies. It provides us with a context about how the application cycle works and how its components are organized and communicated.
Layers
Our application will be organized in layers:
Domain: contains the domain entities, the heart of the application itself, in which the business concepts, processes and rules are represented.
Application: the application’s different use cases, representing its consumer’s intentions.
Infrastructure: the necessary concrete implementations for the application to work. In turn, this layer has various ports:
Entry points, such as the API, which contains the controllers that interact with the consumers. Console commands and similar mechanisms would also go here.
Persistence: the adapters for the persistence technology that we need in order to implement the repository.
Other adapters, if needed.
Vendor or lib: contains the third-party resources that the application needs to function.
The dependencies always point towards the domain.
Application flow
When doing an HTTP request to an endpoint, a controller gathers the necessary data and passes them to an instance of the corresponding use case. It collects the response, if there’s any, and transforms it to deliver it to the consumer.
The use case instantiates or claims the necessary domain entities from the repository, and uses the domain services to perform its task.
The use cases can take the shape of commands or queries. In the first case, they cause an effect on the system. In the second one, they return a response. To accommodate the response to the controller’s demand, they may use some kind of data transformer, so that the domain objects never reach the controller, but a representation instead. By means of a Strategy pattern, we can make the controller decide in which concrete representation it’s interested.
Architecture
We will build the application using the hexagonal architecture1 approach with a three-layer structure: domain, application and infrastructure, just as we’ve detailed above. The development will begin with an acceptance test, which acts as a consumer of the API, which will lead us to implementing the controllers in the first place.
This a generic schematic of the architecture type that we’ll have in mind when developing this application.
Mockist outside-in
Outside-in TDD, also known as mockist or London school, is a TDD approach that seeks to implement software features starting from an acceptance test and advancing towards the interior of the software.
Instead of designing the system in the refactoring phase, as the classic approach would, the outside-in approach does it during the red phase, that is, when the acceptance test is still failing. Development will end when the acceptance test finally passes. As the need to implement new components arises, they’re developed in a classic style.
Thus, for example, in the development of an API, first an acceptance test against the API would be written, as if it were another of its consumers. The next step would be to design and test the controller, then the use case, and then the services and entities handled by that use case, until reaching the application domain. In all cases we would mock the dependencies, so that we’d be testing the messages between the application’s objects.
The methodology to do this is based on two cycles:
Acceptance test cycle. It’s a test that describes the complete feature end to end, using real implementations of the system’s components, except for those that define its boundaries. At this level, the test failures serve as a guide to know what we have to develop next.
Unit test cycle. Once we have a failure in the acceptance test that tells us what to develop, we take a step towards the inside of the system and use unit tests to develop the corresponding component, mocking those collaborators or dependencies that it may need. When we’re finished, we return to the acceptance test cycle to find our next objective.
Development
This time we’ll develop the kata in PHP, using this repository since it comes with PHP and Symfony already installed, which provides us with an HTTP framework with which to start developing.
https://github.com/franiglesias/tb
We already have a basic test in the repository that we’ll use as a starting point:
Designing the acceptance test
We need an acceptance test that describes how the application has to work. We have an example for that. These are the tasks that we’re going to put in our list:
Therefore, the steps that the test has to execute are to annotate the three tasks, mark the first one as done, and be able to show us the list. These operations are:
For the sake of simplicity, the response will be a representation of each task in one line of text with the above format.
Starting at the end: what will the expected result be?
To start designing our test we begin at the end, that is, from the call to recover the task list that represents the result that we expect to achieve at the end of the process. From there, we will reproduce the previous steps that would have been needed to reach that state.
To reach this point, we would have needed to make one request to the API for each of the tasks, and one more to mark a task as completed. Thus, the complete test would look like this:
If we execute it we’ll start seeing errors about framework configuration problems. The first thing we have to do is get the test to fail for the right reason, which is none other than, when asking for the task list, receiving a $list response that’s not the one that we expect. Therefore, we’ll start by addressing these problems until we get the test to run.
Solving the necessary details in the framework
The first error tells us that there isn’t any controller in the location expected by the framework. In our case, on top of that, we want to build a solution with a clean architecture. Accordingly, the API controllers should be in the Infrastructure layer, so we’ll change the configuration in Symfony’s services.yaml so that it expects to find the controllers in another path. Specifically, I prefer to put them in:
src/Infrastructure/EntryPoint/Api/Controller
Therefore, services.yaml will look like this:
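Something along these lines, following Symfony’s standard service configuration (a sketch; the rest of the file stays as generated by the framework):

```yaml
services:
    _defaults:
        autowire: true
        autoconfigure: true

    App\:
        resource: '../src/'
        exclude: '../src/{DependencyInjection,Entity,Kernel.php}'

    App\Infrastructure\EntryPoint\Api\Controller\:
        resource: '../src/Infrastructure/EntryPoint/Api/Controller'
        tags: ['controller.service_arguments']
```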
If we run the test again, we will see that the error message has changed, which indicates that our intervention was on the right track. Now it tells us that there aren’t any controllers in the newly defined location, so we’ll create a TodoListController in the path: \App\Infrastructure\EntryPoint\Api\Controller\TodoListController.
And for now, we leave it like this. We run the test to see what it says. We get two kinds of messages. On the one hand, several exceptions that indicate that the endpoint routes can’t be found, since we haven’t defined them yet.
On the other hand, the test tells us that the call to the endpoint returns null and, therefore, we don’t have the task list yet.
So, before anything else, we need our controller to be able to handle these routes. The first route that it cannot find is POST /api/todo, which we will use to add new tasks to the list. To solve this, we introduce an entry in the routes.yaml file.
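For example (the route name is arbitrary; the controller method is the one we expect to implement next):

```yaml
api_add_task:
    path: /api/todo
    controller: App\Infrastructure\EntryPoint\Api\Controller\TodoListController::addTask
    methods: [POST]
```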
Once this route has been added, we run the acceptance test again. The appropriate thing is to run the test after each change to confirm that it fails for the expected reason. In this case, we expect it to tell us that we don’t have an addTask method in TodoListController, and that we have to add it in order to advance.
As you can see, in the method I throw an exception that will allow me to see when the real controller is being called. This way, I will be sure about whether it is what I have to implement next. I’ve gotten this technique from Sandro Mancuso in his Outside-in video, and I think it’s really useful. In some occasions, the compiler or interpreter could point this lack of implementation itself, but doing it explicitly will make things easier for us.
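A sketch of that placeholder; the exception message is just a marker that points at the next thing to implement:

```php
namespace App\Infrastructure\EntryPoint\Api\Controller;

use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

class TodoListController
{
    public function addTask(Request $request): Response
    {
        throw new \Exception('Implement TodoListController::addTask');
    }
}
```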
When re-running the test, the first error literally tells us that we have to implement addTask.
And this leads us to the unit test cycle.
First unit test
The first unit test takes us a step further towards the interior of the application. The acceptance test exercises the code from outside of the application, while the controller is located in the Infrastructure layer. What we are going to do is develop the controller with a unit test but, instead of using the classic approach, which consists in implementing a solution and then using the refactoring stage to design the components, we’ll start from this latter point.
That is, what we want to do is to design which components we want the controller to use in order to return a response, mock them in the test, implementing only the controller’s own code.
In this example, I will assume that each controller invokes a use case in the application layer. So that it’s more easily understood, I won’t be using a command bus as I would in a real application, but I’ll invoke the use cases directly instead.
This is my first unit test:
On the one hand, in the test we simulate a request with a JSON payload, which will be the one that provides us with the necessary data. With the AddTaskHandler mock we simply verify that its execute method is called, passing it -as a parameter- the description of the task provided in the endpoint call.
Thanks to the use of mocks we don’t need to worry about what’s happening further inside the application. What we are testing is the way in which the controller obtains the relevant data and passes them to the use case so it does whatever it must. If there isn’t any problem, the controller will return a 201 response, indicating that the resource has been created. We won’t deal with all of the possible errors that could occur in this example, but you can get an idea of how they could be handled.
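A sketch of what such a test could look like (the payload key, the task description and the exact assertions are assumptions):

```php
use App\Application\AddTaskHandler;
use App\Infrastructure\EntryPoint\Api\Controller\TodoListController;
use PHPUnit\Framework\TestCase;
use Symfony\Component\HttpFoundation\Request;

class TodoListControllerTest extends TestCase
{
    public function testShouldAddATask(): void
    {
        $addTaskHandler = $this->createMock(AddTaskHandler::class);
        $addTaskHandler
            ->expects(self::once())
            ->method('execute')
            ->with('Write a test that fails');

        $request = new Request(
            [], [], [], [], [], [],
            json_encode(['task' => 'Write a test that fails'])
        );

        $controller = new TodoListController($addTaskHandler);
        $response = $controller->addTask($request);

        self::assertEquals(201, $response->getStatusCode());
    }
}
```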
Now we run the TodoListController test to ensure that it fails for the expected reasons: that AddTaskHandler is not called and that the HTTP 201 code is not returned.
In this case, the first error is that we don’t have an AddTaskHandler class which to mock, so we create it. We’ll put it in App\Application.
We run the test again, which will indicate that there isn’t any execute method that can be mocked. We add it, but we make it throw an exception telling us that it’s not implemented. We’ll see the usefulness of this in a while, because it’s not actually going to be executed in this test.
Instead, if everything has gone well, at this point the test will ask us to implement the controller’s addTask method, which is the step we were trying to reach.
This code makes the test pass. Given that it’s relatively simple, we won’t do it in very small steps in order to move with the explanation faster.
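A possible implementation (the payload key is, again, an assumption):

```php
namespace App\Infrastructure\EntryPoint\Api\Controller;

use App\Application\AddTaskHandler;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

class TodoListController
{
    private AddTaskHandler $addTaskHandler;

    public function __construct(AddTaskHandler $addTaskHandler)
    {
        $this->addTaskHandler = $addTaskHandler;
    }

    public function addTask(Request $request): Response
    {
        $payload = json_decode($request->getContent(), true);

        $this->addTaskHandler->execute($payload['task']);

        return new JsonResponse('', Response::HTTP_CREATED);
    }
}
```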
We’re going to take advantage of the fact that the test is green in order to refactor it a bit. We know that we’re going to have to add more tests in this TestCase and instantiate the controller several times, so we’re going to make our life easier for the near future. After making sure it’s still passing, the test looks like this:
It’s time to run the acceptance test again.
Back to the acceptance cycle
Now that the TodoListController test is passing, we no longer have any work to perform in this level, so we go back to the acceptance test to check whether anything is still failing, and what is it that fails.
At this point, what it tells us is that AddTaskHandler::execute is not implemented. Remember the exception we added earlier? Well, that tells us that we have to move one level deeper and get to the Application layer to develop the use case. Of course, with a unit test.
Like we said earlier, in outside-in we design during the red test phase and mock the components that the current unit can use as collaborators. We normally wouldn’t make entity doubles. In this case, what we expect of the use case is:
That it creates a new task, modeled as a domain entity Task.
That it makes it persist in a repository.
The task has to get an id, which will be provided by the repository.
This indicates that the use case will have one dependency, the TaskRepository repository, and that we’ll start modeling the tasks with a Task entity. This is the test.
We execute it, and it will tell us what to do.
The first thing will be to create TaskRepository so that we can mock it. In this case, the repository is defined as an interface in the domain layer, as we already know. So we start by doing that.
The next thing will be the Task entity, which is also in the domain.
For now I limit myself to creating the basics; we’ll see what the development asks of us.
The next error indicates that we don’t have a nextId method in TaskRepository, so we introduce it in the interface.
We’re also missing a store method. Same thing:
Lastly, when invoking the execute method, it throws the well-known exception telling us that it’s lacking code, which indicates that we’ve already prepared everything we needed up to this point. So, let’s finally implement it.
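A sketch of the use case, following the collaboration described above (ask the repository for an id, create the Task, store it); the namespaces are illustrative:

```php
namespace App\Application;

use App\Domain\Task;
use App\Domain\TaskRepository;

class AddTaskHandler
{
    private TaskRepository $taskRepository;

    public function __construct(TaskRepository $taskRepository)
    {
        $this->taskRepository = $taskRepository;
    }

    public function execute(string $taskDescription): void
    {
        $id = $this->taskRepository->nextId();

        $task = new Task($id, $taskDescription);

        $this->taskRepository->store($task);
    }
}
```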
With this code the test passes. We don’t have anything else to do here, except to see if there’s anything that we can refactor. In the test we see some details that can be improved to make everything easier to understand:
Let’s go back to the acceptance test and see what happens.
New visit to the acceptance test
When we run the acceptance test again, it indicates that, although we have an interface for TaskRepository, we haven’t defined any concrete implementation, so the test can’t be executed. It’s time to develop one.
Taking into account that we’re creating a REST API, we need the tasks that we store to persist between calls, so in principle, an in-memory repository won’t work for us. In our case we’ll use a vendor class, which is located in the repository that we’re using as a base for this development: the FileStorageEngine class. It simply saves the objects to a file, simulating a real database with enough persistence to run the test.
So, let’s write unit tests to develop a task repository that uses FileStorageEngine.
Executing the test tells us that we don’t have a FileTaskRepository, so we start building it. Each time it fails, the test will tell us what we have to do next. And this is the result:
Again, we have skipped some baby steps to reach the desired implementation. Once the test passes, we return to the acceptance test.
The test now tells us that we’re missing the implementation of the nextId method in FileTaskRepository. So we come back to the unit test.
In principle, what we’re going to do is simply return the number of saved tasks -plus one- as a new id. This won’t work properly in the event that we end up deleting tasks, but it will suffice for now. This is the test:
And this is the implementation:
It would be necessary to add a few more cases to verify it, but we leave it as is in order to move faster now.
Finishing the first user story
If we run the acceptance test now, we’ll see that the error that shows up says that we don’t have a route for the endpoint in which we mark a task as completed. This means that the first of our User Stories is finished: tasks can now be added to the list.
We’ve gone from the exterior of the application to the details of the implementation, and every step was already covered by tests. The truth is that we’ve been able to get a lot of work done, but there’s still a long way to go.
And the first step should sound familiar to us. We have to define the route to the endpoint, the controller, a new use case, and the interaction with the task repository. To routes.yaml we add the route:
We add a method to TodoListController:
When we add this code and execute the acceptance test, the error message asks us to implement the new method. So we go to TodoListControllerTest and add the following test:
This test will fail because we haven’t defined MarkTaskCompletedHandler yet, so we will be running the test and solving the different errors until it fails for the right reasons and, after that, we’ll implement whatever it needs to pass.
Once we’ve added the basic code of the use case we can start implementing the controller, which will look like this:
And with this we make TodoListControllerTest pass. It’s time to run the acceptance test once again so that it tells us what we need to do now.
And basically, what it says is that we must implement MarkTaskCompletedHandler, which doesn’t have any code yet. For that purpose we will need a unit test.
The use case will depend on the repository to obtain and update the desired task. That will be what we mock.
As a somewhat striking detail, I should note that we’re going to mock an entity. This is necessary to be able to test that something that interests us happens: that we call its markCompleted method. This will force us to implement it. I would usually avoid mocking entities.
When we run the test it asks us for a retrieve method, which we don’t have in the repository yet.
As well as markCompleted in Task:
Finally, we have to implement the execute method of the use case, which will look like this:
And we’re done here for now.
We’ll run the acceptance test again. Let’s see what it tells us.
The first thing that it indicates us is that we don’t have the retrieve method in the FileTaskRepository repository. We have to implement it in order to continue. To do this, we’ll use the same FileTaskRepositoryTestCase that we had already started.
It will ask us to implement retrieve. This would be enough:
And it does suffice. Now that we’re in green, we can take the opportunity to fix the test up a little bit.
Once this has been done, we can run the acceptance test again and see how far we’ve come.
When we do this, the exception that we had left in Task::markCompleted is thrown. For now we’ll implement it without doing anything else. We’ll wait until other tests force us to do it, since we don’t actually have any way of verifying it without specifically creating a method to check its status in a test.
This allows the test to reach the next interesting point: we don’t have a route to recover the task list. In routes.yaml we add the definition:
We run the acceptance test to see that it’s no longer asking for the route, but rather for the implementation of a controller. And we add a skeleton to TodoListController.
So we have to go back to TodoListControllerTestCase to develop this method:
The test will fail since we need to implement GetTasksListHandler.
When we are able to run the whole test, we start implementing. This is our attempt:
The problem here is that we have to introduce a way of converting the list -as the GetTaskListHandler returns it- to the format that the endpoint consumer requires. It’s a representation of the task in the form of a text string.
There are several ways of solving this, and all of them require Task to give us some kind of usable representation:
The simplest one would be to perform the conversion in the controller itself, going through the list and generating their representations. In order to do that we would need a method that took care of it.
Another one would be to create a service that does the conversion. It would be a dependency of the controller.
And a third alternative would be to use that same service, but passing it to GetTaskListHandler as a strategy. This way, the controller decides how it wants to get the list, although it’s GetTaskListHandler that prepares it.
This last option will be the one that we use. But to do that we’ll need to modify tests. Not a lot, fortunately, only TodoListControllerTest actually needs changing.
And the controller will end up like this:
And the use case will be this one:
And, for now, our formatter implementation will be like this:
We’re back to green, and in this case, as we’ll see, it means that we’ve already finished with TodoListController. Let’s see what the acceptance test has to say.
The acceptance test asks us to implement the use case. So we’ll have to create a new unit test.
Running the test shows us the need to implement a findAll method in the repository. Once this is solved, we have to implement the execute method of the use case:
This simple implementation takes us to green, and we can re-run the acceptance test. We’re very close to the end! But we have to add the findAll method to the specific repository. First the test:
A test that is swiftly solved with:
And we run the acceptance test once again to see where to go next. This time the test tells us that we have to implement the TaskListFormatter::format method. We are really two steps away, but we have to create a unit test.
At this point we could propose different designs that avoid dealing with presentation issues in a domain entity, but for simplicity we’ll make Task able to provide its text representation by adding an asString method.
It’s worth wondering whether it would be appropriate to use a double of Task here -something that we already did in another test- and wait until the acceptance test asks us to develop Task, or whether it would be preferable to just use the entity as is and let the test force us to introduce the necessary methods.
In practice, having reached this point, I think that it all comes down to the complexity that this may entail. In this exercise, the behavior of Task is pretty trivial, so we could just move forward with the entity without further complications. But if the behavior is complex, it might be better to slow down, work with the mock, and spend the necessary time afterwards.
So here we’ll use mocks for that as well.
We run the test to see it fail because we don’t have the asString method in Task. So we introduce it. Notice that we haven’t implemented markCompleted yet.
When re-running the test it soon complains about the format method not being implemented, so we get down to it:
And we’re already in green. Time to go back to the acceptance test loop.
Last steps
The acceptance test, as we might have expected, fails because Task::asString is not implemented. We had also left Task::markCompleted implemented but doing nothing. It could be a good idea to make it complain again, so that we can be sure that it’s being called and don’t forget to handle it.
And when running the acceptance test again, we see that it complains exactly about that, and that this is where we want to be now.
We have to move forward with the development of Task, using a unit test. As we don’t want to add methods, for now, we will verify the state of done through asString.
This test passes. So we have to go back to the acceptance test.
Now the test message has changed. It’s asking us to implement markCompleted in Task, but the test itself is failing because the responses don’t match. It expects this:
and it obtains this:
By now, the reason is obvious. There is nothing that’s implemented in Task that takes care of keeping the done state.
Let’s add one more case to the test:
Now we implement it:
With the test in green, we run the acceptance test again, and… Yes! The test passes without any more problems: we have finished the development of our application.
What have we learned in this kata
The mockist outside-in modality seems to contravene TDD rules. In spite of that, the whole process has been guided by what tests indicate.
The acceptance test will fail as long as everything that’s necessary to run the application has not been implemented.
We always move between the acceptance test loop and each of the unit tests that we use to develop the components.
Once the acceptance test passes the feature is complete, at least in the terms in which we have defined the test.
In unit tests we use mocks to define the public interface of each component according to the needs of its consumers, which helps us to maintain the principle of interface segregation.
Classic outside-in TDD
It’s possible to follow an outside-in methodology while keeping the classic TDD cycle. As you might already know, in this approach the design is applied during the refactoring phase, so once we’ve developed a rough version of the desired functionality, we start identifying responsibilities and extracting them to different objects with which to compose the system.
In the classic style kata that we’ve presented in the second part of the book we haven’t reached this stage of extraction to collaborators, although we have suggested it several times, and it would be a perfectly feasible thing to do. In fact, it’s a recommended exercise.
However, when we talk about outside-in, it’s frequent that we rather think about more complex projects than the simple problems proposed in the kata. That is to say, the development of a real-world software product as seen by its consumers.
Our to-do list application backend example would fit this category. In the previous chapter we developed the project using the mockist approach, whose main feature is that we start from an acceptance test and then move into each application component, which we develop with the help of a unit test, mocking the innermost components that we haven’t developed yet.
In classic TDD, it’s usual to make an up-front design to get a rough idea about the necessary components, each of which is then developed and integrated later.
But classic outside-in is a little bit different. We would also start with a test at the acceptance level, with the goal of writing the logic that makes it pass. In the refactoring phases, we would start extracting objects capable of handling the various identified responsibilities.
For this example we will write a new version of our to-do list application, this time in Ruby. The HTTP framework will be Sinatra, and the testing framework RSpec.
Posing the problem
Our starting point will also be an acceptance test as consumers of the API. In a way, we could consider the system as one big object with which we communicate via requests to its endpoints.
It being classic TDD, we won’t be using mocks unless we need to define an architecture boundary. Obviously, in order to define these kinds of things, we need to have some minimum amount of up-front design, so we expect that at some point we’ll have use cases, domain entities, and repositories.
In our example, the architecture boundary will be the repository. As we won’t define the specific persistence technology yet, we will mock it when the time comes. Then we’ll see how we can develop an implementation.
Kicking off development
My first test proposal is the following:
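Something along these lines (a sketch; the spec file location is illustrative):

```ruby
# spec/todo_list_app_spec.rb
require 'rspec'

RSpec.describe TodoListApp do
  it 'is instantiated' do
    expect(TodoListApp.new).not_to be_nil
  end
end
```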
This test tries to instantiate a TodoListApp object, which is the class in which we will define the Sinatra application that will respond in the first instance. It requires installing rspec if we don’t already have it, and it will fail with this error:
Which tells us that the class isn’t defined anywhere. To make it pass, I will introduce the class in the same file as the test, and when I manage to turn it green, I’ll move it to its proper location.
This is enough to pass the test, so I will make the most obvious refactoring, which is to move TodoListApp to a more adequate place in the project.
The refactoring phase is the stage in which we make design decisions within the classic approach. The controllers belong to the infrastructure layer, so it will be there where I place this class. With that, the test looks like this:
And we verify that it still passes.
For the next point I need to take a bit of a longer leap and prepare the client that will execute the requests against the endpoints. Using rack-test, I can create an API client. Since I’m green, I will introduce it and start it. We’ll have to install rack-test first.
This refactoring doesn’t change the test result, so we’re doing pretty fine.
Now we’re going to make sure that we can make a POST /api/todo call, and that someone answers.
Now the test fails because the application is not able to route the call to any method. It’s time to work on the implementation in TodoListApp until we manage to make the test pass. This will require introducing and installing sinatra.
The truth is that this is enough to pass the test, since we don’t have any expectation about the answer. We need a bit more resolution to force us to implement an action associated to the endpoint, so we change the test to be more precise and explicit.
And this test, which is already a real test, shows us that the desired route isn’t found:
With which we can already implement an action that responds.
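A minimal sketch of the Sinatra application at this point (the fixed 201 status is an assumption about what the test checks):

```ruby
require 'sinatra/base'

class TodoListApp < Sinatra::Base
  post '/api/todo' do
    # Fixed response, just enough to satisfy the test.
    status 201
  end
end
```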
Now we’ve made the test pass, returning a fixed response, and we now have the assurance that our application is answering to the endpoint. It would be time to introduce the call with its payload, which will be the description of the new task.
The test doesn’t add any new information. If we want to move forward with the development we’ll have to introduce another test that questions the current implementation, forcing us to make a change in the direction of achieving whatever the test is expected to do.
This endpoint is used to create tasks and save them to the list, which means that an effect (a side effect) is produced in the system. It’s a command and doesn’t offer any response. In order to test it, we have to check the effect by verifying that there’s a created task somewhere.
One possibility is to assume that the task will persist in a TaskRepository, which would be a TodoListApp collaborator. Repositories are objects at the architecture boundaries and they are based on a specific technology. This assumes a certain level of prior design, but I think that it’s an acceptable compromise within the classic approach.
This implies modifying the way in which TodoListApp is instantiated so we can pass collaborators to it. Therefore, before anything else, we’re going to refactor the test so that the creation of new examples is easier and the test becomes more expressive.
It would end up looking like this:
After this redesign the test keeps passing. Now, we have to introduce a double of the repository. The minimum that we need to force ourselves to create something is:
With which we’d have to introduce the definition of the class. For now, we’ll do it in the same file.
And we pass it to TodoListApp as a construction parameter.
In principle, these changes don’t affect the test result. So, let’s move TaskRepository to where it belongs, the domain layer.
Then, we need to define the effect that we expect to obtain, which we do by setting an expectation about the message that we’re going to send to the task repository.
The test initially fails because we have introduced Task, so we add it now to its place in the domain layer: we’ll need it soon. By doing so, we get the test to fail for the right reason.
By adding this code to TodoListApp we get the test to pass.
Now we need a new test to ask us to implement the instantiation of a Task with the desired values. That is, we want Task to be initialized with the id 1 and our specified description. In order for the test to work, we have to implement an initializer in Task, which we don’t have yet, and some way to compare Task objects.
On the other hand, we have to implement a way of initializing Task. This creation may be covered by the acceptance test itself. Another way to do it would be to develop Task with a unit test, but to be honest I don’t think it’s necessary at the time.
When we insert this in the test:
It will start to fail, so we have to implement the initialization.
Now the test fails because we weren’t initializing Task properly in TodoListApp, as we weren’t passing it any arguments. With this small change, the test starts passing.
We could say that we’re using constants here in order to satisfy the test, so we have to evolve the code and obtain a more flexible implementation. I’ll start with a small refactor that reveals what we have to achieve next.
It’s that simple, we have to obtain values for the variables that we’ve just introduced. But right now we aren’t checking. It’s time to introduce a matcher.
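A sketch of what such a custom rspec matcher could look like; the matcher name and the id/description readers are illustrative, and assume the attr_reader that we add right afterwards:

```ruby
RSpec::Matchers.define :be_task_with do |id, description|
  match do |task|
    task.id == id && task.description == description
  end
end
```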
To use it, we’ll change the test:
At this moment the test won’t pass because Task doesn’t expose any methods that allow us to access its attributes, so we’ll add attr_reader:
And with this, the test passes.
task_description comes in the request payload. Since it’s already defined in the test, we could simply use it right now.
As for the task id, we’ll need an identity generator. In our design, we have placed this responsibility on TaskRepository, which would have a next_id method. In this case, we’ll have to specify it in the test by using a stub.
With the production code just as it is now, the test passes, so it doesn’t tell us what we would have to do next. So, I’m going to cheat a little and force a test failure:
Now, introducing the call to next_id finally makes sense:
Extraction of the use case
Now the test is passing and we could say that the endpoint implementation is complete. However, we face many problems:
TaskRepository is a mock. We know which interface it should have, but we don’t have any concrete implementation that can work in production.
There’s a lot of business logic in the controller that shouldn’t be there.
In fact, we have domain objects in the controller: Task and TaskRepository.
Summarizing, right now the controller is doing more things than it should. On top of its job as a controller, which is handling the requests that arrive from the outside, it’s performing tasks that belong to the application layer, coordinating domain objects.
Therefore, we would have to extract this part of the implementation to a new object, which will be the use case AddTaskHandler.
The first thing that I do is extract the functionality to a private method.
I will create an AddTaskHandler class in the application layer that encapsulates the same functionality:
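A rough sketch of how that use case could end up, assuming the messages used so far (next_id and store on the repository, and a Task built from an id and a description):

```ruby
class AddTaskHandler
  def initialize(task_repository)
    @task_repository = task_repository
  end

  def execute(task_description)
    task = Task.new(@task_repository.next_id, task_description)
    @task_repository.store(task)
  end
end
```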
And I replace the method implementation with a call:
I inline the method:
And I refactor the solution a bit, moving the initialization to the constructor and removing some temporal variables:
The next step is to inject the AddTaskHandler dependency in place of the repository one. To do that, I first change the test:
This will cause the test to fail because the production code is still expecting the repository as a dependency, so we change it in the following way:
And we may now consider this part solved.
Implementing a repository
To kick off the design we have started with a mock TaskRepository. We have introduced an empty class to be able to double it, but this real version can’t even receive messages. This is a liberty I’ve taken in order to avoid having to start developing from the inside, creating domain layer components -like this repository- before knowing how they were going to be used.
The repository is one of those objects that live in the architecture boundary, so to speak, so using a double is acceptable enough. However, now we’re going to try to implement a version that can be used for testing.
This poses a little problem, since we consider TaskRepository to be a domain object and we don’t want to have concrete implementations in that layer. A simple way to solve it is by using composition: we would have a TaskRepository class in the domain that simply delegates to the concrete implementation that we inject into it. This is the approach that we’re going to adopt in this case, implementing the versions of the repository that may be needed and extracting the concrete implementations from a generic one, guided by a unit test.
This time, we start with the repository’s ability to respond to a next_id message, which should return one when the repository is empty.
This method doesn’t exist yet and the test will fail. We implement a first version.
With a green test, we’re going to perform a refactoring. next_id should provide us with a number: the result of adding one to the amount of stored tasks. So we’re going to represent this using code first.
It would be nice to be able to add elements and check if things are really working, so we’re going to allow the repository to be initialized with some contents.
With this, we can test that if we initialize the repository with some element, it returns the correct identifier. For example, like this:
This should be enough for us to trust next_id. You might be thinking that generating identities using this algorithm isn’t precisely robust, but it’s sufficient and satisfies our example for now. In any case, we could implement any other strategy.
Now we could use next_id as an indirect way of knowing if we’ve added tasks to the repository, so we can already test the store method.
For now, the test fails because we don’t have a method that handles the store message, so we add it and implement the simplest solution:
Which, moreover, is enough to get the test to pass. The last test overlaps the previous next_id test, so we’re going to remove it.
And we can also remove the initialization, since we don’t really need it.
We could make sure that we’re able to introduce more tasks:
Since we want to separate the specific persistence technology, I will use these tests to extract an in-memory repository. It ends up looking like this:
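One possible shape for this extraction, following the composition idea described earlier; the MemoryStorage name is illustrative:

```ruby
# Domain-level repository that delegates to an injected storage
class TaskRepository
  def initialize(storage)
    @storage = storage
  end

  def store(task)
    @storage.store(task)
  end

  def next_id
    @storage.next_id
  end
end

# In-memory implementation, useful for testing
class MemoryStorage
  def initialize
    @tasks = []
  end

  def store(task)
    @tasks << task
  end

  def next_id
    @tasks.count + 1
  end
end
```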
Now we can inject it, to do it we modify the test first:
And now that we only have one place to initialize the repository…
The test will fail, but now we only need to make this change:
With which we have a TaskRepository that we will be able to configure so it uses different persistence technologies, and that we could start using in our acceptance test.
A possible change is this one, although we’ll continue to evolve it later:
Obtaining the to-do list
Once we’re able to add tasks, it would be interesting to also be able to access them. Our next acceptance test would describe this action, introducing one or more tasks and obtaining a list with all that we have.
We run this test and see that it fails, since there isn’t any controller handling this route.
This time the error is that it doesn’t return anything. We can easily fix it with this constant implementation:
Of course, it would be best to recover the tasks from the repository and generate the response from there. To do it we’re going to change the test a bit, introducing an extra task and expecting a longer list as a result.
The test will fail as the generated and expected lists don’t match. To get it to pass we would need to inject the repository again, so that we can recover the saved tasks.
For now we can do it in the test, but first we would have to cancel this second test to go back to green, and then make the changes that we needed. This is the test that would remain:
The production code:
Now we run into a couple of problems:
We don’t have a method in the repository to obtain the tasks
We have to handle the transformation of Task into its representation
Personally, I’m interested in tackling the latter first. While we’re still returning a hard-coded answer, I can start with the transformation of the Task object, and then I’ll resume the development of TaskRepository.
In fact, this makes sense as a refactoring in the current situation, while the test is still green. So we get to it:
This solution is very simple in Ruby and lets us pass the test.
For the next step we’ll need to implement the find_all method in the repository, so we have to change focus and move to its test. For now, we start with a simple test:
To make it pass we need:
And as it’s not implemented in memory_storage, we add it to it:
This passes the test. We could add some tests here to verify that the stored tasks are, in fact, the ones that we have saved. After tinkering a bit:
With which we’d have everything we need in the repository. Therefore, we can introduce its use in the production code after recovering the test:
Similarly to how we did in the previous story, now would be the moment to extract the business logic that the controller contains to a use case. We have to remember that the rule is to let the controller be the one who decides which representation it needs.
We’ll follow the same procedure as before, extracting a private method with the functionality that we’re going to move to the use case. Here we’ve taken quite a long leap of code, implementing the transformation strategy by using a block.
It’s now when we create the use case:
And we use it within the code.
With these changes the test passes. Executing the use case doesn’t have any effect on the test, so we’re going to move the code with the following steps:
First, we copy the private method get_tasks_list in the execute of the use case:
We run the test to make sure that this change doesn’t carry any undesired effects. Now we remove the call to the private method and we try again:
With this we make sure that it’s the use case that is executing the action and, therefore, keeping the test passing.
Now it’s only a matter of deleting the private method.
And that’s it. The second user story is implemented. We still have a bit of refactoring left. We’re going to inject the use case that we’ve just created. Also, we’re still going to leave the TaskRepository dependency, as it’s foreseeable that we’ll need it again.
And we apply this in the test:
Ruby is pretty concise, but even so, I’m going to do some refactoring in the acceptance test extracting the API calls to methods:
Mark a task as completed
The last piece of functionality that we’re going to implement is to mark a task as completed. We have to perform the steps that we’ve been following until now:
Add an example to the acceptance test
Implement the functionality in the controller
Extract it to a use case
If we need to develop anything new in an object, like it happened with TaskRepository, we do it with a green acceptance test, so that we can later use it in the code without problems.
So let’s go. Let’s start with the acceptance test, which thanks to the previous refactorings should be easy to write. Here it is:
The main point of interest in this test is that we’re going to check that it has worked by recovering the list and seeing if the task is already being represented as marked. In many respects, we could consider that this test would be sufficient to validate all of the list functionality, since in order to reach the final result, all of the other actions -which we’ve developed with other tests- have to work.
So we’re going to start adding production code until we get the test to pass. Of course, the first problem is that there’s neither a route nor an associated controller.
With this first step we solve this problem, and the test failure now has to do with the content of the response.
This is the error:
This means that the completed task appears unmarked, which is exactly where we want to be.
A way of solving it is using this code:
And this code makes our current test pass. However, it makes the previous test fail -the one that retrieves all of the tasks- as in that test we assume that none of them are completed.
Of course, what we need is that a task can say that it’s completed. We need to add some behavior to Task, but also keep the previous acceptance tests passing. Therefore, we’re going to temporarily remove this test, revert this last change, and work to add the capacity of being marked as completed to Task.
For now, it’s enough to remove the last assertion, which is the one that controls the behavior change in Task:
And I also have to neutralize the change in the production code, temporarily:
Let’s see, then, how to mark completed tasks:
This suffices to introduce the property, initialize it as false, and expose a method to access it.
On the other hand, we need to be able to mark the task as completed:
Which is pretty easy to achieve:
For this part, we’ve got everything we need.
Now, we’re going to perform a refactoring in order to use some of these capabilities. With this refactoring we keep the current behavior and get ready to handle the important change:
So we recover the test:
Which fails for the desired reason. The fact that we’re interested in things failing for good reasons never stops being kind of funny:
Now is when we implement a tentative solution:
And this passes the test. Obviously we need to recover the task first so we can update it, but it’s something that we don’t have in our TaskRepository yet. But since all our tests are passing, we can add the functionality.
We implement it like this:
Together with:
Now we can use it in our implementation, replacing the direct assignment of task that we had until now.
And we’re almost done! The acceptance test is still passing. The only thing that remains is to introduce the use case, for which we follow the refactoring process that we already know well. First we extract the functionality to a private method.
We introduce the new class, which simply uses the same code that’s already tested.
And now, we introduce its use. Since this action is idempotent, we can do this in such a way that we make sure that it works before deleting the code that we’ve just moved.
And the test continues to pass as expected. So we can delete the previously extracted method. Later we’ll have to change the construction so as to inject the use case. But let’s do it bit by bit:
We’re going to direct the change in the construction from the test, starting the application with the services that it really needs.
The tests will fail disastrously, but the change is easy to apply. This is what the application will look like:
What have we learned in this kata
It’s perfectly possible to apply an outside-in approach with TDD’s classic methodology.
The classic outside-in methodology requires all tests to be green to introduce the design, because we do it in the refactoring phase.
At some moments we might need test doubles, although we’ll prefer to use fake or test-specific implementations (such as in-memory repositories), or, where appropriate, stubs before mocks.
TDD in real life
In this part, we’ll deal with how it’s possible to incorporate TDD in all of the development processes in real projects.
We’ll work in a project to create the backend of a simple to-do list application. The same that we’ve used for the outside-in example. But this time we’ll have a slightly different starting point, with the project organization based on user stories.
The second chapter of this part will show us how to work when we have to fix a bug, from the way of reproducing it, to the steps that we’ll have to follow in order to solve it.
The third chapter is about the implementation of new user stories in the system.
Task list, outside-in TDD sliced in user stories
In this version of the same exercise about creating an application using TDD, we’ll work with the project organized in user stories. That is: we have divided the project into features that provide value. Our goal is to show a work methodology that we could put into practice inside real projects.
This project will be done in PHP, with PHPUnit and some components from Symfony framework. The solution is a bit different from that in the previous chapter because this time we are going to limit the scope of our job to the user story, and that imposes some constraints that we didn’t have before.
Adding tasks to a list
Let’s review the definition:
US 1
As a User
I want to add tasks to a to-do list
So that, I can organize my tasks
To complete this user story we will need, apart from an endpoint that we can call and a controller that handles it, a use case to add tasks to the list, and a repository to store them. Our use case will be a command, so the effect of the action will be a call to the repository storing every new task.
To verify this with a test, we don’t want to write code that won’t be needed in production. For example, we are not going to write methods (yet) to retrieve information from the repository. Strictly speaking, at this moment we don’t even know if we are going to need them (spoiler: we will, but that would be coding for a future we don’t know yet). So, at first, we will use a mock of the repository and check that the right calls are made.
Once we have this clarified, we write a test that will send a POST request to the endpoint to create a new task and will verify that, at some point, we call a task repository, trusting that the real implementation will manage it correctly when available.
It is a good idea to start the test from the end, stating what we expect, and build the rest with the needed actions. In this case, we expect the existence of a TaskRepository that will be an interface. Also, we will introduce the concept of Task.
We will have to run the test and implement all the things that it asks for until we get it failing for the right reason.
The first error message is that we haven’t defined TaskRepository, so we start from there:
This error happens in PHP and PHPUnit. In other languages, you could find a different one.
For the moment, my solution is to create it in the same file as the test. If the error message changes, then I will move it to its file.
Now, the test fails for a different reason, so we have passed this obstacle. We use the Move Class refactoring to move TaskRepository to App\TodoList\Domain\TaskRepository and then we run the tests again, getting the following error:
That is telling us that we have not defined the class Task. For the moment, we will create Task in the same file, re-running the test to see if the error message changes.
The error is saying that there is not a method store in TaskRepository, so we can’t mock it. We have to introduce it, but first we will move Task to its place in App\TodoList\Domain. As you can see, we are organizing the code using a layered architecture.
After moving Task, we add the store method to TaskRepository:
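At this point the interface could look more or less like this; the namespace comes from the text, the exact signature is an assumption:

```php
<?php

namespace App\TodoList\Domain;

interface TaskRepository
{
    public function store(Task $task): void;
}
```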
The next error is a bit weirder:
It has to do with the Symfony framework configuration that we are using for this exercise. This message is telling us that there are no files that contain controllers in the given path and namespace. I don’t want them there; instead, I want to put them into App\TodoList\Infrastructure\EntryPoint\Api. This is because I want to keep a clean architecture, with components organized in layers. Controllers and other entry points to the application are in the infrastructure layer, inside an EntryPoint category that, in this case, contains a port related to communication through the API.
To achieve this, we only have to go to the file config/services.yaml and change what is required:
When we run the test, we get a similar error:
This is positive because it reflects that we have made the change in services.yaml correctly, but we haven’t yet added a controller in the desired location that can be loaded, which would avoid the error. So we add a TodoListController.php file to the folder with this code:
Running this test throws two new error messages. This is the first:
And that’s a framework problem, because the HTTP client is calling an endpoint that is not defined anywhere yet. We solve this by configuring it in routes.yaml.
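An illustrative route definition in config/routes.yaml; the path and controller namespace come from the text, the route name is made up:

```yaml
api_add_task:
    path: /api/todo
    controller: App\TodoList\Infrastructure\EntryPoint\Api\TodoListController::addTask
    methods: [POST]
```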
As we always do after a change, we run the test, which will now complain that there is no method in the controller in charge of handling the response to this endpoint.
We can implement it this way:
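For example, something along these lines, assuming Symfony’s Request and Response types; the exception message is illustrative:

```php
public function addTask(Request $request): Response
{
    throw new \RuntimeException(sprintf('Implement %s', __METHOD__));
}
```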
It is a single line that throws an exception to mark that the method is not implemented. We do it this way so that the test itself states that we have something left unimplemented. An empty method body would not tell us anything and, in many situations, it would be easy to lose track of what we have left pending to write.
If we run the test, it throws exactly that error:
But also this one, from the test itself.
This is the error that we were expecting from the test as we wrote it. There are no framework configuration errors. It tells us that a Task is never stored in the repository. In other words: no production code executes the desired behavior.
Those two errors tell us that it is time to implement.
And to do that, we need to go one step into the application. In our example, this is TodoListController. At this point, we leave the acceptance test loop and enter a cycle of unit tests to develop TodoListController::addTask.
Designing in red
The acceptance test is not passing and it is asking us to implement something in TodoListController. To do so, we are going to think about how we want the controller to be and how it will delegate the job to other objects.
In particular, we want the controller to be a very tiny layer in charge of:
Get the necessary information from the request
Pass it to a use case to do what is needed
Get the response from the use case and send it back to the endpoint
In a classic approach, we would implement the complete solution in the controller and, then, we would be moving logic to the required components.
Instead of that, in the mockist approach, we design what this implementation level should look like and we use doubles for the collaborators we might need. For example, this is our test:
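A hedged sketch of what this unit test could look like with PHPUnit mocks, shown with the classes already in their final namespaces; the example description and exact assertions are illustrative:

```php
<?php

namespace App\Tests\TodoList\Infrastructure\EntryPoint\Api;

use App\TodoList\Application\AddTaskHandler;
use App\TodoList\Infrastructure\EntryPoint\Api\TodoListController;
use PHPUnit\Framework\TestCase;
use Symfony\Component\HttpFoundation\Request;

class TodoListControllerTest extends TestCase
{
    public function testShouldAddTask(): void
    {
        // The use case is doubled: we only verify the collaboration
        $addTaskHandler = $this->createMock(AddTaskHandler::class);
        $addTaskHandler
            ->expects(self::once())
            ->method('execute')
            ->with('Write a test that fails');

        $controller = new TodoListController($addTaskHandler);

        $request = new Request([], [], [], [], [], [], json_encode(['task' => 'Write a test that fails']));
        $response = $controller->addTask($request);

        self::assertEquals(201, $response->getStatusCode());
    }
}
```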
In this test, two things are verified. On the one hand, that we return a response with a status code of 201 (created). On the other, that we have a use case named AddTaskHandler in charge of processing the creation of the task given its description, which is received as the request payload.
When we run the test, we start getting the expected errors. The first one is that we don’t have any AddTaskHandler. Again, I’ll start by adding it to the test file, and I’ll move it in the next step. In fact, it’s literally what the error says:
So, we add:
Running the test now will ask us to implement the method execute that is not defined. Before doing so, we are going to move AddTaskHandler, that is the use case, to its place in the application layer: App\TodoList\Application. Next, we add the method including our “not implemented” exception.
This way, what will happen is as follows: once we’ve implemented the controller, we’ll see that its unit test passes, because we are using the AddTaskHandler double and we are not calling the real code. The “not implemented” exception will only be triggered when running the acceptance test, which will point us towards implementing AddTaskHandler and going one level deeper into the application.
The next error is well known:
This indicates that the test is calling the addTask method, not implemented yet. It is just where we want to be. We will implement logic in TodoListController::addTask to make the test pass:
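Roughly like this, assuming the use case is injected through the constructor; the JsonResponse body is illustrative:

```php
public function addTask(Request $request): Response
{
    $payload = json_decode($request->getContent(), true);

    $this->addTaskHandler->execute($payload['task']);

    return new JsonResponse('', Response::HTTP_CREATED);
}
```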
The test passes!
We could go slower here and drive the implementation with smaller steps, but I think that it’s better to do it in only one because the logic is not too complex, and this way we don’t get too dispersed. The important thing, however, is that we have achieved the goal of developing this controller with a unit test that is passing right now.
Given that the unit test is passing, we don’t have any more work to do at this level. Nevertheless, I’m going to make a little refactoring to hide the details of getting the payload from the request, leaving the body of the controller a bit cleaner and easier to follow.
Returning to the acceptance test
Once we’ve made the unit test pass, we have to return to the acceptance level so that it tells us how to proceed. We execute it and we get the following:
Now it’s time to go a little bit deeper in the application and move on to the AddTaskHandler use case. What we expect from this UseCase is that it uses the information it receives to create a task, and stores it in TaskRepository.
To create a task, we’ll need to give it an ID, which we’re going to ask from the repository itself. The repository will have an appropriate method.
We can express this with the following unit test.
We run the test. We get this error first:
We add the method to the interface:
Which generates this error:
And we’re ready to implement the use case. This code should suffice:
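A sketch of how the use case could look, using the nextIdentity and store messages mentioned in the text; the exact Task constructor is an assumption at this point:

```php
<?php

namespace App\TodoList\Application;

use App\TodoList\Domain\Task;
use App\TodoList\Domain\TaskRepository;

class AddTaskHandler
{
    private TaskRepository $taskRepository;

    public function __construct(TaskRepository $taskRepository)
    {
        $this->taskRepository = $taskRepository;
    }

    public function execute(string $taskDescription): void
    {
        $id = $this->taskRepository->nextIdentity();

        $task = new Task($id, $taskDescription);

        $this->taskRepository->store($task);
    }
}
```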
This code is enough to pass the test, so we can go back to the acceptance level.
New cycle
When rerunning the acceptance test we find out that it passes. However, the user story isn’t implemented yet, as we don’t have a concrete repository in which Task objects are being saved. In fact, our Task class doesn’t have any code yet.
The reason is that we’re using a TaskRepository mock in the acceptance test. We’d like to stop using it so that TodoList uses a concrete implementation. The problem that we’d have if we did so is that we wouldn’t have any method to explore the content of the repository and verify the test. We’re going to do this in two stages.
In the first one, we simply remove the usage of the mock and verify that the API response returns the code 201 (created).
Before continuing, we have to erase the service definition that we made earlier in services_test.yaml. As it’s the only service that we had declared there, we can just delete the file without any problem.
And when we execute the test, the following framework error appears:
This happens because we just have an interface of TaskRepository, and we would need a concrete implementation that we could use. This way, we have an error that lets us move on with the development. We’ll need a test to implement FileTaskRepository, a repository based on a simple text file to store the serialized objects:
In the first place we’re going to create a default implementation for FileTaskRepository in the right place, which will be App\TodoList\Infrastructure\Persistence:
When we execute the acceptance test again, two errors happen. One tells us that we have to implement the repository’s nextIdentity method. The other, which is an error of the test itself, informs us that the endpoint returns the 500 code instead of 201. This is reasonable, as our current FileTaskRepository implementation will fail fatally.
But it’s good news, because it tells us how to proceed. So, we’ll create a new unit test to guide the development of FileTaskRepository. In this test, we simulate a different number of objects in storage to ensure the correct implementation.
With this test passing, we return to the acceptance test, which fails again. The endpoint returns an error 500 because we don’t have an implementation of the store method in FileTaskRepository.
We’ll introduce a new test, although we’ve refactored it a bit beforehand in order to make introducing changes easier:
This is our implementation to pass the test:
We have to implement the Task::id method, which makes us also introduce a constructor:
The implementation passes the test. To not lose track, I won’t introduce any more examples, although they would be appropriate to build more confidence in the tested behavior. For now, it’s enough to understand the process.
As we’re in green, we go back to the acceptance test to check how far we’ve come. And when we run it, the acceptance test passes, indicating that the feature is complete. Or almost, as at the moment we don’t have any way to know if the tasks have been stored or not.
One possibility is to obtain the contents of FileStorageEngine and see if our tasks are there. It doesn’t force us to implement anything in the production code:
The test verifies that we’ve stored a task in the repository, confirming that the first user story is implemented. It may be a good time to examine what we’ve done so far and see if we can do any refactoring that may facilitate the next development steps.
Let’s start with the acceptance test:
TodoListControllerTest:
There are other small changes in some files, but we’re not going to go into detail here.
See the tasks on the list
US 2
As a User
I want to see the task in my to-do list
So that, I can know what I have to do next
Our second user story requires its own endpoint, controller, and use case. We already have a task repository, to which we’ll have to add a method to retrieve the full list.
Since we have a real implementation of the repository, we no longer have to use a mock, as we had done earlier to be able to kickstart the development. In a situation in which we were using database persistence, or something similar, we’d probably need a fake implementation, such as an in-memory repository or even the simple file repository that we’re using, which we need because of the persistence problem between PHP requests.
This is the first version of the acceptance test for this user story:
So we run it and, as before, we take note of the errors that it throws and fix them until the test fails for the proper reasons. In this case we can see two related errors.
The first one is that there isn’t an appropriate route for the endpoint.
Which, of course, causes the error in the test when verifying the status code:
We configure the route in routes.yaml:
We run the test. The error is different, which indicates that we’ve correctly made the change, but now it tells us that we’re missing the specific controller:
So we add our initial empty implementation:
When we run the test again, an exception is thrown indicating that we need to implement something. It’s time to go back to the TodoListController test. It’s important to learn to identify when to move between the acceptance test cycle and the unit tests cycle.
The new test helps us introduce the new use case GetTaskListHandler, but it also poses an interesting problem: what should GetTaskListHandler return, Task objects or a representation of them?
In this case, the most adequate option would be to use some kind of DataTransformer and apply a Strategy pattern so that TodoListController tells the use case which DataTransformer it wants to use. This transformer can be passed to the controller as a dependency, and it will send it to the use case as a parameter.
As you can see, now we’re literally designing. So we’re going to see how the test ends up.
At this point, we only need TaskListTransformer so that the controller can pass it to the use case. If we run the test, it’ll fail because we haven’t defined the GetTaskListHandler class yet. We introduce an initial implementation.
Running the test again we see that now it’s asking for TaskListTransformer. First we move GetTaskListHandler to its location in App\TodoList\Application. Then we create TaskListTransformer.
We check the result of the test again, which now tells us that we’re missing an execute method in GetTaskListHandler. Same as before, we first move the TaskListTransformer to its place.
In principle, I would put it in App\TodoList\Infrastructure\EntryPoint\Api, as the purpose of the transformer is to prepare a specific response for the API. But this would only be correct for the concrete implementation that we end up using. If we did it this way we’d have a badly oriented dependency, as it would be pointing from Application towards Infrastructure. To invert it, we’ll have to put TaskListTransformer in the application layer as an interface. Its place would be: App\TodoList\Application\TaskListTransformer.
Once relocated, we add the execute method to GetTaskListHandler.
Having added this, when we run the test we see that it fails because we’ve managed to trigger the exception that asks us to implement getTaskList in the controller:
And we can implement whatever the test needs to pass:
We may observe that the controller has many dependencies. This can be solved with a command bus or by dividing the class into several smaller ones, but we’re not going to do it in this exercise to avoid losing focus.
In any case, the test passes, which indicates that it’s time to move back to the acceptance test cycle.
It will keep failing, as we might have expected:
An error that tells us that the next step is to use a unit test to develop the GetTaskListHandler use case.
When we run this test, it asks us to add the findAll method to the repository.
We do this both in the interface and the concrete implementation:
And the same for the transform method in TaskListTransformer:
Which will end up like this, once it’s been redefined as an interface:
With these changes, the test will now fail to tell us that we need to implement the execute method of the use case, which is just where we wanted to be:
And here’s the implementation that makes the test pass.
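It could look roughly like this, assuming the repository is injected in the constructor and the transformer arrives as a parameter, as designed above; the array return type is an assumption:

```php
public function execute(TaskListTransformer $transformer): array
{
    $tasks = $this->taskRepository->findAll();

    return $transformer->transform($tasks);
}
```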
Now that we’ve gone back to green, we’ll return to the acceptance cycle. When we run the test, the result is a new error message, which asks us to implement findAll in FileTaskRepository.
This requires a unit test.
When we run it, it will ask:
So we get to it:
Now the unit test passes, with which we’ve implemented a good part of the repository. Will it be enough to make the acceptance test pass?
No, we still have work to do. At this moment, we’re asked to introduce a concrete implementation of TaskListTransformer.
Now we have to introduce a new unit test to develop the concrete Transformer, which we’ll locate in App\TodoList\Infrastructure\EntryPoint\Api, as it’s the controller that’s interested in using it. We’ll call it StringTaskListTransformer, as it converts a Task into a string representation.
This is going to pose a small design challenge. We don’t have any way to access the properties of Task yet, an entity that we also hadn’t had to develop further until now, and the truth is that we shouldn’t condition its implementation to this kind of necessity. In a more realistic and sophisticated system we could apply a Visitor pattern or something similar. In this case, what we’ll do is we’ll pass a template to Task, and Task will return it all filled out with its data.
Since Task is an entity I prefer not to mock it, so the test will end up this way:
And the production code could be this one:
The test will throw an error to tell us that the representedAs method is not implemented in Task, so we can add it.
With the appropriate caveats, we can use the current test as an acceptance test. If we execute it, we’ll see that this exception is thrown:
Which would indicate the necessity to advance to the next level and create a unit test to develop Task, or at least its representedAs method. Another option would be to develop Task under the coverage of the current test, but this is not a very good idea, as the test could require examples that don’t actually provide anything useful to the matter at hand and are only relevant to Task.
For the time being, this implementation would already suffice.
So we could go up one level and go back to the previous Transformer test, which passes with no problem.
With this test in green, we return to the acceptance level, which also passes, indicating that we’ve finished the development of this user story.
Checking completed tasks
US-3
As a User
I want to check a task when it is done
So that, I can see my progress
The third user story is easily built from the two previous ones, as our application already allows us to introduce tasks and see the list. For this reason, before beginning the development, let’s refactor the acceptance test so it’s easier to extend. In fact, we can even reuse some of its parts. This is the result, with the new acceptance test already added.
When we run the test, it fails -as expected- because it doesn’t find the route to the endpoint:
And, as we’ve done before, we’ll have to define it and create a controller that handles it. First, the route definition in routes.yaml.
A new execution of the test indicates that there’s a controller missing:
And we add an empty one:
The error now is:
And the test fails because it expects that endpoint to be properly running and responding, but it’s not implemented yet. Therefore, we move to the unit level to define the functionality of the controller.
As in the previous cases, implementing the functionality requires, apart from the controller, a use case that uses the repository to retrieve and store back the task we wish to mark. Therefore, the key point of the test will be to expect the use case to be executed with the right parameters.
So, the test will end up more or less like this:
Once we have the test, we run it. The result is that it asks us to create the MarkTaskCompletedHandler class.
We create it in the test itself, and then we move it to its location in App\TodoList\Application. Next, it will ask us to create the execute method.
Which we’ll prepare this way:
With this we’ll already have all that we need to implement the controller’s action, something we do because the following error says so:
This is the code that will pass the controller test.
Once the controller test passes, we’ll have to rerun the acceptance test. It will reveal the next step.
It’s asking us to implement the use case. Therefore, we need a new unit test:
The execution of the test throws the following error:
Up until now we hadn’t needed this method in the repository, so we’ll have to add it to the interface.
This will be enough to be able to continue executing the test, and reach the point where it asks us to implement the execute method of the use case.
So we get to it. It’s pretty simple:
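A plausible shape for it, using the retrieve and markCompleted messages mentioned in the text; storing the updated task back is an assumption about how the repository persists changes:

```php
public function execute(int $taskId): void
{
    $task = $this->taskRepository->retrieve($taskId);

    $task->markCompleted();

    $this->taskRepository->store($task);
}
```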
When we execute the test again, it’ll fail. This is because we haven’t defined the Task::markCompleted method:
Every time we get an error of this kind, we’ll have to dig deeper and enter a new unit test cycle. In this case, to implement this method in Task. We don’t have direct access to the complete property, which we haven’t even defined yet, but we can indirectly control its state thanks to its representation.
The implementation is quite simple:
With this, the Task test passes, and we can return to the use case level. When running the test again, we see that it also passes, so we can also go back to the acceptance test.
This test, however, will not pass, as it expects that we implement the retrieve method in FileTaskRepository, which we don’t have yet. We go to the test.
As we could have expected, the test will demand that we write the retrieve method.
And with this, the FileTaskRepository test turns green. We take the opportunity to do a small refactoring, so that the dependency is controlled:
And we launch the acceptance test one more time, which now passes cleanly.
Next steps
At this point, all three user stories have been implemented. What do we want to do now?
One of the improvements that we can do at the moment is fix the acceptance test so it can be used as a QA test. Now that we’ve developed all of the involved components, it’s possible to make the test more expressive and useful to describe the implemented behavior.
The unit tests may be used as they are. A common objection is that, as they’re based on mocks, they’re fragile due to their coupling to the implementation. However, we must remember that we’ve basically been designing each of the components that we needed, as well as the way in which we wanted them to interact. In other words: it’s not foreseeable that this implementation is going to change so much that it invalidates the tests. On the other hand, the unit tests that we’ve been using characterize the specific behavior of each unit. Altogether they’re fast and they provide us the necessary resolution to help us quickly diagnose any problem that might arise.
So, we’re going to retouch the acceptance test so that it has better business language:
Basically, we’ve rewritten the test using a Behavior Driven Development style. We didn’t need to make a Gherkin here, but we could have.
This has allowed us to get rid of the direct call to the storage engine that we had introduced at the beginning, and by doing it, we make the test more portable. Now it only uses endpoint calls, so it can work in different environments such as, for example, a local and a continuous integration one.
Fixing bugs with TDD
In our to-do list project, we’ve developed what we could call the application’s happy path. That is, we’ve assumed that, when trying to create a task, the consumer of the API wouldn’t make mistakes such as trying to create a task without a name. Another assumption is that, when marking a task as completed, the system will only use ids from existing tasks on the list.
This means that the application might fail if any of these assumptions isn’t met. Is this a bug? In a sense, yes, but we could also argue that they’re just features that haven’t been implemented yet.
When we develop using TDD we can prevent many defects due to faulty implementations. For example, imagine that we haven’t implemented a Task::markCompleted method because we don’t have any test that tries to call it. The result will be a bug.
Of course, if we haven’t written a test to verify a specific behavior, such as preventing the creation of tasks without a description, the bug will end up surfacing.
It’s said that neither testing nor TDD rid us of every fault in a piece of software. Nevertheless, I think that we can rest assured that by using TDD, the defects will appear in those parts that aren’t covered by a test. If we look at them through this lens, therefore, bugs are actually unforeseen circumstances and unimplemented cases. This is as interesting as it is liberating, because in a certain sense, it makes the software flaws quite predictable and manageable, motivated by a lack of definition, or simply by a lack of information at the time of development.
So, let’s see how we would proceed in the case that a bug about our to-do list project was reported.
To-do list bugs
After some time using our API to generate task lists, we find a few defects. One of them is that we’re able to enter tasks that don’t have a description, which doesn’t make much sense, as it’s not very useful if we want to know what we have to do.
This bug is due to the fact that we’re not controlling at any point that we’re actually receiving the description of the task, that is, we’re trusting that the input is always correct. In that case, there are two possibilities: that the payload of the request to the POST endpoint is completely empty, or that the task field is blank.
On this occasion, the system doesn’t have any specified behavior for these circumstances, nor is it expressed in the tests. In other cases, the bug is some kind of error not covered by the current tests.
So our first approach will be to create a test that exposes the bug.
Now then, where do we want to put that test? Let’s try and think a little about this.
On the one hand, the affected endpoints should return a 400 response (Bad Request), because what happens in this case is that the request is badly constructed and the endpoint doesn’t understand it.
According to this, it would make sense to add an acceptance test. However, we also have the controller’s unit tests, which are much faster and would also allow us to verify that the response has the correct code.
On the other hand, we have to take into consideration which component is responsible for validating which aspects.
So, for example, if the request doesn’t have any payload or the structure doesn’t include the required fields, it makes sense for the controller to be responsible for verifying it and failing as soon as it detects it. The test makes sense at the controller level.
However, if the request has the correct structure and the required fields can be found, the validation of its values may correspond to more inner layers. Thus, for example, if the value of task in the payload is an empty string, the controller may try to pass it to the use case, and let the Task constructor validate whether that value is acceptable or not. The test, in this case, would be at the AddTaskHandler use case level.
This, however, opens up a new predicament for the controller, and that is handling the errors or exceptions that come from the use case in order to return the appropriate error response.
As you can see, a bunch of circumstances arise and force us to intervene at different levels of the application.
One principle that we could try to follow is that if something goes wrong at the acceptance level, it should be reflected at the unit level. The first error tells us that the application has a defect, while the error at the unit level tells us which component is failing.
So, let’s go bit by bit, tackling each of the problems.
Invalid payload
The case that we’ll be trying to handle and solve is sending an empty or badly formed request to the endpoint. In any case, it doesn’t include the task field.
We’ll start with the acceptance test, and we’ll try to reproduce the error by launching a request to the API with an invalid payload.
The test fails because the endpoint returns a 500 error instead of the 400 error that would be desirable in this case. What can we do now?
Well, we’ll move to the controller level to see what can we do there. And we’ll have to write another test characterizing the same situation.
Note that I’ve removed the expectations about the mock of the use case. In my opinion, it’s a verification that wouldn’t add anything in this case, and it contributes to the coupling to the implementation. It’s true that the same happens in all of the controller tests that we have so far, but there’s no reason to deliberately increase the coupling if we can avoid it.
When we run the test, it fails because it can no longer find the task index when it needs it in TodoListController:
This is part of the controller’s logic, so we’re not going to need to go any deeper. If we fix it here, we’ll have solved the problem.
It seems pretty clear that we have to check whether the payload has the correct structure, and respond accordingly:
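Something along these lines, shown here with the payload extraction inlined for brevity; the error message body is illustrative:

```php
public function addTask(Request $request): Response
{
    $payload = json_decode($request->getContent(), true);

    // Fail fast if the payload doesn't have the expected structure
    if (!is_array($payload) || !array_key_exists('task', $payload)) {
        return new JsonResponse(['error' => 'Missing task field'], Response::HTTP_BAD_REQUEST);
    }

    $this->addTaskHandler->execute($payload['task']);

    return new JsonResponse('', Response::HTTP_CREATED);
}
```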
Having passed the controller test, we return to the acceptance one. We run them all, and we check that they also pass perfectly.
And with that, we’ve solved the bug.
Invalid business values
One thing is that the payload is incorrect structurally, a validation that corresponds to the controller. However, given a structurally valid payload in which the controller is able to find all of the fields it needs, what happens if the values aren’t acceptable according to the business rules?
What happens is that the responsibility of detecting the problem lies in the domain objects, which will throw exceptions that should bubble up the application until some component is able to handle them.
For example, in the case of Task, we expect it to have a description. It hasn’t been defined in the user stories, but it’s something that we just assume. It may happen, then, that the payload comes with a task field containing any of these values:
null
A data type that isn’t string
An empty or too short string
A string of sufficient length
The first two points are particularly technical. At the controller level, we could validate that task is a string and fail if it isn’t.
However, the last two point to rules that must be defined by the business. That is to say, would we consider acceptable a task with a two-character description? It’s a business decision. Let’s assume that we’re told that one character is enough to accept a string as a valid description, so we’ll only have to control that it’s not an empty string.
This rule, in any case, belongs to the domain because it’s a business rule.
The location of the previous two restrictions is more open to debate, and may be summarized as “the system cannot accept task if it’s not a string”.
But it’s best to see this from the point of view of a test, so let’s introduce one. In this case, to see what happens if task is null:
It’s very interesting to check that this test passes. Probably, fixing the previous bug has prevented us from running into this one.
Is it worth it to leave this test here? I’d say no, as we haven’t had to add anything to the code. I’ll make use of it by trying to see what would happen if we sent non-string data, like a number.
This test fails for the following reason:
That is, AddTaskHandler expects us to pass a string to execute, so it’s never going to admit a value that isn’t one. This poses an interesting problem: are we interested in forcing the string type for the description, so that if a number comes -as is the case-, it converts it and moves on?
In this example we’re going to assume that we don’t want that, that the type must be string no matter what.
As we’ve seen, the failure has also been produced in the controller, when trying to invoke the use case. Therefore, we go back to the controller test.
The controller test fails in the same way as the acceptance test. Next, we implement what’s necessary to make it pass.
This change makes the test pass, and it’s to be expected that it also makes the acceptance one pass, so we check it. Remember: it’s very important to pass all the tests every time, not only the one specific to the case, as we have to make sure that the changes that we introduce don’t alter the current behavior of the system.
The acceptance test also passes. Now we have to think about a couple of things.
On the one hand, the code that we’ve introduced is a little ugly, and it distracts us from the controller’s purpose. We need to refactor and clear things up a bit.
This is one possible solution:
It’s a first approach. We could advance it further, but it’s enough for now.
The other issue is the following. The acceptance tests that we’ve added have been useful to reproduce the bugs and guide us towards their solution. However, at the controller’s unit level, we’ve written practically the same test. What’s more, the acceptance test verifies exactly the same behavior as the controller one, as it’s the latter that holds all of the responsibility for this behavior.
This duplication isn’t always useful. In fact, the business, which is only interested in the acceptance test, is not very worried about the kind of technical issues that we’re verifying. On the other hand, at the unit level, this kind of detail is more relevant.
So, in case of having tests at the acceptance level that are identical to others at the unit level, it’s preferable to delete the acceptance ones if they’re not bringing in any business value, having covered the same circumstance in the unit ones.
So we’re going to delete those tests before continuing.
Guaranteeing business rules
Our next test will verify that an empty description, even if it’s a string, will not generate a new task. We’ll put it in the acceptance test:
This test already indicates the violation of a business rule. This is the error that it generates:
It tells us that tasks with an empty description would be created as if they were valid. We have to go deeper into the application to see where we should be controlling the error.
So we go to the controller. But it mustn’t know anything about the business rules, as it lives in the Infrastructure layer. Nevertheless, it has to give the appropriate HTTP response, and that’s only possible if the use case somehow communicates to it that there’s been a problem.
The best way for it to do so is to throw an exception that the controller will capture, returning an adequate response.
This way, the controller test might end up like this:
As you can see we simulate the use case throwing an exception, and the test fails because it doesn’t get captured, and therefore an adequate response isn’t returned. To not complicate the solution too much, I’m not going to create a domain exception (but it’s something that I would do in a real project).
Let’s pass this test.
The controller unit test has passed. However, the acceptance test has not. This is because we haven’t touched the use case yet. We have to go down a bit further and write a test that can fail to tell us what to implement.
But first, let’s examine the code:
As can be seen, the point at which the exception should be thrown is when a Task is created, but there’s no reason for the use case to be the one that verifies that $taskDescription has a sufficient length.
Instead, it makes more sense that this logic is in Task. After all, the use case is not the place to apply business rules, but rather to coordinate domain objects, which are the ones that have the responsibility to maintain them.
So, we’d have to dive a little deeper, and modify the Task test to guarantee that it’s always constructed consistently, with a description at least one character long. In the case that the description is empty, we’ll throw the exception.
All that’s left is to implement it.
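A minimal sketch of that implementation; the text deliberately avoids creating a domain exception here, so a generic SPL exception is used, and the exact message is illustrative:

```php
public function __construct(int $id, string $description)
{
    if ($description === '') {
        throw new \InvalidArgumentException('Task description should not be empty');
    }

    $this->id = $id;
    $this->description = $description;
}
```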
This implementation is enough to pass the unit test. Let’s see if the test at the other level also pass. Indeed, all of the unit tests keep passing, and so does the acceptance one.
We’ll leave the acceptance test that we’ve just introduced, as it carries business meaning. Moreover, in this case the controller test only verifies that it’s able to handle the exception thrown by the domain layer, while the Task test verifies that it has to be constructed with a valid description.
Not found tasks
Another of our app’s defects has to do with trying to mark as completed a nonexistent task. Currently, the endpoint will return a 500 error, when the correct one would be a 404, indicating that the resource that we’re trying to modify doesn’t exist.
The following acceptance test puts a spotlight on it:
The result is:
Apart from an error in:
The base error occurs in the repository. However, we’re going to proceed systematically. As we’ve seen in the last example, the error can manifest itself in several manners in the different layers or levels of the application, so we have to go step by step, decide if that error has to be manifested in some way, and implement the necessary behavior.
The controller, as seen previously, is responsible for interpreting the problem and expressing it as a 404 error in the response. Therefore, it expects the use case to communicate it with an exception. In practice, it means that the controller has to react to a specific exception that will be thrown or rethrown by the use case.
So we express this with a test:
As expected, the controller test will fail because the exception simulated in the mock isn’t being captured. It’s time to implement the code to do so:
Given that the test is now passing at the controller level, we go back to the acceptance test. This level still fails because, in fact, no exception is being actually thrown in this action’s flow.
We need to go a bit deeper.
If we examine the Application layer, the use case doesn’t have much to do. As in the previous problem, its job is to delegate to domain objects, so it’s those that should fail. As I’ve pointed out before, I’m using generic exceptions, but in real projects we’d also be using domain exceptions at different levels. For example, a TaskNotFound, which could perfectly well extend OutOfBoundsException.
So we won’t do anything to the use case, but we observe that the one responsible for saying whether a task exists or not will be the repository. The first line of the execute method is clear.
It’s not worth it to write a test to simulate that the repository throws an exception and the use case does nothing. If the use case was capturing exceptions from a more inner level to rethrow them as a different exception -a technique that we could call exception nesting-, then we would write a test to verify that.
Since we’re not doing that, we go to the repository level and we write a test:
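As a sketch, and assuming an in-memory TaskRepository with a retrieve method (the concrete class and method names in the project may differ), the test could be something like:

```php
use OutOfBoundsException;
use PHPUnit\Framework\TestCase;

final class TaskRepositoryTest extends TestCase
{
    /** @test */
    public function shouldFailRetrievingANonExistentTask(): void
    {
        $taskRepository = new TaskRepository();

        $this->expectException(OutOfBoundsException::class);

        $taskRepository->retrieve(1);
    }
}
```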
The test fails, as the exception is not being thrown. So we go to the production code:
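A minimal way of making that test pass, assuming the repository keeps its tasks in an array indexed by id, could be:

```php
public function retrieve(int $taskId): Task
{
    // Fail loudly when the task is not stored, instead of returning null
    // and letting the error surface somewhere else
    if (!isset($this->tasks[$taskId])) {
        throw new OutOfBoundsException(sprintf('Task %s does not exist', $taskId));
    }

    return $this->tasks[$taskId];
}
```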
With this simple solution we pass the test. We rerun the acceptance test to check that the problem is solved.
Solving defects
An interesting conclusion from what we've just done is that, in reality, solving bugs is nothing more than implementing behavior that didn't exist in the software. In this context, I prefer to use the term defect instead of bug.
What's more, it could be argued that this chapter, more than being about fixing bugs, is about adding features to the software that had been left out, either consciously or unconsciously. In real life, this kind of thing is usually reported as a bug, although we know that it's really a feature that hasn't been developed yet, or one that we hadn't taken into account before.
In fact, by developing software using TDD, we normally prevent the kinds of defects that are usually associated with bugs, such as a typo or some slip in the code that the language, for one reason or another, lets go undetected.
In any case, the procedure is more or less the following:
The first thing to do is reproduce the bug through a test at the most external level possible. Most likely it will manifest itself in the acceptance test, which is what we'd expect if it's a problem that users can detect. But it may depend on the context.
Next, we go to the next level of the application, trying to reproduce the same bug with a unit test. There will be levels where this isn’t possible and the test passes. As we’ve seen in the last example, the manifestation of the bug may be different in each level, or it may not happen at all. If we can’t demonstrate the bug at that level, we advance to the next one.
At each level, we implement the code needed to pass the test that exposes the bug. Once that level's tests are green, we return to the acceptance test.
If the acceptance test continues to fail, we’ll have to go to a deeper level in the application, create a test that describes the bug, and solve it. After this, we go back to the acceptance level until the test passes again.
The moment the acceptance test passes, the defect is fixed.
Adding new features
In the previous chapter we talked about how, from the point of view of TDD-based development, almost all defects can be considered as features that just weren't defined initially. Another way of looking at it is that they're called features if we're asked for them explicitly, and defects if they're implicit in another feature but haven't been developed.
That is, when we say that we want to be able to mark a task as completed, to follow up with our to-do list project, we can assume that the system mustn't break if we try to mark a nonexistent task. For this reason, we'd say that this feature had a bug, and that's precisely what we fixed in the previous chapter.
But in this chapter we'll tackle how to add new features to existing software following a TDD approach. And, as might be expected, we're not actually going to change our methodology at all. We'll still start with an acceptance test and work our way deeper into the application, making the necessary changes.
Even so, it's a different scenario. A new behavior might require us to modify existing software units, and we need to make sure that the changes don't break any of the already existing functionality.
New user story
The next business requirement is to allow editing an existing task.
US-4
As a user
I want to modify an existing task in the list
So that I can express my ideas better
Initially, this story requires the creation of a new endpoint with which to change a task’s information.
If our application has a front-end, we might need an endpoint to retrieve the information of the task that we wish to edit, in order to fill out the form with its current data. In this case, it would be:
In both cases, the procedure will be the same: we’ll begin by creating an acceptance test, initiating the development process. What we will find is that some of the necessary components are already created.
So, we run the test to see what it tells us. As expected, the endpoint cannot be found because we haven't defined the route, so we start by defining it.
When rerunning the test after this change, it’ll tell us that there isn’t any action in the controller that’s able to respond to this route.
So we’ll have to add a new empty action.
In the new execution of the test, the error will be:
Which tells us that we have to dive into the unit level to implement this action in the controller. This cycle will ring a bell, because it’s what we’ve been doing in all this part of the book.
But the truth is that this routine is something positive. We always have a concrete task to tackle at every moment, be it creating a test or writing production code, and we don’t have to worry about anything else. The acceptance test tells us what to do, and at each level we just have to think about the specific component that we’re working on.
It’s time for us to implement the controller. As we already know, at this stage we have to design. Basically, it’s a similar action to adding a task, but in this case we’ll receive the id of the task that we’re going to modify, as well as its new description.
We'll need a use case that expresses this user intention, to which we'll pass the two pieces of data that we need. If everything goes as planned, we'll return a 204 (No Content) response.
We add a test that encompasses all this:
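A sketch of that controller test could be the following. To keep it short, it assumes a TodoListController that receives only the UpdateTaskHandler collaborator and a payload with a task key; in the real project the controller will have more dependencies and the names may differ:

```php
use PHPUnit\Framework\TestCase;
use Symfony\Component\HttpFoundation\Request;

final class TodoListControllerTest extends TestCase
{
    /** @test */
    public function shouldModifyATask(): void
    {
        $updateTaskHandler = $this->createMock(UpdateTaskHandler::class);
        $updateTaskHandler
            ->expects(self::once())
            ->method('execute')
            ->with(1, 'Write a better description');

        $controller = new TodoListController($updateTaskHandler);

        $request = Request::create(
            '/api/todo/1',
            'PUT',
            [],
            [],
            [],
            ['CONTENT_TYPE' => 'application/json'],
            json_encode(['task' => 'Write a better description'])
        );

        $response = $controller->modifyTask(1, $request);

        self::assertEquals(204, $response->getStatusCode());
    }
}
```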
If we run the test, it will ask us to create the UpdateTaskHandler use case.
And next, it will ask for the execute method.
Once we have that, it again asks us to implement the controller’s action. So we get to it:
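A sketch of the action, under the same assumptions as the test above (Symfony's Request and JsonResponse, and a task payload key), could be:

```php
public function modifyTask(int $taskId, Request $request): Response
{
    $payload = json_decode($request->getContent(), true);

    $this->updateTaskHandler->execute($taskId, $payload['task']);

    return new JsonResponse(null, Response::HTTP_NO_CONTENT);
}
```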
And the controller’s unit test passes. If we return to the acceptance test, as we should be doing now, it’ll tell us what we have to do next:
So, it’s time to get into the application layer. Again, we have to design this level, which poses an interesting problem.
In principle, we have defined that the only field of the task that can be changed is its description. Therefore, this action has to respect the current state of the completed flag. So what we want is to retrieve the stored task, modify its description, and save it.
Therefore, we’ll ask the repository for the task, we’ll change it, and we’ll store it again.
When we run the test, it will ask us to implement the use case, as the repository had already been defined previously.
The implementation will surely force us to introduce some new method in Task that provides a way for the description to be updated. Something along these lines, for example:
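This is a sketch of how the use case could look; the repository method names (retrieve, store) are assumptions of mine:

```php
final class UpdateTaskHandler
{
    private TaskRepository $taskRepository;

    public function __construct(TaskRepository $taskRepository)
    {
        $this->taskRepository = $taskRepository;
    }

    public function execute(int $taskId, string $newTaskDescription): void
    {
        // Recover the stored task, change only its description,
        // and store it again, so the completed flag is untouched
        $task = $this->taskRepository->retrieve($taskId);

        $task->updateDescription($newTaskDescription);

        $this->taskRepository->store($task);
    }
}
```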
I've chosen this implementation to keep things simple. However, as I write this, I can come up with some ideas that could be interesting in a realistic use case. One of them would be to apply a certain immutability: that is, instead of updating the Task object, we'd create a new one filled with the new values.
But we'll leave those refinements for another occasion. If we run the test, it'll tell us that Task lacks the updateDescription method, which we'll have to develop with the help of a unit test.
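That unit test could be sketched like this. It assumes that Task exposes a description() accessor so we can observe the change; if the actual Task expresses its state differently, adapt the assertion accordingly:

```php
use PHPUnit\Framework\TestCase;

final class TaskTest extends TestCase
{
    /** @test */
    public function shouldUpdateTheDescription(): void
    {
        $task = new Task(1, 'Write a test that fails');

        $task->updateDescription('Write a minimal test that fails');

        self::assertEquals('Write a minimal test that fails', $task->description());
    }
}
```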
To make the test pass, we have to introduce the method.
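For example (together with a simple accessor, in case Task doesn't already expose one, so the test can observe the change):

```php
public function updateDescription(string $newTaskDescription): void
{
    $this->description = $newTaskDescription;
}

public function description(): string
{
    return $this->description;
}
```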
The test passes, but we’ve noticed a problem. A few moments ago we had implemented a validation in order to prevent Task::description from being an empty string. To ensure that we’re fulfilling this business rule, we should introduce another test that verifies it and implement the answer that we want to give to this case.
However, we haven't covered this at either the acceptance or the controller level. What should we do, then? Solve it now and add tests at the other levels later, or wait and add this protection in a new iteration?
Personally, I think that the best option is to take note of this and solve it in a new cycle. It's important to focus on the feature that we're developing right now and finish the cycle.
Therefore, once we've made the Task unit test pass, we first return to the UpdateTaskHandler and verify that its test still passes, which is exactly what happens.
And having this level in green, we try the acceptance one again, which also passes without any trouble.
The result is that the new story is now implemented, although as we’ve discovered, we need to do an extra iteration to prevent the problem of trying to update the description of a task with an invalid value.
Could we have prevented this earlier? Possibly. However, we'd still have needed to introduce tests at the different levels, just as we did in the previous chapter. The value of using TDD lies precisely in developing a series of thinking habits and a certain automation. In other words, in reaching a degree of discipline and working toward every objective methodically, step by step.
Complete the story
In any case, every new behavior of the system should be covered by a test. So we’ll need a test to include the fulfillment of the business rule, which takes us back to the acceptance level.
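A sketch of that test, as a new method in the acceptance test class and under the same assumptions about routes and payload as before, could be:

```php
/** @test */
public function shouldNotAllowUpdatingATaskWithAnEmptyDescription(): void
{
    $client = static::createClient();

    // Create a task first, so that there is something to update
    $client->request(
        'POST',
        '/api/todo',
        [],
        [],
        ['CONTENT_TYPE' => 'application/json'],
        json_encode(['task' => 'Write a test that fails'])
    );

    // Try to update it with an empty description
    $client->request(
        'PUT',
        '/api/todo/1',
        [],
        [],
        ['CONTENT_TYPE' => 'application/json'],
        json_encode(['task' => ''])
    );

    self::assertEquals(400, $client->getResponse()->getStatusCode());
}
```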
As this is a business rule, we’ll be keeping this test afterwards.
The test fails:
Which indicates that, right now, tasks can be created and updated with an empty description.
Now let’s see how to fix this. With the available information, we don’t have a clue about where to intervene.
I mean: we obviously know that we need to add a validation to the updateDescription method that we've included in Task. However, skipping steps would only lead us to generating blind spots in the development. It's not enough to throw an exception from Task; we have to make sure that the appropriate component captures it and reacts adequately. Proceeding systematically will help us prevent these risks.
In fact, the component that has the responsibility of communicating with the acceptance test in the first place is the controller, which, as we've already seen, is the one that produces the response code that we evaluate in the acceptance test. Therefore, it's the first place where we're going to intervene. Of course, by defining a test with the expected behavior.
When we run the test at this level, we see it fail because the exception is thrown but not handled. We implement the exception handling just as we did in the creation action.
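Under the same assumptions as before, the action could end up looking like this, catching the generic InvalidArgumentException that the domain will throw and translating it into a 400 response:

```php
public function modifyTask(int $taskId, Request $request): Response
{
    $payload = json_decode($request->getContent(), true);

    try {
        $this->updateTaskHandler->execute($taskId, $payload['task']);
    } catch (InvalidArgumentException $invalidTaskDescription) {
        return new JsonResponse(
            ['error' => $invalidTaskDescription->getMessage()],
            Response::HTTP_BAD_REQUEST
        );
    }

    return new JsonResponse(null, Response::HTTP_NO_CONTENT);
}
```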
This passes the controller test. If we check the acceptance test, we see that it keeps returning the same error.
The next level is the use case which, as we've seen previously, is irrelevant here because it will simply let the exception rise. As we know, it's Task that must take responsibility, so now's the time to tackle that change, defining the desired behavior in a test:
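For example, as a new method in the Task unit test:

```php
/** @test */
public function shouldFailUpdatingWithAnEmptyDescription(): void
{
    $task = new Task(1, 'Write a test that fails');

    $this->expectException(InvalidArgumentException::class);

    $task->updateDescription('');
}
```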
As there's nothing implemented yet, the test will fail.
We begin with a pretty obvious implementation:
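Something along these lines, mirroring for the moment the validation that the constructor already applies:

```php
public function updateDescription(string $newTaskDescription): void
{
    if ($newTaskDescription === '') {
        throw new InvalidArgumentException('Task description should not be empty');
    }

    $this->description = $newTaskDescription;
}
```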
The Task unit test is now green. Before anything else, we rerun the acceptance test to see if we've solved the problem and haven't left any loose ends. And indeed, everything works!
Nonetheless, we could refactor our solution a bit, since we're now maintaining the same business rule in two places at once. We should unify it. To do so, we'll use self-encapsulation: that is, we'll create a private method that assigns and validates the description's value. This is how Task looks after this change.
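A sketch of the refactored Task, with the rule unified in a private setDescription method (again, the exact fields and accessors are assumptions on my part):

```php
final class Task
{
    private int $id;
    private string $description;
    private bool $completed = false;

    public function __construct(int $id, string $description)
    {
        $this->id = $id;
        $this->setDescription($description);
    }

    public function updateDescription(string $newTaskDescription): void
    {
        $this->setDescription($newTaskDescription);
    }

    public function description(): string
    {
        return $this->description;
    }

    private function setDescription(string $description): void
    {
        // The business rule now lives in a single place
        if ($description === '') {
            throw new InvalidArgumentException('Task description should not be empty');
        }

        $this->description = $description;
    }
}
```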
And with this, we've implemented the new user story. You've probably noticed that in every case, be they new user stories, modifications of features, or corrections of defects, our procedure is always the same: define the desired behavior with a test, and add the necessary production code to make it pass.
Epilogue
TDD and quality of life (yours)
Whether as employees or as freelancers, we sell our time and work to companies and clients. One thing that distinguishes our profession from others is the fact that we sell intellectual work. Sometimes, even high-level intellectual work.
So, taking care of our mind and our intelligence seems like a reasonable activity that we should practice frequently.
There are many people in the software development world who think, or even claim, that testing is hard or expensive. And that's without even talking about Test Driven Development. But what we want to show is that TDD is the most advisable path if you want to have a healthier software development life.
But first, let's take a look at a couple of questions about how our brain works.
Knowledge in the world, knowledge in the head
Doors
Do you know how to use a door? Are you sure? Have you ever seen an instruction manual for a door? I have. Lots of them, actually: all of those doors with signs indicating whether you have to pull or push. I bet you've come across more than one of those.
Have you ever found yourself in front of a closed door and not known how to open it? I have. In fact, there are dozens of them around the world, such as automatic sliding doors with poorly adjusted sensors, or doors that open inwards when everything suggests that they open outwards.
The point is that a door should be something easy to use, and that doesn't always happen. The way to use a door should be obvious, right?
Switches
And what if we talk about switches? I mean those switch panels whose arrangement is not related to that of the lights they control. Sometimes they’re located in corners where you can’t even see the lamps, and you have to try and fail many times until you find the secret combination that turns on the lamp that you want.
The relationship between a switch and the light that it controls should be obvious, right?
Let’s talk about the obvious
When we talk about something obvious, we refer to knowledge that we shouldn’t have to search our head for. The knowledge is there, in the world. We just have to use it while doing other stuff. We want to be able to open doors and turn on lights without having to think about it for even a fraction of a second.
For this reason, when we're forced to think about things that should be obvious, we're wasting part of our mental resources, taking up space in our working memory that we'd prefer to, or even should, be using for other purposes.
Knowledge is in the world when all of the clues that we need to use or interact with an object are present in the object itself. That's why we don't need to worry, reason, or remember how to use it. When we do need to (that is, when we need to reason or remember instructions), we have to put the knowledge in our head in order to achieve our goal.
Therefore, if we have access to more knowledge in the world when we execute a task, we need less knowledge in the head, leaving more free space that we can use to think better about what we’re doing.
The less we have to think about the way of using the tools, the more we can think about what we’re doing with them.
But, how much space do we have in our working memory?
Well… it’s actually not a lot.
The capacity of our working memory
Our long-term memory has a practically infinite storage capacity. Think of it as a huge intelligent hard drive that can store memories and data for years and years. It's not passive storage: in fact, it's constantly rebuilding our memories in order to save and retrieve things. This is important, because when we think, we have to use our working memory to hold the data we're using. Much like a computer.
However, our working memory is quite different from our long-term memory. Some call it "short-term memory", while others refer to it as "working memory". I think we can see it as a processor, with registers that can store a limited number of information units, called chunks, while it works. The chunks can vary in size, but they are meaningful units.
As computer programmers, a good way of understanding these chunks is to think about them as pointers that indicate memory positions. Those positions can have structures of any size. Sometimes they’re very small, like a single letter or number, and other times they’re enormous.
Can you remember a phone number? I bet you’ve grouped the digits so you only have to retain two or three numbers.
This is because our processor can only handle a limited number of chunks: approximately 7 (plus or minus 2). It's something that changes with age and between individuals, but it's a very good approximation. Therefore, we try to save as many registers as we can, grouping the information into chunks and keeping some of the registers free.
What happens if we fill up all of the registers? Well, our precision and speed at the task decrease, and errors increase. In general, we perform worse if we try to keep too many things in our working memory at the same time.
Of course, this is an oversimplification. However, I think you get the idea. We can reduce the overload if we put the knowledge in the world instead of keeping it inside our head. This way, our performance in any task will improve.
With practice, you can take knowledge out of your working memory and put it out in the world. This is what happens when we learn a new technique, try to apply a new feature of a programming language, or use a new tool. In the beginning we go slowly and make mistakes. We need time to automate things in our mind while we put knowledge in the world.
It's time to go back to the main goal of this chapter. Let's talk about our life as developers.
A day in the life
Programming without tests
Let’s analyze for a second what happens when we program without doing tests.
Actually, we’re always doing tests, but they’re frequently manual ones. We call it “debugging”. We use a trial and error process: Does this work? No…? Try again. Yes…? Keep going.
We try to write code and verify that it works at the same time that we write it, until it looks like it’s finished. After that, we try to verify that the code works as a whole. Then we realize that we had forgotten some details… After that, we deploy and discover new details that don’t work, so we need to fix them.
At the end of the day, we find ourselves suffering from huge headaches and under the impression of having missed something.
This happens because we try to keep all of the information in our head at the same time (remember, it has a limited capacity!). We overload ourselves. The best strategy is to write a list of objectives and tasks, and try to keep a certain organization and focus using this external support.
For example, writing a simple API endpoint requires a bunch of things:
1. An action in a controller
2. A route to that controller
3. A use case or command that executes that action
4. Probably one or more domain entities and their repository
5. The definition in the dependency container
6. Probably some service
7. Its definition in the dependency container
8. A response object
9. etc.
With this, our memory overloads, as we greatly exceed the 7±2 items. This explains why we feel tired and stressed, with the feeling that we might have missed or forgotten something, and unsure about what we're doing or whether we've left anything important behind.
So, let’s take a look at how we’d execute the same process, this time with testing at the end.
Programming with tests at the end
This is actually almost the same, but now there are tests at the end of the process. The kind of tests that we automate.
The end result is better, because now we’re more confident in the code. But we still have that same headache at the end of the day.
Yes. We’ve done the same amount of work, with the same mental overload and with the addition of having to write a test suite, while our brain screams at us: “Hey! The work is finished! What are you doing?”
In these conditions, it’s possible that our tests are not the best tests ever. It’s also possible that the suite doesn’t cover all of the possible scenarios.
In fact, we’re already tired when we start the testing phase. This explains why a lot of people think that testing is hard and even painful.
So the tests improve our confidence in the code, but at the cost of forcing us to do a lot of extra work. Our life isn’t made better by tests, even if we sleep better at night. Then, what is it that is wrong?
To improve your life, you should try a different approach. You should try Test Driven Development.
Programming with TDD
This is what TDD is about: one thing at a time, and postponing decisions.
Write a simple failing test: don't write any production code until you have a test.
Add code to make the test pass: write no more and no less than necessary.
Review the code to improve things, but don’t implement anything new and keep the existing tests passing.
Let’s see the process from the point of view of our working memory model. When we write the first failing test we’re focusing on that test. Therefore, we don’t have to waste our attention on anything besides it. Writing the test also means that we put the knowledge that we need in the world. Our memory is almost unoccupied.
Then, we focus on writing the code necessary to pass the test. The knowledge that we need is in the test, not in our head, and it's exactly what we need to achieve our most immediate goal.
We only have to think about the way to make the test pass. If it’s the first test, we only need to write the most obvious implementation that is possible. Even if that implementation is as simple as returning the exact same response that the test expects.
And once the test has passed, we can take a look at the code and see if we can make any improvement by refactoring. We don’t have to add any feature. We must keep the test passing while we tidy things up, eliminating unnecessary duplication, introducing better names, etc.
We'll repeat the cycle until the functionality is completely implemented. We don't need to write extra tests, and we don't run the risk of having forgotten anything. Our head doesn't hurt. We've used our brain to think, avoiding the overload.
It's not magic, it's TDD. Of course, this requires training. TDD is an intellectual tool, and the use of a tool should be automated. Therefore, you should do exercises such as the kata, both by yourself and with the help of colleagues, in a practice community, in whatever way best suits you and your team. Practice, practice and practice. Once you're able to proceed step by step, you'll find that you're happier and less stressed in the medium and long term.
A final piece of advice
Store as much knowledge in the world as you need: use a backlog, use post-its, write a to-do list, draw diagrams, models, concept maps… Free your mind and leave room to work on one thing at a time.
TDD is more than writing tests. It's putting the knowledge you need in the code and freeing your mind. It's postponing decisions until the moment you're ready to make them.
For real: try TDD, and your life as a developer will improve.