Table of Contents
Basic TDD concepts
- What is TDD and why should I care about it?
- Basic concepts
- Coding-dojo and kata
- The laws of TDD
- Fizz Buzz
Solving the Fizz Buzz kata
- Statement of the kata
- Language and focus
- Define the class
Define a behavior for
- Generate a list of numbers
- We keep generating numbers
- The test that doesn’t fail
- Learning to say “Fizz”
- Saying “Fizz” at the right time
- Learning to say “Buzz”
- Saying “Buzz” at the right time
- Learning to say “FizzBuzz”
- Saying “FizzBuzz” at the right time
- Wrapping up
- What have we learned in this kata
- Selection of examples and finalization criteria
- Evolution of the behavior through tests
- Prime Factors
Solving the Prime Factors kata
- Statement of the kata
- Language and approach
- Define the function
- Define the function’s signature
- Obtaining more information about the problem
- Introducing a test that doesn’t fail
- Questioning our algorithm
- Discovering the multiples of 2
- Introducing more factors
- New divisors
- The shortest path isn’t always the fastest
- Introducing new factors, second try
- More than two factors
- Do we have any criteria to select new examples?
- What have we learned in this kata
- The choice of the first test
Solving the NIF kata
- Statement of the kata
- Language and focus
- Create the constructor function
- Implement the first validation
- A test to rule them all
- Complete the validation of the length and start examining the structure
- The not very clean way of changing the test and production code at the same time
- Moving forward with the structure
- Invert the conditional
- The end of the structure
- Compacting the algorithm
- Finishing the structural validation
- Compacting the validation
- Let’s look on the bright side
- Changing the public interface
- NOW it’s time
- Moving forward with the algorithm
- More refactoring
- Validating more control letters
- A refactoring, for even more simplicity
- NIE support
- What have we learned in this kata
- The refactoring phase
- Bowling game
Solving the Bowling Game kata
- Statement of the kata
- Language and approach
- Starting the game
- Let’s throw the ball
- Time to refactor
- Counting the points
- The world’s worst thrower
- Organizing the code
- Teaching our game to count
- A step back to reach further
- Recovering a cancelled test
- Getting more comfortable
- How to handle a spare
- Introducing the concept of frame
- Continue handling spare
- Removing magic numbers and other refactorings
- Reorganizing the game knowledge
- The world’s best player
- What have we learned in this kata
Solving the Greetings kata
- Statement of the kata
- Language and approach
- Basic greeting
- Generic greeting
- Use the parameter
- Back to the generic greeting
- Answering with a yell
- Be able to greet two people
- Getting ready for several names
- A refactoring before proceeding
- Reintroducing a test
- Handle an indeterminate number of names
- Shout to the shouters, but only to them
- Separate names that contain commas
- Escaping commas
- What have we learned in this kata
- TDD approaches
- To-do list project
- Mockist outside-in
- Classic outside-in TDD
TDD in real life
- Task list, outside-in TDD sliced in user stories
- Fixing bugs with TDD
- Adding new features
There is no correct (or, for that matter, incorrect) way of reading this book. It all comes down to your interests.
To begin with, it’s structured in four main parts that may be read separately:
The first introduces the basic concepts of TDD, as well as some strategies for learning this discipline and introducing it into your practice.
In the second, a selection of kata, or code exercises, is introduced, through which the concepts and techniques of Test Driven Development are explained in depth in their classic definition. They range from some that are very well known to some of my own creation.
Each of the kata is organized as follows:
- A theoretical chapter devoted to a relevant aspect of TDD that the kata highlights, and on which I have put special emphasis while solving it.
- An introduction to the kata, its origin if it’s known, its problem statement, and a series of recommendations and points of interest about it.
- A solution developed in a different programming language and explained in detail. There’s a repository with solutions to the kata in various languages.
The third part introduces the outside-in TDD methodology. Outside-in TDD is an approach that seeks to boost the design phase, and can be applied to real project development.
The fourth part showcases an example of a realistic project and how TDD can be incorporated into the various stages of development and maintenance, from the creation of a minimum viable product (MVP) to defect resolution and the incorporation of new features.
If you’re looking for a manual to learn TDD from scratch, my advice would be to read it in order. The code exercises are laid out to introduce the concepts in a specific progression, one that I arrived at through personal experience and through teaching TDD to other people.
In the beginning, you might think that the TDD exercises are too trivial and unrealistic. Keep in mind that the name kata is not a coincidence. A kata, in martial arts, is a repetitive exercise that is practiced until its movements become automatic, and beyond. If you practice any sport, you’ll have done dozens of exercises aimed at increasing your flexibility, strength, mobility, and automatic responses, without them having a direct application to that sport. TDD kata have that same function: they prepare your brain to automate certain routines, build specific habits, and detect particular patterns in the development process.
The outside-in approach may well seem much more applicable to your daily work. In fact, it’s a way of developing projects using TDD. However, a solid grounding in classic TDD is fundamental to using it successfully. Outside-in is very close to Behavior Driven Development.
As mentioned before, the various parts and exercises are relatively independent. If you already have some experience with the Test Driven Development discipline, you can jump directly to the sections or exercises that interest you. Oftentimes you’ll discover something new: one of the things I’ve found is that, even if you have practiced the same exercise dozens of times, new ideas always emerge.
If you’re looking to introduce TDD into your own or your team’s workflow, you may want to skip directly to the part about TDD in real life. It’s the one that depends most, so to speak, on previous knowledge and experience. In that case, if you feel you lack fluency in TDD, you may need to take a look at other parts of the book first.
To reach a good level of TDD performance you should practice the exercises many times. I’m not talking three or four; I’m talking dozens of times, at different points in your professional life and, ideally, in different languages. There are several kata repositories in which to find exercises, and you can also invent or discover your own.
It’s also advisable to look at how other people do these exercises. Lots of kata examples in a variety of programming languages are available on the web, and they are a great way of comparing your solutions and process.
And last but not least, one of the best ways to learn is practicing with other people, be it in work projects, training sessions, or practice communities. Live discussion of the solutions, the size of the steps, the behavior to test, and so on, will help hone and strengthen your development process.
For this book some assumptions are made:
- That you have some experience with a programming language and with a testing environment for that language. In other words: you know how to write and run tests. It doesn’t matter if your favorite language isn’t covered in this book.
- This book’s examples are written in various languages and, as far as possible, avoid language-specific features. In fact, I am inexperienced in many of these languages, and therefore the code may appear very simple. That said, simplicity is desirable in TDD, as you’ll see throughout the book.
- It’s clear to you that the objective of a code exercise is not so much solving the problem as posed, which eventually gets solved, as the path through which we arrive at that solution.
- You understand that there’s neither a unique solution nor a precise path in the resolution of the kata. If your solution doesn’t perfectly match the one showcased in this book, it’s not a problem.
The proposed solutions to the kata are provided as explained examples of the reasoning processes that might be followed. They are not ideal solutions. When you do your own version you might follow a completely different process that is as valid as the one presented here, or even more so.
On the other hand, successive executions of the same kata by the same person could lead them to different solutions and paths. That is one of its benefits: by getting used to and automating certain thinking patterns, we can focus on more details each time and find new and more interesting points of intervention.
Likewise, as our fluency in a programming language increases, the implementations we achieve can be better and more elegant.
While preparing the kata presented in this book I worked through several versions, in different languages, in order to find the most interesting routes, and even purposely caused some problems that I wanted to highlight. The solution that I finally decided to publish in each case is oriented towards some point about the TDD process that I wanted to emphasize, so it may not always be the optimal one.
That is to say, in a way, the kata come with a catch: they force things to the point where they best serve a didactic objective.
On another note, I have taken advantage of this project to force myself to experiment with different programming languages. Some of them are new to me or I have very little experience with them, so the implementations may be especially rough or may not use some of their more specific and optimal features.
In this first part we’ll introduce the basic concepts needed to understand what Test Driven Development is and how it differs from other disciplines and methodologies that use tests. We’ll also talk about how you can learn TDD, be it individually or in a team or practice community.
The first chapter has a general introduction to the process of Test Driven Development.
The chapter about basic concepts is a glossary that we’ll make use of throughout the book.
Finally, the chapter about coding-dojo and kata proposes some simple ideas to start practicing with a team or by oneself.
Test Driven Development is a software development methodology in which tests are written in order to guide the structure of production code.
The tests specify (in a formal, executable, and example-based manner) the behaviors that the software we’re working on should have, defining small objectives that, once achieved, allow us to build the software in a progressive, safe, and structured way.
Although we’re talking about tests, we’re not referring to Quality Assurance (from now on, QA), even though working with the TDD methodology has the side effect of producing a valid unit test suite with the maximum possible coverage. In fact, some of the tests created during TDD are typically unnecessary for a comprehensive battery of regression tests, and end up being removed as new tests make them redundant.
That is to say: both TDD and QA use tests as tools, but they use them differently in several respects. Specifically, in TDD:
- Tests are written before the software that they execute even exists.
- The tests are very small and their objective is to force writing the minimum amount of production code needed to pass the test, which has the effect of implementing the behavior defined by the test.
- The tests guide the development of the code, and the process contributes to the design of the system.
In TDD, the tests are defined as executable specifications of the behavior of a given unit of software, while in QA, tests are tools for verification of that same behavior.
Put in simpler words:
- When we do QA, we try to verify that the software that we’ve written behaves according to the defined requirements.
- When we do TDD, we write software to fulfill the defined requirements, one by one, so that we end up with a product that complies with them.
Although we will expand on this topic in depth throughout the book, we will briefly present the essentials of the methodology.
In TDD, tests are written in a way that we could think of as a dialogue with production code. This dialogue, the rules that regulate it, and the cycles that are generated by this way of interacting with code will be practiced in the first kata of the book: FizzBuzz.
Basically, it consists of:
- Writing a failing test
- Writing code that passes the test
- Improving the code’s (and the test’s) structure
Once we are clear about the piece of software which we’re going to work on and the functionality that we want to implement, the first thing to do is to define a very small first test that will fail hopelessly because the file containing the production code that it needs to run doesn’t even exist. While this is something that we’ll deal with in all of the kata, in the NIF kata we will delve into strategies that will help us to decide on the first tests.
Here’s an example in Go:
Although we can predict that the test won’t even compile or be interpreted, we’ll run it nonetheless. In TDD it’s fundamental to see the tests fail; assuming they will isn’t enough. Our job is to make the test fail for the right reason, and then make it pass by writing production code.
The error message will tell us what to do next. Our short-term goal is to make that error message disappear, as well as those that might come after, one by one.
For instance, after introducing the decToRoman function, the error will change. Now it’s telling us that the function should return a value:
It could even happen that we get an unexpected message, for example one saying that we tried to load the Book class when, as it turns out, we had mistakenly created a file named brok. That’s why it’s so important to run the test and see whether it fails and exactly how it does so.
This code results in the following message:
This error tells us that we have misspelled the name of the function, so we start by correcting it:
And we can continue. Since the test states that it expects the function to return “I” when we pass it 1 as input, the failing test should tell us that the actual result doesn’t match the expected one. However, at the moment, the test is telling us that the function doesn’t return anything. That is still a compilation error, and still not the right reason to fail.
To make the test fail for the reason we expect, we have to make the function return a string, even if it’s an empty one.
So, this change turns the error into one related to the test’s expectation, since the function isn’t returning the result the test expects. This is the right reason for failure, the one that will force us to write the production code that passes the test.
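The original Go listings for this walkthrough aren't reproduced here, so what follows is only a sketch, assuming the decToRoman(decimal int) string signature implied by the text (the parameter name decimal is our assumption). Go keeps tests in separate _test.go files run by go test, so this single listing imitates the test with a plain check in main. With the function returning an empty string, the program compiles and the check fails because of a value mismatch, which is the right reason to fail.

```go
package main

import "fmt"

// decToRoman compiles and returns a string, but not the expected one yet.
// The parameter name "decimal" is an assumption made for this sketch.
func decToRoman(decimal int) string {
	return ""
}

func main() {
	// Hand-rolled stand-in for the test: 1 should become "I".
	expected := "I"
	actual := decToRoman(1)
	if actual != expected {
		// The failure we want: the code runs, but the result is wrong.
		fmt.Printf("FAIL: decToRoman(1) = %q, expected %q\n", actual, expected)
		return
	}
	fmt.Println("PASS")
}
```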
And so we would be ready to take the next step:
As a response to the previous result, we write the production code that is needed for the test to pass, but nothing else. Continuing with our example:
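Since the book's own listing isn't reproduced here, this is a hedged sketch of the same hypothetical example: the minimum production code for the first test just returns the constant that the test expects. Hard-coding "I" may look like cheating, but it is exactly the minimum amount of code needed to pass; later examples are what force the implementation to become general.

```go
package main

import "fmt"

// decToRoman has just enough production code to pass the first test:
// it ignores its input and always answers "I".
func decToRoman(decimal int) string {
	return "I"
}

func main() {
	// Stand-in for the test: same expectation as before, now satisfied.
	if got := decToRoman(1); got != "I" {
		fmt.Printf("FAIL: decToRoman(1) = %q, expected %q\n", got, "I")
		return
	}
	fmt.Println("PASS")
}
```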
After seeing the test fail, we can start creating the file that’ll contain the unit under test. We could even rerun the test now, which would probably cause the compiler or interpreter to throw a different error message. At this point everything depends a bit on the circumstances, such as the conventions of the language we’re using, the IDE we’re working with, etc.
In any case, it’s a matter of taking small steps until the compiler or interpreter is satisfied and can run the test. In principle, the test should run and fail indicating that the result received from the unit of software doesn’t match the expected one.
At this point there’s a caveat, because depending on the language, the framework, and some testing practices, the concrete way of writing this first test may vary. For example, some test frameworks only require the test not to throw any errors or exceptions in order to succeed, so a test that simply instantiates an object or invokes one of its methods is enough. In other cases the test must include an assertion, and if none is made it’s not considered as passing.
In any case, this phase’s objective is to make the test pass.
With the Prime Factors kata we’ll study how production code can change to implement new functionality.
When every test passes, we should examine the work done so far and check whether it’s possible to refactor both the production and the test code. Here we apply the usual principles: if we detect any code smell, difficulty in understanding what’s happening, knowledge duplication, etc., we must refactor the code to improve it before continuing.
Ultimately, the questions at this point are:
- Is there a better way to organize the code that I’ve just written?
- Is there a better way to express what this code does and make it easier to understand?
- Can I find any regularity and make the algorithm more general?
For this reason, every test that we’ve written and made pass must keep passing while we refactor. If any of them turns red, we have a regression on our hands and we have spoiled, so to speak, functionality that was already implemented.
It’s usual not to find many refactoring opportunities after the first cycle, but don’t get comfortable just yet: there’s always another way of seeing and doing things. As a general rule, the earlier you spot opportunities to reorganize and clean up your code, and act on them, the easier development will be.
For instance, we’ve created the function under test in the same file as the test.
It turns out there’s a better way to organize this code: creating a new file to contain the function. In fact, it’s a recommended practice in almost every programming language, although we may have skipped it at first.
And, in the case of Go, we can make it an exported function by capitalizing its name.
We’ll delve further into everything that has to do with refactoring in the Bowling Game kata.
Once the production code passes the test and is as nicely organized as it can be in this phase, it’s time to choose another aspect of the functionality and create a new failing test to describe it.
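A sketch of the result of this refactoring, assuming the function moves to its own file (roman.go is a hypothetical name). Only the name changes: DecToRoman starts with a capital letter, so it is an exported identifier. That only matters in a non-main package (for example, a hypothetical package roman whose function callers would use as roman.DecToRoman); here everything stays in package main so the listing runs on its own.

```go
// roman.go (hypothetical file name): the production code now lives apart
// from its test. Both are shown in one listing here for brevity.
package main

import "fmt"

// DecToRoman is capitalized, which makes it exported. In a non-main package,
// other packages could then call it.
func DecToRoman(decimal int) string {
	return "I"
}

// main stands in for the test: moving and renaming didn't change the behavior.
func main() {
	fmt.Println(DecToRoman(1) == "I") // prints true
}
```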
This new test fails because the existing code doesn’t cover the desired functionality and introducing a change is necessary. Therefore, our mission now is to turn this new test green by making the necessary transformations in the code, which will be small if we’ve been able to size our previous tests properly.
After making this new test pass, we search for refactoring opportunities to achieve a better code design. As we advance in the development of the piece of software, we’ll see that the possible refactorings become more and more significant.
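To make the cycle concrete, here is a hedged continuation of the same hypothetical example (the book works through such steps in the kata chapters, not here). A new test expecting 2 to become "II" fails against the hard-coded "I". The quickest fix is another conditional, and after a third example (3 becomes "III") the refactoring step can replace the conditionals with a general rule. Using strings.Repeat is our choice. The rule only covers 1 to 3, so an example such as 4 would fail and drive the next change.

```go
package main

import (
	"fmt"
	"strings"
)

// decToRoman after a few cycles: the examples 1, 2 and 3 have been
// generalized into "repeat I as many times as the number". This only
// holds for 1..3; an example such as 4 -> "IV" would fail and force
// the next change.
func decToRoman(decimal int) string {
	return strings.Repeat("I", decimal)
}

func main() {
	// Every example written so far must keep passing after each change.
	examples := []struct {
		decimal int
		roman   string
	}{
		{1, "I"},
		{2, "II"},
		{3, "III"},
	}
	for _, ex := range examples {
		if got := decToRoman(ex.decimal); got != ex.roman {
			fmt.Printf("FAIL: decToRoman(%d) = %q, expected %q\n", ex.decimal, got, ex.roman)
			continue
		}
		fmt.Printf("PASS: decToRoman(%d) = %q\n", ex.decimal, ex.roman)
	}
}
```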
In the first cycles we’ll begin with name changes, constant and variable extraction, etc. Then we’ll advance to introducing private methods or extracting certain aspects as functions. At some point we’ll discover the necessity of extracting functionality to helper classes, etc.
When we’re satisfied with the code’s state, we keep on repeating the loop as long as we have remaining functionality to add.
The obvious answer to this question could be: when all the functionality is implemented.
But, how do we know this?
Kent Beck suggested making a list of all of the aspects that would have to be fulfilled to consider the functionality as complete. Every time any one of them is attained it’s crossed off the list. Sometimes, while advancing in the development, we realize that we need to add, remove or change some elements in the list. It’s good advice.
There is a more formal way of making sure that a piece of functionality is complete. Basically, it consists of not being able to create a new failing test. Indeed, if an algorithm is implemented completely, it will be impossible to create a new test that can fail.
The result or outcome of Test Driven Development is not flawless software free of defects, although many defects are prevented; nor is it a suite of unit tests, although in practice one is indeed obtained, with a coverage that can even reach 100% (with the tradeoff that it may contain redundancy). None of these are TDD’s objectives; at most, they are certainly beneficial side effects.
Even though we use the same tools (tests), we use them for different purposes. In TDD, testing guides development, setting specific objectives that are reached by adding or changing code. The result of TDD is a suite of tests that can be used in QA as regression tests, although we frequently have to retouch those tests in some way or other: in some cases to delete redundant tests, and in others to ensure that all the cases are well covered.
In any case, TDD helps enormously in the QA process because it prevents many of the most common flaws and contributes to building well structured and loosely coupled code, aspects that increase software reliability, our ability to intervene in case of errors, and even the possibility of creating new tests in the future.
TDD is a tool to aid in software design, but it doesn’t replace it.
When we develop small units with some very well defined functionality, TDD helps us establish the algorithm design thanks to the safety net provided by our tests.
But when considering a larger unit, a prior analysis that leads us to a “sketch” of the main elements of the solution gives us a development framework.
The outside-in approach tries to integrate the design process into the development process, using what Sandro Mancuso calls Just-in-time design: we start from a general idea about how the system will be structured and how it will work, and we design within the context of the iteration that we’re in.
What TDD provides us is a tool that:
- Guides the software development in a systematic and progressive way.
- Allows us to make verifiable claims about whether the required functionality has been implemented or not.
- Helps us avoid the need to design all of the implementation details in advance, since it’s a tool that helps the software component design in itself.
- Allows us to postpone decisions at various levels.
- Allows us to focus on very concrete problems, advancing in small steps that are easy to reverse if we introduce errors.
Several studies have found evidence suggesting that applying TDD benefits development teams. The evidence is not conclusive, but research tends to agree that with TDD:
- More tests are written
- The software has fewer flaws
- Productivity does not decrease, and may even increase
It’s quite difficult to quantify the advantages of using TDD in terms of productivity or speed, but subjectively, many benefits can be experienced.
One of them is that TDD can lower the cognitive load of development. This is because it favors dividing the problem into small tasks with a very well-defined focus, which helps us make the most of the limited capacity of our working memory.
Anecdotal evidence suggests that developers and teams introducing TDD reduce defects, diminish time spent on bugs, increase deployment confidence, and productivity is not adversely affected.
- Test Driven Development1
- Why Test-driven Development2
- Test driven development: empirical body of evidence3
- Does Test-Driven Development Really Improve Software Design Quality4
- 6 Misconceptions about TDD – Part 1. TDD Brings Little Business Value and Isn’t Worth it5
- TDD is about design, not testing6
- Does TDD really lead to good design?7
- Using TDD to influence design8
Next we’ll define some of the concepts that are used throughout the book. They must be understood within the context of Test Driven Development.
A test is a small piece of software, usually a function, which runs another piece of code and verifies whether it produces an expected result or effect. A test is, basically, an example of usage of the unit under test in which a scenario is defined and the tested unit is executed to see if the result matches what we had in mind.
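A minimal sketch of this definition in Go, assuming a hypothetical greet function as the unit under test (it's not taken from any kata in the book). In idiomatic Go, a test would be a TestXxx(t *testing.T) function in a _test.go file run by go test; this single-file version keeps the same three parts (scenario, execution, verification) so it can run on its own.

```go
package main

import "fmt"

// greet is a hypothetical unit under test, used only for illustration.
func greet(name string) string {
	return "Hello, " + name + "."
}

// testGreetUsesTheName is a test in the sense defined above: it sets up a
// scenario, runs the unit under test, and checks the result against what we
// had in mind. It returns nil when the expectation is met.
func testGreetUsesTheName() error {
	// Scenario
	name := "Bob"
	// Execution of the unit under test
	actual := greet(name)
	// Verification
	expected := "Hello, Bob."
	if actual != expected {
		return fmt.Errorf("greet(%q) = %q, expected %q", name, actual, expected)
	}
	return nil
}

func main() {
	if err := testGreetUsesTheName(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("PASS")
}
```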
Many languages use the notion of TestCase, a class that groups several related tests together. In this approach, each method is a test, although it’s usual to refer to the test case as just “test”.
A test as specification uses examples of how the tested piece of software is used in order to describe how it should work. Above all, significant examples are chosen, although not always in a formal way.
It contrasts with the test as verification, typical of QA, in which the piece of software is tested by choosing test cases systematically to verify that it fulfills what’s expected of it.
A failing test is a specification that cannot be fulfilled yet because the production code that lets it pass hasn’t been added. Testing frameworks typically display failing tests in red.
A passing test is a specification that runs production code which generates an expected result or response. Testing frameworks typically display passing tests in green.
Unit tests are tests that test an isolated unit of software; its dependencies are replaced by test doubles to keep their influence on the result under control.
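A sketch of a unit test with a doubled dependency, assuming hypothetical names (Clock, Greeter, fixedClock) that don't come from the book. The unit under test depends on a Clock interface, and the test replaces the real clock with a stub that always returns a fixed hour, so the actual time of day cannot influence the result.

```go
package main

import "fmt"

// Clock is the dependency of the unit under test. Putting it behind an
// interface lets a test replace it with a double.
type Clock interface {
	Hour() int
}

// Greeter is the (hypothetical) unit under test.
type Greeter struct {
	clock Clock
}

// Greet picks a greeting depending on the hour reported by the clock.
func (g Greeter) Greet() string {
	if g.clock.Hour() < 12 {
		return "Good morning"
	}
	return "Good afternoon"
}

// fixedClock is a stub: a test double that always reports the same hour.
type fixedClock struct{ hour int }

func (c fixedClock) Hour() int { return c.hour }

func main() {
	morning := Greeter{clock: fixedClock{hour: 9}}
	afternoon := Greeter{clock: fixedClock{hour: 15}}
	fmt.Println(morning.Greet())   // Good morning
	fmt.Println(afternoon.Greet()) // Good afternoon
}
```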
Integration tests usually test groups of software units, so that we can verify their communication and combined action.
Acceptance tests are integration tests that test a software system as if they were just another of its consumers. We normally write them according to the business’s interests.
A test case is a class that groups several tests together.
A test suite is a set of tests and/or test cases that can usually be executed together.
In TDD we use the name “production code” to refer to the code that we write to make tests pass and which, eventually, will end up being executed in a production system.
Software unit is a quite flexible concept that we have to interpret within a context, but it usually refers to a piece of software that can be executed in a unitary and isolated manner, even when it’s composed of many elements.
The unit under test is the software unit that is exercised in a test. There’s debate about what the scope of a unit is. At one extreme are those who consider a unit to be a function, a method, or even a class. However, we can also consider as the unit under test a set of functions or classes that are tested through the public interface of one of them.
A refactoring is a change in code that doesn’t alter its behavior or its interface. The best way to ensure this is having at least one test that exercises the piece of code being modified, so that after each change we make sure that the test keeps passing. This proves that the behavior hasn’t changed even though the implementation has been modified.
Precisely because some refactorings are very well identified and characterized, it’s been possible to develop tools that execute them automatically. These tools are available in most IDEs.
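A sketch of this idea using the usual FizzBuzz rules (multiples of 3 say "Fizz", multiples of 5 say "Buzz", multiples of both say "FizzBuzz"), which are our assumption here since the book develops them in its own kata. The two functions have different implementations, and the same expectations, plus a check that both versions agree on 1 to 100, show that the refactoring kept the behavior intact.

```go
package main

import (
	"fmt"
	"strconv"
)

// Before the refactoring: it works, but repeats the divisibility checks.
func fizzBuzzBefore(n int) string {
	if n%3 == 0 && n%5 == 0 {
		return "FizzBuzz"
	}
	if n%3 == 0 {
		return "Fizz"
	}
	if n%5 == 0 {
		return "Buzz"
	}
	return strconv.Itoa(n)
}

// After the refactoring: the answer is built up word by word.
// The implementation is different; the behavior must not be.
func fizzBuzzAfter(n int) string {
	result := ""
	if n%3 == 0 {
		result += "Fizz"
	}
	if n%5 == 0 {
		result += "Buzz"
	}
	if result == "" {
		return strconv.Itoa(n)
	}
	return result
}

func main() {
	// The same expectations are checked against the refactored version.
	examples := map[int]string{1: "1", 3: "Fizz", 5: "Buzz", 15: "FizzBuzz", 98: "98"}
	for n, expected := range examples {
		if got := fizzBuzzAfter(n); got != expected {
			fmt.Printf("FAIL: fizzBuzzAfter(%d) = %q, expected %q\n", n, got, expected)
		}
	}
	// Extra safety net for this sketch: both versions agree on 1..100.
	for n := 1; n <= 100; n++ {
		if fizzBuzzBefore(n) != fizzBuzzAfter(n) {
			fmt.Printf("behavior changed for %d\n", n)
		}
	}
	fmt.Println("done")
}
```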
In the software world we use the term kata for design and programming exercises that pose relatively simple and limited problems, which we can use to practice development methodologies.
This term is a borrowing from the Japanese word that refers to the training exercises typical of martial arts. Its introduction is attributed to Dave Thomas (The Pragmatic Programmer)1, referring to the completion of small code exercises, repeated over and over again until achieving a high degree of fluency or automation.
Applied to TDD, kata seek to train the test-production-refactoring cycle, as well as the ability to add behavior through small code increments. These exercises will help you divide functionality into small parts, choose examples, advance in the project step by step, change priorities depending on the information provided by the tests, etc.
The idea is to repeat the same kata many times. On top of gaining fluency in applying the process, each repetition offers the possibility of discovering new strategies. With repeated practice, we’ll favor the development of certain habits and pattern recognition, automating our development process to a certain extent.
You can train with kata by yourself or with others. A systematic way of doing this is through a Coding Dojo.
A coding-dojo is a workshop in which a group of people, regardless of their level of knowledge, perform a kata in a collaborative and non-competitive way.
The idea of Coding Dojo or Coder’s Dojo was introduced at the XP2005 conference by Laurent Bossavit and Emmanuel Gaillot.
The basic structure of a coding-dojo is pretty simple:
- Presentation of the problem, explanation of the exercise (5-10 min)
- Coding session (30-40 min)
- Sharing of the status of the exercise (5-10 min)
- The coding session continues (30-40 min)
- Sharing and review of the achieved solutions
The coding session can be structured in several ways:
- Prepared kata. A presenter explains how to solve the exercise, but relying on the feedback from the attendants. No progress is made until consensus is reached. It’s a very suitable way of working when the group is just starting out and few people are familiar with the methodology.
- Randori kata. The kata is done in pairs, using some system to switch between a driver (at the keyboard) and a co-pilot. The rest of the attendees collaborate by making suggestions.
- Hands-on workshop. One alternative is to have the participants form pairs and work on the kata collaboratively. Halfway through the exercise, a break of a few minutes is taken to discuss the work done so far. At the end of the session, all of the different solutions are presented (at whatever step of the assignment each team has reached). Participants can choose their preferred programming language, so it’s a great opportunity for those looking to get started with a new one. It can also be a good approach for beginners if the pair members have different levels of experience.
In the beginning it may be a good idea to attend directed kata. Essentially, a directed kata is performed by an expert as a live coding session in which they explain or comment on the different steps for the audience, so that you can easily see the dynamic in action. If you don’t have this possibility, which may be the most common scenario, it’s a good idea to watch some kata on video. You will find some links in the chapters dedicated to each kata.
Above all, the goal of the kata is to exercise the TDD discipline, the application of the three laws, and the red-green-refactor cycle. Production code is actually less important, in the sense that it’s not the main objective of the learning, although it will always be correct as long as the tests pass. However, every execution of the kata can lead us to discover new details and new ways of facing each phase.
Namely, the kata are designed to learn to develop software using tests as a guide, and to train the mindset, the reasonings, and the analysis that help us perform this task. In general, developing a good TDD methodology will help us write better software thanks to the constraints it imposes.
Obviously, your first attempts will take time: you will go down paths with no apparent way back, or you will skip some of the steps of the cycle outright. When this happens, just go back or start over from scratch. These are exercises that don’t have a unique correct answer.
In fact, every programming language, approach, or test environment could favor some solutions over others. You can perform a kata several times, starting from different assumptions each time, or applying different paradigms or conditions.
If you reach points at which you can choose between different courses of action, take note of them so that you can repeat the exercise later and try a different path to see where it leads you.
In TDD it’s really important to focus on the here and now that each failing test defines, and not get anxious about reaching the final objective. This doesn’t mean putting it aside or dedicating ourselves to something else. It simply means that we have to tread that path step by step; doing so will take us to the finish line almost without us realizing, with much less effort and more solidity. Acquiring this mindset, dealing only with the problem in front of us, will help us reduce stress and think more clearly.
If possible, try repeating the same kata using different languages, even different testing frameworks. The two best known families are:
- xSpec frameworks, which are TDD-oriented and tend to favor testing by example, providing specific syntax and utilities. Their drawback is that they don’t usually work well for QA.
- xUnit frameworks, which are the most generic testing frameworks, albeit more QA-oriented. Nevertheless, they can be used for TDD without any problems.
Introducing the TDD methodology in development teams is a complex process. Above all, it’s important to help build a culture that’s open to innovation, to quality, and to learning. The greatest reluctance often comes from the fear that TDD will slow development down, or from not seeing, at first, how it applies to everyday problems.
I personally believe that using both formal and informal channels can be of interest. Here are some ideas.
- Establishing a weekly time slot, one or two hours, for a coding-dojo open to the whole team. Depending on the level of expertise, it could start with directed kata, hands-on sessions, or whatever format seems most appropriate. Ideally, several people would be able to lead these sessions.
- Bringing experienced people into the teams who can help introduce TDD in pairing or mob-programming sessions, guiding their coworkers.
- Organizing specific trainings, with external help if people with enough experience aren’t available.
- Introducing (if there isn’t one already) a technical blog in which to publish articles, exercises, and examples about the topic.
- The Programming Dojo
- What is coding dojo
- The Coder’s Dojo – A Different Way to Teach and Learn Programming - Abstract
In this part we present a series of code exercises with which we’ll explore in depth how Test Driven Development is done.
We’ll use the discipline’s classic style or approach. TDD is a software development methodology rediscovered by Kent Beck, based on the way the first computer programs used to be built. Back then, calculations were first carried out by hand, so as to have a reference for the expected result that the computer would then have to reproduce. In TDD, we write a very simple program that tests whether the result of another program matches the expected one. The key here is that this other program hasn’t been written yet. It’s that simple.
The methodology was presented by Beck in his book Test Driven Development: By Example, in which, among other things, he teaches how to build a testing framework using TDD. Since then, various authors have contributed to the refinement and systematization of the model.
Since the introduction of the TDD methodology by Kent Beck, there has been an effort to define a simple framework that serves as a guide for its application in practice.
Initially, Kent Beck proposed two very basic rules:
- Don’t write a line of new code unless you first have a failing automated test.
- Eliminate duplication.
That is, to be able to write production code, first we must have a test that fails and that requires us to write that code, precisely because that’s what’s needed to pass the test.
Once we’ve written it and checked that the test passes, our effort goes towards reviewing the written code and eliminating duplication as much as possible. This is very generic, because on the one hand it refers to refactoring, and on the other hand to the coupling between the test and the production code. And being so generic, it’s hard to translate into practical actions.
On top of that, these rules don’t tell us anything about how big the jumps in the code involved in each cycle should be. In his book, Beck suggests that the steps (or baby steps) can be as small or as big as we find useful. In general, he recommends using small steps when we’re unsure or have little knowledge about the algorithm, while allowing larger steps if we have enough experience and knowledge to be sure about what to do next.
With time, and starting from the methodology learnt from Beck himself, Robert C. Martin established the “three laws”, which not only define the action cycle in TDD, but also provide some criteria about how large the steps should be in each cycle.
- It’s not allowed to write any production code except to make a failing unit test pass.
- It’s not allowed to write more of a unit test than is sufficient to fail; and compilation errors are failures.
- It’s not allowed to write more production code than is sufficient to pass the one failing unit test.
The three laws are what make TDD different to simply writing tests before code.
These three laws impose a series of restrictions whose objective is to force us to follow a specific order and workflow. They define several conditions that, when fulfilled, generate a cycle and guide our decision-making. Understanding how they work will help us make the most of TDD and produce quality code that we’re able to maintain.
These laws have to be fulfilled all at the same time, because they work together.
The first law states that we can’t write any production code unless it is to make an existing, currently failing unit test pass. This implies the following:
- There has to be a test that describes a new aspect about the behavior of the unit that we’re describing.
- This test must fail because there isn’t anything that makes it pass in the production code.
In short, the first law forces us to write a test that defines the behavior that we’re about to implement in the unit of software that we want to develop, all before having to consider how to do it.
Now, how should this test be?
It’s not allowed to write more of a unit test than is sufficient to fail; and compilation errors are failures
The second law tells us that the test must be just sufficient to fail, and that we have to consider compilation errors (or their equivalent in interpreted languages) as failures. For example, these include errors as obvious as the class or function not existing or not having been defined yet.
We must avoid the temptation of writing a skeleton of the class or function before writing the first test. Remember that we’re talking about Test Driven Development. Therefore, it’s the tests that tell us what production code to write and when to write it, and not the other way around.
For the test, “being sufficient to fail” means that it must be very small in several ways, and this is quite difficult to define at first. We often talk about the “simplest” test, the simplest case, but that isn’t quite it.
What conditions must a TDD test meet, especially the first one?
Well, basically it should force us to write the minimum possible amount of code that can be executed. That minimum, in OOP, would be to instantiate the class that we want to develop without worrying about any further details, for now. The specific test will vary a little depending on the language and testing framework that we’re using.
Let’s take a look at this example from the Leap Year kata, where we write code to find out whether a year is a leap year or not. In the example, my intention is to create a Year object that I can ask whether it’s a leap year by sending it the message IsLeap. I’ve come across this exercise in several kata compilations. For this chapter the examples will be written in C#.
The rules are:
- The years not divisible by 4 aren’t leap years (for example, 1997).
- The years divisible by 4 are leap years (1996), unless:
- If they’re divisible by 100 they aren’t leap years (1900).
- If they’re divisible by 400 they are leap years (2000).
Our goal would be to be able to use Year objects in this manner:
The usual impulse is to try and start in the following way, because it looks like it’s the example of the simplest possible case.
However, it isn’t the simplest test, the one that could fail for only one reason. Actually, it can fail for at least five different reasons:
- The Year class doesn’t exist yet.
- It wouldn’t accept parameters passed to the constructor either.
- It doesn’t respond to the IsLeap message.
- It could return nothing.
- It could return an incorrect response.
That is, we can expect the test to fail for those five causes, and only the fifth is the one that’s actually being described by the test. We have to reduce them to just one.
In this case, it’s very easy to see that there’s a dependency between the various causes of failure, in such a way that for one to surface, the previous ones have to be solved. Obviously, it’s necessary to have a class that we can instantiate. Therefore, our first test should be much more modest and just expect that the class can be instantiated:
If we ran this test, we would see it fail for obvious reasons: the class that we try to instantiate isn’t anywhere to be found. The test fails because of a compilation (or equivalent) problem. Therefore, it could be a test that’s sufficient to fail.
Throughout the process we’ll see that this test is redundant and that we can do without it, but let’s not get ahead of ourselves. We still have to make it pass.
The first and the second laws state that we have to write a test and tell us what that test should be like. The third law tells us what production code should be like: the condition is that it must pass the test that we’ve written.
It’s very important to understand that it’s the test that tells us what code we need to implement. Therefore, even though we’re certain that it’s going to fail because we don’t even have a file with the code that defines the class, we still must run the test and read its error message.
That is: we must see that the test, indeed, fails.
The first thing that it’ll tell us when trying to run it is that the class doesn’t exist. In TDD, that’s not a problem, but rather an indication about what we have to do: add a file that contains the definition of the class. It’s possible that we’re able to generate that code automatically using the IDE tools, and it would be advisable to do it that way.
In our example, the message of the test says:
And we would simply have to create the Year class:
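Sketched in Python instead of the chapter’s C#, the minimum code that turns this test green is an empty class. The Python port and its names are our assumption.

```python
import unittest


class Year:
    # Minimal production code: the class only has to exist.
    pass


class YearTestCase(unittest.TestCase):
    def test_can_instantiate(self):
        Year()
```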
At this point we run the test again to make sure it passes from red to green. In many languages this code will be enough. In some cases you might need something more.
If this is so and the test passes, the first cycle is complete and we can move on to the next behavior, unless we think there’s an opportunity to refactor the existing code. For example, the usual thing to do here would be to move the Year class to its own file.
If the test hasn’t passed, we’ll look at the message of the failed test and we’ll act accordingly, adding the minimum code necessary for it to finally pass and turn green.
When we’ve managed to pass the first test by applying the three laws, we might think that we haven’t really achieved anything. We haven’t even tackled the possible parameters that the class might need in order to be constructed, be they data, collaborators (in the case of services), or use cases. Even the IDE is complaining that we’re not assigning the instantiated object to any variable.
However, it’s important to adhere to the methodology, especially in these first stages. With practice and the help of a good IDE, the first cycle will have taken a few seconds at most. In these few seconds we’ll have written a piece of code that, while certainly very small, is completely backed by a test.
Our goal is still to let the tests dictate what code we must write to implement each new behavior. Since our first test already passes, we have to write the second.
Applying the three laws, what comes next is:
- Write a new failing test that defines a behavior
- Make this test the smallest possible one that still forces us to change the production code
- Write the minimum and sufficient production code to pass the test
What could be the next behavior that we need to define? If in the first test we’ve forced ourselves to write the minimum code necessary to instantiate the class, the second test can lead us down two paths:
- Force us to write the necessary code to validate constructor parameters and, therefore, be able to instantiate an object with everything that it needs.
- Force us to introduce the method that executes the desired behavior.
This way, in our example, we could simply make sure that Year is able to respond to the IsLeap message:
The test will throw this error message:
Which tells us that the next step should be to introduce the method that answers that message:
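Here is a Python sketch of this step, with is_leap as our hypothetical stand-in for IsLeap. The test only sends the message. Before the method existed, the same test raised an AttributeError. The method body does nothing yet.

```python
import unittest


class Year:
    def is_leap(self):
        # Enough to respond to the message; it returns None for the moment.
        pass


class YearTestCase(unittest.TestCase):
    def test_responds_to_is_leap(self):
        # The result is ignored: this only checks that the message is understood.
        Year().is_leap()
```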
The test passes, indicating that objects of type Year can now respond to the IsLeap message.
Having arrived at this point, we could ask ourselves: what if we don’t obey the three laws of TDD?
Leaving aside the easy joke that we’ll end up fined or in jail for not following the laws of TDD, the truth is that we would suffer some real consequences.
The most immediate consequence is that we break the red-green cycle. The code that we write is no longer guided or covered by tests. In fact, if we wish to have that part tested, we’ll have to write “a posteriori” tests (QA tests).
Imagine that we do this:
The existing tests will fail because we need to pass a parameter to the constructor, and also we don’t have any test that’s in charge of verifying the behavior that we’ve just implemented. We’d have to add tests to cover the functionality that we’ve introduced, but they’re no longer driving the development.
Breaking the second law can be interpreted in two ways: writing multiple tests, or writing one test that involves too big a jump in behavior.
Writing multiple tests would cause several problems. To pass them all we would need to implement a large amount of code, and the guide that those tests could be providing gets so blurry that it’s almost like having none at all. We wouldn’t have clear and concrete indications to solve by implementing new code.
Here we’ve added two tests. To make them pass we’d have to define two behaviors. Also, they’re too large. We haven’t yet established, for example, that we’ll have to pass a parameter to the constructor, nor that the response will be of type bool. These tests mix various responsibilities and try to test too many things at once. We’d have to write too much production code in one go, leading to insecurity and leaving room for errors.
Instead, we need to make tests for smaller increments of functionality. We can see several possibilities:
To introduce the fact that the answer is a bool, we can assume that, by default, years aren’t leap years, so we’ll expect a false response:
The error is:
Which can be solved with:
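In a Python sketch, the fix is the same default response, with is_leap standing in for IsLeap:

```python
import unittest


class Year:
    def is_leap(self):
        # Default response: for now, no year is a leap year.
        return False


class YearTestCase(unittest.TestCase):
    def test_is_not_a_leap_year_by_default(self):
        self.assertIs(Year().is_leap(), False)
```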
However, we have another way to do it. Since the language is strongly typed, we can use the type system as a test. Thus, instead of creating a new test:
We change the return type of IsLeap to bool:
When we run the test it will indicate that there’s a problem, as the function isn’t returning anything:
And finally, we just have to add a default response, which will be false:
To introduce the construction parameter we could use a refactoring. In this step we could be conditioned by the programming language, leading us to different solutions.
The refactoring path is straightforward. We just have to incorporate the parameter, although we won’t use it for now. In C# and other languages we could do it by introducing an alternative constructor, and in this way the tests would continue to pass. In other languages, we could mark the parameter as optional.
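Python has no constructor overloading, so in a Python sketch the natural route is the second one: an optional parameter with a default value. The existing test keeps passing untouched. The port and names are our assumption.

```python
import unittest


class Year:
    def __init__(self, year=None):
        # Optional for now, so existing tests that call Year() keep passing.
        self.year = year

    def is_leap(self):
        return False


class YearTestCase(unittest.TestCase):
    def test_is_not_a_leap_year_by_default(self):
        self.assertIs(Year().is_leap(), False)
```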
Since a parameterless constructor doesn’t make sense for us, now we could remove it, but first we’d have to refactor the tests so that they use the version with a parameter:
The truth is that we don’t need the first test anymore, since it’s implicit in the other one.
And now we can remove the parameterless constructor, as it won’t be used again in any case:
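In the same hypothetical Python port, once the test passes a year, the default value can go and the parameter becomes required. The test name and the year are ours.

```python
import unittest


class Year:
    def __init__(self, year):
        # Required now: a Year without a year doesn't make sense.
        self.year = year

    def is_leap(self):
        return False


class YearTestCase(unittest.TestCase):
    def test_1997_is_not_a_leap_year(self):
        self.assertIs(Year(1997).is_leap(), False)
```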
Breaking the third law is probably the most common violation of the three. We often come to a point where we “see” the algorithm so clearly that we feel tempted to write it right away and finish the process for good. However, this can lead us to miss some situations. For example, we could “see” the general algorithm of an application and implement it, but this could distract us and prevent us from considering one or several particular cases. This possibly incomplete algorithm, once incorporated into the application and deployed, could lead to errors in production and even to economic losses.
For example, suppose we add a test to check that non-leap years are handled:
In the current state of our exercise, an example of excess code would be this:
This code passes the test, but as you can see, we’ve introduced much more than was necessary to achieve the behavior defined by the test, adding code to handle leap years and special cases. So, apparently, everything is fine.
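Here is a hypothetical Python illustration of such an excess; it is not the chapter’s C# code. The only test asks about a non-leap year, but the method already tries to cover every case.

```python
import unittest


class Year:
    def __init__(self, year):
        self.year = year

    def is_leap(self):
        # Far more than the non-leap test asked for. It looks complete,
        # but it forgets that centuries (like 1900) aren't leap years.
        if self.year % 400 == 0:
            return True
        if self.year % 4 == 0:
            return True
        return False


class YearTestCase(unittest.TestCase):
    def test_1997_is_not_a_leap_year(self):
        self.assertIs(Year(1997).is_leap(), False)
```

The test is green, and a quick check with 1996 also looks right. Yet Year(1900).is_leap() returns True, and no test exercises that path.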
If we try a leap year we’ll see that the code works, which reinforces our impression that all’s good.
But a new test fails: years divisible by 100 should not be leap years (unless they are also divisible by 400). This error has been in our program for a while, but until now we didn’t have any test that exercised that part of the code.
This is the kind of problem that can go unnoticed when we add too much code at once in order to pass a test. The code excess probably doesn’t affect the test that we were working on, so we won’t know if it hides any kind of problem, and we won’t know it unless we happen to build a test that makes it surface. Or even worse, we won’t know it until the bug explodes in production.
The solution is pretty simple: only add the code that’s strictly necessary to pass the test, even if it’s just returning the value expected by the test itself. Don’t add any behavior if there’s not a failing test forcing you to do it.
In our case, it was the test which verified the handling of the non-leap years. In fact, the next test, which aimed to introduce the detection of standard leap years (years divisible by 4), passed without the need for adding any new code. This leads us to the next point.
When we write a test and it passes without us needing to add any production code, it can be due to any of these reasons:
- The algorithm that we’ve written is general enough to cover all of the possible cases: we’ve completed the development.
- The example that we’ve chosen isn’t qualitatively different from others that we had already used, and therefore it’s not forcing us to write production code. We have to find a different example.
- We’ve added too much code, which is what we’ve just talked about in the previous section.
In this leap year kata, for example, we’ll arrive at a point in which there’s no way of writing a test that fails because the algorithm supports all possible cases: regular non-leap years, leap years, non-leap years every 100 years, and leap years every 400.
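Here is a Python sketch of an algorithm that covers the four categories. It is one possible end state, not the chapter’s C# code. Any other year, say 2024 or 2100, falls into one of the four categories, so a new test with it would pass straight away.

```python
import unittest


class Year:
    def __init__(self, year):
        self.year = year

    def is_leap(self):
        if self.year % 400 == 0:
            return True   # e.g. 2000
        if self.year % 100 == 0:
            return False  # e.g. 1900
        return self.year % 4 == 0  # 1996 is leap, 1997 is not


class YearTestCase(unittest.TestCase):
    def test_one_example_per_rule(self):
        cases = {1997: False, 1996: True, 1900: False, 2000: True}
        for year, expected in cases.items():
            with self.subTest(year=year):
                self.assertIs(Year(year).is_leap(), expected)
```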
The other possibility is that the chosen example doesn’t really represent a new behavior, which can be a symptom of a bad definition of the task, or of not having properly analyzed the possible scenarios.
The three laws establish a framework which we could call a “low level” one. Martin Fowler, for his part, defines the TDD cycle in these three phases which would be at a higher level of abstraction:
- Write a test for the next piece of functionality that you wish to add.
- Write the production code necessary to make the test pass.
- Refactor the code, both the new and the old, so that all’s well structured.
These three stages define what’s usually known as the “red-green-refactor” cycle, named like this in relation to the state of the tests in each of the phases of the cycle:
- Red: the creation of a failing test (it’s red) that describes the functionality or behavior that we want to introduce in the production software.
- Green: the writing of the necessary production code to pass the test (making it green), with which it’s verified that the specified behavior has been added.
- Refactor: while keeping the tests green, reorganizing the code in order to improve its structure, making it more readable and sustainable without losing the functionality that we’ve developed up until this point.
In practice, refactoring cycles only arise after a certain number of cycles of the three laws. The small changes driven by these start to accumulate, until they arrive at a point in which code smells start to appear, and with them the need for refactoring.
- The three rules of TDD
- The three rules of TDD - video
- Refactoring the three laws of TDD
- TDD with PHPSpec
- The 3 Laws of TDD: Focus on One Thing at a Time
- Test Driven Development
- The cycles of TDD
The FizzBuzz kata is one of the easiest kata to start practicing TDD with. It poses a very simple and well-defined problem, so in a first phase it’s very easy to solve it completely in a session of one or two hours. But its requirements can also be expanded: requiring the size of the list to be configurable, allowing new rules to be added, etc., bumps up the difficulty a bit and leads to more complex developments.
In this case, being our first kata, we’ll follow the simplest version.
According to Coding Dojo, the authorship of the kata is unknown, but it’s commonly considered to have been introduced to the community by Michael Feathers and Emily Bache in 2008, in the framework of the Agile2008 conference.
FizzBuzz is a game related to learning division in which a group of students take turns to count incrementally, saying a number each one, replacing any number divisible by three with the word “Fizz”, and any number divisible by five with the word “Buzz”. If the number is divisible by both three and five, then they say “FizzBuzz”.
So, our objective shall be to write a program that prints the numbers from 1 to 100 in such a way that:
- if the number is divisible by 3 it returns Fizz.
- if the number is divisible by 5 it returns Buzz.
- if the number is divisible by 3 and 5 it returns FizzBuzz.
The FizzBuzz kata is going to help us understand and start applying the Red-Green-Refactor cycle and the Three laws of TDD.
The first thing we should do is to consider the problem and get a general idea about how we’re going to solve it. TDD is a strategy that helps us avoid the necessity of having to make a detailed analysis and an exhaustive design prior to the solution, but that doesn’t mean that we shouldn’t first understand the problem and consider how we’re going to tackle it.
This is also necessary to avoid getting carried away by the literal statement of the kata, which can lead us to dead ends.
The first thing we’re going to do, once we have that general idea on how to approach the objective, is to apply the first law and write a test that fails.
This test should define the first behavior that we need to implement.
Writing a test that fails means, at this time, writing a test that won’t work because there isn’t any code to run, a fact that will be pointed out to us by the error messages. Even though you might find it absurd, you must try to run the test and confirm that it doesn’t pass. It’s the test error messages that will tell you what to do next.
To get the test to fail we have to apply the second law, which says that we can’t write more tests than necessary to fail. The smallest possible test should force us to define the class by instantiating it, and little more.
Last, to make the test pass, we’ll apply the third law, which says that we mustn’t write any more production code than necessary to pass the test. That is: define the class, the method that we’re going to exercise (if applicable), and make it return some response that will finally make the test pass.
The first two steps of this stage are pretty obvious, but the third one isn’t.
With the first two steps we try to make the test fail for the right reasons. That is, first it fails because we haven’t written the class, so we define it. Then, it will fail because the method that we’re calling is missing, so we define it as well. Finally, it will fail because it doesn’t return the response that we expect, which is what we’re testing in itself.
And what response should we return? Well, no more and no less than the one that the test expects.
Once we have a first test and a first piece of production code that makes it pass, we’ll ask ourselves this question: what will be the next behavior that I should implement?
- Video of the kata by Jesús López de la Cruz
- FizzBuzz in Kata-log
- Solved FizzBuzz in Smalltalk
- Code Katas Explained: FizzBuzz
- TDD - Which Order to Write Your Tests
- Solution in Python using a use case list
Our objective will be to write a program that prints the numbers from 1 to 100 in such a way that:
- if the number is divisible by 3 it returns Fizz.
- if the number is divisible by 5 it returns Buzz.
- if the number is divisible by 3 and 5 it returns FizzBuzz.
We’re going to solve this kata in Python with unittest as our testing environment. The task consists in creating a FizzBuzz class which will have a generate method to create the list, so it will be used more or less like this:
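To make that usage concrete, here is a sketch with one possible finished implementation. NUMBER_OF_ELEMENTS and num_list are our assumptions. The kata itself reaches its own version step by step. Representing numbers as strings anticipates a decision made later in the chapter.

```python
class FizzBuzz:
    # One possible finished version, shown only to illustrate the interface.
    NUMBER_OF_ELEMENTS = 100  # hypothetical name

    def generate(self):
        num_list = []
        for number in range(1, self.NUMBER_OF_ELEMENTS + 1):
            if number % 15 == 0:  # divisible by both 3 and 5
                num_list.append("FizzBuzz")
            elif number % 3 == 0:
                num_list.append("Fizz")
            elif number % 5 == 0:
                num_list.append("Buzz")
            else:
                num_list.append(str(number))
        return num_list


# Intended usage: build the object, then ask it to generate the list.
fizzbuzz = FizzBuzz()
num_list = fizzbuzz.generate()
print(num_list[:15])
```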
To do so, I create a folder called fizzbuzzkata and add a test file to it.
What the exercise asks for is a list with the numbers from 1 to 100, replacing some of them with the words “Fizz”, “Buzz”, or both, when they fulfill certain conditions.
Note that it doesn’t ask for a list of any amount of numbers, but rather specifically from 1 to 100. We’ll come back to this in a moment.
Now we’re going to focus on that first test. The least we can do is make it possible to instantiate a FizzBuzz type object. Here’s a possible first test:
It may look weird. This test is just limited to trying to instantiate the class and nothing else.
This first test should be enough to fail, which is what the second law states, and force us to define the class so the test can pass, fulfilling the third law. In some environments it would be necessary to add an assertion, given that they consider that the test hasn’t passed if it hasn’t been explicitly verified, but it’s not the case in Python.
So, we launch it to see if it really fails. The result, as it was expected, is that the test doesn’t pass, displaying the following error:
To pass the test we’ll have to define the FizzBuzz class, something we’ll do in the test file itself.
And with this, the test will pass. Now that we’re green we can think about refactoring. The class doesn’t have any code, but we could change the name of the test to a more suitable one:
Usually it’s better for classes to live in their own file (or Python module), because it makes the code easier to manage and everything easier to locate. So, we create a fizzbuzz.py file and move the class to it.
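After these two changes, the test file might look like this sketch. The class still lives next to the test, and the test has been renamed to test_can_instantiate.

```python
import unittest


class FizzBuzz:
    # Defined in the test file for now; it has no code yet.
    pass


class FizzBuzzTestCase(unittest.TestCase):
    def test_can_instantiate(self):
        FizzBuzz()
```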
And in the test, we import it:
When we introduce this change and run the test, we can verify that it passes and that we’re in green.
We’ve fulfilled the three laws and closed our first test-code-refactor cycle. There’s not much else to do here, except for moving on to the next test.
Our FizzBuzz class not only doesn’t do anything, it doesn’t even have any methods! We’ve said that we want it to have a generate method, which is the one that will return the list of numbers from 1 to 100.
To force us to write the generate method, we have to write a test that calls it. The method will have to return something, right? No, not really. It’s not always necessary to return something. It’s enough if nothing breaks when we call it.
When we run the test, it tells us that the object doesn’t have any generate attribute:
Of course it doesn’t, we have to add it:
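Here is a sketch of the state after this step. For brevity it is shown in one file; in the kata, the class lives in fizzbuzz.py.

```python
import unittest


class FizzBuzz:
    def generate(self):
        # Nothing to return yet: responding to the message is enough.
        pass


class FizzBuzzTestCase(unittest.TestCase):
    def test_can_instantiate(self):
        FizzBuzz()

    def test_responds_to_generate_message(self):
        FizzBuzz().generate()
```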
Now we have a class capable of responding to the generate message. Can we do any refactoring here?
Well, yes, but not in the production code; in the tests. It turns out that the test we’ve just written overlaps the previous one. That is, the test_responds_to_generate_message test covers the test_can_instantiate test, making it redundant. Therefore, we can remove it:
Perhaps this surprises you. This is what we talked about at the beginning of the book: some of the tests that we use to drive the development stop being useful for one reason or another. Generally, they end up becoming redundant and don’t provide any information that we’re not already getting from other tests.
We want generate to return a list of numbers, but it doesn’t need to have the multiples of 3 and 5 converted just yet.
The test should verify this, but it must keep passing when we have developed the complete algorithm. What we could verify would be that it returns a 100 element list, without paying any attention to what it contains exactly.
This test will force us to give a behavior in response to the generate message:
Of course, the test fails:
Right now, the method returns None. We want a list:
When we change generate so that it returns a list, the test fails because our condition isn’t met: that the list has a certain number of elements.
This one is finally an error from the test itself. The previous ones were basically equivalent to compilation errors (syntax errors, etc.). That’s why it’s so important to see the tests fail: we use the feedback that the error messages provide.
Making the test pass is quite easy:
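For instance, the smallest change that satisfies the test is a list of 100 None elements. This is consistent with the discussion that follows. The code is a sketch, not necessarily the author’s exact code.

```python
import unittest


class FizzBuzz:
    def generate(self):
        # 100 empty slots: exactly what the test demands, no more.
        return [None] * 100


class FizzBuzzTestCase(unittest.TestCase):
    def test_generates_list_of_100_elements(self):
        num_list = FizzBuzz().generate()
        self.assertEqual(100, len(num_list))
```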
With the test in green, let’s think a little.
In the first place, it could be argued that in this test we’ve asked generate to return a response that meets two conditions:
- be of type list (or array, or collection)
- have exactly 100 elements
We could have forced this same thing with two even smaller tests.
These tiny steps are often called baby steps, and the truth is that they don’t have a fixed length; instead, they depend on our practice and experience.
Thus, for example, the test that we’ve created is small enough to not generate a big leap in the production code, although it’s capable of verifying both conditions at once.
In the second place, note that we’ve written just the code necessary to fulfill the test. In fact, we return a list of 100 None elements, which may seem a little pointless, but it’s enough to achieve this test’s objective. Remember: don’t write more code than necessary to pass the test.
In the third place, we have written enough code, between test and production, to be able to examine it and see if there’s any opportunity for refactoring.
The clearest refactoring opportunity that we have right now is the magic number 100, which we could store in a class variable. Again, each language will have its own options:
And we have some more in the test code. Once again, the new test that we’ve added overlaps and includes the old one, which we could remove.
In the same way, the name of the test could improve. Instead of referencing the specific number, we could simply indicate something more general, that doesn’t tie the test to a specific implementation detail.
Last but not least, we still have a magic number 100, which we will name:
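Here is a sketch of where these refactorings could leave the production code and the test. The names NUMBER_OF_ELEMENTS and LIST_SIZE, and the test name, are hypothetical. The test keeps its own constant instead of reading the production one, so it still checks the requirement independently.

```python
import unittest


class FizzBuzz:
    NUMBER_OF_ELEMENTS = 100  # the magic number, now with a name

    def generate(self):
        return [None] * self.NUMBER_OF_ELEMENTS


class FizzBuzzTestCase(unittest.TestCase):
    LIST_SIZE = 100  # hypothetical name for the test's own expectation

    def test_generates_list_with_the_required_size(self):
        num_list = FizzBuzz().generate()
        self.assertEqual(self.LIST_SIZE, len(num_list))
```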
And with this, we’ll have finished a new cycle in which we have already introduced the refactoring phase.
FizzBuzz can already generate a list with 100 elements, but at the moment each of them is literally nothing. It’s time to write a test that forces us to put some elements inside that list.
To do this, we could expect the generated list to contain the numbers from 1 to 100. However, we have a problem: at the end of the development process, the list wil contain the numbers but some of them will be represented by the words Fizz, Buzz, or FizzBuzz. If I don’t take this into account, this third test will start failing as soon as I start implementing the algorithm that converts the numbers. It doesn’t seem like a good solution.
A more promising approach is to ask which numbers won’t be affected by the algorithm. Well, those that aren’t multiples of 3 or 5. So we could choose some of them to verify that they appear untransformed in the list.
The simplest of them all is 1, which should occupy the first position of the list. For symmetry reasons, we’re going to generate the numbers as strings.
The test is very small and fails:
At this point, what change could we introduce in the production code to make the test pass? The most obvious one could be the following:
It’s enough to pass the test, so it suits us.
One problem that we have here is that the number 1 doesn’t appear as such in the test. What does appear is its representation, which we access through its position in num_list, a 0-indexed list. We’re going to make explicit the fact that we’re testing against the representation of a number. First, we introduce the concept of position:
And now the concept of number, as well as its relationship with position:
Now we don’t need to refer to the position at all, just to the number.
We could make the test easier to read. First, we separate the verification:
We extract the representation as a parameter of the assertion, and we inline number so that the test reads more fluently:
As you can see, we’ve worked a lot on the test. Now introducing new examples will be very cheap, which will help us write more tests and make the process more pleasant and convenient.
Actually, we haven’t yet verified whether the generate method is returning a list of numbers, so we need to keep writing new tests that force us to create that code.
Let’s make sure that the second position is occupied by the number two, which is the next simplest number that’s not a multiple of 3 or 5.
We have a new test that fails, so we’re going to add some production code so that the test passes.
However, we have some problems with this implementation:
To change it, we’d first need to refactor it a little: at the very least, extract the response into a variable that we can manipulate before returning it.
But since the test is failing right now, we can’t refactor. Before that, we have to disable or delete the test that we’ve just created. The easiest way is to comment it out to prevent its execution. Remember: to do any refactoring, the tests must be passing:
Now we can work:
We re-enable the test, which now fails because the number 2 is represented as ‘1’. The simplest change that I can come up with right now is this one, silly as it is:
The truth is that the test is green. We know that this is not the implementation that will solve the full problem, but our production code is only obligated to satisfy the existing tests and nothing more. So, let’s not get ahead of ourselves. Let’s see what we can do.
To start, the name of the test is obsolete, let’s generalize it:
Now that this has been solved, let’s remember that previously we saw that the concepts of “number” and “representation” were necessary to better define the expected behavior in the tests. We can now introduce them in our production code:
It’s a first step. We can see the limitations of the current solution. For example, why does the 1 get special treatment? And what will happen if we want to verify another number? There are several problems.
As for the number 1, the key lies in the idea of a list of numbers. Right now we’re generating a list of constants, but each element of the list should be a consecutive number, starting at 1 and continuing until we have the desired number of elements.
And then we’d have to replace each number by its representation. Something like this:
This structure keeps passing the test, but it doesn’t seem very practical. However, we can see a pattern: we need to iterate over the list to solve it:
With the information that we have, we could simply assume that it’s enough to convert the number into a string and put it in its place:
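A minimal Python sketch of this naive version, assuming the hypothetical NUMBERS_TO_GENERATE constant from the earlier refactoring: the list is first filled with consecutive numbers, and then each position is replaced by the number’s string representation.

```python
class FizzBuzz:
    NUMBERS_TO_GENERATE = 100  # hypothetical name for the former magic number

    def generate(self):
        # Consecutive numbers from 1 to NUMBERS_TO_GENERATE
        num_list = list(range(1, self.NUMBERS_TO_GENERATE + 1))
        # Naive step: replace each number by its representation, one position at a time
        for position in range(len(num_list)):
            num_list[position] = str(num_list[position])
        return num_list
```

It is deliberately mechanical: nothing in it anticipates Fizz or Buzz yet.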
Of course, there are more compact and pythonic ways, such as this one:
But we should be careful: we’re probably getting ahead of ourselves with this refactoring, and it’ll surely become a source of problems further down the line. For this reason, it’s preferable to keep a more direct and naive implementation, and leave optimizations and more advanced structures for later, when the behavior of the method is completely defined. So I would advise you to avoid this kind of approach for now.
All of this refactoring is done while the tests are green. This means that:
- With the test, we describe the behavior that we want to develop
- We make the test pass by writing the simplest possible code, however stupidly simple it looks, with the intent of implementing that behavior
- We use the green tests as a safety net to restructure the code until we find a better design: easy to understand, maintain, and extend.
Points 2 and 3 are built on these principles:
- KISS: Keep it simple, stupid, which means keeping the system as mindless as possible, that is, not trying to add intelligence prematurely. The more mechanical and simple, the better, as long as it meets its needs. KISS guides our first approach.
- Gall’s law: every working complex system has evolved from a simpler system that also worked. Therefore, we start with a very simple implementation that works (KISS), and we make it evolve towards a more complex one that works as well, something that we’re sure about because the test keeps passing.
- YAGNI: You aren’t gonna need it, which prevents us from implementing more code than strictly necessary to pass the current tests.
But now we have to implement new behaviors.
The next number that is not a multiple of 3, 5, or 15 is 4, so we add an example for it:
And the test passes. Good news? It depends. A test that passes just after its creation is always a reason for suspicion, at least from a TDD point of view. Remember: writing a failing test is always the first thing to do. If the test doesn’t fail, it means that:
- The behavior is already implemented
- It’s not the test we were looking for
In our case, the last refactoring has already produced the general behavior for the numbers that don’t need transformation. In fact, we can categorize the numbers into these classes:
- Numbers that are represented as themselves
- Multiples of three, represented as ‘Fizz’
- Multiples of five, represented as ‘Buzz’
- Multiples of both three and five, represented as ‘FizzBuzz’
Numbers 1 and 2 belong to the first class, so they’re more than enough, since any of the numbers in that class would serve as an example. In TDD we need them both, because they’ve helped us to introduce the idea that we would have to iterate through the number list. However, just one of them would be sufficient for a QA test. For this reason, when we introduce the example of the number 4, we don’t have to add any additional code: the behavior is already implemented.
It’s time to move on to the other classes of numbers.
It’s time for our FizzBuzz to be able to convert the 3 into “Fizz”. A minimal test to specify this would be the following:
Having a failing test, let’s see what minimal production code we could add to pass it:
We’ve added an if that makes this particular case pass. For the time being, with the information that we have, there isn’t a better way. Remember KISS, Gall, and YAGNI to avoid advancing faster than you should.
Regarding the code, there may be a better way to populate the list. Instead of generating a list of numbers and changing it later, perhaps we could initialize an empty list and append the representations of the numbers one by one.
This works. Now num_list becomes kind of pointless as a list. We can make a change:
And remove the temporary variable:
Everything continues to work correctly, as the tests attest.
Now we want it to add “Fizz” whenever the corresponding number is a multiple of 3, and not just when it’s exactly 3. Of course, we have to add a test to specify this. This time we use the number 6, which is the next multiple of 3 that isn’t also a multiple of 5.
To pass the test we just have to make a pretty small change. We have to modify the condition to expand it to all of the multiples of three. But we’re going to do it incrementally.
First, we establish the behavior:
With this, the test passes. Now let’s change the code so that it uses the concept of multiple of:
The test keeps passing, which indicates that our hypothesis is correct. Now we can remove the redundant part of the code:
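Putting the last few steps together, the production code could look roughly like this Python sketch. It appends representations to a fresh list, as described above; the helper name _is_multiple_of and the constant NUMBERS_TO_GENERATE are assumptions used to make the concepts explicit.

```python
class FizzBuzz:
    NUMBERS_TO_GENERATE = 100  # hypothetical name

    def generate(self):
        representations = []
        for number in range(1, self.NUMBERS_TO_GENERATE + 1):
            # Any multiple of three, not just the number 3, is represented as Fizz
            if self._is_multiple_of(number, 3):
                representations.append("Fizz")
            else:
                representations.append(str(number))
        return representations

    @staticmethod
    def _is_multiple_of(number, divisor):
        # True when number divides by divisor with no remainder
        return number % divisor == 0
```

At this point multiples of 5 are still rendered as plain numbers, because no test has asked for anything else yet.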
At this point you may want to try other examples from the same class, although it’s not really necessary since any multiple of three is an adequate representative. For this reason, we’ll move on to the next behavior.
This test lets us specify the new behavior:
So, we modify the production code to make the test pass. Same as we did before, we treat the particular case in a particular manner.
Yes, we already know how we should handle the general case of the multiples of five, but it’s preferable to force ourselves to go slowly. Remember that the main objective of the exercise isn’t to solve the list generation, but rather do it guided by tests. Our main interest now is to internalize this slow step cycle.
There’s not much else that we can do now, except for continuing to the next test.
At this point, the test is quite obvious, the next multiple of 5 is 10:
And, again, the change in the production code is simple at first:
Next, we perform the refactoring step by step, now that we’ve ensured the behavior:
And with this refactoring, we can proceed to the next, and last, class of numbers.
The structure is exactly the same. Let’s start with the simplest case: 15 should return FizzBuzz, since 15 is the first number that is a multiple of 3 and 5 at the same time.
The new test fails. Let’s make it pass:
And, again, we introduce a test for another case of the “multiples of 3 and 5” class, which will be 30.
This time I’ll jump directly to the final implementation, but you get the idea:
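For reference, here is a straightforward Python sketch of where the class could end up (helper and constant names are assumptions, not the book’s exact code). The order of the checks matters: multiples of 15 must be handled first, otherwise 15 would fall into the “multiple of 3” branch and return Fizz.

```python
class FizzBuzz:
    NUMBERS_TO_GENERATE = 100  # hypothetical name

    def generate(self):
        representations = []
        for number in range(1, self.NUMBERS_TO_GENERATE + 1):
            representations.append(self._represent(number))
        return representations

    @staticmethod
    def _represent(number):
        # The most specific class (multiples of both 3 and 5) is checked first
        if number % 15 == 0:
            return "FizzBuzz"
        if number % 3 == 0:
            return "Fizz"
        if number % 5 == 0:
            return "Buzz"
        return str(number)
```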
And we have our “FizzBuzz”!
We’ve completed the development of the specified behavior of the FizzBuzz class. In fact, any other test that we could add now would confirm that the algorithm is general enough to cover all cases. That is, there isn’t any conceivable test that could force us to add more production code: there’s nothing else we must do.
In a real work case, this code would be functional and deliverable. But we can certainly still improve it. The fact that all of the tests are passing indicates that the desired behavior is fully implemented, so we can fearlessly refactor and try to find a more flexible solution. For example, with the following solution it would be easier to add extra rules:
And if you look closely, you can see that it would be relatively easy to modify the class so that the rules could be introduced from the outside: it would be enough to pass the rule dictionary when instantiating the class, fulfilling the Open/Closed principle (open for extension, closed for modification). In this case, we’ve made the original rules apply unless others are specified, so the tests continue to pass in exactly the same manner as before.
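One possible Python sketch of such a rule-based version, assuming the rules are passed as a dictionary from divisor to word (the parameter rules and the attribute DEFAULT_RULES are hypothetical). Instead of a dedicated case for 15, it concatenates the words of every matching rule, and it falls back to the original rules when none are given, so the existing tests keep passing:

```python
class FizzBuzz:
    NUMBERS_TO_GENERATE = 100  # hypothetical name
    # Default rules: divisor -> word (Python dicts keep insertion order since 3.7)
    DEFAULT_RULES = {3: "Fizz", 5: "Buzz"}

    def __init__(self, rules=None):
        # Use the original rules unless others are specified
        self.rules = rules if rules is not None else self.DEFAULT_RULES

    def generate(self):
        return [self._represent(number)
                for number in range(1, self.NUMBERS_TO_GENERATE + 1)]

    def _represent(self, number):
        # Concatenate the word of every rule whose divisor divides the number
        words = "".join(word for divisor, word in self.rules.items()
                        if number % divisor == 0)
        # No matching rule: the number represents itself
        return words or str(number)
```

For example, FizzBuzz({3: "Fizz", 5: "Buzz", 7: "Bang"}) would represent 21 as FizzBang without modifying the class.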
- The laws of TDD
- The red->green->refactor cycle
- Using minimal tests to drive the production code forward
- Changing the production code as little as possible to achieve the desired behavior
- Using the refactor phase to improve the code design
One of the most frequent questions when you start doing TDD is how many tests do you have to write until you can consider the development to be finished. The short answer is: you’ll have to do all of the necessary tests, and not one more. The long answer is this chapter.
A good technique would be to follow Kent Beck’s advice and write a control list or check-list in which to annotate all of the behaviors that we want to implement. Obviously, as we complete each behavior, we cross it off the list.
It’s also possible that, during the work, we discover that we need to test some other behavior, that we can remove some of the elements of the list, or that we’re interested in changing the order that we had planned. Of course, we can do all of this as convenient.
The list is nothing more than a tool so that we don’t have to rely so much on our memory during the process. After all, one of the benefits of doing Test Driven Development is reducing the amount of information and knowledge that we have to handle in each phase of the development process. Each TDD cycle involves a very small problem that we can solve with little effort. Small steps that end up carrying us very far.
Let’s see an example with the Leap Year kata, in which we have to create a function to calculate if a year is a leap year or not. A possible control list would be this one:
Another example, for the Prime Factors kata, in which the exercise consists of developing a function that returns the prime factors of a number:
For each behavior that we want to implement, we’ll need a certain number of examples with which to write the tests. In the following chapter we’ll see that TDD has two main moments: one related to establishing the interface of the unit that we’re creating, and another in which we develop the behavior itself. It’s in this second moment that we need examples that question the current implementation and force us to introduce code that produces the desired behavior.
A good idea, therefore, is to take note of several possible examples with which to test each of the items on the control list.
But how many examples are necessary? In QA there are various techniques for choosing representative examples with which to generate the tests, but their goal is to optimize the relationship between the number of tests and their ability to cover the possible scenarios.
We can use some of them in TDD, although in a slightly different manner, as we’ll see next. Keep in mind that we use TDD to develop an algorithm, and in many cases, we discover it as we go. For that, we’ll need several examples related to the same behavior, in such a way that we can identify patterns and discover how to generalize it.
The techniques that we’re going to look at are partition into equivalence classes, boundary value analysis, and decision tables.
Partition into equivalence classes relies on one idea: that the set of all conceivable cases can be divided into classes according to some criterion. All of the examples in a class are equivalent to each other, so it suffices to test with only one example from each class, as all are equally representative.
Boundary value analysis is similar to the previous technique, but it pays attention to the limits or boundaries between classes. We choose two examples from each class: precisely those that lie at its limits. Both examples are representative of the class, but they let us study what happens at the extremes of the interval.
It’s mainly used when the examples are continuous data, or when we especially care about the change that occurs when passing from one class to another. Specifically, it’s the kind of situation where the result depends on whether the value being considered is larger than or equal to a limit, strictly larger than it, and so on.
The decision table is nothing more than the result of combining the possible values, grouped by classes, of the parameters that are passed to the unit under test.
Let’s look at the selection of examples for Leap Year. We begin with the list:
Let’s see the first item. We could use any number that’s not divisible by 4:
In the second item, the examples should meet the condition of being divisible by 4:
Let’s pay attention to the next element of the list. The condition of being divisible by 100 overlaps the previous condition. Therefore, we have to remove some of the examples from the previous item.
And the same thing happens with the last element of the list. The examples for this item are the numbers that are divisible by 400, which also overlap with the previous item:
This way, the example list ends up like this:
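As a sketch, the resulting example list could be written as a table-driven check in Python, one entry per equivalence class. The specific years are my own illustrative choices, and is_leap_year is a hypothetical name implementing the usual Gregorian rules:

```python
# Examples grouped by equivalence class (illustrative years): years -> expected result
EXAMPLES = {
    "not divisible by 4": ([1997, 2001, 2023], False),
    "divisible by 4 but not by 100": ([1996, 2004, 2024], True),
    "divisible by 100 but not by 400": ([1700, 1900, 2100], False),
    "divisible by 400": ([1600, 2000, 2400], True),
}


def is_leap_year(year):
    # Divisible by 4, except centuries, unless the century is divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def check_examples():
    # Every example in a class must produce the class's expected result
    for description, (years, expected) in EXAMPLES.items():
        for year in years:
            assert is_leap_year(year) == expected, (description, year)
```

Because the classes no longer overlap, each year appears in exactly one row, which is what removing the overlapping examples achieves.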
On the other hand, the example selection for Prime Factors could be this one:
In simple code exercises such as the Leap Year kata, it’s relatively simple to anticipate the algorithm, so we don’t need to use too many examples to make it evolve and implement it. Actually, it would suffice to have an example from each class, as we’ve seen when we talked about the partition by equivalence classes, and in a few minutes we would be done with the problem.
However, if we’re just starting to learn TDD, it’s a good idea to go step by step. The same applies when we face complex behaviors. It’s preferable to take really small baby steps, introduce several examples, and wait until we have sufficient information before trying to generalize. Some code duplication is preferable to choosing the wrong abstraction and continuing to build on top of it.
A heuristic that you may apply is the rule of three. This rule tells us that we shouldn’t try to generalize code until we have at least three repetitions of it. To do it, we’ll have to identify the parts that are fixed and the parts that change.
Consider this example, taken from an exercise from the Leap Year kata. At this point the tests are passing, but we haven’t generated an algorithm yet.
There we have our three repetitions. What do they have in common apart from the if/then structure? Let’s force a small change:
Clearly, the three years are divisible by 4. So we could express it in a different way:
Which is now an obvious repetition and can be removed:
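A hypothetical Python reconstruction of that progression, with three illustrative leap years (1992, 1996, 2004) chosen by me rather than taken from the original exercise. The second function is only as general as the current examples demand: it would still report 1900 as a leap year, something later examples will correct.

```python
def is_leap_year_before(year):
    # Three repetitions: one hard-coded branch per example in the tests
    if year == 1992:
        return True
    if year == 1996:
        return True
    if year == 2004:
        return True
    return False


def is_leap_year_after(year):
    # The fixed part (divisible by 4) replaces the part that changes (the specific year)
    return year % 4 == 0
```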
This has been very obvious, of course. However, things won’t always be this easy.
In summary, if we don’t know the problem very well, it can be useful to wait until the rule of three is fulfilled before we start thinking about code generalizations. This implies that, at least, we’ll introduce three examples that represent the same class before we refactor the solution to a more general one.
Let’s see another example from the same kata:
The divisible by concept is pretty obvious on this occasion, and we don’t really need a third case to evaluate the possibility of extracting it. But the main thing here isn’t actually duplication; in fact, one example would have been enough. What we have found is the idea that the condition being evaluated is whether the year number is divisible by a certain factor. With this refactoring, we make it explicit.
This gets clearer if we advance a bit further.
We find the same structure repeated three times, but we can’t really extract a common concept from it. Two of the repetitions represent the same concept (leap year), but the third one represents the exceptional years that have a regular length.
Let’s try another approach:
If we divide the year by 4, we could propose another idea, since that could help us tell apart the parts that are common from the parts that are different.
It’s weird, but it works. Simpler:
It’s still working. But, what use is it to us?
- On the one hand, we still haven’t found a way to reconcile the three cases.
- On the other hand, we’ve made the domain rules unrecognizable.
In other words: trying to find an abstraction relying only on the existence of code repetition can lead us to a dead end.
As we’ve pointed out before, the concept in which we’re interested is leap years and the rules that determine them. Can we make the code less repetitive? Maybe. Let’s do it again, from the top:
The point is that “divisible by 400” is an exception to the “divisible by 100” rule:
Which lets us do this and compact the solution a little bit:
Maybe we could make it more explicit:
But now it looks a bit weird, we need to be more explicit here:
At this point, I wonder if this solution hasn’t become too unnatural. On the one hand, the abstraction is correct, but by taking it this far, we’re probably being guilty of a certain amount of over-engineering. The domain of the problem is very small and the rules are very simple and clear. If you compare this:
I think I would stick with the first solution. That said, in a more complex and harder to understand problem, the second solution might be a lot more appropriate, precisely because it would help us make the involved concepts explicit.
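To make the comparison concrete, here are two hypothetical Python versions (not the book’s exact code): one that states the rules directly, and one that names the “divisible by 400” exception to the “divisible by 100” rule. Both return the same result for every year, so the choice is only about which one communicates the domain better:

```python
def is_divisible_by(year, divisor):
    return year % divisor == 0


def is_leap_year_direct(year):
    # The rules, checked from the most specific to the most general
    if is_divisible_by(year, 400):
        return True
    if is_divisible_by(year, 100):
        return False
    return is_divisible_by(year, 4)


def is_leap_year_with_exception(year):
    # Names the exception explicitly: centuries are not leap years unless divisible by 400
    is_century = is_divisible_by(year, 100)
    is_exception_to_century = is_divisible_by(year, 400)
    return is_divisible_by(year, 4) and (not is_century or is_exception_to_century)
```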
The moral of the story is that we mustn’t strive and struggle to find the perfect abstraction, but rather the one that’s sufficient at that particular moment.
The TDD methodology is based on work cycles in which we define a desired behavior in the form of a test, make changes in the production code to implement it, and refactor the solution once we know that it works.
While we have specific tooling to detect situations in need of refactoring, and even well-defined methods to carry it out, we don’t have comparable resources to guide the code transformations needed to implement a behavior. That is, is there any process that helps us decide what kind of change to apply to the code in order to implement a behavior?
The Transformation Priority Premise1 is an article that suggests a useful framework in this regard. Starting from the idea that as the tests become more specific the code becomes more general, it proposes a sequence of types of transformations that we can apply every time we’re in the implementation phase, in the transition from red to green.
The development through TDD would have two main parts:
- In the first one, we build the class’s public interface, defining how we’re going to communicate with it and how it’s going to answer us. We analyze this question in its most generic form, which would be the data type that it returns.
- In the second part we develop the behavior, starting from the most general cases, and introducing the more specific ones later.
Let’s see this with a practical example. We’ll perform the Roman Numerals kata, paying attention to how the tests help us guide these two parts.
We’ll always start with a test that forces us to define the class, as for now we don’t need anything other than an object with which to interact.
We run the test to see it fail, and then we write the empty class definition, the minimum necessary to pass the test.
If we’ve created it in the same file as the test, now we can move it to its place during the refactoring phase.
We can already think about the second test, which we need to define the public interface, that is: how we’re going to communicate with the object once it’s instantiated, and what messages it’s going to be able to understand:
We’re modifying the first test. Now that we have some fluency, we can afford this kind of license, which makes writing new tests cheaper. We check that it fails for the reason it has to fail (the toRoman message is not defined), and next we write the necessary code to make it pass. The compiler helps us: if we run the test, we’ll see that it throws an exception telling us that the method exists but isn’t implemented. The IDE will probably tell us something about it too, one way or another. Kotlin, which is the language we’re working with here, asks us directly to implement it:
For now, we remove these indications introduced by the IDE:
And this passes the test. We already have the message with which we’re going to ask RomanNumerals to do the conversion. The next step can be to define that the response we expect should be a String. If we work with dynamic typing or duck typing, we’ll need a specific test. However, in Kotlin we can do it without tests: it’s enough to specify the return type of the function:
This won’t compile, and our current test will fail, so the way to make it pass would be to return any String. Even an empty one.
Up to a point, we may consider this a refactoring, but we can apply it as if it were a test.
Now we’re going to think about how to use this code to convert arabic numbers to roman notation. Since there is no zero in the latter, we have to start with number 1.
When we run the test, we can see that it fails because the function doesn’t expect an argument, so we add it:
And this passes the test. The public interface has been defined, but we still don’t have any behavior.
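The book’s code here is Kotlin; as a rough Python analogue (to_roman stands in for toRoman, and the type hints play the role of Kotlin’s declared String return type), the public interface at this point amounts to little more than this:

```python
class RomanNumerals:
    def to_roman(self, number: int) -> str:
        # Interface only: it accepts a number and returns a string, with no behavior yet
        return ""
```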
Once we’ve established the public interface of the class that we’re developing, we’ll want to start implementing its behavior. We need a first example, which for this exercise will be to convert the number 1 into I.
To do this, we already need to assign the value to a variable and use an assertion. The test will end up like this:
The test fails because the function returns "", which for all intents and purposes is equivalent to returning nothing.
What is the simplest transformation that we can make to make the test pass? In short, we go from not returning anything to returning something, and to pass the test, that something ought to be the value “I”. That is, a constant:
The test passes. This solution might shock you if it’s your first time peeking at TDD, although if you’re reading this book you’ll have already seen more examples of this. But this solution is not stupid.
In fact, this is the best solution for the current state of the test. We may know that we want to build an arabic to roman numeral converter, but what the test specifies here and now is just that we expect our code to convert the integer 1 to the String I. And that’s exactly what it does.
Therefore, the implementation has exactly the necessary complexity and specificity level. What we’re going to do next will be to question it with another example.
But first, we should do a refactoring.
We’ll do it to prepare for what comes next. When we change the example, the response will have to change as well. So, we’re going to do two things: use the parameter that we receive, and, at the same time, ensure that this test will always pass:
We run the test, which should pass without any issues. Moreover, we’ll make a small adjustment to the test itself:
The test continues to pass and we are already left with nothing to do, so we’re going to introduce a new example (something that is now easier to do):
When we run the test, we check that it fails because it doesn’t return the expected II. One way to make it pass is the following:
Note that, for now, we’re returning constants in all cases.
Let’s refactor, as we’re in green. First we refactor the test to make it even more compact, and easier to read and extend with examples.
We add yet another test. Now it’s even easier:
We see it fail, and, to make it pass, we add a new constant:
And now, expressing the same, but in a different manner and using only one constant:
We could extract it:
And now it’s easy to see how we could introduce a new transformation.
This transformation consists in using a variable to generate the response. That is, now instead of returning a fixed value for each example, we’re going to calculate the appropriate response. Basically, we’ve started to build an algorithm.
This transformation makes it clear that the algorithm consists of piling up as many I symbols as the number indicates. One way of seeing it:
This for loop could be better expressed as a while, but first we have to make a change. Note that parameters in Kotlin are final, so we can’t modify them. For this reason, we’ve had to introduce a variable and initialize it with the value of the parameter.
On the other hand, since the i constant is only used once and its meaning is pretty evident, we’re going to remove it.
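A rough Python analogue of this stage, as a sketch (remaining is an illustrative name; Python would let us reassign the parameter, but the copy mirrors the Kotlin constraint just described). It only works up to 3: for 4 it produces IIII.

```python
class RomanNumerals:
    def to_roman(self, number: int) -> str:
        roman = ""
        remaining = number  # copy of the parameter, as in the Kotlin version
        # Pile up one "I" per unit until nothing remains
        while remaining > 0:
            roman += "I"
            remaining -= 1
        return roman
```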
This way we’ve started to build a more general solution to the algorithm, at least up to the point currently defined by the tests. As we know, it’s not “legal” to accumulate more than 3 equal symbols in roman notation, so in its current state the algorithm will generate wrong roman representations for any number larger than 3.
This indicates that we need a new test to be able to incorporate a new behavior and develop the algorithm further, which is still very specific.
But, what is the next example that we could implement?
First, we have number 4, which in roman notation is written as IV. It introduces a new symbol, which is itself a combination of symbols. For all we know, it’s just a particular case, so we introduce a conditional to split the flow into two branches: one for the behavior that we already know, and another for the new one.
The test fails because the number 4 is converted to IIII. We introduce the conditional to handle this particular case.
Oops. The test fails because we have forgotten to subtract the consumed value. We fix it like this, and we leave a note for our future selves:
We advance to a new number:
We check that the test fails for the expected reasons: we get IIIII as a result. To make it pass, we’ll take another path, introducing a new conditional because it’s a new, different case. This time we don’t forget to subtract the value of 5.
The truth is that we had already used conditionals before, when our responses were constants, to choose which constant to return, so to speak. Now we introduce the conditional in order to handle new families of cases, as we’ve already exhausted the capabilities of the existing code to solve the new cases we’re introducing. And within that execution branch that didn’t exist before, we resort to a constant again in order to solve it.
We introduce a new failing test to force another algorithm advance:
This case is especially interesting to see fail:
We need to include the “V” symbol, something that we can do very simply by changing the == for a >=.
A minimal change has sufficed to make the test pass. The next two examples pass without any extra effort:
This happens because our current algorithm is already general enough to handle these cases. However, when we introduce 9, we face a different situation:
The result is:
We need a specific treatment, so we add a conditional for the new case:
We keep running through the examples:
As it’s a new symbol, we handle it in a special way:
If we take a look at the production code we can identify structures that are similar between them, but we can’t clearly see a pattern that we could refactor and generalize. Maybe we need more information. Let’s proceed to the next case:
This test results in:
To begin, we need to enter the “X” symbol’s conditional, so we make this change:
And this is enough to make the test pass. With numbers 12 and 13 the test continues to pass, but when we reach 14, something happens:
The result is:
This happens because we’re not accumulating the roman notation in the return variable, so in some cases we overwrite the existing result. Let’s change from a simple assignment to an expression that accumulates:
This discovery hints that we could try some specific examples with which to manifest this problem and solve it for other numbers, such as 15.
And we apply the same change:
19 also has the same solution. But if we try 20, we’ll see a new error, a rather curious one:
This is the result:
The problem is that we need to replace every 10 contained in the number with an X.
To handle this case, the simplest thing to do is to change the if to a while. A while is a structure that is both a conditional and a loop at the same time.
An if executes the conditional branch only once, but a while keeps executing it as long as the condition continues to be met.
Could we use while in all cases? Now that we’re in green, we’ll try to change all of the conditions from if to while. And the tests prove that it’s possible to do so:
This is interesting: we can see that the structures get more similar each time. Let’s try changing the cases in which we use an equality to see if we can use >= in their place.
And the tests keep on passing. This indicates a possible refactoring to unify the code.
It’s a big refactoring, and we’re going to do it in just one step. Basically, it consists of introducing a dictionary structure (a Map, in Kotlin) that contains the various conversion rules:
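A rough Python equivalent of this refactoring, as a sketch: a dictionary plays the role of Kotlin’s Map (Python dicts keep insertion order, which this algorithm relies on), and the rules cover only the symbols introduced so far. Each former while branch becomes one pass over the rules, and the sketch produces XXXX for 40, the first number that needs a new symbol.

```python
class RomanNumerals:
    # Conversion rules from the largest value to the smallest; order matters
    RULES = {10: "X", 9: "IX", 5: "V", 4: "IV", 1: "I"}

    def to_roman(self, number: int) -> str:
        roman = ""
        remaining = number
        for value, symbol in self.RULES.items():
            # Same 'while remaining >= value' loop as before, now driven by data
            while remaining >= value:
                roman += symbol
                remaining -= value
        return roman
```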
The tests continue to pass, an indication that our refactoring is correct. In fact, we wouldn’t get any error until reaching number 40, which is to be expected, as it introduces a new symbol:
The implementation is simple now:
And now that we’ve checked that it’s working properly, we move it to a better place:
We could keep adding examples that are not yet covered in order to add the remaining transformation rules, but essentially, this algorithm isn’t going to change anymore, so we’ve reached a general solution to convert any natural number to roman notation. In fact, this is how it would end up. The necessary tests, first:
And the implementation:
We could try several acceptance tests to verify that it’s possible to generate any roman number:
Small transformations in the production code can result in big behavioral changes, although to achieve them we need to invest some time in refactoring, so that introducing each change is as simple as possible.
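The general algorithm can be sketched in Go as follows. This is a hypothetical translation for illustration, not the kata's Kotlin code, and it assumes that the full set of subtractive rules (CM, CD, XC, XL, IX, IV) has already been added. Go maps have no defined iteration order, so an ordered slice of pairs stands in for the Map.

```go
package main

import (
	"fmt"
	"strings"
)

// rule pairs a decimal value with its roman symbol.
type rule struct {
	value  int
	symbol string
}

// Conversion rules, from the largest value to the smallest.
var rules = []rule{
	{1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"},
	{100, "C"}, {90, "XC"}, {50, "L"}, {40, "XL"},
	{10, "X"}, {9, "IX"}, {5, "V"}, {4, "IV"},
	{1, "I"},
}

// toRoman applies each rule with a while-style loop: as long as the value
// fits, append its symbol and subtract it from what is left to convert.
func toRoman(decimal int) string {
	var roman strings.Builder
	for _, r := range rules {
		for decimal >= r.value {
			roman.WriteString(r.symbol)
			decimal -= r.value
		}
	}
	return roman.String()
}

func main() {
	for _, n := range []int{14, 20, 39, 40, 1990} {
		fmt.Println(n, toRoman(n))
	}
}
```

Adding a new rule only means adding a pair to the slice; the loop itself no longer changes, which is what makes the solution general.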
- Applying Transformation Priority Premise to Roman Numerals Kata2
- The Transformation Priority Premise3
- The Transformation Priority Premise (TPP)4
This kata demonstrates that, as the tests get more specific, the algorithm becomes more general. But, besides that, it’s a wonderful kata to reflect on example selection, and why the tests that pass as soon as we write them aren’t really useful.
On the other hand, the kata reveals a much more intriguing concept: the Transformation Priority Premise, according to which, in the same way that there are refactorings (changes in the structure of the code that don’t alter its behavior), there would also be transformations (changes in the code that do alter its behavior).
These transformations would have an order, from the simplest to the most complex, and a priority in their application that dictates that we should apply the simpler ones first.
The kata was created by Robert C. Martin1 when he was writing a program for his son to calculate the prime factors of a number. Thinking about its development, his attention was caught by the way in which the algorithm evolved and became simpler as it became more general.
Write a class with a generate method that returns a list of the prime factors of an integer. If you prefer a more procedural (or even functional) approach, try to write a
To not overcomplicate the exercise, the result may be expressed as an array, list or collection, without having to group the factors as powers. For example:
This is a very simple kata, as well as a very potent one: you don’t need many cycles to carry it out, but it nevertheless highlights some especially important features of TDD.
To begin, we can analyze the examples that we could try. In principle, the arguments will be natural numbers. We have three main categories:
- Numbers that don’t have any prime factors: the only case is 1.
- Numbers that are prime, such as 2, 3, or 5.
- Numbers that are the product of several prime numbers, such as 4, 6, 8, or 9.
Besides, among non-prime numbers, we find those that are the product of 2, 3, or n factors, repeated or not.
Applying the laws of TDD that we’ve already seen, we’ll start with the smallest possible failing test. Then, we’ll write the necessary production code to make the test pass.
We’ll go through the different cases by writing the test first, and then, the production code that makes it pass without breaking any of the previous tests.
One of the curiosities of this kata is that we can just go through the list of natural numbers in order, taking examples as we go until we consider that we can stop. However, is this the best strategy? Can it lead us to selecting unhelpful examples?
Our objective will be to write a program that decomposes a natural number into its prime factors. For the sake of simplicity, we won’t group the factors as powers. We’ll leave that as a later exercise, if you wish to go a bit further.
We’ll write a primefactors function, to which we’ll pass the number that we wish to decompose, and which will return an array of its prime factors sorted from lowest to highest.
Our first test expects the
primefactors function to exist:
Which, as we already know, hasn’t been defined yet:
We introduce it without further ado. For now, in the test file itself:
We haven’t yet communicated with the function in the test, so we’re going to introduce that idea, passing it a first example of a number to decompose, along with the result that we expect. The first thing that should draw our attention is that, due to the peculiarities of the definition and distribution of primes among natural numbers, we have a very intuitive way of organizing the examples and writing the test: it’s almost enough to start with number one and advance incrementally.
Number one, moreover, is a particular case (it doesn’t have any prime factor), so it suits us especially well as a first test.
To pass the test we need a minimal implementation of the function:
Note that we don’t even implement the function’s parameter. We’ll let a test be the one that asks for it first. Meanwhile, we delete the first test, given that it has become redundant.
The second test should help us define the function’s signature. To do so, we need a case in which we expect a response other than an empty array, something we’ll be able to do if we receive a parameter that introduces the necessary variation. Number 2 is a good example with which to achieve this:
To solve this case we need to take into account the parameter defined by the function, which forces us to introduce and use it. In our solution, we handle the case that the previous test states, and we make an obvious implementation to pass the test that we’ve just introduced. We’re postponing the implementation of the algorithm until we have more information:
The next case that we’re going to try is decomposing number 3, which is prime like number 2. This test will help us to better understand how to handle these cases:
Now that we have this failing test we’ll make an obvious implementation, such as returning the passed number itself. Since it’s a prime number, this is perfectly correct. There’s not much else to do here.
In the presentation of the kata we’ve divided the cases into categories. Let’s review:
- Edge or special cases, such as 1
- Prime numbers, like 2, 3, or 5
- Non-prime numbers, like 4, 6, or 8
We’ve already covered the first category, since there are no more edge cases to consider.
We haven’t begun to treat the third category yet, and we haven’t done any test with any of its examples.
The second category is the one that we’ve been testing until now. At this point, we could keep selecting examples from this category and trying new cases. But, what would happen? Let’s see it.
The test passes without implementing anything!
It was pretty obvious, wasn’t it? At this moment, our so-called algorithm does nothing more than treat all numbers as primes. For this reason, if we keep using prime numbers as examples, nothing will force us to make changes to the implementation.
When we add a test that doesn’t fail, it means that the algorithm that we’re developing is already general enough to solve every case from that category, and therefore, it’s time to move on to a different category that we cannot yet successfully handle. Or, if we’ve already discovered all of the possible categories, it means that we’ve already finished!
We’re going to start using examples from the non-prime category. But we’re also going to refactor the solution to be able to see these categories more explicitly:
The first non-prime that we have is number 4, and it’s the simplest of them all for many reasons. So, this time we write a test that will fail:
There are many ways of approaching this implementation. For example, we have this one that, while especially naive, is effective:
Despite its simplicity, it’s interesting. It helps us understand that we have to distinguish between primes and non-primes in order to develop the algorithm.
Nevertheless, it has a very unkempt look. Let’s try to organize it a bit more neatly:
It basically says: if a number is higher than 1 we try to decompose it. If it’s 4, we return its factorization. And if it’s not, we return the same number, for it will be prime. Which is true for all of our current examples.
The next number that we can decompose is 6. A nice thing about this kata is that every non-prime number gives us a different response, and that means that every test is going to provide us with some information. Here it is:
Let’s begin with the naive implementation:
There’s nothing wrong with doing it this way. On the contrary, this way of solving the problem starts highlighting regularities. 4 and 6 are multiples of 2, so we want to introduce this knowledge in the shape of a refactoring. We can do this safely thanks to our tests, which demonstrate that the function already decomposes them correctly. So, we’re going to modify the code without changing the behavior that we’ve already defined through tests.
Our first try relies on the fact that the first factor is 2 and is common to both. That is, we can design an algorithm that processes multiples of 2 and, for now, assume that the result of that first division by 2 is the second of its factors, whichever it is.
To do so we have to introduce an array-type variable with which to deliver the response, to which we’ll be adding the factors as we discover them.
This has been a first step, now it’s clearer how it would work, and we can generalize it by expressing it like this:
This refactoring almost works, but the test for number 2 has stopped passing. We fix it, and we advance a step further.
This new implementation passes all the tests, and we’re ready to force a new change.
Among the non-prime numbers we could consider several groupings to the effect of selecting examples. There are some cases in which the numbers are decomposed as the product of two prime factors, and others in which they’re decomposed as the product of three or more. This is relevant because our next examples are 8 and 9. 8 is 2 * 2 * 2, while 9 is 3 * 3. The 8 forces us to consider the cases in which we can decompose a number in more than two factors, while the 9, those in which new divisors are introduced.
In principle, we don’t care much about which case to start with. Maybe the key is to pick the case that seems the easiest to manage. Here we’ll start by decomposing the number 8. This way we keep working with the 2 as the only divisor, which at the moment looks a little easier to approach.
Let’s write a test:
To implement it, we have to change an if into a while. That is, we have to keep dividing the number by 2 until we can’t do it anymore.
This change is quite spectacular because, while very small, it is also very powerful. By applying it, we can decompose any number that is a power of 2, nothing more and nothing less. But this is not the final goal: we want to be able to decompose any number, and to do so we must introduce new divisors.
At this point, we need an example that forces us to introduce new divisors. Earlier we had left number 9 for later, and now the time has come to resume it. 9 is a good example because it’s a multiple of 3 without being a multiple of 2. Let’s write a test that we’re sure will fail.
Again, let’s start with an implementation that’s very naive but works. The important thing is to pass the test, proof that we’ve implemented the specified behavior.
With the previous code, all tests are green. At this point it becomes obvious that each new divisor that we wish to introduce, such as 5, will need a repetition of the block, so let’s refactor it into a general solution.
This algorithm looks pretty general. Let’s test a couple of cases:
We’ve added two tests that pass. It seems that we’ve solved the problem, but… don’t you have the feeling that we leaped too far with this last step?
The development road in TDD isn’t always easy. The next test is sometimes very obvious, but other times we are faced with several alternatives. Choosing the wrong path can lead us to a dead end or, as has happened here, to a point where we have to implement too much in one go. And as we’ve already discussed, the changes that we add to the production code should always be as small as possible.
In the sixth test, we decided to explore the path of “repetitions of the same factor” instead of forcing other prime factors to appear. Would it have been better to follow that ramification of the problem? Let’s try it, let’s rewind and go back to the situation as it was before that sixth test.
This is the version of the production code that we had when we arrived at the sixth test:
Now, let’s go down the other route:
The following production code lets us pass the new test, and all the previous ones:
Now we could refactor:
To introduce more than two factors we need a test:
The necessary change is a simple one:
And we can rid ourselves of that last if, as it’s covered by the while that we’ve just introduced:
If we add new tests we’ll see that we can decompose any number without problems. That is to say, after this last change and its refactoring, we’ve finished the development of the function. Has this path been any better? Partly, yes. We’ve come up with an almost identical algorithm, but I’d say that the journey has been smoother, the jumps in production code have been less steep, and everything has gone better.
From the traditional QA point of view, there is a series of methods to choose the test cases. However, these methods are not necessarily applicable in TDD. Remember how we started this book: QA and TDD are not the same despite using the same tools and overlapping a lot. TDD is a methodology to drive software development, and the most adequate tests to do it can be slightly different from those that we would use to verify the behavior of the finished software.
For example, our categorization of the numbers into primes and non-primes may be more than enough in QA, but in TDD, the case of non-prime numbers could be further subdivided:
- Powers of a prime factor, such as 4, 8 or 9, which involve just one prime number multiplied several times by itself.
- Products of different primes, such as 6 or 10, which involve more than one prime number.
- Products of n prime factors, with n larger than two.
Each of these categories forces us to implement different parts of the algorithm, which can pose problems that are more or less difficult to solve. A bad choice could even lead us to a dead end.
Nevertheless, nothing prevents us from rewinding and going back if we get stuck. When we are faced with reasonable doubt about going one way or another in TDD, it’s best to take note of the state of the development, and mark that point as a point of return in case we get ourselves into some code swamp. Just go back and think again. Making mistakes is also information.
- With this kata we’ve learned how, as we add tests and they become more specific, the algorithm becomes more general
- We’ve also seen that we get better results when we prioritize the simplest transformations (changes in the production code)
In TDD, the choice of the first test is an interesting problem. Some papers and tutorials about TDD talk about “the simplest case” and don’t elaborate further. But in reality, we should get used to looking for the smallest test that can fail, which is usually a very different thing.
All in all, it doesn’t seem like a very practical definition, so it probably deserves a more thorough explanation. Is there any somewhat objective way to decide which is the minimum test that can fail?
Take the Roman Numerals kata. It consists of creating a converter between decimal and roman numbers. Suppose that the class is going to be RomanNumeralsConverter, and that the function is called toRoman, so that it would be used more or less like this:
According to the “simplest case” approach, we could write a test not unlike this one:
Looks right, doesn’t it? However, this is not the simplest test that can fail. Actually, there are at least two simpler tests that could fail, and both of them will force us to create production code.
– Let’s take a moment to think about the test that we’ve just written: what will happen if we execute it?
– It’s going to fail.
– But, why will it fail?
– Obvious: because we haven’t even defined the class. When we try to execute the test it cannot find the class.
– Can we say that the test fails for the reason that we expect it to fail?
– Hmmm. What do you mean?
– I mean that the test establishes that we expect it to be able to convert the decimal 1 to the roman I. It should fail because it can’t do the conversion, not because it can’t find the class. Actually, the test can fail for at least three causes: that the class doesn’t exist, that the class doesn’t have the toRoman method, and that it doesn’t return the result “I”. And it should only fail because of one of them.
– Are you telling me that the first test should be just instantiating the class?
– And what’s the point of that?
– That the test, when it fails, can only fail for the reason that we expect it to fail.
– I have to think about that for a moment.
– No problem. I’ll wait for you at the next paragraph.
That is the question. In spite of it being the simplest case, this first test can fail for three different reasons that make us consider the test as not-passing (remember the second law: not compiling is failing), therefore, we should reduce it so that it fails for just one cause.
As a side note: it’s true that it could fail for many other causes, such as typing the wrong name, putting the class in the wrong namespace, etc. We assume that those errors are unintentional. Also, running the test will tell us the error. Hence the importance of running the tests, seeing how they fail, and making sure they fail properly.
Let’s go a bit slower, then.
The first test should just ask that the class exists and can be instantiated.
In PHPUnit, a test without assertions fails or is at least considered risky. In order to make it pass clearly, we specify that we’re not going to make assertions. In other languages this is unnecessary.
To pass the test I have to create the class. Once created, I will see the test pass and then I’ll be able to face the next step.
The second test should force me to define the desired class method, although we still don’t know what to do with it or what parameters it will need.
Again, this test is a little special in PHP. For example, in PHP and other languages we can ignore the return value of a method if it’s not typed. Other languages will require us to explicitly type the method, which in this step could be done using void so that it doesn’t return anything at all. Another strategy would be to return an empty value of the correct type (string in this case). There are also languages that require the result of a method to be used, if it returns one, although they provide a way to explicitly ignore it.
An interesting issue is that once we’ve passed this second test, the first one becomes redundant: the case is already covered by this second test, and if a change in the code made it fail, the second test would fail as well. You can delete it in the refactoring phase.
And now is when the “simplest case” makes sense, because this test, after the others, will fail for the right reason:
This is already a test that would fail for the expected reason: the class doesn’t have anything defined to do the conversion.
Again, once you make this test pass and you’re in the refactoring phase, you can delete the previous one. It has fulfilled its task of forcing us to add a change to the code. Additionally, if that second test fails due to a change in the code, our current test will also fail. Therefore, you can also delete it.
I guess now you’re asking yourself two questions:
- Why write three tests only to end up keeping the one that I had thought of at the beginning?
- Why can I delete those tests?
Let’s do this bit by bit, then.
A test should have only one reason to fail. Imagine it as an application of the Single Responsibility Principle. If a test has more than one reason to fail, chances are that we’re trying to make a test provoke many changes in the code at once.
It’s true that in testing there’s a technique called triangulation, in which, precisely, several possible aspects that must occur together are verified in order to consider that the test passes or fails. But, as we’ve said at the beginning of the book, TDD is not QA, so the same techniques are not applicable.
What we want in TDD is that the tests tell us what is the change that we have to make in the software, and this change should be as small and unambiguous as possible.
When we don’t have any written production code, the smallest thing we can do is to create a file in which we define the function or class that we’re developing. It’s the first step. And even then, there are chances that we don’t do it correctly:
- we make a mistake in the file name
- we make a mistake in its location in the project
- we mistype the class’s or the function’s name
- we make a mistake when placing it in a namespace
We have to prevent all of those problems just to be able to instantiate a class or use a function, and this minimal test will fail if any of them happens. Once we correct everything that can go wrong, the test will pass.
However, if the test can fail for more reasons, the potential sources of error multiply, as there are more things that we need to do to make it pass. Also, some of them may depend on or interfere with each other. In general, the change in production code needed to turn such a red test green will be too large, and therefore making it pass will be more costly and less obvious.
In TDD many tests are redundant. From the point of view of QA, we test too much in TDD. In the first place, because many times we use several examples of the same class, precisely to find the general rule that characterizes that class. On the other hand, there are tests that we do in TDD that are already included in other ones.
This is the case of these first tests that we’ve just shown.
The test that forces us to define the software unit for the first time is included in any other test we can imagine, for the simple reason that we need the class in order to be able to execute those other tests. Put another way, if the first test fails, then all of the rest will fail.
In this situation, the test is redundant and we can delete it.
It’s not always easy to identify redundant tests. In some stages of TDD we use examples of the same class to move the development, so we may reach a point in which some of those tests are redundant and we can delete them as they’ve become unnecessary.
On the other hand, a different possibility is to refactor the test using data providers or other similar techniques with which to cheapen the creation of new examples.
We call a program flow the happy path when there aren’t any problems and it’s able to execute the entire algorithm. The happy path occurs when no errors are generated during the process because all of the handled data are valid and their values lie within their acceptable ranges, and there are no other failures that can affect the software unit that we’re developing.
In TDD, happy path testing consists in choosing examples that must return predictable values as a result, which we can use to test. For example, in the Roman Numerals kata, one possible happy path test would be:
We work with happy path tests in katas very frequently. This is because we’re interested in focusing on the development of the algorithm and in keeping the exercise short.
On the contrary, sad paths are those program flows that end badly. When we say that they end badly, we mean that an error occurs and the algorithm cannot finish executing.
However, the errors and the ways in which production code deals with them are a part of the behavior of the software, and in real work they deserve to be considered when using the TDD methodology.
In that sense, sad path testing would be precisely the choice of test cases that describe situations in which the production code has to handle wrong input data or responses from their collaborators which we also have to manage. An example of this would be something like this:
That is: our roman numeral converter cannot handle negative numbers nor numbers with decimal digits, and therefore, in a real program we’d have to handle this situation. In the example, the consequence is throwing an exception. But it could be any other form of reaction that suits the purposes of the application.
This kata consists, originally, in creating a Value Object to represent the NIF, or Spanish Tax Identification Number. Its usual form is a string of eight numeric characters and a control letter, which helps us ensure its validity.
As it’s a Value Object, we want to be able to instantiate it from a string and guarantee that it’s valid in order to use it without problems anywhere else in the code of an application.
One of the difficulties of developing these kinds of objects in TDD is that sometimes they don’t need to implement significant behaviors, and it’s more important to make sure that they are created in a consistent state.
The algorithm to validate them is relatively simple, as we’ll see, but we’re mostly interested in how to rid ourselves of all of the strings of characters that can’t form a valid NIF.
This kata is original, and it came about by chance while I was preparing a small introduction to TDD and live coding workshop about the benefits of using the methodology in day-to-day work.
While I was delving into this example, two very interesting questions made themselves apparent:
- Starting with tests that discard the invalid cases allows us to avoid dealing with the development of the algorithm right from the start, getting those cases out of the way and reducing the problem space. The consequence is that we end up designing more resilient objects, with cleaner algorithms, which helps prevent bugs from appearing in the production code.
- The mechanism of postponing the solution of each problem until the next test becomes apparent. That is: to make each new test pass, we introduce an inflexible implementation that allows us to pass that test, but in order for the previous ones to keep passing, we are forced to refactor the code that we already had.
Create a Nif class, which will be a Value Object, to represent the Spanish Tax Identification Number. This number is a string of eight numeric characters and a final letter that acts as a control character.
This control letter is obtained by calculating the remainder of dividing the numeric part of the NIF by 23 (mod 23). The result tells us in which row of the following table to look up the control letter.
There’s a special case of NIF, which is the NIE, or Foreigners’ Identification Number. In this case, the first character will be one of the letters X, Y and Z. For the calculation of mod 23, they are replaced by the values 0, 1 and 2, respectively.
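The mod 23 rule is simple arithmetic. Here is a minimal Go sketch, assuming the standard sequence of Spanish control letters for remainders 0 to 22; controlLetter is a hypothetical name, and the function expects a non-negative numeric part. For example, 12345678 = 23 × 536768 + 14, and remainder 14 corresponds to Z.

```go
package main

import "fmt"

// controlLetters holds the control letter for each remainder of mod 23, from 0 to 22.
const controlLetters = "TRWAGMYFPDXBNJZSQVHLCKE"

// controlLetter returns the control letter for a non-negative numeric part.
func controlLetter(numericPart int) byte {
	return controlLetters[numericPart%23]
}

func main() {
	fmt.Printf("%c\n", controlLetter(12345678)) // Z
}
```

For a NIE, the leading X, Y or Z would first be replaced by 0, 1 or 2 before computing the numeric part.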
This kata can help us learn several things, both about TDD and about data types and validation.
In katas, it’s common to ignore issues such as data validation in order to simplify the exercise and focus on the development of the algorithm. In a real development we cannot do this: we should actually put a lot of emphasis on validating the data at different levels, both for security reasons and to avoid errors in the calculations.
So we’ve included this kata precisely to practice how to use TDD to develop algorithms that first handle all of the values that they cannot manage, both from the structural point of view as well as from the domain one.
Specifically, this example is based on the fact that the effective behavior of the constructor that we’re going to create is assigning the value that we pass to it. All else it does is check that the value is suitable for that, so it acts as a barrier for unwanted values.
Being a Value Object, we’ll try to create a class to which we pass the candidate string in the constructor. If the string turns out to be invalid for a NIF, the constructor will throw an exception, preventing the instantiation of objects with inadequate values. Fundamentally, our first tests will expect exceptions or errors.
From the infinite amount of strings that this constructor could receive, only a few of them will be valid NIF, so our first goal could be to discard the most obvious invalid ones: those that could never fit because they have the wrong number of characters.
In a second stage, we’ll try to control those that could never be NIF due to their structure, basically due to them not following the “eight numeric character plus one final letter” pattern (taking into account the exception of the NIE, which could indeed have a letter at the beginning).
With this, we’d have everything we need to implement the validation algorithm, as we’d only have to handle strings that could be NIF from a structural point of view.
One thing that the previous steps guarantee is that the tests won’t start failing when we introduce the algorithm, as their examples could never be valid. If we started using strings that had a valid NIF structure, even if we’d written them randomly, we could run into one that was valid by chance, and when implementing the corresponding part of the algorithm that test would fail for the wrong reason.
In this kata we’re going to follow an approach that tackles the sad paths first, that is, we’re going to handle the cases that would cause an error first. Thus, we’ll first develop the validation of the input structure, and then move on to the algorithm.
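As a compact picture of where this approach leads, here is a Go sketch in which the structural sad paths are rejected first and the mod 23 check only runs on strings that already have a valid shape. It is a summary under assumptions, not the step-by-step solution the kata builds: using a regular expression for the structure and the two error values are choices made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Nif is a string that has passed validation.
type Nif string

// Hypothetical errors: one for structural problems, one for a wrong control letter.
var (
	ErrInvalidFormat = errors.New("invalid NIF format")
	ErrInvalidNif    = errors.New("invalid NIF control letter")
)

// Control letter for each remainder of mod 23, from 0 to 22.
const controlLetters = "TRWAGMYFPDXBNJZSQVHLCKE"

// Structure: a digit or a NIE letter (X, Y, Z), seven digits and a final letter.
var structure = regexp.MustCompile(`^[0-9XYZ][0-9]{7}[A-Z]$`)

// NIE letters stand for 0, 1 and 2 in the calculation.
var nieReplacer = strings.NewReplacer("X", "0", "Y", "1", "Z", "2")

func NewNif(candidate string) (Nif, error) {
	// Sad paths first: the wrong length, then the wrong structure.
	if len(candidate) != 9 || !structure.MatchString(candidate) {
		return "", ErrInvalidFormat
	}
	// Only structurally valid strings get here: eight digits after the replacement.
	numericPart, err := strconv.Atoi(nieReplacer.Replace(candidate[:8]))
	if err != nil {
		return "", ErrInvalidFormat
	}
	if candidate[8] != controlLetters[numericPart%23] {
		return "", ErrInvalidNif
	}
	return Nif(candidate), nil
}

func main() {
	for _, candidate := range []string{"12345678Z", "X0000000T", "12345678A", "1234567890", "0000A000T"} {
		nif, err := NewNif(candidate)
		fmt.Printf("%q -> %q, %v\n", candidate, nif, err)
	}
}
```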
Katas usually ignore issues such as validation, but in this case we’ve decided to go for a more realistic example, in the sense that it’s a situation with which we have to deal quite often. In the code of a project in production, the validation of input data is essential, and it’s worth practicing with an exercise that focuses almost exclusively on it.
Besides, we’ll see a couple of interesting techniques to transform a public interface without breaking the tests.
Create a Nif class, which will be a Value Object to represent the Spanish Tax Identification Number. It’s a string of eight numeric characters, with a final letter that acts as a control character.
This control letter is obtained by calculating the remainder of dividing the numeric part of the NIF by 23 (mod 23). The result tells us in which row of the following table to look up the control letter. In this table I’ve also included some simple examples of valid NIF so you can use them in the tests.
|Numeric part||Remainder||Letter||Valid NIF example|
You can create invalid NIF simply by choosing a numeric part and adding a letter that doesn’t correspond to it.
There’s an exception: the NIF for foreigners (or NIE) may start with the letters X, Y or Z, which for the purposes of the calculations are replaced by the numbers 0, 1 and 2, respectively. In this case, X0000000T is equivalent to 00000000T.
To prevent confusion we’ve excluded the letters
A string that starts with a letter other than X, Y or Z, or that contains alphabetic characters in the central positions, is also invalid.
We’re going to solve this kata using Go, so let’s clarify the expected result a bit. In this example we’re going to create a data type Nif, which will basically be a string, and a factory function NewNif, which will allow us to build validated NIF from an input string.
On the other hand, testing in Go is also a bit peculiar. Even though the language includes support for testing as a standard feature, it doesn’t include common utilities such as
To solve this kata I’m going to take advantage of the way in which Go handles errors. These can be returned as one of the responses of a function, which forces you to always handle them explicitly.
Designing tests based on error messages is not a good practice, as they can easily change, making tests fail even when there hasn’t really been an alteration of the functionality. However, in this kata we’re going to use the error messages as a sort of temporary wildcard on which to rely, making them go from more specific to more general. By the end of the exercise, we’ll be handling only two possible errors.
In this kata, we want to start by focusing on the sad paths, the cases in which we won’t be able to use the argument that’s been passed to the constructor function. From all the innumerable string combinations that the function could receive, let’s first give an answer to those that we know won’t be of use because they don’t meet the requirements. This answer will be an error.
We’ll start by rejecting the strings that are too long, those that have more than nine characters. We can describe this with the following test:
In the nif/nif_test.go file
For now we’ll ignore the function’s responses, just to force ourselves to implement the minimum amount of code.
As expected, the test will fail because it doesn’t compile. So we’ll implement the minimum necessary code, which can be as small as this:
With this, we get a foundation on which to build.
Now we can go a step further. The function should accept a parameter:
We make the test pass again with:
And finally return:
- the NIF, when the one we’ve passed is valid.
- an error when that’s not possible.
In Go, a function can return multiple values and, by convention, the error is returned as the last of them.
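To make the convention concrete, here is a minimal sketch of a function that returns two results plus an error in last position, together with the usual way of consuming it: check the error first, then use the values. The `splitNif` and `describe` functions and the error text are hypothetical, invented for this illustration; they are not part of the kata.

```go
package main

import (
	"errors"
	"fmt"
)

// splitNif is a hypothetical helper: it returns the numeric part and the
// control letter of a candidate, with the error as the last return value.
func splitNif(candidate string) (string, string, error) {
	if len(candidate) != 9 {
		// On failure: zero values for the results, plus a non-nil error.
		return "", "", errors.New("bad length")
	}
	return candidate[:8], candidate[8:], nil
}

// describe consumes splitNif idiomatically: error first, then the values.
func describe(candidate string) string {
	number, letter, err := splitNif(candidate)
	if err != nil {
		return fmt.Sprintf("rejected: %v", err)
	}
	return fmt.Sprintf("number=%s letter=%s", number, letter)
}
```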
This provides a flexibility that is uncommon in other languages and lets us play with some rather curious ideas. For example, for now we’re going to ignore the function’s main return value and focus exclusively on the errors. Our next test is going to ask the function to return only the error, without doing anything with it. The if is there, for now, to keep the compiler from complaining.
This test tells us that we must return something, so for now we indicate that we’re going to return an error, which can be nil.
Let’s go a step further by expecting a specific error when the condition defined by the test is met: the string is too long. With this, we’ll have a proper first test:
Again, the test will fail, and to make it pass we return the error unconditionally:
And with this, we’ve already completed our first test and made it pass. We could be a little stricter in handling the response, to cover the case in which the error is nil, but that doesn’t have to concern us for the time being.
At this point, I’d like to draw your attention to the fact that we’re not solving anything yet: the error is returned unconditionally, so we’re postponing this validation for later.
Our second test has the goal of forcing the implementation of the validation we’ve just postponed. It may sound a little weird, but it showcases one of the great benefits of TDD: the ability to postpone decisions. By doing so we’ll have a little more information, which is always an advantage.
This test is very similar to the previous one:
This test already forces us to act differently in each case, so we’re going to implement the validation that limits the strings that are too long:
Again, I point out that at the moment we’re not implementing what the test says. We’ll do it in the next cycle, but the test is fulfilled by returning the expected error unconditionally.
There’s not much else we can do in the production code, but looking at the tests we can see that it would be possible to unify their structure a bit. After all, we’re going to make a series of them to which we pass a value and expect a specific error in response.
In Go there is a test structure similar to the Data Providers of other languages, usually known as table-driven tests:
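The following sketch shows the shape of such a table: a slice of anonymous structs, one row per case, walked by a single loop. To keep it runnable outside `go test`, the loop collects failure messages instead of calling `t.Errorf`; in a real `nif_test.go` it would live inside a `func TestXxx(t *testing.T)` and would typically call `t.Run` for each row. The `NewNif` stub and its error messages are assumptions standing in for the code at this stage of the kata.

```go
package main

import (
	"errors"
	"fmt"
)

// NewNif is a hypothetical stub for this stage: it only checks the length
// and returns nothing but an error.
func NewNif(candidate string) error {
	if len(candidate) > 9 {
		return errors.New("too long")
	}
	if len(candidate) < 9 {
		return errors.New("too short")
	}
	return nil
}

// cases is the "table": each row holds a name, an input and the expected error.
var cases = []struct {
	name      string
	candidate string
	expected  string
}{
	{"should fail if too long", "01234567891", "too long"},
	{"should fail if too short", "01234", "too short"},
}

// runCases plays the role of the test function: it walks the table and
// records a message for every row whose error doesn't match.
func runCases() []string {
	var failures []string
	for _, tc := range cases {
		err := NewNif(tc.candidate)
		if err == nil || err.Error() != tc.expected {
			failures = append(failures, fmt.Sprintf("%s: got %v, want %q", tc.name, err, tc.expected))
		}
	}
	return failures
}
```

Adding a new case means adding one row to `cases`, and a change in the constructor’s interface only touches the loop.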
With this, it’s now very easy and fast to add tests, especially if they are from the same family, like in this case in which we pass invalid candidate strings and check for the error. Also, if we make changes to the constructor’s interface, we only have one place in which to apply them.
With this, we’d have everything ready to continue developing.
With the two previous tests we verify that the string that we’re examining meets the specification of having exactly nine characters, although that’s not implemented yet. We’ll do it now.
However, you may be wondering why we don’t simply test that the function rejects the strings that don’t fulfill it, something that we could do in just one test.
The reason is that there are actually two ways in which the specification may not be met: the string has more than nine characters, or the string has fewer. If we write a single test, we’ll have to choose one of the two cases, so we cannot guarantee that the other one is covered.
In this specific example, in which we’re interested in just one value, we could frame it as a dichotomy between strings with length nine and strings with lengths other than nine. However, it’s common to have to work with intervals of values that, moreover, can be open or closed. In that situation, the strategy of having two or even more tests is far safer.
In any case, at this point we need to add another requirement in the form of a test in order to drive the development. The two existing tests define the string’s valid length. The next test asks about its structure.
And with the refactoring that we’ve just made, adding a test is extremely simple.
We’ll start at the beginning. Valid NIF begin with a number, except for a subset of them that begin with one of the letters X, Y or Z. One way of defining the test is the following:
To pass the test, we first solve the pending problem of the previous one:
Here we have a pretty clear refactoring opportunity, which would consist of joining the conditionals that evaluate the length of the string. However, that would cause the test to fail, since we would at least have to change an error message.
One possibility is to temporarily “skip” our self-imposed condition of only refactoring with all tests in green, and make changes in both production and test code at the same time. Let’s see what happens.
The first thing would be to change the test so it expects a different error message, which will be more generic and the same for all of the cases that we want to consolidate in this step:
This will cause the test to fail, an issue that can be solved by changing the production code in the same way:
The test passes again and we are ready to refactor. But we’re not going to do that.
Another option is to make a temporary refactoring in the test in order to make it more tolerant: we simply allow it to accept a more generic error in addition to the specific one.
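Here is a minimal sketch of that temporary tolerance, assuming errors are compared by their message. A hypothetical helper, `matchesEither`, accepts either the old specific message or the new generic one, so the old and the new version of the production code both satisfy the same check during the migration. Once the change is done, the helper goes away and the test expects only the generic error. All names and messages here are illustrative.

```go
package main

import "errors"

// matchesEither is a hypothetical helper for the temporary, more tolerant
// assertion: it accepts the specific message we used to expect or the
// generic one we are migrating to. A nil error never matches.
func matchesEither(err error, specific, generic string) bool {
	if err == nil {
		return false
	}
	return err.Error() == specific || err.Error() == generic
}

// oldValidate and newValidate stand for the production code before and
// after the change; both pass the same tolerant check.
func oldValidate(candidate string) error {
	if len(candidate) > 9 {
		return errors.New("too long")
	}
	return nil
}

func newValidate(candidate string) error {
	if len(candidate) != 9 {
		return errors.New("bad format")
	}
	return nil
}
```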
This change allows us to make the change in production without breaking anything:
The test keeps passing, and now we can perform the refactoring.
Unifying the conditionals is easy now. This is the first step, which I include here to have a reference of how to do this in case we were working with an interval of valid lengths.
But it can be done better:
And a little more expressive:
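The progression just described might look like the following minimal sketch. The function names and the `nifLength` constant are hypothetical, and the kata’s own code may differ. The first version is the one to keep if the valid lengths were an interval rather than a single value.

```go
package main

// nifLength is a hypothetical name for the only valid length.
const nifLength = 9

// Step 1: both checks joined in one condition; this shape generalizes to an
// interval of valid lengths by changing the two bounds.
func invalidLengthV1(candidate string) bool {
	return len(candidate) < nifLength || len(candidate) > nifLength
}

// Step 2: with a single valid length, the condition collapses to !=.
func invalidLengthV2(candidate string) bool {
	return len(candidate) != nifLength
}

// Step 3: a positively named predicate tends to read better at the call site.
func hasValidLength(candidate string) bool {
	return len(candidate) == nifLength
}
```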
Finally, a new refactoring of the test to contemplate these changes. We remove our temporary change, although it’s possible that we need it again in the future.
Note that we’ve been able to make all these changes without the tests failing.
The code is pretty compact, so we’re going to add a new test that lets us move forward with the validity of the structure. The central fragment of the NIF is composed only of numbers, exactly seven:
We run it to make sure that it fails for the right reason. To pass the test, we have to solve the previous test first, so we’ll add code to verify that the first symbol is either a number or a letter in the X, Y, Z set. We’ll do it with a regular expression:
This code is enough to pass the test, but we’re going to make a refactoring.
It makes sense that, instead of matching a regular expression that excludes the non-valid strings, we match an expression that detects them. If we do that, we’ll have to invert the conditional. To be honest, the change is pretty small:
We’re reaching the end of the structural validation of the NIF. We need a test that tells us which candidates to reject depending on their last symbol, which leads us to solve the pending problem from the previous test:
Of the four non-valid letters we take U as an example, but it could be any of the other three.
However, to make the test pass, what we actually do is make sure that the previous one is fulfilled. It’s easier to implement that separately:
This passes the test, and we are met by a familiar situation which we’ve already solved before: we have to make the errors more generic with the temporary help of some extra control in the test:
We change the error messages in the production code:
Now we unify the regular expression and the conditionals:
We can still make a small but important change. The last part of the regular expression, .*, is there to fulfill the requirement of matching the whole string. However, we don’t really need the quantifier, as one character is enough:
And this reveals a detail: the regular expression only matches strings that have exactly nine characters, so the initial length validation is unnecessary:
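To illustrate why the length check becomes redundant, here is one possible regular expression for the whole structure. It is a hypothetical sketch, not the one used in the kata: it lists the allowed control letters positively (A to Z except I, O and U, with Ñ already outside A-Z) instead of excluding the invalid ones. Because it is anchored with `^` and `$` and each part matches a fixed number of characters, only strings of exactly nine characters can match.

```go
package main

import "regexp"

// validStructure: X, Y, Z or a digit; then exactly seven digits; then one
// uppercase ASCII letter other than I, O or U. The anchors and the fixed
// counts make the length implicit: 1 + 7 + 1 = 9 characters.
var validStructure = regexp.MustCompile(`^[XYZ0-9][0-9]{7}[A-HJ-NP-TV-Z]$`)

// hasValidStructure is a hypothetical helper wrapping the expression.
func hasValidStructure(candidate string) bool {
	return validStructure.MatchString(candidate)
}
```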
We’ve walked so far… only to retrace our steps. However, we didn’t know this in the beginning, and that’s where the value of the process lies.
Lastly, we change the test to reflect the changes and, again, remove our temporary support:
We need a new test to finish the structural validation part. The existing tests guarantee that the strings are correct, but the following validation already involves the algorithm that calculates the control letter.
This test should ensure that we can’t use a structurally valid NIF with an incorrect control letter. When we presented the kata we gave some examples, such as 00000000S. This is the test:
And here is the code that passes it:
And now, of course, it’s time to refactor.
This refactoring is pretty obvious, but we have to temporarily protect the test again:
We make the error more general to be able to unify the regular expressions and the conditionals:
And now we join them while the tests keep passing:
With this we finish the structural validation, and we have the implementation of the mod23 algorithm left. But to do that we need a small change of approach.
The algorithm is, in fact, very simple: calculate the remainder (of dividing by 23), and use it as an index to look up the corresponding letter in a table. Implementing it in only one iteration would be easy. However, we’re going to do it more slowly.
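As a worked example of the algorithm, take the numeric part 12345678, a commonly used sample rather than one taken from the kata’s tests:

$$12345678 = 23 \times 536768 + 14 \quad\Longrightarrow\quad 12345678 \bmod 23 = 14$$

Counting from 0 in the letter sequence TRWAGMYFPDXBNJZSQVHLCKE, position 14 holds Z, so 12345678Z is a valid NIF.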
Until now, our tests have been pessimistic: they passed invalid NIF and expected an error. But now our new tests must be optimistic, that is, they’re going to pass valid NIF examples and expect them to be accepted.
At this point, we’ll introduce a change. You may remember that for now we’re only returning the error, but the final interface of the function will return the validated string as a Nif type that we’ll create for the occasion.
That is, we must change the code so that it returns something, and that something has to be of a type that doesn’t exist yet.
To make this change without breaking the test, we’re going to make use of a somewhat contrived refactoring technique.
First, we extract the body of NewNif to another function:
The tests keep on passing. Now, we introduce a variable:
With this, we can make FullNewNif return the string without affecting the test, because it stays encapsulated within NewNif.
The tests still pass, and we’re almost finished. In the test, we replace the usage of NewNif with FullNewNif:
And they’re still passing. Now the function returns the two values that we wanted and the tests remain unbroken. We can proceed to remove the original NewNif:
And use the IDE tools to change the function name from FullNewNif to NewNif:
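Here is a minimal sketch of the intermediate state of this technique. The bodies are simplified and hypothetical; only the names `NewNif` and `FullNewNif` come from the text. The extracted function already returns both values, while the old one keeps its signature by discarding the new return value, so the existing tests don’t notice. Once the tests call `FullNewNif`, `NewNif` can be deleted and `FullNewNif` renamed.

```go
package main

import "errors"

// FullNewNif holds the extracted body and already returns the value and
// the error; at first no test calls it directly.
func FullNewNif(candidate string) (string, error) {
	if len(candidate) != 9 {
		return "", errors.New("bad format")
	}
	return candidate, nil
}

// NewNif keeps its old signature, so existing tests stay green: it
// delegates and throws away the new return value.
func NewNif(candidate string) error {
	_, err := FullNewNif(candidate)
	return err
}
```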
Our goal now is to push the implementation of the mod23 algorithm. This time, the tests expect the string to be valid. Also, we want to force the return of Nif type objects instead of strings.
As a first step, we change the production code to introduce and use the Nif type:
Now the test will fail because we haven’t validated anything yet. To make it pass we add a conditional:
A note about Go: a custom type based on string, like Nif, can’t have a nil value; its zero value is the empty string instead. For this reason, we return an empty string in the case of an error.
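Here is a minimal sketch of that point, with a simplified, hypothetical validation. Because the underlying type of `Nif` is `string`, writing `return nil, err` would not compile, so the zero value `""` is returned instead.

```go
package main

import "errors"

// Nif has string as its underlying type, so its zero value is "", not nil.
type Nif string

// NewNif is a simplified, hypothetical version: on failure it returns the
// zero value of Nif together with the error.
func NewNif(candidate string) (Nif, error) {
	if len(candidate) != 9 {
		return "", errors.New("bad format")
	}
	return Nif(candidate), nil
}
```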
For now we don’t have many reasons to refactor, so we’re going to introduce a test that should help us move forward a bit. In principle, we want it to drive us to separate the numeric part from the control letter.
One possibility would be to test another NIF that ends with the letter
T, such as
To pass the test, we could do this simple implementation:
And now we start refactoring.
In the production code, we can take a look at what’s different and what’s common between the examples. Both of them have T as their control letter, and their numeric part is divisible by 23. Therefore, their mod23 will be 0.
Now we can perform the refactoring, starting with a first step:
And, after seeing the tests pass, the second step:
With this change, the tests pass, and the function accepts all of the valid NIF that end with a T.
In this kind of algorithm there isn’t much of a point in trying to validate all of the control letters, but we can introduce another one to force ourselves to understand how the code should evolve. We’ll try a new one:
This test is already failing, so let’s make a very simple implementation:
This already gives us an idea of what we’re getting at: a map between letters and the remainder after dividing by 23. However, in many languages strings can work as arrays, so it would be enough to have a string with all the control letters properly sorted, and access the letter in the position indicated by the modulus.
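Here is a minimal Go sketch of the idea, assuming the numeric part has already been isolated as an eight-digit string. The letters are stored in a string ordered by remainder, and the remainder modulo 23 is used as an index. Indexing a Go string yields a byte, which works here because all the control letters are ASCII. The helper name `controlLetterFor` is hypothetical.

```go
package main

import (
	"errors"
	"strconv"
)

// controlLetters is ordered by remainder: index 0 is T, index 22 is E.
const controlLetters = "TRWAGMYFPDXBNJZSQVHLCKE"

// controlLetterFor converts the numeric part and uses n mod 23 as an index
// into controlLetters.
func controlLetterFor(numeric string) (byte, error) {
	n, err := strconv.Atoi(numeric)
	if err != nil {
		return 0, err
	}
	if n < 0 {
		// Atoi accepts a leading sign, but a NIF number is never negative
		// (and a negative index would panic).
		return 0, errors.New("negative numeric part")
	}
	return controlLetters[n%23], nil
}
```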
First we implement a simple version of this idea:
We have our first version! We’ll add the full letter list later, but for now we can try to fix up the current code a little. First, we make
Actually, we could extract all of the modulus calculation part to another function. First we rearrange the code to better control the extraction:
Remember to verify that the tests keep passing. Now we extract the function:
And we can compact the code a little bit further while we add the rest of the control letters. At first sight it could look like “cheating”, but in the end it’s nothing more than generalizing an algorithm that could be stated as “take the letter that’s in the position given by the mod23 of the numeric part”.
With this we can already validate all of the NIF except the NIE, which begin with the letters X, Y or Z.
Now that we’ve implemented the general algorithm let’s try to handle its exceptions, which aren’t all that many. NIE begin with a letter that, to all effects of the calculation, gets replaced by a number.
The test that seems to be the most obvious at this point is the following:
X0000023T is equivalent to 00000023T. Will this affect the result of the test?
We run the test and… surprise? The test passes. This happens because the conversion that we do in this line generates an error that we’re currently ignoring, and the resulting numeric value still has a mod23 of 0, which is paired with the letter T.
In other languages the conversion doesn’t fail, but treats the X as 0.
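To see why the test passes, here is a small check. It assumes the conversion uses `strconv.Atoi` on the first eight characters. On invalid input, `Atoi` returns 0 together with the error, and 0 has the same remainder modulo 23 as 23, so the letter T still matches when the error is ignored. The `parseNumericPart` helper is hypothetical.

```go
package main

import "strconv"

// parseNumericPart converts the first eight characters of a candidate,
// which is assumed to be at least eight characters long.
func parseNumericPart(candidate string) (int, error) {
	return strconv.Atoi(candidate[:8])
}
```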
In any case, this opens up two possible paths:
- remove this test, refactor the production code to treat the error, and see that it fails when we put it back
- test another example that we know will fail (Y0000000Z) and make the change later
Possibly, in this case the second option would be more than enough, since our structural validations would ensure that the error couldn’t appear once the function was completely developed.
However, it could be interesting to introduce the handling of the error. Managing errors, including those that could never happen, is always good practice.
So, let’s cancel the test and introduce a refactoring to handle the error:
Here’s the refactor. In this case, I handle the error by causing a panic, which is not the best way of managing an error, but allows us to make the test fail and force ourselves to implement the solution.
If we run the tests we can check that they’re still green. But, if we reactivate the last test, we can see it fail:
And this already forces us to introduce a special treatment for these cases. It’s basically replacing the X with a 0.
It can be refactored by using a
At this point, we could make a test to force us to introduce the rest of the replacements. It’s cheap, although it’s ultimately not very necessary for the reason we discussed earlier: we could interpret this part of the algorithm as “replacing the initial letters X, Y and Z with the numbers 0, 1 and 2, respectively”.
We only need to add the corresponding pairs:
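Here is a minimal sketch of that replacement using a lookup table, assuming the candidate has already passed the structural validation. The `niePrefixes` map and `replaceNiePrefix` function are hypothetical names. Only the first character is touched, so the control letter is never altered.

```go
package main

// niePrefixes maps the NIE initial letters to the digits that replace them
// in the calculation: X -> 0, Y -> 1, Z -> 2.
var niePrefixes = map[byte]byte{'X': '0', 'Y': '1', 'Z': '2'}

// replaceNiePrefix swaps a leading X, Y or Z for its digit and leaves any
// other candidate unchanged.
func replaceNiePrefix(candidate string) string {
	if candidate == "" {
		return candidate
	}
	if digit, ok := niePrefixes[candidate[0]]; ok {
		return string(digit) + candidate[1:]
	}
	return candidate
}
```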
After a short while of refactoring, this would be a possible solution:
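For comparison, here is one possible end state, written from scratch as a sketch. It is not the solution reached in the kata. The regular expression, the use of `strings.NewReplacer`, and the two error names and messages are assumptions. It does keep the idea that only two errors remain at the end: one for a malformed string and one for a wrong control letter.

```go
package main

import (
	"errors"
	"regexp"
	"strconv"
	"strings"
)

// Nif is a validated Spanish tax identification number.
type Nif string

// The two remaining errors; names and messages are illustrative.
var (
	ErrBadFormat     = errors.New("bad format")
	ErrInvalidLetter = errors.New("invalid control letter")
)

// structure: X, Y, Z or a digit; seven digits; a control letter other than I, O, U.
var structure = regexp.MustCompile(`^[XYZ0-9][0-9]{7}[A-HJ-NP-TV-Z]$`)

// controlLetters is indexed by the numeric part modulo 23.
const controlLetters = "TRWAGMYFPDXBNJZSQVHLCKE"

// niePrefix maps the NIE letters to their digits.
var niePrefix = strings.NewReplacer("X", "0", "Y", "1", "Z", "2")

// NewNif returns the candidate as a Nif, or the zero value and an error.
func NewNif(candidate string) (Nif, error) {
	if !structure.MatchString(candidate) {
		return "", ErrBadFormat
	}
	// Replace only the first character so the control letter is untouched.
	numeric := niePrefix.Replace(candidate[:1]) + candidate[1:8]
	n, err := strconv.Atoi(numeric)
	if err != nil {
		// Unreachable after the structural check, but handled anyway.
		return "", ErrBadFormat
	}
	if controlLetters[n%23] != candidate[8] {
		return "", ErrInvalidLetter
	}
	return Nif(candidate), nil
}
```

The structural check runs first, so by the time `strconv.Atoi` is called the numeric part can only contain digits. The error branch is kept anyway, in line with handling errors that should never happen.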
- Use sad paths to drive development
- Use table tests in Go to reduce the cost of adding new tests
- A technique to replace the returned errors with a more general one without breaking the tests
- A technique to change the public interface of the production code without breaking the tests
In earlier chapters we mentioned the laws of TDD. Originally these laws were two, in Kent Beck’s formulation:
- Don’t write a line of new code unless you first have a failing automated test.
- Eliminate duplication.
Essentially, what Kent Beck proposed was to first define a small part of the specification through a test, implement a very small algorithm that satisfies it, and then revise the code in search of duplication to refactor into a more general and flexible algorithm.
And this is, more or less, the way Martin Fowler defined the Red-Green-Refactor cycle:
- Write a test for the next piece of functionality that you wish to add.
- Write the production code necessary to make the test pass.
- Refactor the code, both the new and the old, so that it’s all well structured.
This statement seems to assume that the refactoring is, so to speak, the end of each stage of the process. But, paradoxically, if we interpret the cycle literally, we’ll fall into a bad practice.
In general, in Test Driven Development it’s favored that both tests and changes in production code are as small as possible. This minimalist approach is beneficial because it allows us to work with a light cognitive load in each cycle, while we learn and reach a more extensive and deeper comprehension of the problem, postponing decisions to a moment at which we’re knowledgeable enough to face them.
Usually, our small TDD steps let us make very simple code changes every time. Often these changes are obvious and lead us to implementations that we could consider naive. However simple or rough they might seem, these implementations do pass the tests, and therefore meet the specifications. We could ship this code if we needed to, because the behavior has been developed.
And once we make the last test pass and all of them are green, we’re in good condition to refactor. This green state gives us freedom to change the shape of the implementation, assuring that we’re not accidentally changing the achieved functionality.
The refactoring phase is there, precisely, to evolve those naive implementations and turn them into better designs, making use of the safety net provided by the passing tests.
In each cycle there are many possible refactorings. Obviously, during the first phases they will be smaller, and we might even think that they’re unnecessary. However, it’s wise to take the opportunity when it presents itself.
We can perform many types of refactorings, such as:
- Replace magic numbers with constants.
- Change variable and parameter names to better reflect their intentions.
- Extract private methods.
- Extract conditionals to methods when they become complex.
- Flatten nested conditional structures.
- Extract conditional branches to private methods.
- Extract functionality to collaborators.
Sometimes, an excess of refactoring can lead us to an implementation that’s too complicated and prevents us from advancing the TDD process. This happens when we introduce patterns prematurely, before we’ve finished the development. It would be a premature refactoring, similar to premature optimization, generating code that’s hard to maintain.
We could say that there are two kinds of refactoring involved:
- One kind with limited reach, applicable in each red-green-refactor cycle, whose function is to make the algorithm more legible, sustainable, and capable of evolving.
- The other kind, which will take place once we’ve completed all of the functionality, and whose objective is to introduce a more evolved and pattern-oriented design.
Another interesting question is the introduction of language-specific features, which in principle we’d also like to leave until that final phase. Why leave them for that moment? Precisely because they can limit our ability to refactor the code if we’re not yet sure where it could evolve.
For example, this construction in Ruby:
It could be refactored (in fact, it’s recommended to do so) in this way. I think it’s really beautiful:
In this case, the structure represents the idea of assigning a default value to the variable, something that we could also achieve in this way, which is common in other languages:
The three variations make the tests pass, but each of them puts us in a slightly different spot regarding future requirements.
For example, let’s assume that our next requirement is to be able to introduce several names. One possible solution would be to use splat parameters, that is, let the function admit an undefined number of parameters that will later be presented in the method as an array. In Ruby this is expressed like this:
This declaration, for example, is incompatible with the third variant, as the splat operator doesn’t admit a default value and we’d have to re-implement that step, which would lead us back to using one of the other variants.
In principle this doesn’t seem like that big of an inconvenience, but it means undoing all of the logic determined by that structure, and depending on the development stage that we’re in, it can even lead us to dead ends.
The other options are a little less inconvenient. Apart from changing the signature, the only things we have to change are the question (empty?) and the default value, which instead of a string becomes an array of strings. Of course, to finish we have to join the collection in order to show it in the greeting.
Or the rubified version:
Apart from that, at this point it would be necessary to refactor the name of the parameter so it more clearly reflects its new meaning:
So, as a general recommendation, it’s advisable to seek a balance between the refactors that help us keep the code clean and legible, and those that we could consider over-engineering. An implementation that is not the most refined could be easier to change in the long run, as more and more tests get introduced, than a very evolved one.
Don’t over-refactor ahead of time.
To refactor, the sine qua non condition is that all of the existing tests are passing. At this moment we’re interested in analyzing the state of the implementation and applying the most appropriate refactorings.
If a test is red, it’s telling us that a part of the specification hasn’t been achieved yet, and therefore, we should be working on doing that instead of refactoring.
But there’s a special case: when we add a new failing test and we realize that we need to do some previous refactoring to be able to implement the most obvious or simple solution.
What do we do in this case? Well, we have to take a step back.
Let’s assume a simple example. We’re going to start the Test double greeting kata. We begin with a test with which to define the interface:
Our next step is to create the simplest implementation that passes the test, which we could achieve like this:
The next requirement is to handle the situation where a name is not provided, in which case it should offer some sort of anonymous formula such as the one we use as an example in this test:
In the first place, the test fails because the argument isn’t optional. But on top of that, it’s not even used in the current implementation, and we need to use it to fulfill this test’s most obvious requirement. We have to execute several preparatory steps before we’re able to carry out the implementation:
- Make the name parameter optional
- Use the parameter in the return value
The thing is that the new requirement provides us with new information that would be useful to refactor what’s already been developed with the first test. However, as we have a failing test, we shouldn’t be doing any refactoring. For this reason we delete or cancel the previous test, for example, by commenting it out:
By doing this, we’re back to having all tests in green and we can apply the necessary changes, which don’t alter the behavior that’s been implemented so far.
We make the name parameter optional.
And here, we start using the parameter:
This has allowed us to advance from our first rough implementation to another that’s flexible enough to still pass the first test, while setting up some better conditions to reintroduce the next one:
Obviously the test fails, but this time the reason for failure is, precisely, that we’re missing the code that solves the requirement. Now, the only thing we have to do is check if we’re receiving a name or not, and act consequently.
In a sense, it turns out that the information from the future, that is, the new test that we design to introduce the next piece of functionality, affects the past, that is, the adequate state of the code that we need to continue. This forces us to consider the depth of the necessary refactor before facing the new cycle. In this situation, it’s best to go back to the last passing test, cancelling the new one, and work on the refactoring until we’re better prepared to keep going.
In the previous kata, in general, the TDD cycles were executed quite fluently.
However, you may have noticed that sometimes, making a new test pass involved doing a certain refactoring in the production code before we were able to face the necessary changes to make the test pass.
The kata that we’re about to practice, apart from being one of the classic ones, has a peculiarity: almost every new piece of functionality that we add, every new test, requires a relatively large refactoring of the algorithm. This creates a dilemma: we can’t refactor if the test is not green.
Or, put another way: sometimes we’ll run into a situation where a new test provides us with some new information that we didn’t have before, and shows us a refactoring opportunity that we have to take before implementing the new piece of functionality.
For this reason, with the Bowling Game kata we’ll learn how to handle this situation and take a step back to refactor the production code using what we learn when we think about the new test.
In a sense, the information from the future will help us change the past.
The Bowling kata is very well known. We owe it to Robert C. Martin, although there’s a very popular version by Ron Jeffries in the book Adventures in C#.
The kata consists of creating a program to calculate the scores of a Bowling game, although to avoid complicating it too much, only the final result is calculated without performing any validations.
If you’re not familiar with the game and its scoring system, here are the rules that you need to know:
- In each game, the player has 10 turns called frames.
- In each frame, the player has two tries, or rolls, to knock down the 10 pins (which results in a total of 20 ball rolls throughout the whole game).
- In each roll, the knocked down pins are counted.
- If no pin was knocked down, that’s a Gutter.
- If the player hasn’t knocked down all of the pins by the end of their second roll, the score of the frame is just the sum of both rolls. For example 3 + 5 = 8 points in the frame.
- If the player knocks down all 10 pins in the frame (for example 4 + 6), that’s called spare, and grants a bonus equal to the score of the next roll, the first one of the next frame (10 from the current frame, plus 3 from the next throw, for example, equalling 13). That is, the final score of a spare is calculated after the following roll, and in a sense, that roll is counted twice (once as a bonus, and a second time as a regular roll).
- If the player knocks down all 10 pins in a single roll, that’s a strike. In that case, the bonus is equal to the score of the whole next frame (for example, 10 + (3 + 4) = 17). After a strike, the frame ends without a second roll.
- In the case that this happens in the tenth and last frame, there may be one or two extra rolls as necessary.
The Bowling Game is an interesting kata because of the challenge of handling the spares and the strikes. When we detect one of these cases we have to look up the result of the following rolls, and therefore we need to keep track of the history of the match.
This will force us to change the algorithm several times in quite a radical way, which leads us to the problem of how to manage the changes without breaking the TDD cycles, that is, refactoring the production code while keeping the tests green.
To better understand what we’re talking about, a situation in which we might find ourselves would be the following:
After a couple of cycles, we start testing the spare case. At this point, we realize that we need to make a relatively large change to the way that we were calculating the total score. Ultimately, what happens is that we have to refactor while having a test that’s not passing. But this contradicts the definition of the refactoring phase, which requires all tests to be passing.
The solution, fortunately, is very simple: take a step back.
Once we know that we want to refactor the algorithm, it’s enough to comment out the new test to deactivate it and, keeping the last test green, refactor the production code. When we’re done, we bring the new test back to life and develop the new behavior.
The kata consists of creating a program to calculate the scores of a Bowling game, although to avoid complicating it too much, only the final result is calculated without performing any validations.
A brief reminder of the rules:
- Each game has 10 frames, each one with 2 rolls.
- In each turn, the knocked-down pins are counted, and that number is the score
* 0 points is a gutter
* If all the pins are knocked down in two rolls, it’s called a spare, and the score of the next roll is added as a bonus
* If all the pins are knocked down in just one roll, it’s called a strike, and the score of the next two rolls is added as a bonus
- If a strike or spare are achieved in the last frame, there are extra rolls.
To do this kata, I’ve chosen Ruby and RSpec. You may notice that I have a certain preference for testing frameworks of the *Spec family. The thing is that they have been designed with TDD in mind, treating the tests as specifications, which helps a lot to escape the mindset of thinking about the tests as QA.
Having said that, there’s no problem in using any other testing framework, such as those from the *Unit family.
On the other hand, we’ll use object-oriented programming.
At this point, the first test should be enough to force us to define and instantiate the class:
The test will fail, forcing us to write the minimum production code necessary to make it pass.
And once we’ve made the test pass, we move the class to its own file, and make the test require it:
We’re ready for the next test.
For BowlingGame to be useful, we’ll need at least two things:
- A way to indicate the result of a roll, passing the number of knocked-down pins, which would be a command. A command results in an effect on the state of an object, but doesn’t return anything. We’ll need an alternative way to observe that effect.
- A way to obtain the score at a given moment, which would be a query. A query returns an answer, so we can verify that it’s the one that we expect.
You may be wondering: which of the two should we tackle first?
There is no fixed rule, but one way of seeing it could be the following:
Query methods return a result, so their effect can be tested, but we have to make sure that the returned responses won’t make it harder for us to create new failing tests.
On the other hand, command methods are easy to introduce with a minimum amount of code without having to worry about their effect on future tests, except for making sure that the parameters that they receive are valid.
So, we’re going to start by introducing a method to throw the ball, which simply expects to receive the number of knocked down pins, which can be 0. But to force that, we must first write a test:
And the minimum necessary code to make the test pass is, simply, the definition of the method. Basically, we can now communicate to
BowlingGame that we’ve thrown the ball.
In this kata, we’ll pay special attention to the refactoring phase. We have to strike a balance so that certain refactorings don’t limit our chances of making the code evolve. In the same way that premature optimization is a smell, so is premature over-engineering.
The production code doesn’t offer any refactoring opportunity yet, but the tests start showing a pattern. The
game object could live as an instance variable, and be initialized in a
setup method of the specification or test case. Here, we use