Leanpub: Publish Early, Publish Often

Need for Specifications

Why Specifications?

Many of the nastiest bugs in programs arise because of misunderstandings about behaviour at the interface between two pieces of code. Although every programmer has specifications in mind, not all programmers write them down. As a result, different programmers on a team have different specifications in mind. When the program fails, it’s hard to determine where the error is. Precise specifications in the code let you apportion blame (to code fragments, not people!), and can spare you the agony of puzzling over where a fix should go.

Specifications are good for the client of a method because they spare the task of reading code. If you’re not convinced that reading a spec is easier than reading code, take a look at some of the standard Java specs and compare them to the source code that implements them. ArrayList, for example, in the package java.util, has a very simple spec but its code is not at all simple.

Specifications are good for the implementer of a method because they give the freedom to change the implementation without telling clients. Specifications can make code faster, too. Sometimes a weak specification makes it possible to do a much more efficient implementation. In particular, a precondition may rule out certain states in which a method might have been invoked that would have incurred an expensive check that is no longer necessary.

The contract acts as a firewall between client and implementor. It shields the client from the details of the workings of the unit — you don’t need to read the source code of the procedure if you have its specification. And it shields the implementor from the details of the usage of the unit; he doesn’t have to ask every client how she plans to use the unit. This firewall results in decoupling, allowing the code of the unit and the code of a client to be changed independently, so long as the changes respect the specification — each obeying its obligation.

Can we eliminate the need for specifications by allowing everyone access to all source code?

Source code is complicated and provides more details than needed. Understanding or even reading every line of code is an excessive burden.

Suppose you had to read the source code of Java libraries in order to use them!
The same applies to developers of different parts of the libraries.

A client cares only about what the code does, not how it does it.

Source code is ambiguous even though it may appear unambiguous and concrete.

Which details of code’s behavior are essential, and which are incidental?
Code invariably gets rewritten.
Client needs to know what they can rely on.
What properties will be maintained over time?
What properties might be changed by future optimization, improved algorithms, or just bug fixes?
Implementer needs to know what features the client depends on, and which can be changed.

The Role of a Specification

With a specification, + the client (the user of a method or a class) agrees to rely only on information in the description/specification of the method or class for their part, and + the implementer promises to support everything in the specification, and outside of the specification has perfect liberty to make implementation decisions.

Specifications facilitate change by reducing the Medusa effect: it is the specifications that should be turned to stone rather than the implementation.

Sadly, much code that is written lacks clear specifications. Clients have to work out what a method/class does by using it and observing the results, and this leads to bugs because programs have unclear dependencies (that reduce simplicity and flexibility).

Behavioural equivalence

Consider these two methods. Are they the same or different?

 1 static int findA(int[] a, int val) {
 2     for (int i = 0; i < a.length; i++) {
 3         if (a[i] == val) return i;
 4     }
 5     return a.length;
 6 }
 7 
 8 static int findB(int[] a, int val) {
 9     for (int i = a.length -1 ; i >= 0; i--) {
10         if (a[i] == val) return i;
11     }
12     return -1;
13 }

Of course the code is different, so in that sense they are different. Our question is whether we could substitute one implementation for the other. Not only do these methods have different code, they actually have different behaviour:

when val is missing, findA returns the length of a and findB returns -1;
when val appears twice, findA returns the lower index and findB returns the higher.

But when val occurs at exactly one index of the array, the two methods behave the same. It may be that clients never rely on the behaviour in the other cases. So the notion of equivalence is in the eye of the beholder, that is, the client. In order to make it possible to substitute one implementation for another, and to know when this is acceptable, we need a specification that states exactly what the client depends on.

In this case, our specification might be: requires: val occurs in a effects: returns index i such that a[i] = val

The Structure of a Specification

A specification of a method consists of several clauses:

a precondition, indicated by the keyword requires
a postcondition, indicated by the keyword effects

The precondition is an obligation on the client (i.e., the caller of the method). It’s a condition over the state in which the method is invoked. If the precondition does not hold, the implementation of the method is free to do anything (including not terminating, throwing an exception, returning arbitrary results, making arbitrary modifications, etc.).

The postcondition is an obligation on the implementer of the method. If the precondition holds for the invoking state, the method is obliged to obey the postcondition, by returning appropriate values, throwing specified exceptions, modifying or not modifying objects, and so on.

The overall structure is a logical implication: if the precondition holds when the method is called, then the postcondition must hold when the method completes.

Specifications in Java

Some languages (notably Eiffel) incorporate preconditions and postconditions as a fundamental part of the language, as expressions that the runtime system (or even the compiler) can automatically check to enforce the contracts between clients and implementers.

Java does not go quite so far, but its static type declarations are effectively part of the precondition and postcondition of a method, a part that is automatically checked and enforced by the compiler. The rest of the contract — the parts that we can’t write as types — must be described in a comment preceding the method, and generally depends on human beings to check it and guarantee it.

Java has a convention for documentation comments, in which parameters are described by @param clauses and results are described by @return and @throws clauses. You should put the preconditions into @param where possible, and postconditions into @return and @throws.

A specification like this: static int find(int[] a, int val) requires: val occurs exactly once in a effects: returns index i such that a[i] = val … might be rendered in Java like this:

1 /**
2  * Find value in an array.
3  * @param a array to search; requires that val occurs exactly once in a
4  * @param val value to search for
5  * @return index i such that a[i] = val
6  */
7 static int find(int[] a, int val)

The Java API documentation is produced from Javadoc comments in the Java standard library source code. Documenting your specifications in Javadoc allows Eclipse to show you (and clients of your code) useful information, and allows you to produce HTML documentation in the same format as the Java API docs.

`Null` references

In Java, references to objects and arrays can also take on the special value null, which means that the reference doesn’t point to an object. Null values are an unfortunate hole in Java’s type system.

You can assign null to any non-primitive variable:

1 String s = null;
2 int a[] = null;

and the compiler happily accepts this code at compile time. But you’ll get errors at runtime because you can’t call any methods or use any fields with one of these references:

1 s.length()    // throws NullPointerException  
2 a.length      // throws NullPointerException

Note, in particular, that null is not the same as an empty string "" or an empty array. On an empty string or empty array, you can call methods and access fields. The length of an empty array or an empty string is 0. The length of a string variable that points to null throws a NullPointerException.

Null values are troublesome and unsafe, so much so that you’re well advised to remove them from your design vocabulary. For good Java programming — null values are implicitly disallowed as parameters and return values. So every method implicitly has a precondition on its object and array parameters that they be non-null. Every method that returns an object or an array implicitly has a postcondition that its return value is non-null. If a method allows null values for a parameter, it should explicitly state it, or if it might return a null value as a result, it should explicitly state it. But these are in general not good ideas.

Avoid null.

There are extensions to Java that allow you to forbid null directly in the type declaration, e.g.:

1 static boolean addAll(@NonNull List<T> list1, @NonNull List<T> list2)

where it can be checked automatically at compile time or runtime.

What a Specification May Talk About

A specification of a method can talk about the parameters and return value of the method, but it should never talk about local variables of the method or private fields of the method’s class. You should consider the implementation invisible to the reader of the spec.

In Java, the source code of the method is often unavailable to the reader of your spec, because the Javadoc tool extracts the spec comments from your code and renders them as HTML.

Testing and Specifications

In testing, we talk about black box tests that are chosen with only the specification in mind, and glass box tests that are chosen with knowledge of the actual implementation. But it’s important to note that even glass box tests must follow the specification. Your implementation may provide stronger guarantees than the specification calls for, or it may have specific behaviour where the specification is undefined. But your test cases should not count on that behaviour. Test cases must obey the contract, just like every other client.

For example, suppose you are testing this specification of find:

1 static int find(int[] a, int val)
2   requires: val occurs in a
3   effects:   returns index i such that a[i] = val

This spec has a strong precondition in the sense that val is required to be found; and it has a fairly weak postcondition in the sense that if val appears more than once in the array, this specification says nothing about which particular index of val is returned. Even if you implemented find so that it always returns the lowest index, your test case can’t assume that specific behaviour:

1 int[] array = new int[] { 7, 7, 7 };
2 assertEquals(0, find(array, 7));  // bad test case: violates the spec
3 assertEquals(7, array[find(array, 7)]);  // correct

Similarly, even if you implemented find so that it (sensibly) throws an exception when val isn’t found, instead of returning some arbitrary misleading index, your test case can’t assume that behaviour, because it can’t call find() in a way that violates the precondition.

So what does glass box testing mean, if it can’t go beyond the spec? It means you are trying to find new test cases that exercise different parts of the implementation, but still checking those test cases in an implementation-independent way.

Specifications for Mutating Methods

We will later discuss mutable vs. immutable objects, but here we will briefly mention the role of specification when dealing with mutability. Our specifications of find didn’t give us the opportunity to illustrate how to describe side-effects — changes to mutable data — in the postcondition.

Here’s a specification that describes a method that mutates an object:

1 static boolean addAll(List<T> list1, List<T> list2)
2   requires: list1 != list2
3   effects:   modifies list1 by adding the elements of list2 to the end of it, 
4   		     and returns true if list1 changed as a result of call

We’ve taken this, slightly simplified, from the Java List interface.

First, look at the postcondition. It gives two constraints: the first telling us how list1 is modified, and the second telling us how the return value is determined.

Second, look at the precondition. It tells us that the behaviour of the method if you attempt to add the elements of a list to itself is undefined. You can easily imagine why the implementor of the method would want to impose this constraint: it’s not likely to rule out any useful applications of the method, and it makes it easier to implement. The specification allows a simple implementation in which you take an element from list2 and add it to list1, then go on to the next element of list2 until you get to the end.

If list1 and list2 are the same list, this algorithm will not terminate — an outcome permitted by the specification.

Remember also our implicit precondition that list1 and list2 must be valid objects, rather than null. We’ll usually omit saying this because it’s virtually always required of object references.

Here is another example of a mutating method:

1 static void sort(List<String> lst)
2   requires: nothing
3   effects:   puts lst in sorted order, i.e. lst[i] <= lst[j] for all 0 <= i < j \
4 < lst.size()

And an example of a method that does not mutate its argument:

1 static List<String> toLowerCase(List<String> lst)
2   requires: nothing
3   effects:   returns a new list t where t[i] = lst[i].toLowerCase()

Just as we’ve said that null is implicitly disallowed unless stated otherwise, we will also use the convention that mutation is disallowed unless stated otherwise. The spec of toLowerCase could explicitly state as an effect that “lst is not modified”, but in the absence of a postcondition describing mutation, we demand no mutation of the inputs.

Up next

Exceptions