How to Debug

Instance Diagrams

It will be useful for us to draw pictures of what’s happening at runtime, in order to understand subtle questions. Instance diagrams represent the internal state of an object or even a program at runtime – its stack (methods in progress and their local variables) and its heap (objects that currently exist).

Why should we use instance diagrams?

To talk to each other through pictures (in class and in team meetings)
To illustrate concepts like primitive types vs. object types, immutable values vs. immutable references, pointer aliasing, stack vs. heap, abstractions vs. concrete representations.
To help explain your design for your team project (with each other and with your TA)
To pave the way for richer design notations in subsequent courses.

Although the diagrams in this course use examples from Java, the notation can be applied to any modern programming language, e.g., Python, JavaScript, C++, Ruby.

Primitive values

Primitive values are represented by bare constants. The incoming arrow is a reference to the value from a variable or an object field.

Object values

An object value is a circle labeled by its type. When we want to show more detail, we write field names inside it, with arrows pointing out to their values. For still more detail, the fields can include their declared types. Some people prefer to write x:int instead of int x, but both are fine.

Mutating Values vs. Reassigning Variables

Instance diagrams give us a way to visualize the distinction between changing a variable and changing a value. When you assign to a variable or a field, you’re changing where the variable’s arrow points. You can point it to a different value.

When you assign to the contents of a mutable value – such as an array or list – you’re changing references inside that value.
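The two kinds of change can be sketched in a few lines of Java. This is a minimal illustration, not part of the original reading; the class name AssignVsMutate and the variable names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AssignVsMutate {
    // Returns the list that `a` points to after one mutation and one reassignment.
    static List<String> demo() {
        List<String> a = new ArrayList<>();
        a.add("one");
        List<String> b = a;        // b's arrow points at the same list object as a

        b.add("two");              // mutation: changes the shared object, visible through a

        b = new ArrayList<>();     // reassignment: b's arrow moves to a new object
        b.add("three");            // mutates the new object only; a is unaffected

        return a;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [one, two]
    }
}
```

The mutation through b shows up in a because both arrows point to one object; the reassignment of b changes only where b points.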

Immutability (immunity from change) is a major design principle in this course. Immutable types are types whose values can never change once they have been created.

Java also gives us immutable references: variables that are assigned once and never reassigned. To make a reference immutable, declare it with the keyword final:

final int n = 5;

If the Java compiler isn’t convinced that your final variable will only be assigned once at runtime, then it will produce a compiler error. So final gives you static checking for immutable references.

In an instance diagram, an immutable reference (final) is denoted by a double arrow. Here’s an object whose id never changes (it can’t be reassigned to a different number), but whose age can change.
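One way such an object might be declared is sketched below. The class name Person and the method haveBirthday are assumptions made for illustration; only the field names id and age come from the text.

```java
class Person {
    final int id;   // immutable reference: drawn as a double arrow
    int age;        // ordinary reference: drawn as a single arrow

    Person(int id, int age) {
        this.id = id;     // a final field is assigned exactly once, here
        this.age = age;
    }

    void haveBirthday() {
        age = age + 1;    // allowed: age can be reassigned
        // id = id + 1;   // rejected by the compiler: id is final
    }

    public static void main(String[] args) {
        Person p = new Person(7, 20);
        p.haveBirthday();
        System.out.println(p.id + " " + p.age);   // prints 7 21
    }
}
```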

Immutable Objects vs. Mutable Objects

String is immutable: once created, a String object always has the same value. To add something to the end of a String, you have to create a new String object:

String s = "a";
s = s.concat("b"); // s += "b" and s = s + "b" mean the same thing as this call

Immutable objects (intended by their designer to always represent the same value) are denoted by a double border. For example, here’s an Integer object, the result of new Integer(7). By design, this Integer object can never change value during its lifetime. There is no method of Integer that will change it to a different integer value.

By contrast, StringBuilder (another built-in Java class) is a mutable object that represents a string of characters. It has methods that change the value of the object, rather than just returning new values:

StringBuilder sb = new StringBuilder("a");
sb.append("b");

StringBuilder has other methods as well, for deleting parts of the string, inserting in the middle, or changing individual characters.

So what? In both cases, you end up with s and sb referring to the string of characters “ab”. The difference between mutability and immutability doesn’t matter much when there’s only one reference to the object. But there are big differences in how they behave when there are other references to the object. For example, when another variable t points to the same String object as s, and another variable tb points to the same StringBuilder as sb, then the differences between the immutable and mutable objects become more evident:

String t = s;
t = t + "c";

StringBuilder tb = sb;
tb.append("c");
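To see the effect concretely, the snippets above can be combined into one runnable sketch. The class name AliasDemo and the method run are hypothetical wrappers; the statements inside are the ones from the text.

```java
class AliasDemo {
    // Returns the final contents seen through s, t, sb, and tb, in that order.
    static String[] run() {
        String s = "a";
        s = s.concat("b");                          // s points to a new String "ab"

        StringBuilder sb = new StringBuilder("a");
        sb.append("b");                             // the same object now holds "ab"

        String t = s;                               // t and s share one String object
        t = t + "c";                                // t moves to a new String; s is unchanged

        StringBuilder tb = sb;                      // tb and sb share one StringBuilder
        tb.append("c");                             // the shared object changes for both

        return new String[] { s, t, sb.toString(), tb.toString() };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", run()));   // prints ab abc abc abc
    }
}
```

Appending through t leaves s at "ab", because a new String was created. Appending through tb changes what sb sees, because both names refer to the same mutable object.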

Why do we need the mutable StringBuilder in programming? A common use for it is to concatenate a large number of strings together, like this:

String s = "";
for (int i = 0; i < n; ++i) {
    s = s + i; // builds a new String, copying everything accumulated so far
}

Using immutable Strings, this makes a lot of temporary copies – the first number of the string (“0”) is actually copied n times in the course of building up the final string, the second number is copied n-1 times, and so on. It actually costs O(n^2) time just to do all that copying, even though we only concatenated n elements.

StringBuilder is designed to minimize this copying. It appends into an internal, growable character buffer, so each append copies only the new characters (plus an occasional buffer resize), and the final String is produced once, when you ask for it with a toString() call:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < n; ++i) {
    sb.append(String.valueOf(i)); // appends in place; earlier characters are not recopied
}
String s = sb.toString();
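As a self-contained sketch, the two loops can be wrapped in methods and compared. The class name ConcatDemo and the method names withString and withBuilder are hypothetical; both methods append the loop index i, so the result starts with 0 as described above.

```java
class ConcatDemo {
    // Repeated String concatenation: each iteration copies the whole string so far.
    static String withString(int n) {
        String s = "";
        for (int i = 0; i < n; ++i) {
            s = s + i;
        }
        return s;
    }

    // StringBuilder: appends into one buffer, then builds the String once.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; ++i) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withString(5));    // prints 01234
        System.out.println(withBuilder(5));   // prints 01234
    }
}
```

Both methods return the same text; they differ only in how much copying happens along the way.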

Getting good performance is one reason why we use mutable objects. Another is convenient sharing: two parts of your program can communicate more conveniently by sharing a common mutable data structure.

But the convenience of mutable data comes with big risks. Mutability makes it harder to understand what your program is doing, and much harder to enforce contracts.

Arrays and Lists

Like other object values, arrays and lists are labeled with their type. In lieu of field names, we label the outgoing arrows with indexes 0, 1, 2, … When the sequence of elements is obvious, we may omit the index labels.

Both the array and List objects are mutable, as indicated by the single-line border and the single-line arrows that can be reassigned.
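A short sketch of that mutability, assuming a hypothetical class ArrayListDemo: assigning to an index changes the arrow at that position inside the object, and the change is visible through every reference to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayListDemo {
    static String run() {
        int[] arr = { 1, 2, 3 };
        int[] sameArr = arr;          // second arrow to the same array object
        sameArr[0] = 10;              // reassigns the arrow at index 0 inside the array

        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> sameList = list; // second arrow to the same List object
        sameList.set(1, "z");         // reassigns the arrow at index 1 inside the list

        // Read back through the original variables.
        return Arrays.toString(arr) + " " + list;
    }

    public static void main(String[] args) {
        System.out.println(run());    // prints [10, 2, 3] [a, z]
    }
}
```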

Debug Systematically

Sometimes you have no choice but to debug, however – particularly when the bug is found only when you plug the whole system together, or reported by a user after the system is deployed, in which case it may be hard to localize it to a particular module. For those situations, we can suggest a systematic strategy for more effective debugging.

Reproduce the Bug

Start by finding a small, repeatable test case that produces the failure. If the bug was found by regression testing, then you’re in luck; you already have a failing test case in your test suite. If the bug was reported by a user, it may take some effort to reproduce the bug. For graphical user interfaces and multithreaded programs, a bug may be hard to reproduce consistently if it depends on timing of events or thread execution.

Nevertheless, any effort you put into making the test case small and repeatable will pay off, because you’ll have to run it over and over while you search for the bug and develop a fix for it. Furthermore, after you’ve successfully fixed the bug, you’ll want to add the test case to your regression test suite, so that the bug never crops up again. Once you have a test case for the bug, making this test work becomes your goal.
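A small, repeatable test case might look like the following sketch. Everything here is hypothetical: buggyAverage stands in for the code under investigation, with a planted integer-division bug, and the input { 1, 2 } stands in for the smallest input found that still shows the failure.

```java
class AverageRepro {
    // Hypothetical code under investigation. The planted bug: int / int
    // discards the fractional part before the result is widened to double.
    static double buggyAverage(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // The reproduction: a tiny fixed input, no timing or user interaction,
    // so it gives the same answer on every run.
    static boolean failureReproduced() {
        double expected = 1.5;
        double actual = buggyAverage(new int[] { 1, 2 });
        return actual != expected;
    }

    public static void main(String[] args) {
        System.out.println("failure reproduced: " + failureReproduced());   // prints failure reproduced: true
    }
}
```

Once the bug is fixed, the same input becomes a regression test by asserting that the result equals 1.5.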

Understand the Location and Cause of the Bug

To localize the bug and its cause, you can use the scientific method:

Study the data. Look at the test input that causes the bug, and the incorrect results, failed assertions, and stack traces that result from it.
Hypothesize.
- Propose a hypothesis, consistent with all the data, about where the bug might be, or where it cannot be.
- It’s good to make this hypothesis general at first. Here’s an example. You’re developing a web browser, and a user has found that displaying a certain web page causes the wrong text to appear on the screen. You might hypothesize that the bug is not in the networking code that fetches the page from the server, but in one of the modules that parses the web page or displays it.
Experiment. Devise an experiment that tests your hypothesis. The experiment might be a different test case. In our web browser example, you might test your hypothesis by downloading the page to disk and loading it from a disk file instead of over the network. Another experiment inserts probes in the running program – print statements, assertions, or debugger breakpoints. It’s tempting to try to insert fixes to the hypothesized bug, instead of mere probes. This is almost always the wrong thing to do, because your fixes may just mask the true bug. For example, if you’re getting an ArrayIndexOutOfBoundsException, try to understand what’s going on first. Don’t just add code to avoid the exception without fixing the real problem.
Repeat. Add the data you collected from your experiment to what you knew before, and make a fresh hypothesis.
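The difference between a probe and a masking fix can be sketched as follows. The method sumFirst and the class ProbeDemo are hypothetical, and the probe is written as an explicit check so that it runs even when Java assertions are disabled.

```java
class ProbeDemo {
    // Hypothetical method under suspicion: sums the first `count` elements.
    static int sumFirst(int[] values, int count) {
        // Probe: state the hypothesis "count never exceeds the array length"
        // and fail loudly, with the data, when it is violated.
        if (count > values.length) {
            throw new IllegalStateException(
                "probe: count=" + count + " exceeds length=" + values.length);
        }
        // A masking "fix" would instead clamp count with
        // Math.min(count, values.length). The exception would disappear,
        // but the caller that computed the wrong count would stay hidden.
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumFirst(new int[] { 1, 2, 3 }, 2));   // prints 3
    }
}
```

The probe does not repair anything; it only turns a hypothesis into data for the next round.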

Bug localization by binary search.

Debugging is a search process, and you can sometimes use binary search to speed up the process.

For example, in a web browser, the web page might flow through four modules before being displayed on the screen.

To do a binary search, you would divide this workflow in half, guessing that the bug is found somewhere in the first two modules, and insert probes (like breakpoints, print statements, or assertions) after the second module to check its results.

From the results of that experiment, you would further divide in half.
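The first bisection step might look like this sketch. The four stage names (fetch, decode, parse, render), their bodies, and the planted bug in parse are all assumptions made for illustration; the text only says that the page flows through four modules.

```java
class PipelineProbe {
    // Four hypothetical modules. The planted bug lives in parse (module 3).
    static String fetch(String url)   { return "  Hello World  "; }
    static String decode(String raw)  { return raw.trim(); }
    static String parse(String text)  { return text.replace("l", "1"); }   // planted bug
    static String render(String text) { return "<p>" + text + "</p>"; }

    // Probe after the second module: is the intermediate result what we expect?
    static boolean firstHalfLooksCorrect() {
        String afterModule2 = decode(fetch("http://example.com/"));
        return afterModule2.equals("Hello World");
    }

    // One step of the binary search: decide which half to examine next.
    static String suspectedHalf() {
        return firstHalfLooksCorrect() ? "modules 3-4" : "modules 1-2";
    }

    public static void main(String[] args) {
        System.out.println(render(parse(decode(fetch("http://example.com/")))));   // prints <p>He11o Wor1d</p>
        System.out.println("look in " + suspectedHalf());                          // prints look in modules 3-4
    }
}
```

The next step would place a probe between modules 3 and 4, halving the remaining search space again.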

Prioritize your hypotheses.

When making your hypothesis, you may want to keep in mind that different parts of the system have different likelihoods of failure.

For example, old, well-tested code is probably more trustworthy than recently-added code.

Java library code is probably more trustworthy than yours.

The Java compiler and runtime, operating system platform, and hardware are progressively more trustworthy, because they are more tried and tested.

You should trust these lower levels until you’ve found good reason not to.

Make sure your source code and object code are up to date.

Pull the latest version from the repository, delete all your .class files, and recompile everything (in Eclipse, this is done by Project / Clean).

Swap components.

If you have another implementation of a module that satisfies the same interface, and you suspect the module, you may try swapping in the alternative.

For example, if you suspected java.util.ArrayList, you could swap in java.util.LinkedList instead. If you suspect the binarySearch() method, then substitute a simpler linearSearch() instead. If you suspect the Java runtime, run with a different version of Java. If you suspect the operating system, run your program on a different OS. If you suspect the hardware, run on a different machine.
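A swap of the search method could be sketched like this. The sketch uses java.util.Arrays.binarySearch as a stand-in for the suspected binarySearch() method and a hypothetical linearSearch as the simpler alternative. It assumes a sorted array with no duplicate elements, since binary search may return any matching index when duplicates exist.

```java
import java.util.Arrays;

class SwapSearch {
    // Simpler alternative: index of key in the array, or -1 if absent.
    static int linearSearch(int[] sorted, int key) {
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == key) {
                return i;
            }
        }
        return -1;
    }

    // Cross-check: do the suspected and the simple implementation agree?
    static boolean agree(int[] sorted, int key) {
        int fast = Arrays.binarySearch(sorted, key);   // negative when key is absent
        int slow = linearSearch(sorted, key);
        return fast >= 0 ? fast == slow : slow == -1;
    }

    public static void main(String[] args) {
        int[] data = { 1, 3, 5, 7 };
        System.out.println(agree(data, 5));   // prints true
        System.out.println(agree(data, 4));   // prints true
    }
}
```

If the failure disappears with the simple version swapped in, the suspected component becomes the prime candidate; if it persists, the bug is elsewhere.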

You can waste a lot of time swapping unfailing components, however, so don’t do this unless you have good reason to suspect a component.

Get help.

It often helps to explain your problem to someone else, even if the person you’re talking to has no idea what you’re talking about. Teaching assistants and fellow students usually do know what you’re talking about, so they’re even better.

Sleep on it.

If you’re too tired, you won’t be an effective debugger. Trade latency for efficiency.

Fix the Bug

Once you’ve found the bug and understand its cause, the third step is to devise a fix for it. Avoid the temptation to slap a patch on it and move on. Ask yourself whether the bug was a coding error, like a misspelled variable or interchanged method parameters, or a design error, like an underspecified or insufficient interface. Design errors may suggest that you step back and revisit your design, or at the very least consider all the other clients of the failing interface to see if they suffer from the bug too.

Also think about whether the bug has any relatives. If I just found a divide-by-zero error here, did I do that anywhere else in the code? Try to make the code safe from future bugs like this. Also consider what effects your fix will have. Will it break any other code?

Finally, after you have applied your fix, add the bug’s test case to your regression test suite, and run all the tests to assure yourself that (a) the bug is fixed, and (b) no new bugs have been introduced.

Summary

In this reading, we looked at instance diagrams to understand the difference between assignment and mutation. In instance diagrams,
- objects are represented by circles with a type and fields inside them
- immutable objects have a double border
- a variable or field reference is represented by an arrow
- an immutable reference is a double arrow

Debug systematically:
- reproduce the bug as a test case, and put it in your regression suite
- find the bug using the scientific method
- fix the bug thoughtfully, not slapdash
