Leanpub: Publish Early, Publish Often

7 Errors

Errors are one of the most important parts of your API. They’re a large part of the reason I decided to write this book; they’re where the title comes from.

For most API authors, errors are an afterthought. The author designs the way the API works, not the way the API breaks, so this is only natural. However, in practice, handling errors turns out to be the vast majority of the code written when writing a (production-ready, at least) client library. Networks and software are tricky things; there are far more ways for things to go wrong than there are for things to go right. And nothing lowers the quality of a user experience like poorly handled errors. “The server is down” should not crash the app, and “your request is bad and you should feel bad” is not a helpful error message.

I, personally, believe errors exist for one reason: to point out what the user needs to do to fix the error. If the client library has no way to present the error to the user in a form that suggests one and only one possible course of action, the error is a bad error, in my mind.

There are dissenting opinions on this, and that’s okay. (That’s great. People disagreeing with me means people are thinking about APIs!)

Some people think that an error should also inform the developer what went wrong, to make development more convenient. They like to include links to API documentation about the error in the response.

Some people think that an error should help the user report it, by generating a unique identifier that the user can refer to in a support case.

These are all valid approaches to solving the problem of creating a useful error. None of these are bad APIs; they’re just not my style of API. But not everyone has the same style, and that’s okay.

Again: there is no one right way to write an API. There are just lots and lots of wrong ways to do it.

So what does a bad error look like, then?

Nothing but a message

A lot of APIs will return an error that looks something like this:

{
"error": "That username is taken.",
"code": 400
}

They return a string telling you what went wrong and repeat the HTTP status code. (Sometimes. Other times, all you get is the string.)

Why is this a problem?

This is a problem because when you’re trying to write a client to gracefully handle these errors, you basically have four options:

- Use string matching on the entire string ("That username is taken." == resp) to detect the error. This is problematic because API authors that return errors like this aren’t known for… consistency. An update might remove that period, or lowercase the first word. And then your client breaks. Definitely not ideal.
- Use string matching on a substring (resp.contains("username is taken")) to detect the error. Once again, you’re dependent on the API author’s consistency in something that is probably not considered to be constant. In general, matching human-readable strings or substrings to detect the type of error is an API smell. It means back away slowly, you’re going to have a Bad Time™.
- Don’t handle the error at all; just pass it on to the user. I’m sure your users will love you for this. I hope your API isn’t in multiple languages (or a language other than the one the API authors speak) and that the API authors are good at writing user-facing error messages.
- Hope that error doesn’t occur. Ignore it, and cross your fingers.

You need an error that makes sense to a machine, because a machine is the one that’s going to have to handle it. A human-readable message on its own is basically just saying “your request is bad and you should feel bad”.
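To make the fragility concrete, here is a minimal Python sketch comparing the two styles. The response shapes, the `error_code` key, and the `username_taken` value are hypothetical, invented for this illustration; they are not taken from any real API.

```python
# Hypothetical response bodies; the key names and the "username_taken"
# code are assumptions made for this sketch.
OLD_BODY = {"error": "That username is taken.", "code": 400}
NEW_BODY = {"error": "that username is taken", "code": 400}  # after a copy edit


def is_username_taken_by_string(body: dict) -> bool:
    """Brittle: depends on the exact wording of a human-readable message."""
    return body.get("error") == "That username is taken."


def is_username_taken_by_code(body: dict) -> bool:
    """Sturdier: depends on a constant that is meant for machines."""
    return body.get("error_code") == "username_taken"


# The string match silently stops working once the message is reworded.
print(is_username_taken_by_string(OLD_BODY))  # True
print(is_username_taken_by_string(NEW_BODY))  # False

# A machine-readable code survives any rewording of the message.
machine_body = {"error_code": "username_taken", "message": "anything at all"}
print(is_username_taken_by_code(machine_body))  # True
```

The message can still be sent alongside the code for humans to read; the point is only that the client never has to branch on it.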

Stack trace as a service

Some API authors just fall back on the built-in error handling of their framework or stack, a fallback that usually includes some debugging information.

Like a stack trace.

A stack trace is a trace of the calls that produced the errors. It’s incredibly convenient for debugging.

It is not incredibly useful as an API result. It is, in fact, as close to useless as you can get without failing to send a response at all. Developers can’t use string matching on it. Developers can’t pass it on to the user. All they can do is say “well, that didn’t work.”

Even worse, this leaks information about your system. It is a security risk. You can accidentally tell users where code or sensitive data lives, what version of a framework or library you’re running (which helps attackers find known vulnerabilities), or a bunch of other data that a user has no legitimate reason to know.

This isn’t a security book, and I’m not a security expert, but here’s a security tip: never share more information about your server or API than a user has a legitimate reason to know.

Never rely on the development server’s default response as your error response.
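One way to follow that advice is to catch unexpected exceptions at the edge of the application, write the stack trace to the server log, and send the client a generic error instead. The sketch below is framework-agnostic and uses only the Python standard library. The names `safe_call` and `internal_error`, the `(status, body)` return shape, and the incident ID are all assumptions made for illustration, not any framework's real API.

```python
import json
import logging
import traceback
import uuid

logger = logging.getLogger("api")


def safe_call(handler, request):
    """Run a handler; on failure, log the trace and return a generic 500."""
    try:
        return handler(request)
    except Exception:
        # The stack trace goes to the server log, never to the client.
        incident_id = uuid.uuid4().hex
        logger.error("incident %s\n%s", incident_id, traceback.format_exc())
        # "internal_error" and "id" are hypothetical names for this sketch.
        body = {"errors": [{"code": "internal_error", "id": incident_id}]}
        return 500, json.dumps(body)


def broken_handler(request):
    # Stands in for any handler that fails in an unexpected way.
    raise RuntimeError("database password is hunter2")


status, body = safe_call(broken_handler, {})
print(status)  # 500
print(body)    # generic error with an incident ID, no trace, no secrets
```

The incident ID gives a user something to quote in a support case, and gives you something to search the logs for, without exposing anything about the server.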

Everything’s OK

Status codes mean things. A status code between 200 and 299, inclusive, means that things are OK; there was no error. A status code between 400 and 499 means the request is problematic and, as a rule, no matter how many times you try it, it isn’t going to work. A status code between 500 and 599 means that the server had a problem; it makes no claim about the validity of the request. Retrying these requests can actually yield a successful result.

That’s kind of an important thing to know, as a client library author. What should I retry? What shouldn’t I?
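As a rough sketch, the rule of thumb above fits in a few lines of Python. The function name `should_retry` is made up for this example, and it deliberately looks at nothing but the status class; a real client would likely add limits and delays on top of it.

```python
def should_retry(status: int) -> bool:
    """Decide whether a request is worth retrying, based on status class alone."""
    if 200 <= status <= 299:
        return False  # it worked; nothing to retry
    if 400 <= status <= 499:
        return False  # the request itself is the problem
    if 500 <= status <= 599:
        return True   # the server had a problem; the request may be fine
    return False      # anything else (1xx, 3xx) is out of scope for this sketch


for code in (200, 404, 503):
    print(code, should_retry(code))
```

Notice that the function never looks at the response body. That separation is only possible when the status code can be trusted.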

Some API developers helpfully respond with a 200 OK status code to everything. Success? 200 OK. Error? 200 OK. This is incredibly frustrating, because you’re then required to mix your networking code (retry!) and your application logic.

Always use the most appropriate status code for your response. We’ve covered this previously, but it bears repeating. Status codes are important, there are a lot of them, and they are incredibly powerful.

Recommendations from the author

So how do I handle errors? I like to treat errors like I do any other resource:

{
"errors": [
	{
		// an error resource
	}
]
}

That way, I can return multiple errors for a single request, if I know multiple things were wrong with it, saving network trips for the client. It also matches the decoding logic for the rest of the response, making it easy and intuitive to parse.

Consistency and orthogonality are the hallmarks of an API that is simple to use and reason about.

Each error resource consists of two fields:

- An application-specific error code. This is a description of what went wrong; consider it a resource-agnostic way of describing an invalid state. I usually have things like:
  - overflow: for errors that signify a value is too large. Causes could be a string that is too long, a number that is too high, an array containing too many elements.
  - insufficient: for errors that signify a value is too small. Causes could be a string that is too short, a number that is too small, an array containing too few elements.
  - invalid_format: for errors that signify a formatting problem. Causes could be an error in the JSON formatting, a string where a number is expected, etc.
  - invalid_value: for errors that signify a valid format, but invalid value. Causes could be an invalid character in a string, a value outside an accepted set of values, or a value that violates application-specific rules.

  While these are only a few of the codes, I tend to have between ten and twenty of these per application.
- A field or parameter. A field is a JSON pointer to the element in the JSON request body that caused the error. A parameter is the URL parameter that caused the error.
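On the server side, producing these error resources might look something like the Python sketch below. The `code` and `field` key names, the signup payload, and the length limits are assumptions chosen for illustration; only the error codes themselves come from the list above.

```python
# Hypothetical limits and key names, chosen only for this sketch.
MIN_PASSWORD = 8
MAX_USERNAME = 20


def validate_signup(payload: dict) -> list:
    """Collect every problem with the request instead of stopping at the first."""
    errors = []
    username = payload.get("username")
    password = payload.get("password")

    if not isinstance(username, str):
        errors.append({"code": "invalid_format", "field": "/username"})
    elif len(username) > MAX_USERNAME:
        errors.append({"code": "overflow", "field": "/username"})

    if not isinstance(password, str):
        errors.append({"code": "invalid_format", "field": "/password"})
    elif len(password) < MIN_PASSWORD:
        errors.append({"code": "insufficient", "field": "/password"})

    return errors


# Both problems are reported in one response, saving the client a round trip.
body = {"errors": validate_signup({"username": "x" * 50, "password": "abc"})}
print(body)
```

Each `field` value is a JSON pointer into the request body, so `/username` refers to the top-level `username` key.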

These two fields, combined, allow the client to pinpoint the precise cause of the error and suggest a single fix to the user. The error code tells the user what’s wrong; the field or parameter tells the user what needs to be fixed. The client can then display, next to the invalid user input, a suggestion that would resolve the error code. An overflow error may cause the client to suggest removing characters from a string. An insufficient error may cause the client to prompt the user to select an item to act on.
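A client might turn that pair into on-screen suggestions roughly as follows. This is a sketch under assumptions: the `code`, `field`, and `param` key names are hypothetical, and the suggestion wording is invented. Because the client owns the wording, it can be localized without any help from the API.

```python
import json

# Hypothetical response body; the "code" and "field" key names are assumptions.
RESPONSE = json.loads("""
{
  "errors": [
    {"code": "overflow", "field": "/username"},
    {"code": "insufficient", "field": "/password"}
  ]
}
""")

# One suggestion per error code, written by the client.
SUGGESTIONS = {
    "overflow": "is too long; try something shorter.",
    "insufficient": "is too short; try something longer.",
    "invalid_format": "is not in the expected format.",
    "invalid_value": "is not an accepted value.",
}


def suggestions_by_field(body: dict) -> dict:
    """Map each offending field or parameter to a single suggested fix."""
    result = {}
    for error in body.get("errors", []):
        target = error.get("field") or error.get("param", "")
        label = target.lstrip("/")  # "/username" -> "username"
        hint = SUGGESTIONS.get(error["code"], "has a problem.")
        result[target] = f"{label} {hint}"
    return result


for field, text in suggestions_by_field(RESPONSE).items():
    print(field, "->", text)
```

No string from the server is ever shown to the user or compared against; the client branches only on constants.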

There are some possible variations on this. The error code could be a URL—because the URL is unique, it serves the same function as a constant string. The URL can then also be used to point a developer towards documentation. You could also include an instance ID that the user can use to report an error or you can use to track errors.

Or you can use something different entirely. It’s up to you. Just try to give your clients useful information.

For bonus points

If you want to go above and beyond and create an API that developers will compliment ad nauseam, take care when documenting your API. Not only should you document what request is expected and its options, and what response is sent and its variations, you should also document the possible errors. This is a lot of work to do and maintain, but it is incredibly useful for an application developer who cares about handling all the possible errors as elegantly as possible.

Recap

So what have we learned this chapter?

- String matching error responses to figure out what went wrong is an API smell.
- Status codes are Really Darn Important. Use them correctly so your API consumers can handle responses intelligently.
- Stack traces as errors are not only useless, they’re security risks.
- A good approach is to have errors describe what went wrong and what input caused it, but you’re welcome to use what works for you.

Errors are one of the things most APIs overlook. Don’t overlook them in yours, and your developers will thank you for it.

Our next chapter is going to talk a bit about authentication. It’ll be a good time. Pinky swear.