Your API Is Bad


1 Acknowledgements

This book wouldn’t be possible without the aid and support of a bunch of great people. In no particular order:

Matthew Turland

http://www.matthewturland.com
@elazar on Twitter

Matthew Turland is an old friend of mine, and one of the people I continually spitball ideas with. He’s great at talking through semantics or explaining best practices, and his tutelage has been indispensable over the years. He graciously agreed to act as a reviewer for this project, and his insights and suggestions are responsible for what you’re about to read.

Unless you don’t like it, in which case he had nothing to do with it.

John Sheehan

http://www.john-sheehan.com
@johnsheehan on Twitter

John Sheehan is the CEO of Runscope, one of my favourite new companies. Runscope is building tools around APIs, and addressing a niche that has long been underserved. John takes time out of his busy days to engage in lengthy, specific, and enormously helpful feedback sessions as a reviewer for this book. His experience with APIs and ability to identify murky or unclear explanations have been invaluable.

But when we talk about Runscope tools, it’s just because they’re some of the best on the market and what I use, personally. Not because John volunteered his time.

Promise.

Michael Mahemoff

http://mahemoff.com
@mahemoff on Twitter

Michael Mahemoff is the founder of Player.FM, a great podcasting project. He’s also one of the former Googlers behind the developer experience movement, and an expert in the field of APIs. Michael kindly volunteered to write a foreword for this little project, which made me fangirl a little. He took it a step further, however, and agreed to write the foreword as I wrote the book, allowing Leanpub customers to see the evolution of the foreword as it is written. Which is super cool.

2 Foreword

This is a placeholder for the moment. Eventually, this will become a foreword by Michael Mahemoff, founder of Player.FM and developer experience veteran. For now, it’s a chunk of text. Nothing to see here, move along.

3 Introduction

I love APIs. I’ve been working with them for years now; writing them, writing against them, and discussing them. When APIs are done right, they’re powerful, semantic definitions of a useful piece of functionality.

Let’s try something. I want you to think of a definition of a chair. It should encompass everything that is a chair and nothing that is not a chair. Does it fit swivel chairs? Stools? Chairs with three legs? Does it include a bench? Do you accidentally include a table? What makes a chair a chair?

Defining things is incredibly hard.

This is why most APIs aren’t done right.

Why Are APIs Important?

A lot of companies don’t offer APIs that any third-party developer can write software against. And that’s okay. A lot of companies shouldn’t provide such an API.

But every company should have an API.

The era of single-device computing is over. Whether it’s tablets or phones or some new category of device that we haven’t seen yet, it is no longer reasonable for you to expect that everyone will use your website. It would be nice if it were; the web was designed to work on a variety of devices. But the capabilities are too varied and the pace of standardisation too slow to incorporate these capabilities into the web natively. As much as I love the web and what it represents, to offer the best user experience on mobile devices, you can’t just rely on the web interface.

You need an API your mobile apps can talk to.

And if your mobile apps require an API, your website should be an unprivileged consumer of that API. That is, your website should not offer any access to your service not exposed by your API. Otherwise, you’re going to have A Bad Time™ replicating that feature in your mobile clients.

APIs are the write-once-run-everywhere of today. They’re the abstraction that allows you to draw your business logic into a single place.

They’re not going anywhere, so making them easy to use is worth the effort.

Your API Is Bad (and You Should Feel Bad)

I’ve written client libraries for more APIs than I care to think about. Sometimes against APIs I wrote and defined myself. More often than not, it was an incredibly painful experience.

Why?

Most API providers don’t want using the API to be a painful experience. On the contrary, they want the experience to be as pleasant and simple as possible. So why are the majority of APIs painful to use?

Because it’s hard to write an API that is a pleasure to use. It’s exceedingly tricky. You need to resist the temptation to constrain your API to a set of predefined use cases, or you’ve just written an application with a terrible interface. An API should be a simple definition of building blocks that you build an application out of. But these building blocks need to be fundamental enough that you can build wildly different experiences from them, to best take advantage of each platform you’re building for. And defining things at that level means reducing them to their core components, and deeply understanding those components.

A lot of APIs also don’t take into account what it’s like to actually use that API. How hard is it to figure out what caused an error and what the error means? What useful information is being provided? How do I need to contort my code to make sense of the data being returned by the API? What extra work is being created for me that could be avoided by a little forethought from the API designers?

Errors are my personal pet peeve. A lot of the time, an error tells you nothing more than “your request is bad and you should feel bad”. There’s no useful information at all.

My response is “your API is bad and you should feel bad”. And that’s why I’m writing this—I don’t want to have to work with bad APIs.

This book is not a description of a good API. That is extremely subjective, and specifying Properties Of A Good API would only encourage people to blindly follow the “rules”. Protip: if you’re blindly following anything, your API is bad and you should feel bad. Think about each decision completely. Consider the effects on your API’s usability. Consider what each decision means, semantically.

Instead of giving you rules, I’m going to take you on a tour of things that make me flip a table in rage when I work with APIs. Things that API designers didn’t think about or consider, things that people change when I point them out. So I’m pointing them out here. Consider them ahead of time, and you’ll be on your way to a truly elegant API.

And you won’t have to feel bad.

4 Anatomy of an API

This chapter is going to be a huge infodump. Sorry, everyone. I want to make sure we’ve got all our basics covered before we get to the table-flipping portion, so this is essentially a primer. If you’re super 100% extra positive that you know what you’re doing and are familiar with REST and HTTP, you have my permission to skip this chapter and the next one, and get right to the good stuff: crimes against APIs.

Still with me? Sweet. Away we go.

Limiting Our Scope

An API is a really hard thing to define. Technically, an API (which, by the way, stands for “Application Programming Interface”) just describes how software components should interact with each other. That means that client libraries and even programming languages have APIs. But, for the sake of my sanity, we’re going to limit our discussion to APIs that interact with services, specifically services over HTTP.

There are lots of ways to build APIs on top of HTTP. There are RPC (“remote procedure call”) APIs, REST (“representational state transfer”) APIs, SOAP (“simple object access protocol”) APIs, and IMTUAIG (“I’m making things up as I go”) APIs. I can’t talk about all of them in this book, because your attention span isn’t that long and I don’t have years to devote to researching and writing a comprehensive compendium of every variation on HTTP APIs that anybody has ever concocted.

Instead, we’re going to focus on the most popular: REST. Because if you’re not using REST, you should have good, strong reasons for not doing so. And if that’s the case, you’re probably already thinking a lot about your API and its design, and therefore won’t get much use out of this book. Your API is not bad, you do not need to feel bad. Carry on being awesome.

For the rest of us mere mortals, we have one final scope-limiter to add: REST is a controversial term. Technically, for an API to be RESTful it needs to support HATEOAS (“Hypermedia As The Engine Of Application State”). This is a really fancy way of saying that your API should be self-describing; given a single endpoint and knowing nothing else about your API, client libraries should be able to access all of its functionality. This is Really Hard™, and most people don’t bother. It’s also really, really good API practice, so if you have the time and interest to try and pursue that, definitely go for it. But in the interest of being inclusive, we’re going to be talking about the most common type of RESTful API; that is, an API that applies 99% of the properties of RESTful APIs, but totally ignores HATEOAS. It’s the dirty little secret of APIs today: most implement the resource part of REST, apply most of the principles, and say “good enough”, ignoring HATEOAS. Which is sad, but there you have it.

We’re going to be talking about HATEOAS in a later chapter, so don’t be sad if you were all excited to hear about it. Patience.

To be entirely fair, this really isn’t REST anymore. This is just using the concept of resources, and manipulating them using the semantics of HTTP. But we don’t have a catchy, shorthand name for that yet, so we’re going to stick with REST.

Now that we’ve properly narrowed our scope to something manageable, let’s get to the good stuff.

Resources, Operations, and Representations

REST deals with three core concepts that drive everything about the API:

Resources are the conceptual model of the data you want to expose with the API. Think of them as the “classes” or “types” of the API.

Operations are the things you do to the resources. Think of them as the “methods” or “functions” of the API.

Representations are expressions or instances of a specific resource. This is a bit abstract, but think of it this way: APIs can return responses in different formats (like JSON or XML), right? You can respond with JSON, or XML, or a Protocol Buffer, or with a URL-encoded string. Each different format would represent the same data, so they’d each be their own representation. To make this even more confusing, suppose you have a Book resource. The Book has a Title property and an Author property. When you retrieve a list of Books, the response contains the Title and the Author. When you retrieve a list of Books, filtered by a specific Author, the response contains just the Title. These are different presentations of the same conceptual objects, even though one presents more data about the object, so they’re both representations of a resource, instead of being independent resources.
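To make the Book example a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the Book class, the helper names (to_json, to_xml, to_summary_json), and the sample book. The point is only that one resource can be expressed as several representations.

```python
import json
from dataclasses import dataclass, asdict
from xml.etree import ElementTree


@dataclass
class Book:
    """One conceptual resource (hypothetical, for illustration only)."""
    title: str
    author: str


def to_json(book: Book) -> str:
    # Full JSON representation: every property of the resource.
    return json.dumps(asdict(book), sort_keys=True)


def to_xml(book: Book) -> str:
    # Same data in a different format: still the same resource.
    root = ElementTree.Element("book")
    ElementTree.SubElement(root, "title").text = book.title
    ElementTree.SubElement(root, "author").text = book.author
    return ElementTree.tostring(root, encoding="unicode")


def to_summary_json(book: Book) -> str:
    # Partial representation, e.g. for a list already filtered by Author.
    return json.dumps({"title": book.title})


book = Book(title="Dune", author="Frank Herbert")
print(to_json(book))          # {"author": "Frank Herbert", "title": "Dune"}
print(to_xml(book))           # <book><title>Dune</title><author>Frank Herbert</author></book>
print(to_summary_json(book))  # {"title": "Dune"}
```

All three strings describe the same Book; none of them is a new resource.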

HTTP: A Woefully Inadequate Primer

Not to shave a yak, but we also need to talk really briefly about HTTP. I’m going to skim horribly, touching only briefly on a lot of Really Cool Things™ in this section, so please read up on HTTP. Wikipedia has some great information about it, and I bet a Google search turns up a lot of literature on it. It has been documented for over 20 years and it’s fundamental to the internet as we know it today, so lots of people have written and are writing about it.

In the interest of being awesome, let’s make this section illustrative.

    GET / HTTP/1.1
    User-Agent: curl/7.30.0
    Host: yourapiisbad.com
    Accept: */*

    HTTP/1.1 200 OK
    Content-Type: text/plain; charset=utf-8
    Content-Length: 12
    Date: Mon, 28 Oct 2013 21:44:37 GMT

    Hello, world

That looks really complicated, but it’s actually deceptively simple: those first five lines are the request we sent to the server, the other six are the response the server sent back.

Requests

You’ll notice that the first thing we sent was GET / HTTP/1.1, after we connected to yourapiisbad.com. yourapiisbad.com, by the way, is what’s called a “domain”, but in this instance it’s also our “host”. The domain/host will be important later, when we discuss endpoints, but for now it’s enough to know that it means “where the request should be sent”.

GET is just an HTTP verb, which is used to describe the operation that should be performed against the resource. There are currently 9 HTTP verbs:

  • GET: request a representation of a resource or group of resources.
  • POST: request that the server create a new resource using the body of the request (… sort of. POST has some other uses, which we’ll talk about below, but creating resources is the main one).
  • PUT: request that the server set the specified resource to the body of the request.
  • DELETE: request that the server remove the specified resource.
  • HEAD: request a response identical to the one a GET would receive, but without any body; typically, this is used to get response headers (which we’ll discuss shortly).
  • TRACE: echo back the received request as a response.
  • OPTIONS: request a list of HTTP verbs that the server supports for a specific URL.
  • CONNECT: converts the request connection to a TCP tunnel.
  • PATCH: request that the server update part of the specified resource using the body of the request.

Of these 9 HTTP verbs, you’ll mostly just use the first four: GET, POST, PUT, and DELETE.

After the verb, the path is specified (/, often called “the root”, in our example above). This is how a request identifies which resource it would like to operate on.

After the path, you’ll see the HTTP version (HTTP/1.1, in our example). This just tells the server which version of HTTP the request is in. You generally don’t need to concern yourself with this.

The next series of lines, each beginning with a word, followed by a colon, followed by some seemingly random stuff, are called “request headers”, and they’re useful for specifying metadata about the request itself. In our example, we have User-Agent, Host, and Accept. The User-Agent is just an identifier for the client that made the request, useful for tailoring responses to specific clients, debugging, and recording analytics. It should be noted that there is no guarantee the User-Agent has not been tampered with, so you should never trust it. Host just identifies the host (or “server” or “domain”) the request was sent to. Accept is where it gets interesting. Accept allows a request to specify the representation it will accept in a response. In our example, */* is specified, meaning any type of representation is acceptable. An Accept header of application/json would mean only JSON-encoded representations should be returned. This is the client’s way of saying “Here’s what I know how to handle.”

The next part would be our request body, the information we want to send with the request, but we didn’t send any information with the request, so there’s no body.

GET requests tend not to have a body, and generally should not have one. Things may not work as expected if you try to send a body with GET.

All said and done, pretty basic: verb (what you want to do), path (what you want to do it to), headers (information about your request), body (the data needed to do what you want). With these four components, we can make magic.
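As a rough sketch of how those four components fit together on the wire, the Python below assembles a raw HTTP/1.1 request by hand. build_request is a hypothetical helper written for this illustration, not a function from any library, and it skips a lot that a real client handles (encoding rules, validation, connection management).

```python
def build_request(verb, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request from verb, path, headers, and body."""
    headers = dict(headers)
    if body:
        # A request with a body also announces how many bytes to expect.
        headers["Content-Length"] = str(len(body))
    lines = [f"{verb} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the (possibly empty) body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request(
    "GET",
    "/",
    {"User-Agent": "curl/7.30.0", "Host": "yourapiisbad.com", "Accept": "*/*"},
)
print(raw.decode("ascii"))
```

With those arguments, the output matches the request half of the sample exchange from earlier in the chapter.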

Responses

I’ll reproduce our sample response again, in case you forgot what it looked like:

    HTTP/1.1 200 OK
    Content-Type: text/plain; charset=utf-8
    Content-Length: 12
    Date: Mon, 28 Oct 2013 21:44:37 GMT

    Hello, world

This looks familiar. We’re starting off with the HTTP version of the response. Again, you generally don’t need to concern yourself with this. After that, we see 200 OK. This is the status code and status message of the response. It’s a short generalisation of how the server is answering your request. A 200 response means “your request succeeded, everything’s fine, good job!”

There are 5 levels of HTTP response codes:

  • 1XX: informational; this should probably never be used in your APIs.
  • 2XX: success; your request succeeded, good job.
  • 3XX: redirection; the client needs to do something else before the request can be handled. Generally used to redirect.
  • 4XX: client error; your request was bad and you should feel bad.
  • 5XX: server error; oops, there’s something preventing the server from serving your request. Try again later.

Again, that’s painting with a broad brush, but it should be enough to give you a general idea.
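Because the class of a response is carried by the first digit of its status code, a client can sort responses into those five buckets with very little code. The sketch below is one way to do it in Python; status_class and STATUS_CLASSES are names invented for this example.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


print(status_class(200))  # success
print(status_class(404))  # client error
print(status_class(503))  # server error
```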

Next, we have a bunch of headers again. Request headers give metadata about the request; response headers give metadata about the response. In this case, we have Date, Content-Length, and Content-Type.

Date just tells the client when the response was sent from the server.

Content-Length tells the client how many bytes it should expect in the response body. If our request had had a body, it would have had a Content-Length header, too.

Content-Type is the twin to Accept. It describes the representation the server actually responded with, which should be a member of the list of representations passed by Accept in the request. This gives the client information about how to decode the response.

There are actually really cool and powerful rules for whether a content type matches an Accept header. Definitely read up on them.
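To give a flavour of those rules, here is a simplified Python sketch. It handles exact types, type/* and */* wildcards, and q-values, and it lets the most specific matching range decide. It deliberately ignores other media type parameters and does no error handling for malformed headers, so treat it as an illustration rather than a complete implementation. parse_accept and acceptable are hypothetical names.

```python
def parse_accept(header: str):
    """Return (type, subtype, q) tuples for each media range in an Accept header."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if "/" not in media:
            continue
        main, sub = media.split("/", 1)
        q = 1.0  # a range without an explicit q-value counts as q=1
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((main, sub, q))
    return ranges


def acceptable(content_type: str, accept_header: str) -> bool:
    """Check whether a content type satisfies an Accept header."""
    main, sub = content_type.split(";")[0].strip().lower().split("/", 1)
    best = None  # (specificity, q) of the most specific matching range
    for r_main, r_sub, q in parse_accept(accept_header):
        if r_main == main and r_sub == sub:
            specificity = 2
        elif r_main == main and r_sub == "*":
            specificity = 1
        elif r_main == "*" and r_sub == "*":
            specificity = 0
        else:
            continue
        if best is None or specificity > best[0]:
            best = (specificity, q)
    # q=0 on the deciding range means "explicitly not acceptable".
    return best is not None and best[1] > 0


print(acceptable("text/plain; charset=utf-8", "*/*"))  # True
print(acceptable("text/plain", "application/json"))    # False
print(acceptable("text/plain", "text/plain;q=0, */*")) # False
```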

Finally, we have the response body. The response body is where the data of the response will live; in this case, the data just says “Hello, world”. In real APIs, this would be the information you were trying to retrieve.

Again, pretty basic tools: status (success/fail indicator), headers (information about the response), body (actual response). From these three components, you can give really powerful, useful responses.

Which, of course, doesn’t necessarily mean that most people do.
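For the curious, the three parts of a response can be pulled apart with a few lines of Python. This is a sketch only: parse_response is a made-up helper, it assumes a complete response held in memory, and it overwrites repeated headers and ignores things like chunked encoding that a real HTTP library deals with.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, message, headers, body)."""
    # The first blank line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    return int(code), message, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Length: 12\r\n"
    b"Date: Mon, 28 Oct 2013 21:44:37 GMT\r\n"
    b"\r\n"
    b"Hello, world"
)
code, message, headers, body = parse_response(raw)
print(code, message)              # 200 OK
print(headers["content-length"])  # 12
print(body)                       # b'Hello, world'
```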

Back on Track: Conventions

Our long foray into HTTP finished, let’s get back to the REST stuff. Some conventions have taken hold and become de facto standards. You should follow them unless you have a good reason not to, if only because that’s what your users will expect. And as any designer can tell you, if your user expects your software to do something, that had better be what your software does.

Endpoints

Endpoints identify where a resource lives. They’re just URLs used to interact with APIs. “http://api.yourapiisbad.com/” is an endpoint, and “http://yourapiisbad.com/my/awesome/endpoint” is an endpoint. All endpoints provide is a location that a resource can be found at.

Resources are addressed by endpoints, but how do we build those endpoints?

First, a brain exploding moment: a collection of resources is a resource.

Got that? So if I’ve got a list of awesome things (a unicorn, my puppy Roxy, Call Me Maybe), each item in that list is a resource. But the list itself is also a resource. Weird, right?

But that makes building endpoints simple: the endpoint is just a resource hierarchy. We start with the host or the domain, which is the context of the request. This is where the request should be sent, but it also defines the conceptual boundaries of your API. After all, my API and your API may both define User resources, but these resources may not be the same and may not be interchangeable. The domain namespaces resources to the context you’re defining.

Then we start with a base resource. So in our list example, the base resource would be the entire list of awesome things. We could call that collection awesome_things, or things_that_make_paddy_smile. It doesn’t really matter what we call it, it just needs to be unique and should be a good description of the resources in the collection.

After our base resource, we can specify a specific member of that resource to return. So, for example, if we wanted just information about my puppy Roxy (and, let’s face it, everyone wants information about my puppy. You can follow her on Twitter at @RoxyThePuppy, by the way) we’d need her Resource Identifier. Let’s say it’s roxy, though your API may use a database ID or a username or something else. Again, the ID only needs to be unique—non-collection resources don’t even need a descriptive ID, though it’s always nice if they can have one. Now we want to say that roxy is a member of the awesome_things resource (which is actually a collection of resources), so our endpoint would be awesome_things/roxy. And awesome_things is not a member of any resource (it’s a base resource), so the final endpoint is /awesome_things/roxy.

If it helps, think of collection resources as database tables, and regular resources as a specific row in that table. You need to specify the table name and the row ID to get a specific row. If you specify just the table name, you’re really saying “all the rows in the table”.

Of course, you can (and should) have more complex hierarchies than that. A blog, for example, may have /posts/1/comments/4, meaning the fourth comment on the first post. /posts/1/comments would mean all the comments on the first post. /posts/1 would mean the first post. /posts would mean all the posts.
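One way to see the hierarchy is to split a path into (collection, member) pairs. The Python sketch below does that under a simplifying assumption: paths strictly alternate between a collection name and a member ID. resource_chain is a hypothetical helper, not part of any framework.

```python
def resource_chain(path: str):
    """Split an endpoint path into (collection, member_id) pairs.

    A trailing collection with no member gets None as its ID,
    meaning "the whole collection".
    """
    segments = [s for s in path.split("/") if s]
    chain = []
    for i in range(0, len(segments), 2):
        collection = segments[i]
        member = segments[i + 1] if i + 1 < len(segments) else None
        chain.append((collection, member))
    return chain


print(resource_chain("/posts/1/comments/4"))   # [('posts', '1'), ('comments', '4')]
print(resource_chain("/posts/1/comments"))     # [('posts', '1'), ('comments', None)]
print(resource_chain("/awesome_things/roxy"))  # [('awesome_things', 'roxy')]
```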

Verbs

Though you can theoretically use HTTP verbs however the heck you want when making requests, it’s best to respect the semantic meaning of the verbs. You’ll have less pain that way.

GET is meant to retrieve a representation of a resource. It’s also considered a “safe” verb, meaning that it can be called without any side-effects. So if you’re creating, updating, or deleting things, you really shouldn’t be using GET.

POST is reserved for any non-idempotent action—that is, any action that has side-effects every time it’s performed. Generally, POST requests are used to create objects. For example, you’d use a POST request to /awesome_things if you wanted to add another resource as a “child” of the awesome_things resource. A POST request should specify the resource to be created in the body. You should generally respond with the created resource.

Note: keep in mind that POST requests can also just mean “change this resource”. That’s a powerful definition, and it gets forgotten a lot.

PUT is meant to replace a resource. Semantically, a PUT request’s body should overwrite everything about the resource except for its ID. Think of it as knocking down the eastern wall of your house and building a new wall in its place; totally new wall, but it’s still the eastern wall of your house. If you want to update just part of a resource, you should be using PATCH. PATCH is the equivalent of painting the eastern wall of your house; the wall is a different colour, but is still the eastern wall and everything else about it is the same.

PATCH is a fairly new verb in HTTP. A lot of APIs and clients will use PUT or POST for updates instead.

DELETE is meant to destroy a resource.
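The differences between the verbs are easier to see against a toy data store. The sketch below uses an in-memory Python dict as a stand-in for the awesome_things collection; the function names simply mirror the verbs and are not a real server or framework API.

```python
import itertools

things = {}  # in-memory stand-in for the awesome_things collection
_ids = itertools.count(1)


def get(thing_id):
    # Safe: reads a representation and changes nothing.
    return things.get(thing_id)


def post(body):
    # Not idempotent: every call creates another resource with a new ID.
    thing_id = next(_ids)
    things[thing_id] = dict(body)
    return thing_id, things[thing_id]


def put(thing_id, body):
    # Replaces everything about the resource except its ID.
    things[thing_id] = dict(body)
    return things[thing_id]


def patch(thing_id, body):
    # Updates only the properties named in the body.
    things[thing_id].update(body)
    return things[thing_id]


def delete(thing_id):
    # Destroys the resource; deleting it twice leaves the same state.
    things.pop(thing_id, None)


roxy_id, _ = post({"name": "Roxy", "kind": "puppy"})
print(patch(roxy_id, {"kind": "dog"}))  # {'name': 'Roxy', 'kind': 'dog'}
print(put(roxy_id, {"name": "Roxy"}))   # {'name': 'Roxy'}
delete(roxy_id)
print(get(roxy_id))                     # None
```

Note how PATCH kept the name while changing the kind, but PUT dropped the kind entirely because the new body didn’t include it.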

Data Formatting

Once every six months or so, I come across an API that doesn’t support JSON representations. This never fails to confuse and bewilder me; it’s like the API designers haven’t been paying attention for the last decade or so.

Don’t be that API.

I’m not here to tell you one data format is empirically better than another. If, to escape the decadent luxury of self-flagellation, you decide to punish yourself by working with some XML, who am I to stop you?

I am going to tell you, however, that pretty much every user expects you to be returning JSON for them to parse. JSON has tooling readily available for it, it’s easy to express data structures in it, it’s easy to read. Which is not to say that JSON is better, just that JSON is massively popular. If you are ignoring the expectations of your users, you should have a very good reason for it.

“I need to send and receive binary data.”

That’s totally a good reason. Use your protocol buffers or your msgpack or your BSON or whatever it is your use case requires. You’ve thought this through. Gold star.

“I’m serving mobile apps for users with low bandwidth, and JSON is too verbose.”

That’s totally a good reason. You also have thought this through. You considered the needs of the user, so choose an appropriate data format for that job.

“But XML is better.”

Your reasoning is bad and you should feel bad.

This isn’t to say that you can’t return XML or whatever data format floats your boat. By all means, go for it. Just also be able to return JSON, and respect the Accept header. If your users are used to working with JSON (protip: in 99.999% of all cases, your users are used to working with JSON) that’s what you should let them work with.

For the rest of this book, we’re going to be using JSON for the examples. Just mentally translate it into whatever data format you prefer.

Wrapping Up

That’s the end of our horribly brief overview of HTTP. Definitely dig more into it. Read the specifications, check out Wikipedia, Google around for some blog posts. HTTP is incredibly powerful and able to create very flexible semantics when you know how to use it, so developing good intuition around it will definitely work out in your favour.

In the next chapter, we’re going to talk a bit about how to go about designing APIs, and it’ll be the last chapter of primer information, I swear.

To recap:

  • HTTP is made up of requests and responses.
  • Requests have a verb, a path, a host, some headers, and a body. The verb is what you’re doing, the path is what you’re doing it to, the host is the namespace or context for what you’re doing it to, the headers are metadata about the request, and the body is the data you want to use.
  • Responses have a status code & message, some headers, and a body. The status code & message tell you whether the request succeeded or not, the headers are metadata about the response, and the body is the data you’re receiving.
  • Requests can tell the server how they want the data represented using the Accept header, and the server can tell clients how the data is represented in the response by using the Content-Type header.

If you’ve got all that down, we’re ready to talk about how to go about designing your API.

5 URLs

In the last chapter, we talked about some of the basics behind HTTP, but we didn’t really talk about how to go about the process of designing an API. So we’re going to do that now. We’re going to design the URL endpoints for an imaginary blog. Again, if you’re super extra 100% with sprinkles on top sure that you have this stuff down, go ahead to the next chapter, where I promise to start being entertainingly upset over bad API design. If you’re in doubt, continue on with this chapter. It’s interesting, I promise.

The Conceptual Structure

In the last chapter, we talked about the structure of URL endpoints, and we decided that they can be structured as a hierarchy of resources.

Our hypothetical blog has several resources we need to provide access to:

  • Articles, which are the content of the blog.
  • Authors, who write the content.
  • Comments, which are responses to the content of the blog.

These resources have a hierarchy to them that we’ll explore as we construct the URLs.

Let’s start with Articles. The URLs for Articles can be pretty simple—as they should be, as Articles are going to be the content the audience wants to access most frequently.

Your API should take the typical use case into consideration and try to simplify it as much as possible. Be careful not to constrain the potential of your API in favour of simplifying the typical use case, however; like many things in design, it’s a trade-off and balance that needs to be struck.

For example, retrieving a list of the most recent Articles could look like this:

    GET /articles

Retrieving a single Article (maybe the Article with an ID of 4) would then be a simple extension of that:

    GET /articles/4

IDs should reflect the way your audience usually accesses the data. In our example, our audience typically wants to use a “slug” or short, unique string to access the Article. That would look like this: GET /articles/my-post

The idea is that you want the resource identified by 4 that is a member of the articles resource—remember, resources can contain other resources.

The important thing to remember is that URL endpoints should only contain nouns. That means that verbs like create or get should never be found in URLs—that’s what the HTTP verbs are for, as we discussed in the last chapter.

Filter or Endpoint?

URLs have a “query string” that allows you to specify “query parameters”. In /articles?published=true&shared=false, published and shared are both query parameters. published has a value of true, and shared has a value of false. Note that parameters and values can be any string, though you should make sure to URL encode them.

Semantically, query parameters and their values are meant to filter the resources returned. In the sample above, only resources that have a published property set to true and a shared property set to false should be returned.
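Here is a small Python sketch of that filtering behaviour, using the standard library’s urllib.parse to read the query string. The ARTICLES data and the filter_resources helper are invented for illustration, and property values are stored as strings so they compare directly against query values; a real API would convert types properly.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical sample data; values are strings to keep the comparison simple.
ARTICLES = [
    {"id": 1, "published": "true", "shared": "false"},
    {"id": 2, "published": "true", "shared": "true"},
    {"id": 3, "published": "false", "shared": "false"},
]


def filter_resources(url, resources):
    """Keep only resources whose properties equal every query parameter."""
    params = parse_qsl(urlsplit(url).query)
    return [
        r for r in resources
        if all(str(r.get(name)) == value for name, value in params)
    ]


matches = filter_resources("/articles?published=true&shared=false", ARTICLES)
print([a["id"] for a in matches])  # [1]

# URL encoding arbitrary strings before putting them in a query string:
print(urlencode({"title": "Hello, world"}))  # title=Hello%2C+world
```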

This raises a conundrum, however. It’s best illustrated by trying to construct our Comments endpoint, so let’s do that now. The simple way to do that is to just create a Comments endpoint, like we did for Articles:

    GET /comments

This is fine. There’s nothing wrong with this. But this could be a bit easier. Because how does your audience typically want to use this information? They want to retrieve Comments on a specific Article. Assuming the Comments have an article_id property containing the ID of the Article the Comment belongs to, you can always use a query parameter to filter the Comments:

    GET /comments?article_id=4

Semantically, that means “show me all the Comments whose article_id property is set to 4”, which will retrieve the list of Comments you want, so it’s a valid API design decision.

But that logic isn’t how your user approaches the problem. They don’t want a list of Comments that have a property set to a value, they want a list of Comments that belong to an Article. That suggests a hierarchy:

    GET /articles/4/comments

Semantically, that says “show me a list of Comments that belong to the Article with ID 4.” See how that aligns better with the user’s approach to the request? That affordance will make your API easier to reason about and, therefore, use.

We need to be careful, though, because look at the functionality that was lost to that affordance: we can no longer retrieve a list of Comments, regardless of the Article they belong to. If we want to say “show me all the Comments”, we can’t. We have to go through each Article and retrieve its Comments manually.

Which may be fine. That may be ideal, based on your backend and the capabilities you want to offer. It’s just important to be mindful of the restrictions your decisions create and to offer affordances without unnecessarily restricting use cases. A balance needs to be struck between versatility and ease of use.

Should you want to offer both the affordance and the versatility of being able to list all the Comments, there’s no crime in including both:

    GET /comments?article_id=4
    GET /articles/4/comments

Consider it a helper endpoint that calls out the common use-case. While it’s technically undesirable to have multiple endpoints representing the same resource, the benefit of making typical use cases obvious can outweigh the technical purity.
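If you do offer both, one low-friction approach is to have both endpoints call the same underlying lookup, so they can never disagree. The Python below is a sketch of that idea with made-up data; handle_get and comments_for are hypothetical names, and the routing is far cruder than what a real framework would give you.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical sample data.
COMMENTS = [
    {"id": 1, "article_id": 4, "body": "First!"},
    {"id": 2, "article_id": 7, "body": "Nice post."},
    {"id": 3, "article_id": 4, "body": "Agreed."},
]


def comments_for(article_id=None):
    # Single source of truth behind both endpoints.
    if article_id is None:
        return COMMENTS
    return [c for c in COMMENTS if c["article_id"] == article_id]


def handle_get(url):
    parts = urlsplit(url)
    nested = re.fullmatch(r"/articles/(\d+)/comments", parts.path)
    if nested:
        # /articles/4/comments: the hierarchy names the Article.
        return comments_for(int(nested.group(1)))
    if parts.path == "/comments":
        # /comments or /comments?article_id=4: the filter is optional.
        ids = parse_qs(parts.query).get("article_id")
        return comments_for(int(ids[0]) if ids else None)
    raise LookupError(f"no such endpoint: {url}")


print(handle_get("/articles/4/comments") == handle_get("/comments?article_id=4"))  # True
print(len(handle_get("/comments")))  # 3
```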

As always, API design, like all design, is contextual. You need to examine your API and its needs, and try to meet your users’ expectations with as little friction as possible.

Pseudo-Resources

Some APIs have a practice of offering, for example, an /authors/me endpoint that returns the authenticating Author. I like to call these endpoints “pseudo-resources”, because they don’t point to a single resource, they just proxy to a contextual resource.

This is problematic. First, because URLs are supposed to refer to the same conceptual resource. If I request /authors/1, I should get the same thing you get.

But that’s a question of technical purity, and that’s undesirable in some situations. For example, if I have admin access to the blog, /articles may return unpublished posts for me, but not for you. So, technically, the resource would be different, unless you contort the definition of that resource in unintuitive ways. And yet, that’s the intuitive response that your user expects, so that should be how your API works. Technical purity is a great way to decide between two equivalent implementations, but it should always take a backseat to the experience of your user.

The more important thing to remember is that it may sometimes be desirable to store more than one authenticating user’s response for the same resource. Consider a situation like Twitter, where you may have multiple accounts. Returning a different resource for /authors/me breaks the caching expectations and adds a layer of complexity.

The worst part is that it doesn’t need to be there. Your users know who they are. They logged in. They don’t need you to remind them. You don’t need an /articles/newest endpoint; just use /articles, sort by the publish date, and limit the request to 1 result.

That is, unless the entire point of your application is to somehow show only the latest Article. In that case, you absolutely should have an /articles/newest endpoint.

Pagination

I mentioned it in that last section, so we may as well cover it now: how do we tell the server how many resources we want back, or which range of resources, or how they should be ordered?

There are a few approaches to this. The way most APIs approach it is to include that information in the query parameters. E.g., /articles?sort_by=date_published&order=desc would tell the server it wants the Articles sorted by the date they were published, with the most recent dates first.
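A minimal Python sketch of the query-parameter approach might look like the following. The sort_by and order names come from the example URL above; the limit parameter, the whitelist of sortable fields, and the sample data are assumptions added for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical sample data; ISO dates sort correctly as plain strings.
ARTICLES = [
    {"title": "Older", "date_published": "2013-01-05"},
    {"title": "Newest", "date_published": "2013-06-20"},
    {"title": "Middle", "date_published": "2013-03-11"},
]

# Only allow sorting on fields we have decided to support.
SORTABLE_FIELDS = {"title", "date_published"}


def list_articles(url, articles=ARTICLES):
    """Apply sort_by, order and (assumed) limit from the query string."""
    query = parse_qs(urlsplit(url).query)
    sort_by = query.get("sort_by", ["date_published"])[0]
    order = query.get("order", ["desc"])[0]
    if sort_by not in SORTABLE_FIELDS or order not in ("asc", "desc"):
        raise ValueError("unsupported sort")
    result = sorted(articles, key=lambda a: a[sort_by], reverse=(order == "desc"))
    if "limit" in query:
        result = result[: int(query["limit"][0])]
    return result
```

With something like this in place, a request for the newest Article is just the normal listing with a descending sort and a limit of 1, which is why a dedicated endpoint for it is rarely needed.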

And that’s fine. There is a technical argument to be made against it, though: the query string is meant to filter resources, not order them. These APIs (again, a majority of the ones I’ve used) appropriate the query string as a place to include meta information.

Why? We already have headers for that purpose, as we discussed in the previous chapter. If you want to specify the bytes you get in response, you use the Range header. Why wouldn’t you use a header to specify the range of resources you want or how you want them returned?

The answer is that, once again, it’s a trade-off. While sending that information in the headers is going to make your query string cleaner and maintain technical purity, it means it’s harder to debug and explore your API in the browser. I, personally, don’t frequently debug APIs using the browser, but I’m in the minority in that respect. So a lot of APIs are designed around web browser limitations, to make it easier for developers to work with them.

This is great. I love that this is happening. It means that API designers are taking into consideration the way that people are using their APIs and designing to make that easier.

Personally, I prefer to have the pagination information in the headers, because it’s technically purer, presents a cleaner query string, and has no drawbacks for me. But that’s a preference; my life is not made harder by having the pagination information in the query string, so it’s really just a tradeoff. Does the aesthetic advantage outweigh the loss of debugging with the browser?

Wrapping Up

In this chapter, we talked about:

  • How to approach API design.
  • Query parameters and values, the way resources are filtered.
  • URL endpoints, the way resources are accessed.
  • Pseudo-resources and their problems.
  • Pagination and browser debugging.

If there’s a key takeaway, it’s this:

Good API design is not a matter of following principles, it’s a matter of fulfilling your users’ expectations while trying to maintain technical and semantic purity.

User expectations trump technical and semantic purity, but tossing technical and semantic purity out the window will yield an API that’s hard to reason about and is very constrained. API design is the process of balancing these concerns.

With that in mind, let’s talk about some bad API design decisions. What is a bad API design decision? It’s not just a matter of opinion or aesthetic; those are API preferences, like my preference for pagination in the header.

A bad API decision is a decision that sacrifices user experience without gaining anything, or one where the gains are so thoroughly outweighed by the problems exposed that no designer would ever consciously make that choice.

In short, a bad API decision is simply the lack of a conscious decision.

Let’s take a look at some of those in the next chapter, which is going to cover requests and responses.

6 Requests and Responses

As we just covered in Chapter 1, requests and responses are how you interact with APIs. They’re kind of the paradigm that everything else rests on. So getting your requests and responses right is Really Important™.

Let’s talk about how easy it is to get your requests and responses wrong.

Polymorphism

Polymorphism is just the idea that the same concept can appear in many forms. If you remember, resources are polymorphic—they can be returned as XML or JSON, they can have some fields omitted or included, etc.

Polymorphism, by itself, is not inherently evil. By supporting different data formats, you enable multiple environments to interact with your APIs really easily, for example.

But when your API starts supporting polymorphism in internally conflicting ways, that’s when you’re going to start making people Hulk out on your API.

What I mean by internally conflicting is that the property or resource can be one of any number of things that need to be handled differently, and there’s no way of knowing what it will be ahead of time. For example, say you have an Article resource, which looks like this:

{
	"title": "My Title",
	"author": "Paddy Foran"
}

But sometimes, your Article resource can also look like this:

{
	"title": "My Title",
	"author": {
		"name": "Paddy Foran",
		"user_id": 123
	}
}

See how author is sometimes a string and is sometimes an object? Trying to treat it as a string when it’s returned as an object or trying to treat it as an object when it’s returned as a string is going to make code break, so now developers have to keep careful track of which API call returns which, and need separate code to handle them. Especially for developers working in strongly-typed languages, not knowing the type of the response ahead of time is going to cause a lot of code bloat and waste a lot of developer time writing boilerplate around this, when it could easily be solved by just always returning an object or string for the author field.
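To make the cost concrete, here is the kind of defensive boilerplate a client ends up writing when author can be either shape. This is a hypothetical Python helper, not code from any particular client library.

```python
def author_name(article):
    """Extract the author's name from either shape of the author field."""
    author = article["author"]
    if isinstance(author, str):
        # Shape one: the author is just a name.
        return author
    if isinstance(author, dict):
        # Shape two: the author is an object with a name inside it.
        return author["name"]
    raise TypeError("unexpected author type: " + type(author).__name__)
```

Every polymorphic field needs a helper like this, and in a strongly-typed language it usually means a custom deserializer rather than three extra lines.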

There are a few instances where polymorphism is hard to avoid. The one I hear most often is the news feed: a resource like Facebook’s news feed or GitHub’s news feed; basically, a collection of other resources that have been changed, sorted with the most recent changes first. The argument is that each item of the feed is polymorphic, because it could be any of the resources that are tracked on the news feed.

I disagree. Each item of the feed is a news feed item resource, and that shares all the same properties. Maybe a timestamp, a title, a summary, and a link to the resource it’s describing. Maybe a timestamp, a title, a summary, the type of change that occurred, the user that made the change, and a link to the resource it’s describing. No matter how elaborate or simple, these are still just the same type of resource, and should be exposed as such. Then there’s no polymorphism, it is a lot easier for developers in any language to work with your API, and your developers always know what they’re going to get in a response.
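As a sketch of what that looks like from the client's side, here is one Python type that can represent every feed item, whatever resource it describes. The field names are assumptions based on the list above, not a prescribed schema.

```python
from dataclasses import dataclass, fields


@dataclass
class FeedItem:
    """One news feed entry, regardless of which resource changed."""
    timestamp: str
    title: str
    summary: str
    change: str  # the type of change that occurred, e.g. "created"
    user: str    # the user that made the change
    link: str    # link to the resource this item describes


def decode_feed(raw_items):
    """Decode every item with the same code path; no type sniffing needed."""
    names = [f.name for f in fields(FeedItem)]
    return [FeedItem(**{name: raw[name] for name in names}) for raw in raw_items]
```

An item about an Article and an item about a Comment differ only in their values, most notably the link, so the decoding code never has to branch.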

If your API is a contract between you and your developers, polymorphism is basically saying “I’m going to respond with whatever I feel like.” It’s a really unfair and untenable position to put developers into, and it’s something you should avoid.

Response Structures

It’s sometimes tempting to recreate the structure of your response for each and every resource, or even each and every action for each and every resource. After all, if the user is asking for a list of Articles, you should respond with a list of Articles, right? So it should look like this:

[
	{ "title": "Article 1" },
	{ "title": "Article 2" },
	{ "title": "Article 3" }
]

But the reality of the situation is a little more complicated than that.

Consider the Experience

When you’re writing an API, whose experience are you trying to optimise? For most APIs, it’s the client library authors who will see the gains of a better design. Optimising for the experience those authors will have is going to be one of the more important things you do in making your API usable.

Writing a client library consists of two main steps:

  1. Mirroring the resources in local data types (whether they be objects, classes, structs, or what have you) and adding helper methods to manipulate those resources, so API internals like endpoints and methods aren’t strewn around your codebase. Basically, you create the Article type, then an Article.create("my title") function that makes the relevant API call.
  2. Writing the networking and deserialization code it takes to tie your API and the local data types together. This means making the actual HTTP request, serializing the request data before it’s sent, and deserializing the response data when it returns.

Note that serializing and deserializing the data are separated from the data type. That means you need to be able to serialize and deserialize things without knowing what they are in advance. In this way, requests and responses are almost like resources themselves. The developer isn’t asking for a list of Articles, they’re asking for a Response that contains a list of Articles. If you give them a response structure that is consistent across all your resources, they only need to write the code to deserialize things once, and can keep it alongside the code that makes the network requests. It makes their lives unbelievably easier, especially when you start to consider retrying requests and handling errors. The more complex the networking code, the more helpful it is to be able to think of whatever the server passes you as a Response resource containing the resources you requested.

Property-Per-Resource

For a lot of programming languages (Go, Python, JavaScript, and pretty much every other language I’ve used personally), the easiest way to do this is to create a Response resource that contains a property for each resource in your API:

{
	"articles": [],
	"users": [],
	"comments": []
}

These are then optionally filled out per response, and the empty ones are stripped from the JSON before it’s sent. So a single response may look like this:

{
	"articles": [
		{"title": "Article 1"},
		{"title": "Article 2"},
		{"title": "Article 3"}
	]
}

A response containing only a single Article would look like this:

{
	"articles": [
		{"title": "Article 1"}
	]
}

Note that the Article is still in the array, even though there will only ever be one. This is because we want to avoid polymorphism; if the articles attribute contains an array of Article objects for one response, it should contain an array of Article objects for every response.
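Here is a minimal Python sketch of what the property-per-resource structure buys a client author: one Response type and one decoding function, written once and reused for every endpoint. The Response class and decode_response function are illustrative names, and the resources are left as plain dictionaries to keep the example short.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Response:
    """One property per resource type; properties missing from the JSON stay empty."""
    articles: list = field(default_factory=list)
    users: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def decode_response(body):
    """Decode any response body from the API, whatever it contains."""
    raw = json.loads(body)
    return Response(
        articles=raw.get("articles", []),
        users=raw.get("users", []),
        comments=raw.get("comments", []),
    )
```

Since a single Article still arrives inside the articles array, the caller reads response.articles the same way whether it asked for one Article or a hundred.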

Requests are a little trickier. The same format applies, but it’s rare to have multiple resources in a single request. If you can, you should support this, as one is just a special case of many, but this can sometimes be prohibitive to support. As always, do what is right for your API.

But it can be confusing for clients to need to construct an array to create a single item, especially when you don’t support creating multiple items. The array may lead clients to believe that you can create multiple items, which is then an expectation you’ve broken. For requests, therefore, I have two properties for each resource: one, like responses, containing an array of that resource’s type; the other is a singular version that contains the resource directly. For example, the request object may look something like:

{
	"articles": [ { "title": "New article" } ]
}

Or it could look like this:

{
	"article": { "title": "New article" }
}

Note that the property is now “article”, the singular, to avoid polymorphism.

My recommendation, and what I do, is to support both formats. If the plural is not populated, then the singular is the fallback, as a crutch for clients. There comes a cost with supporting two ways to do something, but the Robustness Principle comes into play here. Make affordances for your developers, and they will love you for it.
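On the server side, supporting both formats can be a few lines of normalisation before the request reaches your real logic. A hedged Python sketch, with a made-up function name, assuming the request body has already been parsed from JSON into a dictionary:

```python
def articles_from_request(body):
    """Return the Articles in a request as a list, whichever form was sent."""
    plural = body.get("articles")
    if plural:
        # The plural property is populated, so it wins.
        return list(plural)
    singular = body.get("article")
    if singular is not None:
        # Fall back to the singular property, wrapped so callers always get a list.
        return [singular]
    return []
```

Everything after this point only ever deals with a list of Articles, so the affordance costs the rest of the codebase nothing.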

Polymorphic “Data” Responses

As I was getting technical feedback on this, one of the reviewers (John Sheehan, CEO of Runscope) brought something to my attention—the attribute-per-resource method actually makes some people’s lives actively more difficult. The extremely illustrative example is the Twilio C# client. At the time of this writing, the client had to define 79 classes for the sole purpose of being able to map response attributes to classes. Essentially, when parsing the response, they needed a class for each response attribute, which led to the creation of 79 classes.

I’ve never written C#, so I can’t say if there’s an easier way to do this, but Twilio’s a company full of smart people—who, incidentally, are some of the best API evangelists in the world—and I feel pretty confident that if there was a better way, they’d be using it.

None of this looks fun. If it seems likely that your API will have an audience with C# developers, I’d strongly recommend against the property-per-resource approach. In that case, I’d actually recommend embracing polymorphism, as it would offer a better user experience.

So in this case, your response may actually look something like this:

{
	"data": [
		{"title": "Article 1"},
		{"title": "Article 2"},
		{"title": "Article 3"}
	]
}

Notice that the data attribute has replaced our resource name, so we can now get to the JSON through data every time? The trade-off is that your client now needs to know what’s coming from the API, or needs their software to be able to intelligently detect it. In this case, it may be a good idea to supply a new content type (for use in Accept and Content-Type headers) for each of your resources, so the client can determine the type of your resource without parsing the JSON.
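One way a client might use per-resource content types is sketched below in Python. The vnd.example media type names are invented for this example; your API would define its own.

```python
import json

# Hypothetical media types; an API would document its own names.
RESOURCE_TYPES = {
    "application/vnd.example.article+json": "article",
    "application/vnd.example.comment+json": "comment",
}


def decode(content_type, body):
    """Work out the resource type from the header, then unwrap the data attribute."""
    # Drop parameters such as "; charset=utf-8" before looking up the type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    kind = RESOURCE_TYPES.get(media_type)
    if kind is None:
        raise ValueError("unknown content type: " + media_type)
    return kind, json.loads(body)["data"]
```

The client knows what it is holding before it looks inside the JSON, which is the piece of information the polymorphic data attribute took away.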

A request under this scheme, similarly, can be polymorphic as well. In this case, we don’t even need a wrapper in JSON. Just send the data:

{
	"title": "My title"
}

For requests containing multiple resources, send the array:

[
	{ "title": "Article 1" },
	{ "title": "Article 2" },
	{ "title": "Article 3" }
]

Supporting a Mixed Crowd

What if you have an audience that is a mix of Go developers and C# developers? Someone has to lose, right?

Not necessarily. The Go developers don’t have to put up with the polymorphism, and the C# developers don’t need to spend entire days stubbing out classes just to get at JSON data.

These are, after all, simply two different representations of the same data. How do we support two different representations of the same data?

That’s right, headers. You can use a request header to signal that polymorphism is preferred or not supported, and a response header to signal that polymorphism is or is not present.
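A rough Python sketch of that idea follows. The X-Polymorphic-Response header name is purely hypothetical, and the lookup is simplified: real header names are case-insensitive, which a plain dictionary is not.

```python
HEADER = "X-Polymorphic-Response"  # hypothetical header name, not a standard


def render(resource_name, items, request_headers):
    """Pick the response shape the client asked for and say which one was used."""
    wants_polymorphic = request_headers.get(HEADER, "false").lower() == "true"
    # Polymorphic clients get "data"; everyone else gets property-per-resource.
    key = "data" if wants_polymorphic else resource_name
    response_headers = {HEADER: "true" if wants_polymorphic else "false"}
    return response_headers, {key: items}
```

The default here is the non-polymorphic shape, which is an arbitrary choice for the example; pick whichever default suits the larger part of your audience.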

In the end, do what’s best for the developers in your audience. If you have 100 developers in your audience, and you can spend an hour saving them 15 minutes, technology just advanced 24 hours for free.

Paging Responses

It’s very rare for a client to want all the information you ever stored in the history of ever. It’s even rarer that you want to supply all the information you ever stored in the history of ever in a single request/response cycle.

Which is why APIs decided to page things, offering a single “page” of the response—returning the first N results, and telling you to request page 2 for the next N results, and so on.

Which is all well and good. This saves bandwidth and computation for everyone involved. I shudder to think what it would do to my phone to parse the JSON for every tweet in my stream every time I wanted to check for new tweets. That’s clearly more work than we need to be doing.

Note: be wary of race conditions while paging responses; make sure that you’re returning your resources sorted in such a manner that a new resource being created doesn’t affect paging already in process. A good way to do this is, instead of saying “show me page 2”, say “show me the 20 results following this resource ID”. That way, even when new resources are created, your clients still get the results they asked for, with no repetitions or skips. The downside of this approach is that it makes caching harder, as the “page” links change every time a new resource is created or a resource changes its position in the list. It also makes it harder to “jump to” certain pages; you can only say “show me the next (or previous) page”, not “show me the fifth page”.
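The difference between the two approaches is easy to see in a small Python sketch. It assumes resources are listed newest first and that IDs only ever increase; both function names are made up for the example.

```python
def page_by_offset(resources, page, per_page):
    """Classic numbered paging: 'show me page 2'."""
    start = (page - 1) * per_page
    return resources[start:start + per_page]


def page_after(resources, after_id=None, limit=20):
    """Cursor paging: 'show me the results following this resource ID'.

    Assumes resources are sorted newest first, with IDs that only increase.
    """
    if after_id is not None:
        resources = [r for r in resources if r["id"] < after_id]
    return resources[:limit]
```

Suppose a client has fetched IDs 10, 9 and 8, and then a resource with ID 11 is created. Asking page_by_offset for page 2 now returns 8, 7 and 6, repeating an item the client already has, while page_after with after_id=8 returns 7, 6 and 5.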

We talked about how you can specify which page you want and the range of resources on each page in the last chapter, and the tradeoffs in that decision. But how do you return that information to the client in the response?

For the love of your developers, by default sort your data in the manner it is most often consumed. Don’t order tweets alphabetically or sort them by timestamp ascending. Let your sort order reflect the common use cases of the data.

A lot of the APIs that use query parameters to specify page ranges will then use a “page” parameter in the body to reflect that information in the response.

GET /things?page=3

{
  "things": [{...}, {...}],
  "page": {
    "current": 3,
    "max": 12
  }
}

Much like using the query parameters for specifying the page, this approach optimises for browser compatibility. You can view the request and responses entirely in a browser.

My preferred approach is to relegate meta information to the headers. Just like a request header is used to specify which page and range to return, a response header is used to specify which page and range were returned:

Page: 4
TotalPages: 12
Count: 40

Again, just make that tradeoff with your audience in mind.
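Both styles carry the same numbers, so supporting either is mostly a question of where you put them. The Python sketch below builds each form from the same paging calculation. The function names are invented, and treating Count as the number of resources on the returned page is an assumption made for this example.

```python
import math


def paginate(items, page, per_page):
    """Return one page of items plus the total number of pages."""
    total_pages = max(1, math.ceil(len(items) / per_page))
    start = (page - 1) * per_page
    return items[start:start + per_page], total_pages


def body_style(name, items, page, per_page):
    """Paging info travels in the response body; no extra headers."""
    chunk, total_pages = paginate(items, page, per_page)
    return {}, {name: chunk, "page": {"current": page, "max": total_pages}}


def header_style(name, items, page, per_page):
    """Paging info travels in response headers; the body holds only resources."""
    chunk, total_pages = paginate(items, page, per_page)
    headers = {
        "Page": str(page),
        "TotalPages": str(total_pages),
        "Count": str(len(chunk)),  # assumption: resources on this page
    }
    return headers, {name: chunk}
```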

Don’t Ignore My Accept Header

If a client explicitly sends an Accept header, denoting the type of information they’re capable of handling, you should abide by it. This allows the client to tell you which data format they prefer, and you should never fall back to a format the client has not asked for. If a client sends an Accept header of application/json and you, for whatever reason, can only respond with XML, you should not just serve the XML. That’s a waste of everyone’s time and bandwidth, because the client likely can’t do anything with that data you just gave them.

Instead, use the 406 Not Acceptable status code in response. It’s a status code that is specifically intended for requests that the server is unable to fulfill because their Accept header asks for a data format the server doesn’t use or can’t provide.
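A deliberately simplified Python sketch of that check is below. It ignores quality values (the q= parameters) and partial wildcards like application/*, both of which a production implementation would need to handle; the negotiate name is made up.

```python
def negotiate(accept_header, supported):
    """Return (status, content_type) for a request's Accept header.

    supported is a list of lowercase media types the server can produce,
    in the server's own order of preference.
    """
    if not accept_header:
        # No Accept header means the client will take anything.
        return 200, supported[0]
    for part in accept_header.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type == "*/*":
            return 200, supported[0]
        if media_type in supported:
            return 200, media_type
    # Nothing the client asked for is something we can produce.
    return 406, None
```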

Don’t Misuse Status Codes

On that note, you should be generally aware of the HTTP status codes you have available for your use, and their meanings. Don’t return a 200 when a resource is created; return a 201. That’s what the separate code exists for. These codes are surfaced in pretty much every language and HTTP library, and provide a lot of information about how a response should be handled before the response body is even parsed. Using them correctly will give clients a lot of information and requires very little effort on your part.
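As a small illustration, Python's standard library already names the status codes, so returning the right one takes no more effort than returning the wrong one. The create_article handler and its dictionary store are hypothetical; the Location header pointing at the new resource is the conventional companion to a 201.

```python
from http import HTTPStatus


def create_article(store, article):
    """Store a new Article and describe the result with a 201, not a 200."""
    new_id = len(store) + 1
    store[new_id] = article
    headers = {"Location": "/articles/%d" % new_id}
    body = {"articles": [dict(article, id=new_id)]}
    return HTTPStatus.CREATED, headers, body
```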

Wrapping Up

These are the basic necessities of serving requests and responses. You don’t need to do everything my way, but you should avoid the pitfalls we discussed above. These are not abstract, theoretical principles I derived from meditation or a very expensive degree; they’re hard lessons I’ve learned while writing API clients. If you ever think “Ah, nobody will care if this response type is polymorphic” (as I’m sure one just idly thinks all the time), I’m here as proof against your assertion. It hurts. Please don’t do these things, they make me cry.

To recap:

  • Polymorphism can make the lives of your developers much easier or much harder, and should not be undertaken lightly.
  • Requests and responses should be moulded to make life easier for your developers. Remember, supporting multiple representations of the same data can save hours for people.
  • Meta information in the response body is undesirable, but may be unavoidable, depending on your preference for browser compatibility.
  • Accept headers are important and should be obeyed.
  • Status codes are powerful and can signify very specific things, so be aware of them and use them appropriately and specifically.

In the next chapter, we’re going to talk about errors for a bit. They’re one of the things that can make or break an API experience, so definitely stick around for that.

7 Errors

Errors are one of the most important parts of your API. They’re a large part of the reason I decided to write this book; they’re where the title comes from.

For most API authors, errors are an afterthought. The author designs the way the API works, not the way the API breaks, so this is only natural. However, in practice, handling errors turns out to be the vast majority of the code written when writing a (production-ready, at least) client library. Networks and software are tricky things; there are far more ways for things to go wrong than there are for things to go right. And nothing lowers the quality of a user experience like poorly-handled errors. “The server is down” should not crash the app, and “your request is bad and you should feel bad” is not a helpful error message.

I, personally, believe errors exist for one reason: to point out what the user needs to do to fix the error. If the client library has no way to present the error to the user that suggests one and only one possible course of action, the error is a bad error, in my mind.

There are dissenting opinions on this, and that’s okay. (That’s great. People disagreeing with me means people are thinking about APIs!)

Some people think that an error should also inform the developer what went wrong, to make development more convenient. They like to include links to API documentation about the error in the response.

Some people think that an error should help the user report it, by generating a unique identifier that the user can refer to in a support case.

These are all valid approaches to solving the problem of creating a useful error. None of these are bad APIs; they’re just not my style of API. But not everyone has the same style, and that’s okay.

Again: there is no one right way to write an API. There are just lots and lots of wrong ways to do it.

So what does a bad error look like, then?

Nothing but a message

A lot of APIs will return an error that looks something like this:

{
	"error": "That username is taken.",
	"code": 400
}

They return a string telling you what went wrong, and repeat the HTTP status code (…sometimes. Sometimes you can even just get the string).

Why is this a problem?

This is a problem because when you’re trying to write a client to gracefully handle these errors, you basically have four options:

  • Use string matching on the entire string ("That username is taken." == resp) to detect the error. This is problematic because API authors that return errors like this aren’t known for… consistency. An update might remove that period, or lowercase the first word. And then your client breaks. Definitely not ideal.
  • Use string matching on a substring (resp.contains("username is taken")) to detect the error. Once again, you’re dependent on the API author’s consistency in something that is probably not considered to be constant. In general, matching human-readable strings or substrings to detect the type of error is an API smell—it means back away slowly, you’re going to have a Bad Time™.
  • Don’t handle the error at all; just pass it on to the user. I’m sure your users will love you for this. I hope your API isn’t in multiple languages (or a language other than the one the API authors speak) and that the API authors are good at writing user-facing error messages.
  • Hope that error doesn’t occur. Ignore it, and cross your fingers.

You need an error that makes sense to a machine, because a machine is the one that’s going to have to handle it. A human-readable message on its own is basically just saying “your request is bad and you should feel bad”.
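To see how brittle the string-matching options are, compare them with matching on a stable, machine-readable value. In this Python sketch the error_code property and its username_taken value are hypothetical; the point is only that one of these checks survives a copy edit and the other doesn't.

```python
def is_username_taken_by_message(resp):
    """Brittle: depends on the exact wording and punctuation of the message."""
    return resp.get("error") == "That username is taken."


def is_username_taken_by_code(resp):
    """Stable: depends on a constant the API promises not to change."""
    return resp.get("error_code") == "username_taken"
```

If an update lowercases the first word and drops the period, the first check silently starts returning False, while the second keeps working.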

Stack trace as a service

Some API authors just fall back on the built-in error handling of their framework or stack, a fallback that usually includes some debugging information.

Like a stack trace.

A stack trace is a trace of the calls that produced the errors. It’s incredibly convenient for debugging.

It is not incredibly useful as an API result. It is, in fact, as close to useless as you can get without failing to send a response at all. Developers can’t use string matching on it. Developers can’t pass it on to the user. All they can do is say “well, that didn’t work.”

Even worse, this leaks information about your system. It is a security risk. You can accidentally tell users where code or sensitive data lives, what version of a framework or library you’re running (which helps attackers find known vulnerabilities), or a bunch of other data that a user has no legitimate reason to know.

This isn’t a security book, and I’m not a security expert, but here’s a security tip: never share more information about your server or API than a user has a legitimate reason to know.

Never rely on the development server’s default response as your error response.
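One common way to follow that advice is a catch-all handler that keeps the stack trace in your logs and sends the client something deliberately boring. The Python sketch below is an assumption-laden example: the error_response name, the server_error code, and the incident property are all invented for illustration.

```python
import logging
import traceback
import uuid

logger = logging.getLogger("api")


def error_response(exc):
    """Log the full stack trace privately; return a generic error publicly."""
    incident_id = uuid.uuid4().hex
    trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # The trace goes to the server's logs, tagged so support can find it later.
    logger.error("incident %s\n%s", incident_id, trace)
    # The client sees no file paths, versions, or other internals.
    body = {"errors": [{"code": "server_error", "incident": incident_id}]}
    return 500, body
```

The incident ID gives the user something to quote in a support request without revealing anything about the server.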

Everything’s OK

Status codes mean things. A status code between 200 and 299, inclusive, means that things are OK, there was no error. A status code between 400 and 499 means the request is problematic, and, with a few exceptions like 429 Too Many Requests, no matter how many times you try it, it isn’t going to work. A status code between 500 and 599 means that the server had a problem; it makes no claim about the validity of the request. Retrying these requests can actually yield a successful result.

That’s kind of an important thing to know, as a client library author. What should I retry? What shouldn’t I?

Some API developers helpfully respond with a 200 OK status code to everything. Success? 200 OK. Error? 200 OK. This is incredibly frustrating, because you’re then required to mix your networking code (retry!) and your application logic.

Always use the most appropriate status code for your response. We’ve covered this previously, but it bears repeating. Status codes are important, there are a lot of them, and they are incredibly powerful.
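When the status codes are honest, the retry decision can live entirely in the networking layer. A minimal Python sketch, assuming a hypothetical send callable that performs the request and returns a status and body; a real client would also wait between attempts.

```python
def should_retry(status):
    """Only server-side failures are worth sending again unchanged."""
    return 500 <= status <= 599


def fetch_with_retries(send, attempts=3):
    """Call send() up to `attempts` times, stopping at the first non-5xx response."""
    status, body = send()
    for _ in range(attempts - 1):
        if not should_retry(status):
            break
        status, body = send()
    return status, body
```

None of this is possible when errors arrive as 200 OK: the networking code would have to parse the body and understand the application's error format just to decide whether to try again.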

Recommendations from the author

So how do I handle errors? I like to treat errors like I do any other resources:

{
	"errors": [
		{
			// an error resource
		}
	]
}

That way, I can return multiple errors for a single request, if I know multiple things were wrong with it, saving network trips for the client. It also matches the decoding logic for the rest of the response, making it easy and intuitive to parse.

Consistency and orthogonality are the hallmarks of an API that is simple to use and reason about.

Each error resource consists of two fields:

  • An application-specific error code. This is a description of what went wrong; consider it a resource-agnostic way of describing an invalid state. I usually have things like:
    • overflow: for errors that signify a value is too large. Causes could be a string that is too long, a number that is too high, an array containing too many elements.
    • insufficient: for errors that signify a value is too small. Causes could be a string that is too short, a number that is too small, an array containing too few elements.
    • invalid_format: for errors that signify a formatting problem. Causes could be an error in the JSON formatting, a string where a number is expected, etc.
    • invalid_value: for errors that signify a valid format, but invalid value. Causes could be an invalid character in a string, a value outside an accepted set of values, or a value that violates application-specific rules. While these are only a few of the codes, I tend to have between ten and twenty of these per application.
  • A field or parameter. A field is a JSON pointer to the element in the JSON request body that caused the error. A parameter is the URL parameter that caused the error.

These two fields, combined, allow the client to pinpoint the precise cause of the error and suggest a single fix to the user. The error code tells the user what’s wrong; the field or parameter tells the user what needs to be fixed. The client can then display, next to the invalid user input, a suggestion that would resolve the error code. An overflow error may cause the client to suggest removing characters from a string. An insufficient error may prompt the client to ask the user to select an item to act on.
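On the client side, turning those error resources into something a user can act on might look like the following Python sketch. The JSON property names (code, field, param) and the wording of the suggestions are assumptions for the example; the error codes are the ones listed above.

```python
# Client-side wording for each application-specific error code.
SUGGESTIONS = {
    "overflow": "is too long or too large",
    "insufficient": "is too short or too small",
    "invalid_format": "is not in the expected format",
    "invalid_value": "is not an allowed value",
}


def describe(error):
    """Turn one error resource into a message tied to the offending input."""
    target = error.get("field") or error.get("param") or "request"
    suggestion = SUGGESTIONS.get(error["code"], "could not be processed")
    return "%s %s" % (target, suggestion)


def describe_all(response_body):
    """Handle every error in the response; there may be more than one."""
    return [describe(e) for e in response_body.get("errors", [])]
```

Because the client owns the wording, it can translate the messages or attach them to the right form field without ever parsing a human-readable string from the server.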

There are some possible variations on this. The error code could be a URL—because the URL is unique, it serves the same function as a constant string. The URL can then also be used to point a developer towards documentation. You could also include an instance ID that the user can use to report an error or you can use to track errors.

Or you can use something different entirely. It’s up to you. Just try to give your clients useful information.

For bonus points

If you want to go above and beyond and create an API that developers will compliment ad nauseam, take care when documenting your API. Not only should you document what request is expected and its options, and what response is sent and its variations, you should also document the possible errors. This is a lot of work to do and maintain, but it is so incredibly useful for an application developer who cares about handling all the possible errors as elegantly as possible.

Recap

So what have we learned this chapter?

  • String matching error responses to figure out what went wrong is an API smell.
  • Status codes are Really Darn Important. Use them correctly so your API consumers can handle responses intelligently.
  • Stack traces as errors are not only useless, they’re security risks.
  • A good approach is to have errors describe what went wrong and what input caused it, but you’re welcome to use what works for you.

Errors are one of the things most APIs overlook. Don’t overlook them in yours, and your developers will thank you for it.

Our next chapter is going to talk a bit about authentication. It’ll be a good time. Pinky swear.