Pure functional HTTP APIs in Scala
Table of Contents
- Our use case
- The state of the art
- Maybe there is another way
Thank you for your interest in this book. I hope you’ll have a good time reading it and learn something from it.
This book is intended for the intermediate Scala programmer who is interested in functional programming and works mainly on the web service backend side. Ideally she has experience with libraries like Akka HTTP[1] and Slick[2], which are in heavy use in that area.
However, you may have wondered whether we can do better, even though the aforementioned projects are battle-tested and proven.
The answer to this can be found in this book, which is intended to be read from cover to cover in the given order. Within the book the following libraries will be used: Cats[3], Cats Effect[4], http4s[5], Doobie[6], Refined[7], fs2[8] and probably others. ;-)
This book uses the Creative Commons Attribution ShareAlike 4.0 International (CC BY-SA 4.0) license[1]. The code snippets in this book are licensed under CC0[2], which means you can use them without restriction. Excerpts from libraries retain their original licenses.
I would like to thank my beloved wife and family who bear with me and make all of this possible.
I also send a big thank you to all the nice people from the Scala community whom I’ve had the pleasure to meet.
For better understanding we will implement a small use case in both an impure and a pure way. The following section outlines the specification.
First we need to specify the exact scope and API of our service. We’ll design a service with a minimal API to keep things simple. It shall fulfil the following requirements.
The service shall provide HTTP API endpoints for:
- the creation of a product data type identified by a unique id
- adding translations for a product name by language code and unique id
- returning the existing translations for a product
- returning a list of all existing products with their translations
We will keep the model very simple to avoid going overboard with the implementation.
- A language code shall be defined by ISO 639-1 (i.e. a two-letter code).
- A translation shall contain a language code and a product name (non-empty string).
- A product shall contain a unique id (UUID version 4) and a list of translations.
The data will be stored in a relational database (RDBMS). Therefore we need to define the tables and relations within the database.
The table products must contain only the unique id, which is also the primary key.
The table names must contain a column for the product id, one for the language code and one for the name. Its primary key is the combination of the product id and the language code. None of the columns may be null. The relation to the products is realised by a foreign key constraint to the products table via the product id.
The HTTP API shall provide the following endpoints on the given paths:
||POST||Create a product.|
||GET||Get all products and translations.|
||GET||Get all translations for the product.|
The data shall be encoded in JSON using the following specification:
This should be enough to get us started.
Within the Scala ecosystem the Akka-HTTP library is a popular choice for implementing server side backends for HTTP APIs. Another quite popular option is the Play framework, but using a full-blown web framework just to provide a thin API is overkill in most cases. As most services need a database, the Slick library is another popular choice, which completes the picture.
However, while all of the mentioned libraries are battle-tested and proven, they still have problems.
In the domain of functional programming we want referential transparency which we will define in the following way:
Building on that, we need pure functions which
- depend only on their input
- have no side effects
This in turn means that our functions will be referentially transparent.
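To make the two properties concrete, here is a small sketch (all names are hypothetical) contrasting a pure function with an impure one. Replacing a call of the pure function by its result never changes the program, while the same substitution changes the result for the impure one.

```scala
object PurityDemo {
  // Pure: the result depends only on the input and nothing else happens.
  def double(x: Int): Int = x * 2

  // Impure: the result depends on (and modifies) hidden mutable state.
  private var counter = 0
  def next(x: Int): Int = {
    counter += 1
    x + counter
  }

  // Equal: double(2) can always be replaced by its value.
  val pureA: Int = double(2) + double(2)
  val pureB: Int = { val d = double(2); d + d }

  // Different: each call of next(2) yields another value (3 + 4 vs. 5 + 5).
  val impureA: Int = next(2) + next(2)
  val impureB: Int = { val n = next(2); n + n }
}
```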
But the mentioned libraries are built upon Scala’s Future, which uses eager evaluation and thereby breaks referential transparency. Let’s look at an example.
The code above will print the text Hi there! two times. But how about the following one?
Instead of printing the text two times, it will print it only once, even when there is no usage of printF at all (try omitting the for comprehension). This means that Future breaks referential transparency!
If we want referential transparency, we must push the side effects to the boundaries of our system (program), which can be done by using lazy evaluation. Let’s repeat the previous example in a different way.
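The two variants described above can be sketched with nothing but the standard library. printF is the name used in the text; the object, the method names and the AtomicInteger counters (which record how often the side effect actually runs) are hypothetical additions for illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureDemo {
  val inlinedRuns = new AtomicInteger(0)
  val sharedRuns  = new AtomicInteger(0)

  // The effectful expression is written out twice: the effect runs twice.
  def inlined(): Unit = {
    val program = for {
      _ <- Future { inlinedRuns.incrementAndGet(); println("Hi there!") }
      _ <- Future { inlinedRuns.incrementAndGet(); println("Hi there!") }
    } yield ()
    Await.result(program, 5.seconds)
  }

  // The "same" expression is bound to a value first. A Future starts running
  // as soon as it is created and memoizes its result, so the effect runs once.
  def shared(): Unit = {
    val printF = Future { sharedRuns.incrementAndGet(); println("Hi there!") }
    val program = for {
      _ <- printF
      _ <- printF
    } yield ()
    Await.result(program, 5.seconds)
  }
}
```

Replacing printF by its definition changes how often the effect runs, which is exactly the loss of referential transparency.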
The above code will produce no output. Only if we evaluate the variable effect, which is of type IO[Unit], will the output be generated (try effect.unsafeRunSync() in the REPL). The second approach also works as expected.
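The idea behind IO can be illustrated without Cats Effect. The following sketch uses a tiny hypothetical Lazy type (not part of any library, and far simpler than IO) that merely stores a thunk and runs it on demand via a method deliberately named like IO’s unsafeRunSync.

```scala
import java.util.concurrent.atomic.AtomicInteger

object LazyDemo {
  // Hypothetical, minimal stand-in for an IO type: it only stores a
  // description (thunk) of the effect and runs it when explicitly asked.
  final class Lazy[A](thunk: () => A) {
    def map[B](f: A => B): Lazy[B] = new Lazy(() => f(thunk()))
    def flatMap[B](f: A => Lazy[B]): Lazy[B] =
      new Lazy(() => f(thunk()).unsafeRunSync())
    def unsafeRunSync(): A = thunk()
  }

  object Lazy {
    // By-name parameter: `a` is not evaluated here, only when the thunk runs.
    def delay[A](a: => A): Lazy[A] = new Lazy(() => a)
  }

  val runs = new AtomicInteger(0)

  // Defining the effect prints nothing.
  val effect: Lazy[Unit] = Lazy.delay { runs.incrementAndGet(); println("Hi there!") }

  // Using the value twice describes running the effect twice.
  val program: Lazy[Unit] = for {
    _ <- effect
    _ <- effect
  } yield ()
}
```

Nothing happens until LazyDemo.program.unsafeRunSync() is called, and then the text is printed twice, just as if effect had been written out inline.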
Suddenly we can reason about our code much more easily! And why is that? Well, we don’t have unexpected side effects caused by code running even when it doesn’t need to. This is a sneak peek at what pure code looks like. Now we only need to implement pure libraries for our use, or do we?
Luckily for us, there are meanwhile several pure options available in the Scala ecosystem. We will stick to the Cats family of libraries, namely http4s and Doobie, as replacements for Akka-HTTP and Slick. They build upon the Cats Effect library, which is an implementation of an IO monad for Scala. Some other options exist, but we’ll stick to the one from Cats.
To be able to contrast both ways of implementing a service we will first implement it using Akka-HTTP and Slick and will then migrate to http4s and Doobie.
We’ll be using the following libraries for the impure version of the service:
- Akka (including Akka-HTTP and Akka-Streams)
- Slick (as database layer)
- Flyway for database migrations (or evolutions)
- Circe for JSON codecs and akka-http-json as wrapper
- Refined for using refined types
- the PostgreSQL JDBC driver
I’ll spare you the sbt setup as you can look that up in the code repository (i.e. the impure folder in the book repo).
First we’ll implement our models, which are simple and straightforward. We start with a class to store our translations, or more precisely a single translation.
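A naive first version might look like the following sketch; the field names lang and name are assumptions, as is the invalid example value.

```scala
object NaiveModel {
  // Both fields are plain Strings, so the compiler accepts any value.
  final case class Translation(lang: String, name: String)

  // Compiles fine, although neither value makes sense for our domain.
  val broken: Translation = Translation(lang = "not a language", name = "")
}
```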
Technically it is okay, but we have a bad feeling about it. Using Option[String] is of no use because both fields have to be set. But a String can always be null and can contain a lot of unexpected stuff (literally anything).
So let us define some refined types which we can use later on. First we need a language code which obeys the restrictions of ISO 639-1, and we need a stronger definition for a product name. For the former we use a regular expression and for the latter we simply expect a string which is not empty.
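The book uses the Refined library for this (with predicates such as MatchesRegex and NonEmpty). As a library-free illustration of the same constraints, here is a sketch with hypothetical smart constructors. The pattern ^[a-z]{2}$ (two lowercase letters) is an assumption about the exact regular expression, and like any regex it only checks the shape of a code, not whether it is actually listed in ISO 639-1.

```scala
object RefinedSketch {
  // Instances can only be created via `from`, so every value is valid.
  final class LanguageCode private (val value: String) {
    override def toString: String = value
  }
  object LanguageCode {
    // Assumed pattern: exactly two lowercase ASCII letters.
    def from(s: String): Either[String, LanguageCode] =
      if (s != null && s.matches("^[a-z]{2}$")) Right(new LanguageCode(s))
      else Left(s"Invalid language code: $s")
  }

  final class ProductName private (val value: String) {
    override def toString: String = value
  }
  object ProductName {
    // A product name must be a non-empty string.
    def from(s: String): Either[String, ProductName] =
      if (s != null && s.nonEmpty) Right(new ProductName(s))
      else Left("A product name must not be empty.")
  }
}
```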
Now we can give our translation model another try.
Much better! And while we’re at it, we can also write the JSON codecs using the refined module of the Circe library. We put them into the companion object of the model.
Now onwards to the product model. Because we already know about refined types, we can use them from the start here.
If we look closely, we realise that a List may be empty. That is valid for a list but not for our product, because we need at least one entry. Luckily for us the Cats library has us covered with the NonEmptyList data type. Including the JSON codecs, this leads us to our final implementation.
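cats.data.NonEmptyList guarantees at least one element by construction, and NonEmptyList.fromList turns a possibly empty List into an Option. The following stdlib-only sketch uses a hypothetical Nel type (not the Cats implementation) to show the idea.

```scala
object NelSketch {
  // At least one element is guaranteed by the shape of the type.
  final case class Nel[A](head: A, tail: List[A]) {
    def toList: List[A] = head :: tail
  }

  object Nel {
    // Mirrors the idea of NonEmptyList.fromList: an empty input yields None.
    def fromList[A](as: List[A]): Option[Nel[A]] = as match {
      case h :: t => Some(Nel(h, t))
      case Nil    => None
    }
  }
}
```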
Last but not least, we really should be using the existing UUID data type instead of rolling our own refined string version, even if it is cool. ;-)
We kept the type name ProductId by using a type alias. This is convenient, but remember that a type alias does not add extra type safety (e.g. type Foo = String will be a String).
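A short sketch of what this means in practice (describe and the wrapper ProductKey are hypothetical names): any UUID is accepted wherever a ProductId is expected, whereas a wrapper type would have to be constructed explicitly. As a side note, java.util.UUID.randomUUID() produces version 4 UUIDs, which matches our specification.

```scala
import java.util.UUID

object AliasDemo {
  // A type alias is only another name: ProductId and UUID are the same type.
  type ProductId = UUID

  // A wrapper, by contrast, is a distinct type (hypothetical, for comparison).
  final case class ProductKey(value: UUID)

  def describe(id: ProductId): String = s"product $id"

  val anyUuid: UUID = UUID.randomUUID()

  // Compiles: a plain UUID is accepted as a ProductId.
  val described: String = describe(anyUuid)

  // describe(ProductKey(anyUuid)) would not compile: ProductKey is not a UUID.
}
```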
Now we have the models covered and can move on to the database layer.
The database layer should provide programmatic access to the database, but it should also manage changes to the database. The latter are called migrations or evolutions. From the available options we chose Flyway as the tool to manage our database schema.
Flyway uses raw SQL scripts which have to be put into a certain location, /db/migration (under the resources folder) in our case. Also the files have to be named like VXX__description.sql (XX being a number), starting with V1. Please note that there are two underscores between the version prefix and the rest of the name! Because our database schema is very simple, we’re done quickly:
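To make the naming rule concrete, here is a hypothetical helper that checks file names against a simplified form of Flyway’s convention for versioned migrations (prefix V, a numeric version, two underscores, a description, suffix .sql). Flyway’s real rules are more flexible, e.g. regarding dotted versions like V1.1, so treat this as an illustration only; the file names in the usage are made up.

```scala
object MigrationNames {
  // Simplified: "V", a numeric version, "__", a description, ".sql".
  private val Versioned = """V(\d+)__(\w+)\.sql""".r

  // Returns the version number if the file name follows the pattern.
  def version(fileName: String): Option[Int] = fileName match {
    case Versioned(v, _) => Some(v.toInt)
    case _               => None
  }
}
```

With a single underscore the name does not match, which is an easy mistake to make.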
In the code you’ll see that we additionally set comments, which are omitted from the code snippet above. This might be overkill here, but it is a very handy feature to have and I advise you to use it for more complicated database schemas, because the right comment (read: information) in the right place might save a lot of time when trying to understand things.
Next we move on to the programmatic part, which first needs a configuration for our database connection. With Slick you have a multitude of options, but we’ll use the “Typesafe Config”[1] approach.
After we have this in place, we can run the migrations via the API of Flyway. For this we have to load the configuration (we do it by creating an actor system), extract the needed information, create a JDBC URL and use it together with username and password to obtain a Flyway instance. On that instance we simply call the method migrate(), which will do the right thing. Basically it will check if the schema exists and decide to either create it, apply pending migrations or simply do nothing. The method will return the number of applied migrations.
Let us continue to dive into the Slick table definitions.
Slick offers several options for approaching the database. For our example we will be using the lifted embedding but if needed Slick also provides the ability to perform plain SQL queries.
For the lifted embedding we have to define our tables in a way Slick can understand. While this can be tricky under certain circumstances, our simple model is straightforward to implement.
As you can see above, we’re using simple data types (not the refined ones) to keep the Slick implementation easy. However, we can also use refined types at the price of using either the slick-refined library or writing custom column mappers.
Next we’ll implement the table for the translations which will also need some constraints.
As you can see, the definition of constraints is also pretty simple. Now our repository needs some functions for more convenient access to the data.
The last two functions are helpers which enable us to create load queries that we can compose. They are used in functions like updateProduct to create a list of queries that are executed in bulk, while the call to transactionally ensures that they will run within a transaction. When updating a product, we first delete all existing translations to allow the removal of existing translations via an update.
The loadProduct function simply returns a list of database rows from the needed join. Therefore we need a function which builds a Product out of them.
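A stdlib-only sketch of such a converter follows. The row layout (product id, language code, name) is the one described later in this section; the plain String fields and the List of names (instead of refined types and NonEmptyList) are simplifications, and all names are hypothetical. It returns an Option because an empty result set cannot form a product.

```scala
import java.util.UUID

object RowConversion {
  // One database row: (product id, language code, name).
  type Row = (UUID, String, String)

  final case class Translation(lang: String, name: String)
  final case class Product(id: UUID, names: List[Translation])

  // Builds a product from the rows of a single product; None for no rows.
  def fromDatabase(rows: Seq[Row]): Option[Product] =
    rows.headOption.map { case (id, _, _) =>
      val names = rows
        .filter(_._1 == id)
        .map { case (_, lang, name) => Translation(lang, name) }
      Product(id, names.toList)
    }
}
```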
So far we should have everything in place to make use of our database. Now we need to wire it all together.
Defining the routes is pretty simple if you’re used to the Akka-HTTP routing DSL syntax.
We will fill in the details later on. But first, let’s start the actual server to make use of our routes.
The code will start a server using the defined routes as well as the hostname and port from the configuration. It will run until you press enter and then terminate. Let us now visit the code for each routing endpoint. We will start with the one for returning a single product.
We load the raw product data from the repository and convert it into a proper product model. But to make the types align, we have to wrap the second call in a Future, otherwise we would get a compiler error. We don’t need to marshal the response ourselves because we are using the akka-http-json library, which provides, for example, an ErrorAccumulatingCirceSupport import that handles this, provided of course that you have circe codecs defined for your types.
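The type alignment issue can be sketched with plain Futures. loadProduct, fromDatabase and the row type below are hypothetical stand-ins: inside a for comprehension over Future every step must itself be a Future, so the result of the pure conversion is lifted with Future.successful.

```scala
import java.util.UUID
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FutureAlignment {
  type Row = (UUID, String, String)
  final case class Product(id: UUID, names: List[(String, String)])

  // Hypothetical repository call returning raw rows asynchronously.
  def loadProduct(id: UUID): Future[Seq[Row]] =
    Future(Seq((id, "de", "Tee"), (id, "en", "Tea")))

  // Hypothetical pure converter (no Future involved).
  def fromDatabase(rows: Seq[Row]): Option[Product] =
    rows.headOption.map { case (id, _, _) =>
      Product(id, rows.map { case (_, lang, name) => (lang, name) }.toList)
    }

  def findProduct(id: UUID): Future[Option[Product]] =
    for {
      rows    <- loadProduct(id)
      // Without Future.successful this would not compile: flatMap on a
      // Future expects a Future, not an Option.
      product <- Future.successful(fromDatabase(rows))
    } yield product
}
```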
The route for updating a product is also very simple. We extract the product entity from the request body via the entity(as[T]) directive and simply pass it to the appropriate repository function. Now onwards to creating a new product.
As you can see, the function is basically the same except that we’re calling a different function from the repository. Last but not least, let us take a look at the endpoint which returns all products.
This looks more complicated than the other endpoints. So what exactly are we doing here?
Well, first we load the raw product data from the repository. Afterwards we convert it into the proper data model, or to be more exact, into a list of product entities.
The first thing that comes to mind is that we’re performing operations in memory. This is no different from the last time when we converted the data for a single product. Now, however, we’re talking about all products, which may be a lot of data. Another obvious point is that we get a list of Option[Product] which we explicitly flatten at the end.
Maybe we should consider streaming the results. But we still have to group and combine the rows which belong to a single product into a product entity. Can we achieve that with streaming? Well, let’s look at our data flow.
We receive a list of rows with 3 columns from the database in the following format: product id, language code, name. The tricky part is that multiple rows (list entries) can belong to the same product, recognizable by the same value in the first column, product id. First we should simplify our problem by ensuring that the list is sorted by the product id. This is done by adjusting the function loadProducts in the repository.
Now we can rely on the fact that we have seen all entries for one product as soon as the product id in our list changes. Let’s adjust our code in the endpoint to make use of streaming now. Because Akka-HTTP is based on Akka-Streams, we can simply use that.
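The principle behind sorting can be sketched without any streaming library. The following hypothetical helper consumes an Iterator of rows that is assumed to be sorted by product id and emits one product whenever the id changes, so only the rows of the current product are held in memory. It illustrates the idea only and is not the Akka Streams code used here; with unsorted input, the same product id may be emitted more than once.

```scala
import java.util.UUID

object SortedGrouping {
  // One database row: (product id, language code, name).
  type Row = (UUID, String, String)
  final case class Product(id: UUID, names: List[(String, String)])

  // Expects rows sorted by product id; emits one Product per id.
  def products(rows: Iterator[Row]): Iterator[Product] = {
    val buffered = rows.buffered
    new Iterator[Product] {
      def hasNext: Boolean = buffered.hasNext
      def next(): Product = {
        val id    = buffered.head._1
        val names = List.newBuilder[(String, String)]
        // Consume rows until the product id changes.
        while (buffered.hasNext && buffered.head._1 == id) {
          val (_, lang, name) = buffered.next()
          names += ((lang, name))
        }
        Product(id, names.result())
      }
    }
  }
}
```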
Wow, this may look scary, but let’s break it apart piece by piece. First we need an implicit value which provides streaming support for JSON. Next we create a Source from the database stream. Then we implement the processing logic via the high-level streams API. We collect every defined output of our helper function fromDatabase, which leads to a stream of Product entities. But we have created way too many (each product is created as often as it has translations). So we group our stream by the product id, which creates a new stream for each product id holding only the entities for that specific product. We fold over each of these streams by merging together the lists of translations (names). Afterwards we merge the streams back together and run another collect function to get a result stream of Product and not of Option[Product]. Last but not least, the stream is passed to the complete function, which will do the right thing.
The solution has two problems:
- The number of individual streams (and thus products) is limited by the maximum number of substreams passed to groupBy.
- The groupBy operator holds the references to these streams in memory, opening up a possible out of memory issue here.
As the first problem is simply related to the usage of groupBy, we may say that we only have one problem: the usage of groupBy.
For a limited amount of data the proposed solution is perfectly fine, so we will leave it as is for now.
Regarding the state of our service we have a working solution, so congratulations and let’s move on to the pure implementation.
Like in the previous section I will spare you the details of the sbt setup. We will be using the following set of libraries:
- Doobie (as database layer)
- Flyway for database migrations (or evolutions)
- Circe for JSON codecs
- Refined for using refined types
- the PostgreSQL JDBC driver
- pureconfig (for proper configuration loading)
Last time we simply loaded our configuration via the Typesafe Config library, but can’t we do a bit better here? The answer is yes, by using the pureconfig[1] library. First we start by implementing the necessary parts of our configuration as data types.
As we can see, the code is pretty simple. The implicits in the companion objects are needed for pureconfig to actually map from a configuration to your data types. We are using a function deriveReader which will derive (like in mathematics) the codec (yes, it is similar to a JSON codec, thus the name) for us.
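To illustrate what such a derived reader does, here is a hand-written stdlib sketch. The Reader type class, the flat Map-based configuration and the ApiConfig fields are hypothetical and much simpler than pureconfig’s ConfigReader, which deriveReader generates for us.

```scala
import scala.util.Try

object ConfigSketch {
  // Hypothetical, simplified type class: read an A from flat key/value pairs.
  trait Reader[A] {
    def read(cfg: Map[String, String]): Either[String, A]
  }

  final case class ApiConfig(host: String, port: Int)

  object ApiConfig {
    // Written by hand here; pureconfig derives this kind of instance for us.
    implicit val reader: Reader[ApiConfig] = new Reader[ApiConfig] {
      def read(cfg: Map[String, String]): Either[String, ApiConfig] =
        for {
          host <- cfg.get("host").toRight("Missing key: host")
          raw  <- cfg.get("port").toRight("Missing key: port")
          port <- Try(raw.toInt).toOption.toRight(s"Not a number: $raw")
        } yield ApiConfig(host, port)
    }
  }

  // Summons the implicit reader for the requested type.
  def load[A](cfg: Map[String, String])(implicit r: Reader[A]): Either[String, A] =
    r.read(cfg)
}
```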
Because we have already written our models, we just re-use them here. The only thing we change is the semi-automatic derivation of the JSON codecs. We just need to import the appropriate circe package and call the derive functions.
In general the same applies to the database layer as we have already read in the “impure” section.
For the sake of simplicity we will stick to Flyway for our database migrations. However, we will wrap the migration code in a different way (read: encapsulate it properly within an IO to defer side effects). While we’re at it, we may just as well write our migration code using the interpreter pattern (it became famous under the name “tagless final” in Scala).
We define a trait which describes the functionality desired by our interpreter and use a higher-kinded type parameter to be able to abstract over the effect type. But now let’s continue with our Flyway interpreter.
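A minimal sketch of the pattern follows. The trait name DatabaseMigrator and its parameters are assumptions, the JDBC URL and credentials are placeholders, and scala.util.Try merely serves as a stand-in effect type so that the example runs without Cats Effect; in the book the interpreter uses IO.

```scala
import scala.util.{Success, Try}

object TaglessSketch {
  // The algebra: what we want to do, abstracted over the effect type F.
  trait DatabaseMigrator[F[_]] {
    def migrate(url: String, user: String, pass: String): F[Int]
  }

  // Hypothetical interpreter for Try that only pretends to apply 2 migrations.
  val fakeMigrator: DatabaseMigrator[Try] = new DatabaseMigrator[Try] {
    def migrate(url: String, user: String, pass: String): Try[Int] = Success(2)
  }

  // Code written against the algebra works with any interpreter.
  def runMigrations[F[_]](m: DatabaseMigrator[F]): F[Int] =
    m.migrate("jdbc:postgresql://localhost:5432/test", "user", "secret")
}
```

Swapping the interpreter (e.g. one based on IO that calls Flyway) requires no change to runMigrations.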
As we can see, the implementation is pretty simple and we just wrap our code into an IO monad to constrain the effect. Having the migration code settled, we can move on to the repository.
If we take a closer look at the method definition of Flyway.migrate, we see this:
While IO will gladly defer side effects for us, it won’t stop the enclosed code from throwing exceptions. This is not that great. So what can we do about it?
Having an instance of MonadError in scope, we could just use the .attempt function it provides. But is this enough, or rather, does this provide a sensible solution for us? Let’s play a bit in the REPL.
This looks like we just have to use MonadError then. Hooray, we don’t need to change our code in the migrator! As model citizens of the functional programming camp, we just defer the responsibility upwards to the calling site.
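The effect of .attempt can be sketched without Cats: a suspended computation that may throw is turned into one that returns an Either, so the exception becomes a value the caller has to handle. The helper and the failing migration below are hypothetical and only mimic the idea of MonadError’s attempt for plain thunks.

```scala
import scala.util.control.NonFatal

object AttemptSketch {
  // Hypothetical: like attempt on IO, but for a plain thunk.
  def attempt[A](fa: () => A): () => Either[Throwable, A] =
    () => try Right(fa()) catch { case NonFatal(e) => Left(e) }

  // Stand-in for a migration that fails when executed.
  val failingMigration: () => Int =
    () => throw new IllegalStateException("migration failed")

  // Still suspended: nothing has been thrown yet.
  val safeMigration: () => Either[Throwable, Int] = attempt(failingMigration)
}
```

Calling AttemptSketch.safeMigration() returns a Left holding the exception instead of throwing it.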
As we already started with using a tagless final approach we might as well continue with it and define a base for our repository.
There is nothing exciting here, except that we feel brave now and try to use proper refined types in our database functions. This is possible due to the doobie-refined module. To be able to map the UUID data type (and others), we also need to include the doobie-postgres module. For convenience we are still using ProductId instead of UUID in our definition. In addition, we set the return type of loadProducts to be an fs2.Stream because we want to achieve pure functional streaming here. :-)
So let’s see what a repository using doobie looks like.
We keep our higher-kinded type as abstract as we can, but we want it to be able to suspend our side effects. Therefore we require an implicit
If we look at the detailed function definitions further below, the first big difference is that with doobie you write plain SQL queries. You can do this with Slick too[3], but with doobie it is the only way. If you’re used to object relational mapping (ORM) or other forms of query compilers, then this may seem strange at first. But: “In data processing it seems, all roads eventually lead back to SQL!”[4] ;-)
We won’t discuss the benefits or drawbacks here, but in general I also lean towards using the de facto lingua franca for database access, because it was made for this and so far no query compiler has been able to beat hand-crafted SQL in terms of performance. Another benefit is that if you ask a database guru for help, she will be much better able to help you with plain SQL queries than with some meta query which is compiled into something you have no idea of.
The loadProduct function simply returns all rows for a single product from the database, like its Slick counterpart in the impure variant. The parameter will be correctly interpolated by Doobie, therefore we don’t need to worry about SQL injections here. We specify the type of the query, instruct Doobie to transform it into a sequence and give it to the transactor.
The loadProducts function is equivalent to the first one, but it returns the data for all products, sorted by product id and as a stream using the fs2 library, which provides pure functional streaming.
When saving a product, we use monadic notation for our program to have it short-circuit in the case of failure. Doobie will also put all commands into a database transaction. The function itself will try to create the “master” entry in the products table and save all translations afterwards.
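The short-circuiting behaviour of monadic notation can be sketched with Either from the standard library. The two steps below are hypothetical stand-ins for the database commands (they return affected row counts); if the first one fails, the second one is never executed.

```scala
object ShortCircuit {
  // Records which steps actually ran (for illustration only).
  val executed = scala.collection.mutable.ListBuffer.empty[String]

  // Hypothetical: creates the "master" entry, or fails.
  def insertProduct(ok: Boolean): Either[String, Int] = {
    executed += "insertProduct"
    if (ok) Right(1) else Left("insert into products failed")
  }

  // Hypothetical: saves the translations.
  def insertTranslations(count: Int): Either[String, Int] = {
    executed += "insertTranslations"
    Right(count)
  }

  // Sums the affected row counts; stops at the first Left.
  def saveProduct(ok: Boolean): Either[String, Int] =
    for {
      p <- insertProduct(ok)
      t <- insertTranslations(2)
    } yield p + t
}
```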
The updateProduct function also uses monadic notation, like the saveProduct function we talked about before. The difference is that it first deletes all known translations before saving the given ones.
The routing DSL of http4s differs from the one of Akka-HTTP. Although I like the latter more, it poses no problem to model a base for our routes.
As we can see, the DSL is closer to Scala syntax and quite easy to read. But before we move on to the details of each route, let’s think about how we can model this a bit more abstractly. While it is fine to have our routes bound to IO, it would be better to have more flexibility here. We have several options, but for starters we just extract our routes into their own classes like in the following schema.
So far they only need the repository to access and manipulate data. Now let’s tackle the individual route implementations.
First we need to bring JSON codecs into scope for http4s, hence the implicit definitions at the top of the file. In the route for loading a single product, we simply load the database rows, which we pipe through our helper function to construct a proper Product, and return that.
The update route (via PUT) transforms the request body into a Product and gives that to the update function of the repository. Finally a NoContent response is returned.
Our first take on the routes for products looks pretty complete already. Again we need implicit definitions for our JSON codecs to be able to serialize and deserialize our entities. The POST route for creating a product is basically the same as the update route from the previous part. We create a Product from the request body, pass it to the save function of the repository and return a 205
The GET route for returning all products calls the appropriate repository function, which returns a stream that we map over using our helper function. Afterwards we use collect to convert our stream from Option[Product] to a stream of Product, which we pass to the Ok function of http4s.
However, just like in the impure version, each product is emitted as often as it has translations. To solve this, we need to dive into the fs2 API and leverage its power to merge our products back together. So let’s see how we do.
Because we believe ourselves to be clever, we pick the simple sledgehammer approach and just run an accumulator on the stream. So what do we need? A helper function and some code changes on the stream (i.e. in the route).
So this function takes a list (that may be empty) and a product, and merges the topmost element (the head) of the list with the given one. It returns an updated list that either contains an updated head element or a new head. Leaving aside the question of who guarantees that the relevant list element will always be the head, we may use it.
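A stdlib sketch of such an accumulator follows; the name merge and the simplified Product type are hypothetical. Because it prepends, the result of a fold is in reverse order of first appearance, and it only merges correctly when the entries of one product are adjacent, which is why the sorting by product id matters.

```scala
import java.util.UUID

object Accumulator {
  final case class Product(id: UUID, names: List[(String, String)])

  // Merges `p` into the head of `acc` if both share the same id,
  // otherwise `p` becomes the new head.
  def merge(acc: List[Product], p: Product): List[Product] = acc match {
    case head :: tail if head.id == p.id =>
      head.copy(names = head.names ::: p.names) :: tail
    case _ => p :: acc
  }
}
```

Folding a list of per-translation products with merge, starting from an empty list, yields one product per id.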
Looks simple, doesn’t it? Just a fold which uses our accumulator and we should be settled. But life is not that simple…
The compiler complains that we have changed the type of the stream and rightly so. So let’s fix that compiler error.
Let’s take a look again and think about what it means to change a stream of products into a stream of a list of products. It means that we will be building the whole thing in memory! Well, if we wanted that, we could have skipped streaming altogether. So back to the drawing board.
We need to process our stream of database rows (or products, if we use the converter like before) in such a way that all related entities are grouped into one product and emitted as such.