Foreword

Thank you for your interest in this book. I hope you’ll have a good time reading it and learn something from it.

This book is intended for the intermediate Scala programmer who is interested in functional programming and works mainly on the web service backend side. Ideally she has experience with libraries like Akka HTTP and Slick, which are in heavy use in that area.

However, maybe you have wondered whether we can't do better, even though the aforementioned projects are battle-tested and proven.

The answer to this can be found in this book, which is intended to be read from cover to cover in the given order. Within the book the following libraries will be used: Cats, Cats Effect, http4s, Doobie, Refined, fs2, tapir, Monocle and probably others. ;-)

Code and book source can be found in the following repository: https://github.com/jan0sch/pfhais

Copyleft Notice

This book uses the Creative Commons Attribution ShareAlike 4.0 International (CC BY-SA 4.0) license. The code snippets in this book are licensed under CC0, which means you can use them without restriction. Excerpts from libraries retain their original license.

Thanks

I would like to thank my beloved wife and family who bear with me and make all of this possible.
Also I send a big thank you to all the nice people from the Scala community whom I've had the pleasure to meet. Special thanks go to Adam Warski (tapir), Frank S. Thomas (refined), Julien Truffaut (Monocle) and Ross A. Baker (http4s) for their help, advice and patience.

Our use case

For better understanding we will implement a small use case in both an impure and a pure way. The following section will outline the specification.

Service specification

First we need to specify the exact scope and API of our service. We’ll design a service with a minimal API to keep things simple. It shall fulfil the following requirements.

The service shall provide HTTP API endpoints for:

1. the creation of a product data type identified by a unique id
2. adding translations for a product name by language code and unique id
3. returning the existing translations for a product
4. returning a list of all existing products with their translations

Data model

We will keep the model very simple to avoid going overboard with the implementation.

1. A language code shall be defined by ISO 639-1 (i.e. a two-letter code).
2. A translation shall contain a language code and a product name (non-empty string).
3. A product shall contain a unique id (UUID version 4) and a list of translations.

Database

The data will be stored in a relational database (RDBMS). Therefore we need to define the tables and relations within the database.

The products table

The table products must contain only the unique id which is also the primary key.

The names table

The table names must contain a column for the product id, one for the language code and one for the name. Its primary key is the combination of the product id and the language code. All columns must not be null. The relation to the products is realised by a foreign key constraint to the products table via the product id.

HTTP API

The HTTP API shall provide the following endpoints on the given paths:

Path              HTTP method   Function
/products         POST          Create a product.
/products         GET           Get all products and translations.
/product/{UUID}   PUT           Add translations.
/product/{UUID}   GET           Get all translations for the product.

The data shall be encoded in JSON using the following specification:
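A sketch of what the JSON representation of a single product might look like; the exact field names (id, names, lang, name) are assumptions for illustration:

```json
{
  "id": "2df65b10-5b57-4ab9-a260-0e21a2a57875",
  "names": [
    { "lang": "de", "name": "Ein Produktname" },
    { "lang": "en", "name": "A product name" }
  ]
}
```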

This should be enough to get us started.

The state of the art

Within the Scala ecosystem the Akka-HTTP library is a popular choice for implementing server side backends for HTTP APIs. Another quite popular option is the Play framework but using a full blown web framework to just provide a thin API is overkill in most cases. As most services need a database the Slick library is another popular choice which completes the picture.

However while all mentioned libraries are battle tested and proven they still have problems.

Problems

In the domain of functional programming we want referential transparency, which we will define in the following way: an expression is referentially transparent if it can be replaced by its evaluated result without changing the behaviour of the program.

Building on that we need pure functions which are

1. only dependent on their input
2. have no side effects

This in turn means that our functions will be referentially transparent.

But, the mentioned libraries are built upon the Future from Scala which uses eager evaluation and breaks referential transparency. Let’s look at an example.
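The original listing is not included here, but a minimal sketch of such an example could look like this:

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

// Two independent Future expressions, each printing once.
for {
  _ <- Future(println("Hi there!"))
  _ <- Future(println("Hi there!"))
} yield ()
```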

The code above will print the text Hi there! two times. But how about the following one?
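A sketch of the second variant, where the Future is assigned to a value named printF and reused:

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

// The Future starts running (and prints) as soon as it is defined.
val printF: Future[Unit] = Future(println("Hi there!"))

for {
  _ <- printF
  _ <- printF
} yield ()
```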

Instead of printing the text two times it will print it only once even when there is no usage of printF at all (try omitting the for comprehension). This means that Future breaks referential transparency!

Maybe there is another way

If we want referential transparency, we must push the side effects to the boundaries of our system (program) which can be done by using lazy evaluation. Let’s repeat the previous example in a different way.
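A sketch using the IO data type from Cats Effect might look like this:

```scala
import cats.effect.IO

// Nothing is printed yet, the side effect is only described.
val printF: IO[Unit] = IO(println("Hi there!"))

val effect: IO[Unit] = for {
  _ <- printF
  _ <- printF
} yield ()

// Only running the effect produces output, and it prints twice as expected:
// effect.unsafeRunSync()
```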

The above code will produce no output. Only if we evaluate the variable effect, which is of type IO[Unit], will the output be generated (try effect.unsafeRunSync() in the REPL). Also, the second approach now works as expected.

Suddenly we can reason about our code much more easily! And why is that? Well, we don't have unexpected side effects caused by code running even when it doesn't need to. This is a sneak peek at what pure code looks like. Now we only need to implement pure libraries for our use, or do we?

Luckily for us, there are meanwhile several pure options available in the Scala ecosystem. We will stick to the Cats family of libraries, namely http4s and Doobie, as replacements for Akka-HTTP and Slick. They build upon the Cats Effect library, which is an implementation of an IO monad for Scala. Some other options exist but we'll stick to the one from Cats.

To be able to contrast both ways of implementing a service we will first implement it using Akka-HTTP and Slick and will then migrate to http4s and Doobie.

Impure implementation

We’ll be using the following libraries for the impure version of the service:

1. Akka (including Akka-HTTP and Akka-Streams)
2. Slick (as database layer)
3. Flyway for database migrations (or evolutions)
4. Circe for JSON codecs and akka-http-json as wrapper
5. Refined for using refined types
6. the PostgreSQL JDBC driver

I’ll spare you the sbt setup as you can look that up in the code repository (i.e. the impure folder in the book repo).

Models

First we'll implement our models, which are simple and straightforward. To begin with, we need a class to store a single translation.
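A first, naive version might be a sketch like this (the field names are assumptions):

```scala
// A translation consisting of a language code and a product name.
final case class Translation(lang: String, name: String)
```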

Technically it is okay but we have a bad feeling about it. Using Option[String] is of no use because both fields have to be set. But a String can always be null and contain a lot of unexpected stuff (literally anything).

So let us define some refined types which we can use later on. At first we need a language code which obeys the restrictions of ISO-639-1 and we need a stronger definition for a product name. For the former we use a regular expression and for the latter we simply expect a string which is not empty.
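A sketch of such refined type aliases (typically placed in a package object) could look like this:

```scala
import eu.timepit.refined.W
import eu.timepit.refined.api.Refined
import eu.timepit.refined.collection.NonEmpty
import eu.timepit.refined.string.MatchesRegex

// A language code format according to ISO 639-1 (a two letter code).
type LanguageCode = String Refined MatchesRegex[W.`"^[a-z]{2}$"`.T]

// A product name must be a non-empty string.
type ProductName = String Refined NonEmpty
```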

Now we can give our translation model another try.
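With the refined aliases in place the model might look like this:

```scala
// A translation of a product name for a given language.
final case class Translation(lang: LanguageCode, name: ProductName)
```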

Much better and while we’re at it we can also write the JSON codecs using the refined module of the Circe library. We put them into the companion object of the model.
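A sketch of the companion object, assuming semi-automatic derivation via circe-generic plus the circe-refined instances:

```scala
import io.circe.{ Decoder, Encoder }
import io.circe.generic.semiauto._
import io.circe.refined._

object Translation {
  implicit val decode: Decoder[Translation] = deriveDecoder[Translation]
  implicit val encode: Encoder[Translation] = deriveEncoder[Translation]
}
```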

Now onwards to the product model. Because we already know about refined types we can use them from the start here.
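A first sketch of the product model, using a refined string for the id (refined ships a Uuid predicate for strings):

```scala
import eu.timepit.refined.api.Refined
import eu.timepit.refined.string.Uuid

type ProductId = String Refined Uuid

// A product with its unique id and a list of name translations.
final case class Product(id: ProductId, names: List[Translation])
```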

If we look closely we realise that a List may be empty. That is valid for the list but not for our product, because we need at least one entry. Luckily for us the Cats library has us covered with the NonEmptyList data type. Including the JSON codecs this leads us to our final implementation.
Last but not least we really should be using the existing UUID data type instead of rolling our own refined string version - even when it is cool. ;-)

We kept the type name ProductId by using a type alias. This is convenient but remember that a type alias does not add extra type safety (e.g. type Foo = String will be a String).

But is a non-empty list really what we want? Well, a list may contain duplicate entries while the database surely will not, because of its unique constraints! So, let's switch to a NonEmptySet which is also provided by Cats.
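The final model might then look roughly like this; the JSON codecs are derived in the companion object just like for Translation (which requires an Order instance for Translation, more on that below):

```scala
import java.util.UUID
import cats.data.NonEmptySet

// We keep the old type name around via a simple type alias.
type ProductId = UUID

// A product with a unique id and a non-empty set of translations.
final case class Product(id: ProductId, names: NonEmptySet[Translation])
```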

Now we have the models covered and can move on to the database layer.

Database layer

The database layer should provide programmatic access to the database, but it should also manage changes to the database schema. The latter is called migrations or evolutions. From the available options we chose Flyway as the tool to manage our database schema.

Migrations

Flyway uses raw SQL scripts which have to be put into a certain location, being /db/migration (under the resources folder) in our case. Also the files have to be named like VXX__some_name.sql (XX being a number), starting with V1. Please note that there are two underscores between the version prefix and the rest of the name! Because our database schema is very simple we're done quickly:
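A sketch of what V1__create_tables.sql might contain; the column and constraint names are assumptions:

```sql
CREATE TABLE products (
  id UUID NOT NULL,
  CONSTRAINT products_pk PRIMARY KEY (id)
);

CREATE TABLE names (
  product_id UUID       NOT NULL,
  lang_code  VARCHAR(2) NOT NULL,
  name       TEXT       NOT NULL,
  CONSTRAINT names_pk PRIMARY KEY (product_id, lang_code),
  CONSTRAINT names_product_id_fk FOREIGN KEY (product_id)
    REFERENCES products (id)
);
```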

In the code you'll see that we additionally set comments, which are omitted from the code snippet above. This might be overkill here but it is a very handy feature to have and I advise you to use it for more complicated database schemas, because the right comment (read: information) in the right place might save a lot of time when trying to understand things.

Next we move on to the programmatic part which at first needs a configuration of our database connection. With Slick you have a multitude of options but we’ll use the “Typesafe Config”1 approach.

After we have this in place we can run the migrations via the API of Flyway. For this we have to load the configuration (we do it by creating an actor system), extract the needed information and create a JDBC url and use that with username and password to obtain a Flyway instance. On that one we simply call the method migrate() which will do the right thing. Basically it will check if the schema exists and decide to either create it, apply pending migrations or simply do nothing. The method will return the number of applied migrations.
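The core of it, assuming a Flyway version whose migrate() still returns the number of applied migrations, boils down to something like this (url, user and password are extracted from the loaded configuration, which is not shown here):

```scala
import org.flywaydb.core.Flyway

val flyway: Flyway = Flyway.configure().dataSource(url, user, password).load()
val appliedMigrations: Int = flyway.migrate()
```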

Let us continue to dive into the Slick table definitions.

Slick tables

Slick offers several options for approaching the database. For our example we will be using the lifted embedding but if needed Slick also provides the ability to perform plain SQL queries.
For the lifted embedding we have to define our tables in a way Slick can understand. While this can be tricky under certain circumstances, our simple model is straightforward to implement.
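A sketch of the products table definition (names are assumptions):

```scala
import java.util.UUID
import slick.jdbc.PostgresProfile.api._

final class Products(tag: Tag) extends Table[UUID](tag, "products") {
  def id = column[UUID]("id", O.PrimaryKey)

  override def * = id
}

val productsTable = TableQuery[Products]
```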

As you can see above we're using simple data types (not the refined ones) to keep the Slick implementation simpler. However we can also use refined types, at the price of using either the slick-refined library or writing custom column mappers.
Next we’ll implement the table for the translations which will also need some constraints.
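A sketch of the names table with its primary key and foreign key constraints; it refers to the productsTable query from the previous sketch:

```scala
final class Names(tag: Tag) extends Table[(UUID, String, String)](tag, "names") {
  def productId = column[UUID]("product_id")
  def langCode  = column[String]("lang_code")
  def name      = column[String]("name")

  def pk = primaryKey("names_pk", (productId, langCode))
  def productFk =
    foreignKey("names_product_id_fk", productId, productsTable)(_.id, onDelete = ForeignKeyAction.Cascade)

  override def * = (productId, langCode, name)
}

val namesTable = TableQuery[Names]
```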

As you can see the definition of constraints is also pretty simple. Now our repository needs some functions for a more convenient access to the data.
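A compact sketch of such a repository; the function and helper names are assumptions, but the structure follows the description below:

```scala
import java.util.UUID
import scala.concurrent.{ ExecutionContext, Future }
import slick.jdbc.PostgresProfile.api._

final class Repository(db: Database)(implicit ec: ExecutionContext) {
  def loadProduct(id: UUID): Future[Seq[(UUID, String, String)]] =
    db.run(namesTable.filter(_.productId === id).result)

  def loadProducts(): Future[Seq[(UUID, String, String)]] =
    db.run(namesTable.result)

  def saveProduct(p: Product): Future[List[Int]] =
    db.run(DBIO.sequence(saveProductQuery(p) :: saveTranslations(p)).transactionally)

  def updateProduct(p: Product): Future[List[Int]] =
    db.run(
      (namesTable.filter(_.productId === p.id).delete andThen DBIO.sequence(saveTranslations(p))).transactionally
    )

  // The two helpers below create queries which can be composed into bigger actions.
  protected def saveProductQuery(p: Product): DBIO[Int] =
    productsTable += p.id

  protected def saveTranslations(p: Product): List[DBIO[Int]] =
    p.names.toNonEmptyList.toList.map(t => namesTable += ((p.id, t.lang.value, t.name.value)))
}
```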

The last two functions are helpers that enable us to create queries which we can compose. They are used in the saveProduct and updateProduct functions to create a list of queries that are executed in bulk, while the call to transactionally ensures that they will run within a transaction. When updating a product we first delete all existing translations to allow the removal of existing translations via an update. To be able to do so we use the andThen helper from Slick.
The loadProduct function simply returns a list of database rows from the needed join. Therefore we need a function which builds a Product type out of that.

But oh no! The compiler refuses to build it:

It seems we have to provide an instance of Order for our Translation model to make Cats happy. So we have to think of an ordering for our model. A simple approach would be to simply order by the language code. Let's try this:
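A sketch of such an Order instance, typically placed into the companion object of Translation:

```scala
import cats.Order

implicit val order: Order[Translation] = new Order[Translation] {
  def compare(x: Translation, y: Translation): Int =
    x.lang.value.compare(y.lang.value)
}
```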

You might have noticed the explicit call to .value to get the underlying string instance of our refined type. This is needed because the other option (using x.compare(y)) will compile but bless you with stack overflow errors. The reason is probably that the latter is compiled into code calling OrderOps#compare which is recursive.

So far we should have everything in place to make use of our database. Now we need to wire it all together.

Akka-HTTP routes

Defining the routes is pretty simple if you’re used to the Akka-HTTP routing DSL syntax.

We will fill in the details later on. But now for starting the actual server to make use of our routes.

The code will fire up a server using the defined routes and hostname and port from the configuration to start a server. It will run until you press enter and then terminate. Let us now visit the code for each routing endpoint. We will start with the one for returning a single product.
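A sketch of that endpoint; Product.fromDatabase is the helper described above, repo is the Slick repository and an implicit ExecutionContext is assumed to be in scope:

```scala
import akka.http.scaladsl.server.Directives._
import scala.concurrent.Future

path("product" / JavaUUID) { id =>
  get {
    complete {
      for {
        rows    <- repo.loadProduct(id)
        product <- Future(Product.fromDatabase(rows)) // wrapped in a Future to make the types align
      } yield product
    }
  }
}
```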

We load the raw product data from the repository and convert it into a proper product model. But to make the types align we have to wrap the second call in a Future, otherwise we would get a compiler error. We don't need to marshal the response because we are using the akka-http-json library, which provides, for example, an ErrorAccumulatingCirceSupport import that handles this - unless, of course, you do not have Circe codecs defined for your types.

The route for updating a product is also very simple. We’re extracting the product entity via the entity(as[T]) directive from the request body and simply give it to the appropriate repository function. Now onwards to creating a new product.
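Sketches of the update (PUT) and create (POST) endpoints; the JSON unmarshalling is provided by the akka-http-json support mentioned above:

```scala
// Update (add) translations of an existing product.
path("product" / JavaUUID) { _ =>
  put {
    entity(as[Product]) { p =>
      complete(repo.updateProduct(p))
    }
  }
}

// Create a new product.
path("products") {
  post {
    entity(as[Product]) { p =>
      complete(repo.saveProduct(p))
    }
  }
}
```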

As you can see the function is basically the same except that we’re calling a different function from the repository. Last but not least let us take a look at the return all products endpoint.
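A first, naive sketch of that endpoint which does all the work in memory:

```scala
path("products") {
  get {
    complete {
      for {
        rows     <- repo.loadProducts()
        products <- Future(
                      rows.groupBy(_._1).map(e => Product.fromDatabase(e._2)).toList
                    )
      } yield products.flatten
    }
  }
}
```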

This looks more complicated than the other endpoints. So what exactly are we doing here?
Well first we load the raw product data from the repository. Afterwards we convert it into the proper data model or to be more exact into a list of product entities.

The first thing that comes to mind is that we’re performing operations in memory. This is not different from the last time when we converted the data for a single product. Now however we’re talking about all products which may be a lot of data. Another obvious point is that we get a list of Option[Product] which we explicitly flatten at the end.

Maybe we should consider streaming the results. But we still have to group and combine the rows which belong to a single product into a product entity. Can we achieve that with streaming? Well, let’s look at our data flow.
We receive a list of 3 columns from the database in the following format: product id, language code, name. The tricky part being that multiple rows (list entries) can belong to the same product recognizable by the same value for the first column product id. At first we should simplify our problem by ensuring that the list will be sorted by the product id. This is done by adjusting the function loadProducts in the repository.
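A sketch of the adjusted loadProducts, which now sorts by the product id and returns a database stream (a DatabasePublisher) instead of a strict sequence, so that we can consume it with Akka-Streams afterwards:

```scala
import slick.basic.DatabasePublisher

def loadProducts(): DatabasePublisher[(UUID, String, String)] =
  db.stream(namesTable.sortBy(_.productId).result)
```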

Now we can rely on the fact that we have seen all entries for one product if the product id in our list changes. Let’s adjust our code in the endpoint to make use of streaming now. Because Akka-HTTP is based on Akka-Streams we can simply use that.
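A sketch of the streaming endpoint; it follows the description below, the JSON marshalling from akka-http-json is assumed to be in scope and the merging of the names sets (union) is an assumption about the model:

```scala
import akka.http.scaladsl.common.{ EntityStreamingSupport, JsonEntityStreamingSupport }
import akka.stream.scaladsl.Source

implicit val jsonStreaming: JsonEntityStreamingSupport = EntityStreamingSupport.json()

path("products") {
  get {
    val src = Source
      .fromPublisher(repo.loadProducts())
      .map(row => Product.fromDatabase(Seq(row)))
      .collect { case Some(p) => p }                 // a Product per database row
      .groupBy(Int.MaxValue, _.id)                   // one sub-stream per product id
      .fold(Option.empty[Product]) { (acc, p) =>
        acc.fold(Option(p))(a => Option(a.copy(names = a.names.union(p.names))))
      }
      .mergeSubstreams
      .collect { case Some(p) => p }                 // back to a stream of Product
    complete(src)
  }
}
```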

Wow, this may look scary but let’s break it apart piece by piece. At first we need an implicit value which provides streaming support for JSON. Next we create a Source from the database stream. Now we implement the processing logic via the high level streams API. We collect every defined output of our helper function fromDatabase which leads to a stream of Product entities. But we have created way too many (Each product will be created as often as it has translations.). So we group our stream by the product id which creates a new stream for each product id holding only the entities for the specific product. We fold over each of these streams by merging together the list of translations (names). Afterwards we merge the streams back together and run another collect function to simply get a result stream of Product and not of Option[Product]. Last but not least the stream is passed to the complete function which will do the right thing.

Problems with the solution

The solution has two problems:

1. The number of individual streams (and thus products) is limited to Int.MaxValue.
2. The groupBy operator holds the references to these streams in memory opening a possible out of memory issue here.

As the first problem is simply related to the usage of groupBy we may say that we only have one problem: The usage of groupBy. ;-)
For a limited amount of data the proposed solution is perfectly fine so we will leave it as is for now.

Regarding the state of our service we have a working solution, so congratulations and let’s move on to the pure implementation.

Pure implementation

Like in the previous section I will spare you the details of the sbt setup. We will be using the following set of libraries:

1. http4s
2. Doobie (as database layer)
3. Flyway for database migrations (or evolutions)
4. Circe for JSON codecs
5. Refined for using refined types
6. the PostgreSQL JDBC driver

Pure configuration handling

Last time we simply loaded our configuration via the Typesafe Config library, but can't we do a bit better here? The answer is yes, by using the pureconfig library. First we start by implementing the necessary parts of our configuration as data types.
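A sketch of such configuration models, using pureconfig's semi-automatic derivation (the field names are assumptions):

```scala
import pureconfig.ConfigReader
import pureconfig.generic.semiauto.deriveReader

final case class ApiConfig(host: String, port: Int)

object ApiConfig {
  implicit val configReader: ConfigReader[ApiConfig] = deriveReader[ApiConfig]
}

final case class DatabaseConfig(driver: String, url: String, user: String, pass: String)

object DatabaseConfig {
  implicit val configReader: ConfigReader[DatabaseConfig] = deriveReader[DatabaseConfig]
}
```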

As we can see the code is pretty simple. The implicits in the companion objects are needed for pureconfig to actually map from a configuration to our data types. We are using a function called deriveReader, which will derive (like in mathematics) the codec for us (yes, it is similar to a JSON codec, thus the name).

Below is an example of deriving an Order instance using the kittens library. It uses shapeless under the hood and provides automatic and semi-automatic derivation for a lot of type class instances from Cats like Eq, Order, Show, Functor and so on.
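A sketch of what such a derivation could look like, assuming kittens' semiauto API and the Order instances for refined types provided by refined-cats:

```scala
import cats.Order
import cats.derived.semiauto
import eu.timepit.refined.cats._

implicit val orderTranslation: Order[Translation] = semiauto.order[Translation]
```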

Models

Because we have already written our models we just re-use them here. The only thing we change is the semi automatic derivation of the JSON codecs. We just need to import the appropriate circe package and call the derive functions.

Database layer

In general the same applies to the database layer as we have already read in the “impure” section.

Migrations

For the sake of simplicity we will stick to Flyway for our database migrations. However we will wrap the migration code in a different way (read Encapsulate it properly within an IO to defer side effects.). While we’re at it we may just as well write our migration code using the interpreter pattern (it became famous under the name “tagless final” in Scala).

We define a trait which describes the functionality desired by our interpreter and use a higher kinded type parameter to be able to abstract over the type. But now let’s continue with our Flyway interpreter.
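A sketch of both the algebra and its Flyway interpreter; the names mirror those used later in the book (DatabaseMigrator, FlywayDatabaseMigrator), the rest is an assumption:

```scala
import cats.effect.IO
import org.flywaydb.core.Flyway

// The algebra describing what a database migrator can do.
trait DatabaseMigrator[F[_]] {
  def migrate(url: String, user: String, pass: String): F[Int]
}

// An interpreter using Flyway, wired to IO.
final class FlywayDatabaseMigrator extends DatabaseMigrator[IO] {
  override def migrate(url: String, user: String, pass: String): IO[Int] =
    IO {
      val flyway = Flyway.configure().dataSource(url, user, pass).load()
      flyway.migrate() // number of applied migrations in the Flyway version assumed here
    }
}
```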

As we can see, the implementation is pretty simple and we just wrap our code into an IO monad to constrain the effect. Having the migration code settled we can move on to the repository.

If we take a closer look at the method definition of Flyway.migrate, we see this:
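In the Flyway version assumed here it is declared roughly like this:

```java
public int migrate() throws FlywayException
```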

While IO will gladly defer side effects for us it won’t stop enclosed code from throwing exceptions. This is not that great. So what can we do about it?
Having an instance of MonadError in scope we could just use the .attempt function provided by it. But is this enough, or rather, does this provide a sensible solution for us? Let's play around a bit in the REPL.
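A small sketch of what such a REPL session boils down to:

```scala
import cats.effect.IO

val failing: IO[Unit] = IO(throw new RuntimeException("BOOM"))

// failing.unsafeRunSync()         // would throw the RuntimeException
failing.attempt.unsafeRunSync()    // Left(java.lang.RuntimeException: BOOM)
```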

This looks like we just have to use MonadError then. Hurray, we don’t need to change our code in the migrator. As model citizens of the functional programming camp we just defer the responsibility upwards to the calling site.

Doobie

As we already started with using a tagless final approach we might as well continue with it and define a base for our repository.
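A sketch of the repository algebra as described below:

```scala
import fs2.Stream

trait Repository[F[_]] {
  def loadProduct(id: ProductId): F[Seq[(ProductId, LanguageCode, ProductName)]]

  def loadProducts(): Stream[F, (ProductId, LanguageCode, ProductName)]

  def saveProduct(p: Product): F[Int]

  def updateProduct(p: Product): F[Int]
}
```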

There is nothing exciting here except that we feel brave now and try to use proper refined types in our database functions. This is possible due to the usage of the doobie-refined module. To be able to map the UUID data type (and others) we also need to include the doobie-postgresql module. For convenience we are still using ProductId instead of UUID in our definition. In addition we wire the return type of loadProducts to be a fs2.Stream because we want to achieve pure functional streaming here. :-)
So let’s see what a repository using doobie looks like.
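A sketch of the Doobie based implementation; table and column names follow the migration sketched earlier and the doobie-postgres and doobie-refined instances are assumed to be available:

```scala
import cats.effect.Sync
import cats.implicits._
import doobie._
import doobie.implicits._
import doobie.postgres.implicits._
import doobie.refined.implicits._
import fs2.Stream

final class DoobieRepository[F[_]: Sync](tx: Transactor[F]) extends Repository[F] {

  override def loadProduct(id: ProductId): F[Seq[(ProductId, LanguageCode, ProductName)]] =
    sql"""SELECT products.id, names.lang_code, names.name
          FROM products JOIN names ON products.id = names.product_id
          WHERE products.id = $id"""
      .query[(ProductId, LanguageCode, ProductName)]
      .to[Seq]
      .transact(tx)

  override def loadProducts(): Stream[F, (ProductId, LanguageCode, ProductName)] =
    sql"""SELECT products.id, names.lang_code, names.name
          FROM products JOIN names ON products.id = names.product_id
          ORDER BY products.id"""
      .query[(ProductId, LanguageCode, ProductName)]
      .stream
      .transact(tx)

  override def saveProduct(p: Product): F[Int] = {
    val program = for {
      pi <- sql"INSERT INTO products (id) VALUES (${p.id})".update.run
      ni <- p.names.toNonEmptyList.traverse { t =>
              sql"INSERT INTO names (product_id, lang_code, name) VALUES (${p.id}, ${t.lang}, ${t.name})".update.run
            }
    } yield pi + ni.toList.sum
    program.transact(tx)
  }

  override def updateProduct(p: Product): F[Int] = {
    val program = for {
      _ <- sql"DELETE FROM names WHERE product_id = ${p.id}".update.run
      n <- p.names.toNonEmptyList.traverse { t =>
             sql"INSERT INTO names (product_id, lang_code, name) VALUES (${p.id}, ${t.lang}, ${t.name})".update.run
           }
    } yield n.toList.sum
    program.transact(tx)
  }
}
```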

We keep our higher kinded type as abstract as we can but we want it to be able to suspend our side effects. Therefore we require an implicit Sync instance.
If we look at the detailed function definitions further below, the first big difference is that with Doobie you write plain SQL queries. You can do this with Slick too, but with Doobie it is the only way. If you're used to object relational mapping (ORM) or other forms of query compilers then this may seem strange at first. But: "In data processing it seems, all roads eventually lead back to SQL!" ;-)
We won't discuss the benefits or drawbacks here, but in general I also lean towards the approach of using the de facto lingua franca for database access, because it was made for this and so far no query compiler has been able to beat hand-crafted SQL in terms of performance. Another benefit is that if you ask a database guru for help, she will be much better able to help you with plain SQL queries than with some meta query which is compiled into something that you have no idea of.

The loadProduct function simply returns all rows for a single product from the database like its Slick counterpart in the impure variant. The parameter will be correctly interpolated by Doobie therefore we don’t need to worry about SQL injections here. We specify the type of the query, instruct Doobie to transform it into a sequence and give it to the transactor.

Our loadProducts function is equivalent to the first one but it returns the data for all products sorted by product and as a stream using the fs2 library which provides pure functional streaming.

When saving a product we use monadic notation for our program to have it short circuit in the case of failure. Doobie will also put all commands into a database transaction. The function itself will try to create the “master” entry into the products table and save all translations afterwards.

The updateProduct function uses also monadic notation like the saveProduct function we talked about before. The difference is that it first deletes all known translations before saving the given ones.

http4s routes

The routing DSL of http4s differs from the one of Akka-HTTP. Although I like the latter one more it poses no problem to model out a base for our routes.
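A sketch of such a base, using the http4s DSL bound to IO for now:

```scala
import cats.effect.IO
import org.http4s._
import org.http4s.dsl.io._

val service: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "products"             => ??? // return all products
  case POST -> Root / "products"            => ??? // create a product
  case GET -> Root / "product" / UUIDVar(_) => ??? // return a single product
  case PUT -> Root / "product" / UUIDVar(_) => ??? // update the product translations
}
```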

As we can see the DSL is closer to plain Scala syntax and quite easy to read. But before we move on to the details of each route, let's think about how we can model this a bit more abstractly. While it is fine to have our routes bound to IO, it would be better to have more flexibility. We have several options, but for starters we just extract our routes into their own classes like in the following schema.
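The schema could be as simple as this (details still left out):

```scala
import cats.effect.Sync
import org.http4s.HttpRoutes
import org.http4s.dsl.Http4sDsl

final class ProductRoutes[F[_]: Sync](repo: Repository[F]) extends Http4sDsl[F] {
  val routes: HttpRoutes[F] = ???
}

final class ProductsRoutes[F[_]: Sync](repo: Repository[F]) extends Http4sDsl[F] {
  val routes: HttpRoutes[F] = ???
}
```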

So far they only need the repository to access and manipulate data. Now let’s take on the single route implementations.
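A sketch of the ProductRoutes implementation described below:

```scala
import cats.effect.Sync
import cats.implicits._
import org.http4s._
import org.http4s.circe._
import org.http4s.dsl.Http4sDsl

final class ProductRoutes[F[_]: Sync](repo: Repository[F]) extends Http4sDsl[F] {
  implicit val productDecoder: EntityDecoder[F, Product] = jsonOf[F, Product]
  implicit val optionProductEncoder: EntityEncoder[F, Option[Product]] =
    jsonEncoderOf[F, Option[Product]]

  val routes: HttpRoutes[F] = HttpRoutes.of[F] {
    case GET -> Root / "product" / UUIDVar(id) =>
      for {
        rows <- repo.loadProduct(id)
        resp <- Ok(Product.fromDatabase(rows))
      } yield resp

    case req @ PUT -> Root / "product" / UUIDVar(_) =>
      req.as[Product].flatMap(p => repo.updateProduct(p) *> NoContent())
  }
}
```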

First we need to bring JSON codecs in scope for http4s thus the implicit definitions on top of the file. In the route for loading a single product we simply load the database rows which we pipe through our helper function to construct a proper Product and return that.
The update route (via PUT) transforms the request body into a Product and gives that to the update function of the repository. Finally a NoContent response is returned.

Our first take on the routes for products looks pretty complete already. Again we need implicit definitions for our JSON codecs to be able to serialize and de-serialize our entities. The POST route for creating a product is basically the same as the update route from the previous part. We create a Product from the request body, pass it to the save function of the repository and return a 204 No Content response.
The GET route for returning all products calls the appropriate repository function which returns a stream which we map over using our helper function. Afterwards we use collect to convert our stream from Option[Product] to a stream of Product which we pass to the Ok function of http4s.

To solve the problem that this stream emits one partial product per database row, we need to dive into the fs2 API and leverage its power to merge our products back together. So let's see how we do.

Streaming - Take 1

Because we believe ourselves to be clever we pick the simple sledge hammer approach and just run some accumulator on the stream. So what do we need? A helper function and some code changes on the stream (e.g. in the route).
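A sketch of such a helper (the name is hypothetical):

```scala
import cats.implicits._

// Merge a product into the head of the accumulator if both share the same id,
// otherwise prepend it as the new head.
def mergeProducts(acc: List[Product], p: Product): List[Product] =
  acc.headOption.fold(List(p)) { h =>
    if (h.id === p.id)
      h.copy(names = h.names.union(p.names)) :: acc.drop(1)
    else
      p :: acc
  }
```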

So this function will take a list (that may be empty) and a product and will merge the top most element (the head) of the list with the given one. It will return an updated list that either contains an updated head element or a new head. Leaving aside the question of who guarantees that the relevant list element will always be the head, we may use it.

Looks simple, doesn't it? Just a simple fold which uses our accumulator and we should be settled. But life is not that simple…

The compiler complains that we have changed the type of the stream and rightly so. So let’s fix that compiler error.

Let’s take a look again and think about what it means to change a stream of products into a stream of a list of products. It means that we will be building the whole thing in memory! Well if we wanted that we could have skipped streaming at all. So back to the drawing board.

Streaming - Take 2

We need to process our stream of database columns (or products if we use the converter like before) in such a way that all related entities will be grouped into one product and emitted as such. After browsing the documentation of fs2 we stumble upon a function called groupAdjacentBy so we try that one.
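A sketch of the reworked route using groupAdjacentBy; an EntityEncoder for Product is assumed to be in scope so that the resulting stream can be completed directly:

```scala
case GET -> Root / "products" =>
  val prods: Stream[F, Product] = repo
    .loadProducts()
    .groupAdjacentBy(_._1)                                   // group rows sharing the same product id
    .map { case (_, rows) => Product.fromDatabase(rows.toList) }
    .collect { case Some(p) => p }
  Ok(prods)
```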

Okay, this does not look complicated and it even compiles - Hooray! :-)
So let's break it apart piece by piece. The groupAdjacentBy function of fs2 will partition the input into chunks depending on the given function. A Chunk is used internally by fs2 for all kinds of things. You may compare it to a sub-stream from Akka-Streams. However the documentation labels it as: a strict, finite sequence of values that allows index-based random access of elements.
Having our chunks we can map over each one converting it into a list which is then passed to our helper function fromDatabase to create proper products. Last but not least we need to collect our entities to get from an Option[Product] to a stream of Product.

JSON trouble

Now that we have a proper streaming solution we try it out but what do we get when we expect a list of products?

Well, whatever this is, it is not JSON! It might look like it, but it isn’t. However quite often you can see such things in the wild (read in production).

If we think about it then this sounds like a bug in http4s, and indeed we find an issue for it in the project's issue tracker. Because the underlying problem is not as trivial as it first sounds, maybe we should try to work around the issue.
The fs2 API offers concatenation of streams and the nifty intersperse function to insert elements between emitted ones. So let’s give it a try.
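A sketch of the workaround; the products are rendered to JSON explicitly and the array delimiters are emitted by hand (the cats syntax and a circe Encoder for Product are assumed to be in scope):

```scala
import io.circe.syntax._

case GET -> Root / "products" =>
  val prefix: Stream[F, String] = Stream.eval("[".pure[F])
  val suffix: Stream[F, String] = Stream.eval("]".pure[F])
  val ps: Stream[F, String] = repo
    .loadProducts()
    .groupAdjacentBy(_._1)
    .map { case (_, rows) => Product.fromDatabase(rows.toList) }
    .collect { case Some(p) => p }
    .map(_.asJson.noSpaces)
    .intersperse(",")
  Ok(prefix ++ ps ++ suffix)
```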

First we create streams for the first and last JSON that we need to emit. Please note that we cannot simply use a String here but have to lift it into our HKT F. The usage of pure is okay because we simply lift a fixed value. Then we extend our original stream processing by explicitly converting our products to JSON and inserting the delimiter (a comma) manually using the intersperse function. In the end we simply concatenate our streams and return the result.
Our solution is quite simple, with the downside that we need to suppress a warning from the wartremover tool. This is somewhat annoying but can happen. If we remove the annotation, we'll get a compiler error:

So let’s check if we have succeeded:

This looks good, so congratulations: we are done with our routes!

Starting the application

Within our main entry point we simply initialise all needed components and wire them together. We’ll step through each part in this section. The first thing you’ll notice is that we use the IOApp provided by the Cats effect library9.

Yet again we need to suppress a warning from wartremover here. But let's continue with initialising the database connection.
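A sketch of the beginning of the program; the pureconfig API used here (loadConfigOrThrow) and the configuration namespaces are assumptions:

```scala
import cats.effect.IO
import com.typesafe.config.ConfigFactory
import doobie._
import pureconfig.loadConfigOrThrow

val migrator: DatabaseMigrator[IO] = new FlywayDatabaseMigrator

val program = for {
  cfg       <- IO(ConfigFactory.load())
  apiConfig <- IO(loadConfigOrThrow[ApiConfig](cfg, "api"))
  dbConfig  <- IO(loadConfigOrThrow[DatabaseConfig](cfg, "database"))
  _         <- migrator.migrate(dbConfig.url, dbConfig.user, dbConfig.pass)
  tx        = Transactor.fromDriverManager[IO](dbConfig.driver, dbConfig.url, dbConfig.user, dbConfig.pass)
  repo      = new DoobieRepository(tx)
  // ...continued below
} yield ()
```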

We create our database migrator explicitly wired to the IO data type. Now we start with a for comprehension in which we load our configuration via pureconfig yet again within an IO. After successful loading of the configuration we continue with migrating the database. Finally we create the transactor needed by Doobie and the database repository.
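Continuing the sketch, the wiring of the routes and the server might look roughly like this (BlazeServerBuilder and the cats/http4s syntax imports are assumed):

```scala
  // ...continuing inside the for comprehension from above
  productRoutes  = new ProductRoutes(repo)
  productsRoutes = new ProductsRoutes(repo)
  httpApp        = (productRoutes.routes <+> productsRoutes.routes).orNotFound
  server         = BlazeServerBuilder[IO].bindHttp(apiConfig.port, apiConfig.host).withHttpApp(httpApp)
  _             <- server.serve.compile.drain
```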

Here we create our routes via the classes, combine them (via <+> operator) and create the http4s app explicitly using an IO thus wiring our abstract routes to IO. The service will - like the impure one - run until you press enter. But it won’t run yet. ;-)

If you remember playing around with MonadError then you’ll recognize the attempt here. We attempt to run our program and execute possible side effects via the unsafeRunSync method from Cats effect. But to provide a proper return type for the IOApp we need to evaluate the return value which is either an error or a proper exit code. In case of an error we print it out on the console (no fancy logging here) and explicitly set an error code as the return value.

As it seems we are done with our pure service! Or are we? Let's see what we need to add to both services if we want to test them.

In the domain of strong static typing (not necessarily functional) you might hear phrases like “It compiles, therefore it must be correct, thus we don't need tests!”. While there is a point that certain kinds of tests can be omitted in favour of strong static typing, such stances overlook that even a correctly typed program may produce the wrong output. The other extreme (coming from dynamically typed land) is to substitute typing with testing - which is even worse. Remember that testing is usually a probabilistic approach and cannot guarantee the absence of bugs. If you have ever refactored a large code base in both paradigms then you'll very likely come to esteem a good type system.

However, we need tests, so let’s write some. But before let us think a bit about what kinds of tests we need. :-)
Our service must read and create data in the JSON format. This format should be fixed and changes to it should raise some red flag because: Hey, we just broke our API! Furthermore we want to unleash the power of ScalaCheck1 to benefit from property based testing. But even when we’re not using that we can still use it to generate test data for us.
Besides the regular unit tests there should be integration tests if a service is written. We can test a lot of things on the unit test side but in the end the integration of all our moving parts is what matters and often (usually on the not so pure side of things) you would have to trick (read mock) a lot to test things in isolation.

Testing the impure service

We will start with writing some data generators using the ScalaCheck library.

Generators

ScalaCheck already provides several generators for primitives but for our data models we have to do some more plumbing. Let’s start with generating a language code.
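A sketch, assuming the refinement of literals via eu.timepit.refined.auto is available:

```scala
import eu.timepit.refined.auto._
import org.scalacheck.Gen

// A small fixed set of valid ISO 639-1 codes is enough for our purposes.
val genLanguageCode: Gen[LanguageCode] =
  Gen.oneOf[LanguageCode]("de", "en", "es", "fr", "ja", "pt")
```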

Using the Gen.oneOf helper from the library the code becomes dead simple. Generating a UUID is nothing special either.
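A sketch of such a generator:

```scala
import java.util.UUID

// Produce a fresh random UUID (version 4) for every sample.
// (Recent ScalaCheck versions also ship a ready-made Gen.uuid.)
val genUuid: Gen[UUID] = Gen.delay(Gen.const(UUID.randomUUID))
```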

You might be tempted to use Gen.const here but please don’t because that one will be memorized and thus never change. Another option is using a list of randomly generated UUID values from which we then chose one. That would be sufficient for generators which only generate a single product but if we want to generate lists of them we would have duplicate ids sooner than later.
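A sketch of the product name generator with a refined fallback value (the names are assumptions):

```scala
import eu.timepit.refined.api.RefType
import eu.timepit.refined.auto._

val DefaultProductName: ProductName = "I am a product name!"

val genProductName: Gen[ProductName] = Gen
  .nonEmptyListOf(Gen.alphaNumChar)
  .map(cs => RefType.applyRef[ProductName](cs.mkString).getOrElse(DefaultProductName))
```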

So what do we have here? We want to generate a non empty string (because that is a requirement for our ProductName) but we also want to return a properly typed entity. First we let ScalaCheck generate a non empty list of random characters which we give to a utility function of refined. However we need a fallback value in case the validation done by refined fails. Therefore we defined a general default product name beforehand.

Now that we have generators for language codes and product names we can write a generator for our Translation type.
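A sketch of the Translation generator and its Arbitrary instance, building on the generators above:

```scala
import org.scalacheck.Arbitrary

val genTranslation: Gen[Translation] = for {
  c <- genLanguageCode
  n <- genProductName
} yield Translation(lang = c, name = n)

implicit val arbitraryTranslation: Arbitrary[Translation] = Arbitrary(genTranslation)
```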

As we can see the code is also quite simple. Additionally we create an implicit arbitrary value which will be used automatically by the forAll test helper if it is in scope. To be able to generate a Product we will need to provide a non empty list of translations.

The first generator will create a non-empty list of translations but will be typed as a simple List. Therefore we create a second generator which uses the fromList helper of the non-empty list from Cats. Because that helper returns an Option (read: is a safe function) we need to fall back to using a simple of function at the end.
With all these in place we can finally create our Product instances.

The code is basically the same as for Translation - the arbitrary implicit included.

Unit Tests

To avoid repeating the construction of our unit test classes we will implement a base class for tests which is quite simple.
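A sketch, assuming ScalaTest's WordSpec style and the ScalaCheck integration module:

```scala
import org.scalatest.{ Matchers, WordSpec }
import org.scalatestplus.scalacheck.ScalaCheckPropertyChecks

abstract class BaseSpec extends WordSpec with Matchers with ScalaCheckPropertyChecks
```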

Feel free to use other test styles - after all, ScalaTest offers a lot of them. I tend to lean towards the more verbose ones like WordSpec. Maybe that is because I spent a lot of time with RSpec in the Ruby world. ;-)
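A sketch of the test described below, assuming an Arbitrary[Product] instance like the one outlined in the generators section is in scope:

```scala
final class ProductSpec extends BaseSpec {
  "Product#fromDatabase" when {
    "given rows describing a valid product" must {
      "return the corresponding Product" in {
        forAll("product") { p: Product =>
          val rows = p.names.toNonEmptyList.toList.map(t => (p.id, t.lang.value, t.name.value))
          Product.fromDatabase(rows) should contain(p)
        }
      }
    }
  }
}
```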

The code above is a very simple test of our helper function fromDatabase which works in the following way:

1. The forAll will generate a lot of Product entities using the generator.
2. From each entity a list of “rows” is constructed like they would appear in the database.
3. These constructed rows are given to the fromDatabase function.
4. The returned Option must then contain the generated value.

Because we construct the input for the function from a valid generated instance the function must always return a valid output.

Now let’s continue with testing our JSON codec for Product.

Testing a JSON codec

We need our JSON codec to provide several guarantees:

1. It must fail to decode invalid JSON input format (read garbage).
2. It must fail to decode valid JSON input format with invalid data (read wrong semantics).
3. It must succeed to decode completely valid input.
4. It must encode JSON which contains all fields included in the model.
5. It must be able to decode JSON that it has encoded itself.

The first one is pretty simple and to be honest: You don’t have to write a test for this because that should be guaranteed by the Circe library. Things look a bit different for very simple JSON representations though (read when encoding to numbers or strings).
I’ve seen people arguing about point 5 and there may be applications for it but implementing encoders and decoders in a non-reversible way will make your life way more complicated.
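A sketch of the first test; decode comes from io.circe.parser:

```scala
import io.circe.parser.decode

"decoding invalid JSON input" must {
  "fail" in {
    forAll("garbage") { garbage: String =>
      decode[Product](garbage).isLeft should be(true)
    }
  }
}
```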

There is not much to say about the test above: It will generate a lot of random strings which will be passed to the decoder which must fail.

This test will generate random string instances for id which are all wrong because the id must be a UUID and not an arbitrary string. Also the instances for names will mostly (but maybe not always) be wrong because there might be empty strings or even an empty list. So the decoder is given a valid JSON format but invalid values, therefore it must fail.

In this case we manually construct a valid JSON input using values from a generated valid Product entity. This is passed to the decoder and the decoder must not only succeed but return an instance equal to the generated one.

The test will generate again a lot of entities and we construct a JSON string from each. We then expect the string to include several field names and their correctly encoded values. You might ask why we do not check for more things like: Are these fields the only ones within the JSON string? Well, this would be more cumbersome to test and a JSON containing more fields than we specify won’t matter for the decoder because it will just ignore them.

Here we encode a generated entity and pass the result to the decoder, which must return an entity equal to the original one.

More tests

So this is basically what we do for models. Because we have more than one we will have to write tests for each of the others. I will spare you the JSON tests for Translation but that one also has a helper function called fromUnsafe so let’s take a look at the function.
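A sketch of what fromUnsafe might look like (the exact signature is an assumption):

```scala
import eu.timepit.refined.api.RefType

object Translation {
  // Try to build a valid Translation from unchecked input values.
  def fromUnsafe(code: String, name: String): Option[Translation] =
    for {
      c <- RefType.applyRef[LanguageCode](code).toOption
      n <- RefType.applyRef[ProductName](name).toOption
    } yield Translation(lang = c, name = n)
}
```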

This function simply tries to create a valid Translation entity from unsafe input values using the helpers provided by refined. As we can see it is a total function (read is safe to use). To cover all corner cases we must test it with safe and unsafe input.

Here we generate two random strings which we explicitly check to be invalid using the whenever helper. Finally the function must return an empty Option, i.e. None, for such values.

The test for valid input is very simple because we simply use the values from our automatically generated valid instances. :-)

So far we have no tests for our Repository class which handles all the database work. Neither have we tests for our routes. We have several options for testing here, but before we can test either of them we have to do some refactoring. For starters we should move our routes out of our main application into separate classes to be able to test them more easily.

Yes, we should. There are of course limits and pros and cons to that but in general this makes sense. Also this has nothing to do with being “impure” or “pure” but with clean structure.

Some refactoring

Moving the routes into separate classes poses no big problem: we simply create a ProductRoutes and a ProductsRoutes class which will hold the appropriate routes. As a result our somewhat messy main application code becomes more readable.

We simply create our instances from our routing classes and construct our global routes directly from them. This is good but if we want to test the routes in isolation we still have the problem that they are hard-wired to our Repository class which is implemented via Slick. Several options exist to handle this:

1. Use an in-memory test database with according configuration.
2. Abstract further and use a trait instead of the concrete repository implementation.
3. Write integration tests which will require a working database.

Using option 1 is tempting but think about it some more. While the benefit is that we can use our actual implementation and just have to fire up an in-memory database (for example h2), there are also some drawbacks:

1. You have to handle evolutions for the in-memory database.
2. Your evolutions have to be completely portable SQL (read ANSI SQL). Otherwise you’ll have to write each of your evolutions scripts two times (one for production, one for testing).
3. Your code has to be database agnostic. This sounds easier than it is. Even the tools you’re using may use database specific features under the hood.
4. Several features are simply not implemented in some databases. Think of things like cascading deletion via foreign keys.

Taking option 2 is a valid choice but it will result in more code. Also you must pay close attention to the “test repository” implementation to avoid introducing bugs there. Going for the most simple approach is usually feasible. Think of a simple test repository implementation that will just return hard coded values or values passed to it via constructor.

However we will go with option 3 in this case. It has the drawback that you’ll have to provide a real database environment (and maybe more) for testing. But it is as close to production as you can get. Also you will need these either way to test your actual repository implementation, so let’s get going.

Integration Tests

First we need to configure our test database because we do not want to accidentally wipe a production database. For our case we leave everything as is and just change the database name.

Another thing we should do is provide a test configuration for our logging framework. We use the logback library and Slick will produce a lot of logging output on the DEBUG level so we should fix that. It is nice to have logging if you need it but it also clutters up your log files. We create a file logback-test.xml in the directory src/it/resources which should look like this:

Due to the nature of integration tests we want to use production or “production like” settings and environment. But really starting our application or service for each test will be quite cumbersome so we should provide a base class for our tests. In this class we will start our service, migrate our database and provide an opportunity to shut it down properly after testing.

Because we do not want to get into trouble when running an Akka-HTTP on the same port, we first create a helper function which will determine a free port number.
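A sketch of such a helper:

```scala
import java.net.ServerSocket

// Ask the operating system for a free port and release it again right away.
def findAvailablePort(): Int = {
  val serverSocket = new ServerSocket(0)
  val freePort     = serverSocket.getLocalPort
  serverSocket.setReuseAddress(true) // allow the port to be bound again immediately
  serverSocket.close()
  freePort
}
```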

The code is quite simple and very useful for such cases. Please note that it is important to use setReuseAddress because otherwise the port that was found will be blocked for a certain amount of time. But now let us continue with our base test class.

As you can see we are using the Akka-Testkit to initialise an actor system. This is useful because there are several helpers available which you might need. We configure the actor system with our free port using the loaded configuration as fallback. Next we globally create an actor materializer which is needed by Akka-HTTP and Akka-Streams. Also we create a globally available Flyway instance to make cleaning and migrating the database easier.
The base class also implements the beforeAll and afterAll methods which will be run before and after all tests. They are used to initially migrate the database and to shut down the actor system properly in the end.

Testing the repository

Now that we have our parts in place we can write an integration test for our repository implementation.

First we need to do some things globally for the test scope.

We create one Repository instance for all our tests here. The downside is that if one test crashes it, then the others will be affected too. On the other hand we avoid running into database connection limits and severe code limbo to ensure closing a repository connection after each test no matter the result.
Also we clean and migrate the database before each test and clean it again after each test. This ensures having a clean environment.

Loading a non existing product must not produce any result and is simple to test if our database is empty. Testing the loading of a real product is not that much more complicated. We use the ScalaCheck generators to create one, save it and load it again. The loaded product must of course be equal to the saved one.

Testing the loading of all products when none exist is as trivial as the one for a non-existing single product. For the case of multiple products we generate a list of them which we save. Afterwards we load them and use the same transformation logic as in the routes to be able to construct proper Product instances. One thing you might notice is the explicit sorting, which is due to the fact that we want to ensure that both product lists are sorted before comparing them.

Here we test the saving which in the first case should simply write the appropriate data into the database. If the product already exists however this should not happen. Our database constraints will ensure that this does not happen (or so we hope ;-)). Slick will throw an exception which we catch in the test code using the recover method from Future to return a zero indicating no affected database rows. In the end we test for this zero and also check if the originally saved product has not been changed.

For testing an update we generate two samples, save one to the database, change the id of the other to the one from the first and execute an update. This update should proceed without problems and the data in the database must have been changed correctly.
If the product does not exist then we use the same recover technique like in the saveProduct test.

Congratulations, we have made a check mark on our first integration test using a real database using randomly generated data!

Testing the routes

Regarding the route testing there are several options, as always. For this one we will define some use cases and then develop some more helper code which will allow us to fire up our routes, do real HTTP requests and then check the database and the responses. We build upon our BaseSpec class and call it BaseUseCaseSpec. In it we will do some more things, like define a global base URL which can be used from the tests to make correct requests. Additionally we will write a small actor which simply starts an Akka-HTTP server.

As you can see, the actor is quite simple. Upon receiving the Start command it will initialise the routes and start an Akka-HTTP server. To be able to do this it needs the database repository and an actor materializer which are passed via the constructor. Regarding the BaseUseCaseSpec we will concentrate on the code that differs from the base class.

Here we create our base URL and our database repository while we use the beforeAll function to initialise our actor before any test is run. Please note that this has the same drawback like sharing the repository across tests: If a test crashes your service then the others will be affected too.
But let’s write a test for our first use case: loading a product!

Again we will use the beforeEach and afterEach helpers to clean up our database. Now let’s take a look at a test for loading a product that does not exist.

Maybe a bit verbose but we do a real HTTP request here and check the response status code. So how does it look if we run it?

That is not cool. What happened? Let’s take a look at the code.

Well, no wonder - we are simply returning the output of our fromDatabase helper function which may be empty. This will result in an HTTP status code 200 with an empty body. If you don’t believe me just fire up the service and do a request by hand via curl or httpie.

Luckily for us Akka-HTTP has us covered with the rejectEmptyResponse directive which we can use.
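A sketch of the adjusted route:

```scala
path("product" / JavaUUID) { id =>
  get {
    rejectEmptyResponse {
      complete {
        for {
          rows    <- repo.loadProduct(id)
          product <- Future(Product.fromDatabase(rows))
        } yield product
      }
    }
  }
}
```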

Cool, it seems we’re set with this one. So onward to testing to load an existing product via the API.

Here we simply save our generated product into the database before executing our request. We also check if the product has actually been written, to be on the safe side. Additionally we check if the decoded response body matches the product we expect. In contrast to our first test this one works instantly, so not all hope is lost for our coding skills. ;-)
We’ll continue with the use case of saving (or creating) a product via the API. This time we will make the code snippets shorter.

Here we test posting garbage instead of valid JSON to the endpoint which must result in a “400 Bad Request” returned to us.

This one does a bit more, but basically we try to save (or rather create) an already existing product. Therefore the constraints of our database should produce an error which in turn must return a "500 Internal Server Error" to us. Additionally we verify that the existing product in the database was not changed.

Last but not least we are testing to save a not already existing valid product into the database. Again we check for the expected status code of “200 OK” and verify that the product saved into the database is the one we sent to the API. Let’s move on to testing the loading of all products now.

Our first test case is loading all products if no product exists so we expect an empty list here and the appropriate status code.

This one is also straight forward: We save our generated list of products to the database and query the API which must return a “200 OK” status code and a correct list of products in JSON format. Looks like we have one more use case to tackle: Updating a product via the API.

First we test with garbage JSON in the request. Before doing the request we actually create a product to avoid getting an error caused by a possibly missing product. Afterwards we check for the expected “400 Bad Request” status code and verify that our product has not been updated.

Next we test updating an existing product using valid JSON. We check the status code and if the product has been correctly updated within the database.

Finally we test updating a non-existing product, which should produce the expected status code and not save anything into the database. Wow, it seems we are done for good with our impure implementation. Well, except for some benchmarking, but let's save that for later.
If we look onto our test code coverage (which is a metric that you should use) then things look pretty good. We are missing some parts but in general we should have things covered.

Testing the pure service

We will skip the explanation of the ScalaCheck generators because they only differ slightly from the ones used in the impure part.

Unit Tests

The model tests are omitted here because they are basically the same as in the impure section. If you are interested in them just look at the source code. In contrast to the impure part we will now write unit tests for our routes. Meaning we will be able to test our routing logic without spinning up a database.

Testing our routes

To be able to test our routes we will have to implement a TestRepository first which we will use instead of the concrete implementation which is wired to a database.
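A sketch of such a TestRepository, staying abstract via an Applicative constraint; the details follow the description below:

```scala
import cats.Applicative
import cats.implicits._
import fs2.Stream

final class TestRepository[F[_]: Applicative](data: Seq[Product]) extends Repository[F] {

  override def loadProduct(id: ProductId): F[Seq[(ProductId, LanguageCode, ProductName)]] = {
    val rows: Seq[(ProductId, LanguageCode, ProductName)] =
      data.find(_.id === id).toList.flatMap { p =>
        p.names.toNonEmptyList.toList.map(t => (p.id, t.lang, t.name))
      }
    rows.pure[F]
  }

  override def loadProducts(): Stream[F, (ProductId, LanguageCode, ProductName)] =
    Stream.empty // we will get back to this one later

  override def saveProduct(p: Product): F[Int] =
    data.find(_.id === p.id).fold(0)(_ => 1).pure[F]

  override def updateProduct(p: Product): F[Int] =
    data.find(_.id === p.id).fold(0)(_ => 1).pure[F]
}
```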

As you can see we try to stay abstract (using our HKT F[_] here) and we have basically left out the implementation of loadProducts because it will just return an empty stream. We will get back to it later on. Aside from that the class can be initialised with a (potentially empty) list of Product entities which will be used as a “database”. The save and update functions won’t change any data, they will just return a 0 or a 1 depending on the product being present in the seed data list.
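A sketch of the test described below:

```scala
import cats.effect.IO
import org.http4s._
import org.http4s.implicits._

"ProductRoutes (GET /product/{id})" when {
  "the product does not exist" must {
    "return 404 Not Found with an empty body" in {
      Uri.fromString(s"/product/${java.util.UUID.randomUUID().toString}") match {
        case Left(_) => fail("Could not create a valid URI!")
        case Right(uri) =>
          val service  = new ProductRoutes(new TestRepository[IO](Seq.empty)).routes.orNotFound
          val response = service.run(Request[IO](method = Method.GET, uri = uri)).unsafeRunSync()
          response.status should be(Status.NotFound)
          response.body.compile.toVector.unsafeRunSync() should be(empty)
      }
    }
  }
}
```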

Above you can see the test for querying a non existing product which must return an empty response using a “404 Not Found” status code. First we try to create a valid URI from our generated ProductId. If that succeeds we create a small service wrapper for our routes in which we inject our empty TestRepository. Finally we create a response using this service and a request that we construct. Because we are in the land of IO we have to actually execute it (via unsafeRunSync) to get any results back. Finally we validate the status code and the response body.

It seems that we just pass an empty response if we do not find the product. This is not nice, so let’s fix this. The culprit is the following line in our ProductRoutes file:

Okay, this looks easy. Let’s try this one:

Oh no, the compiler complains:

Right, before we had an Option[Product] here for which we created an implicit JSON encoder. So if we create one for the Product itself then we should be fine.
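A sketch of the fix: an encoder for Product itself plus a route that answers with 404 when the helper returns None:

```scala
implicit val productEncoder: EntityEncoder[F, Product] = jsonEncoderOf[F, Product]

case GET -> Root / "product" / UUIDVar(id) =>
  for {
    rows <- repo.loadProduct(id)
    resp <- Product.fromDatabase(rows).fold(NotFound())(p => Ok(p))
  } yield resp
```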

Now back to our test:

Great! Seems like we are doing fine, so let’s continue. Next in line is testing a query for an existing product.

This time we generate a whole product, again try to create a valid URI and continue as before. But this time we inject our TestRepository containing a list with our generated product. In the end we test our expected status code and the response body must contain our product. For the last part to work we must have an implicit EntityDecoder in scope.

Here we are trying to update a product which doesn't have to exist, because we send complete garbage JSON with the request, which should result in a "400 Bad Request" status code. However if we run our test we get an exception instead:

So let’s take a deep breath and look at our code:

As we can see we do no error handling at all. So maybe we can rewrite this a little bit.
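A sketch of the rewritten route; as admitted below, only the InvalidMessageBodyFailure case is handled (the cats error handling syntax is assumed to be in scope):

```scala
case req @ PUT -> Root / "product" / UUIDVar(_) =>
  req
    .as[Product]
    .flatMap(p => repo.updateProduct(p) *> NoContent())
    .recoverWith {
      case InvalidMessageBodyFailure(_, _) => BadRequest()
    }
```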

Now we explicitly handle any error which occurs when decoding the request entity. But in fact: I am lying to you. We only handle the invalid message body failure here. On the other hand it is enough to make our test happy. :-)

Onwards to our next test cases in which we use a valid JSON payload for our request.

We expect a “404 Not Found” if we try to update a valid product which does not exist. But what do we get in the tests?

Well not exactly what we planned for but it is our own fault. We used the *> operator which ignores the value from the previous operation. So we need to fix that.
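A sketch of the fixed route which inspects the number of affected rows:

```scala
case req @ PUT -> Root / "product" / UUIDVar(_) =>
  req
    .as[Product]
    .flatMap { p =>
      repo.updateProduct(p).flatMap {
        case 0 => NotFound()
        case _ => NoContent()
      }
    }
    .recoverWith {
      case InvalidMessageBodyFailure(_, _) => BadRequest()
    }
```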

We rely on the return value of our update function which contains the number of affected database rows. If it is zero then nothing has been done implying that the product was not found. Otherwise we return our “204 No Content” response as before. Still we miss one last test for our product routes.

Basically this is the same test as before with the exception that we now give our routes a properly seeded test repository. Great, we have tested our ProductRoutes and without having to spin up a database! But we still have work to do, so let’s move on to testing the ProductsRoutes implementation. Before we do that we adapt the code for creating a product using our gained knowledge from our update test.

Now to our tests, we will start with sending garbage JSON via the POST request.

There is nothing special here, the test is same as for the ProductRoutes except for the changed URI and HTTP method. Also the code is a bit simpler because we do not need to generate a dynamic request URI like before.

Saving a product using valid JSON payload should succeed and in fact it does because of the code we have in our TestRepository instance. If you remember we use the following code for saveProduct:
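As sketched earlier:

```scala
override def saveProduct(p: Product): F[Int] =
  data.find(_.id === p.id).fold(0)(_ => 1).pure[F]
```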

This code will return a 0 if the product we try to save does not exist in the seed data set and only a 1 if it can be found within aforementioned set. This code clearly doesn’t make any sense except for our testing. This way we can ensure the behaviour of the save function without having to create a new Repository instance with hard coded behaviour. :-)

We use the empty repository this time to ensure that the saveProduct function will return a zero, triggering the desired logic in our endpoint. Almost done, so let’s check the endpoint for returning all products.

We simply expect an empty list if no products exist. It is as simple as that and works right out of the box. Last but not least we need to test the return of existing products. But before we do this let’s take a look at our TestRepository implementation.

Uh oh, that does not bode well! So we will need to fix that first. Because we were wise to choose fs2 as our streaming library, the solution is as simple as this.
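A sketch of the fixed loadProducts in the TestRepository:

```scala
override def loadProducts(): Stream[F, (ProductId, LanguageCode, ProductName)] =
  Stream
    .emits(data.flatMap(p => p.names.toNonEmptyList.toList.map(t => (p.id, t.lang, t.name))))
    .covary[F]
```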

Now we can write our last test.

To decode the response correctly an implicit EntityDecoder of the appropriate type is needed in scope. But the rest of the test should look pretty familiar to you by now.

It seems the only parts left to test are the FlywayDatabaseMigrator and our DoobieRepository classes. Testing them will require a running database so we are leaving the cosy world of unit tests behind and venture forth into integration test land. But fear not, we already have some - albeit impure - experience here.

Integration Tests

As usual we start up by implementing a base class that we can use to provide common settings and functions across our tests.

You can see that we keep it simple here and only load the database configuration and ensure that it has indeed been loaded correctly in the beforeAll function.

Testing the FlywayDatabaseMigrator

To ensure the basic behaviour of our FlywayDatabaseMigrator we write a simple test.

Within this test we construct an invalid database configuration and expect that the call to migrate throws an exception. If you remember, we had this issue already and chose not to handle any exceptions but let the calling site do this - for example via a MonadError instance.

The other two tests are also quite simple, we just expect it to return either zero or the number of applied migrations depending on the state of the database. It goes without saying that we of course use the beforeEach and afterEach helpers within the test to prepare and clean our database properly.

Last but not least we take a look at testing our actual repository implementation which uses Doobie. To avoid trouble we need to define a globally available ContextShift in our test which is as simple as this:
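Assuming cats-effect 2 this boils down to:

```scala
import cats.effect.{ ContextShift, IO }
import scala.concurrent.ExecutionContext

implicit val contextShift: ContextShift[IO] = IO.contextShift(ExecutionContext.global)
```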

Now we can start writing our tests.

Here we simply test that the loadProduct function returns an empty list if the requested product does not exist in the database.

From now on we’ll omit the transactor and repository creation from the code examples. As you can see a generated product is saved to the database and loaded again and verified in the end.

Testing that loadProducts returns an empty stream if no products exist is as simple as the code above. :-)

In contrast the test code for checking the return of existing products is a bit more involved. But let’s step through it together. First we save the list of generated products to the database using the traverse function provided by Cats. In impure land we used Future.sequence here if you remember - but now we want to stay pure. ;-)
Next we call our loadProducts function and apply a part of the logic from our ProductsRoutes to it, namely we construct a proper stream of products which we turn into a list via compile and list in the end. Finally we check that the list is not empty and equal to our generated list.
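The skeleton of that test looks roughly like this - genProducts, repo and the row-to-product conversion (which is the same logic as in the routes) are assumptions and omitted details:

import cats.implicits._

val result = for {
  _      <- genProducts.traverse(p => repo.saveProduct(p)) // the pure sibling of Future.sequence
  loaded <- repo.loadProducts().compile.toList             // compile the fs2 stream into a List
} yield loaded
// afterwards: rebuild the products from the loaded rows and compare them with genProducts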

The code for testing saveProduct is nearly identical to the loadProduct test as you can see. We simply check additionally that the function returns the number of affected database rows.

Updating a non-existing product must return a zero and save nothing to the database, which is what we test above.

Finally we test updating a concrete product by generating two of them, saving the first into the database and running an update using the second with the id from the first.

Wow, it seems we are finished! Congratulations, we can now check mark the point “write a pure http service in Scala” on our list. :-)

Now that we have our implementations in place we can start comparing them. We will start with implementing some benchmarks to test the performance of both implementations.

There are several applications available to perform load tests and benchmarks. Regarding the latter the Apache JMeter1 project is a good starting point and it is quite easy to get something running. As the documentation says: for the real measurements you should only use the command line application and use the GUI just to create and test your benchmark.

We’ll skip a long introduction and tutorial for JMeter because you can find a lot within the documentation and there are lots of tutorials online.

Our environment

Within the book repository you’ll find a folder named jmeter which contains several things:

1. Several files ending with .jmx like Pure-Create-Products.jmx and so on.
2. A CSV file named product-ids.csv containing 100,000 valid product IDs.
3. A file named benchmarks.md.

The .jmx files are the configuration files for JMeter which can be used to run the benchmarks. They are hopefully named in an understandable way and are expected to be run in the following order:

1. Create products
2. Load products
3. Update products

The file product-ids.csv is expected in the /tmp folder, so you’ll have to copy it there or adjust the benchmark configurations. Finally the file benchmarks.md holds detailed information about the benchmark runs (each one was done three times in a row).

System environment

Service and testing software (Apache JMeter) were run on different workstations connected via a 100 MBit/s network connection.

Service workstation

CPU: Core i5-9600K, 6 cores, 3.7 GHz
RAM: 32 GB
HDD: 2x Samsung SSD 860 PRO 512GB, SATA
OS: FreeBSD 12 (HT disabled)
JDK: 11.0.4+11-2
DB: PostgreSQL 11.3

Client workstation

CPU: AMD Ryzen Threadripper 2950X
RAM: 32 GB
HDD: 2x Samsung SSD 970 PRO 512GB, M.2
OS: FreeBSD 12 (HT disabled)
JDK: 11.0.4+11-2

Apache JMeter version 5.1.1 was used to run the benchmarks and if not noted otherwise 10 threads with a 10 second ramp-up time were used for each benchmark.

Comparison

So let’s start with comparing the results. As mentioned more details can be found in the file benchmarks.md. We’ll stick to using the average of the metrics across all three benchmark runs. The following abbreviations will be used in the tables and legends.

AVG
The average response time in milliseconds.
MED
The median response time in milliseconds.
90%
90 percent of all requests were handled within this response time (in milliseconds) or less.
95%
95 percent of all requests were handled within this response time (in milliseconds) or less.
99%
99 percent of all requests were handled within this response time (in milliseconds) or less.
MIN
The minimum response time in milliseconds.
MAX
The maximum response time in milliseconds.
ERR
The error rate in percent.
R/S
The number of requests per second that could be handled.
MEM
The maximum amount of memory used by the service during the benchmark in MB.
LD
The average system load on the service machine during the benchmark.

Create 100,000 products

Metric Impure Pure
AVG 98 12
MED 95 11
90% 129 15
95% 143 18
99% 172 30
MIN 53 5
MAX 1288 675
ERR 0% 0%
R/S 100.56 765.33
MEM 1158 1308
LD 16 9

Wow, I honestly have to say that I didn’t expect that. The world has come to believe that the impure approach might be dirty but is definitely always faster. Well, it seems we’re about to correct that. I don’t know about you but I’m totally fine with that. ;-)
But now let’s break it apart piece by piece. The first thing that catches the eye is that the pure service seems to be about seven times faster than the impure one! The average of 100 requests per second on the impure side stands against an average of 765 requests per second on the pure side, and the response time metrics support that. Regarding memory usage we can see that the pure service needed about 13% more memory than the impure one. Living in times in which memory is cheap I consider this a small price to pay for a significant performance boost.
Last but not least I found it very interesting that the average system load was much higher (nearly twice as high) with the impure implementation. While it is okay to make use of your resources, a lower utilisation leaves more “breathing room” for other tasks (operating system, database, etc.).

Load 100,000 products

Metric Impure Pure
AVG 7 7
MED 8 7
90% 10 9
95% 11 10
99% 14 19
MIN 4 2
MAX 347 118
ERR 0% 0%
R/S 1162.20 1248.63
MEM 1449 1538
LD 13 8

The loading benchmark provides a more balanced picture. While the pure service is still slightly ahead (about 7% faster), it uses about 6% more memory than the impure one. Overall both implementations deliver nearly the same results, but again the pure one causes significantly lower system load, like in the first benchmark.

Update 100,000 products

Metric Impure Pure
AVG 78 12
MED 75 11
90% 104 16
95% 115 20
99% 140 34
MIN 42 5
MAX 798 707
ERR 0% 0%
R/S 125.66 765.26
MEM 1176 1279
LD 16 8

Updating existing products results in nearly the same picture as the “create products” benchmark. Interestingly the impure service performs about 20% better on an update than on a create. I have no idea why but it caught my eye. The other metrics are, as said, nearly identical to the first benchmark. The pure service uses a bit more memory (around 8%) but is around 6 times faster than the impure one while causing only half of the system load.

For our last benchmark we load all existing products via the GET /products route. Because this causes a lot of load we reduce the number of threads in our JMeter configuration from 10 to 2 and only use 50 iterations. But enough talk, here are the numbers.

Metric Impure Pure
AVG 19061 14496
MED 19007 14468
90% 19524 14689
95% 19875 14775
99% 20360 14992
MIN 17848 14008
MAX 21315 16115
ERR 0% 0%
R/M 6.30 8.30
MEM 7889 1190
LD 5 4

As you can see the difference in system load is way smaller this time. While it is still 25%, a load of 4 versus 5 on a machine like the test machine makes almost no difference. However the pure service is again faster (about 25%). Looking at the memory footprint we can see that the impure service uses nearly seven times as much memory as the pure one.
But before we burst into cheers about that let’s remember what we did in the impure implementation! Yes, we used the groupBy operator of Akka which keeps a lot of stuff in memory, so the fault for this is ours. ;-)
Because I’m not in the mood to mess with Akka until we’re on the memory safe side here, we’ll just ignore the memory footprint for this benchmark and simply note that the pure service is again faster than the impure one.

Summary

If you look around on the internet (and also in the literature) you’ll find a lot of sources stating that “functional programming is slow” or “functional programming does not perform” and so on. Well, I would argue that we have proven that this is not the case! Although we cannot generalise our findings because we only took a look at a specific niche within a specific environment, I think this is pretty exciting!

Not only do you benefit from having code that can more easily be reasoned about, but you also gain better testing possibilities and in the end your application even performs better! :-)

We do pay some price (an increased memory footprint) because there is no free lunch, but it seems to be worth it to work in a clean and pure fashion. So next time someone argues in favour of some dirty impure monstrosity because it is faster, just remember and tell ‘em that this might not be true!

Before we celebrate ourselves we have to tackle one missing point: We have to document our API.

No, it is not! Leaving the issue of properly documented code aside here, we will concentrate on documenting the API. The de facto standard nowadays seems to be Swagger1, and to keep things simple we will stick to it. Besides that it won’t hurt to have some documentation in text form (a small file could be enough) which explains the quirks of our API. The bigger the project, the earlier you may encounter flaws in the logic which might not be changeable for whatever reasons. ;-)

The lay of the land

Swagger provides a bit of tooling and there are likely lots of projects trying to bring it to your favourite web framework or toolkit. In many cases it might be a good idea to bundle the Swagger UI2 with your service and make it available on a specific path. Depending on your needs and environment you’ll want to protect that path via authentication and make it configurable so it can be turned off in production.

Having the UI in place, you must decide which path you want to go down:

1. Write a swagger.yml file which describes your API, create JSON from it and deliver that as an asset.
2. Create the JSON dynamically via some library at runtime.

The first option has the benefit that your description will be more or less set in stone and you avoid possible performance impacts and other quirks at runtime. However you must think of a way to test that your service actually fulfils the description (read: specification) that you deliver. The most common tool for writing your API description is probably the Swagger Editor3.

Taking the second path will result in a description which reflects your actual code. However, none of the tools I’ve seen so far has fulfilled that promise to 100 percent. You’ll very likely have to make extensive use of annotations to make your API description usable, which also results in a decoupling of code and description. You might also encounter “funny” deduced data types like Future1 and so on - which are annoying and confusing for the user. For those favouring the impure approach there is the swagger-akka-http library4.

So is there another way? The answer is yes! We can describe our API using static typing, have the compiler check it and deduce server and client code from it.

Using types to describe an API

Describing your API using types is not bleeding edge academic research stuff as you might have guessed. Several libraries for it already exist! :-)

I personally first stumbled upon such things some years ago when seeing the talk “Using object algebras to design embedded DSLs” (Curry On 2016)5. The related project is the endpoints6 library. However there are other projects too, including the rho library7 from the http4s project. Another one is tapir8 which we will be using in our example.
In the Haskell camp there is the beautiful Servant library9.

First I wanted to use something which allows us to generate a http4s server, which already narrowed down the options a bit. It should also be able to generate API documentation (which nowadays means Swagger/OpenAPI support). Furthermore it should support not only http4s but more options. So after playing around a bit I decided to use the tapir library.

A pure implementation using tapir

We basically clone our pure folder into the tapir folder and start to apply our changes to the already pure implementation. But first some theory.

Basics

The tapir library assumes that you describe your API using the Endpoint type which is defined more concretely as follows: Endpoint[I, E, O, S]

• The type I defines the input given into the endpoint.
• The type E defines the error (or errors) which may be returned by the endpoint.
• The type O defines the possible output of the endpoint.
• The type S specifies the type of streams which are used for input and output.

An endpoint can have attributes like name or description which will be used in the generated documentation. You can also map input and output parameters into case classes.
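To make the type parameters a bit more tangible, here is a tiny endpoint which is not part of our service (depending on the tapir version the import is tapir._ or sttp.tapir._):

import tapir._

// No input (Unit), a String as error output, a String as regular output and no streaming (Nothing).
val ping: Endpoint[Unit, String, String, Nothing] =
  endpoint.get
    .in("ping")
    .errorOut(stringBody)
    .out(stringBody)
    .name("Ping")
    .description("Returns a simple text to signal that the service is alive.")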

Regarding the encoding and decoding of data we will need our Circe codecs, but additionally some schema definitions required by tapir. Concretely we will need to define implicit SchemaFor[T] instances for each of our models. We will start with the Translation model.

As you can see this is quite straightforward and not very complicated. In the future we might be able to derive such things but for now we have to define them. The schema is defined as a product type which is further described by the “object info” type containing a name, the field names and their schemas and finally a list of field names which are required to construct the type.

This is basically the same thing as before except that we rely on the existing schema definition of Translation and use that here. You might notice that we explicitly define our NonEmptySet as an SArray here. This is because we want a list-like representation on the JSON side.

Can we do better? Yes, we can! :-) To gain more flexibility we can provide a generic schema for our NonEmptySet type.
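A sketch of such a generic instance, assuming that SchemaFor exposes the underlying Schema and that SArray wraps the element schema (as described above):

implicit def schemaForNes[A](implicit a: SchemaFor[A]): SchemaFor[NonEmptySet[A]] =
  new SchemaFor[NonEmptySet[A]] {
    // A non-empty set is documented as a plain JSON array of the element schema.
    override val schema: Schema = Schema.SArray(a.schema)
  }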

Here we define that a NonEmptySet will be an array using the schema of whatever type it contains. Now we rewrite our previous schema definition as follows.

It is not necessarily shorter but we have gained some more flexibility and can reuse our schema for the non-empty set in several places.

Product routes

Having the basics settled we can try to write our first endpoint. Let’s refactor our product routes. We will define our endpoints in the companion object of the class.
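A sketch of how such an endpoint definition can look - it assumes that the Circe codecs plus a tapir codec for our refined ProductId are in scope:

val getProduct: Endpoint[ProductId, StatusCode, Product, Nothing] =
  endpoint.get
    .in("product" / path[ProductId]("id"))
    .errorOut(statusCode)
    .out(jsonBody[Product])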

So what do we have here? First we specify the HTTP method by using the get function of the endpoint. Now we need to define our path and inputs. We do this using the in helper which accepts path fragments separated by slashes and also a path[T] helper which allows us to extract a type directly from a path fragment. This way we define our endpoint path product/{id} in which id must match our ProductId type.
To be more flexible about our returned status codes we use the errorOut function which in our case just receives a status code (indicated by passing statusCode to it).
Finally we define that the endpoint will return the JSON representation of a product by using the out and jsonBody helpers. All of this is reflected in the actual type signature of our endpoint which reads Endpoint[ProductId, StatusCode, Product, Nothing]. If we remember the basics then we know that this amounts to an endpoint which takes a ProductId as input, produces a StatusCode as possible error and returns a Product upon success.

Our endpoint alone won’t do us any good so we need an actual server side implementation of it. While we could have used the serverLogic function to directly attach our logic onto the endpoint definition this would have nailed us down to a concrete server implementation.
So we’re going to implement it in the ProductRoutes class.
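Sketched here with hypothetical names - repository.loadProduct and the helper Product.fromDatabase stand in for the code we already wrote in the pure implementation:

private val getRoute: HttpRoutes[F] = getProduct.toRoutes { id =>
  for {
    rows <- repository.loadProduct(id)
    resp = Product
      .fromDatabase(rows)
      .fold(StatusCode.NotFound.asLeft[Product])(p => p.asRight[StatusCode]) // the "not found" handling
  } yield resp
}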

We use the toRoutes helper of tapir which expects a function with the actual logic. As you can see the implementation is straightforward and only differs slightly from our original one. Currently there is no other way to handle our “not found” case than using the fold at the end. But if you remember, we did the same thing in the original code.

That was not that difficult, for something which some people like to talk about as “academic fantasies from fairy tale land”. ;-)

Onward to our next route: updating an existing product. First we need to define our endpoint.
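Again a sketch under the same assumptions as before (the status code constants are taken from the model classes shipped with tapir):

val updateProduct: Endpoint[(ProductId, Product), StatusCode, Unit, Nothing] =
  endpoint.put
    .in("product" / path[ProductId]("id"))
    .in(jsonBody[Product].description("The updated product data which should be saved."))
    .errorOut(statusCode)
    .out(statusCode(StatusCode.NoContent)) // we prefer 204 over the default 200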

This is only slightly more code than our first endpoint. We use the put method this time and the same logic as before to extract our product id from the path. But we also need our product input which we expect as JSON in the request body. The jsonBody function is additionally extended with the description helper here which will provide data for the generated documentation. We’ll come to generated API docs later on.
We also restrict our errors to status codes via the errorOut(statusCode) directive. Last but not least we have to define our output. By default a status code of “200 OK” will be used, which is why we override it in the out function with the “204 No Content” we prefer.

The implementation is again very similar to our original one. Except that it is even a bit simpler. This is because we do not have to worry about wrongly encoded input (Remember our handleErrorWith directive?). The tapir library will by default return a “400 Bad Request” status if any provided input cannot be decoded.
Within the pattern match we use the status codes provided by tapir and map the returned values to the correct type which is an Either[StatusCode, Unit] because of our endpoint type. This results from our endpoint type signature being Endpoint[(ProductId, Product), StatusCode, Unit, Nothing]. This translates to having an input of both a ProductId and a Product and returning a StatusCode in the error case or Unit upon success.

Now we only need to combine both routes and we’re set.

So, let us run our tests and see what happens.

Well, not quite what we expected, or is it? To be honest I personally expected more errors, but maybe I’ve just been doing this stuff for too long. ;-)
If we look into the error we find that, because the encoding problems are now handled for us, the response not only contains a status code of “400 Bad Request” but also an error message: “Invalid value for: body”. Because I’m fine with that I just adjust the test and let it be good. :-)

Pretty awesome, we already have half of our endpoints done. So let’s move on to the remaining ones and finally see how to generate documentation and also a client for our API.

Products routes

As we can see the endpoint definition for creating a product does not differ from the one that was used to update one. Except that we have a different path here and do not need to extract our ProductId from the URL path.

The implementation is again pretty simple. In case the saveProduct function returns a zero we output a “500 Internal Server Error” because the product has not been saved into the database.

Finally we have our streaming endpoint left, so let’s see how we can do this via tapir.

The first thing we can see is that we use a def instead of a val this time. This is caused by some necessities on the Scala side: if we want to abstract over a type parameter then we need to use a def here.
We have also set the last type parameter not to Nothing but to something concrete this time. This is because we actually want to stream something. ;-)
It is a bit annoying that we have to define it twice (once for the output type and once for the “stream” type). Much nicer would be something like Endpoint[I, E, Byte, Stream[F, _]] but currently this is not the way we can do it.
So we again specify the HTTP method (via get) and the path (which is “products”). The errorOut helper once again restricts our error output to the status code. Finally we set the output of the endpoint by declaring a streaming entity (via streamBody).

But it is sufficient and we also directly specify the returned media type to be JSON.

Again our implementation is quite the same compared to the original one. Except that in the end we convert our stream of String into a stream of Byte using the utf8Encode helper from the fs2 library.
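The conversion itself looks roughly like this, assuming products is a stream of Product values and Circe is used for the JSON encoding:

import io.circe.syntax._

// Render each product as JSON, wrap everything into a JSON array and encode it as UTF-8 bytes.
val responseBytes: Stream[F, Byte] =
  (Stream.emit("[") ++ products.map(_.asJson.noSpaces).intersperse(",") ++ Stream.emit("]"))
    .through(fs2.text.utf8Encode)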

Damn, so close. But let’s keep calm and think - or ask around on the internet, which is totally fine. Actually it is all in the compiler error message.

We first convert our response explicitly into the right side of an Either because the left side is used for the error case. Afterwards we provide the needed Unit => ... function in which we lift our response value via pure into the context of F.
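Put into code this is a one-liner (getProducts and responseBytes are the names assumed above):

// Lift the byte stream into the Right side of the Either and into the effect F via pure.
private val getAllRoute: HttpRoutes[F] =
  getProducts.toRoutes(_ => responseBytes.asRight[StatusCode].pure[F])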
So let’s go crazy and simply combine our routes like in the previous part and run the test via testOnly *.ProductsRoutesTest on the sbt console.

Yes! Very nice, it seems like we are done with implementing our routes via tapir endpoints.

Documentation via OpenAPI

So, we can now look at documenting our API via OpenAPI using the tooling provided by tapir. But first we should actually modify our main application entry point to provide the documentation for us.
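A sketch of that wiring - the endpoint names are the ones assumed in this chapter and the import paths differ between tapir versions (tapir.* versus sttp.tapir.*):

import tapir.docs.openapi._
import tapir.openapi.circe.yaml._
import tapir.swagger.http4s.SwaggerHttp4s

// Collect the endpoints into one OpenAPI description...
val docs = List(getProduct, updateProduct).toOpenAPI("Pure Tapir API", "1.0.0") // plus the remaining endpoints
// ...and serve it (including the Swagger UI) under the /docs path.
val docsRoutes = new SwaggerHttp4s(docs.toYaml).routes[IO]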

We use the toOpenAPI helper provided by tapir which generates a class structure describing our API from a list of given endpoints. Additionally we use the SwaggerHttp4s helper which includes the Swagger UI for simple documentation browsing. All of it is made available under the /docs path. So calling http://localhost:57344/docs with your browser should open the UI and the correct documentation.
But while browsing around there we can see that although it provides our models and endpoints, the documentation could be better. So what can we do about it?
The answer is simple: Use the helpers provided by tapir to add additional information to our endpoints.

Providing example data

Besides functions like description or name tapir also provides example which will result in having concrete examples in the documentation. To use this we must construct example values of the needed type. A Product example could look like this.

We can now use it in our product endpoint description.
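Roughly like this - exampleProduct is the example value constructed above:

val getProduct: Endpoint[ProductId, StatusCode, Product, Nothing] =
  endpoint.get
    .in("product" / path[ProductId]("id").description("The UUID of the product."))
    .errorOut(statusCode)
    .out(
      jsonBody[Product]
        .description("The product with all of its translations.")
        .example(exampleProduct)
    )
    .description("Returns the product specified by the UUID given in the URL path.")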

As you can see we make use of description and example here. Also the path parameter id is described that way.

Here we also add a description to the simple status code output, explaining explicitly that no content will be returned upon success. While the 204 status code should be enough to say this, you can never be sure enough. ;-)
We’ll skip the create endpoint because it looks nearly the same as the update endpoint. Instead let’s take a look at our streaming endpoint.

This time we need to provide our example as a string because of the nature (read: type) of our endpoint. We use the non-empty list of examples that we created (you can look it up in ProductsRoutes.scala) and convert it into a JSON string.
If we now visit our Swagger endpoint we’ll see nice examples included in the documentation. Pretty cool, especially because many people will look at the examples rather than at the specification. This might be because (too) many of us have seen an API not fulfilling its specification. This shouldn’t happen in our case because we’re deriving it, yeah! But nonetheless examples are very nice to have. :-)

If we take a closer look at our model descriptions then we might see that we could do better in some cases. I’m thinking of our ID fields being simple strings instead of UUIDs and the language code which is also defined as a simple string. So let’s get going and clean that up!

Refining the generated documentation

Looking at the intermediate model for our API documentation (see the OpenAPI class structure in the tapir library) we realise that modifying such a deeply nested case class structure might result in some really messy code. I mean, we have probably all been there at some point in our lives as developers. ;-)

Can we avoid that? Yes, we can! Confronted with big and nested structures we should pick a tool from our functional programming toolbox which is called optics10.
Don’t be scared by the name or all that mathematics, there exist some usable libraries for it. In our case we will pick Monocle11 which provides profunctor optics for Scala. The basic idea of optics is to provide pure functional abstractions for the manipulation of immutable objects. Because of their pure nature they are composable which results in code which is more flexible and can more easily be reasoned about.
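If you have never used a lens before, here is a small self-contained example (not from our service) of what Monocle gives us:

import monocle.Lens
import monocle.macros.GenLens

final case class Street(name: String)
final case class Address(street: Street)
final case class Person(name: String, address: Address)

// A lens focuses on exactly one field; composed lenses focus on nested fields.
val addressL: Lens[Person, Address] = GenLens[Person](_.address)
val streetL: Lens[Address, Street]  = GenLens[Address](_.street)
val nameL: Lens[Street, String]     = GenLens[Street](_.name)

val jane  = Person("Jane", Address(Street("Main Street")))
// Update the deeply nested field without spelling out all the copy calls.
val moved = (addressL composeLens streetL composeLens nameL).set("Broadway")(jane)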

Now that we have that cleared let’s make a plan what we actually want to do.

1. Adjust the URL parameter descriptions of {id} to mark them as kind of UUID.
2. Adjust the id attribute of our Product model to mark it as kind of UUID.
3. Adjust the lang attribute of our Translation model to mark it as an ISO-639-1 language code.

If we take a look at the code within the tapir library, we see that the Schema and SchemaFor code which is used for codecs does not yet support a dedicated UUID type. There is a detour for it using Schema.SString.

Now we look a bit further into the OpenAPI code and find that it supports several interesting attributes which we might use. For now we will stick to the attribute pattern of the Schema class in that part. It is intended to hold a pattern (read regular expression) which describes the format of a string type.
Okay, so we need to define some (or rather, exactly two) regular expressions. But hey, wait, we already have one for our language code! :-)

Well, using some shapeless12 magic we might use an implicit Witness which should be provided by our refined type.

The code above is a rough idea so don’t count on it. We’ll see later on if I was right or did suffer from the hallucination of actually understanding what I am doing. ;-)
In the hope that this is settled we need one additional regular expression for a UUID. These are defined in RFC-412213 and ignoring the special edge case of a “NIL UUID” we come up with the following solution.
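One possible pattern for version 4 UUIDs (lower case letters only); the exact expression used in the book repository may differ:

// xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx with y being one of 8, 9, a or b
val uuidV4Regex: String =
  "^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"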

According to the OpenAPI specification we are nailed down to Javascript regular expressions (read the ECMA 262 regular expression dialect14). Oh why cruel fate? Well, let us deal with that when we have our other puzzle pieces in order.

Before we dive right in let’s play around a bit to get used to this fancy optics thing.

We defined our first Lens via the GenLens macro. It is supposed to give us the path definitions from an OpenAPI object. In the last part we use the compose functionality to query a specific item from the paths, which we address via a string because the paths are stored in a ListMap with string keys.

Oh no, the compiler yells at us! Some investigation leads to the conclusion that there is a type class instance missing for ListMap. So we build one. Luckily for us this is practically identical to the one for Map.
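A sketch of such an instance, modelled after the one Monocle ships for Map:

import monocle.Lens
import monocle.function.At
import scala.collection.immutable.ListMap

implicit def listMapAt[K, V]: At[ListMap[K, V], K, Option[V]] =
  new At[ListMap[K, V], K, Option[V]] {
    // Looking up a key yields an Option[V]; setting None removes the key, Some(v) updates it.
    override def at(k: K): Lens[ListMap[K, V], Option[V]] =
      Lens[ListMap[K, V], Option[V]](_.get(k))(ov => m => ov.fold(m - k)(v => m.updated(k, v)))
  }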

And now our code compiles, hooray! :-)
While we’re at it we also provide an instance for Index on ListMap.

Okay, back to work. Thinking a bit about our problem we realise that we need a couple of lenses which we can compose to modify the needed parts of the documentation structure.

Well this is quite a lot but let’s break it apart piece by piece. In general we use the GenLens macro of Monocle to create the lenses for us.
First we create a lens which returns the defined path items which is a ListMap. We later use the at function of the type class to grab a concrete entry from it. This “concrete” entry will be a PathItem that contains more information. Next are some lenses which will return an Operation from the aforementioned PathItem. Depending on the type of the operation (GET, POST, etc.) we return the appropriate entry.
Now we need to grab the parameters used for the endpoints which can be collected either directly from a PathItem or from an Operation. These are both lists of the type Parameter. Okay, I’m lying straight to your face here: in fact they are ReferenceOr[Parameter] which means an Either[Reference, Parameter]. But in our use case we only have them as parameters so we’ll ignore this for now.
Last but not least we need to grab the Schema of a parameter and from that one the pattern field. The schema describing the parameter is also an Either[Reference, Schema] which we will ignore too in this case. Now we can play around with our lenses and various combinations.

This can be confusing at first look but it is actually very powerful and clean for modifying deeply nested structures. The current API is a bit verbose, but there is hope.

So what we can take away from this is that we can update our pattern field within all parameters using the following code. Before you ask: We can update all parameters because we have only one. ;-)

Seems we can (as good as) check off the first point on our list. That leaves us with modifying the pattern field of the actual model schemas. To get to these we have to define some more lenses.

The power of lenses allows us to traverse all of our structures and modify all affected models at once. So we will use the functionality provided by the Each type class here. Let’s try something like the following code.

Looks good so far but it results in a compiler error.

This looks like an error caused by a binary incompatible cats version. But hey, this time the compiler is lying to us. Maybe you have guessed it already: we’re missing another type class instance, this time one for Each which we need to traverse our ListMap structure. So let’s write one! :-)
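Sketched here by traversing the entries as a list and rebuilding the ListMap afterwards:

import cats.Applicative
import cats.implicits._
import monocle.Traversal
import monocle.function.Each
import scala.collection.immutable.ListMap

implicit def listMapEach[K, V]: Each[ListMap[K, V], V] =
  new Each[ListMap[K, V], V] {
    override def each: Traversal[ListMap[K, V], V] =
      new Traversal[ListMap[K, V], V] {
        // Apply f to every value, keeping the keys and the insertion order intact.
        def modifyF[F[_]: Applicative](f: V => F[V])(s: ListMap[K, V]): F[ListMap[K, V]] =
          s.toList.traverse { case (k, v) => f(v).map(w => (k, w)) }.map(kvs => ListMap(kvs: _*))
      }
  }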

As noted above these things will hopefully be in the next Monocle release together with a much nicer API. :-D
The information we need to modify is within the components field of the generated documentation. While we could traverse all the stuff (paths, operations) these only hold references which are of no use for us. So let’s update our Product model.

This is quite a lot but we actually only instruct our optics how to traverse down the structure. In general we need to take care of our possible field types here. Due to the nature of the generated structure having a lot of Option and Either fields we need way more boilerplate here. But as mentioned it is really not that complicated. We compose our lenses via composeLens but may need things like composeOptional to compose on a defined Option[T], which we sometimes have to duplicate when there are nested occurrences of these. The same instructions can be used to zoom in on the right side of an Either.

The modification for the Translation model looks quite the same, so we could probably make a function out of it which takes some parameters and returns the updated structure.

Oh yeah, right, we thought about extracting the regular expression directly from the refined type to avoid code duplication. So let’s try to use the function defined earlier on like this: val langRegex = extractRegEx[LanguageCode].

Okay, seems like I really don’t understand what I’m doing. ;-) Luckily for us the Scala community has nice, smart and helpful people in it. The solution is to write a type class which will support us in extracting the desired parameter.
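A sketch of such a type class; it assumes the refined predicate in question is MatchesRegex, which is the case for our language code:

import eu.timepit.refined.api.Refined
import eu.timepit.refined.string.MatchesRegex
import shapeless.Witness

trait RegexFor[T] {
  def regex: String
}

object RegexFor {
  // The Witness gives us access to the singleton string the predicate was built from.
  implicit def regexForMatchesRegex[S <: String](implicit ws: Witness.Aux[S]): RegexFor[String Refined MatchesRegex[S]] =
    new RegexFor[String Refined MatchesRegex[S]] {
      override def regex: String = ws.value
    }
}

Extracting the language code pattern then boils down to something like implicitly[RegexFor[LanguageCode]].regex.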

This allows us to have the desired effect in just this small piece of code.

Now if we take a look at our API documentation it looks better, although we can see that the URL parameter pattern information is not used. But as mentioned before there is a lot of work going on on the tapir side currently and this will get fixed as well.

However don’t forget about optics because their usage goes far beyond what we have done here.

Epilogue

I personally hope that you had some fun reading this, have learned something and got some ideas about what is possible in the realm of programming. For me it was fun to write this book and I also learned a lot while writing it. I want to thank all the people that got back to me with questions, hints and ideas about what might be missing.

The next time some person points out that all this “academic functional nonsense” is for naught please remember some of the things you read here. We have seen several things that imply that we can use pure functional programming in our daily work:

1. better understandable code
2. better testable code
3. better performing code (Yes!)
4. better type safety using refined types
5. deriving boilerplate code from our statically typed models and logic
6. easier working with deeply nested structures using optics

There really is no excuse to stick to impure and messy stuff that is hard to maintain. But please also remember that functional programming is not done for its own sake. It is another and, in my opinion, better way to deliver value. In our field of work this (value) usually means some working software / program / application / whatever.

Also please do not consider this book a complete trove of knowledge. Several things might already be outdated, some topics have only been touched upon superficially and other things are missing completely.

So, people of the baud and the electron: Go out there, venture forth, cross boundaries, use the knowledge which is there and last but not least - Have fun!

Foreword

1https://akka.io

2http://slick.lightbend.com

3https://typelevel.org/cats/

4https://typelevel.org/cats-effect/

5https://http4s.org/

6https://tpolecat.github.io/doobie/

7https://github.com/fthomas/refined

8https://fs2.io/

9https://github.com/softwaremill/tapir

10https://github.com/julien-truffaut/Monocle

11https://creativecommons.org/licenses/by-sa/4.0/

12https://creativecommons.org/publicdomain/zero/1.0/

Maybe there is another way

Impure implementation

1http://slick.lightbend.com/doc/3.3.1/database.html

Pure implementation

1https://pureconfig.github.io/

2https://github.com/typelevel/kittens

3https://typelevel.org/cats-effect/typeclasses/sync.html

4http://slick.lightbend.com/doc/3.3.1/sql.html

5https://blog.acolyer.org/2019/07/03/one-sql-to-rule-them-all/

6https://httpie.org/

7https://github.com/http4s/http4s/issues/2371

8https://www.wartremover.org/

9https://typelevel.org/cats-effect/datatypes/ioapp.html

1http://www.scalacheck.org/

2http://www.scalatest.org/user_guide/selecting_a_style

3https://rspec.info/

4https://github.com/scalatest/scalatest/issues/1370

1https://jmeter.apache.org/

2https://github.com/http4s/http4s/issues/2855

1https://swagger.io/

2https://swagger.io/tools/swagger-ui/

3https://swagger.io/tools/swagger-editor/

4https://github.com/swagger-akka-http/swagger-akka-http

6http://julienrf.github.io/endpoints/

7https://github.com/http4s/rho

8https://github.com/softwaremill/tapir

9https://www.servant.dev/

10https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/poptics.pdf

11https://github.com/julien-truffaut/Monocle

12https://github.com/milessabin/shapeless

13https://tools.ietf.org/html/rfc4122

14https://www.ecma-international.org/ecma-262/