Adding benchmarks

Now that we have both implementations in place we can start comparing them. We begin by setting up some benchmarks to measure their performance.

There are several applications available for performing load tests and benchmarks; the Apache JMeter project is a good starting point, and it is quite easy to get something running. As the documentation says: use the GUI to create and test your benchmark plans, but for the real runs use only the command line application.

We’ll skip a long JMeter introduction and tutorial here, because the documentation covers a lot and there are plenty of tutorials online.
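
For reference, a test plan can be executed in non-GUI mode from the command line like this (the name of the results file is just an example):

    jmeter -n -t Pure-Create-Products.jmx -l pure-create-products.jtl

Here -n selects non-GUI mode, -t points to the test plan and -l names the file the results are logged to.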

Our environment

Within the book repository you’ll find a folder named jmeter, which contains several things:

  1. Several files ending in .jmx, such as Pure-Create-Products.jmx.
  2. A CSV file named product-ids.csv containing 100,000 valid product IDs.
  3. A file named benchmarks.md.

The .jmx files are the configuration files (test plans) for JMeter which are used to run the benchmarks. Their names are hopefully self-explanatory, and they are expected to be run in the following order:

  1. Create products
  2. Load products
  3. Update products
  4. Load all products

The file product-ids.csv is expected in the /tmp folder, so you’ll have to copy it there or adjust the benchmark configurations (see below). Finally, the file benchmarks.md holds detailed information about the benchmark runs (each one was done three times in a row).
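
Copying the file into place is a one-liner, for example cp jmeter/product-ids.csv /tmp/ from the repository root (the exact path depends on where you checked out the repository). The test plans read the product IDs from that file, typically via JMeter’s CSV Data Set Config element, so if you prefer another location you only have to change the configured filename in each plan.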

System environment

The service and the testing software (Apache JMeter) were run on different workstations connected via a 100 Mbit/s network connection.

Service workstation

CPU Core i5-9600K, 6 cores, 3.7 GHz
RAM 32 GB
HDD 2x Samsung SSD 860 PRO 512GB, SATA
OS FreeBSD 12 (HT disabled)
JDK 11.0.4+11-2
DB PostgreSQL 11.3

Client workstation

CPU AMD Ryzen Threadripper 2950X
RAM 32 GB
HDD 2x Samsung SSD 970 PRO 512GB, M.2
OS FreeBSD 12 (HT disabled)
JDK 11.0.4+11-2

Apache JMeter version 5.1.1 was used to run the benchmarks, and unless noted otherwise each benchmark used 10 threads with a 10-second ramp-up time, i.e. JMeter spread the start of the 10 threads evenly across the first 10 seconds.

Comparison

So let’s start comparing the results. As mentioned, more details can be found in the file benchmarks.md. We’ll stick to using the average of each metric across all three benchmark runs. The following abbreviations are used in the tables and legends.

AVG
The average response time in milliseconds.
MED
The median response time in milliseconds.
90%
90 percent of all requests were handled within this response time in milliseconds or less.
95%
95 percent of all requests were handled within this response time in milliseconds or less.
99%
99 percent of all requests were handled within this response time in milliseconds or less.
MIN
The minimum response time in milliseconds.
MAX
The maximum response time in milliseconds.
ERR
The error rate in percent.
R/S
The number of requests per second that could be handled.
R/M
The number of requests per minute that could be handled (used in the bulk load benchmark).
MEM
The maximum amount of memory used by the service during the benchmark in MB.
LD
The average system load on the service machine during the benchmark.
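
To make these definitions concrete, here is a small Scala sketch (not part of the service code) that computes the response time metrics from a list of raw samples. It uses the simple nearest-rank method, which may differ slightly from JMeter’s exact percentile calculation:

    // Compute the table metrics from raw response times in milliseconds.
    // Nearest-rank percentiles; assumes a non-empty sample list.
    def metrics(samples: Vector[Long]): Map[String, Double] = {
      val sorted = samples.sorted
      def percentile(p: Double): Double = {
        val rank = math.ceil(p / 100.0 * sorted.size).toInt
        sorted(math.max(rank - 1, 0)).toDouble
      }
      Map(
        "AVG" -> sorted.sum.toDouble / sorted.size,
        "MED" -> percentile(50),
        "90%" -> percentile(90),
        "95%" -> percentile(95),
        "99%" -> percentile(99),
        "MIN" -> sorted.head.toDouble,
        "MAX" -> sorted.last.toDouble
      )
    }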

Create 100,000 products

Metric   Impure    Pure
AVG          98      12
MED          95      11
90%         129      15
95%         143      18
99%         172      30
MIN          53       5
MAX        1288     675
ERR          0%      0%
R/S      100.56  765.33
MEM        1158    1308
LD           16       9

Wow, I honestly have to say that I didn’t expect that. Conventional wisdom has it that the impure approach might be dirty but is definitely always faster. Well, it seems we’re about to correct that. I don’t know about you, but I’m totally fine with that. ;-)
But now let’s break it down piece by piece. The first thing that catches the eye is that the pure service is about seven times faster than the impure one: an average of 765 requests per second on the pure side stands against an average of 100 requests per second on the impure side (765.33 / 100.56 ≈ 7.6), and the response time metrics support that. Regarding memory usage, the pure service needed about 13% more memory than the impure one. Living in times in which memory is cheap, I consider this a small price to pay for a significant performance boost.
Last but not least, I found it very interesting that the average system load was nearly twice as high for the impure implementation. While it is okay to make use of your resources, a lower utilisation leaves more “breathing room” for other tasks (operating system, database, etc.).

Load 100,000 products

Metric    Impure     Pure
AVG            7        7
MED            8        7
90%           10        9
95%           11       10
99%           14       19
MIN            4        2
MAX          347      118
ERR           0%       0%
R/S      1162.20  1248.63
MEM         1449     1538
LD            13        8

The loading benchmark paints a more balanced picture. While the pure service is still slightly ahead (about 7% faster), it uses about 6% more memory than the impure one. Overall both implementations deliver nearly the same results, but as in the first benchmark the pure one causes significantly lower system load.

Update 100,000 products

Metric   Impure    Pure
AVG          78      12
MED          75      11
90%         104      16
95%         115      20
99%         140      34
MIN          42       5
MAX         798     707
ERR          0%      0%
R/S      125.66  765.26
MEM        1176    1279
LD           16       8

Updating existing products paints nearly the same picture as the “create products” benchmark. Interestingly, the impure service performs about 20% better on an update than on a create; I have no idea why, but it caught my eye. The other metrics are, as said, nearly identical to the first benchmark: the pure service uses a bit more memory (around 8%) but is around six times faster than the impure one while causing only half the system load.

Bulk load all 100,000 products

For our last benchmark we load all existing products via the GET /products route. Because this causes a lot of load, we reduce the number of threads in our JMeter configuration from 10 to 2 and use only 50 iterations. Since a single request now takes several seconds, throughput is given in requests per minute (R/M) this time. But enough talk, here are the numbers.

Metric   Impure    Pure
AVG       19061   14496
MED       19007   14468
90%       19524   14689
95%       19875   14775
99%       20360   14992
MIN       17848   14008
MAX       21315   16115
ERR          0%      0%
R/M        6.30    8.30
MEM        7889    1190
LD            5       4

As you can see, the difference in system load is way smaller this time. While it is still 25%, a load of 4 versus 5 on a machine like the test machine makes almost no difference. However, the pure service is again faster (about 25%). Looking at the memory footprint we can see that the impure service uses nearly seven times as much memory as the pure one.
But before we burst into cheers about that, let’s remember what we did in the impure implementation! Yes, we used the groupBy operator of Akka Streams, which keeps a lot of state in memory, so the fault for this is ours. ;-)
Because I’m not in the mood to mess with Akka until we’re on the memory-safe side here, we’ll just ignore the memory footprint for this benchmark and simply note that the pure service is once again faster than the impure one.
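
To illustrate why groupBy can be so memory hungry, here is a stripped-down sketch of such a pipeline. It is illustrative only, not the book’s actual code; it assumes Akka 2.6-style APIs (where the implicit ActorSystem provides the materializer), and the case class and data are made up:

    import akka.actor.ActorSystem
    import akka.stream.scaladsl.{Sink, Source}

    // A groupBy-based pipeline: every key gets its own substream, and the
    // fold buffers all rows of a group in memory before emitting anything.
    object GroupByMemorySketch extends App {
      implicit val system: ActorSystem = ActorSystem("sketch")
      import system.dispatcher

      final case class Row(productId: Long, lang: String, name: String)

      // Stand-in for product translation rows streamed from the database.
      val rows = Source(1L to 100000L).mapConcat { id =>
        List("de", "en", "es", "fr").map(l => Row(id, l, s"name-$id-$l"))
      }

      rows
        .groupBy(maxSubstreams = 100000, _.productId) // one substream per product
        .fold(Vector.empty[Row])(_ :+ _)              // buffers a whole group
        .mergeSubstreams
        .runWith(Sink.ignore)
        .onComplete(_ => system.terminate())
    }

With up to 100,000 substreams plus a growing Vector per product, a blown-up memory footprint like the one we measured is not surprising.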

Summary

If you look around on the internet (and also in the literature) you’ll find a lot of sources stating that “functional programming is slow” or “functional programming does not perform” and so on. Well, I would argue that we have shown that this is not necessarily the case! Although we cannot generalise our findings, because we only looked at a specific niche within a specific environment, I think this is pretty exciting!

Not only do you benefit from having code that can more easily be reasoned about, but you also gain better testing possibilities, and in the end your application performs better! :-)

We do pay some price for it (an increased memory footprint), because there is no free lunch, but it seems to be worth it to work in a clean and pure fashion. So next time someone argues in favour of some dirty, impure monstrosity because “it is faster”, just remember and tell ’em that this might not be true!