Pure Functions

The first and most crucial aspect of functional programming is the idea of a “pure function.” What makes a function pure? The fact that it’s an actual function and not something else.

If that seems circular, let’s back up a moment. One of the key challenges in discussing functional programming is that software engineering and computer science are, in fact, two separate fields that have evolved in parallel. They interact frequently, but not consistently. As a result, a great many concepts have one name in software engineering and a different one in computer science, and many terms are shared by both disciplines but mean different things. To establish a common vocabulary, therefore, let’s start at the very beginning.

In nearly all programming languages, there is the idea of a block of code that can be reused in multiple places, often with slight variation. The term for such a block is a procedure, or subroutine. A procedure is any syntactically reusable block of code that can be referenced and, optionally, passed some parameters or arguments.

In mathematics, a function is a black box that is given inputs and reliably returns an output. How it does so is irrelevant; it could do some computation, or it could simply consult a giant internal lookup table. The key word, however, is “reliably”: the same inputs will always produce the same output, guaranteed, and nothing else about the state of the universe changes except that a value is returned. Two different inputs may result in the same output, but the same input can never, ever produce a different output.

A function in mathematics is not a thing that happens but a relationship that defines the map from some input to some output.
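To make the “black box” idea concrete, here is a minimal sketch of one mathematical function, squaring, implemented two ways. The names square_computed() and square_lookup() are hypothetical, and the lookup table only covers the inputs 0 through 5 (an assumption to keep the table short).

```php
<?php
declare(strict_types=1);

// Version 1 (hypothetical): compute the answer.
function square_computed(int $x): int
{
    return $x * $x;
}

// Version 2 (hypothetical): look the answer up in a fixed table.
// Assumption: only inputs 0..5 are supported.
function square_lookup(int $x): int
{
    $table = [0 => 0, 1 => 1, 2 => 4, 3 => 9, 4 => 16, 5 => 25];
    return $table[$x];
}

// From the outside the two are indistinguishable: same input, same output.
foreach (range(0, 5) as $x) {
    echo $x, ' => ', square_computed($x), ' / ', square_lookup($x), PHP_EOL;
}
```

Whichever version sits inside the box, the mapping from input to output is the same, which is all a mathematical function promises.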

In almost every mainstream modern language, the concepts of a procedure and a function are folded together into a single concept termed “function.” In PHP-speak, both of the following are “functions”:

    function output(string $message): void
    {
        static $n = 1;
        global $preamble;
        echo "The {$n}th message is: {$preamble} {$message}" . PHP_EOL;
        $n++;
    }

    function add(int $a, int $b): int
    {
        return $a + $b;
    }

Both are procedures; however, only the second is a “function” as mathematicians would define it. The output() function takes a stealth input (a global variable), it produces a stealth output (it prints a message), and it has a different effect if called a second time with the same input (because it retains internal state via a static variable). All of those attributes violate what we expect of a “function” in the mathematical sense.
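One way to see the difference is to call each function twice with the same argument and compare the results. This sketch captures what output() prints using PHP’s output buffering (ob_start() and ob_get_clean()); the capture() helper and the 'Note:' preamble are illustrative assumptions, not part of the original example.

```php
<?php
declare(strict_types=1);

// The global that output() secretly reads (illustrative value).
$GLOBALS['preamble'] = 'Note:';

function output(string $message): void
{
    static $n = 1;
    global $preamble;
    echo "The {$n}th message is: {$preamble} {$message}" . PHP_EOL;
    $n++;
}

function add(int $a, int $b): int
{
    return $a + $b;
}

// Hypothetical helper: run a callable and return whatever it printed.
function capture(callable $fn): string
{
    ob_start();
    $fn();
    return (string) ob_get_clean();
}

$first  = capture(fn() => output('hello'));
$second = capture(fn() => output('hello'));

echo $first;   // The 1th message is: Note: hello
echo $second;  // The 2th message is: Note: hello

var_dump(add(2, 3) === add(2, 3)); // bool(true)
```

The same argument produced two different outputs from output(), while add() gives the same answer every time.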

Of course, modern languages muddy the waters by using the term “function” willy-nilly. Therefore, what mathematics calls a “function,” programming calls a “pure function.”

Define “Pure”

What makes a function pure? A procedure is a pure function if it follows two restrictions:

  1. It is deterministic. That is, the same explicit inputs are guaranteed to always produce the same output.
  2. It has no side effects. That is, it has no effect on any value or data stream other than returning its result.

These rules are simple and usually easy to follow, but very powerful. By restricting ourselves in this way, we earn the right to make a number of assumptions:

  1. A pure function is incredibly easy to unit test. There is essentially no setup and no mocking necessary; just call it with known parameters and check whether the result is what you expect.
  2. It has referential transparency. That is the fancy way of saying a function call is equivalent to its return value. As long as add() is pure, replacing every instance of add(5, 5) with 10 is guaranteed not to change the result of the program.
  3. Because we know that the same inputs produce the same output and because it’s referentially transparent, it is trivially easy to cache. Caching is, in essence, replacing subsequent function calls with a previously computed value.
  4. Conversely, while calling a pure function multiple times with the same input may be wasteful, it cannot be a source of bugs. That’s assuming you meant to call it with the same value multiple times; if not, the bug is in your parameters, not the function.
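Because a pure function’s result depends only on its arguments, a cache can sit in front of it without changing the program’s behavior. Below is a minimal memoization sketch. memoize() is a hypothetical helper, not a built-in PHP function, and it keys its cache on serialize()d arguments, which assumes the arguments are serializable. The cache is private mutable state inside the closure; that is tolerable here only because callers cannot observe it as long as the wrapped function is pure.

```php
<?php
declare(strict_types=1);

function add(int $a, int $b): int
{
    return $a + $b;
}

/**
 * Hypothetical helper: wrap a function in a cache keyed on its arguments.
 * Only safe for pure functions, since a cache hit skips the real call.
 */
function memoize(callable $fn): Closure
{
    $cache = [];
    return function (...$args) use ($fn, &$cache) {
        // Turn the argument list into a string key (assumes serializable args).
        $key = serialize($args);
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $fn(...$args);
        }
        return $cache[$key];
    };
}

$cachedAdd = memoize('add');
echo $cachedAdd(5, 5), PHP_EOL; // computed: 10
echo $cachedAdd(5, 5), PHP_EOL; // served from the cache: 10
```

Wrapping an impure function such as output() this way would change behavior: the second call with the same message would print nothing, because the cached return value would be used instead of running the function.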

As we’ll see in the next chapter, those assumptions allow us to do a great deal more once we are dealing with higher-order functions. Stay tuned!

Composing Functions

Another important attribute is that composing pure functions always produces a pure function, but calling an impure function from a pure function makes the pure function impure. That is:

    function add(int $a, int $b): int
    {
        return $a + $b;
    }

    function subtract(int $a, int $b): int
    {
        return $a - $b;
    }

    function compute(int $a, int $b, int $c): int
    {
        return subtract(add($a, $b), $c);
    }

    function output(string $message): void
    {
        echo "The message is: {$message}" . PHP_EOL;
    }

    function run(): void
    {
        $val = compute(30, 10, 8);
        output("The result is: {$val}");
    }

add() and subtract() are pure functions because they obey the two rules of pure functions. That means compute() is also a pure function: it obeys the same rules and calls only other pure functions. However, output() is not a pure function, because it has the side effect of producing output. run() calls both a pure and an impure function, and is thus impure, so none of the assumptions above apply to it.
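A practical consequence is that everything except the final print in run() can be checked directly. The sketch below keeps the pure functions from the example and moves the message formatting into a pure helper, describe_result(), a hypothetical name introduced here, so that run() is reduced to a single echo.

```php
<?php
declare(strict_types=1);

function add(int $a, int $b): int
{
    return $a + $b;
}

function subtract(int $a, int $b): int
{
    return $a - $b;
}

// Pure: built only from other pure functions.
function compute(int $a, int $b, int $c): int
{
    return subtract(add($a, $b), $c);
}

// Pure (hypothetical helper): builds the text but does not print it.
function describe_result(int $val): string
{
    return "The result is: {$val}";
}

// Impure: the only thing it adds is the echo.
function run(): void
{
    echo describe_result(compute(30, 10, 8)) . PHP_EOL;
}

run(); // prints "The result is: 32"
```

Only run() now needs output capturing to test; compute() and describe_result() can be checked with plain comparisons.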

Functional programming is built on pure functions. In strictly functional languages, writing an impure function is hard and requires extra syntax to remind you that you’re doing something dirty. PHP is not a strictly functional language, so it’s on us as developers to check ourselves and ensure our functions are pure whenever possible.

In practice, ensuring you have pure functions is straightforward:

  1. Avoid global values. Always. Period.
  2. Avoid static values in functions. Pretend they do not exist.
  3. Avoid passing or returning values by reference. There are extremely few use cases for that in PHP anyway, because PHP’s copy-on-write behavior already makes passing large, read-only parameters by value memory-efficient. This becomes a bit trickier when passing around objects, but there are ways to handle that, which we will see later.
  4. If you need to do IO (read from a database, print to a screen, make an HTTP request, etc.), have a function (procedure) that does just that and nothing else, and returns its value. That function is impure, as is its caller, but other functions called by its caller need not be.
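To illustrate the third point, here is a sketch comparing a by-reference function with a pure one that returns a new value instead. Both function names are hypothetical.

```php
<?php
declare(strict_types=1);

// Impure (hypothetical): changes the caller's array through a reference.
function add_tag_by_ref(array &$tags, string $tag): void
{
    $tags[] = $tag;
}

// Pure (hypothetical): returns a new array and leaves the caller's alone.
// Passing $tags in is cheap; thanks to copy-on-write, PHP only copies
// the array at the moment this function modifies its local $tags.
function with_tag(array $tags, string $tag): array
{
    $tags[] = $tag;
    return $tags;
}

$original = ['php', 'fp'];

$copy = $original;
add_tag_by_ref($copy, 'pure');          // $copy is modified in place

$updated = with_tag($original, 'pure'); // $original is left untouched

var_dump($original === ['php', 'fp']);        // bool(true)
var_dump($updated === ['php', 'fp', 'pure']); // bool(true)
```

The pure version makes the change visible in its return value, so a reader never has to wonder which of their variables a call might have modified.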

For example, this function reads a value from a database and then does some processing on it:

    function get_user(Connection $db, int $id): array
    {
        $result = $db->query("SELECT * FROM users WHERE uid=:uid", [':uid' => $id]);
        $row = $result->fetchRow();

        if ($row === false) {
            return [];
        }
        if ($row['expired'] == 1) {
            return [];
        }
        return $row;
    }

A more robust approach splits the non-fetch logic off into its own routine:

    function fetch_user(Connection $db, int $id): array|false
    {
        $result = $db->query("SELECT * FROM users WHERE uid=:uid", [':uid' => $id]);
        return $result->fetchRow();
    }

    function validate_result(array|false $row): bool
    {
        if ($row === false) {
            return false;
        }
        if ($row['expired'] == 1) {
            return false;
        }
        return true;
    }

    function get_user(int $id)
    {
        // ...
        $user = fetch_user($db, $id);
        if (validate_result($user)) {
            return $user;
        } else {
            // Some kind of error checking.
        }
    }

validate_result() is now a pure function, and thus far more predictable and easier to test.
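As a sketch of how little setup such a test needs, here is validate_result() exercised with plain values in a small table of cases. It assumes PHP 8.0 or later for the array|false union type and assumes rows carry an expired column, as in the example; a real project would more likely use a test framework such as PHPUnit.

```php
<?php
declare(strict_types=1);

function validate_result(array|false $row): bool
{
    if ($row === false) {
        return false;
    }
    if ($row['expired'] == 1) {
        return false;
    }
    return true;
}

// No database, connection, or mock needed: just inputs and expected outputs.
$cases = [
    'no row found' => [false, false],
    'expired user' => [['uid' => 1, 'expired' => 1], false],
    'active user'  => [['uid' => 2, 'expired' => 0], true],
];

foreach ($cases as $label => [$input, $expected]) {
    $actual = validate_result($input);
    echo $label, ': ', $actual === $expected ? 'pass' : 'FAIL', PHP_EOL;
}
```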

Because most of functional programming is based on pure functions, for the rest of this book, whenever we talk about “functions” assume we mean pure functions unless explicitly stated otherwise.

Binary Functions

A particular subset of functions is binary functions: functions that take two parameters of the same type and return a value of that type. That is, the following is a binary function:

    function multiply(int $a, int $b): int
    {
        return $a * $b;
    }

But this is not, because it does not return the same type:

    function greater_than(int $a, int $b): bool
    {
        return $a > $b;
    }

So far, that’s not all that interesting. However, there’s another subset of binary functions that is: those that are associative and have an identity element.

A binary function is associative if the way calls are grouped doesn’t matter when chaining several of them together. Let’s look at multiplication again, this time with more traditional syntax.

    $a * $b * $c * $d

Multiplying $a by $b, then by $c, then by $d produces the same result as multiplying $a by $b, multiplying $c by $d, and then multiplying those two results together. Or, more visually:

    $a * $b * $c * $d == ($a * $b) * ($c * $d)

An identity element, also known as a neutral element, is essentially the “no-op” value. For multiplication over integers, the no-op value is 1. That is:

    $a * 1 == $a

For every possible integer value of $a.

It turns out this combination of properties is rather useful in general. So useful that it has a fancy name: a monoid.

A “monoid” refers to a function that:

  • Is pure
  • Is a binary function (takes two parameters and returns one value, all of the same type)
  • Is associative
  • Has an identity value

Put another way, multiplication is a monoid over integers. More precisely, there is a monoid <ℤ, multiply, 1>: a set of values (integers), a binary operation (multiply), and an identity value (1). Multiplication may or may not be a monoid over other types of values, and for some types of values (e.g., turnips, employees, or shopping carts), it doesn’t even make sense.
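The laws can also be spot-checked mechanically. The sketch below uses a hypothetical check_monoid_laws() helper to test identity and associativity over a handful of sample integers; it confirms them for <ℤ, multiply, 1> and finds a counterexample for subtraction with 0. Passing such a check only means no counterexample was found among the samples; it is evidence, not a proof. It assumes PHP 8.0+ for the mixed type.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: spot-check the monoid laws for a binary operation.
 * Returns true if no counterexample is found among the sample values.
 */
function check_monoid_laws(callable $op, mixed $identity, array $samples): bool
{
    foreach ($samples as $a) {
        // Identity: combining with the identity on either side is a no-op.
        if ($op($a, $identity) !== $a || $op($identity, $a) !== $a) {
            return false;
        }
        foreach ($samples as $b) {
            foreach ($samples as $c) {
                // Associativity: grouping does not change the result.
                if ($op($op($a, $b), $c) !== $op($a, $op($b, $c))) {
                    return false;
                }
            }
        }
    }
    return true;
}

function multiply(int $a, int $b): int
{
    return $a * $b;
}

function subtract(int $a, int $b): int
{
    return $a - $b;
}

$samples = range(-3, 3);

var_dump(check_monoid_laws('multiply', 1, $samples)); // bool(true)
var_dump(check_monoid_laws('subtract', 0, $samples)); // bool(false)
```

Subtraction fails on both counts: 0 - $a is not $a, and ($a - $b) - $c differs from $a - ($b - $c) whenever $c is not 0.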

While multiplication is fairly basic, the concept of a monoid applies to all sorts of types and operations. Consider list concatenation, for instance:

    function concat(array $a, array $b): array { return array_merge($a, $b); }

Is that a monoid? Let’s check the rules:

  • Is it pure? Check.
  • Is it binary? Check.
  • Is there an identity value? Yes, an empty array.
  • Is it associative?

We could go through a formal proof of associativity, or write tests to demonstrate it, but for now, let’s just reason it through. Given the lists [1, 3], [5, 7], and [9, 11], is the following true?

    concat(concat([1, 3], [5, 7]), [9, 11]) == concat([1, 3], concat([5, 7], [9, 11]))

Yep, it is. Both give the same result, [1, 3, 5, 7, 9, 11]. For now we’ll trust that intuition and accept that concatenation is associative (spoiler alert: it really is), which makes it a monoid over lists; or, more formally, <lists, concat, []> is a monoid.
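The same check can be run rather than reasoned about. This sketch assumes concat() is implemented with PHP’s array_merge(), which appends and renumbers integer-keyed arrays; arrays with string keys behave differently under array_merge(), so the sketch is limited to plain lists.

```php
<?php
declare(strict_types=1);

// Assumption: inputs are plain lists (sequential integer keys).
function concat(array $a, array $b): array
{
    return array_merge($a, $b);
}

$x = [1, 3];
$y = [5, 7];
$z = [9, 11];

// Associativity: both groupings give the same list.
$left  = concat(concat($x, $y), $z);
$right = concat($x, concat($y, $z));

var_dump($left === $right);              // bool(true)
var_dump($left === [1, 3, 5, 7, 9, 11]); // bool(true)

// Identity: the empty array changes nothing on either side.
var_dump(concat([], $x) === $x && concat($x, []) === $x); // bool(true)
```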

This concept will become more important later when we discuss value objects.