Table of Contents
Preface: Why Specter?
Specter for selecting
Clojure has a a nice function, get-in
, for looking deep into a data structure:
Let’s say that the second argument of get-in
is a selector, which is
composed of selector elements. For get-in
, every element must be a key
into an associative structure. (Remember that, in Clojure, vectors are
associative structures with integer keys, so 1
above is the same
sort of element as :a
and :b
.)
get-in
is a fine function, but it’s limited to traversal of associative structures to retrieve a single value (such as 4
).
Specter is more powerful than get-in
because it has more types of
selector elements. For example, ALL
lets you select many leaves of the structure.
Here’s
how you select all values of :b
from the structure above:
You can combine ALL
with arbitrary predicates to reduce the number
of selected values to those that satisfy the predicate:
As a final teaser, walker
is a selector for walking code. Here’s how you’d select every even integer in a nested expression:
Specter for updating
transform
is to select
as update-in
is to get-in
: you provide
a function that’s given each selected element, and the result is the
original input “modified” with the results of the function. The following transforms even integers by
multiplying them by 1000:
As you can see from above, any of the variety of selector elements you can use with select
can also be used with transform
.
There are also elements especially useful in transform
.
For example, consider this structure:
We can change the :a
values with the selector elements you already know:
… and do the same to the more-deeply-nested :y
values:
But how do we add 1000 times the :a
values to the corresponding :y
values? We can do that using the collect-one
element, which stashes
a value for later use. Before we get to its use with transform
,
consider collect-one
in a select
expression:
The result is not the :y
values (2 and 4), but rather pairs of
:a
and :y
values. (If you’re familiar with monads, you can think
of this as something like the
state monad.)
In a transform
expression, the transformation function is applied to each pair:
Specter is not as general-purpose as, say,
zippers,
because you can only visit a subtree once. (With zippers, you can
navigate to a point in the tree and then, based on the value there,
navigate backwards to revisit part of the tree.) However, many of the
problems you’d otherwise use zippers for can be solved with Specter,
often—I’d argue—more clearly. The above use of collect-one
is one of many examples.
Specter for learning
In order to learn how Specter works, I decided to grow it myself, feature by feature, using the existing implementation as a guide. As I did, I kept having little shocks of recognition: “Ah, continuation-passing style. Clever!” or “Ah, recursively precomputing the call graph—I’ve seen that before!” I was recognizing design patterns1, and it occurred to me that this book could also introduce those patterns to people who perhaps hadn’t seen them before.
Originally, I planned to discuss Specter’s design in a Part II of the book, about extending specter with your own selector elements. You need to understand something of the design to do that well. But as I was writing the first chapter, about the most basic Specter selectors, I realized that I could make that chapter a lot less dry by showing the design, and—in fact—by having you implement Specter yourself (or, if you prefer, follow along with a narrated implementation).
That was the moment when the title switched from Using and Extending Specter to Extending and Using Specter, and when the book became both a book about a useful tool and also a little bit of a book about software design.
- Some say functional languages don’t have design patterns. I’d argue that those people are confusing the idea of design patterns with the particular design patterns applicable to object-oriented languages.↩
How to Use this Book
I read some technical books while riding an exercise bike. I read others by carefully working through their exercises on my laptop. This book is designed to support both styles.
Reading-while-riding is adequate for a broad knowledge of a topic. I read Saša Jurić’s Elixir In Action on an exercise bike. As a result, I know what Elixir can do, but I don’t remember much of how. To be able to actually program in Elixir, much less build a real system, I’d have to sit down and write code. Typing things at a computer and seeing what it types back really helps memory and comprehension.
Specter API exercises
In order to make Specter features better stick in your mind, I encourage you to try out exercises at the REPL. Such exercises look like this:
Because you might be reading on an exercise bike, I immediately give you the answers to the exercise. They look like this:
That’s executable Clojure for the
Midje testing tool. Midje is
designed so that tests look like examples in Clojure books: an
expression you can type on the left-hand side, a separator—very
often some sort of arrow—and then the expected value following.
You can think of =>
expressions as very specific
facts from which you can readily deduce a more general fact1.
The namespace at the top of the solution is where the executable version lives in the book’s Github repo (https://github.com/marick/specter-book-code). You can run it if you like, using the instructions below.
Specter implementation exercises
As I said in the Preface, I encourage you to learn Specter’s design
more deeply by implementing it yourself. There are exercises that
guide you through that (and, immediately after, my solutions). You can
implement your solutions solely in the REPL, but you can also edit
files that live under the exercises
namespace. They have
boilerplate code to save you some typing. They also have tests so that
you can know you’ve gotten it right more easily than retyping the same
old examples in the REPL.
Because Midje is the bee’s knees2, exercise templates use it. That gives you three ways to check your work. The first is to use the command line to check the specific namespace you’re working in:
The WORK TO DO
message is because tests for unfinished code
are marked as facts that aren’t yet true:
Just delete the “future-“ to see this:
I personally am not patient enough to enjoy the amount of time it takes both Clojure and Midje to get loaded and start doing work, so I usually use “autotest”. There are two ways to do that. You can do it from the command line:
That will first run all the tests. Thereafter, whenever you save a changed file, it will reload it and thus run the tests again.
In my day-to-day work, I prefer to work in the REPL:
That starts by running all the tests, but then it gives you an ordinary REPL prompt. You can use the REPL as usual, but a background process monitors changed files and reloads them as necessary.
Thanks for the comments and corrections
- Bahadir Cambel
- Beau Fabry
- Jason Felice
Contacting me
Mail me at marick@exampler.com, tweet at @marick, or write to the book discussion page.
Please include the book’s version number as shown below.
Changelog
Version 2
- Update to Specter 0.11.X (which has API changes)
Understanding Specter By Building It
The simpler selectors contain keywords, predicates, and the special
token ALL
. Here’s a sample that extracts all the even values of all the :a
keys in a vector of maps:
In this chapter, you’ll see how each of those three kinds of elements work and are implemented.
Keywords
We’ll start with keywords. Before you can write code that mimics Specter, you have to know what it does. To find out, start it up and try out the following expressions.
Alternately, you can just look here:
It’s interesting that Specter returns a vector of matches even though
there can only one for a keyword. That makes selectors like these
consistent with ones that use ALL
.
Specter follows Clojure’s
somewhat idiosyncratic interpretation of keyword lookup in that you
can “look up” a keyword from anything—it’s never possible for get
of a keyword to fail.
You now have a “specification”, so…
Here’s my solution:
This is a recursive solution, which is somewhat un-idiomatic
Clojure1. The idiomatic loop ... recur
solution doesn’t
transform as smoothly into the way Specter actually works.
Predicates
Try out the following examples.
The interesting difference between predicates and keywords is that
keywords select, whereas predicates filter. As a result of that, a
predicate selector can return a “nothing” (nil
) value. Keywords
always return some kind of vector.
This is my solution:
To illustrate the difference between the two solutions, here they are, combined:
In both cases, the last thing the function does when it runs out of
selector is to wrap the final result in a vector. Both variants are
recursive, but the fn?
clause has an extra escape hatch: it returns
nil
on failure (not wrapping that result in a vector). It
also just passes the original structure
along, rather than a piece of
it (as the keyword case does).
Extensibility
Our select-both
isn’t a very good design because it makes it awkward
to add new kinds of selector elements. The typical Clojure approach
to this sort of extensibility is protocols.
[[[Add short appendix on protocols.]]] Here’s a protocol that can be
specialized to each kind of selector element:
Its signature looks almost the same as that of select-both
, except
that the two parts of the selector are passed in separately instead of
destructured with [x & xs :as selector]
That’s required because
protocol functions are specialized according to the type of the first
argument. Now a generic select
can be written like this:
… and the protocols can be implemented like this:
That works, but it’s ugly:
-
select*
takes aremainder
argument that it does nothing with but pass along. -
select
knows aboutselect*
, andselect*
in turn knows aboutselect
(because it calls it). Smearing information throughout an implementation is generally a poor idea.
Let’s start working on these problems by looking at the stack trace for
a call to select
with selector [:a :b]
. Time flows downward here:
An interesting thing is that the third argument is irrelevant to how the stack trace evolves. Here’s the trace for select [:a :b] {:a {}}
:
That suggests something of a crazy idea. Suppose we convert this two-argument function call:
… into a pair of one-argument calls:
predict-select-computation
will convert [:a :b]
into something
with roughly this computational shape:
We can implement that using a design pattern called continuation-passing style. It works like this:
Consider the expression (odd? (inc (+ 2 3)))
. We can express that as a series of let
steps:
We can generalize that: any computation can be expressed as “evaluate some subexpression, then do the rest of the computation with the result of that subexpression”.
Now, let
isn’t a required part of a functional language. You could
take it completely out of Clojure and lose no computational
power. That’s because anything you can do with let
, you can do with
fn
. The following computes the same result:
Alternately, you could also remove the use of the ->
macro and get pure, unadorned, read-them-backwards function calls:
I prefer the first form because it more obviously suggests a
simplification. What if, instead of using the ->
macro to feed a
result into another function, we gave every clojure.core
function a
variant with an additional argument: a function that represents “the
rest of the computation”. (Such a function is typically called a
continuation). If you’ll allow me to pretend +
only takes two
arguments, we’d have new functions like these:
We can now write (odd? (inc (+ 2 3)))
like this:
Well. We’re obviously not going to give every clojure.core
function
an alter ego that takes a continuation, so how is this relevant to our goal?
Like this: what if we changed the
signature of select*
from this:
… to this:
Then the result of (predict-select-computation [:a :b])
could be this:
Once predict-select-computation
is implemented, we’ll have
eliminated the deficiencies of the previous version:
- There’s no need to explicitly pass around the
remainder
of the selector. Indeed the idea of the selector has completely vanished from (“been compiled out of”) the code. - We’ve
also lost the intermediate call to
select
. Its job—determining if we were at the end of the computation and need to return a vector of the result—has been computed and turned into the use ofvector
as the final continuation.
But wait! There’s more:
- In Specter proper, though perhaps not in this implementation (at least not yet), precomputing a path function and then using it is substantially faster than building the same computational structure at runtime. (“Substantially” here can mean a more than 10X speedup for some paths and some data, and a 3X speedup for simpler cases.)
- If we’re going to reuse the
[:a :b]
path with many structures, we can avoid recomputing it each time. Instead of this:… we can have this:
… giving even more efficiency gains.
A solution:
Predictive vs. runtime protocol lookup
predict-select-computation
produces a frozen computation like this:
select*
is a function polymorphic in the type of its first
argument. That means we can think of it as having, in effect, this
implementation:
But every call to selector-function-for
is predetermined by the
specific selector given to predict-select-computation
. Given the
selector [:a even?]
, we know that the first argument to the first
select*
is :a
, and the first argument to the second select*
is
even?
. So why shouldn’t selector-function-for
do the computation
and save select*
the trouble?
That is, instead of building this:
… we want to build something with specific select*
functions substituted in:
That’s certainly more … cryptic … but is it worth the trouble?
As it turns out, yes. In some extremely crude tests, I saw a speedup of around 2.8X.
The specific details of how Specter goes from a particular value (like
the keyword :a
or the function even?
) to a raw selection function
aren’t particularly relevant here, and they use some undocumented (but
public) clojure.core
functions. If you still want to see them, I’ve
written a
simplified version of Specter’s implementation.
The important thing here is that the protocol and signature of
select*
haven’t changed, but Specter has still managed to squeeze
out a substantial speedup.
ALL
Specter also supplies an ALL
selector element. Try out these examples to see how it works:
I think of ALL
as mapping the rest
of the selector over each
element of the structure it receives, then glueing all the results
together. It seems like that could be easily done: just add a
select*
protocol function to ALL
, much as we’ve done on keywords
and functions. And indeed it is, once a few formalities are taken care of:
- Since protocols are selected by the type of an value, we must
arrange for
ALL
to be of a distinct type (as keywords and functions are). That can be done like this: - We need to define
ALL
’s type like this: - Finally, we need to add the
AllType
implementation ofselect*
(If you’re fluent with protocols, you may wonder why we don’t define
select*
when we define the type:The reason is that the implementation-dependent lookup of specific protocol functions doesn’t work for functions defined that way.)
All that given, …
Here’s my solution:
Specter takes another stab at increasing performance by using reducers for performance. We won’t worry about that in this book.
- When a function calls itself as the very last action in a path through its code, that’s called tail-recursive. In many languages, the compiler automatically converts those calls into more efficient loops. Clojure doesn’t do that, which is why it has a special
loop ... recur
construct.↩