Understanding Specter By Building It
The simpler paths contain keywords, predicates, and the special
token ALL. Here’s a sample that extracts all the even values of all the :a
keys in a vector of maps:
user=> (select [ALL :a even?]
[ {:a 1} {:a 2} {:a 3} {:a 4}])
=> [2 4]
In this chapter, you’ll see how the “navigators” for each of those three kinds of path elements work.
Keywords
We’ll start with keywords. Before you can write code that mimics Specter, you have to know what it does. To find out, start it up and try out the following expressions.
Alternately, you can just look here:
It’s interesting that Specter returns a vector of matches even though
there can be only one for a keyword. That makes paths like these
consistent with ones that use ALL.
Specter follows Clojure’s
somewhat idiosyncratic interpretation of keyword lookup in that you
can “look up” a keyword from anything—it’s never possible for get
of a keyword to fail.
You now have a “specification”, so…
Here’s my solution:
This is a recursive solution, which is somewhat un-idiomatic
Clojure4. The idiomatic loop ... recur solution doesn’t
transform as smoothly into the way Specter actually works.
Predicates
Try out the following examples.
The interesting difference between predicates and keywords is that
keywords select, whereas predicates filter. As a result of that, a
predicate can return a “nothing” (nil) value. Keywords
always return some kind of vector.
This is my solution:
To illustrate the difference between the two solutions, here they are, combined:
(defn select-both [[x & xs :as path] structure]
(cond (empty? path)
(vector structure)
(keyword? x)
(select-both xs (get structure x))
(fn? x)
(if (x structure)
(select-both xs structure)
nil)))
In both cases, the last thing the function does when it runs out of
path is to wrap the final result in a vector. Both variants are
recursive, but the fn? clause has an extra escape hatch: it returns
nil on failure (not wrapping that result in a vector). It
also just passes the original structure along, rather than a piece of
it (as the keyword case does).
Extensibility
Our select-both isn’t a very good design because it makes it awkward
to add new kinds of path elements. The typical Clojure approach
to this sort of extensibility is protocols.
[[[Add short appendix on protocols.]]] Here’s a protocol that can be
specialized to each kind of path element:
(defprotocol Navigator
(select* [this remainder structure]))
Its signature looks almost the same as that of select-both, except
that the two parts of the path are passed in separately instead of
destructured with [x & xs :as path] That’s required because
protocol functions are specialized according to the type of the first
argument. Now a generic select can be written like this:
(defn select [[x & xs :as path] structure]
(if (empty? path)
(vector structure)
(select* x xs structure)))
… and the protocols can be implemented like this:
(extend-type clojure.lang.Keyword
Navigator
(select* [this remainder structure]
(select remainder (get structure this))))
(extend-type clojure.lang.AFn
Navigator
(select* [this remainder structure]
(if (this structure)
(select remainder structure)
nil)))
That works, but it’s ugly:
-
select*takes aremainderargument that it does nothing with but pass along. -
selectknows aboutselect*, andselect*in turn knows aboutselect(because it calls it). Smearing information throughout an implementation is generally a poor idea.
Let’s start working on these problems by looking at the stack trace for
a call to select with path [:a :b]. Time flows downward here:
(select [:a :b] {:a {:b 1}})
(select* :a [:b] {:a {:b 1}})
(select [:b] {:b 1})
(select* :b [] {:b 1})
(select [] 1)
(vector 1)
An interesting thing is that the third argument is irrelevant to how the stack trace evolves. Here’s the trace for select [:a :b] {:a {}}:
(select [:a :b] {:a { }}
(select* :a [:b] {:a { }}
(select [:b] { }
(select* :b [] { }
(select [] nil)
(vector nil)
That suggests something of a crazy idea. Suppose we convert this two-argument function call:
(select [:a :b] {:a {}})
… into a pair of one-argument calls:
( (predict-select-computation [:a :b])
{:a {}})
predict-select-computation will convert [:a :b] into something
with roughly this computational shape:
(fn [structure] ... (select* :a ... (select* :b ... (vector ...
We can implement that using a design pattern called continuation-passing style. It works like this:
Consider the expression (odd? (inc (+ 2 3))). We can express that as a series of let steps:
(let [result (+ 2 3)
result (inc result)
result (odd? result)]
result)
We can generalize that: any computation can be expressed as “evaluate some subexpression, then do the rest of the computation with the result of that subexpression”.
Now, let isn’t a required part of a functional language. You could
take it completely out of Clojure and lose no computational
power. That’s because anything you can do with let, you can do with
fn. The following computes the same result:
(-> (+ a b)
((fn [result] (-> (inc result)
((fn [result] (-> (odd? result)
((fn [result] result))))))))))
Alternately, you could also remove the use of the -> macro and get pure, unadorned, read-them-backwards function calls:
((fn [result] result)
((fn [result] (odd? result))
((fn [result] (inc result))
(+ a b)))))
I prefer the first form because it more obviously suggests a
simplification. What if, instead of using the -> macro to feed a
result into another function, we gave every clojure.core function a
variant with an additional argument: a function that represents “the
rest of the computation”. (Such a function is typically called a
continuation). If you’ll allow me to pretend clojure.core/+ only takes two
arguments, we’d have new functions like these:
(defn cont-+ [a b continuation]
(continuation (+ a b)))
(defn cont-inc [n continuation]
(continuation (inc n)))
(defn cont-odd? [n continuation]
(continuation (odd? n)))
We can now write (odd? (inc (+ 2 3))) like this:
(cont-+ 2 3
(fn [result]
(cont-inc result
(fn [result]
(cont-odd? result
identity)))))
Well. We’re obviously not going to give every clojure.core function
an alter ego that takes a continuation, so how is this relevant to our goal?
Like this: what if we changed the
signature of select* from this:
(select* [this remainder structure]))
… to this:
(select* [this structure continuation]))
Then the result of (predict-select-computation [:a :b]) could be this:
(fn [structure]
(select* :a structure
(fn [structure]
(select* :b structure
vector))))
Once predict-select-computation is implemented, we’ll have
eliminated the deficiencies of the previous version:
- There’s no need to explicitly pass around the
remainderof the path. Indeed the idea of the path has completely vanished from (“been compiled out of”) the code. - We’ve
also lost the intermediate call to
select. Its job—determining if we were at the end of the computation and need to return a vector of the result—has been computed and turned into the use ofvectoras the final continuation.
But wait! There’s more:
- In Specter proper, though perhaps not in this implementation (at least not yet), precomputing a path function and then using it is substantially faster than building the same computational structure at runtime. (“Substantially” here can mean a more than 10X speedup for some paths and some data, and a 3X speedup for simpler cases.)
- If we’re going to reuse the
[:a :b]path with many structures, we can avoid recomputing it each time. Instead of this:(map(partial select[:a:b])[...]… we can have this:
(map(predict-select-computation[:a:b])[...])… giving even more efficiency gains5.
A solution:
Predictive vs. runtime protocol lookup
predict-select-computation produces a frozen computation like this:
(fn [structure]
(select* :a
structure
(fn [structure]
(select* even?
structure
vector)))))
select* is a function polymorphic in the type of its first
argument. That means we can think of it as having, in effect, this
implementation:
;; Forget about protocol Navigator
(defn path-function-for [this]
(cond (keyword? this)
(fn [this structure continuation]
(continuation (get structure this)))
(fn? this)
(fn [this structure continuation]
(if (this structure)
(continuation structure)
nil))))
;; There's now only one version of select*:
(defn select* [this & args]
(apply (path-function-for this) this args))
But every call to path-function-for is predetermined by the
specific path given to predict-select-computation. Given the
path [:a even?], we know that the first argument to the first
select* is :a, and the first argument to the second select* is
even?. So why shouldn’t path-function-for do the computation
and save select* the trouble?
That is, instead of building this:
(fn [structure]
(select* :a
structure
(fn [structure]
(select* even?
structure
vector)))))
… we want to build something with specific select* functions substituted in:
(fn [structure]
( (fn [this structure continuation]
(continuation (get structure this)))
:a structure
( (fn [this structure continuation]
(if (this structure)
(continuation structure)
nil))
even? structure
vector)))
That’s certainly more … cryptic … but is it worth the trouble?
As it turns out, yes. In some extremely crude tests, I saw a speedup of around 2.8X.
The specific details of how Specter goes from a particular path element (like
the keyword :a or the function even?) to a raw function
aren’t particularly relevant here, and they use some undocumented (but
public) clojure.core functions. If you still want to see them, I’ve
written a
simplified version of Specter’s implementation.
The important thing here is that the protocol and signature of
select* haven’t changed, but Specter has still managed to squeeze
out a substantial speedup6.
The rest of the book won’t depend on these performance-enhancing details. While performance is important, this is not a book about squeezing maximum performance out of Clojure. I’ve included this section to remind you it’s often good to push runtime work into a “compile time” step7.
ALL
Specter also supplies an ALL path element. Try out these examples to see how it works:
I think of ALL as mapping the rest of the path over each
element of the structure it receives, then glueing all the results
together. It seems like that could be easily done: just add a
select* protocol function to ALL, much as we’ve done on keywords
and functions. And indeed it is, once a few formalities are taken care of:
- Since protocols are selected by the type of an value, we must
arrange for
ALLto be of a distinct type (as keywords and functions are). That can be done like this:(defALL(->AllType)); `ALL` is the only instance of `AllType` - We need to define
ALL’s type like this:(deftypeAllType[]) - Finally, we need to add the
AllTypeimplementation ofselect*(extend-typeAllTypeNavigator(select*[thisstructurecontinuation]...))(If you’re fluent with protocols, you may wonder why we don’t define
select*when we define the type:(deftypeAllType[]Navigator(select*[thisstructurecontinuation]...))The reason is that the implementation-dependent lookup of specific protocol functions doesn’t work for functions defined that way.)
All that given, …
Here’s my solution:
Specter takes another stab at increasing performance by using reducers for performance. We won’t worry about that in this book.