Understanding Specter By Building It

The simpler paths contain keywords, predicates, and the special token ALL. Here’s a sample that extracts all the even values of all the :a keys in a vector of maps:

user=> (select [ALL :a even?]
               [ {:a 1} {:a 2} {:a 3} {:a 4}])
=> [2 4]

In this chapter, you’ll see how the “navigators” for each of those three kinds of path elements work.

Keywords

We’ll start with keywords. Before you can write code that mimics Specter, you have to know what it does. To find out, start it up and try out the following expressions.

Alternately, you can just look here:

It’s interesting that Specter returns a vector of matches even though there can be only one for a keyword. That makes paths like these consistent with ones that use ALL.

Specter follows Clojure’s somewhat idiosyncratic interpretation of keyword lookup in that you can “look up” a keyword from anything—it’s never possible for get of a keyword to fail.

You now have a “specification”, so…

Here’s my solution:

This is a recursive solution, which is somewhat un-idiomatic Clojure4. The idiomatic loop ... recur solution doesn’t transform as smoothly into the way Specter actually works.

Predicates

Try out the following examples.

The interesting difference between predicates and keywords is that keywords select, whereas predicates filter. As a result of that, a predicate can return a “nothing” (nil) value. Keywords always return some kind of vector.

This is my solution:

To illustrate the difference between the two solutions, here they are, combined:

(defn select-both [[x & xs :as path] structure]
  (cond (empty? path)
        (vector structure)

        (keyword? x)
        (select-both xs (get structure x))

        (fn? x)
        (if (x structure)
          (select-both xs structure)
          nil)))

In both cases, the last thing the function does when it runs out of path is to wrap the final result in a vector. Both variants are recursive, but the fn? clause has an extra escape hatch: it returns nil on failure (not wrapping that result in a vector). It also just passes the original structure along, rather than a piece of it (as the keyword case does).

Extensibility

Our select-both isn’t a very good design because it makes it awkward to add new kinds of path elements. The typical Clojure approach to this sort of extensibility is protocols. [[[Add short appendix on protocols.]]] Here’s a protocol that can be specialized to each kind of path element:

(defprotocol Navigator
  (select* [this remainder structure]))

Its signature looks almost the same as that of select-both, except that the two parts of the path are passed in separately instead of destructured with [x & xs :as path] That’s required because protocol functions are specialized according to the type of the first argument. Now a generic select can be written like this:

(defn select [[x & xs :as path] structure]
  (if (empty? path)
    (vector structure)
    (select* x xs structure)))

… and the protocols can be implemented like this:

(extend-type clojure.lang.Keyword
  Navigator
  (select* [this remainder structure]
    (select remainder (get structure this))))

(extend-type clojure.lang.AFn
  Navigator
  (select* [this remainder structure]
    (if (this structure)
      (select remainder structure)
      nil)))

That works, but it’s ugly:

  1. select* takes a remainder argument that it does nothing with but pass along.
  2. select knows about select*, and select* in turn knows about select (because it calls it). Smearing information throughout an implementation is generally a poor idea.

Let’s start working on these problems by looking at the stack trace for a call to select with path [:a :b]. Time flows downward here:

(select          [:a    :b]             {:a {:b 1}})
  (select*        :a   [:b]             {:a {:b 1}})
    (select            [:b]                 {:b 1})
      (select*          :b   []             {:b 1})
        (select              []                 1)
          (vector                               1)

An interesting thing is that the third argument is irrelevant to how the stack trace evolves. Here’s the trace for select [:a :b] {:a {}}:

(select          [:a    :b]             {:a {   }}
  (select*        :a   [:b]             {:a {   }}
    (select            [:b]                 {   }
      (select*          :b   []             {   }
        (select              []              nil)
          (vector                            nil)

That suggests something of a crazy idea. Suppose we convert this two-argument function call:

(select [:a :b] {:a {}})

… into a pair of one-argument calls:

( (predict-select-computation [:a :b])
  {:a {}})

predict-select-computation will convert [:a :b] into something with roughly this computational shape:

(fn [structure] ... (select* :a ... (select* :b ... (vector ...

We can implement that using a design pattern called continuation-passing style. It works like this:

Consider the expression (odd? (inc (+ 2 3))). We can express that as a series of let steps:

(let [result (+ 2 3)
      result (inc result)
      result (odd? result)]
  result)

We can generalize that: any computation can be expressed as “evaluate some subexpression, then do the rest of the computation with the result of that subexpression”.

Now, let isn’t a required part of a functional language. You could take it completely out of Clojure and lose no computational power. That’s because anything you can do with let, you can do with fn. The following computes the same result:

  (-> (+ a b)
      ((fn [result] (-> (inc result)
                        ((fn [result] (-> (odd? result)
                                          ((fn [result] result))))))))))

Alternately, you could also remove the use of the -> macro and get pure, unadorned, read-them-backwards function calls:

  ((fn [result] result)
   ((fn [result] (odd? result))
    ((fn [result] (inc result))
     (+ a b)))))

I prefer the first form because it more obviously suggests a simplification. What if, instead of using the -> macro to feed a result into another function, we gave every clojure.core function a variant with an additional argument: a function that represents “the rest of the computation”. (Such a function is typically called a continuation). If you’ll allow me to pretend clojure.core/+ only takes two arguments, we’d have new functions like these:

(defn cont-+ [a b continuation]
  (continuation                  (+ a b)))
(defn cont-inc [n continuation]
  (continuation                  (inc n)))
(defn cont-odd? [n continuation]
  (continuation                  (odd? n)))

We can now write (odd? (inc (+ 2 3))) like this:

(cont-+ 2 3
        (fn [result]
          (cont-inc result
                    (fn [result]
                      (cont-odd? result
                                 identity)))))

Well. We’re obviously not going to give every clojure.core function an alter ego that takes a continuation, so how is this relevant to our goal?

Like this: what if we changed the signature of select* from this:

(select* [this remainder structure]))

… to this:

(select* [this structure continuation]))

Then the result of (predict-select-computation [:a :b]) could be this:

(fn [structure]
  (select* :a structure
           (fn [structure]
             (select* :b structure
                      vector))))

Once predict-select-computation is implemented, we’ll have eliminated the deficiencies of the previous version:

  1. There’s no need to explicitly pass around the remainder of the path. Indeed the idea of the path has completely vanished from (“been compiled out of”) the code.
  2. We’ve also lost the intermediate call to select. Its job—determining if we were at the end of the computation and need to return a vector of the result—has been computed and turned into the use of vector as the final continuation.

But wait! There’s more:

  1. In Specter proper, though perhaps not in this implementation (at least not yet), precomputing a path function and then using it is substantially faster than building the same computational structure at runtime. (“Substantially” here can mean a more than 10X speedup for some paths and some data, and a 3X speedup for simpler cases.)
  2. If we’re going to reuse the [:a :b] path with many structures, we can avoid recomputing it each time. Instead of this:
     (map (partial select [:a :b]) [...]
    

    … we can have this:

     (map (predict-select-computation [:a :b]) [...])
    

    … giving even more efficiency gains5.

A solution:

Predictive vs. runtime protocol lookup

predict-select-computation produces a frozen computation like this:

(fn [structure]
  (select* :a
           structure
           (fn [structure]
             (select* even?
                      structure
                      vector)))))

select* is a function polymorphic in the type of its first argument. That means we can think of it as having, in effect, this implementation:

;; Forget about protocol Navigator

(defn path-function-for [this]
  (cond (keyword? this)
        (fn [this structure continuation]
          (continuation (get structure this)))

        (fn? this)
        (fn [this structure continuation]
          (if (this structure)
            (continuation structure)
            nil))))

;; There's now only one version of select*:
(defn select* [this & args]
  (apply (path-function-for this) this args))

But every call to path-function-for is predetermined by the specific path given to predict-select-computation. Given the path [:a even?], we know that the first argument to the first select* is :a, and the first argument to the second select* is even?. So why shouldn’t path-function-for do the computation and save select* the trouble?

That is, instead of building this:

(fn [structure]
  (select* :a
           structure
           (fn [structure]
             (select* even?
                      structure
                      vector)))))

… we want to build something with specific select* functions substituted in:

(fn [structure]
  (  (fn [this structure continuation]
       (continuation (get structure this)))
     :a structure
     (  (fn [this structure continuation]
          (if (this structure)
            (continuation structure)
            nil))
        even? structure
        vector)))

That’s certainly more … cryptic … but is it worth the trouble?

As it turns out, yes. In some extremely crude tests, I saw a speedup of around 2.8X.

The specific details of how Specter goes from a particular path element (like the keyword :a or the function even?) to a raw function aren’t particularly relevant here, and they use some undocumented (but public) clojure.core functions. If you still want to see them, I’ve written a simplified version of Specter’s implementation.

The important thing here is that the protocol and signature of select* haven’t changed, but Specter has still managed to squeeze out a substantial speedup6.

The rest of the book won’t depend on these performance-enhancing details. While performance is important, this is not a book about squeezing maximum performance out of Clojure. I’ve included this section to remind you it’s often good to push runtime work into a “compile time” step7.

ALL

Specter also supplies an ALL path element. Try out these examples to see how it works:

I think of ALL as mapping the rest of the path over each element of the structure it receives, then glueing all the results together. It seems like that could be easily done: just add a select* protocol function to ALL, much as we’ve done on keywords and functions. And indeed it is, once a few formalities are taken care of:

  1. Since protocols are selected by the type of an value, we must arrange for ALL to be of a distinct type (as keywords and functions are). That can be done like this:
     (def ALL (->AllType)) ; `ALL` is the only instance of `AllType`
    
  2. We need to define ALL’s type like this:
     (deftype AllType [])
    
  3. Finally, we need to add the AllType implementation of select*
     (extend-type AllType
       Navigator
       (select* [this structure continuation]
         ...))
    

    (If you’re fluent with protocols, you may wonder why we don’t define select* when we define the type:

     (deftype AllType []
       Navigator
       (select* [this structure continuation]
         ...))
    

    The reason is that the implementation-dependent lookup of specific protocol functions doesn’t work for functions defined that way.)

All that given, …

Here’s my solution:

Specter takes another stab at increasing performance by using reducers for performance. We won’t worry about that in this book.