Table of Contents
A note
Read-Eval-Print-λove is an N-monthly zine of original content and curation about the Lisp family of programming languages and little-languages in general.
Pity
One day a macrolyte approached the master macrologist Pi Mu and quoted to him a poem:
The voices of torrents are from one great LISP dialect, the lions of the hills are the pure body of McCarthy.
Looking expectantly at Pi Mu the macrolyte exclaimed:
Isn’t that right?
Pi Mu looked at the macrolyte and, in a low tone, told him:
It is … But it’s a pity to say so.
Dedication
Introduction - UrLISP
So you want to create a Lisp? In this installment I’ll talk a little about the creation of new Lisps.
The infinitely quotable Erik Naggum had this to say about the desire to reinvent Lisp:1
… yet another demonstration of the problem of too strong an ego that cannot live with what somebody else has made.
However, in the same message Erik Naggum also said this:
Please let each person determine what is the “greater use” of their own energy, as long as it is not harmful.
The phrase above has, for better or worse, become a mantra by which many people present as a justification for doing things that are detrimental to themselves and ultimately others. Indeed, Naggum spoke directly to this condition when he concluded his post with a very poignant diatribe on the rabid drive toward reinvention in IT:
Why is the IT world defined by people who are unable to deal with the concepts of continuity, longevity, and design for long-term stability? Why do they believe they are so much smarter than everything that has gone before them, when they clearly are not and repeat mistakes that take years to undo, if not replaced by another stupid “revolution” that itself starts from scratch?
In this light, how could anyone justify the creation of yet another Lisp? First of all, Naggum spoke from a position that could be summarized as: Common Lisp is the most powerful programming language in the world and allows a programmer absolute freedom in accomplishing any of their goals. While I tend to agree with the spirit of Naggum’s position, I tend to disagree that just because Common Lisp allows you to do anything2 doesn’t mean that you should use it for everything. Sometimes a focused language is a better choice than one that gives you absolute freedom. That said, this installment is not about replacing Common Lisp with any other language. Instead, in this issue I’ll discuss the importance of understanding Lisp, especially its origins and original implementation.
So I now posit the following question: How can we avoid the problem of reinvention?
My answer to this question is simple: through the act of reinvention can we avoid the problem of reinvention.
Let me rephrase the seemingly Discordian statement above. That is, in order to avoid the problem of reinvention we need to fully (or to the best of our abilities) understand what came before. In my opinion, the best way to understand what came before is to reinvent it.
Caveat emptor
The Lisp implementation described in this installment is a snapshot in time. Therefore, the code herein might not match the code at http://www.github.com/fogus/lithp exactly. Indeed, the codebase online is subject to my changing fancies and at times may be a WiP. That said, this essay is a living document, so it too is subject to change. However, this essay is designed to stand alone and any changes to it will work to foster that goal.
Lithp
In late 2008 I had an idea to write a database in Clojure. I had just discovered Clojure and wanted a project to cut my teeth on. At the time, my previous true exposure to Lisp was during my college years and that exposure was constrained to XLISP, Common Lisp and various Schemes. I had hacked away at a bit of ELisp after college, but even that was done in the spirit of immediacy. I knew that I wanted to try Clojure, but I wanted to learn more than just its API. What I wanted was to learn why it existed in the first place.
To get to an understanding of why Clojure was created I decided to start at the beginning. In hindsight that might not have been the most efficient way of coming to an answer. That said, what came out of that exploration3 was that my database turned into a tiny Lisp interpreter named Lithp,4 written in Python.5
The Lithp interpreter provides the absolute core functions of McCarthy’s original ideal, as outlined in his classical paper, Recursive functions of symbolic expressions and their computation by machine, part I. That is, there are only seven functions and two special forms.
The Magnificent Seven
- atom
- car
- cdr
- cond
- cons
- eq
- quote
Two special forms
- label
- lambda
In addition to the Magnificent Seven and the two special forms I’ll also implement a variant of Lithp called Lithp’ using Lithp itself.
Implementation notes
If you’re not familiar with Python then I apologize. I’ve tried to write as clear code as possible, but much of the style is circa 2008-2010 Python. If you’re interested in helping to maintain and/or modernize the Lithp code-base then please do consider sending Github pull-requests to the repository at https://github.com/fogus/lithp.
The code herein is structured and described at the level of individual Python modules and Lisp source. The parts that I will cover are as follows:
- atom.py - Code implementing and dealing with symbols
- env.py - Code implementing dynamic binding and lookup
- fun.py - Code implementing lambda functions
- lisp.py - Code for the maginificent seven
- lithp.py - Code implementing the Lithp REPL and code processor
- reader.py - Code implementing a Lisp reader
- seq.py - Code implementing pseudo-Cons-cells via Python lists
- core.lisp - Lisp in Lithp
I will not go over the entirety of the implementation because the code is both freely available and there are some parts that have nothing to do with Lispiness. That is, the code discussed in this installment are the interesting bits.
With all of that said — let’s begin.
The best place to start learning about Lithp and its implementation of McCarthy’s original vision is probably the env.py source file, which implements Lithp’s dynamic binding environment.
env.py
The Environment
class represents the dynamic environment of McCarthy’s original Lisp. The creation of this class is actually an interesting story. As many of you probably know, Paul Graham wrote a paper and
code for McCarthy’s original Lisp 6 and it was my first exposure to the stark simplicity of the language. The simplicity is breath-taking – not to put too fine a point on it.
However, while playing around with the code I found that in using the core functions (i.e. null.
, not.
, etc.) I was not experiencing the full effect of the original. That is, the original Lisp was dynamically scoped, but the Common Lisp used to implement and run (CLisp in the latter case) Graham’s code was lexically scoped. Therefore, by attempting to write high-level functions using only the Magnificent Seven and Graham’s core functions in the Common Lisp
I was taking advantage of lexical scope; something not available to McCarthy and company. Of course, the whole reason that Graham wrote eval.
was to enforce dynamic scoping (he used a list of symbol-value pairs where the dynamic variables were added to its front when introduced; which I will adopt later):
1
(
eval
.
'
a
'
((
a
1
)
(
a
2
)))
2
;
=>
1
So I then implemented a simple REPL in Common Lisp that fed input into eval.
and maintained the current environment list. That was fun, but I wasn’t sure that I was learning anything at all. Eventually I decided to try to implement my own core environment for the Magnificent Seven to try and get a feel for what it took to build a simple language up from scratch. I suppose if I were a purist then I would have found an IBM 704, but that would be taking this exercise to the extreme. (email me if you have one that you’d like to sell for cheap)
Anyway, minus a 704, I needed to start with creating an Environment
that provided dynamic scoping, and the result is monstrosity 7 described herein.
class Environment: def init(self, par=None, bnd=None): if bnd: self.binds = bnd else: self.binds = {}
1
self
.
parent
=
par
2
3
if
par
:
4
self
.
level
=
self
.
parent
.
level
+
1
5
else:
6
self
.
level
=
0
The code fragment above sets up the dynamic binding environment by establishing a contextual binding map named binds
that is a simple lookup from a Symbol
type (described later) to a Lithp value type (e.g. a Lambda
or Symbol
). In addition, the stack discipline of classical dynamic binding is emulated through the parent
link. The parent link is simply an instance of another Environment
or None
if the constructed environment is considered a root binding context.
This dynamic stack discipline is on full display when one attempts to lookup a binding within an Environment
:
1
def
get
(
self
,
key
)
:
2
if
key
in
self
.
binds
:
3
return
self
.
binds
[
key
]
4
elif
self
.
parent
:
5
return
self
.
parent
.
get
(
key
)
6
else:
7
raise
ValueError
(
"Invalid symbol "
+
key
)
The first check in a dynamic lookup is always to see if a binding exists in the current context. However, ultimately retrieving a binding potentially requires the traversal of the parent link, as shown in the elif
case of the lookup check. You can see that this lookup descends down a chain of Environment
instances until a binding is found, or the search bottoms-out at a root context. This is the simplest possible binding scenario and naturally the one that McCarthy and his team settled on for the initial Lisp interpreter.
Now that you understand the lookup of a dynamic binding, the establishment of a new binding should not be a mystery:
1
def
set
(
self
,
key
,
value
)
:
2
if
key
in
self
.
binds
:
3
self
.
binds
[
key
]
=
value
4
elif
self
.
parent
:
5
self
.
parent
.
set
(
key
,
value
)
6
else:
7
self
.
binds
[
key
]
=
value
That is, the process of creating a new binding is symmetric to the process of finding.
There are or course useful functions that make the process of dealing with Environments more straight-forward:
1
def
definedp
(
self
,
key
)
:
2
if
key
in
self
.
binds
.
keys
()
:
3
return
True
4
return
False
The definedp
method just traverses the binding stack and returns True
or False
if the binding is found, or not, respectively.
Manipulating the dynamic environment stack
You may recall that the constructor of the Environment
class took a preexisting binding and assigned it as the parent
link. This, for all of its faults is a stack. Likewise, the logic of the #put
and #get
methods ensured that these links were traversed when creating and looking up a binding. Rather than manipulating the Environment#parent
links directly via construction, I decided to create a convenience method #push
that returns a new Environment
instance given an existing instance. Perhaps this is unnecessary, but I personally find this approach a little cleaner.
1
def
push
(
self
,
bnd
=
None
)
:
2
return
Environment
(
self
,
bnd
)
Again, whenever a variable is created its binding is pushed onto a stack. In this case, the stack is simulated through a chain of parent links. So if you were to create the following:
1
(
label
a
nil
)
2
(
label
frobnicate
(
lambda
()
(
cons
a
nil
)))
3
4
((
lambda
(
a
)
5
(
frobnicate
))
6
(
quote
x
))
Then the stack would look like the figure below within the body of frobnicate
:
1
|
|
2
|
|
3
|
a
=
'
x
|
4
|
-------
|
5
|
a
=
nil
|
6
+---------+
Meaning that when accessing a
, frobnicate
will get the binding at the top of the stack, producing the result (x)
. This push/pop can become difficult, so people have to do all kinds of tricks to avoid confusion (i.e. pseudo-namespace via variable naming schemes — if you’ve ever written classic ELisp, then maybe you’ll remember this approach).
Of course, any time you see a push, a pop is soon to follow:
1
def
pop
(
self
)
:
2
return
self
.
parent
And that’s it for the Environment
class. Looking back on this code it astonishes me how simple the dynamic environment scheme is — elegant almost. In an attempt to maintain this train of thought, I’d like to explore how the dynamic environment interacts with Lisp functions, both anonymous and not.
fun.py
As you might have imagined, McCarthy’s Lisp derives much of its power from the function. The Function
class is used exclusively for built-in functions (i.e. the Magnificent Seven). Each core function is implemented as a regular Python method, each taking an Environment
and its arguments.
1
class
Function
(
Eval
)
:
2
def
__init__
(
self
,
fn
)
:
3
self
.
fn
=
fn
4
5
def
eval
(
self
,
env
,
args
)
:
6
return
self
.
fn
(
env
,
args
)
The implementation of the default function class is as simple as can be. That is, I’m cheating a bit and delegating the functionality of execution out to a Python function or lambda. However, bear in mind that this is just the tip of the iceberg. Lisp really gets interesting.
The real power of McCarthy’s Lisp springs from the influence of Alonzo Church’s λ-calculus. While the original Lisp implementation was not entirely adherent to the dictates of the λ-calculus, the spirit of Church was in attendance. First, the initializer of the Lambda
class is trivial:
1
class
Lambda
(
Eval
)
:
2
def
__init__
(
self
,
n
,
b
)
:
3
self
.
names
=
n
4
self
.
body
=
b
Interestingly, every invocation of a lambda causes its bound variables to be pushed onto the dynamic bindings stack. McCarthy only touches briefly on the idea that combining functions built from lambda is problematic. In almost a throw-away sentence he states, “different bound variables may be represented by the same symbol. This is called collision of bound variables.” If you take the time to explore core.lisp (described later herein) then you will see what this means in practice. The Elisp-like named variables should provide a hint.
Of course, the reason for these difficulties is a direct result of dynamic scoping. McCarthy suggests that a way to avoid these issues is to use point-free combinators to eliminate the need for variables entirely. This approach is a book unto itself – which is likely the reason that McCarthy skips it. That said,
1
def
push_bindings
(
self
,
context
,
values
)
:
2
context
.
push
()
3
self
.
set_bindings
(
context
,
values
)
The bindings are set one by one corresponding to the input values.
1
def
set_bindings
(
self
,
context
,
values
)
:
2
for
i
in
range
(
len
(
values
))
:
3
val
=
values
[
i
].
eval
(
context
.
environment
)
4
context
.
environment
.
binds
[
self
.
names
[
i
].
data
]
=
val
The evaluation of a Lambda
instance is not much more complicated than a built-in function, except that it will establish bindings in the root context. Additionally, the root context will hold all bindings, so free variables will also be in play.
1
def
eval
(
self
,
env
,
args
)
:
2
values
=
[
a
for
a
in
args
]
3
4
if
len
(
values
)
!=
len
(
self
.
names
)
:
5
raise
ValueError
(
"""Wrong number of args,
6
expected
{
0
},
got
{
1
}
"""
7
.
format
(
len
(
self
.
names
),
len
(
args
)))
OK, so I throw an exception here… sue me. It’s arg-arity checking, not rocket science – but I digress. Dynamic scope requires that names be bound on the global environment stack:
1
LITHP
=
env
.
get
(
"__lithp__"
)
First exceptions, now a big honking global variable. Not only that, but this is the global to usurp all other globals. That is, the dictionary stored at env.get("__lithp__")
contains all of the top-level bindings for the entire Lithp system. While horrifying in its abruptness, the global dictionary is a fairly straight-forward way to represent the same global dynamic environment found in the original Lisp implementation. That being the case and since I have a handle to the global stack, I just go ahead and push some bindings on to it:
1
self
.
push_bindings
(
LITHP
,
values
)
If you’re like me when reading this for the first time you might be thinking “OMG” or maybe “ZOMG.” That reaction is pure and wise and never lose that sense of danger. Having said that, it’s important to know that when Lisp was originally created no one knew about any of the SOLID principles (not even McCarthy) nor had Wulf and Shaw penned their perennial classic Global Variables Considered Harmful. Self-referential programming system had to be created by gum, and they weren’t going to sit around and write themselves (yet).
I’ve pushed all of the bindings onto the global dynamic stack, so that they’re accessible to the body of the lambda, so why not go ahead and use them? I will. In fact, each form in the body is evaluated one by one, and the last determines the return value:
1
ret
=
FALSE
2
for
form
in
self
.
body
:
3
ret
=
form
.
eval
(
LITHP
.
environment
)
The evaluation context for the lambda body is of course the global, dynamic context. Therefore, all applications occur at the discretion of the ever changing dynamic bindings. However, it’s not enough to just willy-nilly push bindings onto the dynamic context; I also must remove exhausted bindings when I’m done with them. In the case of the lambda, the exhausted bindings are simply those that were set as the result of passing values as arguments:
1
LITHP
.
pop
()
2
return
ret
And that effectively cleans up the lambda execution context.
Closures in brief
McCarthy’s Lisp does not define closures and in fact the presence of closures in the context of a pervasive dynamic scope is problematic. However, this fact was academic to me and didn’t really map conceptually to anything that I had experienced in the normal course of my programming life. Therefore, I added closures to see what would happen. It turns out that if you thought that bound variables caused issues then your head will explode to find out what closures do.
As a thought-experiment I can visualize why this is the case. That is, while a closure captures the contextual binding of a variable, look-ups in dynamic scoping occur on the dynamic stack. This means that you may be able to close over a variable as long as it’s unique, but the moment someone else defines a variable of the same name and attempt to look up the closed variable will resolve to the top-most binding on the dynamic stack. This assumes the the lookup occurs before the variable of the same name is popped.
Function dispatch in the context of a dynamic environment swings from elegant to, frankly, weird. The true elegance of Lisp is yet to come. One such piece of elegance is next in the idea of a “cons cell” and what I like to call “The Magnificent Seven.”
lisp.py
As mentioned earlier, the original Lisp described by McCarthy in his 1960 paper describes the following function set:
atom
car
cdr
cond
cons
eq
quote
Plus two special forms:
label
lambda
The Lisp
class defines the Magnificent Seven in terms of the run-time environment built thus far (i.e. dynamic scope, lambda, environments, and symbols, which are not detailed herein) and aggregated in a Lisp
class starting thus:
class Lisp:
The Magnificent Seven are thus implemented as methods on this class.
eq
Equality is delegated out to the objects being tested, so I will not discuss the mechanics here.
However, examples of usage are as follows:
1
(
eq
nil
(
quote
()))
2
;
=>
t
3
4
(
eq
(
quote
(
a
b
))
(
quote
(
a
b
)))
5
;
=>
t
6
7
(
eq
(
quote
a
)
(
quote
b
))
8
;
=>
()
The most expedient way to achieve this kind of equality was to simply delegate it to Python:
1
def
eq
(
self
,
env
,
args
)
:
2
if
args
[
0
].
eval
(
env
)
==
args
[
1
].
eval
(
env
)
:
3
return
TRUE
4
return
FALSE
Re-evaluating on each equality check is less than ideal, but I’m not writing this on an IBM 704, so what me worry? Sure, eq
is cool and all and while implementing programming languages equality is non-trivial I’m not going to say much on the subject here. However, I do have more to say on the matter of Lisp’s original data abstraction – the cons cell.
The humble cons cell
The original Lisp implementation was written for the IBM 704 by Steve Russell (a genius of the highest order – also the creator/discoverer of Spacewar! and continuations). The somewhat obtuse name for a function that returns the first element of an s-expression derives from the idiosyncrasies of the IBM 704 on which Lisp was first implemented. The car
function was thus a shortening of the term “Contents of the Address part of Register number” that in itself has a very interesting explanation. That is, car
was used to refer to the first half of the word-size addressed by the IBM 704. In this particular machine (and many others at that time and since) the word-size could address more than twice of the actual physical memory. Taking this particular nuance of the IBM 704 into account, programmers were able to efficiently create stacks by using the address of the stack’s top in one half-word and the negative of the allocated size in the other (the “Contents of Decrement part of Register number”), like so:
1
+----------+----------+
2
|
top
|
-
size
|
3
+----------+----------+
4
|
|
size
goes
toward
zero
5
|
|
|
6
|
|
|
7
|
|
v
8
|
|
|
|
9
4
|
|
|
|
10
|
V
|
|
11
3
|
elem3
|
|
12
|
|
|
^
13
2
|
elem2
|
|
|
14
|
|
|
|
15
1
|
elem1
|<-----+
stack
grows
up
16
|
|
17
0
|
elem0
|
18
+--------+
Whenever something was pushed onto the stack the number 1
was added to both half-words. If the decrement part of the word became zero then that signaled a stack-overflow, that was checked on each push or pop instruction. However, the use of the car/cdr half-words was used quite differently (McCarthy 1962). That is, The contents part contained a pointer to the memory location of the actual cons cell (see the documentation for the next function cdr
for more information) element, and the decrement part contained a pointer to the next cell:
1
+----------+----------+
+----------+----------+
2
|
car
|
cdr
|--->|
car
|
cdr
|
...
3
+----------+----------+
+----------+----------+
The Lisp garbage collector used this structure to facilitate garbage collection by marking referenced chains of cells as negative (sign bit), thus causing them to be ignored when performing memory reclamation.
car
That said, the car
function works as follows:
1
(
car
(
quote
(
a
b
c
)))
2
;
=>
a
In Lithp, the car of an empty list is an error. I actually have no idea if this was the case in McCarthy’s original, but I’m beyond corrupted by modern software development practices that I can’t even see straight anymore:
1
def
car
(
self
,
env
,
args
)
:
2
if
(
len
(
args
)
>
1
)
:
3
raise
ValueError
(
"""Wrong number of arguments,
4
expected
{
0
},
got
{
1
}
"""
5
.
format
(
1
,
len
(
args
)))
Of course, I don’t use pointer arithmetic to implement cons cells…
1
cell
=
args
[
0
].
eval
(
env
)
…instead I define it in terms of a “sequence abstraction” (not shown herein). This is a side-effect of a hope in going further with this implementation (e.g. into Linear Lisp), but as of now it’s a bit heavy-weight for what is actually needed.
Interestingly, there are real implications to my design decision over McCarthy’s. That is, the Lithp sequences delegate insertion to the Python List
implementation which internally is built on arrays (CPython anyway). The arrays backing the List
s are geometrically re-sized when new elements are added and thanks to some simple pointer arithmetic elements can be added to the end of a list simply. Therefore, inserting new elements to the head of a Python List
is an amortized O(1) operation. However, McCarthy’s cons cell design and implementation guaranteed that insertions at the head of a list were O(1) operations. That is, the very nature of a cons cell was such that it was composed of two parts: a data part (CAR) and a link part (CDR) which were just pointers. As I hinted at earlier, the registers of the IBM 704 were twice the word size, so both the CAR and CDR parts fit into one register, split into “address” and “decrement” parts as mentioned above.
The data part was the thing actually contained in the cell and the link part pointed to (potentially) another cons cell. Therefore, to add an element to the head of a list required a cons cell allocation and a link manipulation that made it point to the head element of the host list. However, in the Lithp implementation adding to the head of a List not only requires an occasional array size adjustment, but it also requires that elements in the existing array be shifted down the array one at a time to make room for the new element – an O(n) operation.
But I wouldn’t be a programmer if I didn’t needlessly abstract:
1
if
not
isinstance
(
cell
,
Seq
)
:
2
raise
ValueError
(
"""Function not valid on
3
non
-
sequence
type
.
""")
4
5
return
cell
.
car
()
The internal sequence type used by Lithp is a useful tool, but it loses much of the elegance found in the original cons cell abstraction.
cdr
In the previous function definition (car
) I used the term cons-cell to describe the primitive structure underlying a Lisp list. If you allow me, let me spend a few moments describing this elegant structure, and why it’s such an important abstract data type (ADT).
Lisp from the beginning was built with the philosophy that lists should be a first-class citizen of the language; not only in the realm of execution, but also generation and manipulation. If you look at my implementation of List
in seq.py you’ll notice that it’s pretty standard fare. That is, like most list implementations, Lithp’s is backed by a boring sequential store where one element conceptually points to the next and blah blah blah. Boring. Where the cons-cell shines is that it is a very general purpose ADT that can be used in a number of ways, but primary among them is the ability to represent the list.
Lists in the early Lisp was precisely a chain of cons cells and the operators car
and cdr
pointed to very specific implementation details that over time became generalized to mean “the first thing” and “the rest of the things” respectively. But the fact remains that the cons cell solves a problem that is often difficult to do properly. That is, how could Lisp represent a container that solved a number of requirements:
- Represents a list - car is piece of data, cdr is another cons cell
- Represents a pair - car is a piece of data, cdr is a piece of data
- Represents a terminator - car is the cell itself, cdr is the cell itself
- Implementation efficiency - O(1) adds to the front
- Heterogeneous - datum is wrapped in a generic container type
It would be interesting to learn the precise genesis of the idea behind the cons cell, as I imagine that it must have provoked a eureka moment.
I’ve already discussed how the IBM 704 hardware was especially amenable to solving this problem efficiently, but the other points bear further consideration. Lisp popularly stands for “LISt Processing language” but as I explained, the basic unit of data was instead the cons cell structure. The fact of the matter is that the cons cell serves as both the implementation detail for lists and the abstraction of a pair, all named oddly as if the implementation mattered. If Lisp had originally gone whole-hog into the abstraction game, then car
and cdr
would have been first
and rest
and would have spared the world decades of whining.
Modern implementations of Lisp rarely implement lists as chains of cons cells anymore. Indeed, you can probably tell a lot about the level of knowledge for a Lisp programmer by the way that they construct and access lists. For example, a programmer like myself whose exposure to Common Lisp has been entirely academic, you will probably see a propensity toward the use of car
and cdr
instead of leveraging the more expressive sequence abstractions – but I digress.
The cdr
function works as follows:
1
(
cdr
(
quote
(
a
b
c
)))
;⇒ (b c)
1
(
cdr
(
quote
()))
;⇒ ()
The implementation of cdr
is as follows:
1
def
cdr
(
self
,
env
,
args
)
:
2
if
(
len
(
args
)
>
1
)
:
3
raise
ValueError
(
"""Wrong number of arguments,
4
expected
{
0
},
got
{
1
}
"""
5
.
format
(
1
,
len
(
args
)))
6
7
cell
=
args
[
0
].
eval
(
env
)
8
9
if
not
isinstance
(
cell
,
Seq
)
:
10
raise
ValueError
(
"""Function not valid on
11
non
-
sequence
type
.
""")
12
13
return
cell
.
cdr
()
There’s a lot of checking in this implementation, but the gist is that I eval the cons cell and take the tail of it.
As I mentioned in the previous section, by using the sequence type as the base abstraction I lost much of the elegance in the original cons cell design. The cdr
implementation makes this loss more clear. That is, while taking the cdr of a traditional cons cell is simply a matter of returning a new pointer to the link part of a cell, the Lithp implementation requires that a new List
instance be created with a copy of the contents of the original, minus the first element.
cons
So if Common Lisp has a more general sequence abstraction, then why would we still want to keep the cons cell? The reason is that the cons cell is more flexible than a sequence and allows for a more intuitive way to build things like trees, pairs, and to represent code structure. Indeed, it’s the last point that allows the fulfillment of one of John McCarthy’s chief characteristics of what defines Lispiness:
One of the major characteristics of Lisp is that Lisp programs are Lisp data and one can compute with Lisp programs present in the machine without parsing.
– JMC, 1998.02.08 on comp.emacs and comp.emacs.xemacs
Indeed, later in this installment I’ll build Lithp (and its hosted language Lithp’) programs directly using simple Lithp data functions and structures.
For now, the cons
function simply delegates the matter of consing to the target object.
The cons
function works as follows:
1
(
cons
(
quote
a
)
nil
)
2
;
=>
(
a
)
3
4
(
cons
(
quote
a
)
(
quote
(
b
c
)))
5
;
=>
(
a
b
c
)
6
7
(
cons
(
quote
a
)
(
quote
b
))
8
;
=>
Error
I’ve agonized long and hard over whether or not to implement McCarthy Lisp as the language described in Recursive functions… as the anecdotal version only partially described in the LISP 1.5 Programmer’s Manual and in most cases the former was my choice. The creation of “dotted pairs” (I believe) was not an aspect of the original description and therefore is not represented in Lithp. Sadly, I think that in some cases these version are mixed because I originally went down the path of creating a version of Lithp compatible with Linear Lisp and Lisp 1.5, so this is a product of some pollution in the varying ideas. That said, the implementation of cons
is as follows:
1
def
cons
(
self
,
env
,
args
)
:
2
if
(
len
(
args
)
>
2
)
:
3
raise
ValueError
(
"""Wrong number of arguments,
4
expected
{
0
},
got
{
1
}
"""
5
.
format
(
2
,
len
(
args
)))
6
7
first
=
args
[
0
].
eval
(
env
)
8
second
=
args
[
1
].
eval
(
env
)
9
10
return
second
.
cons
(
first
)
The eager nature of cons
is apparent in that both the added element and the target are evaluated before the operation is performed.
The cons cell is an extremely powerful ADT, but it’s not the only kind of data type that was available in the original Lisp. While its supported structures were humble, they were added to support a certain paradigm: symbolic programming.
atom
In addition to the aggregating cons cell, Lisp provided additional atomic types that were singularly representative; that is, they could not be taken apart. The bridge between these two data classes, aggregating and atomic, was the atom
function that was used to identify atomic types.
The atom
function checks if its argument is an atom and returns a truthy value if it is. One thing to note is that the empty list ()
is considered an atom because it cannot be deconstructed further.
The atom
function works as follows:
1
(
atom
(
quote
a
))
2
;
=>
t
3
4
(
atom
nil
)
5
;
=>
t
6
7
(
atom
(
quote
(
a
b
c
)))
8
;
=>
()
Note that the null cons cell (i.e. empty list) is a falsey value in Lithp and the original Lisp.
1
def
atom
(
self
,
env
,
args
)
:
2
if
(
len
(
args
)
>
1
)
:
3
raise
ValueError
(
"""Wrong number of arguments,
4
expected
{
0
},
got
{
1
}
"""
5
.
format
(
1
,
len
(
args
)))
6
7
first
=
args
[
0
].
eval
(
env
)
8
9
if
first
==
FALSE
:
10
return
TRUE
11
elif
isinstance
(
first
,
Symbol
)
:
12
return
TRUE
13
14
return
FALSE
Again, atom
is another eager function in that it must evaluate its argument to determine if it’s an atomic data structure. However, the last of the Magnificent Seven, label
disables evaluation in a way that’s in the spirit of cond
, but that operates towards a very different purpose. But first, I’ll cover a function called quote
that provides a way to avoid evaluation altogether.
quote
The quote
built-in does one thing – it returns exactly what was given to it without evaluation:
1
(
quote
a
)
2
;
=>
a
3
4
(
quote
(
car
(
quote
(
a
b
c
))))
5
;
=>
(
car
(
quote
(
a
b
c
)))
Of course, the result of quote
can itself be evaluated explicitly:
1
(
eval
(
quote
(
car
(
quote
(
a
b
c
))))
(
quote
()))
2
;
=>
a
The implementation of quote
couldn’t be more simple:
1
def
quote
(
self
,
env
,
args
)
:
2
return
args
[
0
]
The quote
, function, while interesting in its own right, is mechanical to implement and directly supported by the Lithp infrastructure built so far. However, a more interesting … thing … I’ll not call it a function, is called label
.
label
Notice that in the following implementation of label
the first argument (hopefully a symbol) is not evaluated. This is the key difference between a Lisp function and a special form (and macro, but I will not talk about those here). That is, in all cases the arguments to a function are evaluated from left to right before being passed into the function. Conversely, special forms have special semantics for evaluation that cannot be directly emulated or implemented using functions.
1
def
label
(
self
,
env
,
args
)
:
2
if
(
len
(
args
)
!=
2
)
:
3
raise
ValueError
(
"""Wrong number of arguments,
4
expected
{
0
},
got
{
1
}
"""
5
.
format
(
2
,
len
(
args
)))
6
7
env
.
set
(
args
[
0
].
data
,
args
[
1
].
eval
(
env
))
8
return
env
.
get
(
args
[
0
].
data
)
Finally, the last of the Magnificent Seven is another lazy operator named cond
.8
cond
Did you know that McCarthy discovered conditionals? This is only partially true. That is, Stephen Kleene defined the notion of a primitive recursive function and McCarthy built on that by defining the conditional as a way to simplify the definition of recursive functions.
How would you define a recursive function without the use of a conditional in the terminating condition? It turns out that you can define recursive functions this way (see fixed point combinators), but the use of the conditional form cond
vastly simplifies the matter.
We take conditionals for granted these days so it’s difficult to imagine writing programs that were not able to use them, or used a subset of their functionality.
The cond
form is used as follows in the original Lisp:
1
(
cond
((
atom
(
quote
(
a
b
)))
(
quote
foo
))
2
((
atom
(
quote
a
))
(
quote
bar
))
3
(
t
(
quote
baz
)))
4
5
;
=>
bar
The implementation is as follows:
1
def
cond
(
self
,
env
,
args
)
:
2
for
test
in
args
:
3
result
=
test
.
car
().
eval
(
env
)
4
5
if
result
==
TRUE
:
6
return
test
.
data
[
1
].
eval
(
env
)
7
8
return
FALSE
Each head (car
) of the cond
clauses is tested for truthiness, and as soon as one is such the tail of the corresponding clause is evaluated. Note that the TRUE
and FALSE
values are set to Symbol("t")
and List()
within Lithp. As you might surmise then, the cond
form is lazy in its evaluation. This kind of lazy evaluation is not found in the original λ-calculus, so this is a major departure by McCarthy. You see, even in the early days of computing there were thoughtful trade-offs between practicality and abstractness… though, now that I think about it, McCarthy was operating in a purely academic environment when developing Lisp, so it’s safe to say that the original was in no way appropriate for modern web apps.
In any case, you might have noticed that I skipped over one of the maginificent seven, specifically the lambda
special form. While I talked about the underlying class Lambda
, there’s a little more work to properly create instances of it, which I’ll talk about in the next section.
Lithpiness
In this the section of my Lisp implementation I’ll cover various bits and pieces that pull all of the core pieces together into an actual language run-time. That is, this section describes briefly the implementation of the REPL and the lambda
special form.
First I’ll start with the Lithp
class, which is the interpreter driver, performing the following:
- Initialize the global environment
- Parse the cl arguments and act on them as appropriate
- Initialize the base Lisp functions
- Read input
- Evaluate
- Loop back to #4
Numbers 4-6 above are the steps involved in the implementation of the Lithp REPL – Read, Eval, Print, Loop. First, allow me to talk about the act of reading.
Reader
The implementation of the Lithp Reader
class is not exactly thrilling. You have to really like the art of lexing to find that kind of stuff interesting. I wrote the thing and I’m only mildly interested in it. What I’m trying to say is that it’s a mess. However, like most Lisp readers, the Lithp reader is much more interesting in its behavior. Within the Lithp interpreter, the reader is created as such:
1
rdr
=
reader
.
Reader
()
2
environment
=
Environment
()
The environment
variable is a stand-in for the Lithp global environment and is merely used for illustrating evaluation principles herein. The first principle that I’d like to discuss is that the Reader
class takes in strings and returns Lithp data structures:
1
sexpr
=
rdr
.
get_sexpr
(
"a"
)
2
sexpr
.
__class__
3
#=> <class atom.Symbol at 0x10b7bd050>
The code above shows that the string "a"
will be read as a Lithp Symbol
instance. This is nothing terribly ground breaking in the world of Lisp, but there is a point that I’d like to make about it. Very often you’ll hear or read that this language or that language has a REPL; however, very few languages have REPLs where the R stands for a reader such as what you’ll find in Lisp. Lisp-like readers take in a textual representation of a piece of code and return a Lisp data structure. What’s so interesting about that. Indeed, if you run something like eval(":foo")
in Ruby you’ll get back a symbol instance correct? The difference between Lisp and many other languages is that the act of converting strings to types is separate from the act of evaluating them.
In many languages there is no reading step at all. Instead, languages tend to have lexing and parsing phases that output special types used only by the machinery of the lexer and parser themselves. Very rarely will you see a programming language that allows access to the lexing and parsing machinery. In fact, even if they did it’s unlikely that the structures returned would be ones that you could manipulate within the context of the language themselves. Most Lisps on the other hand use core structures like symbols, cons cells, etc. as the input to the parsing and/or evaluation phases. This is a key point to understand because it means that if you wish to write programs that manipulate Lisp programs all that you need is Lisp. This fact enables the power of Lisp macro systems. That is, the macro processing system is not hidden from view nor does it provide manipulation via a special subset or super-set of the host language, it’s just Lisp.
Keeping on the path of discussing the REPL, I’ll move on to the Evaluation part. However, before I do that it’s worth noting that all evaluation, for any language really, takes place in terms of a context. In Lithp the evaluation context is the global, dynamic environment, simulated below with environment
:
1
environment
.
set
(
"a"
,
atom
.
Symbol
(
"value"
))
2
environment
.
get
(
atom
.
Symbol
(
"a"
))
3
#=> value
The code above simply sets the value of the string "a"
to a symbol instance corresponding to value
. This establishes a primed context that will allow me to evaluate sexpr
from above:
1
sexpr
.
eval
(
environment
)
2
#=> value
As shown above, evaluating sexpr
in the context of the environment
returns the value that the symbol a
(as read earlier) returns value
. This is the base case in Lithp, but structures can be read just as easy:
1
sexpr
=
rdr
.
get_sexpr
(
"(car (quote (one two three)))"
)
The data type returned from the reader above is thus:
1
sexpr
.
__class__
2
#=> <class seq.List at 0x10b75bef0>
The type returned corresponds to Lithp’s notion of the cons-cell, List
, which if you take a look you’ll notice is not a cons-cell in the strictest sense. I did this not only to leverage the Python language capabilities, but I also did it to illustrate a point later on when I talk about the implementation of core.lisp. That said, it will act like a cons-cell for the purposes of implementation:
1
sexpr
.
car
()
2
#=> car
The car
of the s-expression above is the symbol car
:
1
sexpr
.
car
().
__class__
2
#=> <class atom.Symbol at 0x10b7bd050>
The cdr
of the same s-expression is a little more interesting:
1
sexpr
.
cdr
()
2
#=> ((quote (one two three)))
The cdr
is of course another List
instance representing the “rest” of the original s-expression:
1
sexpr
.
cdr
().
__class__
2
#=> <class seq.List at 0x10b75bef0>
So what happens if I try to evaluate the read s-expression? Observe:
1
sexpr
.
eval
(
environment
)
2
# ValueError: Invalid symbol car
As I mentioned, all evaluations take place within the context of an environment. The environment
instance created above currently only has a single binding a
->value
which is insufficient for Lithp to make sense of the s-expression supplied. As shown, it’s complaining about the car
symbol which means that there is no binding for it. In the absence of a populated environment, the core Lithp interpreter only knows a few things:
- Evaluation semantics (i.e. bindings lookup and argument evaluation)
- Cons-cell structure
- …and that’s it
To get any power out of Lithp requires that the evaluation context be populated with important bindings:
1
environment
.
set
(
"car"
,
Function
(
lithp
.
car
))
The lithp.car
method is the base implementation of the Lispy car
function. It’s wrapped in an instance of the Function
class which takes care of the binding lookup and evaluation logic. The Lithp REPL, during start-up, will load the global environment with the Magnificent Seven function and the two special forms. Of course, simply loading lithp.car
is not enough:
1
sexpr
.
eval
(
environment
)
2
# ValueError: Invalid symbol quote
I also need the quote
binding:
1
environment
.
set
(
"quote"
,
Function
(
lithp
.
quote
))
Now, I have a minimal environment needed to evaluate the read s-expression:
1
sexpr
.
eval
(
environment
)
2
#=> one
The Lithp interpreter, will fully populate the core environment as such:
1
self
.
environment
.
set
(
"eq"
,
Function
(
self
.
eq
))
2
self
.
environment
.
set
(
"quote"
,
Function
(
self
.
quote
))
3
self
.
environment
.
set
(
"car"
,
Function
(
self
.
car
))
4
self
.
environment
.
set
(
"cdr"
,
Function
(
self
.
cdr
))
5
self
.
environment
.
set
(
"cons"
,
Function
(
self
.
cons
))
6
self
.
environment
.
set
(
"atom"
,
Function
(
self
.
atom
))
7
self
.
environment
.
set
(
"cond"
,
Function
(
self
.
cond
))
8
9
self
.
environment
.
set
(
"lambda"
,
Function
(
self
.
lambda_
))
10
self
.
environment
.
set
(
"label"
,
Function
(
self
.
label
))
The rest of the REPL implementation is less interesting and mostly has to do with loading the core.lisp file, looping until full s-expressions are read, delegating evaluation to data instances, and the like. Now that a fully operational Lithp core is in place, I can now start showing the implementation of the core.lisp file, which contains some other interesting functions, including a Lisp interpreter!
Now the code described herein is not a lot of code at all. Indeed, Python is a nice language to serve as a host for a language like Lithp because it provides the essentials for hosting functionality directly in its core. It was a pleasure to work in Python, a language that I used in anger at the time, to discover for myself the stark simplicity of Lisp, a language that I would later use in my day job… so to speak.
The following core.lisp file will, unlike Lithp on Python, be implemented on Lithp which is also a language, like Python, that provides the essentials for hosting functionality directly in its core. The interpreter will almost exact implement McCarthy’s original as described below:
1
eval
[
e
;
a
]
=
[
2
atom
[
e
]
→
assoc
[
e
;
a
];
3
atom
[
car
[
e
]]
→
[
4
eq
[
car
[
e
];
QUOTE
]
→
cadr
[
e
];
5
eq
[
car
[
e
];
ATOM
]
→
atom
[
eval
[
cadr
[
e
];
a
]];
6
eq
[
car
[
e
];
EQ
]
→
[
eval
[
cadr
[
e
];
a
]
=
eval
[
caddr
[
e
];
a
]];
7
eq
[
car
[
e
];
COND
]
→
evcon
[
cdr
[
e
];
a
];
8
eq
[
car
[
e
];
CAR
]
→
car
[
eval
[
cadr
[
e
];
a
]];
9
eq
[
car
[
e
];
CDR
]
→
cdr
[
eval
[
cadr
[
e
];
a
]];
10
eq
[
car
[
e
];
CONS
]
→
cons
[
eval
[
cadr
[
e
];
a
];
11
eval
[
caddr
[
e
];
a
]];
12
T
→
eval
[
cons
[
assoc
[
car
[
e
];
a
];
evlis
[
cdr
[
e
];
a
]];
a
]];
13
eq
[
caar
[
e
];
LABEL
]
→
eval
[
cons
[
caddar
[
e
];
cdr
[
e
]];
14
cons
[
list
[
cadar
[
e
];
car
[
e
];
a
]];
15
eq
[
caar
[
e
];
LAMBDA
]
→
eval
[
caddar
[
e
];
16
append
[
pair
[
cadar
[
e
];
evlis
[
cdr
[
e
];
a
];
a
]]]
So on with it then.
core.lisp
With only the Magnificent Seven and the two special forms, one can implement many useful functions and values. Two interesting values are that of logical true and false. Historically, these values were bound to the identity t
and nil
as the empty list ()
respectively, which I do at the top of core.lisp:
1
(
label
t
(
quote
t
))
2
(
label
nil
(
quote
()))
Checking these values you’ll see:
1
t
2
;;
=>
t
3
4
nil
5
;;
=>
()
Now that those symbols are bound, I can use them in functions as truthiness. One such function is null
that takes something and return t
or ()
if it’s an empty list:
1
(
label
null
(
lambda
(
null_x
)
2
(
eq
null_x
nil
)))
The implementation of null
is really where the lollipop that is Lithp starts licking itself. That is, not only is it built on the core capabilities label
, lambda
, and eq
, but it also uses the previously defined nil
. This is nothing truly magical, but it hints at something, dare I say, breathtaking – Lithp itself can implement itself – I’ll call it Lithp’. Surely something like that will take just as many lines as the Python implementation right? Well, let’s see.
The Lithp’ dynamic environment
The first thing that Lithp’ needs is an environment. I started by describing the Lithp environment, so I might as well start with the same in Lithp’. However, to implement environments in Lithp I was able to leverage the host language Python’s classes to do so. However, since Lithp’ is built on Lithp there are no such language constructs as classes and methods and the like. Instead, all that Lithp has at its disposal are cons cells. Thankfully, and as I mentioned earlier, the cons cell has incredibly robust abstractive properties. Indeed, using the cons cell idea as the basis for a tuple structure is fairly straight-forwardly implemented as such:
1
(
label
pair
(
lambda
(
x
y
)
2
(
cons
x
(
cons
y
nil
))
That is, a tuple in Lithp’ is just a pair of cons operations with a tail corresponding to nil
.9 These pairs can be used as the basis for an environment where its entries consist of symbols bound to values:
1
(
pair
(
quote
a
)
(
quote
value
))
2
;;
=>
(
a
value
)
Just based on this, we can build up an associative structure that maintains the spirit of the Environment
class:
1
(
label
env
nil
)
2
;;
=>
()
3
4
(
cons
(
pair
(
quote
a
)
(
quote
value
))
env
)
5
;;
=>
((
a
value
))
Do you see an environment in there? If not, maybe showing how cons
ing to the return of cons will help to illustrate:
1
(
cons
(
pair
(
quote
b
)
(
quote
valueb
))
2
(
cons
(
pair
(
quote
a
)
(
quote
value
))
env
))
3
4
;;
=>
((
b
valueb
)
(
a
value
))
The value above represents an environment where there are two bindings:
- The symbol
a
is bound tovalue
- The symbol
b
is bound tovalueb
If I want to find the value of a binding all I would need to do is write a function that performs a lookup by walking the structure above and returns the first value bound to a given symbol. Returning the first found binding is critical because I want to emulate the same evaluation semantics as Lithp. That is, Lithp’s implementation is such that bindings can override older bindings via the Environment.parent
chain. I can do the same by dictating that newer bindings appear closer to the front of the Lithp’ environment list:
1
(
label
env
+
2
(
cons
(
pair
(
quote
a
)
(
quote
new
-
a
-
value
))
3
(
cons
(
pair
(
quote
b
)
(
quote
valueb
))
4
(
cons
(
pair
(
quote
a
)
(
quote
value
))
env
))))
5
6
;;
=>
((
a
new
-
a
-
value
)
(
b
valueb
)
(
a
value
))
This structure is in every way a stack! To operate on this stack would be as easy as using cons
as a push operation and cdr
as a popping operation, which I’ll do later. However, what about that “lookup” function that I mentioned earlier? Observe:
1
(
label
lookup
(
lambda
(
name
context
)
2
(
cond
((
eq
(
car
(
car
context
))
name
)
(
car
(
cdr
(
car
context
))))
3
(
t
(
lookup
name
(
cdr
context
))))))
The first part of the cond
in the lookup
function merely says that if the binding name at the head of the first pair (i.e. the car
of the car
) equals the name
then return the binding value in that first pair (i.e. the car
of the cdr
of the car
). The default case of the cond
simply says to pop (i.e. cdr
) the first binding pair and retry the lookup with the new environment. You can see lookup
in action below:
1
(
lookup
(
quote
b
)
env
+
)
2
;;
=>
valueb
3
4
(
lookup
(
quote
a
)
env
+
)
5
;;
=>
new
-
a
-
value
Indeed, the stack nature of the dynamic lookup of Lithp is preserved by Lithp’.
In case you were wondering, all of these operations are non-destructive:
1
env
2
;;
=>
()
That is, the value of env
was never changed neither as a result of cons
ing new stuff, nor by taking the cdr
. I only highlight this point because it will become important in the implementation of the Lithp’ evaluation function. That is, when you’re dealing with immutable values, you’ll need ways to manage state changes. One straight-forward way available to Lithp is the label
special form, but a better way is via the call stack. If you’ll notice, in the implementation of lookup
I used the call stack to manage the decreasingly smaller versions of the context
as binding pairs were popped off in search of a name match.
However, for the moment I can use pair
to build up a seed environment for quick use:
1
(
label
env
'
(
pair
(
pair
(
quote
t
)
(
quote
t
))
2
(
pair
(
quote
nil
)
nil
)))
3
4
;;
=>
((
t
t
)
(
nil
()))
That is, just like Lithp, Lithp’ can start with the true and false values. Now that the environment is out of the way, I can start implementing the Lithp’ versions of the Magnificent Seven.
The Magnificent Seven’
Just as I did with Lithp, so too shall I implement the Magnificent Seven in Lithp’; it seems like a good fit. Therefore, I’ll now perform a round-about and once again talk about how I implemented the core functions on top of the Lithp interpreter.
quote’
Perhaps the simplest way to illustrate the modus operandi of the Lithp’ implementation is to show how a function quote'
works. The implementation is as follows:
1
(
label
quote
'
(
lambda
(
qexpr
)
2
(
car
(
cdr
qexpr
))))
That is, quote'
is just another function. Like Lithp, the implementation of Lithp’ will rely on functions that exist in the host language. However, while Lithp’s host was Python, the implementation of Lithp’ will rely on Lithp as its host. The function quote'
above expects an s-expression of the form (quote foo)
as its input and will return the second element of the expression outright, that is, without evaluation. Actually, the quote'
function doesn’t care one bit if the first element of the s-expression is the symbol quote
. Instead, I’ll leave it up to another piece of code to decide when a form should dispatch into quote'
. In any case, quote'
works as follows:
1
(
quote
'
(
pair
(
quote
quote
)
(
quote
a
)))
2
;;
=>
a
I’m using the previously defined function pair
to build up an s-expression of the form (quote a)
as below:
1
(
pair
(
quote
quote
)
(
quote
a
))
2
;;
=>
(
quote
a
)
So that’s pretty simple no? What about a more complicated function?
atom’
The quote'
function, like Lithp’s quote
function is the only one that never evaluates its argument. Because of this fact I decided to show its implementation first. However, every other Lithp’ function will need access to an evaluation function to operate correctly. The function will be named simply eval
and will operate, in principle, as follows:
1
(
eval
s
-
expression
bindings
)
2
;;
=>
result
However, I’m going to hold-off writing eval
until the end. Instead, I’ll just assume that it’s defined and call it inside of the Lithp’ implementation functions as needed. Therefore, the implementation of atom'
is as follows:
1
(
label
atom
'
(
lambda
(
aexpr
bindings
)
2
(
atom
(
eval
(
car
(
cdr
aexpr
))
bindings
))))
Once eval
is in place, atom'
will work as follows:
1
(
atom
'
(
pair
(
quote
atom
)
(
pair
(
quote
quote
)
(
quote
a
)))
env
'
)
2
;;
=>
t
3
4
(
atom
'
(
pair
(
quote
atom
)
(
quote
t
))
env
'
)
5
;;
=>
t
6
7
(
atom
'
(
pair
(
quote
atom
)
(
pair
(
quote
quote
)
(
quote
(
a
b
c
))))
env
'
)
8
;;
=>
()
The examples above are a bit more complicated than for quote'
, but the principle of building a valid Lithp’ data structure representing a program is the same:
1
(
pair
(
quote
atom
)
(
pair
(
quote
quote
)
(
quote
a
)))
2
;;
=>
(
atom
(
quote
a
))
The implementation of the next few Lithp’ implementation functions will be similar.
eq’
The Lithp’ equality operator will delegate down to the Lithp eq
function after evaluating its two arguments:
1
(
label
eq
'
(
lambda
(
eexpr
ebinds
)
2
(
eq
(
eval
(
car
(
cdr
eexpr
))
ebinds
)
3
(
eval
(
car
(
cdr
(
cdr
eexpr
)))
ebinds
))))
The eq'
takes the whole form, so I have to throw away the first operator. There is no particular reason for doing it this way except that I will eventually place the delegation logic into eval
function, separate from the handlers. Therefore, the form that comes into the eq'
function can be built up programmatically for the purpose of testing:
1
(
cons
(
quote
eq
)
2
(
pair
3
(
pair
(
quote
quote
)
(
quote
a
))
4
(
pair
(
quote
quote
)
(
quote
a
))))
5
6
;;
=>
(
eq
(
quote
a
)
(
quote
a
))
That is, the value returned is a Lithp list representing the code form handled by eq'
. I could have just used (quote (eq (quote a) (quote a)))
instead, and will hereafter, but decided to do the first couple of examples by hand (so to speak) to demonstrate that the input to Lithp’ truly is a data structure. Running the above against eq'
yields:
1
(
eq
'
(
cons
(
quote
eq
)
2
(
pair
3
(
pair
(
quote
quote
)
(
quote
a
))
4
(
pair
(
quote
quote
)
(
quote
a
))))
5
env
'
)
6
7
;;
=>
t
Clearly, the symbol a
is equal to another symbol a
. The results of a failed check is thus:
1
(
eq
'
(
cons
(
quote
eq
)
2
(
pair
3
(
pair
(
quote
quote
)
(
quote
a
))
4
(
pair
(
quote
quote
)
(
quote
zzzzz
))))
5
env
'
)
6
7
;;
=>
()
Now bear in mind that the t
and ()
returned from the calls to eq'
are those that exist in the context of Lithp and not Lithp’. That is, even though the bindings created in env'
are the same values as the Lithp t
and ()
, the return values of eq'
are not retrieved through those bindings. This will not be a problem for the sake of correctness. Instead, I only point it out because I’ve not yet changed the semantics between Lithp and Lithp’, but I will later.
car’ / cdr’
Like eq'
, but car'
and cdr'
delegate down to Lithp as I see no need in changing their semantics either:
1
(
label
car
'
(
lambda
(
caexpr
cabinds
)
2
(
car
(
eval
(
car
(
cdr
caexpr
))
cabinds
))))
3
4
(
label
cdr
'
(
lambda
(
cdexpr
cdbinds
)
5
(
cdr
(
eval
(
car
(
cdr
cdexpr
))
cdbinds
))))
Like with eq'
, I can test these using lists, but this time I’ll use the Lithp quote
function:
1
(
car
'
(
quote
(
car
(
quote
(
a
b
c
))))
env
'
)
2
;;
=>
a
3
4
(
cdr
'
(
quote
(
car
(
quote
(
a
b
c
))))
env
'
)
5
;;
=>
(
b
c
)
These are both straight-forward, as is cons'
, shown next.
cons’
Once again, since the semantics of Lithp and Lithp’ cons
is the same, I just delegate down:
1
(
label
cons
'
(
lambda
(
coexpr
cobinds
)
2
(
cons
(
eval
(
car
(
cdr
coexpr
))
cobinds
)
3
(
eval
(
car
(
cdr
(
cdr
coexpr
)))
cobinds
))))
And test directly:
1
(
cons
'
(
quote
(
cons
(
quote
a
)
(
quote
(
b
c
))))
env
'
)
2
3
;;
=>
(
a
b
c
)
So far, the implementation of Lithp’ has been weak-sauce, but with cond'
shown next, it gets a little more interesting.
cond’
Up until this point, quote'
was the only Lithp’ function that didn’t evaluate its arguments. However, cond'
may or may not evaluate its arguments, depending on the results of the conditional checks. The base implementation of cond'
is trivial:
1
(
label
cond
'
(
lambda
(
cndexpr
cndbinds
)
2
(
eval
-
cond
(
cdr
cndexpr
)
cndbinds
)))
That is, it simply delegates its behavior to the eval-cond
function which is a cheap way of doing some piece-wise transformation, much like I’d do with a let
. The eval-cond
function is tasked with pulling apart the Lithp’ cond
form and checking each condition in turn until it finds one that resolves to a truthy value:
1
(
label
eval
-
cond
(
lambda
(
condition
condbinds
)
2
(
cond
((
eval
(
car
(
car
condition
))
condbinds
)
3
(
eval
(
car
(
cdr
(
car
condition
)))
condbinds
))
4
(
t
(
eval
-
cond
(
cdr
condition
)
condbinds
)))))
The eval-bind
function is the first recursive Lithp function that I’ve shown. It’s not terribly complicated, in fact it just walks through a list of lists where each inner list contains two elements, a condition part and a consequent part:
1
(
label
cond
-
ex
(
quote
(((
eq
a
(
quote
(
x
y
z
)))
(
quote
first
-
branch
))
2
((
eq
a
nil
)
(
quote
second
-
branch
))
3
(
t
(
quote
default
-
case
)))))
Using the form above, I can exercise cond'
by changing the binding of the symbol a
, which serves as the basis for two of the three cond
conditions:
1
(
cons
(
pair
(
quote
a
)
(
quote
nothing
))
env
'
)
2
;;
=>
((
a
nothing
)
(
t
t
)
(
nil
()))
Binding a
to the symbol nothing
should cause the cond
to fall through to the default case:
1
(
eval
-
cond
cond
-
ex
(
cons
(
pair
(
quote
a
)
(
quote
nothing
))
env
'
))
2
;
=>
default
-
case
Likewise, I can trigger the first branch by binding a
to something else:
1
(
eval
-
cond
cond
-
ex
(
cons
(
pair
(
quote
a
)
(
quote
(
x
y
z
)))
env
'
))
2
;
=>
first
-
branch
Finally, the second branch is triggered via another binding:
1
(
eval
-
cond
cond
-
ex
(
cons
(
pair
(
quote
a
)
(
quote
()))
env
'
))
2
;
=>
second
-
branch
As you can see, even though the body of the cond-ex
uses nil
in the second branch check, Lithp’ was able to resolve the binding in the passed environment.
Now that cond
is taken care of, I can move on to the pinnacle of the Lithp’ implementation, eval
.
eval
The eval
function is designed to take two arguments: a Lithp’ expression and a context in which to evaluate it:
1
(
label
eval
(
lambda
(
expr
binds
)
The first part of eval
checks if an expression is an atomic value such as a symbol t
or nil
or some arbitrary name:
1
(
cond
2
((
atom
expr
)
(
lookup
expr
binds
))
If the expression is an atomic value then its evaluation is just whatever its binding happens to be in the supplied context. However, in the case where the expression itself is not an atom, it must be a list, in which case I just look at the first element:
1
((
atom
(
car
expr
))
In Lithp’, just as in Lithp (and most Lisps), the first element of a list is either a function or a special form. Since I’ve implemented the backing function for the Lithp’ special forms using Lithp, I can use what is in effect a dispatch table from the first-element symbol to the implementation:
1
(
cond
2
((
eq
(
car
expr
)
(
quote
quote
))
(
quote
'
expr
))
3
((
eq
(
car
expr
)
(
quote
atom
))
(
atom
'
expr
binds
))
4
((
eq
(
car
expr
)
(
quote
eq
))
(
eq
'
expr
binds
))
5
((
eq
(
car
expr
)
(
quote
car
))
(
car
'
expr
binds
))
6
((
eq
(
car
expr
)
(
quote
cdr
))
(
cdr
'
expr
binds
))
7
((
eq
(
car
expr
)
(
quote
cons
))
(
cons
'
expr
binds
))
8
((
eq
(
car
expr
)
(
quote
cond
))
(
cond
'
expr
binds
))
As shown above, the cond
above checks each first element against the known Lithp’ core functions and delegates to the Lithp handlers to evaluate them. However, what happens if a function is not directly resolvable via the lookup table above? In that case, the eval
function handles it specially:
1
(
t
(
eval
(
rewrite
expr
binds
)
binds
))))
That is, eval
rewrites the expression such that the original name at the first of the list is replace with whatever its binding happens to be:
1
(
label
rewrite
(
lambda
(
rexpr
rbinds
)
2
(
cons
(
lookup
(
car
rexpr
)
rbinds
)
3
(
cdr
rexpr
))))
This action can be shown below:
1
(
rewrite
(
cons
(
quote
a
)
(
quote
(
b
c
)))
2
(
cons
(
pair
(
quote
a
)
(
quote
______
))
env
'
))
3
4
;;
=>
(
______
b
c
)
This behavior is interesting because it signals the first real point of departure of Lithp’ from its host Lithp. That is, Lithp doesn’t rewrite function applications in this way, but I’ve decided to do so for Lithp’. What I haven’t shown is what happens when eval
receives a rewritten expression of the form ((lambda (n) n) nil)
. I’ll defer the answer to that question until later and instead show one more core form label
that’s also rewritten, but in a different way.
label
When examining how to implement the label
form for Lithp’ there were a few ways that it could have been done. The obvious way is to view label
as a binding creator and evaluate the bound value for the purpose of creating a new environment. In this case, the evaluation is as simple as just looking up the binding name in the augmented environment. This technique is less than thrilling. However, I found a more interesting technique in Paul Graham’s paper The Roots of Lisp. Indeed, Graham’s implementation of eval
forms the primary inspiration for the implementation of Lithp’, even though I change it up a bit as the implementation of label
below will show.
That said, observe the implementation of label
below:
1
((
eq
(
car
(
car
expr
))
(
quote
label
))
2
(
eval
(
cons
(
car
(
cdr
(
cdr
(
car
expr
))))
(
cdr
expr
))
3
(
cons
(
pair
(
car
(
cdr
(
car
expr
)))
4
(
car
(
cdr
(
cdr
(
car
label
-
ex
)))))
binds
)))
What’s happening here is a bit more subtle than what’s going on in the rewrite
function shown above. I’ll pull this subtlety apart using the following example:
1
(
label
label
-
ex
(
quote
((
label
a
(
lambda
(
id
)
id
))
t
)))
2
3
;;
=>
((
label
a
(
lambda
(
id
)
id
))
t
)
That is, to see what happens to a Lithp’ form of ((label a (lambda (id) id)) t)
I can show what happens in the parts of the implementation:
1
(
cons
(
car
(
cdr
(
cdr
(
car
label
-
ex
))))
(
cdr
label
-
ex
))
2
;;
=>
((
lambda
(
id
)
id
)
t
)
The first part of the call to the recursive eval
is very similar to the way that rewrite
works. That is, it creates a call to the lambda
form bound to a
with the original argument t
. You probably already know what happens when this form is encountered; the value t
is returned. However, the second argument to the recursive eval
call builds a bit of a weird context:
1
(
cons
(
pair
(
car
(
cdr
(
car
expr
)))
2
(
car
(
cdr
(
cdr
(
car
label
-
ex
)))))
binds
)
3
4
;;
=>
((
a
(
lambda
(
x
)
x
))
(
t
t
)
(
nil
()))
That is, the value a
is bound to the lambda that was rewritten to replace the label
body. So what’s the point? The answer is simply that should I wish to build a Lithp’ interpreter around eval
I would like that the environment coming out of the label
implementation be usable to evaluate further forms. Graham’s original rewrote the label
form to be that of the lambda
call as above, with the binding a
bound to the label
form itself. In both cases, it really doesn’t matter because the rewriting ensure that the evaluation doesn’t touch the a
binding at all.
Observe what happens when it’s all put together:
1
(
eval
(
cons
(
car
(
cdr
(
cdr
(
car
label
-
ex
))))
(
cdr
label
-
ex
))
2
(
cons
(
pair
(
car
(
cdr
(
car
label
-
ex
)))
(
car
label
-
ex
))
env
'
))
3
;;
=>
t
This is the same as:
1
((
lambda
(
id
)
id
)
t
)
And finally, I can show how eval
handles the lambda
form next.
lambda
The keystone of the Lithp’ implementation is the lambda
evaluation logic. As I’ve shown, the eval
expects something like ((lambda (id) id) t)
as its input form. The following code looks for just such a form and again performs a different rewrite to provide an evaluation context for the lambda
body:
1
;;
Lambda
handling
2
((
eq
(
car
(
car
expr
))
(
quote
lambda
))
3
(
eval
(
car
(
cdr
(
cdr
(
car
expr
))))
4
(
append
(
zip
(
car
(
cdr
(
car
expr
)))
(
eval
-
args
(
cdr
expr
)
binds
))
5
binds
)))
There’s a lot going on here, so let me take it piece by piece. First, the append
function probably does what you’d think – it takes two lists and tacks them together end to end:
1
(
label
append
(
lambda
(
append_x
append_y
)
2
(
cond
((
null
append_x
)
append_y
)
3
(
t
(
cons
(
car
append_x
)
4
(
append
(
cdr
append_x
)
append_y
))))))
Yet another recursive function is append
, indeed, to do anything interesting in Lithp recursion is required. Regardless, append
works as follows:
1
(
append
(
quote
(
a
b
c
))
(
quote
(
d
e
f
)))
2
3
;;
=>
(
a
b
c
d
e
f
)
A slightly more interesting function is the zip
function used above:
1
(
label
zip
(
lambda
(
x
y
)
2
(
cond
((
and
(
null
x
)
(
null
y
))
nil
)
3
((
and
(
not
(
atom
x
))
(
not
(
atom
y
)))
4
(
cons
(
pair
(
car
x
)
(
car
y
))
5
(
zip
(
cdr
x
)
(
cdr
y
)))))))
The zip
function takes two lists and builds an environment-like pairing of the elements of both, like so:
1
(
zip
(
quote
(
a
b
c
))
(
quote
(
d
e
f
)))
2
3
;;
=>
((
a
d
)
(
b
e
)
(
c
f
))
With these two functions in place, I can show how the parts of the lambda
handling work in conjunction to rewrite. The example is as I’ve already shown:
1
(
label
lambda
-
ex
(
quote
((
lambda
(
id
)
id
)
t
)))
Now, to build up a context for use in resolving the body of the lambda
I need a way of taking the argument bindings and evaluating them so that they take on values within the body of the lambda. The function eval-args
does just this:
1
(
label
eval
-
args
(
lambda
(
args
binds
)
2
(
cond
((
null
args
)
nil
)
3
(
t
(
cons
(
eval
(
car
args
)
binds
)
4
(
eval
-
args
(
cdr
args
)
binds
))))))
In the core environment env'
the names (t nil)
will evaluated as such:
1
(
eval
-
args
(
quote
(
t
nil
))
env
'
)
2
3
;;
=>
(
t
())
These values will zip
with the arguments to the lambda
form to build a function-level binding context used to evaluate the lambda
body, pulled out with:
1
(
car
(
cdr
(
cdr
(
car
lambda
-
ex
))))
2
3
;;
=>
id
In other words:
1
(
zip
(
car
(
cdr
(
car
lambda
-
ex
)))
(
eval
-
args
(
cdr
lambda
-
ex
)
env
'
))
2
3
;;
=>
((
id
t
))
That is, the body id
gets evaluated in the augmented context ((id t) (t t) (nil ()))
to result in the symbol t
.
And that’s all there is to it. This is the essence of the beauty discovered by John McCarthy.
Conclusion
Lithp is just a humble little Lisp created in the spirit of discovery and learning. It was incredibly instructive to create and I hope by walking through it in this installment you too might have discovered something new and interesting. There are many different ways to take this implementation. From Linear Lisp to macros to Lithp’’, the possibilities are endless. In future installments I’ll touch back with Lithp and Lithp’ and highlight other angles of discovery. But for now perhaps you could go off and implement the same using your favorite programming language?
Further reading
John McCarthy. 1960. Recursive functions of symbolic expressions and their computation by machine, Part I. Commun. ACM 3, 4 (April 1960), 184-195. DOI=10.1145/367177.367199 http://doi.acm.org/10.1145/367177.367199
Robert R. Fenichel , Jerome C. Yochelson, A LISP garbage-collector for virtual-memory computer systems, Communications of the ACM, v.12 n.11, p.611-612, Nov. 1969
Paul Graham. 2002. Roots of Lisp. http://www.paulgraham.com/rootsoflisp.html
Pascal Costanza , Richard P. Gabriel , Robert Hirschfeld , Guy L. Steele, Jr., Lisp50: The 50th birthday of lisp at OOPSLA 2008, Companion to the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications, October 19-23, 2008, Nashville, TN, USA
John McCarthy, A basis for a mathematical theory of computation, preliminary report, Papers presented at the May 9-11, 1961, western joint IRE-AIEE-ACM computer conference, May 09-11, 1961, Los Angeles, California
Guy Steele. Growing a Language. Keynote at the 1998 ACM OOPSLA conference. https://www.youtube.com/watch?v=_ahvzDzKdB0
Hubris in tech
hubris: n. Overbearing pride or presumption; arrogance.
calibris: n. The insistence that tech is a pure meritocracy.
chewbris: n. An outmoded management style based around intimidation, threats of firing “if you can’t get this done” and making veiled criticisms and rude comments immediately proceeded by “just kidding.”
dubris: n. The condition of being so high that re-implementing some framework sounds like a great idea.
ewbris: n. see Not-Invented-Here
foobris: n. Programming documentation filled with examples rife with meta-syntactic variables.
gnubris: n. An insistence that every open-source software developer adhere to your definition of what open-source should be.
hubris.js: n. Probably an MVC framework of some sort.
kubris: n. An insistence that every programming book should contain everything needed to understand said book.
lugubris: n. see concern-troll
n00bris: n. The pride of a newcomer to enter a new programming community and immediately start suggesting changes.
oopris: n. The idea that even though you don’t know what OOP means, you’ll know it when you see it.
prebris: n. The insistence that some concept is so important that a) everyone should already understand it and b) therefore it should not be explained.
protubris: n. That ugly utility class holding all of the methods that should have been functions, but weren’t allowed to be.
q-bris: n. Pride so deep that one feels the need to jump all over anyone skeptical of your views.
rubris: n. Excessive pride in one’s programming language or framework.
shoebris: n. The idea that something that is stunted and with chunky buttons and bright colors automatically makes it a good learning environment for children.
tumblis: n. An opinion blog without a single citation.
Viewbris: n. The excessive pride of the MVC framework author.
Wu-bris: n. The idea that all value measurements must occur through a commerce-colored lens.
Final thoughts
I hope you’ve enjoyed this installment - it was a long time coming. Please do not hesitate to reach out via the contact information below to discuss and provide feedback. All issues of Read-Eval-Print-λove are meant to be “living” in the sense that they can and will change as new ideas occur to me and feedback is provided. The essays herein can and will become all the better with your help.
Thank you.
Translations welcomed
If you’re interested in translating issues of Read-Eval-Print-λove then please contact me using the information below. I will happily share proceeds equally (as humble as they might be) with translators.
Author information and links
Michael Fogus — a core contributor to Clojure and ClojureScript. Creator of programming source codes.
A fervent blogger at Send More Paramedics and also co-author of the book, The Joy of Clojure and author of Functional JavaScript.
Author contact
- http://www.fogus.me
- http://www.twitter.com/fogus
- https://www.github.com/fogus
- me -at- fogus -dot- me
Discussion and information
- http://www.readevalprintlove.org
- https://www.github.com/readevalprintlove
- http://www.lispforum.com/
- https://groups.google.com/forum/#!forum/ll-next
- Naggum was referring to Arc when he made this statement. The ego mentioned belonged to Paul Graham. The points made by Naggum in that post ran deeper than taking a shot at Graham.↩
- This is not exactly true since the Common Lisp spec intentionally leaves many key decisions up to the vendors. While I understand why the deferment was done and actually think it was the right move given the goals of the language, the choice of one vendor over another is a non-trivial decision. That said, if you think that Common Lisp is the right language then the complexities around vendor choice are necessary and should not dissuade anyone truly interested in building rock-solid systems.↩
- Frankly, the exploration continues and will continue for the foreseeable future.↩
- This code-base is at times in flux and may or may not always match the implementation detailed herein.↩
- The choice for Python was one of practicality, as at that time I was using it extensively on day-job activities.↩
- Back in the day when Graham actually wrote about Lisp. Sadly, his Lisp illumination has all but died off, but I suppose I can understand – there’s no money in it.↩
- As far as Ur-Lisps go, this is not too much of a monstrosity if I do say so myself – my implementation of Lisp in 25 lines of Ruby however…↩
- There is no particular reason that a primordial Lisp needs to implement
cond
, in fact something likeif
would work just as well.↩ - This is not to be confused with a dotted-pair which is a cons cell with a value at the head and something other than a cons cell as its tail. McCarthy’s original Lisp did not support dotted pairs and neither do I.↩