Michael Fogus
Buy on Leanpub

A note

Read-Eval-Print-Love is an N-monthly zine of original content and curation about the Lisp family of programming languages and little-languages in general.


One day a hopeful macrolyte approached the master macrologist Pi Mu and asked a question:

“Master, I’ve studied with the immortal Brodie, but have left him in search of answers – and as such I would like to study under your tutelage.” Upon hearing this Pi Mu asked, “Brodie is a true master, so why is it that you’ve decided to leave him?” The student answered, “his teachings are obvious and I’ve transcended his teachings.” Pi Mu glared intently and asked yet another question, “What of his teachings were obvious?” The student responded, “I once asked him what the fundamental principle of Forth was, but I did not like the answer.” Pi Mu, losing patience asked gruffly, “but what did he say?” To which the student responded, “his answer was ‘Moore seeks for layers.’” Pi Mu then stated, “well, that was an excellent answer, but do you know it’s meaning?” “Of course,” responded the student, “it means that Moore, as the creator of Forth, would naturally build his programs in a layered fashion.” He added, “for in the same way I too can achieve Forth enlightenment through the practice of layered development.” Pi Mu then laughed, “oh you didn’t understand it at all!” The student, taken aback then asked, “well, how would you answer?” Pi Mu responded, “go on and ask me.” The student then repeated his question, “what is the fundamental principle of Forth?” Almost immediately Pi Mu responded, “Moore seeks for layers.”

The student became enlightened.


RIP October 23, 2011
RIP October 23, 2011

world hello

The Forth programming language was created by Charles (Chuck) Moore as the basis for a personal programming environment. After working on this system for some time in isolation, Moore decided to extract a Forth implementation for use in the National Radio Astronomy Observatory’s 11-meter radio telescope in 1971. Imagine this for a moment if you please. Chuck Moore, some time in 1968 found that the computing environments of the day were insufficient for his needs and decided to create his own from whole-cloth. Over the next 2-3 years Moore developed a computing environment suited to his philosophy of programmatic style. A crucial components of this system was the programming language forming its underpinnings: Forth.

By modern standards, Forth is an odd duck. First, the language is stack-centric, meaning that every operation in the language deals either implicitly or explicitly with a stack-based programming metaphor. This is decidedly different from a programming language like C that may use as an implementation details an underlying stack to pass function activations. In Forth, a program consisting of a single token can demonstrate how its stack is central in its programming model:

1 9

The Forth program above is interesting in that it illustrates the conceptual underpinnings of the language. In most modern programming languages, a program containing the number 9 would be quite pointless, but in Forth such a program would cause an effect in the Forth environment itself. That is, the number 9 in Forth isn’t just a constant integer, instead it’s a command to the Forth environment that states “push the integer 9 onto the data stack.” The data stack in Forth is a global scratchpad in Forth environments used by operations to retrieve and store the results of functions.

A larger Forth program is shown below:

1 36 9 /

In addition to the commands to push 36 and 9 onto the stack, there’s a new command called / that states: “pop something off of the stack and make it the denominator and then pop another thing off and make that the numerator and then push the resulting quotient onto the stack.” At the end of this program you’ll not be surprised to learn that the number 4 is the only element on the data stack.

Before I go any further I’d like to talk a little bit about the word “concatenative.” Very often when discussing Forth and languages of its ilk (the pedigree being Factor, Joy, Open Firmware, and PostScript) you’ll see this term bandied about, but what does it really mean? From a simplistic perspective, the term concatenative means “programs are built by laying code words side by side,” but again this is nearly meaningless. Instead, let me create a very simple little Clojure function that might help clarify the matter:

1 (defn postfix [& words]
2   (reduce #(if (fn? %2)
3              (let [[l r & m] %]
4 		       (cons (%2 r l) m))
5 		       (cons %2 %)) 
6 		  [] 
7 		  words))

The postfix function is fairly straight-forward. It takes a number of arguments (words) that represent either functions or non-functions. A sample call looks as follows:

1 (postfix 5 1 2 + 4 * + 3 -)

It then walks the words and when it finds a function it calls it with the top-2 elements of the “stack” (the words list) and plops the result back in place, otherwise it just plops the word. Eventually, what comes out of postfix is something that represents a stack, a la:

1 (postfix 5 1 2 + 4 * + 3 -)
2 ;;=>  (14)
4 (postfix 3 2 1 + *)
5 ;;=> (9)
7 (postfix 1 2 3 * +)
8 ;;=> (7)

Now imagine that the vector [5 4 *] represents a valid program:

1 (apply postfix [5 4 *])
2 ;;=> (20)

I could literally concatenate another valid program onto this vector in order to create a new valid program:

1 (apply postfix (concat [5 4 *] [30 10 +]))
2 ;;=> (40 20)

Likewise, I could concatenate an operator into that result to create yet another valid program:

1 (apply postfix
2   (-> [5 4 *]
3       (concat [30 10 +])
4       (concat [+])))
5 ;;=> (60)

Ad infinitum.

As shown, programs built with concatenative languages are composed via the act of “laying” operators, or even whole programs, side by side. Additionally, like in natural language, Forth words can directly replace longer phrases. Observe:

1 5 dup *

When the program above is run, the data stack will look as follows:

1 |  25  |
2 |      |
3 +------+

That is, the only element on the stack will be the number 25 because the program 5 dup * means “duplicate the thing on the top of the stack and push the duplicate on top of it, then take two things off of the stack (two 5s) and multiply them together, pushing the result back onto the stack.” The program fragment dup * is therefore equivalent to the act of squaring the number on the top of the data stack. Indeed, I could use those two words to define a new word, as shown below:

1 : sq dup * ;

I’ll talk more about the : operator later, but for now you only need to understand that it creates a new word named sq consisting of the two words dup *. Now, I can directly replace the original longer phrase dup * with sq to keep the original behavior:

1 5 sq

This is a common work-flow in Forth-like languages:

  • Write some code to perform a task
  • Identify some fragment that might be generally useful
  • Extract it and give it a name
  • Replace the relevant bits with the new word

There is nothing special about this work-flow except that it’s indicative of a philosophical underpinning of Forth-like languages driving code toward a highly-layered implementation. I’ll dive into this angle later.

In this installment of Read-Eval-Print-Love I’ll discuss Forth by first dissecting a small Ruby implementation called “Russ-Forth”1, which is a simplified Forth (e.g. no return stack). Once the run-time for Russ-Forth is in place I’ll implement the bare bones of a core library and talk about stack-shufflers, combinators, and the like. I’ll then talk about the ideas in the amazing book “Thinking Forth” by Leo Brodie which describe an interesting bottom-up approach to system design.

While Forth is an interesting programming language in its own right, the language itself is a philosophical statement by Chuck Moore (Brodie 1984) about how programs should be constructed. Of course, I’ll have fun with a tiny Forth-like implementation, but the point of this installment is to establish a context for describing the Forth philosophy. I’m personally of the opinion that far more important than the programming language itself, the idea of Forth is key.

  1. The name is an homage to my friend and colleague Russ Olsen, who posted the original code to me as a comment on one of my GoodReads reviews. Thanks Russ!

Russ Forth: a simple Forth in Ruby

This wouldn’t be an issue of Read-Eval-Print-Love without an implementation of a programming language, so here we are again. This time I’m going to implement a little version of Forth that looks a little bit like a Forth that you might find out in the wild,1 but that is greatly simplified.

The implementation of a small Forth interpreter like Russ Forth (RForth) is fairly small as you’ll see in the next few pages. Aside from the core functions, RForth consists only of a lexicon to store word (functions) definitions:

1 require_relative 'lexicon'

A compiler to convert those word definitions to core constructs – in this case, Ruby constructs:

1 require_relative 'compiler'

And a token reader to handle the raw text input:

1 require_relative 'reader'

While these three pieces will provide the core functionality of the Forth scanning and execution, RForth needs more than that. Indeed, while not comprehensive, I will implement a small set of core functions found in many concatenative languages:

1 require_relative 'shufflers'
2 require_relative 'stack_ops'
3 require_relative 'io'
4 require_relative 'math'
5 require_relative 'comparators'

I’ll talk more about these core words later. For now let me walk through the implementation of the main RForth driver class.


Any good Ruby program starts with a class definition, and this one is no different: 2

The RussForth class will control the text scanning, compilation, and execution interpreter pipeline. On initialization the class is instantiated with the input and output channels, that are set to standard in and out (console I/O) by default.

1     @lexicon =
2     @reader = @s_in )
3     @compiler =

As mentioned before, three key components of the system are implemented as a set of three classes: Lexicon, Reader, and Compiler. However, the core metaphor of the system is implemented as a simple Ruby array:

1     @stack = []

However, rather than being used simply as a array, the @stack property will be managed as, well, a stack. You see, Forth is a stack-centric concatenative programming language.3 All arguments into and results from words pass from one to the other via the system stack. Since I talked about this in the introduction I won’t belabor the point here, but I might talk more about it in situ whenever appropriate.

1     build_lexicon
2   end

The final stage of the initialization process is the population of the lexicon with core words. I’ll discussion this more later.


The RForth evaluation model is very simple. Indeed, much of the simplicity is owed to the fact that it farms off much of the execution logic to Ruby itself. That is, after the read and compilation phases RForth words are left as Ruby Proc instances awaiting execution which are stored in the Lexicon. Programmatically this looks as follows:

1   def evaluate( word )
2     entry = resolve_word(word)
4     if entry
5       entry[:block].call
6     else
7       @s_out.puts "#{word} ??"
8     end
9   end

As you might have noticed, the entry retrieved is actually a hash containing a Proc. The reason for this level of indirection is because there’s a little extra information needed pertaining to the word that’s stored in the hash returned from resolve_word. The implementation of resolve_word is as follows:

1   def resolve_word( word )
2     return @lexicon[word] if @lexicon[word]

First of all, if a word is in the @lexicon then resolve_word simply returns it as is. However, if the word is not found then RForth needs to perform some careful logic:

1     if word[0,1] == ":"
2       x = word[1..-1].to_sym

Since this is a Ruby-based interpreter, I thought it would be fun to handle Ruby symbols. Therefore, the first check is to see if the read word starts with a colon. If it does, then the word is just the symbol itself – there’s a good reason for this, as I’ll talk about in a moment. But first, if the word is not a symbol then it might be a number:

1     else
2       x = to_number(word)
3     end

Without going into too much detail about to_number (I’ll inflict that on you in a moment) the point is that should the word in fact be a number then it too is simply the number itself. So in the cases where a word resolves to a symbol or a number, the RForth interpreter has to handle them in an interesting way:

1     if x
2       block = proc { @stack << x }
3       return { :name => word, :block => block, :immediate => false }
4     end

That is, if the word resolved to either a number or a symbol then RForth builds a Ruby Proc that does nothing but push them onto the top of the stack. You’ll sometimes hear that everything in Forth (and concatenative languages in general) is a function (word) including scalars like numbers and the like. Indeed, the rationale is sound given that it’s often important to perform calculations involving numeric constants and symbols and since that is the case, it makes sense to leverage the paradigm of the language itself to add such constructs. While not every concatenative language utilizes this trick, instead preferring more performant options, I chose to because I’m drawn to the purity of the idea.

1     nil
2   end

Finally, if the word was none of a process in the lexicon, symbol, nor number then of course it was nothing at all. Real quick, allow me to show the abominable to_number implementation below:

 1   def to_number( word )
 2     begin
 3       return Integer( word )
 4     rescue
 5       puts $!
 6     end
 7     begin
 8       return Float( word )
 9     rescue
10       puts $!
11     end
12     nil
13   end

Rather than going into detail about this method, I’d prefer to turn a blind eye and pretend that it never happened. Instead, allow me to say that the above is all that there is to the evaluation model of a simple Forth like RForth. However, while it might make sense that numbers and symbols are self-pushing words, there are more complicated words comprising the RForth run-time. First, there are internal words implemented as Ruby methods and while I’ll get to those eventually, I’m more interested in user-defined words supported by the language itself. Of course, the Devil’s in the details and so in the next few sections I’ll dig into the (sometimes gnarly) details.

Reading and defining words

I glossed over the action of processing user-defined words, so let me take a few paragraphs to go back and discuss it a bit. Rather than dig into the details of tokenization and character-by-character lexing, I’ll keep things at a higher level.

1   def read_and_define_word
2     name = @reader.read_word
3     words = []

Speaking of char-by-char lexing, the bulk of that task is farmed out to the Reader class which I’ll discuss later. It’s assumed, that Reader#read_word “knows” what it’s doing and will return a token corresponding to the name of the word. If you recall, the read_and_define_word was initiated when the resolve_word method encounter a colon. The colon started the user-defined word and a semicolon ends it – in between are just regular RForth words:

1     while (word = @reader.read_word)
2       break if word == ';'
3       words << word
4     end

Once the constituent words comprising the body of the user-defined word are read, they are then compiled into a Ruby Proc:

1     p = @compiler.compile_words(self, *words)
2     @lexicon.define_word(name, &p)
3   end

A method that’s closely related to read_and_define_word is shown below:

1   def read_quotation
2     words = []
3     while (word = @reader.read_word)
4       break if word == ']'
5       words << word
6     end
8     @stack << @compiler.compile_words(self, *words)
9   end

You’ll notice that the basic structure of read_quotation is almost exactly like that of read_and_define_word except for what happens to the returned Proc from Compiler#compile_words. That is, rather than define the word in the @lexicon, it’s stored directly onto the @stack itself. For the Lisp programmers in the audience you might realize that this is the Forth analogy to a lambda or anonymous function. The utility for these anonymous words, called “quotations” in Forth, will be shown later when I talk about RForth combinators.

I’ll also talk a little about the compilation step later, but for now let me move on to the final parts of the RForth class: driving and seeding.


I’ve already shown the basic parts of the system, but it’s the run method that ties it all together:

1   def run
2     until $stdin.eof?
3       @s_out.flush
4       word = @reader.read_word
5       evaluate word
6     end
7   end

It couldn’t possibly be easier. Read, Eval, Loop. However, it gets a little more interesting when a language proper is built on top of this simple frame. Commercial-grade Forth systems typically come with hundreds of general-purpose and domain-specific4 words out of the box. However I’m only going to talk about a handful of word classes for the purpose of illustration, including:

I’ll dive into each of these soon, but for now all that I want to say is that the words are defined as Ruby modules and imported to the RForth lexicon at launch time:

1   def build_lexicon
2     @lexicon.import_words_from Verbs::Shufflers, self
3     @lexicon.import_words_from Verbs::StackOps, self
4     @lexicon.import_words_from Verbs::Io, self
5     @lexicon.import_words_from Math::Arithmetic, self
6     @lexicon.import_words_from Verbs::Comparators, self

Each of the built-in words are implemented as methods contained in modules and aliased for use in the system:

 1     @lexicon.alias_word('?dup', 'qdup')
 2     @lexicon.alias_word('+', 'plus')
 3     @lexicon.alias_word('*', 'mult')
 4     @lexicon.alias_word('-', 'subtract')
 5     @lexicon.alias_word('/', 'divide')
 7     @lexicon.alias_word('=', 'eq')
 8     @lexicon.alias_word('<>', 'noteq')
 9     @lexicon.alias_word('>', 'gt')
10     @lexicon.alias_word('<', 'lt')
12     @lexicon.alias_word('.', 'dot')
13     @lexicon.alias_word('.S', 'dot_s')
15     @lexicon.define_word(':') { read_and_define_word }
16 	@lexicon.define_word('[') { read_quotation }
17     @lexicon.define_word('bye') { exit }

However, there is a special kind of word supported by most Forth implementations called immediate words and I’ll talk about those later as well.

1     @lexicon.define_immediate_word( '\\' ) { @s_in.readline }
2   end
3 end

The basic structure of the RForth system is fairly straight-forward as most of the capability is farmed out to supporting classes, each of which I’ll discuss presently in turn.


The RForth Lexicon class just implements a fancy hash-map: 5

1 class Lexicon
2   def initialize( &block )
3     @entries = {}
4   end

Since the Lexicon is a map then it should behave like one:6

1   def []( name )
2     @entries[name]
3   end

What’s stored in the map is a k/v pair of name/descriptor. The descriptor is yet another map that just describes the word and provides its implementation:

1   def define_word( name, &block )
2     @entries[name] = { :name => name, :block => block, :immediate => false }
3     self
4   end

Immediate words, or words that are executed during the read phase, differ from regular words only by a flag:

1   def define_immediate_word( name, &block )
2     @entries[name] = { :name => name, :block => block, :immediate => true }
3     self
4   end

Words can have aliases, which are defined in terms of previously stored words in the Lexicon. The utility of this will become apparent when I talk about the import_words_from method below.

1   def alias_word( name, old_name )
2     entry = self[old_name]
3     raise "No such word #{old_name}" unless entry
4     new_entry = entry.dup
5     new_entry[:name] = name
6     @entries[name] = entry
7   end

Nothing about the Lexicon has been terribly interesting so far (which is why I glossed over the details). However, its implementation is slightly more interesting in the way that it imports base words from Ruby modules. That is, the core words in RForth are implemented within Ruby modules and actively imported at start time to load the core Lexicon instance, as shown below:

1   def import_words_from( mod, context )
2     mod.public_instance_methods(false).each do |meth|
3       method_closure = context.method(meth.to_sym)
4       define_word( meth.to_s, &method_closure )
5     end  
6   end
7 end

As shown, the public methods defined in a passed module are closed over individually and stored in the Lexicon using their defined names. Perhaps now you see the utility of the alias_word method defined above. That is, the limitations of Ruby method naming shouldn’t bleed into the RForth word name definitions and so while it’s straight-forward to import a module method directly, the presence of alias_word allows me to fix up the name to something more appropriate to RForth. You’ll see this in action later, but for now I’ll talk about the class responsible for the read phase of the RForth implementation.


Forth languages, in general, do not have a read phase like you’d find in Lisp languages. Instead, the term Reader is meant to encapsulate the logic for an RForth phase pertaining to the reading of words from an I/O stream. This is more like a classical language scanner except that it performs no level of token classification. Instead, since everything in RForth is a word then everything scanned is assumed to be a word. This is perhaps a naive implementation, but it works for my purposes.

1 class Reader
2   def initialize( in )
3     @input = in
4   end

The Reader class is initialized with an input stream. There’s nothing surprising about that fact perhaps. As shown in the introduction, RForth words are always separated by white-space, (i.e. tab, space, newline, etc.) so the Reader needs a way to identify when it encounters some:

1   def is_space?( ch )
2     /\W/ =~ ch.chr
3   end

Regular expressions to the rescue!7

This is all fine and good, but the real meat of the word identification is shown below:

 1   def read_word
 2     result = nil
 3     ch = nil
 4     until @input.eof?
 5       ch = @input.readchar
 6       if result and is_space?(ch)
 7         break
 8       elsif result.nil?
 9         result = ch
10       else
11         result << ch
12       end
13     end

That little nasty bit of code is (unfortunately) obfuscating a simple bit of logic that can be stated simply as “append a bunch of characters to a word until a space or nil happens along.” Once one of those separators are encountered then RForth just assumes that it’s found a word and just returns it:

1     return result if result
2     nil
3   end
4 end

And that’s the entirety of the Reader class. This could be made more robust at the expense of clarity, but I decided to keep it simple for now. Frankly, only a few odd people (like myself) care about the minute details of lexical scanning, so by assuming that every token was potentially a word I was able to forego a whole swath of complexity.

I’ll follow this same complexity conservation pattern in the next section in the implementation of the RForth compiler.


I agonized long and hard over whether I would refer to this class as a compiler or a more terrible word (perhaps one starting with a ‘T’), but in the end my biases show through. The Compiler class is more to stick to the layered phase separation of RForth than a traditional compiler. Indeed, the implementation is such that RForth words comprising a program are aggregated into Ruby Proc instances for execution. This is not problematic until you realize that the Compiler translates RForth programs into sequences of Proc calls which then call other Proc instances that then bottom out as Ruby module methods. That is, the compilation phase for RForth is very tightly coupled to the RForth run-time environment. In other words, the Compiler class is simply responsible for translating RForth words into sequences of pre-existing Ruby method calls. That said, it still might be interesting to explore a little.

1 class Compiler
2   def compile_words( context, *words )
3     actions = []

The compiler will take an array of words and transform them, collecting the results into an array of Block instances.

1     words.each do |word|
2       entry = context.resolve_word( word )
3       raise "no such word: #{word}" unless entry

First, each word is expected to exist in the given context (probably a RussForth instance) and throw an error if it can’t be resolved.

1       if entry[:immediate]
2         entry[:block].call

If the resolved word happens to be an immediate word then it’s executed… well… immediately. Immediate words are the Forth way to define compile-time effects and I’ll talk more about that later.

1       else
2         actions << entry[:block]
3       end
4     end

However, if the resolved word is a normal word then its block attribute (the part that performs the action) is stored for later use:

1     proc do
2       actions.each {|b|}
3     end
4   end
5 end

That is, the result of the compile_words method is a Ruby Proc that closes over the actions and executes them each in turn at some point in the future. This is the whole mechanism behind how user-defined words are “compiled” into Ruby Procs.


The conceptual framework for a Forth-like language is surprisingly small. With even a moderately expressive language one can create a Forth-like in very few lines of code, required very little conceptual bulk. That said, even though I’ve shown the guts of RForth, I’ve done very little to describe the kinds of programs and patterns that fall out of a Forth-like language. The rest of this installment will be devoted to just that topic with a particular eye towards demonstrating how such a simple language can lead to deep implications on how programs are reasoned about and constructed.

  1. If you’re interested in actually using a Forth for personal or professional use, then there are far better options than Russ Forth. At the end of this installment I’ll list a few.
  2. Save for perhaps the “good.”

    class Russforth def initialize( s_in = $stdin, s_out = $stdout ) @s_in = s_in @s_out = s_out

  3. Most concatenative programming languages use a stack as the core metaphor for computation, but it’s not necessarily the case that concatenative languages must do so. Indeed, if I ever get around to writing an installment of Read-Eval-Print-Love about concatenative languages in general then I’ll build a little interpreter for a language that does not use a stack at all. That’s a task for another day however.
  4. I’ll dive deeper into Forth and domain specific programming later in this installment.

    include Verbs::Shufflers include Verbs::StackOps include Math::Arithmetic include Verbs::Comparators include Verbs::Io

  5. Traditionally, Forth lexicons are implemented as linked lists to avoid overwriting previous word implementations.
  6. I’ve always liked Ruby’s capability to override the [] array look-up as it’s allowed me to use Ruby think in terms of associative data rather than merely object-centric.
  7. I’ll avoid the temptation to add a certain famous quote about regular expressions. Instead, I will say that pound for pound regexes are amongst the most powerful programming abstractions going. That said, the density of information packed into a regexes leads me to minimal their footprint in my own programs to minimize the occurrences of cursing for my 2-weeks-later-self.

Russ-Forth Core

All of the infrastructure shown above is fine and dandy, but by itself it doesn’t do anything. To add some capability requires seeding the run-time environment with some core words. As I alluded to earlier, the core way to define words is to write them as Ruby module methods which are then mixed in at start time.

Before I start, I should make it clear that the implementation methods must adhere to a certain standard:

  • All core words deal with an implicit @stack property

That is, RForth works off of a single globally accessible stack defined in the Russforth class which will be accessible to the core words once they’re mixed in at run-time. If I were creating a stack-based language in something like Clojure (something more extensive than I showed earlier) than I would make the implementation quite different. That said, while allowing access to a shared property feels a little nasty, there are some advantages for clarity and ease of testing.

One potential problem with dealing in a global stack, and this is a problem even in many stack-based languages, is that it’s not immediately clear just by reading the code what the stack effects might be. Therefore, while discussing the implementations herein I’ll use a variant of the stack effect annotations available in the Factor programming language. For example, the stack effect [x -- x x] means that a word takes a stack and duplicates the top element and pushes it back onto the stack. Another example is [x y -- y x] that refers to the action of popping the first two elements off of the stack and pushing them back on in reverse order. The general form of the annotation is [<before> -- <after>]. When reading stack effect annotations remember that the top of the stack is always the rightmost element of both the <before> and <after> segments.

Enough already – let’s begin with the fun stuff.

A quick example – “pop”

To start I’ll show a very simple stack operator named “.” that does one simple thing – it pops the top element from the stack and prints it out. The word lives in a module devoted to stack operations and for my purpose is the only word there:

1 module Verbs
2   module StackOps
3     # [x -- ]
4     def dot
5       @s_out.print( @stack.pop )
6     end
7   end
8 end

The . word works as follows:

1 1 2 .

Or graphically:

1 | 2 |        | 1 |
2 | 1 |   ->   |   |
3 +---+        +---+

As the element 2 is popped off of the stack it’s printed. The implementation of . is trivial, but I wanted to start with it just to show the layout of the implementation modules, the use of @stack, and how I’ll show examples.


Every good (and bad) programming language need mathematical operators and RForth is no exception. All of the RForth math words live in the same module:

1 module Math
2   module Arithmetic

I won’t belabor the implementations too much, as you can probably infer what they do just by reading them. For example, here are the implementations for plus and mult:

1     # [a b -- c]
2     def plus
3       @stack << (@stack.pop + @stack.pop)
4     end
6     # [a b -- c]
7     def mult
8       @stack << (@stack.pop * @stack.pop) 
9     end

The implementations for subtract and divide are only slightly more involved

 1     # [r l -- m]
 2     def subtract
 3       left = @stack.pop
 4       right = @stack.pop
 5       @stack << left - right
 6     end
 8     # [d n -- q]
 9     def divide
10       numerator = @stack.pop
11       denominator = @stack.pop
12       @stack << numerator / denominator
13     end

For the sake of completeness I’ll just show the code for a one more word below:

1     # [a -- b]
2     def abs
3       @stack << (@stack.pop.abs) 
4     end
5   end
6 end

There’s not much to show here, but I’ll eventually make use of these mathematical words later. Slightly more interesting words are implemented next.


The set of comparison operators live in a common module:

1 module Verbs
2   module Comparators

For my purposes I’ll need a relatively small set of comparators, starting with an equality word:

1     # [a b -- true|false]
2     def eq
3       a = @stack.pop
4       b = @stack.pop
6       @stack << (a == b)
7     end

As previously shown, the eq method is aliased to the = word and works as follows:

1 1 2 =

Or graphically:

1 | 2 |        | false |
2 | 1 |   ->   |       |
3 +---+        +-------+

You’ll notice that the values 1 and 2 were consumed from the stack by =. This is common in Forth-like languages as it’s expected that most operations will use the stack as a scratchpad for computation, wiping and writing constantly as they go. There are some Forth-like languages that also provide variable declarations for lexical-based use and reference, but those will not be used in RForth. The implementation for <> is very similar, as you might expect:

1     # [a b -- true|false]
2     def noteq
3       a = @stack.pop
4       b = @stack.pop
6       @stack << (a != b)
7     end

Likewise, for logical comparisons:

 1     # [a b -- true|false]
 2     def gt
 3       rhs = @stack.pop
 4       lhs = @stack.pop
 6       @stack << (lhs > rhs)
 7     end
 9     # [a b -- true|false]
10     def lt
11       rhs = @stack.pop
12       lhs = @stack.pop
14       @stack << (lhs < rhs)
15     end
16   end
17 end

And that’s all for comparators for now.


Before I dig into the meatier words, I want to take a few moments to show the implementation of a few useful words related to I/O, all implemented in a common module:

1 module Verbs
2   module Io

The simplest I/O word is one that prints an arbitrary character:

1     def emit
2       @s_out.print( @stack.last.chr )
3     end

Which can both be used to implement a word named .S that can be used to inspect the state of the stack:

1     def dot_s
2       @s_out.print( "#{@stack}>\n" )
3     end
4   end
5 end

The .S word prints a little hat on the top end of the stack to show the direction that the elements will be popped off.

Now that I’ve gotten these out of the way, I’ll now dive into the more interesting topic of stack shuffling.


An interesting set of core words implemented in RForth are the “stack shufflers.” In a nutshell, shufflers take elements off the stack and (perhaps) put them back on in various different configurations. All of the shufflers will reside in a common module:

1 module Verbs
2   module Shufflers

The easiest possible shuffler is called drop that takes a stack and just pops the top, throwing it away:

1     # [a -- ]
2     def drop
3       @stack.pop
4     end

The drop word works as follows:

1 1 2 drop

Or graphically:

1 | 2 |        | 1 |
2 | 1 |   ->   |   |
3 +---+        +---+

The nice thing about Ruby arrays is that they can be treated just like stacks, so the use of Array#pop is pretty clear. Likewise, the << (append to end) operator is analogous to a stack push, so a word named dup that just duplicates the top element is implemented in a straight-forward way:

1     # [a -- a a]
2     def dup
3       @stack << @stack.last
4     end

The dup word works as follows:

1 1 dup

Or graphically:

1 | 1 |        | 1 |
2 |   |   ->   | 1 |
3 +---+        +---+

One other nicety that Ruby provides is the array literal notation [...] which allows me to define the swap word via array concatenation:

1     # [a b -- b a]
2     def swap
3       @stack += [@stack.pop, @stack.pop]
4     end

Since the last element in the @stack refers to the top element, I was able to use the sequencing of the #pop operation to build a temporary array with the swapped order in place.

The swap word works as follows:

1 1 2 swap

Or graphically:

1 | 2 |        | 1 |
2 | 1 |   ->   | 2 |
3 +---+        +---+

I can use a similar technique as swap for the implementation of rot (rotate) below:

1     #[a b c -- b c a]
2     def rot 
3       top = @stack.pop
4       mid = @stack.pop
5       lst = @stack.pop
6       @stack += [mid, top, lst]
7     end
8   end
9 end

The rot word works as follows:

1 1 2 3 rot

Or graphically:

1 | 3 |        | 1 |
2 | 2 |        | 3 |
3 | 1 |   ->   | 2 |
4 +---+        +---+

A word analogous to rot is called over that takes the element under the top, duplicates it, and then pushes it back onto the stack:

1     # [a b -- a b a]
2     def over
3       top = @stack.pop
4       und = @stack.pop
5       @stack += [und, top, und]
6     end

The over word works as follows:

1 1 2 over

Or graphically:

1 |   |        | 1 |
2 | 2 |        | 2 |
3 | 1 |   ->   | 1 |
4 +---+        +---+

And that’s all for now. I’ll show more later when I get into user-defined words and combinators.


The final set of core words that I’ll talk about are the combinators. In short, combinators are a set of words that incorporate the use of other words to perform some operation. All of the combinators are implemented in a common module:

1 module Verbs
2   module Combinators

The first combinator word that I’ll talk about is named apply that takes a Proc and just calls it for its stack effects:

1     # [... [...] -- ...]
2     def apply
3       op = @stack.pop
5     end

The apply word works as follows:

1 1 2 [ + ]

Or graphically:

1 | [ + ]  |        | 3 |
2 |   2    |        |   |
3 |   1    |   ->   |   |
4 +--------+        +---+

A more involved variant of apply is called dip that takes a Proc and an element on the stack and then applies that Proc to the rest of the stack, finally pushing the saved element back onto the stack:

1 # [... a [ ... ] -- ... a]
2 def dip
3   op = @stack.pop
4   stash = @stack.pop
6   @stack << stash
7 end

The dip word is an interesting little piece of business because of the way that it uses the stash variable to hold a piece of data for later pushing. As an added bonus, it together with ., dup, and swap forms a primitive base for implementing control flow structures, (Spiewak 2008) but I’ll only touch on that tangentially.

Industrial-strength Forth implementations provide a stashing mechanism in the language itself via something known as the “return stack.” The return stack is a structure used implicitly by Forth run-times as a place to store and retrieve the maze of pointers within and between nested words (Noble 1992). However, while the return stack is an internal run-time structure, Forth programmers happily use it as needed as a place to store values temporarily (as they need to be gone before a word finishes execution). To push values onto the return stack, Forth implementations (often) provide a set of operators >r and r> to push and pop values from the return stack. Therefore, to implement dip in a “real Forth” would look like the following:

1 : dip swap >r execute r> ;

You would read this implementation as: 1. swap the top two main stack elements 2. store the top of the main stack onto the return stack 3. execute the quotation on the top of the main stack 4. push the top of the return stack onto the main stack

As you’ve probably already noticed, RussForth does not have a return stack as it wasn’t necessarily needed. However, adding something analogous to a return stack wouldn’t be too difficult, but I leave that as an exercise to the reader.

The implementation of dip is not too complicated, but it could use some illustration to really help explain what’s happening. First, let me say that you might instinctively think to type the following:

1 5 10 2 * dip

To mean “perform multiplication on 5 and 10, storing 2 back onto the stack.” The problem is that the default action of the read phase was to push words onto the stack by default, but dip requires a Proc as one of its arguments. However, if you recall, the way to push a Proc was to use the quotation form, which means that the code should be written as:

1 5 10 2 [ * ] dip

Now that the * word is encapsulated within a Proc residing at the top of the stack, dip can grab a hold of it and invoke the #call method on it.

Graphically, the stack prior to the execution of dip looks as follows:

1 | [ * ] |
2 |   2   |
3 |  10   |
4 |   5   |
5 +-------+

Before dip calls the Proc implementing the quotation, the stack will look like:

1 |  10   |
2 |   5   |
3 +-------+

Where, internally dip will hold onto the 2 after which it will be pushed back onto the stack after the quotation is executed:

1 |  2    |
2 |  50   |
3 +-------+

This is pretty cool because it illustrates in a simple way the use of quotations in RForth and how they’re used by combinators to do some snazzy stuff. However, this installment is not about combinators, so while I’m tempted to continue, I’d like to take a step back and talk a little bit about building up a language.


Now that I’ve put in place some core functions I can start to build a core library using them. This section will not present a fully fleshed out core, but I’ll go over some interesting examples to give a feel for the kinds of things that comprise a Forth-like run-time.

Before I start I want to point out that earlier I showed the implementation of an immediate word \\ that ate character until it found a newline. This is of course an implementation of a comment word:

1 \\ This is a comment
2 10 \\ The stack should now have 10
3 .S \\= The print out should be [10]>

The last comment line above is how I will display execution results such as I/O print outs and the like.

That out of the way, the first thing that I’d like to create is a simple utility to print out a newline. If you recall I implemented an emit word that I could use to implement just such a user-defined utility:

1 : cr 10 emit ;

So now, if we wanted to use this in a program to print a message it’s as simple as:

1 72 emit 105 emit 33 emit cr
2 \\= Hi!

A more useful Forth word is one called nip that is used to pop the element immediately under the top of the stack:

1 : nip swap drop ;

The nip word works as follows:

1 1 2 nip

Or graphically:

1 | 2 |        | 2 |
2 | 1 |   ->   |   |
3 +---+        +---+

Another useful word is known as tuck that stashes a copy of the top of the stack under the next element down:

1 : tuck swap over ;

The tuck word works as follows:

1 1 2 tuck

Or graphically:

1 |   |        | 2 |
2 | 2 |        | 1 |
3 | 1 |   ->   | 2 |
4 +---+        +---+

A couple of arithmetical operators that might be useful are implemented below:

1 : inc 1 + ;
2 : dec 1 - ;

Perhaps it’s obvious what these words do, but just in case it’s not, observe the following:

1 2 inc inc

Or graphically:

1 | 2 |   ->   | 4 |
2 +---+        +---+

Followed by:

1 dec


1 | 4 |   ->   | 3 |
2 +---+        +---+

This is pretty straight-forward, but I’m leading up to something more interesting than the implementation of a core library. In the next section I’ll expand on this thread of thought and show how the manner in which Forth programs are constructed can inform the construction of programs in any other language.

Forth Thinking

In Forth-like languages, it’s often the case that programs are built from constituent parts such as those implemented so far. This may not seem very unique given that the same can be said of any programming language. However, Forth programmers often strive to write their programs in a bottom-up fashion, building layers of fluency informing all layers above. In other words, each layer in a Forth program is a special-purpose language specifically geared to support the layers above it in the program. However, it’s not the case that Forth programs are one-off deals. Instead, it’s a point of pride amongst Forth programmers that code be aggressively reusable (Pountain 1987). These two concerns, I think, form a powerful approach to programming that I’ll discuss (all too) briefly in this section.

Thinking Forth

In August 2010 I stumbled on a book that deeply influenced my personal programming style and my views on how programs are constructed. The book in question was Thinking Forth by Leo Brodie (Brodie 1987) and upon reading it I immediately put it into my own “personal pantheon” of influential programming books (along with SICP, AMOP, Object-Oriented Software Construction, Smalltalk Best Practice Patterns, and Programmers Guide to the 1802). Up until that point most of the programming books that I had read focused on code, but Thinking Forth spent a significant portion of its pages discussing the thought processes behind program construction.

While I’ve read other books that touched on programming from an angle of thoughtfulness, Thinking Forth was the first that I read that drew an essential marriage between the language in use and the thought processes advocated by it. Bear in mind that this was a deeper relationship than one that you might find in a programming language’s idioms. Indeed, while idioms are often the result of discovery, they are not something that you might consider philosophy. Instead, the very idea of Forth necessarily informs the base tenet of “factoring” as discussed in the book.


Factoring in Forth is akin to “Refactoring” in common parlance. The main difference is that refactoring is an activity that is applied to an exiting code-base and factoring is applied to an evolving code-base. That is, refactoring happens after the fact while factoring happens in situ. Thinking Forth therefore identifies and discusses a number of benchmarks while coding that signal the need for factoring, including, but not limited to:

  • Factor when complexity tickles your conscious limits
  • Factor when you’re able to elucidate a name for something
  • Factor at the point when you feel that you need a comment
  • Factor the moment you start repeating yourself
  • Factor when you need to hide detail
  • Factor when your command set (API) grows too large
  • Don’t factor idioms

Using RussForth I’ll touch briefly on a few of these points below.

Factor when complexity tickles your conscious limits

The book puts stock in George Miller’s famous “Magical Number Seven…” paper (Miller, 1956) when discussing the notion of code’s cognitive load. That is, the gist is that people can general only hold 7 (± 2) pieces of information in their head, so it behooves the programmer to write such code that falls under those bounds. As a stack-based language, Forth’s ongoing stack manipulation from one word to another is quite confusing. Granted, I’m very far from even a novice in Forth-like languages, so such a statement should be taken with a grain of salt. That said, let’s explore a simple operation that one could conceivably find useful. That is, I could imagine a need to apply two separate quotations to a single value (Childers, 2016). Take a stack of the following form:

1 12 [ 3 * ] [ 4 * ]

An expanded description of this sequence could be stated as, “multiply 3 by 12 and push it onto the stack and then multiply 4 by 12 and push it onto the stack too.” To accomplish this requires a series of stack manipulations of the form:

1 rot dup rot apply swap rot apply swap

Breaking this down, let’s see what’s happening. To start, the stack looks as follows:

1 | [ 4 * ] |
2 | [ 3 * ] |
3 |   12    |
4 +---------+

The application of rot causes the following:

1 | [ 4 * ] |        |   12    |
2 | [ 3 * ] |   ->   | [ 4 * ] |
3 |   12    |        | [ 3 * ] |
4 +---------+        +---------

Next, the application of dup causes the following:

1 |   12    |        |   12    |
2 | [ 4 * ] |        |   12    |
3 | [ 3 * ] |   ->   | [ 4 * ] |
4 |         |        | [ 3 * ] |
5 +---------+        +---------+

Next, the application of rot again causes the following:

1 |   12    |        | [ 4 * ] |
2 |   12    |        |   12    |
3 | [ 4 * ] |        |   12    |
4 | [ 3 * ] |   ->   | [ 3 * ] |
5 +---------+        +---------+

Next, the application of apply causes the following:

1 | [ 4 * ] |        |         |
2 |   12    |        |   48    |
3 |   12    |        |   12    |
4 | [ 3 * ] |   ->   | [ 3 * ] |
5 +---------+        +---------+

Now I want to use swap rot to move that result out of the way and then prepping the next operation, causing the following:

1 |         |        | [ 3 * ] |
2 |   48    |        |   12    |
3 |   12    |        |   48    |
4 | [ 3 * ] |   ->   |         |
5 +---------+        +---------+

Then the next use of apply almost gets me there:

1 | [ 3 * ] |        |         |
2 |   12    |        |   36    |
3 |   48    |        |   48    |
4 |         |   ->   |         |
5 +---------+        +---------+

But since I wanted the stack to look a certain way a final swap is needed leaving:

1 |   48    |
2 |   36    |
3 +---------+

And that’s it! Unfortunately between the beginning and the end of putting this sequence of words together I’ve forgotten what’s happened with the stack. I’ve used 8 words to perform this task, but most of those are raw stack shufflers that on their own are somewhat opaque to the task at hand. The heart of the task lies in the apply words. That is, the shuffling performs entirely to setting up the calls to apply, which are informative in their own right. That said, I can replace some of the words to use another apply-like operator dip to trim a word and still maintain the focus point of information:

1 rot dup rot dip rot apply swap

I’ll avoid walking through the stack manipulations again, but I’ll talk just a moment about what this achieves. First, it trims a word and gets the whole sequence into that “7 things” range. Second, while trimming it also keeps the focused setup/process information quanta:

1 rot dup rot dip
2 rot apply
3 swap

With more built-in features, this sequence could be further simplified, but I hope that my point is understood.

Factor when you’re able to elucidate a name for something

It seems that a process for applying two quotations to a single value might be generally useful. Once it becomes clear that a fragment of code is generally useful, it follows that it should have a name. 1 The name2 that I might chose comes straight from the description given earlier, “apply two separate quotations to a single value”:

1 : apply2 rot dup rot dip rot apply swap ;

And now the original fragment becomes:

1 12 [ 3 * ] [ 4 * ] apply2
2 .S
3 \\= [36, 48]>

Which leaves the stack in the condition that we saw earlier.

Factor at the point when you feel that you need a comment

Imagine that I wanted a version of dip that used its stored value in the quotation rather than extracting it for later pushing. I would need to use stack shufflers to duplicate the stored value so that it could be used in the quotation:

1 5 10 2 [ * ] over swap  \ use the stored value
2 dip

Graphically this would look like the following:

1 | [ * ] |
2 |   2   |
3 |  10   |
4 +-------+

With the application of over swap the following would occur:

1 |       |        | [ * ] |
2 | [ * ] |        |   2   |
3 |   2   |        |   2   |
4 |  10   |   ->   |  10   |
5 +-------+        +-------+

After dip the final stack would be:

1 |   2   |
2 |  20   |
3 +-------+

As shown, it’s tempting to put a comment into the original to explain how the use of over swap works to weave the stored value back into the quotation application, but there’s a better way. That is, it’s better to factor out a new combinator instead:

1 : sip over swap dip ;

The comment is now manifested as a reusable (and testable) word. 3

Don’t factor idioms

I had alluded earlier that idioms do not operate at the philosophical level of a given programming language. Instead, idioms are natural growths occurring on a programming language. That said, very often the form of idioms are directly influenced by the philosophical underpinnings of a language. Like idioms in natural languages, programming idioms should be viewed as atomic units with meaning quite independent of their constituent parts. Therefore, it’s important to leave idioms intact and resist the urge to factor them out in whole. Hardcore Forth implementations like arrayForth (based on ColorForth), GForth, and Open Firmware have their own rich sets of idioms, but underneath those idioms lies the Forth philosophy – some more than others.


Forth is an astonishing programming language. The very design and (most) implementations revolve around the idea that there’s a conceptual distance between the ideas in your head and the code on the screen and that distance should be as short as possible. Sadly, I simply cannot adequately cover the whole of the beauty of Forth thinking in this small space. There is so much going on in this book that it literally made me dizzy while reading it. From thoughts on factoring, code organization, testing, DSLs, encapsulation, data-hiding, variable naming, word length, decomposition, and design, Thinking Forth is, ideas-per-page, unmatched in the realm of programming books.

  1. If I ever get the urge to write a “Programming for Buddhists” book then Forth (or some other concatenative language) is my choice for the language. There is a nice parallel between the ideas of rupa, Maya, and nama-rupa that would be a blast to write about.
  2. This word is typically called bi.
  3. Interestingly [ ] sip is equivalent to dup. (Kirby, 2002)

Book of interest: Stack Computers - the new wave

available at

On about 1989 I suspect that the idea of a stack computer was indeed considered “new wave,” but today it’s quite quaint.1 That said, Stack Computers - the new wave by Philip Koopman is well worth a read for the retro-computing-curious, if for no other reason than the bibliography. The book describes the architecture and run-time characteristics of a breed of computers being developed at the time, led by the Novix NC4016 chip. In addition to performing a survey of a few architectures, the book discusses the traps and pitfalls around programming those machines (there is a lot about Forth as you can imagine). The book ends with some thoughtful discussion about the “future” of stack architectures of which I find particularly interesting. Something that I think would be a fun activity would be to explore that final chapter in depth and compare it to the actual evolution of stack architectures in the intervening years. Perhaps this sort of thing appeals to you too?

  1. This is in no way meant to disparage modern stack hardware such as the Green Arrays F18A or software stack machines like the JVM.

References and Interesting Information


  • Brodie, Leo. 1987. Starting Forth
  • Brodie, Leo. 1984. Thinking Forth: A language and philosophy for solving problems
  • Noble, Julian. 1992. Scientific Forth: A modern language for scientific computing
  • Pountain, Dick. 1987. Object-oriented Forth: Implementation of data structure


  • Miller, George. 1956. “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”


  • Childers, Charles. 2016. “Port of Retro’s combinator implementations to Forth” at
  • Kirby, Brent. 2002. “The Theory of Concatenative Combinators” at
  • JonesForth - an implementation in literate ASM.


  • “A Conversation with Manfred von Thun” at


  • Spiewak, Daniel. 2008. The Joy of Concatenative Languages.

Michael Fogus - a core contributor to Clojure and ClojureScript. Creator of programming source codes.

An occasional blogger at Send More Paramedics and also co-author of the book, The Joy of Clojure and author of Functional JavaScript.

Author contact

Discussion and information


Thanks go out to Russ Olsen for original implementation of Russ Forth. Also, thanks to Karsten Schmidt, Shaun Gilchrist, Jeremy Sherman, Chas Emerick, and the inimitable Alan Eliasen for their feedback on earlier versions of this installment.