A Simple In-Memory RDF Store and Query Language in Gerbil Scheme

Introduction to RDF and the Goal of This Chapter

As we saw in an earlier chapter, the Resource Description Framework (RDF) is a W3C standard for representing knowledge as a graph of statements. Every statement is a triple: a subject, a predicate, and an object. Taken together, a collection of triples forms a directed, labeled graph that can represent almost any kind of structured knowledge — from social networks and bibliographic data to scientific datasets and knowledge bases.

SPARQL is the standard query language for RDF graphs. It lets you express patterns of triples you are looking for, binding variables to the parts of the graph that match. A full SPARQL 1.1 engine is a substantial piece of software, but the core idea — match a set of triple patterns against a store and collect the variable bindings — is surprisingly straightforward to implement from scratch.

This chapter walks through an implementation in RDF.ss, found in the directory gerbil_scheme_book/source_code/RDF_datastore, that is a self-contained Gerbil Scheme file that implements both pieces: a mutable in-memory triple store and a small SPARQL evaluator that handles the most commonly used fragment of the language. The goal is not to replace a production-grade triplestore, but to show how cleanly the underlying ideas map onto idiomatic Scheme code.

We will list the complete implementation later; first we discuss our approach to writing a simple RDF data store with a simple but usable subset SPARQL query language.

Data Model: Triples and the Store

The fundamental unit of RDF is the triple (subject predicate object). In RDF.ss each triple is represented as a plain three-element Scheme list. This is intentionally simple: lists are transparent, easy to inspect in the REPL, and require no additional data-structure support from the runtime.

The store itself is a mutable wrapper around a list of those triples. It is created with make-store and manipulated through two mutating operations: add-triple and remove-triple. Both operations return the store itself, which makes it natural to chain a series of add-triple calls when loading data. store-triples extracts the raw list for inspection or iteration.

Choosing a mutable design keeps the loading code simple and mirrors what a developer would expect from a database: you add facts, you remove facts, and the store reflects the current state of knowledge. A persistent (immutable) variant would be straightforward to build on top of the same ideas by threading the store through every call rather than mutating it in place.

A small helper, print-all-triples, dumps the entire store to standard output. It is useful during development and debugging and is used in the demo at the bottom of RDF.ss to show the loaded data before any queries are run.

Parsing SPARQL Queries

Before any matching can happen, the textual query string must be turned into a structured representation the evaluator can work with. RDF.ss handles this in two stages: tokenization and parsing.

Tokenization

The tokenizer in RDF.ss walks the query string one character at a time, accumulating characters into a buffer and flushing the buffer into a token whenever it encounters whitespace or one of the SPARQL punctuation characters {, }, ., ,, ;, (, and ). Each punctuation character also becomes its own single-character token. The result is a flat list of strings that the parser can consume one element at a time without worrying about character-level details.

Apart from the single helper char-whitespace? (which checks for space, tab, newline, and carriage return), the tokenizer depends on nothing but basic Scheme list and string operations.

Parsing

The parser implemented in parse-sparql consumes the token list produced by the tokenizer and returns two values via let-values:

  • vars — the list of variable names the query projects, or the single wildcard token "*" when the query uses SELECT *.
  • patterns — the list of triple patterns from the WHERE clause, each represented as a three-element list of strings in the same (s p o) shape as stored triples.

The parser first scans forward to the SELECT keyword, then reads variable tokens (anything beginning with ?) or the * token until it hits a non-variable. It then scans to the opening { of the WHERE block and reads groups of three tokens as triple patterns, discarding optional trailing . separators and stopping at the closing }.

The parser deliberately treats every token as a plain string after lowercasing — resources, literals, and variables are all strings. This avoids the need for a separate IR type hierarchy and keeps the code focused on the structural logic rather than on data representation.

One subtlety worth noting: the parser lowercases all tokens before processing. This means variable names like ?Friend and ?friend are treated as identical inside the engine, and resource strings like <ex:Alice> become <ex:alice> in the parsed patterns. For a toy engine this is an acceptable simplification; a production system would need to distinguish case-sensitive IRIs from case-insensitive keywords.

Query Evaluation: Pattern Matching and Joining

The evaluator in RDF.ss works in terms of environments — association lists that map variable name strings to the string values they have been bound to. An empty environment '() represents “no bindings yet” and is the starting point for every query.

Matching a Single Pattern against a Triple

match-pattern tries to unify one triple pattern with one stored triple given an existing environment. For each of the three positions (subject, predicate, object) it applies one of two rules:

  1. If the pattern element is a variable (detected by var?, which checks for a leading ?), look the variable up in the current environment. If it is already bound, check that the bound value equals the triple’s value at that position — a mismatch means the pattern fails. If it is unbound, extend the environment with the new binding.

  2. If the pattern element is a concrete value (a resource or literal), check that it is exactly equal to the triple’s value at that position. If not, the match fails.

The three checks are sequenced so that a failure at any position short-circuits the rest: e1 is the environment after checking the subject, e2 after the predicate (or #f if e1 failed), and e3 after the object. The final value e3 is either a valid extended environment or #f.

The helper env-extend encapsulates the extend-or-check logic. It uses assoc to look up the variable; if a binding exists and matches, it returns the unchanged environment; if a binding exists but conflicts, it returns #f; otherwise it prepends a new (var . val) pair.

Joining Multiple Patterns

A SPARQL WHERE clause typically contains more than one triple pattern, and the results must satisfy all of them simultaneously. join-patterns implements this as a recursive nested-loop join:

  • Start with a list of one empty environment.
  • For the first pattern, try matching it against every triple in the store. Collect all the environments that survive (i.e., match-pattern returned non-#f).
  • Recurse on the remaining patterns, passing those surviving environments as the new starting set.
  • When no patterns remain, whatever environments are left are the solutions.

This is equivalent to computing a relational cross-product filtered by the match conditions — a join. Because each pattern can bind new variables, the common variables between two patterns act as join keys automatically, without any explicit join condition.

The algorithm is correct for any number of patterns, though its time complexity is proportional to the product of the number of triples and the number of patterns. For a small in-memory store that is entirely acceptable.

Projecting Results

Once join-patterns returns the list of surviving environments, sparql-select projects each one down to the variables named in the SELECT clause. For each result environment, it calls project-env, which walks the selected variable list and pulls out the corresponding value from the environment via env-lookup.

When SELECT * is used, no projection is applied — the full environment is returned as-is, which contains all variables that were bound during matching.

Gerbil RDF Implementation

  1 ;;; RDF.ss ── simple in-memory RDF data store with a toy SPARQL subset
  2 ;;;
  3 ;;; Supports:
  4 ;;;   make-store        – create an empty triple store
  5 ;;;   add-triple        – add an (s p o) triple
  6 ;;;   remove-triple     – remove a matching (s p o) triple
  7 ;;;   store-triples     – retrieve all triples from a store
  8 ;;;   sparql-select     – run a SELECT query (variables + WHERE patterns)
  9 ;;;   print-all-triples – display all triples for debugging
 10 ;;;
 11 ;;; SPARQL subset handled:
 12 ;;;   SELECT ?var1 ?var2 ... WHERE { s p o . s p o . }
 13 ;;;   SELECT * WHERE { ... }
 14 ;;;   Variables begin with '?'
 15 
 16 (export make-store add-triple remove-triple store-triples
 17         sparql-select print-all-triples)
 18 
 19 ;;;; ──────────────────────────────────────────────────────────────────
 20 ;;;;  1.  Data model
 21 ;;;; ──────────────────────────────────────────────────────────────────
 22 
 23 ;; A store is just a cons-cell box wrapping a list of triples.
 24 ;; Each triple is a 3-element list: (subject predicate object).
 25 
 26 (define (make-store) (cons 'rdf-store '()))
 27 
 28 (define (store-triples st) (cdr st))
 29 
 30 (define (add-triple st s p o)
 31   "Add the triple (s p o) to store ST; returns ST for chaining."
 32   (set-cdr! st (cons (list s p o) (cdr st)))
 33   st)
 34 
 35 (define (remove-triple st s p o)
 36   "Remove the first triple matching (s p o) from store ST."
 37   (set-cdr! st
 38             (filter (lambda (t)
 39                       (not (and (equal? (car t)   s)
 40                                 (equal? (cadr t)  p)
 41                                 (equal? (caddr t) o))))
 42                     (cdr st)))
 43   st)
 44 
 45 (define (print-all-triples st)
 46   "Display every triple in the store."
 47   (display "All triples:\n")
 48   (for-each (lambda (t)
 49               (display "  ")
 50               (display (car t))   (display " ")
 51               (display (cadr t))  (display " ")
 52               (display (caddr t)) (newline))
 53             (store-triples st))
 54   (newline))
 55 
 56 ;;;; ──────────────────────────────────────────────────────────────────
 57 ;;;;  2.  Tokenizer
 58 ;;;; ──────────────────────────────────────────────────────────────────
 59 
 60 (define (char-whitespace? ch)
 61   (or (char=? ch #\space)
 62       (char=? ch #\tab)
 63       (char=? ch #\newline)
 64       (char=? ch #\return)))
 65 
 66 (define (tokenize str)
 67   "Split STR into a list of tokens; {}, .,; each become their own token."
 68   (let ((tokens '())
 69         (buf    '()))
 70 
 71     (define (emit!)
 72       (when (pair? buf)
 73         (set! tokens (cons (list->string (reverse buf)) tokens))
 74         (set! buf '())))
 75 
 76     (let loop ((chars (string->list str)))
 77       (if (null? chars)
 78           (begin
 79             (emit!)
 80             (reverse tokens))
 81           (let ((ch (car chars)))
 82             (cond
 83              ((member ch '(#\( #\) #\{ #\} #\. #\, #\;))
 84               (emit!)
 85               (set! tokens (cons (string ch) tokens)))
 86              ((char-whitespace? ch)
 87               (emit!))
 88              (else
 89               (set! buf (cons ch buf))))
 90             (loop (cdr chars)))))))
 91 
 92 ;;;; ──────────────────────────────────────────────────────────────────
 93 ;;;;  3.  SPARQL parser
 94 ;;;; ──────────────────────────────────────────────────────────────────
 95 
 96 (define (var? tok)
 97   "True if TOK is a SPARQL variable (starts with '?')."
 98   (and (string? tok)
 99        (> (string-length tok) 0)
100        (char=? (string-ref tok 0) #\?)))
101 
102 (define (string-ci=? a b)
103   (string=? (string-downcase a) (string-downcase b)))
104 
105 (define (string-downcase s)
106   (list->string (map char-downcase (string->list s))))
107 
108 ;; Drop leading tokens until test passes, return remaining list.
109 (define (drop-until pred lst)
110   (cond
111    ((null? lst)         '())
112    ((pred (car lst))    lst)
113    (else                (drop-until pred (cdr lst)))))
114 
115 (define (parse-sparql query)
116   "Parse a minimal SELECT … WHERE { … } query.
117    Returns two values: (vars patterns).
118    vars     – list of variable name strings (or '(\"*\") for SELECT *)
119    patterns – list of (s p o) patterns"
120 
121   (let* ((toks    (map string-downcase (tokenize query)))
122          ;; advance to SELECT keyword
123          (after-select
124           (let ((rest (drop-until (lambda (t) (string=? t "select")) toks)))
125             (if (null? rest) '() (cdr rest)))))
126 
127     ;; collect SELECT variables (or * wildcard)
128     (let-values (((vars after-vars)
129                   (if (and (pair? after-select)
130                            (string=? (car after-select) "*"))
131                       (values '("*") (cdr after-select))
132                       (let loop ((ts after-select) (acc '()))
133                         (if (or (null? ts) (not (var? (car ts))))
134                             (values (reverse acc) ts)
135                             (loop (cdr ts) (cons (car ts) acc)))))))
136 
137       ;; skip to '{'
138       (let* ((after-brace
139               (let ((rest (drop-until (lambda (t) (string=? t "{")) after-vars)))
140                 (if (null? rest) '() (cdr rest)))))
141 
142         ;; read triple patterns until '}'
143         (let loop ((ts after-brace) (patterns '()))
144           (cond
145            ((null? ts)
146             (values vars (reverse patterns)))
147            ((string=? (car ts) "}")
148             (values vars (reverse patterns)))
149            (else
150             ;; consume s p o
151             (let* ((s  (car ts))
152                    (ts (cdr ts))
153                    (p  (if (pair? ts) (car ts) ""))
154                    (ts (if (pair? ts) (cdr ts) '()))
155                    (o  (if (pair? ts) (car ts) ""))
156                    (ts (if (pair? ts) (cdr ts) '()))
157                    ;; optional trailing '.'
158                    (ts (if (and (pair? ts) (string=? (car ts) "."))
159                            (cdr ts) ts)))
160               (loop ts (cons (list s p o) patterns))))))))))
161 
162 ;;;; ──────────────────────────────────────────────────────────────────
163 ;;;;  4.  Query evaluation
164 ;;;; ──────────────────────────────────────────────────────────────────
165 
166 ;; An environment (binding set) is an association list: ((var . val) ...)
167 
168 (define (env-lookup var env)
169   "Return the value bound to VAR in ENV, or #f."
170   (let ((pair (assoc var env)))
171     (and pair (cdr pair))))
172 
173 (define (env-extend var val env)
174   "Return a new env with VAR=VAL added, or #f on conflict."
175   (let ((existing (assoc var env)))
176     (cond
177      (existing
178       (if (equal? (cdr existing) val) env #f))
179      (else
180       (cons (cons var val) env)))))
181 
182 (define (match-pattern pattern triple env)
183   "Try to unify one PATTERN with one TRIPLE given ENV.
184    Returns extended env on success, #f on failure."
185   (let ((ps (car pattern))   (pp (cadr pattern))   (po (caddr pattern))
186         (ts (car triple))    (tp (cadr triple))     (to (caddr triple)))
187     (let* ((e1 (if (var? ps)
188                    (env-extend ps ts env)
189                    (and (equal? ps ts) env)))
190            (e2 (and e1
191                     (if (var? pp)
192                         (env-extend pp tp e1)
193                         (and (equal? pp tp) e1))))
194            (e3 (and e2
195                     (if (var? po)
196                         (env-extend po to e2)
197                         (and (equal? po to) e2)))))
198       e3)))
199 
200 (define (join-patterns patterns triples envs)
201   "For each pattern in PATTERNS, extend every env in ENVS by matching
202    against all TRIPLES.  Returns the list of surviving envs."
203   (if (null? patterns)
204       envs
205       (let* ((pat  (car patterns))
206              (rest (cdr patterns))
207              (new-envs
208               (apply append
209                      (map (lambda (env)
210                             (filter
211                              (lambda (e) e)  ;; remove #f
212                              (map (lambda (triple)
213                                     (match-pattern pat triple env))
214                                   triples)))
215                           envs))))
216         (join-patterns rest triples new-envs))))
217 
218 (define (project-env vars env)
219   "Pick out only the selected VARS from ENV.
220    Returns an alist ((var . val) ...)."
221   (map (lambda (v) (cons v (env-lookup v env))) vars))
222 
223 (define (sparql-select st query)
224   "Run a SELECT query against store ST.
225    Returns a list of alists, each mapping variable names to values.
226    With SELECT *, returns the full binding environment as an alist."
227   (let-values (((vars patterns) (parse-sparql query)))
228     (let* ((triples   (store-triples st))
229            (solutions (join-patterns patterns triples (list '())))
230            (select-*? (and (= (length vars) 1)
231                           (string=? (car vars) "*"))))
232       (map (lambda (env)
233              (if select-*?
234                  env                          ;; full binding set
235                  (project-env vars env)))
236            solutions))))
237 
238 ;;;; ──────────────────────────────────────────────────────────────────
239 ;;;;  5.  Display helpers
240 ;;;; ──────────────────────────────────────────────────────────────────
241 
242 (define (display-results results)
243   "Pretty-print a list of query result alists."
244   (if (null? results)
245       (display "  (no results)\n")
246       (for-each (lambda (row)
247                   (display "  ")
248                   (for-each (lambda (pair)
249                               (display (car pair))
250                               (display ": ")
251                               (display (cdr pair))
252                               (display "  "))
253                             row)
254                   (newline))
255                 results)))
256 
257 (define (run-query st label q)
258   (display "Query: ") (display label) (newline)
259   (display-results (sparql-select st q))
260   (newline))
261 
262 ;;;; ──────────────────────────────────────────────────────────────────
263 ;;;;  6.  Demo
264 ;;;; ──────────────────────────────────────────────────────────────────
265 
266 (let ((st (make-store)))
267 
268   ;; Load some data
269   (add-triple st "<ex:alice>" "<foaf:name>"  "\"Alice\"")
270   (add-triple st "<ex:alice>" "<foaf:knows>" "<ex:bob>")
271   (add-triple st "<ex:alice>" "<foaf:age>"   "30")
272   (add-triple st "<ex:bob>"   "<foaf:name>"  "\"Bob\"")
273   (add-triple st "<ex:bob>"   "<foaf:age>"   "25")
274   (add-triple st "<ex:carol>" "<foaf:name>"  "\"Carol\"")
275   (add-triple st "<ex:carol>" "<foaf:age>"   "35")
276 
277   (print-all-triples st)
278 
279   ;; Who does Alice know, and what is their name?
280   (run-query st
281     "Who does Alice know?"
282     "SELECT ?friendName WHERE {
283        <ex:alice> <foaf:knows> ?friend .
284        ?friend    <foaf:name>  ?friendName .
285      }")
286 
287   ;; All subjects and their names
288   (run-query st
289     "All names"
290     "SELECT ?s ?name WHERE {
291        ?s <foaf:name> ?name .
292      }")
293 
294   ;; All subjects and their ages
295   (run-query st
296     "All ages"
297     "SELECT ?person ?age WHERE {
298        ?person <foaf:age> ?age .
299      }")
300 
301   ;; SELECT * – return every binding
302   (run-query st
303     "SELECT * (all triples)"
304     "SELECT * WHERE { ?s ?p ?o . }")
305 
306   ;; Remove a triple and verify
307   (display "After removing <ex:bob> <foaf:age> 25:\n")
308   (remove-triple st "<ex:bob>" "<foaf:age>" "25")
309   (run-query st
310     "All ages after removal"
311     "SELECT ?person ?age WHERE {
312        ?person <foaf:age> ?age .
313      }"))

Running the Demo and Using the Code

Running with make

The Makefile provides three targets. make run executes RDF.ss directly under the Gerbil interpreter gxi — no compilation step is needed, making this the quickest way to see the output. make compile invokes the Gerbil compiler gxc, which compiles RDF.ss into native code and installs the resulting module under the package name declared in gerbil.pkg (rdf-datastore). After compiling, the module can be imported from any other Gerbil file as :rdf-datastore/RDF. make clean removes all compiled artefacts.

The Demo at the Bottom of RDF.ss

The bottom of RDF.ss contains a self-contained demo wrapped in a let block so it runs automatically when the file is interpreted. It builds a small social graph with three people (Alice, Bob, and Carol), each with a name and an age triple, plus a foaf:knows link from Alice to Bob.

Five queries exercise the main features of the engine:

  • A two-pattern join that finds the name of everyone Alice knows.
  • A single-pattern scan that retrieves all subjects and their names.
  • A single-pattern scan that retrieves all subjects and their ages.
  • A SELECT * query that returns every triple in the store as a full binding.
  • A post-deletion query that verifies the remove-triple operation worked.

Looking at the output of each query against the data you loaded is the most direct way to build intuition for how SPARQL variable binding and joining work.

Here is sample program output:

 1 $ make
 2 gxi RDF.ss
 3 All triples:
 4   <ex:carol> <foaf:age> 35
 5   <ex:carol> <foaf:name> "Carol"
 6   <ex:bob> <foaf:age> 25
 7   <ex:bob> <foaf:name> "Bob"
 8   <ex:alice> <foaf:age> 30
 9   <ex:alice> <foaf:knows> <ex:bob>
10   <ex:alice> <foaf:name> "Alice"
11 
12 Query: Who does Alice know?
13   ?friendname: "Bob"  
14 
15 Query: All names
16   ?s: <ex:carol>  ?name: "Carol"  
17   ?s: <ex:bob>  ?name: "Bob"  
18   ?s: <ex:alice>  ?name: "Alice"  
19 
20 Query: All ages
21   ?person: <ex:carol>  ?age: 35  
22   ?person: <ex:bob>  ?age: 25  
23   ?person: <ex:alice>  ?age: 30  
24 
25 Query: SELECT * (all triples)
26   ?o: 35  ?p: <foaf:age>  ?s: <ex:carol>  
27   ?o: "Carol"  ?p: <foaf:name>  ?s: <ex:carol>  
28   ?o: 25  ?p: <foaf:age>  ?s: <ex:bob>  
29   ?o: "Bob"  ?p: <foaf:name>  ?s: <ex:bob>  
30   ?o: 30  ?p: <foaf:age>  ?s: <ex:alice>  
31   ?o: <ex:bob>  ?p: <foaf:knows>  ?s: <ex:alice>  
32   ?o: "Alice"  ?p: <foaf:name>  ?s: <ex:alice>
33 
34 After removing <ex:bob> <foaf:age> 25:
35 Query: All ages after removal
36   ?person: <ex:carol>  ?age: 35
37   ?person: <ex:alice>  ?age: 30

Package and Module Structure

gerbil.pkg declares the package name rdf-datastore. Gerbil uses this to determine the import path of compiled modules. Once compiled, any other Gerbil source file can bring the exported names — make-store, add-triple, remove-triple, store-triples, sparql-select, and print-all-triples — into scope with a single import form. This makes RDF.ss usable both as a standalone runnable demo and as a reusable library component in a larger Gerbil project.

Summary and Further Directions

RDF.ss demonstrates that the essential machinery of an RDF store and a pattern-matching query engine fits comfortably in a single, readable Gerbil Scheme source file with no external dependencies. The key ideas — triples as lists, environments as association lists, query evaluation as a recursive join — are each short enough to understand in isolation, yet they compose directly into a working system.

Several directions are natural next steps from this foundation:

  • Persistence. The store could be serialized to and loaded from N-Triples (.nt) format, the simplest RDF serialization, by writing a small line-by-line parser and a corresponding writer.
  • OPTIONAL patterns. SPARQL’s OPTIONAL clause implements a left outer join. It can be layered on top of the existing join infrastructure by keeping solutions that survive the optional pattern and also keeping the original environment for solutions that do not.
  • FILTER expressions. Numeric and string comparisons inside a FILTER clause narrow results without adding new bindings. A small expression evaluator over the current environment would be sufficient.
  • Index structures. For larger datasets, a hash-table index on subject, predicate, or object would let match-pattern skip the full triple scan, dramatically reducing query time.
  • Named graphs. Extending each triple to a quad (graph subject predicate object) and adding a FROM clause to the parser extends the model to the SPARQL 1.1 dataset model with minimal structural change.

Each of these extensions is a natural exercise in applied Scheme programming, and the clean separation of the data model, the parser, and the evaluator in RDF.ss makes them straightforward to add without rewriting the whole system.