A Simple In-Memory RDF Store and Query Language in Gerbil Scheme
Introduction to RDF and the Goal of This Chapter
As we saw in an earlier chapter, the Resource Description Framework (RDF) is a W3C standard for representing knowledge as a graph of statements. Every statement is a triple: a subject, a predicate, and an object. Taken together, a collection of triples forms a directed, labeled graph that can represent almost any kind of structured knowledge — from social networks and bibliographic data to scientific datasets and knowledge bases.
SPARQL is the standard query language for RDF graphs. It lets you express patterns of triples you are looking for, binding variables to the parts of the graph that match. A full SPARQL 1.1 engine is a substantial piece of software, but the core idea — match a set of triple patterns against a store and collect the variable bindings — is surprisingly straightforward to implement from scratch.
This chapter walks through RDF.ss, a self-contained Gerbil Scheme file found in the directory gerbil_scheme_book/source_code/RDF_datastore, which implements both pieces: a mutable in-memory triple store and a small SPARQL evaluator that handles the most commonly used fragment of the language. The goal is not to replace a production-grade triplestore, but to show how cleanly the underlying ideas map onto idiomatic Scheme code.
We will list the complete implementation later; first we discuss our approach to writing a simple RDF data store paired with a small but usable subset of the SPARQL query language.
Data Model: Triples and the Store
The fundamental unit of RDF is the triple (subject predicate object). In
RDF.ss each triple is represented as a plain three-element Scheme list.
This is intentionally simple: lists are transparent, easy to inspect in the
REPL, and require no additional data-structure support from the runtime.
The store itself is a mutable wrapper around a list of those triples. It is
created with make-store and manipulated through two mutating operations:
add-triple and remove-triple. Both operations return the store itself,
which makes it natural to chain a series of add-triple calls when loading
data. store-triples extracts the raw list for inspection or iteration.
Choosing a mutable design keeps the loading code simple and mirrors what a developer would expect from a database: you add facts, you remove facts, and the store reflects the current state of knowledge. A persistent (immutable) variant would be straightforward to build on top of the same ideas by threading the store through every call rather than mutating it in place.
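As a small illustration, once RDF.ss is loaded, the chaining style works because each mutating operation returns the store itself:

```scheme
;; Build a store by chaining add-triple calls; add-triple conses each
;; new triple onto the front of the store's list and returns the store.
(let ((st (make-store)))
  (add-triple (add-triple st "<ex:alice>" "<foaf:knows>" "<ex:bob>")
              "<ex:bob>" "<foaf:name>" "\"Bob\"")
  (store-triples st))
;; => (("<ex:bob>" "<foaf:name>" "\"Bob\"")
;;     ("<ex:alice>" "<foaf:knows>" "<ex:bob>"))
```

Note that the most recently added triple appears first, since triples are prepended to the internal list.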
A small helper, print-all-triples, dumps the entire store to standard
output. It is useful during development and debugging and is used in the
demo at the bottom of RDF.ss to show the loaded data before any queries
are run.
Parsing SPARQL Queries
Before any matching can happen, the textual query string must be turned into a structured representation the evaluator can work with. RDF.ss handles this in two stages: tokenization and parsing.
Tokenization
The tokenizer in RDF.ss walks the query string one character at a time,
accumulating characters into a buffer and flushing the buffer into a token
whenever it encounters whitespace or one of the SPARQL punctuation characters
{ } . , ; ( and ). Each punctuation character also
becomes its own single-character token. The result is a flat list of strings
that the parser can consume one element at a time without worrying about
character-level details.
Apart from the single helper char-whitespace? (which checks for space, tab,
newline, and carriage return), the tokenizer depends on nothing but basic
Scheme list and string operations.
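For example, with RDF.ss loaded, tokenizing a small query splits on whitespace and emits each punctuation character as its own token:

```scheme
;; Whitespace separates tokens; { } . become single-character tokens.
(tokenize "SELECT ?s WHERE { ?s ?p ?o . }")
;; => ("SELECT" "?s" "WHERE" "{" "?s" "?p" "?o" "." "}")
```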
Parsing
The parser implemented in parse-sparql consumes the token list produced by
the tokenizer and returns two values (produced with values and received by
the caller via let-values):
- vars — the list of variable names the query projects, or the single wildcard token "*" when the query uses SELECT *.
- patterns — the list of triple patterns from the WHERE clause, each represented as a three-element list of strings in the same (s p o) shape as stored triples.
The parser first scans forward to the SELECT keyword, then reads variable
tokens (anything beginning with ?) or the * token until it hits a
non-variable. It then scans to the opening { of the WHERE block and
reads groups of three tokens as triple patterns, discarding optional trailing
. separators and stopping at the closing }.
The parser deliberately treats every token as a plain string after lowercasing — resources, literals, and variables are all strings. This avoids the need for a separate IR type hierarchy and keeps the code focused on the structural logic rather than on data representation.
One subtlety worth noting: the parser lowercases all tokens before processing.
This means variable names like ?Friend and ?friend are treated as
identical inside the engine, and resource strings like <ex:Alice> become
<ex:alice> in the parsed patterns. For a toy engine this is an acceptable
simplification; a production system would need to distinguish case-sensitive
IRIs from case-insensitive keywords.
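A quick REPL check makes both the two-valued return and the lowercasing visible (note how ?Name comes back as "?name"):

```scheme
;; parse-sparql returns the projected variables and the WHERE patterns.
(let-values (((vars patterns)
              (parse-sparql "SELECT ?s ?Name WHERE { ?s <foaf:name> ?Name . }")))
  (list vars patterns))
;; => (("?s" "?name") (("?s" "<foaf:name>" "?name")))
```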
Query Evaluation: Pattern Matching and Joining
The evaluator in RDF.ss works in terms of environments — association
lists that map variable name strings to the string values they have been
bound to. An empty environment '() represents “no bindings yet” and is
the starting point for every query.
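Concretely, an environment is nothing more than an alist of string pairs:

```scheme
;; Environments map variable names to bound values; env-lookup returns
;; the value, or #f when the variable is unbound.
(let ((env '(("?s" . "<ex:alice>"))))
  (list (env-lookup "?s" env)       ;; => "<ex:alice>"
        (env-lookup "?name" env)))  ;; => #f
;; => ("<ex:alice>" #f)
```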
Matching a Single Pattern against a Triple
match-pattern tries to unify one triple pattern with one stored triple
given an existing environment. For each of the three positions (subject,
predicate, object) it applies one of two rules:
- If the pattern element is a variable (detected by var?, which checks for a leading ?), look the variable up in the current environment. If it is already bound, check that the bound value equals the triple’s value at that position — a mismatch means the pattern fails. If it is unbound, extend the environment with the new binding.
- If the pattern element is a concrete value (a resource or literal), check that it is exactly equal to the triple’s value at that position. If not, the match fails.
The three checks are sequenced so that a failure at any position short-circuits the
rest: e1 is the environment after checking the subject, e2 after the
predicate (or #f if e1 failed), and e3 after the object. The final
value e3 is either a valid extended environment or #f.
The helper env-extend encapsulates the extend-or-check logic. It uses
assoc to look up the variable; if a binding exists and matches, it returns
the unchanged environment; if a binding exists but conflicts, it returns #f;
otherwise it prepends a new (var . val) pair.
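Two small examples show both outcomes. Starting from an empty environment, both variables bind; with a conflicting prior binding, unification fails:

```scheme
;; Successful match: both variables get bound (newest binding first).
(match-pattern '("?s" "<foaf:name>" "?name")
               '("<ex:alice>" "<foaf:name>" "\"Alice\"")
               '())
;; => (("?name" . "\"Alice\"") ("?s" . "<ex:alice>"))

;; Conflict: ?s is already bound to a different value, so env-extend
;; returns #f and the whole match fails.
(match-pattern '("?s" "<foaf:name>" "?name")
               '("<ex:alice>" "<foaf:name>" "\"Alice\"")
               '(("?s" . "<ex:bob>")))
;; => #f
```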
Joining Multiple Patterns
A SPARQL WHERE clause typically contains more than one triple pattern, and
the results must satisfy all of them simultaneously. join-patterns
implements this as a recursive nested-loop join:
- Start with a list of one empty environment.
- For the first pattern, try matching it against every triple in the store. Collect all the environments that survive (i.e., match-pattern returned non-#f).
- Recurse on the remaining patterns, passing those surviving environments as the new starting set.
- When no patterns remain, whatever environments are left are the solutions.
This is equivalent to computing a relational cross-product filtered by the match conditions — a join. Because each pattern can bind new variables, the common variables between two patterns act as join keys automatically, without any explicit join condition.
The algorithm is correct for any number of patterns, though its worst-case running time grows with the number of triples raised to the power of the number of patterns, since each pattern can multiply the set of candidate environments by the size of the store. For a small in-memory store that is entirely acceptable.
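Tracing join-patterns on a two-pattern query makes the join-key behavior concrete: the shared variable ?friend links the two patterns without any explicit join condition:

```scheme
;; A two-pattern join over a tiny store. The first pattern binds
;; ?friend; the second keeps only environments where that binding
;; also has a <foaf:name> triple.
(let ((st (make-store)))
  (add-triple st "<ex:alice>" "<foaf:knows>" "<ex:bob>")
  (add-triple st "<ex:bob>" "<foaf:name>" "\"Bob\"")
  (join-patterns '(("<ex:alice>" "<foaf:knows>" "?friend")
                   ("?friend" "<foaf:name>" "?friendname"))
                 (store-triples st)
                 (list '())))
;; => ((("?friendname" . "\"Bob\"") ("?friend" . "<ex:bob>")))
```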
Projecting Results
Once join-patterns returns the list of surviving environments, sparql-select
projects each one down to the variables named in the SELECT clause. For
each result environment, it calls project-env, which walks the selected
variable list and pulls out the corresponding value from the environment via
env-lookup.
When SELECT * is used, no projection is applied — the full environment is
returned as-is, which contains all variables that were bound during matching.
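End to end, sparql-select parses, joins, and projects in one call:

```scheme
;; Only ?name survives projection; the ?s binding made during
;; matching is dropped from the result row.
(let ((st (make-store)))
  (add-triple st "<ex:alice>" "<foaf:name>" "\"Alice\"")
  (sparql-select st "SELECT ?name WHERE { ?s <foaf:name> ?name . }"))
;; => ((("?name" . "\"Alice\"")))
```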
Gerbil RDF Implementation
;;; RDF.ss ── simple in-memory RDF data store with a toy SPARQL subset
;;;
;;; Supports:
;;;   make-store        – create an empty triple store
;;;   add-triple        – add an (s p o) triple
;;;   remove-triple     – remove matching (s p o) triples
;;;   store-triples     – retrieve all triples from a store
;;;   sparql-select     – run a SELECT query (variables + WHERE patterns)
;;;   print-all-triples – display all triples for debugging
;;;
;;; SPARQL subset handled:
;;;   SELECT ?var1 ?var2 ... WHERE { s p o . s p o . }
;;;   SELECT * WHERE { ... }
;;;   Variables begin with '?'

(export make-store add-triple remove-triple store-triples
        sparql-select print-all-triples)

;;;; ──────────────────────────────────────────────────────────────────
;;;; 1. Data model
;;;; ──────────────────────────────────────────────────────────────────

;; A store is just a cons-cell box wrapping a list of triples.
;; Each triple is a 3-element list: (subject predicate object).

(define (make-store) (cons 'rdf-store '()))

(define (store-triples st) (cdr st))

(define (add-triple st s p o)
  "Add the triple (s p o) to store ST; returns ST for chaining."
  (set-cdr! st (cons (list s p o) (cdr st)))
  st)

(define (remove-triple st s p o)
  "Remove every triple matching (s p o) from store ST."
  (set-cdr! st
            (filter (lambda (t)
                      (not (and (equal? (car t) s)
                                (equal? (cadr t) p)
                                (equal? (caddr t) o))))
                    (cdr st)))
  st)

(define (print-all-triples st)
  "Display every triple in the store."
  (display "All triples:\n")
  (for-each (lambda (t)
              (display "  ")
              (display (car t)) (display " ")
              (display (cadr t)) (display " ")
              (display (caddr t)) (newline))
            (store-triples st))
  (newline))

;;;; ──────────────────────────────────────────────────────────────────
;;;; 2. Tokenizer
;;;; ──────────────────────────────────────────────────────────────────

(define (char-whitespace? ch)
  (or (char=? ch #\space)
      (char=? ch #\tab)
      (char=? ch #\newline)
      (char=? ch #\return)))

(define (tokenize str)
  "Split STR into a list of tokens; { } ( ) . , ; each become their own token."
  (let ((tokens '())
        (buf '()))

    (define (emit!)
      (when (pair? buf)
        (set! tokens (cons (list->string (reverse buf)) tokens))
        (set! buf '())))

    (let loop ((chars (string->list str)))
      (if (null? chars)
          (begin
            (emit!)
            (reverse tokens))
          (let ((ch (car chars)))
            (cond
             ((member ch '(#\( #\) #\{ #\} #\. #\, #\;))
              (emit!)
              (set! tokens (cons (string ch) tokens)))
             ((char-whitespace? ch)
              (emit!))
             (else
              (set! buf (cons ch buf))))
            (loop (cdr chars)))))))

;;;; ──────────────────────────────────────────────────────────────────
;;;; 3. SPARQL parser
;;;; ──────────────────────────────────────────────────────────────────

(define (var? tok)
  "True if TOK is a SPARQL variable (starts with '?')."
  (and (string? tok)
       (> (string-length tok) 0)
       (char=? (string-ref tok 0) #\?)))

(define (string-ci=? a b)
  (string=? (string-downcase a) (string-downcase b)))

(define (string-downcase s)
  (list->string (map char-downcase (string->list s))))

;; Drop leading tokens until PRED passes; return the remaining list.
(define (drop-until pred lst)
  (cond
   ((null? lst) '())
   ((pred (car lst)) lst)
   (else (drop-until pred (cdr lst)))))

(define (parse-sparql query)
  "Parse a minimal SELECT … WHERE { … } query.
Returns two values: (vars patterns).
  vars     – list of variable name strings (or '(\"*\") for SELECT *)
  patterns – list of (s p o) patterns"

  (let* ((toks (map string-downcase (tokenize query)))
         ;; advance to SELECT keyword
         (after-select
          (let ((rest (drop-until (lambda (t) (string=? t "select")) toks)))
            (if (null? rest) '() (cdr rest)))))

    ;; collect SELECT variables (or * wildcard)
    (let-values (((vars after-vars)
                  (if (and (pair? after-select)
                           (string=? (car after-select) "*"))
                      (values '("*") (cdr after-select))
                      (let loop ((ts after-select) (acc '()))
                        (if (or (null? ts) (not (var? (car ts))))
                            (values (reverse acc) ts)
                            (loop (cdr ts) (cons (car ts) acc)))))))

      ;; skip to '{'
      (let* ((after-brace
              (let ((rest (drop-until (lambda (t) (string=? t "{")) after-vars)))
                (if (null? rest) '() (cdr rest)))))

        ;; read triple patterns until '}'
        (let loop ((ts after-brace) (patterns '()))
          (cond
           ((null? ts)
            (values vars (reverse patterns)))
           ((string=? (car ts) "}")
            (values vars (reverse patterns)))
           (else
            ;; consume s p o
            (let* ((s (car ts))
                   (ts (cdr ts))
                   (p (if (pair? ts) (car ts) ""))
                   (ts (if (pair? ts) (cdr ts) '()))
                   (o (if (pair? ts) (car ts) ""))
                   (ts (if (pair? ts) (cdr ts) '()))
                   ;; optional trailing '.'
                   (ts (if (and (pair? ts) (string=? (car ts) "."))
                           (cdr ts) ts)))
              (loop ts (cons (list s p o) patterns))))))))))

;;;; ──────────────────────────────────────────────────────────────────
;;;; 4. Query evaluation
;;;; ──────────────────────────────────────────────────────────────────

;; An environment (binding set) is an association list: ((var . val) ...)

(define (env-lookup var env)
  "Return the value bound to VAR in ENV, or #f."
  (let ((pair (assoc var env)))
    (and pair (cdr pair))))

(define (env-extend var val env)
  "Return a new env with VAR=VAL added, or #f on conflict."
  (let ((existing (assoc var env)))
    (cond
     (existing
      (if (equal? (cdr existing) val) env #f))
     (else
      (cons (cons var val) env)))))

(define (match-pattern pattern triple env)
  "Try to unify one PATTERN with one TRIPLE given ENV.
Returns extended env on success, #f on failure."
  (let ((ps (car pattern)) (pp (cadr pattern)) (po (caddr pattern))
        (ts (car triple)) (tp (cadr triple)) (to (caddr triple)))
    (let* ((e1 (if (var? ps)
                   (env-extend ps ts env)
                   (and (equal? ps ts) env)))
           (e2 (and e1
                    (if (var? pp)
                        (env-extend pp tp e1)
                        (and (equal? pp tp) e1))))
           (e3 (and e2
                    (if (var? po)
                        (env-extend po to e2)
                        (and (equal? po to) e2)))))
      e3)))

(define (join-patterns patterns triples envs)
  "For each pattern in PATTERNS, extend every env in ENVS by matching
against all TRIPLES. Returns the list of surviving envs."
  (if (null? patterns)
      envs
      (let* ((pat (car patterns))
             (rest (cdr patterns))
             (new-envs
              (apply append
                     (map (lambda (env)
                            (filter
                             (lambda (e) e) ;; remove #f
                             (map (lambda (triple)
                                    (match-pattern pat triple env))
                                  triples)))
                          envs))))
        (join-patterns rest triples new-envs))))

(define (project-env vars env)
  "Pick out only the selected VARS from ENV.
Returns an alist ((var . val) ...)."
  (map (lambda (v) (cons v (env-lookup v env))) vars))

(define (sparql-select st query)
  "Run a SELECT query against store ST.
Returns a list of alists, each mapping variable names to values.
With SELECT *, returns the full binding environment as an alist."
  (let-values (((vars patterns) (parse-sparql query)))
    (let* ((triples (store-triples st))
           (solutions (join-patterns patterns triples (list '())))
           (select-*? (and (= (length vars) 1)
                           (string=? (car vars) "*"))))
      (map (lambda (env)
             (if select-*?
                 env ;; full binding set
                 (project-env vars env)))
           solutions))))

;;;; ──────────────────────────────────────────────────────────────────
;;;; 5. Display helpers
;;;; ──────────────────────────────────────────────────────────────────

(define (display-results results)
  "Pretty-print a list of query result alists."
  (if (null? results)
      (display "  (no results)\n")
      (for-each (lambda (row)
                  (display "  ")
                  (for-each (lambda (pair)
                              (display (car pair))
                              (display ": ")
                              (display (cdr pair))
                              (display "  "))
                            row)
                  (newline))
                results)))

(define (run-query st label q)
  (display "Query: ") (display label) (newline)
  (display-results (sparql-select st q))
  (newline))

;;;; ──────────────────────────────────────────────────────────────────
;;;; 6. Demo
;;;; ──────────────────────────────────────────────────────────────────

(let ((st (make-store)))

  ;; Load some data
  (add-triple st "<ex:alice>" "<foaf:name>" "\"Alice\"")
  (add-triple st "<ex:alice>" "<foaf:knows>" "<ex:bob>")
  (add-triple st "<ex:alice>" "<foaf:age>" "30")
  (add-triple st "<ex:bob>" "<foaf:name>" "\"Bob\"")
  (add-triple st "<ex:bob>" "<foaf:age>" "25")
  (add-triple st "<ex:carol>" "<foaf:name>" "\"Carol\"")
  (add-triple st "<ex:carol>" "<foaf:age>" "35")

  (print-all-triples st)

  ;; Who does Alice know, and what is their name?
  (run-query st
             "Who does Alice know?"
             "SELECT ?friendName WHERE {
                <ex:alice> <foaf:knows> ?friend .
                ?friend <foaf:name> ?friendName .
              }")

  ;; All subjects and their names
  (run-query st
             "All names"
             "SELECT ?s ?name WHERE {
                ?s <foaf:name> ?name .
              }")

  ;; All subjects and their ages
  (run-query st
             "All ages"
             "SELECT ?person ?age WHERE {
                ?person <foaf:age> ?age .
              }")

  ;; SELECT * – return every binding
  (run-query st
             "SELECT * (all triples)"
             "SELECT * WHERE { ?s ?p ?o . }")

  ;; Remove a triple and verify
  (display "After removing <ex:bob> <foaf:age> 25:\n")
  (remove-triple st "<ex:bob>" "<foaf:age>" "25")
  (run-query st
             "All ages after removal"
             "SELECT ?person ?age WHERE {
                ?person <foaf:age> ?age .
              }"))
Running the Demo and Using the Code
Running with make
The Makefile provides three targets. make run executes RDF.ss directly
under the Gerbil interpreter gxi — no compilation step is needed, making
this the quickest way to see the output. make compile invokes the Gerbil
compiler gxc, which compiles RDF.ss into native code and installs the
resulting module under the package name declared in gerbil.pkg
(rdf-datastore). After compiling, the module can be imported from any
other Gerbil file as :rdf-datastore/RDF. make clean removes all
compiled artefacts.
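The Makefile itself is not reproduced in this chapter; a minimal sketch covering the three targets described above might look like the following (the repository's actual file may differ in compiler flags and artefact paths):

```makefile
# Hypothetical sketch of the three targets; check the real Makefile
# in gerbil_scheme_book/source_code/RDF_datastore for exact commands.
run:
	gxi RDF.ss

compile:
	gxc RDF.ss

clean:
	rm -f *.o*
```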
The Demo at the Bottom of RDF.ss
The bottom of RDF.ss contains a self-contained demo wrapped in a let
block so it runs automatically when the file is interpreted. It builds a
small social graph with three people (Alice, Bob, and Carol), each with a
name and an age triple, plus a foaf:knows link from Alice to Bob.
Five queries exercise the main features of the engine:
- A two-pattern join that finds the name of everyone Alice knows.
- A single-pattern scan that retrieves all subjects and their names.
- A single-pattern scan that retrieves all subjects and their ages.
- A SELECT * query that returns every triple in the store as a full binding.
- A post-deletion query that verifies the remove-triple operation worked.
Looking at the output of each query against the data you loaded is the most direct way to build intuition for how SPARQL variable binding and joining work.
Here is sample program output:
$ make
gxi RDF.ss
All triples:
  <ex:carol> <foaf:age> 35
  <ex:carol> <foaf:name> "Carol"
  <ex:bob> <foaf:age> 25
  <ex:bob> <foaf:name> "Bob"
  <ex:alice> <foaf:age> 30
  <ex:alice> <foaf:knows> <ex:bob>
  <ex:alice> <foaf:name> "Alice"

Query: Who does Alice know?
  ?friendname: "Bob"

Query: All names
  ?s: <ex:carol>  ?name: "Carol"
  ?s: <ex:bob>  ?name: "Bob"
  ?s: <ex:alice>  ?name: "Alice"

Query: All ages
  ?person: <ex:carol>  ?age: 35
  ?person: <ex:bob>  ?age: 25
  ?person: <ex:alice>  ?age: 30

Query: SELECT * (all triples)
  ?o: 35  ?p: <foaf:age>  ?s: <ex:carol>
  ?o: "Carol"  ?p: <foaf:name>  ?s: <ex:carol>
  ?o: 25  ?p: <foaf:age>  ?s: <ex:bob>
  ?o: "Bob"  ?p: <foaf:name>  ?s: <ex:bob>
  ?o: 30  ?p: <foaf:age>  ?s: <ex:alice>
  ?o: <ex:bob>  ?p: <foaf:knows>  ?s: <ex:alice>
  ?o: "Alice"  ?p: <foaf:name>  ?s: <ex:alice>

After removing <ex:bob> <foaf:age> 25:
Query: All ages after removal
  ?person: <ex:carol>  ?age: 35
  ?person: <ex:alice>  ?age: 30
Package and Module Structure
gerbil.pkg declares the package name rdf-datastore. Gerbil uses this to
determine the import path of compiled modules. Once compiled, any other
Gerbil source file can bring the exported names — make-store, add-triple,
remove-triple, store-triples, sparql-select, and print-all-triples —
into scope with a single import form. This makes RDF.ss usable both as
a standalone runnable demo and as a reusable library component in a larger
Gerbil project.
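For instance, a separate Gerbil file could use the compiled module like this (a sketch, assuming the module was installed by make compile):

```scheme
(import :rdf-datastore/RDF)

;; Use the library from another file: build a store and query it.
(define store (make-store))
(add-triple store "<ex:dave>" "<foaf:name>" "\"Dave\"")
(sparql-select store "SELECT ?name WHERE { ?s <foaf:name> ?name . }")
;; => ((("?name" . "\"Dave\"")))
```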
Summary and Further Directions
RDF.ss demonstrates that the essential machinery of an RDF store and a pattern-matching query engine fits comfortably in a single, readable Gerbil Scheme source file with no external dependencies. The key ideas — triples as lists, environments as association lists, query evaluation as a recursive join — are each short enough to understand in isolation, yet they compose directly into a working system.
Several directions are natural next steps from this foundation:
- Persistence. The store could be serialized to and loaded from N-Triples (.nt) format, the simplest RDF serialization, by writing a small line-by-line parser and a corresponding writer.
- OPTIONAL patterns. SPARQL’s OPTIONAL clause implements a left outer join. It can be layered on top of the existing join infrastructure by keeping solutions that survive the optional pattern and also keeping the original environment for solutions that do not.
- FILTER expressions. Numeric and string comparisons inside a FILTER clause narrow results without adding new bindings. A small expression evaluator over the current environment would be sufficient.
- Index structures. For larger datasets, a hash-table index on subject, predicate, or object would let match-pattern skip the full triple scan, dramatically reducing query time.
- Named graphs. Extending each triple to a quad (graph subject predicate object) and adding a FROM clause to the parser extends the model to the SPARQL 1.1 dataset model with minimal structural change.
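As a taste of the persistence direction, a minimal N-Triples writer is only a few lines. This is a sketch: it assumes subjects, predicates, and objects are already stored as serialized strings (as in the demo data), and it skips the IRI and literal escaping a conforming writer would need.

```scheme
;; Hypothetical sketch: write each stored triple as one "s p o ." line
;; to PORT, N-Triples style. No escaping or validation is performed.
(define (store->ntriples st port)
  (for-each (lambda (t)
              (display (car t) port)   (display " " port)
              (display (cadr t) port)  (display " " port)
              (display (caddr t) port) (display " ." port)
              (newline port))
            (store-triples st)))
```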
Each of these extensions is a natural exercise in applied Scheme programming, and the clean separation of the data model, the parser, and the evaluator in RDF.ss makes them straightforward to add without rewriting the whole system.