Implementing a Simple RDF Datastore and SPARQL Engine in Hy
This chapter explores a Hy implementation of a basic RDF (Resource Description Framework) triple store with partial SPARQL query support. This project is a direct pedagogical port of the Common Lisp triple store from Loving Common Lisp, redesigned to leverage the unique features of Hy: Lisp’s symbolic power and macro system layered over Python’s built-in dictionary, list, and string data structures.
You can find the complete project source code in the source_code_for_examples/rdf-datastore-with-sparql directory.
The Core Concept
In Semantic Web technologies, data is represented as a directed graph made of triples:
-
Subject: The node we are describing (e.g.,
"John"). -
Predicate: The directed relationship/edge (e.g.,
"likes"). -
Object: The target node or value (e.g.,
"pizza").
A collection of these triples forms a graph. To query this graph, we use SPARQL, which works by matching triple patterns containing variables (e.g., ?name likes ?food) against the store. When a query contains multiple patterns (e.g., ?name age ?age . ?name likes ?food), the engine must resolve a conjunction (join) across patterns, binding variables consistently.
Example Usage
Let’s look at how the datastore is used in practice before we look at its implementation. Below is the demonstration code from main.hy:
1 ;; main.hy
2 (import rdf_datastore [RdfStore])
3
4 (defn print-query-results [store query-string]
5 "Execute a SPARQL query and pretty-print the results."
6 (print f"Query: {query-string}")
7 (setv results (store.execute-sparql query-string))
8 (print "Results:")
9 (if results
10 (do
11 (for [result results]
12 (setv parts (lfor #(var val) (.items result)
13 f"{var}: {val}"))
14 (print (+ " " (.join ", " parts)))))
15 (print " No results"))
16 (print))
17
18 (defn test []
19 (setv store (RdfStore))
20
21 ;; Add triples
22 (store.add-triple "John" "age" "30")
23 (store.add-triple "John" "likes" "pizza")
24 (store.add-triple "Mary" "age" "25")
25 (store.add-triple "Mary" "likes" "sushi")
26 (store.add-triple "Bob" "age" "35")
27 (store.add-triple "Bob" "likes" "burger")
28
29 (store.print-all-triples)
30
31 ;; Query 1: join across two patterns — find name, age, and food
32 (print-query-results store
33 "select * where { ?name age ?age . ?name likes ?food }")
34
35 ;; Query 2: simple single-pattern query
36 (print-query-results store
37 "select ?s ?o where { ?s likes ?o }")
38
39 ;; Query 3: join with a concrete value in second pattern
40 (print-query-results store
41 "select * where { ?name age ?age . ?name likes pizza }"))
42
43 (test)
Running the code yields the following output:
1 $ uv run hy main.hy
2 All triples in the datastore:
3 John age 30
4 John likes pizza
5 Mary age 25
6 Mary likes sushi
7 Bob age 35
8 Bob likes burger
9
10 Query: select * where { ?name age ?age . ?name likes ?food }
11 Results:
12 ?name: John, ?age: 30, ?food: pizza
13 ?name: Mary, ?age: 25, ?food: sushi
14 ?name: Bob, ?age: 35, ?food: burger
15
16 Query: select ?s ?o where { ?s likes ?o }
17 Results:
18 ?s: John, ?o: pizza
19 ?s: Mary, ?o: sushi
20 ?s: Bob, ?o: burger
21
22 Query: select * where { ?name age ?age . ?name likes pizza }
23 Results:
24 ?name: John, ?age: 30
Walkthrough of the Hy Implementation
The implementation in rdf_datastore.hy is split into data representation, the datastore class, the query parser, and the recursive join execution engine.
1. RDF Triple Representation
Instead of defining custom structure classes like Common Lisp’s defstruct, the Hy version maps triples directly to Python dictionaries. Accessor functions shield the query engine from this representation detail:
1 (defn make-triple [subject predicate object]
2 "Create an RDF triple (as a dict)."
3 {"subject" subject "predicate" predicate "object" object})
4
5 (defn triple-subject [triple] (get triple "subject"))
6 (defn triple-predicate [triple] (get triple "predicate"))
7 (defn triple-object [triple] (get triple "object"))
Using Python dictionaries is highly idiomatic in Hy. It means we get fast lookups, serialization, and debugging outputs out of the box, while the Lisp-style syntax keeps the code readable.
2. The RdfStore Class
The triple store itself is implemented as a class. Defining classes in Hy uses the defclass macro:
1 (defclass RdfStore []
2 "An in-memory RDF triple store with SPARQL query support."
3
4 (defn __init__ [self]
5 (setv self.triples []))
6
7 (defn add-triple [self subject predicate object]
8 "Add a triple to the datastore."
9 (.append self.triples (make-triple subject predicate object)))
10
11 (defn remove-triple [self subject predicate object]
12 "Remove all triples matching the given subject, predicate, object."
13 (setv self.triples
14 (lfor t self.triples
15 :if (not (and (= (triple-subject t) subject)
16 (= (triple-predicate t) predicate)
17 (= (triple-object t) object)))
18 t)))
Note how Hy’s list comprehension macro lfor is used to filter elements in remove-triple.
For querying triples with wildcard patterns, we check if the pattern element is a variable (starts with ?) or None, or matches the triple value:
1 (defn query-triples [self subject predicate object]
2 "Return all triples matching the pattern (None or ?-vars are wildcards)."
3 (lfor t self.triples
4 :if (and (or (is subject None) (variable-p subject)
5 (= (triple-subject t) subject))
6 (or (is predicate None) (variable-p predicate)
7 (= (triple-predicate t) predicate))
8 (or (is object None) (variable-p object)
9 (= (triple-object t) object)))
10 t))
3. SPARQL Variable Identification & Binding
To bind query variables, we first identify them. Variables start with the ? character:
1 (defn variable-p [s]
2 "Check if a string is a SPARQL variable (starts with '?')."
3 (and (isinstance s str) (> (len s) 0) (= (get s 0) "?")))
During query pattern matching, if a pattern component is a variable, we generate a binding mapping that variable to the corresponding element of the matching triple:
1 (defn triple-to-binding [triple pattern]
2 "Build a binding dict mapping variables in `pattern` to values in `triple`."
3 (setv binding {})
4 (when (variable-p (get pattern 0))
5 (setv (get binding (get pattern 0)) (triple-subject triple)))
6 (when (variable-p (get pattern 1))
7 (setv (get binding (get pattern 1)) (triple-predicate triple)))
8 (when (variable-p (get pattern 2))
9 (setv (get binding (get pattern 2)) (triple-object triple)))
10 binding)
We also need to apply existing bindings to a query pattern before executing subsequent parts of a join. This is implemented via a list comprehension:
1 (defn apply-bindings [pattern bindings]
2 "Substitute known variable bindings into a pattern."
3 (lfor item pattern
4 (if (and (variable-p item) (in item bindings))
5 (get bindings item)
6 item)))
Merging bindings uses Python 3.9’s dictionary merge operator (|):
1 (defn merge-bindings [b1 b2]
2 "Merge two binding dicts. b2 values override b1 on conflict."
3 (| (dict b1) (dict b2)))
4. SPARQL Parsing
Here we implement a subset of SPARQL. Our SPARQL parser extracts the variables in the SELECT clause and splits the WHERE clause patterns by dots (.):
1 (defn parse-sparql-query [query-string]
2 "Parse a simple SPARQL query: SELECT vars WHERE { patterns }
3 Returns dict with 'select-vars' and 'where-patterns'."
4 (setv tokens (lfor tok (.split query-string)
5 :if (not (in tok ["{" "}"]))
6 tok))
7 ;; Find SELECT and WHERE positions (case-insensitive)
8 (setv lower-tokens (lfor t tokens (.lower t)))
9 (setv select-idx (.index lower-tokens "select"))
10 (setv where-idx (.index lower-tokens "where"))
11 ;; Extract SELECT vars
12 (setv select-vars (cut tokens (+ select-idx 1) where-idx))
13 ;; Extract WHERE clause tokens
14 (setv where-tokens (cut tokens (+ where-idx 1) None))
15 ;; Parse WHERE patterns (split on '.')
16 (setv where-patterns (parse-where-patterns where-tokens))
17 {"select-vars" select-vars "where-patterns" where-patterns})
IMPORTANT: Hy Slicing Gotcha: Lisp programmers are used to
subseqorslicewhere providing one index parameter slices from that index to the end. However, Hy’s(cut sequence start)macro compiles to Python’ssequence[:start]—slicing from the beginning up to the index. To slice from an index to the end of the collection (i.e.sequence[start:]), you must passNoneas the stop parameter:(cut tokens (+ where-idx 1) None)
To group the tokens into [subject predicate object] patterns, we split on the dot . delimiter:
1 (defn parse-where-patterns [tokens]
2 "Split WHERE clause tokens into patterns, delimited by '.'."
3 (setv patterns [])
4 (setv current [])
5 (for [tok tokens]
6 (if (= tok ".")
7 (do
8 (when current
9 (.append patterns (list current))
10 (setv current [])))
11 (.append current tok)))
12 (when current
13 (.append patterns (list current)))
14 patterns)
5. SPARQL Execution Engine (Joins)
When a query has multiple patterns (like ?name age ?age . ?name likes ?food), the patterns must be processed sequentially. We recursively evaluate them:
- Retrieve matching triples for the first pattern and create initial bindings.
- For each binding, substitute the bound variables into the remaining patterns.
- Recursively resolve the remaining patterns using the substituted values, accumulating and merging the bindings.
1 (defn execute-where-patterns [store patterns]
2 "Execute WHERE patterns against the store. Returns a list of binding dicts."
3 (if (not patterns)
4 [{}]
5 (let [pattern (get patterns 0)
6 remaining (cut patterns 1 None)
7 matching (store.query-triples (get pattern 0)
8 (get pattern 1)
9 (get pattern 2))
10 bindings (lfor t matching (triple-to-binding t pattern))]
11 (if (not remaining)
12 bindings
13 (do
14 (setv results [])
15 (for [binding bindings]
16 (setv sub-results
17 (execute-where-with-bindings store remaining binding))
18 (for [sr sub-results]
19 (.append results (merge-bindings binding sr))))
20 results)))))
When evaluating remaining patterns, we substitute the bindings found so far:
1 (defn execute-where-with-bindings [store patterns bindings]
2 "Execute WHERE patterns with existing variable bindings substituted."
3 (if (not patterns)
4 [bindings]
5 (let [pattern (get patterns 0)
6 remaining (cut patterns 1 None)
7 bound-pattern (apply-bindings pattern bindings)
8 matching (store.query-triples (get bound-pattern 0)
9 (get bound-pattern 1)
10 (get bound-pattern 2))
11 new-bindings (lfor t matching
12 (merge-bindings bindings
13 (triple-to-binding t pattern)))]
14 (if (not remaining)
15 new-bindings
16 (do
17 (setv results [])
18 (for [nb new-bindings]
19 (setv sub-results
20 (execute-where-with-bindings store remaining nb))
21 (for [sr sub-results]
22 (.append results sr)))
23 results)))))
Finally, we project only the requested variables using Hy’s dictionary comprehension macro dfor:
1 (defn project-results [results select-vars]
2 "Project results to only the selected variables. '*' means all."
3 (if (= select-vars ["*"])
4 (lfor r results (remove-duplicate-keys r))
5 (lfor r results
6 (dfor var select-vars var (.get r var None)))))
Just like lfor maps to list comprehensions, dfor maps to dictionary comprehensions, where (dfor key-var collection key-expression value-expression) compiles to {key_expression: value_expression for key_var in collection}.
Conclusion
This pedagogical implementation demonstrates the synergy between Lisp’s functional/recursive patterns and Python’s data structures. Implementing the resolution of conjunctive query joins is a classic Lisp programming task, made clean and concise in Hy. Rather than building custom record structures or dictionary utilities from scratch, the engine leverages Python’s dictionary merge |, list/dictionary comprehensions, and classes.