Implementing a Simple RDF Datastore With Partial SPARQL Support
Other examples in this book use full RDF datastores via their APIs. Here we implement two examples that zero external heavy dependencies, only using Haskell base and containers standard libraries:
- SimpleRDF.hs - A simple in-memory RDF datastore that supports queries using a non-standard pattern matching syntax.
- RDF_simple_SPARQL.hs - Also handles simple SPARQL queries such as shown in the test code for this example:

1 main = do
2 putStrLn "--- RDF Store Loaded ---"
3 mapM_ print myGraph
4
5 let queries =
6 [ ("Query 1: Select ?s ?o where { ?s likes ?o }",
7 "SELECT ?s ?o WHERE { ?s likes ?o }")
8 , ("Query 2: Select ?who where { ?who likes <Bob> }",
9 "SELECT ?who WHERE { ?who likes <Bob> }")
10 , ("Query 3 (Join): Who likes someone who likes them back?",
11 "SELECT ?a ?b WHERE { ?a likes ?b . ?b likes ?a }")
12 ]
13
14 mapM_ runAndPrint queries
In-memory Example Using a Pattern Matching Query Syntax
To demonstrate the fundamentals of a graph database, the following Haskell implementation constructs a minimal in-memory RDF store and a query engine that executes queries defined via an internal data structure rather than parsing raw SPARQL text. We define the core types: nodes representing IRIs, literals, or blank nodes, and organize them into a list of triples to simulate the graph. Instead of a string parser, the code uses a SelectQuery type and TriplePattern objects (the AST) to programmatically model the query logic, while the evaluatePatterns function leverages the List Monad to elegantly handle the combinatorial nature of joining these patterns against the data.
1 {-# LANGUAGE DeriveGeneric #-}
2 {-# LANGUAGE OverloadedStrings #-}
3
4 module Main where
5
6 import qualified Data.Map.Strict as Map
7 import Data.Map.Strict (Map)
8 import Data.Maybe (fromMaybe)
9 import Data.List (nub)
10
11
12 -- ==========================================
13 -- 1. RDF Data Types
14 -- ==========================================
15
16 -- | A Node can be an IRI (URI), a Literal string, or a Blank Node.
17 data Node
18 = IRI String
19 | Lit String
20 | BNode Int
21 deriving (Show, Eq, Ord)
22
23 -- | Format an RDF Node for display using standard N-Triples notation.
24 -- Use this instead of 'show' when producing human-readable output.
25 formatNode :: Node -> String
26 formatNode (IRI s) = "<" ++ s ++ ">"
27 formatNode (Lit s) = "\"" ++ s ++ "\""
28 formatNode (BNode i) = "_:b" ++ show i
29
30 -- An RDF Triple: (Subject, Predicate, Object)
31 type Triple = (Node, Node, Node)
32
33 -- The Store is just a list of triples (in a real DB, this would be an indexed Set)
34 type Graph = [Triple]
35
36 -- ==========================================
37 -- 2. SPARQL Query Types
38 -- ==========================================
39
40 -- A Query Item can be a concrete Node or a Variable ("?x")
41 data QueryNode
42 = QTerm Node
43 | QVar String
44 deriving (Eq, Show)
45
46 -- A Triple Pattern: e.g., { ?s <likes> ?o }
47 data TriplePattern =
48 TP QueryNode QueryNode QueryNode
49 deriving (Show)
50
51 -- A simplified representation of a SELECT query
52 data SelectQuery = Select
53 { vars :: [String] -- Variables to select (e.g., ["s", "o"])
54 , whereClause :: [TriplePattern] -- The WHERE block
55 }
56
57 -- A Binding maps variable names to concrete RDF Nodes
58 type Binding = Map String Node
59
60 -- ==========================================
61 -- 3. The Query Engine
62 -- ==========================================
63
64 -- | Checks if a concrete Triple matches a Triple Pattern given a starting context.
65 -- Returns a new Binding extended with matches if successful, or Nothing if failed.
66 matchTriple :: Binding -> TriplePattern -> Triple -> Maybe Binding
67 matchTriple ctx (TP qs qp qo) (s, p, o) = do
68 ctx1 <- matchNode ctx qs s
69 ctx2 <- matchNode ctx1 qp p
70 matchNode ctx2 qo o
71
72 -- | Helper to match a single QueryNode against a concrete Node
73 matchNode :: Binding -> QueryNode -> Node -> Maybe Binding
74 matchNode ctx (QTerm t) n
75 | t == n = Just ctx -- Concrete terms must match exactly
76 | otherwise = Nothing
77 matchNode ctx (QVar v) n =
78 case Map.lookup v ctx of
79 Nothing -> Just (Map.insert v n ctx) -- Bind new variable
80 Just val -> if val == n then Just ctx else Nothing -- Check existing binding constraint
81
82 -- | Executes a list of patterns against the graph using the List Monad for join logic.
83 evaluatePatterns :: Graph -> [TriplePattern] -> [Binding]
84 evaluatePatterns _ [] = [Map.empty] -- Base case: empty pattern results in one empty binding (identity)
85 evaluatePatterns graph (pat:pats) = do
86 -- 1. Take previous results (or start)
87 ctx <- evaluatePatterns graph pats
88
89 -- 2. Find all triples in the graph that match the current pattern
90 -- consistent with the current context (ctx)
91 triple <- graph
92
93 -- 3. Attempt to bind the triple to the pattern
94 case matchTriple ctx pat triple of
95 Just newCtx -> return newCtx
96 Nothing -> []
97
98 -- | The main entry point for running a query.
99 -- It evaluates the WHERE clause and then projects only the requested SELECT variables.
100 runQuery :: Graph -> SelectQuery -> [[Node]]
101 runQuery graph query =
102 let
103 -- 1. Find all valid bindings for the WHERE clause
104 -- We reverse the patterns because the list monad naturally processes right-to-left
105 -- in the recursion above, but we want left-to-right evaluation flow.
106 allBindings = evaluatePatterns graph (reverse $ whereClause query)
107
108 -- 2. Project specific variables (SELECT ?s ?o)
109 project binding = map (\v -> fromMaybe (Lit "NULL") (Map.lookup v binding)) (vars query)
110 in
111 nub $ map project allBindings
112
113 -- ==========================================
114 -- 4. Example Data and Usage
115 -- ==========================================
116
117 -- Helpers to make data entry cleaner
118 iri :: String -> Node
119 iri = IRI
120
121 lit :: String -> Node
122 lit = Lit
123
124 -- The "Social Network" Graph
125 myGraph :: Graph
126 myGraph =
127 [ (iri "Alice", iri "likes", iri "Bob")
128 , (iri "Alice", iri "likes", iri "Pizza")
129 , (iri "Bob", iri "likes", iri "Alice")
130 , (iri "Bob", iri "likes", iri "Pasta")
131 , (iri "Charlie", iri "likes", iri "Bob")
132 , (iri "Alice", iri "age", lit "25")
133 , (iri "Bob", iri "age", lit "28")
134 ]
135
136 main :: IO ()
137 main = do
138 putStrLn "--- RDF Store Loaded ---"
139 mapM_ print myGraph
140
141 putStrLn "\n--- Query 1: Select ?s ?o where { ?s likes ?o } ---"
142 let q1 = Select
143 { vars = ["s", "o"]
144 , whereClause =
145 [ TP (QVar "s") (QTerm (iri "likes")) (QVar "o")
146 ]
147 }
148 printTable ["?s", "?o"] (runQuery myGraph q1)
149
150 putStrLn "\n--- Query 2: Select ?who where { ?who likes <Bob> } ---"
151 let q2 = Select
152 { vars = ["who"]
153 , whereClause =
154 [ TP (QVar "who") (QTerm (iri "likes")) (QTerm (iri "Bob"))
155 ]
156 }
157 printTable ["?who"] (runQuery myGraph q2)
158
159 putStrLn "\n--- Query 3 (Join): Who likes someone who likes them back? ---"
160 -- SPARQL: SELECT ?a ?b WHERE { ?a likes ?b . ?b likes ?a }
161 let q3 = Select
162 { vars = ["a", "b"]
163 , whereClause =
164 [ TP (QVar "a") (QTerm (iri "likes")) (QVar "b")
165 , TP (QVar "b") (QTerm (iri "likes")) (QVar "a")
166 ]
167 }
168 printTable ["?a", "?b"] (runQuery myGraph q3)
169
170 -- Utility to print results nicely
171 printTable :: [String] -> [[Node]] -> IO ()
172 printTable headers rows = do
173 putStrLn $ unwords headers
174 putStrLn $ replicate (length (unwords headers) + 5) '-'
175 mapM_ (putStrLn . unwords . map formatNode) rows
The engine relies on a pattern-matching approach where TriplePattern objects are compared against concrete triples in the store to build a Binding map. The matchTriple function acts as the gatekeeper: it checks if a specific triple fits the constraints of the pattern (e.g., matching a specific Subject IRI) and updates the variable bindings (like ?s or ?o) accordingly. Because the queries are constructed as Haskell data types—shown in the main function alongside the commented-out SPARQL strings they represent—the compiler ensures the structure of the query is valid before the program even runs, bypassing the complexity of text parsing entirely.
While this implementation highlights the semantic clarity of using Haskell’s monadic structure for query resolution, it represents a naive nested-loop join strategy. The evaluatePatterns function iterates through the entire graph for every pattern step, leading to exponential complexity relative to the number of patterns. In a production-grade triple store, this would be optimized using indexed lookups (such as SPO, POS, or OSP indices) to reduce the search space from linear scans to logarithmic lookups, allowing for efficient handling of millions of triples.
Sample Output
The example program output is edited for brevity:
1 $ cabal run simple-rdf
2 Build profile: -w ghc-9.6.7 -O1
3
4 --- RDF Store Loaded ---
5 (<Alice>,<likes>,<Bob>)
6 (<Alice>,<likes>,<Pizza>)
7 (<Bob>,<likes>,<Alice>)
8 (<Bob>,<likes>,<Pasta>)
9 (<Charlie>,<likes>,<Bob>)
10 (<Alice>,<age>,"25")
11 (<Bob>,<age>,"28")
12
13 --- Query 1: Select ?s ?o where { ?s likes ?o } ---
14 ?s ?o
15 ----------
16 <Alice> <Bob>
17 <Alice> <Pizza>
18 <Bob> <Alice>
19 <Bob> <Pasta>
20 <Charlie> <Bob>
21
22 --- Query 2: Select ?who where { ?who likes <Bob> } ---
23 ?who
24 ---------
25 <Alice>
26 <Charlie>
27
28 --- Query 3 (Join): Who likes someone who likes them back? ---
29 ?a ?b
30 ----------
31 <Alice> <Bob>
32 <Bob> <Alice>
In-memory Example Using a Simplified SPARQL Query Syntax
To enhance the usability of our RDF engine, we now introduce a parsing layer that allows users to execute queries using standard SPARQL string syntax rather than manually constructing Haskell data types. By utilizing the Text.ParserCombinators.ReadP library, we define a parser that consumes raw strings—handling the SELECT clause, variable extraction, and the WHERE block’s triple patterns—and converts them into the SelectQuery AST expected by our engine. This addition bridges the gap between the internal logic and user input, enabling the execution of text-based queries like SELECT ?a ?b WHERE { ?a likes ?b } and SELECT ?a ?b WHERE { ?a likes ?b . ?b likes ?a } by tokenizing variables, literals, and IRIs, including a helper parseSimpleIRI to allow unbracketed terms for cleaner input.
1 {-# LANGUAGE DeriveGeneric #-}
2 {-# LANGUAGE OverloadedStrings #-}
3
4 module Main where
5
6 import qualified Data.Map.Strict as Map
7 import Data.Map.Strict (Map)
8 import Data.Maybe (fromMaybe)
9 import Data.List (nub)
10
11 import Text.ParserCombinators.ReadP
12 import Data.Char (isSpace, isAlphaNum, toLower)
13
14 -- ==========================================
15 -- 1. RDF Data Types
16 -- ==========================================
17
18 -- | A Node can be an IRI (URI), a Literal string, or a Blank Node.
19 data Node
20 = IRI String
21 | Lit String
22 | BNode Int
23 deriving (Show, Eq, Ord)
24
25 -- | Format an RDF Node for display using standard N-Triples notation.
26 -- Use this instead of 'show' when producing human-readable output.
27 formatNode :: Node -> String
28 formatNode (IRI s) = "<" ++ s ++ ">"
29 formatNode (Lit s) = "\"" ++ s ++ "\""
30 formatNode (BNode i) = "_:b" ++ show i
31
32 -- An RDF Triple: (Subject, Predicate, Object)
33 type Triple = (Node, Node, Node)
34
35 -- The Store is just a list of triples (in a real DB, this would be an indexed Set)
36 type Graph = [Triple]
37
38 -- ==========================================
39 -- 2. SPARQL Query Types
40 -- ==========================================
41
42 -- A Query Item can be a concrete Node or a Variable ("?x")
43 data QueryNode
44 = QTerm Node
45 | QVar String
46 deriving (Eq, Show)
47
48 -- A Triple Pattern: e.g., { ?s <likes> ?o }
49 data TriplePattern =
50 TP QueryNode QueryNode QueryNode
51 deriving (Show)
52
53 -- A simplified representation of a SELECT query
54 data SelectQuery = Select
55 { vars :: [String] -- Variables to select (e.g., ["s", "o"])
56 , whereClause :: [TriplePattern] -- The WHERE block
57 }
58
59 -- A Binding maps variable names to concrete RDF Nodes
60 type Binding = Map String Node
61
62 -- ==========================================
63 -- 3. The Query Engine
64 -- ==========================================
65
66 -- | Checks if a concrete Triple matches a Triple Pattern given a starting context.
67 -- Returns a new Binding extended with matches if successful, or Nothing if failed.
68 matchTriple :: Binding -> TriplePattern -> Triple -> Maybe Binding
69 matchTriple ctx (TP qs qp qo) (s, p, o) = do
70 ctx1 <- matchNode ctx qs s
71 ctx2 <- matchNode ctx1 qp p
72 matchNode ctx2 qo o
73
74 -- | Helper to match a single QueryNode against a concrete Node
75 matchNode :: Binding -> QueryNode -> Node -> Maybe Binding
76 matchNode ctx (QTerm t) n
77 | t == n = Just ctx -- Concrete terms must match exactly
78 | otherwise = Nothing
79 matchNode ctx (QVar v) n =
80 case Map.lookup v ctx of
81 Nothing -> Just (Map.insert v n ctx) -- Bind new variable
82 Just val -> if val == n then Just ctx else Nothing -- Check existing binding constraint
83
84 -- | Executes a list of patterns against the graph using the List Monad for join logic.
85 evaluatePatterns :: Graph -> [TriplePattern] -> [Binding]
86 evaluatePatterns _ [] = [Map.empty] -- Base case: empty pattern results in one empty binding (identity)
87 evaluatePatterns graph (pat:pats) = do
88 -- 1. Take previous results (or start)
89 ctx <- evaluatePatterns graph pats
90
91 -- 2. Find all triples in the graph that match the current pattern
92 -- consistent with the current context (ctx)
93 triple <- graph
94
95 -- 3. Attempt to bind the triple to the pattern
96 case matchTriple ctx pat triple of
97 Just newCtx -> return newCtx
98 Nothing -> []
99
100 -- | The main entry point for running a query.
101 -- It evaluates the WHERE clause and then projects only the requested SELECT variables.
102 -- Any SELECT variable that does not appear in a WHERE pattern is flagged as an error.
103 runQuery :: Graph -> SelectQuery -> [[Node]]
104 runQuery graph query
105 | not (null unboundVars) =
106 error $ "SELECT variable(s) not bound in WHERE clause: " ++
107 unwords (map ('?':) unboundVars)
108 | otherwise =
109 let
110 -- 1. Find all valid bindings for the WHERE clause
111 -- We reverse the patterns because the list monad naturally processes right-to-left
112 -- in the recursion above, but we want left-to-right evaluation flow.
113 allBindings = evaluatePatterns graph (reverse $ whereClause query)
114
115 -- 2. Project specific variables (SELECT ?s ?o)
116 project binding = map (\v -> fromMaybe (Lit "NULL") (Map.lookup v binding)) (vars query)
117 in
118 nub $ map project allBindings
119 where
120 -- Collect all variable names that appear in WHERE triple patterns
121 patternVars = nub [v | TP s p o <- whereClause query, QVar v <- [s, p, o]]
122 unboundVars = filter (`notElem` patternVars) (vars query)
123
124 -- ==========================================
125 -- 4. SPARQL Parser
126 -- ==========================================
127 --
128 -- Limitations of this simplified SPARQL grammar:
129 -- * Only SELECT queries are supported (no ASK, CONSTRUCT, DESCRIBE).
130 -- * No OPTIONAL, UNION, FILTER, BIND, or sub-queries.
131 -- * No prefix declarations — IRIs must be written in full (<…>) or as
132 -- bare alphanumeric words (which are treated as IRIs).
133 -- * Literal values must be double-quoted strings; no language tags or
134 -- datatype IRIs.
135 -- * Triple patterns are separated by optional dots; no semicolon or
136 -- comma shortcuts.
137
138 -- | Parses a SPARQL query string into a SelectQuery
139 parseQuery :: String -> Either String SelectQuery
140 parseQuery input =
141 case readP_to_S parser input of
142 [(q, "")] -> Right q
143 [(q, rest)] | all isSpace rest -> Right q
144 _ -> Left "Parse error"
145 where
146 parser :: ReadP SelectQuery
147 parser = do
148 skipSpaces
149 _ <- stringCI "SELECT"
150 skipSpaces
151 vs <- many1 (skipSpaces >> parseVarName)
152 skipSpaces
153 _ <- stringCI "WHERE"
154 skipSpaces
155 _ <- char '{'
156 skipSpaces
157 pats <- sepBy parseTriplePattern (skipSpaces >> optional (char '.') >> skipSpaces)
158 skipSpaces
159 _ <- char '}'
160 return $ Select vs pats
161
162 stringCI :: String -> ReadP String
163 stringCI str = traverse (\c -> satisfy (\x -> toLower x == toLower c)) str
164
165 parseVarName :: ReadP String
166 parseVarName = do
167 _ <- char '?'
168 munch1 isAlphaNum
169
170 parseTriplePattern :: ReadP TriplePattern
171 parseTriplePattern = do
172 s <- parseQueryNode
173 skipSpaces
174 p <- parseQueryNode
175 skipSpaces
176 o <- parseQueryNode
177 return $ TP s p o
178
179 parseQueryNode :: ReadP QueryNode
180 parseQueryNode = parseVar <++ parseTerm
181
182 parseVar :: ReadP QueryNode
183 parseVar = QVar <$> parseVarName
184
185 parseTerm :: ReadP QueryNode
186 parseTerm = QTerm <$> (parseIRI <++ parseLit <++ parseSimpleIRI)
187
188 parseIRI :: ReadP Node
189 parseIRI = do
190 _ <- char '<'
191 content <- munch (/= '>')
192 _ <- char '>'
193 return $ IRI content
194
195 -- Support for simple words as IRIs (e.g., "likes" instead of "<likes>")
196 parseSimpleIRI :: ReadP Node
197 parseSimpleIRI = do
198 content <- munch1 isAlphaNum
199 return $ IRI content
200
201 parseLit :: ReadP Node
202 parseLit = do
203 _ <- char '"'
204 content <- munch (/= '"')
205 _ <- char '"'
206 return $ Lit content
207
208 -- ==========================================
209 -- 5. Example Data and Usage
210 -- ==========================================
211
212 -- Helpers to make data entry cleaner
213 iri :: String -> Node
214 iri = IRI
215
216 lit :: String -> Node
217 lit = Lit
218
219 -- The "Social Network" Graph
220 myGraph :: Graph
221 myGraph =
222 [ (iri "Alice", iri "likes", iri "Bob")
223 , (iri "Alice", iri "likes", iri "Pizza")
224 , (iri "Bob", iri "likes", iri "Alice")
225 , (iri "Bob", iri "likes", iri "Pasta")
226 , (iri "Charlie", iri "likes", iri "Bob")
227 , (iri "Alice", iri "age", lit "25")
228 , (iri "Bob", iri "age", lit "28")
229 ]
230
231 main :: IO ()
232 main = do
233 putStrLn "--- RDF Store Loaded ---"
234 mapM_ print myGraph
235
236 let queries =
237 [ ("Query 1: Select ?s ?o where { ?s likes ?o }",
238 "SELECT ?s ?o WHERE { ?s likes ?o }")
239 , ("Query 2: Select ?who where { ?who likes <Bob> }",
240 "SELECT ?who WHERE { ?who likes <Bob> }")
241 , ("Query 3 (Join): Who likes someone who likes them back?",
242 "SELECT ?a ?b WHERE { ?a likes ?b . ?b likes ?a }")
243 ]
244
245 mapM_ runAndPrint queries
246
247 runAndPrint :: (String, String) -> IO ()
248 runAndPrint (desc, qStr) = do
249 putStrLn $ "\n--- " ++ desc ++ " ---"
250 case parseQuery qStr of
251 Left err -> putStrLn $ "Error parsing query: " ++ err
252 Right q -> do
253 let results = runQuery myGraph q
254 let headers = map ("?" ++) (vars q)
255 printTable headers results
256
257 -- Utility to print results nicely
258 printTable :: [String] -> [[Node]] -> IO ()
259 printTable headers rows = do
260 putStrLn $ unwords headers
261 putStrLn $ replicate (length (unwords headers) + 5) '-'
262 mapM_ (putStrLn . unwords . map formatNode) rows
The parseQuery function serves as the new frontend, utilizing monadic parser combinators to decompose the query string. We define a parser that enforces the structural grammar of SPARQL: it expects the SELECT keyword followed by variables, and a WHERE clause containing a set of triple patterns enclosed in braces. The use of sepBy allows us to parse multiple patterns separated by optional dots, while stringCI ensures case-insensitivity for keywords, making the parser robust against minor formatting variations in the input string.
At the granular level, parseTriplePattern constructs the abstract syntax tree by recursively parsing the subject, predicate, and object. The parseQueryNode function uses the choice combinator <++ to distinguish between variables (prefixed with ?) and concrete terms. To improve readability in this simple implementation, we included parseSimpleIRI, which permits alphanumeric words to be interpreted as IRIs without enclosing angle brackets; this allows users to write natural queries like ?s likes ?o alongside strict SPARQL syntax like “25”.
Sample Output
The example program output is :
1 $ cabal run simple-rdf-with-sparql
2
3 --- RDF Store Loaded ---
4 (<Alice>,<likes>,<Bob>)
5 (<Alice>,<likes>,<Pizza>)
6 (<Bob>,<likes>,<Alice>)
7 (<Bob>,<likes>,<Pasta>)
8 (<Charlie>,<likes>,<Bob>)
9 (<Alice>,<age>,"25")
10 (<Bob>,<age>,"28")
11
12 --- Query 1: Select ?s ?o where { ?s likes ?o } ---
13 ?s ?o
14 ----------
15 <Alice> <Bob>
16 <Alice> <Pizza>
17 <Bob> <Alice>
18 <Bob> <Pasta>
19 <Charlie> <Bob>
20
21 --- Query 2: Select ?who where { ?who likes <Bob> } ---
22 ?who
23 ---------
24 <Alice>
25 <Charlie>
26
27 --- Query 3 (Join): Who likes someone who likes them back? ---
28 ?a ?b
29 ----------
30 <Alice> <Bob>
31 <Bob> <Alice>
Wrap Up
These examples illustrate the power of Haskell’s monadic structures for modeling complex logic like database query execution. By utilizing the List Monad in the core engine, we abstracted away the explicit backtracking and iteration required to join triple patterns. The code treats a query not as a mechanical set of nested loops, but as a sequence of context transformations, where each step filters the universe of possible bindings down to those that satisfy the current constraints. This allows the evaluatePatterns function to remain remarkably concise while handling the non-deterministic nature of graph pattern matching.
The progression from manual data type construction to a text-based parser highlights the critical distinction between a query engine and a query language. The first example demonstrated that the execution logic operates purely on the Abstract Syntax Tree (AST), entirely independent of how that tree is created. The second example bridged the gap to usability by adding a combinator-based parser, effectively treating the SPARQL string syntax merely as a serialization format for the internal logic we had already built. This separation of concerns allows the backend to remain stable even as the frontend syntax evolves or expands.
Finally, while the two implementation in this chapter are functionally correct for small datasets, they reveal the performance challenges inherent in graph databases. The nested-loop join strategy employed here performs a linear scan of the graph for every pattern, resulting in exponential time complexity for complex queries. Production-grade triple stores solve this by replacing list-based storage with indexed structures—such as B-Trees or Hexastores (indexing S-P-O, P-O-S, etc.)—to allow constant-time lookups, ensuring that queries remain performant even as the dataset grows into the billions of triples.