Implementing a Simple RDF Datastore and Partial SPARQL Support in Common Lisp

This chapter explores a Common Lisp implementation of a basic RDF (Resource Description Framework) datastore with partial SPARQL (SPARQL Protocol and RDF Query Language) support. You can find the source code on GitHub https://github.com/mark-watson/simple_rdf_sparql.

Before we look at the implementation let’s look at code for an example use case and run it:

 1 (defun test ()
 2   (setf *rdf-store* nil)
 3 
 4   (add-triple "John" "age" "30")
 5   (add-triple "John" "likes" "pizza")
 6   (add-triple "Mary" "age" "25")
 7   (add-triple "Mary" "likes" "sushi")
 8   (add-triple "Bob" "age" "35")
 9   (add-triple "Bob" "likes" "burger")
10 
11   (print-all-triples)
12 
13   (defun print-query-results (query-string)
14     (format t "Query: ~A~%" query-string)
15     (let ((results (execute-sparql-query query-string)))
16       (format t "Final Results:~%")
17       (if results
18           (dolist (result results)
19             (format t "  ~{~A: ~A~^, ~}~%"
20                     (loop for (var . value) in result
21                           collect var collect value)))
22           (format t "  No results~%"))
23       (format t "~%")))
24 
25   (print-query-results "select * where { ?name age ?age . ?name likes ?food }")
26   (print-query-results "select ?s ?o where { ?s likes ?o }")
27   (print-query-results "select * where { ?name age ?age . ?name likes pizza }"))

Assuming that you cloned the repo into ~/quicklisp/local-projects, the library simple_rdf_sparql is available, for example:

 1  $ sbcl
 2 * (ql:quickload :simple_rdf_sparql)
 3 To load "simple_rdf_sparql":
 4   Load 1 ASDF system:
 5     simple_rdf_sparql
 6 ; Loading "simple_rdf_sparql"
 7 [package simple_rdf_sparql]..
 8 (:simple_rdf_sparql)
 9 * (simple_rdf_sparql:test)
10 All triples in the datastore:
11 Bob likes burger
12 Bob age 35
13 Mary likes sushi
14 Mary age 25
15 John likes pizza
16 John age 30
17 
18 Query: select * where { ?name age ?age . ?name likes ?food }
19 Final Results:
20   ?age: 35, ?food: burger, ?name: Bob
21   ?age: 25, ?food: sushi, ?name: Mary
22   ?age: 30, ?food: pizza, ?name: John
23 
24 Query: select ?s ?o where { ?s likes ?o }
25 Final Results:
26   ?s: Bob, ?o: burger
27   ?s: Mary, ?o: sushi
28   ?s: John, ?o: pizza
29 
30 Query: select * where { ?name age ?age . ?name likes pizza }
31 Final Results:
32   ?age: 30, ?name: John
33 
34 nil
35 * 

The GitHub repository https://github.com/mark-watson/simple_rdf_sparql contains two project files package.lisp and simple_rdf_sparql.asd that I won’t list here.

We’ll now break down the implementation code into several key components and explain each part step by step.

1. RDF Triple Structure

The foundation of our RDF datastore is the triple structure:

1 (defstruct triple
2   subject
3   predicate
4   object)

This structure represents the basic unit of data in RDF, consisting of a subject, predicate, and object. Each triple acts as a simple statement of fact, linking a subject to an object through a predicate. For example, consider the statement: “Mark authored the book.” Here, “Mark” is the subject, “authored” is the predicate, and “book” is the object.

In RDF, triples allow us to represent complex relationships between entities in a highly structured manner, which can then be queried and analyzed. The subject typically represents the entity or resource being described, while the predicate denotes the type of relationship, and the object identifies the value or entity linked to the subject. These three components work together to form a directed graph, making the RDF model highly expressive for semantic web and linked data applications.

2. RDF Datastore

The datastore itself is a simple global variable:

1 (defvar *rdf-store* nil)

This variable will hold a list of triples.

3. Basic Datastore Operations

Triples are stored in memory and two utility functions support creating new triples and deleting triples.

Adding Triples

1 (defun add-triple (subject predicate object)
2   (push (make-triple :subject subject
3                      :predicate predicate
4                      :object object)
5         *rdf-store*))

This function creates a new triple and adds it to the datastore.

Removing Triples

1 (defun remove-triple (subject predicate object)
2   (setf *rdf-store*
3         (remove-if (lambda (triple)
4                      (and (equal (triple-subject triple) subject)
5                           (equal (triple-predicate triple) predicate)
6                           (equal (triple-object triple) object)))
7                    *rdf-store*)))

This function removes a specific triple from the datastore.

4. Query Support

Identifying Variables

1 (defun variable-p (str)
2   (and (stringp str) (> (length str) 0) (char= (char str 0) #\?)))

This helper function identifies if a string represents a variable (starting with ‘?’).

Converting Triples to Bindings

The function triple-to-binding (triple &optional pattern) is designed to convert a given RDF triple into a set of variable bindings based on a specified pattern. This is a crucial step in evaluating SPARQL queries like:

1 select * where { ?name age ?age . ?name likes pizza }

In this context, the function matches variables in the SPARQL pattern against actual values in the datastore’s triples. Let’s break down what the function does:

Function parameters:

  • triple: This is the triple we want to match against. In RDF terms, a triple has three components: subject, predicate, and object.
  • pattern (optional): The pattern here is a list of three elements, potentially containing variables (e.g., ?name, ?age).

A variable is represented as a symbol prefixed by ? in SPARQL, and the function uses function variable-p to check if an element in the pattern is a variable.

Breakdown of the Function Logic

The function starts by setting up an empty list called binding, which will store any matched variable-value pairs:

1 (defun triple-to-binding (triple &optional pattern)
2   (let ((binding nil))

Subject Matching in function triple-to-binding:

  • The first when clause checks if the first element of the pattern is a variable ((variable-p (first pattern))).
  • If true, it pairs the first variable in the pattern (?name in the example query) with the subject of the triple ((triple-subject triple)).
1   (when (and pattern (variable-p (first pattern)))
2     (push (cons (first pattern)
3                 (triple-subject triple))
4           binding))

This might generate a binding like (?name . “Alice”) if Alice is the subject in a matching triple.

Predicate Matching:

  • Similarly, the second when clause in function triple-to-binding checks if the predicate in the pattern is a variable.
  • If so, it pairs the second element of the pattern with the predicate value in the triple.
1   (when (and pattern (variable-p (second pattern)))
2     (push (cons (second pattern)
3                 (triple-predicate triple))
4           binding))

This will match patterns like ?relation in ?name ?relation ?age*.

Object Matching:

  • The final when clause in function triple-to-binding checks if the object in the pattern is a variable.
  • If so, it pairs the third element of the pattern with the object value in the triple.
1   (when (and pattern (variable-p (third pattern)))
2     (push (cons (third pattern)
3                 (triple-object triple))
4           binding))

For example, this would produce a binding like (?age . 25) if a triple has 25 as the object and the third predicate in the matching list was ?age.

Now, return the binding as the returned value for a function call to function triple-to-binding: after processing the pattern, the function returns the generated bindings as a list of (variable . value) pairs:

1   binding))

This function converts a triple to a binding, matching variables in the pattern to values in the triple.

If we call triple-to-binding with this triple and the pattern ( ?name age ?age ), the function performs the following steps:

Checks if ?name is a variable (using function variable-p), which if true, then creates a binding for ?name as:

1 (?name . "Alice")

and bind ?age, for example:

1 (?age 25) 

How This Fits into SPARQL Processing

In the larger scope of evaluating a SPARQL query:

  • Pattern Matching: triple-to-binding is applied to match patterns like { ?name age ?age } against all triples in the datastore.
  • Generating Bindings: Each matching triple generates a set of bindings for the variables in the query.
  • Query Evaluation: These bindings are used to check additional conditions (like { ?name likes pizza }) and produce the final results for the select * query.

Querying Triples

The query-triples function performs a search over an RDF datastore (rdf-store) to find triples that match the given subject, predicate, and object parameters. It iterates through all triples stored in the datastore and uses function remove-if-not to filter out those that do not match the specified criteria. For each triple, the function query-triples evaluates whether its subject, predicate, and object match the respective parameters provided in the function call.

Specifically, function query-triples checks each parameter using a combination of and and or conditions: if a parameter is nil, it is treated as a wildcard and will match any value; if the parameter is a variable (identified by the variable-p function), it is considered to match any value as well; otherwise, the parameter must be equal to the corresponding component of the triple (checked using equal). If a triple satisfies all three conditions, either matching the specified values or variables, it is kept in the result set; otherwise, it is filtered out. The function ultimately returns a list of triples from rdf-store that match the given subject, predicate, and object pattern, enabling flexible querying capabilities over the RDF datastore:

 1 (defun query-triples (subject predicate object)
 2   (remove-if-not
 3     (lambda (triple)
 4        (and (or (null subject)
 5                 (variable-p subject)
 6                 (equal (triple-subject triple) subject))
 7             (or (null predicate)
 8                 (variable-p predicate)
 9                 (equal (triple-predicate triple) predicate))
10             (or (null object)
11                 (variable-p object)
12                 (equal (triple-object triple) object))))
13      *rdf-store*))

This function queries the datastore for triples matching the given pattern.

5. SPARQL Query Structure

1 (defstruct sparql-query
2   select-vars
3   where-patterns)

This structure represents a simplified SPARQL query with select variables and where patterns. Please note that this defstruct creates the function make-sparql-query that we use in the next section.

6. SPARQL Query Parsing

The parse-sparql-query function takes a SPARQL query string as input and processes it to extract its main components: the variables to be selected and the patterns in the WHERE clause. It starts by splitting the query string into individual tokens and removes curly braces ({ and }), which are not needed for the parsing. It then identifies the positions of the “select” and “where” keywords within the tokenized list using position. Using these indices, the function extracts the variables listed after the “select” keyword up to the “where” keyword, storing them in the select-vars variable. Next, it extracts the tokens following the “where” keyword, storing them as where-clause. This where-clause is further processed using a helper function, parse-where-patterns, which converts the clause into a structured list of patterns that represent the triple patterns in the query. Finally, the function returns a new sparql-query object (constructed using make-sparql-query) that encapsulates both the extracted variables and parsed patterns, making it suitable for further evaluation against an RDF datastore:

 1 (defun parse-sparql-query (query-string)
 2   (let* ((tokens
 3           (remove-if
 4             (lambda (token)
 5               (member token '("{" "}") :test #'string=))
 6             (split-string query-string)))
 7          (select-index (position "select" tokens :test #'string-equal))
 8          (where-index (position "where" tokens :test #'string-equal))
 9          (select-vars (subseq tokens (1+ select-index) where-index))
10          (where-clause (subseq tokens (1+ where-index)))
11          (where-patterns (parse-where-patterns where-clause)))
12     (make-sparql-query :select-vars select-vars
13                        :where-patterns where-patterns)))

This function parses a SPARQL query string into a structured representation.

7. Query Execution

The execute-where-patterns function recursively evaluates a list of SPARQL WHERE patterns against an RDF datastore, returning a list of variable bindings that satisfy the entire pattern sequence. If the patterns list is empty, it returns a list containing an empty binding ((list nil)), indicating that no more patterns need to be matched. Otherwise, it starts by extracting the first pattern and the remaining patterns, and uses query-triples to find triples in the RDF datastore that match the first pattern. It then generates a set of initial bindings for each matching triple using the triple-to-binding function. If there are no more patterns left to evaluate, it returns these bindings as the final results. If there are remaining patterns, it recursively processes the rest of the patterns by invoking execute-where-patterns-with-bindings, using each binding to constrain the search for subsequent patterns. For each binding in the current level, it combines the results of the recursive call using mapcan and merges the bindings (merge-bindings) to form a unified set of results that meets all the patterns in sequence, thereby producing a comprehensive solution for the entire WHERE clause:

 1 (defun execute-where-patterns (patterns)
 2   (if (null patterns)
 3       (list nil)
 4       (let* ((pattern (first patterns))
 5              (remaining-patterns (rest patterns))
 6              (matching-triples (apply #'query-triples pattern)))
 7         (let ((bindings
 8                (mapcar
 9                  (lambda (triple)
10                    (triple-to-binding triple pattern))
11                  matching-triples)))
12           (if (null remaining-patterns)
13               bindings
14               (mapcan
15                 (lambda (binding)
16                   (let ((results
17                          (execute-where-patterns-with-bindings
18                             remaining-patterns binding)))
19                     (mapcar (lambda (result)
20                               (merge-bindings binding result))
21                             results)))
22                       bindings))))))

This function executes the WHERE patterns of a SPARQL query, finding matching triples and generating bindings.

8. Result Projection

The project-results function processes a list of query results to filter or project only the variables specified in the select-vars list. If select-vars is set to “*”, indicating that all variables should be included, it simply returns the results after removing any duplicate bindings. Otherwise, it iterates over each result, extracts only the variables listed in select-vars, and constructs a new binding list for each result that includes these selected variables and their corresponding values. Afterward, it removes any duplicate bindings in the filtered results and returns the final projected set, ensuring that only the desired variables are included in the output:

1 (defun project-results (results select-vars)
2   (if (equal select-vars '("*"))
3       (mapcar #'remove-duplicate-bindings results)
4       (mapcar (lambda (result)
5                 (remove-duplicate-bindings
6                  (mapcar (lambda (var)
7                            (cons var (cdr (assoc var result :test #'string=))))
8                          select-vars)))
9               results)))

This function projects the query results based on the SELECT variables.

9. Main Query Execution

The execute-sparql-query function orchestrates the entire process of executing a SPARQL query by integrating various helper functions to parse, evaluate, and project the query results. It starts by parsing the input query string using parse-sparql-query, extracting the WHERE patterns and SELECT variables from the parsed query structure. It then uses execute-where-patterns to evaluate the WHERE clause against the RDF datastore, generating a list of variable bindings that satisfy the given patterns. After obtaining these intermediate results, it applies project-results to filter and project only the variables specified in the SELECT clause, resulting in the final output. This function effectively takes a complete SPARQL query, processes it step-by-step, and returns the matching results in the desired format:

1 (defun execute-sparql-query (query-string)
2   (let* ((query (parse-sparql-query query-string))
3          (where-patterns (sparql-query-where-patterns query))
4          (select-vars (sparql-query-select-vars query))
5          (results (execute-where-patterns where-patterns))
6          (projected-results (project-results results select-vars)))
7     projected-results))

This function ties everything together, executing a SPARQL query from start to finish.

Conclusion

This implementation provides a basic framework for an RDF datastore with partial SPARQL support in Common Lisp. While it lacks many features of a full-fledged RDF database and SPARQL engine, it demonstrates the core concepts and can serve as a starting point for more complex implementations. The code showcases Common Lisp’s strengths in list processing and symbolic computation, making it well-suited for working with semantic data structures like RDF.