Knowledge Graph Sampler for Creating Small Custom Knowledge Graphs

I find it convenient to be able to “sample” small parts of larger knowledge graphs. The example program in this chapter accepts a list of DBPedia entity URIs, attempts to find links between these entities, and writes these nodes and discovered edges to an RDF triples file.

The code is in the directory src/kgsampler. As seen in the configuration files kgsampler.asd and package.lisp, we will use the sparql library we developed earlier as well as the libraries uiop and drakma:

;;;; kgsampler.asd

(asdf:defsystem #:kgsampler
  :description "sample knowledge graphs"
  :author "Mark Watson markw@markwatson.com"
  :license "Apache 2"
  :depends-on (#:uiop #:drakma #:sparql)
  :components ((:file "package")
               (:file "kgsampler")))

;;;; package.lisp

(defpackage #:kgsampler
  (:use #:cl #:uiop #:sparql)
  (:export #:sample))

The program starts with a list of entities and tries to find links on DBPedia between the entities. A small sample graph of the input entities and any discovered links is written to a file. The function dbpedia-as-nt spawns a process that uses the curl utility to make an HTTP request to DBPedia. The function construct-from-dbpedia takes a list of entities and, for each one, runs a SPARQL CONSTRUCT query with the entity as the subject and the object filtered to string values in the English language, writing the resulting triples to an output stream. The function find-relations makes one SPARQL query for every ordered pair of input entities, so it runs in O(N^2) time where N is the number of input entities; you should avoid using this program with a large number of input entities.
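
To make the cost concrete: find-relations issues one query per ordered pair of distinct entities, and construct-from-dbpedia adds one request per entity. The sketch below uses hypothetical helper names that are not part of kgsampler, and it assumes the input URIs are all distinct. It enumerates the pairs and counts the requests without contacting DBPedia.

```lisp
;; Hypothetical helpers for illustration; not part of the kgsampler package.

(defun ordered-pairs (items)
  "Return all ordered pairs (a b) of distinct elements of ITEMS."
  (loop for a in items
        append (loop for b in items
                     unless (equal a b)
                       collect (list a b))))

(defun request-count (n)
  "Requests made for N distinct entities: N CONSTRUCT requests
plus one relation query per ordered pair, N * (N - 1)."
  (+ n (* n (1- n))))

(length (ordered-pairs '("<a>" "<b>" "<c>")))  ; => 6
(request-count 3)                               ; => 9
(request-count 10)                              ; => 100
```

Under these assumptions the total is exactly N^2 requests, so 50 entities would already mean 2,500 round trips to DBPedia.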

I offer this code with little explanation since much of it is similar to the techniques you saw in the previous chapter Knowledge Graph Navigator.

;; kgsampler main program

(in-package #:kgsampler)

(defun dbpedia-as-nt (query)
  "Run QUERY against the DBPedia SPARQL endpoint via curl; return N-Triples text."
  (print query)
  (uiop:run-program
   (list
    "curl"
    (concatenate 'string
                 "https://dbpedia.org/sparql?format=text/ntriples&query="
                 ;; formats that work: csv, text/ntriples, text/ttl
                 (drakma:url-encode query :utf-8)))
   :output :string))

(defun construct-from-dbpedia (entity-uri-list &key (output-stream t))
  "Write English-language literal triples for each entity to OUTPUT-STREAM."
  (dolist (entity-uri entity-uri-list)
    (format output-stream "~%~%# ENTITY NAME: ~A~%~%" entity-uri)
    ;; Use "~A" so a tilde in the fetched data is not read as a format directive.
    (format
     output-stream
     "~A"
     (dbpedia-as-nt
      (format nil
              "CONSTRUCT { ~A ?p ?o } where { ~A ?p ?o  . FILTER(lang(?o) = 'en') }"
              entity-uri entity-uri)))))

(defun ensure-angle-brackets (s)
  "make sure URIs have angle brackets"
  (if (equal #\< (char s 0))
      s
      (concatenate 'string "<" s ">")))

(defun find-relations (entity-uri-list &key (output-stream t))
  "Query DBPedia for properties linking each ordered pair of distinct entities."
  (dolist (entity-uri1 entity-uri-list)
    (dolist (entity-uri2 entity-uri-list)
      (unless (equal entity-uri1 entity-uri2)
        (let ((possible-relations
               (mapcar #'cadar
                       (sparql::dbpedia
                        (format nil
                                "select ?p where { ~A ?p ~A . filter(!regex(str(?p), \"page\", \"i\"))} limit 50"
                                entity-uri1 entity-uri2)))))
          (print "** possible-relations:") (print possible-relations)
          (dolist (pr possible-relations)
            (format output-stream "~A ~A ~A .~%"
                    entity-uri1
                    (ensure-angle-brackets pr)
                    entity-uri2)))))))

(defun sample (entity-uri-list output-filepath)
  "Write entity triples and discovered relations to OUTPUT-FILEPATH."
  (with-open-file (ostream (pathname output-filepath)
                           :direction :output :if-exists :supersede)
    (construct-from-dbpedia entity-uri-list :output-stream ostream)
    (find-relations entity-uri-list :output-stream ostream)))
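
The two query templates in the listing above can be exercised without any network access, which is handy for checking that your entity URIs are spliced in as expected. The following is a minimal sketch; construct-query and relation-query are hypothetical names introduced here for illustration, and the format strings are copied from the listing.

```lisp
;; Hypothetical helpers for illustration; not part of kgsampler.
;; They only build the query text; nothing is sent to DBPedia.

(defun construct-query (entity-uri)
  "CONSTRUCT query for English-language literals about ENTITY-URI."
  (format nil
          "CONSTRUCT { ~A ?p ?o } where { ~A ?p ?o  . FILTER(lang(?o) = 'en') }"
          entity-uri entity-uri))

(defun relation-query (subject-uri object-uri)
  "SELECT query for properties that link SUBJECT-URI to OBJECT-URI."
  (format nil
          "select ?p where { ~A ?p ~A . filter(!regex(str(?p), \"page\", \"i\"))} limit 50"
          subject-uri object-uri))

;; Print both queries for inspection.
(format t "~A~%~A~%"
        (construct-query "<http://dbpedia.org/resource/Bill_Gates>")
        (relation-query "<http://dbpedia.org/resource/Bill_Gates>"
                        "<http://dbpedia.org/resource/Microsoft>"))
```

Because the URIs are inserted into the query text verbatim, they must already carry their angle brackets, as in the examples in this chapter.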

Let’s start by running the two helper functions interactively so you can see their output (output edited for brevity). The top-level function kgsampler:sample takes a list of entity URIs and an output file name, and calls construct-from-dbpedia and find-relations to write triples for the entities and then for the relationships discovered between them. The following listing calls the helper functions kgsampler::construct-from-dbpedia and kgsampler::find-relations directly to show you what their output looks like.

$ sbcl
* (ql:quickload "kgsampler")
* (kgsampler::construct-from-dbpedia '("<http://dbpedia.org/resource/Bill_Gates>" "<http://dbpedia.org/resource/Steve_Jobs>") :output-stream nil)

"CONSTRUCT { <http://dbpedia.org/resource/Bill_Gates> ?p ?o } where { <http://dbpedia.org/resource/Bill_Gates> ?p ?o  . FILTER (lang(?o) = 'en') }" 
"CONSTRUCT { <http://dbpedia.org/resource/Bill_Gates> <http://purl.org/dc/terms/subject> ?o } where { <http://dbpedia.org/resource/Bill_Gates> <http://purl.org/dc/terms/subject> ?o  }" 

 ...

* (kgsampler::find-relations '("<http://dbpedia.org/resource/Bill_Gates>" "<http://dbpedia.org/resource/Microsoft>") :output-stream nil)

("dbpedia SPARQL:"
 "select ?p where { <http://dbpedia.org/resource/Bill_Gates> ?p <http://dbpedia.org/resource/Microsoft> . filter(!regex(str(?p), \"page\", \"i\"))} limit 50"
 "n") 
"** possible-relations:" 
("http://dbpedia.org/ontology/knownFor") 
"http://dbpedia.org/ontology/knownFor" 
("dbpedia SPARQL:"
 "select ?p where { <http://dbpedia.org/resource/Microsoft> ?p <http://dbpedia.org/resource/Bill_Gates> . filter(!regex(str(?p), \"page\", \"i\"))} limit 50"
 "n") 
"** possible-relations:" 
("http://dbpedia.org/property/founders" "http://dbpedia.org/ontology/foundedBy") 
"http://dbpedia.org/property/founders" 
"http://dbpedia.org/ontology/foundedBy" 
nil
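
Notice that the property URIs come back from DBPedia without angle brackets, while the entity URIs we pass in already have them. The sketch below shows how one output line is assembled from those pieces. ensure-angle-brackets is copied from the program listing; triple-line is a hypothetical helper added only for illustration.

```lisp
;; ensure-angle-brackets is copied from the program listing;
;; triple-line is a hypothetical helper, not part of kgsampler.

(defun ensure-angle-brackets (s)
  "make sure URIs have angle brackets"
  (if (equal #\< (char s 0))
      s
      (concatenate 'string "<" s ">")))

(defun triple-line (subject-uri property-uri object-uri)
  "Format one N-Triples statement the way find-relations writes it."
  (format nil "~A ~A ~A ."
          subject-uri
          (ensure-angle-brackets property-uri)
          object-uri))

(triple-line "<http://dbpedia.org/resource/Bill_Gates>"
             "http://dbpedia.org/ontology/knownFor"
             "<http://dbpedia.org/resource/Microsoft>")
;; => "<http://dbpedia.org/resource/Bill_Gates> <http://dbpedia.org/ontology/knownFor> <http://dbpedia.org/resource/Microsoft> ."
```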

We now use the main function to generate an output RDF triple file:

$ sbcl
* (ql:quickload "kgsampler")
* (kgsampler:sample '("<http://dbpedia.org/resource/Bill_Gates>" "<http://dbpedia.org/resource/Steve_Jobs>" "<http://dbpedia.org/resource/Microsoft>")  "test.nt")
"CONSTRUCT { <http://dbpedia.org/resource/Bill_Gates> ?p ?o } where { <http://dbpedia.org/resource/Bill_Gates> ?p ?o  . FILTER (lang(?o) = 'en') }" 
("ndbpedia SPARQL:n"
 "select ?p where { <http://dbpedia.org/resource/Bill_Gates> ?p <http://dbpedia.org/resource/Microsoft> . filter(!regex(str(?p), \"page\", \"i\"))} limit 50"
 "n") 
"** possible-relations:" 
("http://dbpedia.org/ontology/board") 
("dbpedia SPARQL:"
 "select ?p where { <http://dbpedia.org/resource/Steve_Jobs> ?p <http://dbpedia.org/resource/Bill_Gates> . filter(!regex(str(?p), \"page\", \"i\"))} limit 50"
 "n")

Output RDF N-Triples data is written to the file named by the second argument (test.nt in the run above). A very small part of a sample output file, sample-KG.nt, is listed here:

# ENTITY NAME: <http://dbpedia.org/resource/Bill_Gates>

<http://dbpedia.org/resource/Bill_Gates>        <http://dbpedia.org/ontology/abstract>  "William Henry \"Bill\" Gates III (born October 28, 1955) is an American business magnate,...."@en .
<http://dbpedia.org/resource/Bill_Gates>
        <http://xmlns.com/foaf/0.1/name>
        "Bill Gates"@en .
<http://dbpedia.org/resource/Bill_Gates>
        <http://xmlns.com/foaf/0.1/surname>
        "Gates"@en .
<http://dbpedia.org/resource/Bill_Gates>
        <http://dbpedia.org/ontology/title>
        "Co-Chairmanof theBill & Melinda Gates Foundation"@en .
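
A quick sanity check on a generated file is to count its data lines. The following is a minimal sketch, assuming the file holds one triple per line as the N-Triples format requires (the listing above is wrapped for display) and that the only other lines are blank or the "# ENTITY NAME" comments the program writes. The function names are hypothetical and not part of kgsampler.

```lisp
;; Hypothetical helpers for illustration; not part of kgsampler.

(defun data-line-p (line)
  "True if LINE is neither blank nor a # comment."
  (let ((trimmed (string-trim '(#\Space #\Tab #\Return) line)))
    (and (plusp (length trimmed))
         (char/= #\# (char trimmed 0)))))

(defun count-data-lines (stream)
  "Count the lines on STREAM that are neither blank nor comments."
  (loop for line = (read-line stream nil nil)
        while line
        count (data-line-p line)))

;; Example on an in-memory string; for a real file you could use
;; (with-open-file (in "test.nt") (count-data-lines in)) instead.
(with-input-from-string
    (in (format nil "# ENTITY NAME: <a>~%~%<a> <p> \"x\"@en .~%<a> <q> <b> .~%"))
  (count-data-lines in))
;; => 2
```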

The same data in Turtle RDF format is in the file sample-KG.ttl, which I produced by importing the triples file into the free edition of GraphDB and exporting it as Turtle, a format I find easier to read. GraphDB has visualization tools, which I use here to generate an interactive graph display of this data:

GraphDB Visual graph of generated RDF triples

This example is also set up for people and companies. I may expand it in the future to other types of entities as I need them.

This example program takes several minutes to run since many SPARQL queries are made to DBPedia. I am a non-corporate member of the DBPedia organization. Here is a membership application if you are interested in joining me there.
