Knowledge Graph Navigator
The Knowledge Graph Navigator (which I will often refer to as KGN) is a tool that processes a set of entity names and automatically explores the public Knowledge Graph DBPedia using SPARQL queries. I started writing KGN for my own use, to automate some things I used to do manually when exploring Knowledge Graphs, and later thought that KGN might also be useful for educational purposes. KGN shows the user the auto-generated SPARQL queries so that, hopefully, the user will learn by seeing examples. KGN uses the Clojure Jena wrapper example code from the last chapter as well as the two Java classes JenaAPis and QueryResults (which wrap the Apache Jena library) that were also included in the example for the previous chapter.
Note: There are three separate options for running SPARQL queries in this example:
- Use the code from the last chapter (Jena and query caching)
- Use a small standalone set of Clojure functions to access DBPedia
- Use a small standalone set of Clojure functions to access a local GraphDB RDF server with the data file dbpedia_sample.nt loaded into a graph named dbpedia.
The example code is set up to use Jena and query caching; edit the file sparql.clj to enable the other options.
I have implemented parts of KGN in several languages: Common Lisp, Java, Racket Scheme, Swift, Python, and Hy. The most full-featured version of KGN, including a full user interface, is in my book Loving Common Lisp, or the Savvy Programmer’s Secret Weapon, which you can read free online. That version performs more speculative SPARQL queries to find information than the example here, which I designed for ease of understanding and modification.
We will be running an example using data containing three person entities, one company entity, and one place entity. The following figure shows a very small part of the DBPedia Knowledge Graph that is centered around these entities. The data for this figure was collected by an example Knowledge Graph Creator from my Common Lisp book:
I chose to use DBPedia instead of WikiData for this example because DBPedia URIs are human readable. Of the following URIs, the first two represent the concept of a person and the third is the FOAF (friend of a friend) property for a name. The semantic meanings of the DBPedia and FOAF URIs are self-evident to a human reader while that of the WikiData URI is not:
http://www.wikidata.org/entity/Q215627
http://dbpedia.org/ontology/Person
http://xmlns.com/foaf/0.1/name
I frequently use WikiData in my work, and WikiData is one of the most useful public knowledge bases. I have both DBPedia and WikiData SPARQL endpoints in the example code that we will look at later, with the WikiData endpoint commented out. You can try manually querying WikiData at the WikiData SPARQL endpoint. For example, you might explore the WikiData URI for the person concept using:
select ?p ?o where {
<http://www.wikidata.org/entity/Q215627> ?p ?o .
} limit 10
For the rest of this chapter we will just use DBPedia or data copied from DBPedia.
After looking at an interactive session using the example program for this chapter, we will look at the implementation.
Entity Types Handled by KGN
To keep this example simple we handle just three entity types:
- People
- Organizations
- Places
In addition to finding detailed information for people, organizations, and places we will also search for relationships between entities. This search process consists of generating a series of SPARQL queries and calling the DBPedia SPARQL endpoint.
Before we design and write the code, I want to show you sample output for our example program:
(kgn {:People ["Bill Gates" "Steve Jobs" "Melinda Gates"]
:Organization ["Microsoft"]
:Place ["California"]})
The output (with some text shortened) is:
{:entity-summaries
(("Bill Gates"
"http://dbpedia.org/resource/Bill_Gates"
"William Henry Gates III (born October 28, 1955) is an American business magnate,\
software developer, investor, and philanthropist. He is best known as the co-founde\
r of Microsoft Corporation. During his career...")
("Steve Jobs"
"http://dbpedia.org/resource/Steve_Jobs"
"Steven Paul Jobs (; February 24, 1955 – October 5, 2011) was an American busines\
s magnate, industrial designer, investor, and media proprietor. He was the chairman,\
chief executive officer (CEO), and co-founder of Apple Inc., the chairman and major\
ity shareholder of Pixar...")
("Melinda Gates"
"http://dbpedia.org/resource/Melinda_Gates"
"Melinda Ann Gates (née French; August 15, 1964) is an American philanthropist an\
d a former general manager at Microsoft. In 2000, she co-founded the Bill & Melinda \
Gates Foundation with her husband Bill Gates...")
("Microsoft"
"http://dbpedia.org/resource/Microsoft"
"Microsoft Corporation () is an American multinational technology company with he\
adquarters in Redmond, Washington. It develops, manufactures, licenses, supports, an\
d sells computer software...")
("California"
"http://dbpedia.org/resource/California"
"California is a state in the Pacific Region of the United States. With 39.5 mill\
ion residents across ...")),
:discovered-relationships
((["<http://dbpedia.org/resource/Bill_Gates>"
"<http://dbpedia.org/property/spouse>"
"<http://dbpedia.org/resource/Melinda_Gates>"]
["<http://dbpedia.org/resource/Melinda_Gates>"
"<http://dbpedia.org/property/spouse>"
"<http://dbpedia.org/resource/Bill_Gates>"])
(["<http://dbpedia.org/resource/Bill_Gates>"
"<http://dbpedia.org/ontology/knownFor>"
"<http://dbpedia.org/resource/Microsoft>"]
["<http://dbpedia.org/resource/Microsoft>"
"<http://dbpedia.org/property/founders>"
"<http://dbpedia.org/resource/Bill_Gates>"])
(["<http://dbpedia.org/resource/Steve_Jobs>"
"<http://dbpedia.org/ontology/birthPlace>"
"<http://dbpedia.org/resource/California>"]))}
Note that the output from the function kgn is a map containing two keys: :entity-summaries and :discovered-relationships.
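As a minimal sketch of how this result map might be consumed, the following code pulls the name-to-URI pairs and the relationship triples out of a map with the same shape as the output above. The names sample-results, summary-uris, and relationship-triples are hypothetical helpers introduced for illustration and are not part of the KGN source; the sample data is abbreviated from the output shown earlier.

```clojure
;; Abbreviated sample with the same shape as the map returned by kgn.
(def sample-results
  {:entity-summaries
   '(("Bill Gates"
      "http://dbpedia.org/resource/Bill_Gates"
      "William Henry Gates III ...")
     ("Microsoft"
      "http://dbpedia.org/resource/Microsoft"
      "Microsoft Corporation ..."))
   :discovered-relationships
   '((["<http://dbpedia.org/resource/Bill_Gates>"
       "<http://dbpedia.org/ontology/knownFor>"
       "<http://dbpedia.org/resource/Microsoft>"]))})

;; Hypothetical helper: map each entity name to its DBPedia URI.
(defn summary-uris [results]
  (into {}
        (for [[entity-name uri _comment] (:entity-summaries results)]
          [entity-name uri])))

;; Hypothetical helper: flatten the per-pair groups into one
;; sequence of [subject property object] triples.
(defn relationship-triples [results]
  (apply concat (:discovered-relationships results)))

(println (summary-uris sample-results))
(println (relationship-triples sample-results))
```

Each element of :entity-summaries is a (name URI comment) list, and each element of :discovered-relationships groups the triples found for one pair of entities, which is why the second helper concatenates the groups.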
KGN Implementation
The example application works by processing a list of Person, Place, and Organization names. We generate SPARQL queries to DBPedia to find information about the entities and relationships between them.
Since the DBPedia queries are time consuming, I created a tiny subset of DBPedia in the file dbpedia_sample.nt and load it into an RDF data store like GraphDB or Fuseki running on my laptop. This local setup is especially helpful during development when the same queries are repeatedly used for testing. If you don’t modify the file sparql.clj then by default the public DBPedia SPARQL endpoint will be used.
The Clojure and Java files from the example in the last chapter were copied unchanged to the current example, and the project.clj file contains the same dependencies as we used earlier:
1 (defproject knowledge_graph_navigator_clj "0.1.0-SNAPSHOT"
2 :description "Knowledge Graph Navigator"
3 :url "https://markwatson.com"
4 :license
5 {:name
6 "EPL-2.0 OR GPL-2+ WITH Classpath-exception-2.0"
7 :url "https://www.eclipse.org/legal/epl-2.0/"}
8 :source-paths ["src"]
9 :java-source-paths ["src-java"]
10 :javac-options ["-target" "1.8" "-source" "1.8"]
11 :dependencies [[org.clojure/clojure "1.10.1"]
12 [clj-http "3.10.3"]
13 [com.cemerick/url "0.1.1"]
14 [org.clojure/data.csv "1.0.0"]
15 [org.clojure/data.json "1.0.0"]
16 [org.clojure/math.combinatorics "0.1.6"]
17 [org.apache.derby/derby "10.15.2.0"]
18 [org.apache.derby/derbytools "10.15.2.0"]
19 [org.apache.derby/derbyclient
20 "10.15.2.0"]
21 [org.apache.jena/apache-jena-libs
22 "3.17.0" :extension "pom"]]
23 :repl-options
24 {:init-ns knowledge-graph-navigator-clj.kgn}
25 :main ^:skip-aot knowledge-graph-navigator-clj.kgn)
I copied the code from the last chapter into this project to save readers from needing to lein install the project in the last chapter. We won’t look at that code again here.
This example is contained in several source files. We will start with the low-level code in sparql.clj. You can edit lines 10-11 to change which SPARQL library and endpoint are used. There are utility functions for using DBPedia (lines 13-20) and the free version of GraphDB (lines 22-35). The top level wrapper function sparql-endpoint (lines 37-50) selects among these options based on the flags set in lines 10-11, so the remainder of the example works without modification with all options. If a query fails, sparql-endpoint prints a warning and returns an empty vector. Lines 52-57 define a small main function to facilitate working with this file in isolation.
1 (ns knowledge-graph-navigator-clj.sparql
2 (:require [clj-http.client :as client])
3 (:require clojure.stacktrace)
4 (:require [cemerick.url :refer (url-encode)])
5 (:require [clojure.data.csv :as csv])
6 (:require [semantic-web-jena-clj.core :as jena]))
7
8 ;; see https://github.com/mark-watson/clj-sparql
9
10 (def USE-LOCAL-GRAPHDB false)
11 (def USE-CACHING true) ;; use Jena wrapper
12
13 (defn dbpedia [sparql-query]
14 (let [q
15 (str
16 "https://dbpedia.org/sparql?output=csv&query="
17 (url-encode sparql-query))
18 response (client/get q)
19 body (:body response)]
20 (csv/read-csv body)))
21
22 (defn- graphdb-helper [host port graph-name sparql-query]
23 (let [q
24 (str host ":" port "/repositories/" graph-name
25 "?query=" (url-encode sparql-query))
26 response (client/get q)
27 body (:body response)]
28 (csv/read-csv body)))
29
30 (defn graphdb
31 ([graph-name sparql-query]
32 (graphdb-helper
33 "http://127.0.0.1" 7200 graph-name sparql-query))
34 ([host port graph-name sparql-query]
35 (graphdb-helper host port graph-name sparql-query)))
36
37 (defn sparql-endpoint [sparql-query]
38 (try
39 (if USE-LOCAL-GRAPHDB
40 (graphdb "dbpedia" sparql-query)
41 (if USE-CACHING
42 (jena/query-dbpedia sparql-query)
43 (dbpedia sparql-query)))
44 (catch Exception e
45 (do
46 (println
47 "WARNING: query failed:\n" sparql-query)
48 (println (.getMessage e))
49 (clojure.stacktrace/print-stack-trace e)
50 []))))
51
52 (defn -main
53 "SPARQL example"
54 [& _]
55 (println
56 (sparql-endpoint
57 "select * { ?s ?p ?o } limit 10")))
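The dbpedia and graphdb functions return parsed CSV, which is a sequence of rows whose first row holds the SPARQL variable names. As a small sketch, assuming that header-first shape, the rows can be turned into maps keyed by variable name. The function rows->maps and the sample data are hypothetical and are not part of sparql.clj.

```clojure
;; Hypothetical sample rows shaped like parsed CSV output:
;; the first row holds the SPARQL variable names.
(def sample-rows
  [["s" "comment"]
   ["http://dbpedia.org/resource/Steve_Jobs" "Steven Paul Jobs ..."]])

;; Hypothetical helper: pair each data row with the header row.
(defn rows->maps
  "Convert a header row plus data rows into a sequence of maps."
  [results]
  (let [[header & rows] results
        ks (map keyword header)]
    (map #(zipmap ks %) rows)))

(println (rows->maps sample-rows))
```

An empty result, such as the empty vector that sparql-endpoint returns when a query fails, produces an empty sequence of maps.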
The next source file sparql_utils.clj contains one function that loads a SPARQL template file and performs variable substitutions from a map:
1 (ns knowledge-graph-navigator-clj.sparql-utils
2 (:require clojure.string))
3 (defn sparql_template
4 "open SPARQL template file and perform
5 variable substitutions"
6 [template-fpath substitution-map]
7 (let [template-as-string (slurp template-fpath)]
8 (clojure.string/replace
9 template-as-string
10 (re-pattern
11 ; create a regex pattern of quoted replacements
12 ; separated by |
13 ; this code is derived from a stackoverflow
14 ; example by user bmillare
15 (apply
16 str
17 (interpose
18 "|"
19 (map
20 (fn [x] (java.util.regex.Pattern/quote x))
21 (keys substitution-map)))))
22 substitution-map)))
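To see the substitution technique without reading a file, here is a minimal sketch that applies the same idea to an in-memory string: the map keys are quoted and joined into one regular expression, and the map itself serves as the replacement function. The name fill-template and the short template string are hypothetical and used only for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical in-memory variant of sparql_template.
(defn fill-template
  "Replace every key of substitution-map found in template
  with the corresponding string value."
  [template substitution-map]
  (str/replace
   template
   ;; one alternation of literally quoted keys, e.g. \Q<NAME>\E|\Q<TYPE>\E
   (re-pattern
    (str/join "|"
              (map #(java.util.regex.Pattern/quote %)
                   (keys substitution-map))))
   ;; a Clojure map is a function of its keys, so it can look up
   ;; the replacement for each match
   substitution-map))

(println
 (fill-template
  "label \"<NAME>\" type <ENTITY_TYPE>"
  {"<NAME>" "Steve Jobs"
   "<ENTITY_TYPE>" "<http://dbpedia.org/ontology/Person>"}))
```

Because all keys are replaced in a single pass, substituted values are not rescanned, so a value that itself contains angle brackets is left untouched. The sketch assumes a non-empty map with string values.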
The next source file entities_by_name.clj finds DBPedia entity URIs for names of entities, for example “Steve Jobs.” The heart of this functionality is one SPARQL query template that is used to look up URIs by name; the placeholders <NAME> and <ENTITY_TYPE> in the template are replaced with actual values before the query is run. The file entities_by_name.sparql contains:
1 select distinct ?s ?comment where {
2 ?s <http://www.w3.org/2000/01/rdf-schema#label>
3 "<NAME>"@en .
4 ?s <http://www.w3.org/2000/01/rdf-schema#comment>
5 ?comment .
6 FILTER (lang(?comment) = "en") .
7 ?s
8 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
9 <ENTITY_TYPE> .
10 }
The function dbpedia-get-entities-by-name takes two arguments, name and dbpedia-type. For example, name might be set to “Steve Jobs” and dbpedia-type to the URI:
1 <http://dbpedia.org/ontology/Person>
These values replace <NAME> and <ENTITY_TYPE> in the SPARQL query. The FILTER statement on line 6 discards all string values that are not tagged as English (“en”).
This SPARQL query template file is used in lines 9-11:
1 (ns knowledge-graph-navigator-clj.entities-by-name
2 (:require [knowledge-graph-navigator-clj.sparql :as sparql])
3 (:require [knowledge-graph-navigator-clj.sparql-utils :as utils])
4 (:require [clojure.pprint :as pp])
5 (:require clojure.string))
6
7 (defn dbpedia-get-entities-by-name [name dbpedia-type]
8 (let [sparql-query
9 (utils/sparql_template
10 "entities_by_name.sparql"
11 {"<NAME>" name "<ENTITY_TYPE>" dbpedia-type})
12 results (sparql/sparql-endpoint sparql-query)]
13 results))
14
15 (defn -main
16 "test/dev entities by name"
17 [& _]
18 (println
19 (dbpedia-get-entities-by-name
20 "Steve Jobs"
21 "<http://dbpedia.org/ontology/Person>"))
22 (println
23 (dbpedia-get-entities-by-name
24 "Microsoft"
25 "<http://dbpedia.org/ontology/Organization>"))
26 (pp/pprint
27 (dbpedia-get-entities-by-name
28 "California"
29 "<http://dbpedia.org/ontology/Place>"))
30 )
The main function in lines 15-30 was useful for debugging the SPARQL query and code, and I left it in the example so you can run and test this file in isolation.
The last utility function we need is defined in the source file relationships.clj, which uses another SPARQL query template file. This SPARQL template file relationships.sparql contains:
1 SELECT DISTINCT ?p {
2 <URI1>
3 ?p
4 <URI2> .
5 FILTER (!regex(str(?p),"wikiPage","i"))
6 } LIMIT 5
There are three things to note here. First, the DISTINCT keyword removes duplicate results. Second, in SPARQL queries URIs are enclosed in < > angle brackets, but the brackets are not included in SPARQL query results, so the example code adds them. Third, we are looking for all properties that link the two subject/object entity URIs, except that we don’t want property URIs that lead to human readable results (“follow your nose” to dereference URIs to a human readable format); these property names contain the string “wikiPage” so we filter them out of the results.
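For readers more comfortable with Clojure than with SPARQL, the effect of the FILTER clause can be sketched as a client-side filter over a list of property URIs. This is only an illustration of the same case-insensitive test; the example code does the filtering inside the SPARQL query, and the function name drop-wiki-page-properties is hypothetical.

```clojure
;; Hypothetical client-side equivalent of
;; FILTER (!regex(str(?p),"wikiPage","i"))
(defn drop-wiki-page-properties
  "Remove property URIs that contain wikiPage, ignoring case."
  [property-uris]
  (remove #(re-find #"(?i)wikiPage" %) property-uris))

(println
 (drop-wiki-page-properties
  ["http://dbpedia.org/property/spouse"
   "http://dbpedia.org/ontology/wikiPageWikiLink"]))
```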
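In the following listing of relationships.clj, the call to second on line 16 skips the first element of the SPARQL query results, which is the list of variable names from the query, and selects the row that follows it. The map call on lines 13-14 then wraps each returned URI in angle brackets.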
1 (ns knowledge-graph-navigator-clj.relationships
2 (:require [knowledge-graph-navigator-clj.sparql :as sparql])
3 (:require [knowledge-graph-navigator-clj.sparql-utils :as utils])
4 (:require [clojure.pprint :as pp])
5 (:require clojure.string))
6
7 (defn dbpedia-get-relationships [s-uri o-uri]
8 (let [query
9 (utils/sparql_template
10 "relationships.sparql"
11 {"<URI1>" s-uri "<URI2>" o-uri})
12 results (sparql/sparql-endpoint query)]
13 (map
14 (fn [u] (clojure.string/join "" ["<" u ">"]))
15 ;;discard SPARQL variable name p (?p):
16 (second results))))
17
18 (defn entity-results->relationship-links
19 [uris-no-brackets]
20 (let [uris (map
21 (fn [u]
22 (clojure.string/join "" ["<" u ">"]))
23 uris-no-brackets)
24 relationship-statements (atom [])]
25 (doseq [e1 uris]
26 (doseq [e2 uris]
27 (if (not (= e1 e2))
28 (let [l1 (dbpedia-get-relationships e1 e2)
29 l2 (dbpedia-get-relationships e2 e1)]
30 (doseq [x l1]
31 (let [a-tuple [e1 x e2]]
32 (if (not
33 (. @relationship-statements
34 contains a-tuple))
35 (reset! relationship-statements
36 (cons a-tuple
37 @relationship-statements))
38 nil))
39 (doseq [x l2]
40 (let [a-tuple [e2 x e1]]
41 (if (not
42 (. @relationship-statements
43 contains a-tuple))
44 (reset! relationship-statements
45 (cons a-tuple
46 @relationship-statements))
47 nil)))))
48 nil)))
49 @relationship-statements))
50
51 (defn -main
52 "dev/test entity relationships code"
53 [& _]
54 (println
55 "Testing entity-results->relationship-links")
56 (pp/pprint
57 (entity-results->relationship-links
58 ["http://dbpedia.org/resource/Bill_Gates"
59 "http://dbpedia.org/resource/Microsoft"])))
The function entity-results->relationship-links (lines 18-49) takes a list of entity URIs (without the angle brackets). If there are N input URIs, it generates SPARQL queries for every ordered pair of distinct URIs, so the number of queries grows as O(N^2).
The last source file kgn.clj contains the main function for this application. We use the Clojure library clojure.math.combinatorics to calculate all combinations of entity URIs, taken two at a time. In lines 11-17 we map entity type keywords to the corresponding DBPedia entity type URIs.
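The pairing logic can be sketched more compactly with a for comprehension. This is not the code used in relationships.clj; it is a simplified illustration in which a lookup table (the hypothetical fake-relationships) stands in for the live SPARQL call made by dbpedia-get-relationships, and distinct takes the place of the explicit duplicate check.

```clojure
;; Hypothetical stand-in for dbpedia-get-relationships: a lookup
;; table keyed by [subject object] instead of a live SPARQL query.
(def fake-relationships
  {["<A>" "<B>"] ["<knows>"]
   ["<B>" "<A>"] ["<knows>"]
   ["<A>" "<C>"] ["<bornIn>"]})

(defn ordered-pairs
  "All ordered pairs of distinct items: N items give N*(N-1) pairs."
  [items]
  (for [e1 items
        e2 items
        :when (not= e1 e2)]
    [e1 e2]))

(defn relationship-links
  "Collect unique [subject property object] triples for every
  ordered pair, using lookup-table in place of a SPARQL query."
  [uris lookup-table]
  (distinct
   (for [[e1 e2] (ordered-pairs uris)
         p (get lookup-table [e1 e2] [])]
     [e1 p e2])))

(println (count (ordered-pairs ["<A>" "<B>" "<C>"])))
(println (relationship-links ["<A>" "<B>" "<C>"] fake-relationships))
```

With three input URIs there are six ordered pairs, which illustrates the quadratic growth in the number of queries.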
There are two parts to the main function kgn:
- Lines 24-39 collect comment descriptions for each input entity and extract the entity URIs.
- Lines 40-49 find, for each entity URI pair, possible relationships between entities.
Function kgn returns a map of summaries and discovered entity relationships like the one we saw earlier in this chapter.
1 (ns knowledge-graph-navigator-clj.kgn
2 (:require
3 [knowledge-graph-navigator-clj.entities-by-name
4 :as entity-name])
5 (:require
6 [knowledge-graph-navigator-clj.relationships
7 :as rel])
8 (:require [clojure.math.combinatorics :as combo])
9 (:require [clojure.pprint :as pp]))
10
11 (def entity-map
12 {:People
13 "<http://dbpedia.org/ontology/Person>"
14 :Organization
15 "<http://dbpedia.org/ontology/Organization>"
16 :Place
17 "<http://dbpedia.org/ontology/Place>"})
18
19 (defn kgn
20 "Top level function for the KGN library.
21 Inputs: a map with keys :People, :Place, and
22 :Organization. Values are lists of names"
23 [input-entity-map]
24 (let [entities-summary-data
25 (filter
26 ;; get rid of empty SPARQL results:
27 (fn [x] (> (count x) 1))
28 (mapcat ;; flatten just top level
29 identity
30 (for [entity-key (keys input-entity-map)]
31 (for [entity-name
32 (input-entity-map entity-key)]
33 (cons
34 entity-name
35 (second
36 (entity-name/dbpedia-get-entities-by-name
37 entity-name
38 (entity-map entity-key))))))))
39 entity-uris (map second entities-summary-data)
40 combinations-by-2-of-entity-uris
41 (combo/combinations entity-uris 2)
42 discovered-relationships
43 (filter
44 (fn [x] (> (count x) 0))
45 (for [pair-of-uris
46 combinations-by-2-of-entity-uris]
47 (seq
48 (rel/entity-results->relationship-links
49 pair-of-uris))))]
50 {:entity-summaries entities-summary-data
51 :discovered-relationships
52 discovered-relationships}))
53
54 (defn -main
55 "Main function for KGN example"
56 [& _]
57 (let [results
58 (kgn
59 {:People
60 ["Bill Gates" "Steve Jobs" "Melinda Gates"]
61 :Organization ["Microsoft"]
62 :Place ["California"]})]
63 (println " -- results:") (pp/pprint results)))
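To get a feel for how much work the relationship search does, here is a small sketch that enumerates unordered pairs the way combinations taken two at a time does, without depending on clojure.math.combinatorics. The function pairs and the constant queries-per-pair are hypothetical names. The figure of four queries per pair is read off the relationships.clj listing: for a two-element input, both orderings are visited and each visit calls dbpedia-get-relationships twice.

```clojure
;; Hypothetical helper: unordered pairs of items, in the style of
;; combinations taken two at a time.
(defn pairs [items]
  (let [v (vec items)
        n (count v)]
    (for [i (range n)
          j (range (inc i) n)]
      [(v i) (v j)])))

(def example-uris
  ["http://dbpedia.org/resource/Bill_Gates"
   "http://dbpedia.org/resource/Steve_Jobs"
   "http://dbpedia.org/resource/Melinda_Gates"
   "http://dbpedia.org/resource/Microsoft"
   "http://dbpedia.org/resource/California"])

;; Assumption taken from the relationships.clj listing: each pair
;; triggers four calls to dbpedia-get-relationships.
(def queries-per-pair 4)

(println "pairs:" (count (pairs example-uris)))
(println "relationship queries:"
         (* queries-per-pair (count (pairs example-uris))))
```

For the five entities in the example this gives 10 pairs and up to 40 relationship queries, many of them repeats, which is one reason the query caching from the last chapter and a local RDF store are helpful during development.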
I hope this KGN example was both interesting to you and simple enough in its implementation to use as a jumping off point for your own projects.
I had the idea for the KGN application because I was spending quite a bit of time manually setting up SPARQL queries for DBPedia (and other public sources like WikiData) and I wanted to experiment with partially automating this process. I have experimented with versions of KGN written in Java, the Hy language (a Lisp running on Python, about which I wrote a short book), Swift, and Common Lisp, and all four implementations take different approaches as I experimented with different ideas.