Knowledge Graph Navigator

The Knowledge Graph Navigator (which I will often refer to as KGN) is a tool for processing a set of entity names and automatically exploring the public Knowledge Graph DBPedia using SPARQL queries. I wrote KGN in Common Lisp for my own use, to automate some things I used to do manually when exploring Knowledge Graphs, and later thought that KGN might also be useful for educational purposes. KGN uses NLP code developed in earlier chapters, and we will reuse that code after a short review of its APIs.

Please note that this example is a simplified version of a program that I first wrote in Common Lisp. The Common Lisp version is also an example in my book Loving Common Lisp, or the Savvy Programmer’s Secret Weapon, which you can read free online.

The code for this application is in the directory kgn and this example is pre-configured to use uv.

One time only, you will need to download the spaCy language model that we used in the earlier chapter on natural language processing. Install this language model from the directory kgn (edited to fit page width):

$ pwd
~/GITHUB/hy-lisp-python-book/source_code_for_examples/kgn
$ uv run python -m spacy download en_core_web_sm

The following listing shows the text-based user interface for this example. This example application asks the user for a list of entity names and uses SPARQL queries to discover potential matches in DBPedia.

$ uv run hy kgn.hy
Enter a list of entities: Bill Gates worked at Microsoft
Generated SPARQL to get DBPedia entity URIs from a name:
select distinct ?s ?comment { ?s ?p "Bill Gates"@en . ?s <http://www.w3.org/2000/01/rdf-schema#comment> ?comment . FILTER (lang(?comment) = 'en') . ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> . } limit 15
Generated SPARQL to get DBPedia entity URIs from a name:
select distinct ?s ?comment { ?s ?p "Microsoft"@en . ?s <http://www.w3.org/2000/01/rdf-schema#comment> ?comment . FILTER (lang(?comment) = 'en') . ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Organisation> . } limit 15
Please select entities from the list below:
  1. Bill Gates || Cascade Investment, L.L.C. is an American holding company...
  2. Bill Gates || William Henry Gates III (born October 28, 1955) is an Ame...
  3. Bill Gates || Simon Wood is a British cook and winner of the 2015 editi...
  4. Bill Gates || Harry Roy Lewis (born 1947) is an American computer scien...
  5. Bill Gates || Jerry P. Dyer (born May 3, 1959) is an American politicia...
  6. Microsoft || Press Play ApS was a Danish video game development studio ...
  7. Microsoft || The AMD Professional Gamers League (PGL), founded around 1...
  8. Microsoft || The CSS Working Group (Cascading Style Sheets Working Grou...
  9. Microsoft || Microsoft Corporation is an American multinational technol...
  10. Microsoft || Secure Islands Technologies Ltd. was an Israeli privately...
  11. Microsoft || Microsoft Innovation Centers (MICs) are local government ...

Enter the numbers of the entities you want to process, separated by commas (e.g., 1, 3): 2,9
[]
1
Bill Gates || William Henry Gates III (born October 28, 1955) is an American busines...
8
Microsoft || Microsoft Corporation is an American multinational technology corporat...
****** user-selected-entities
['Bill Gates || William Henry Gates III (born October 28, 1955) is an American '
 'busines...',
 'Microsoft || Microsoft Corporation is an American multinational technology '
 'corporat...']
Generated SPARQL to get relationships between two entities:
SELECT DISTINCT ?p { <http://dbpedia.org/resource/Bill_Gates> ?p <http://dbpedia.org/resource/Microsoft> . FILTER (!regex(str(?p), 'wikiPage', 'i')) } LIMIT 5
Generated SPARQL to get relationships between two entities:
SELECT DISTINCT ?p { <http://dbpedia.org/resource/Microsoft> ?p <http://dbpedia.org/resource/Bill_Gates> . FILTER (!regex(str(?p), 'wikiPage', 'i')) } LIMIT 5
Generated SPARQL to get relationships between two entities:
SELECT DISTINCT ?p { <http://dbpedia.org/resource/Microsoft> ?p <http://dbpedia.org/resource/Bill_Gates> . FILTER (!regex(str(?p), 'wikiPage', 'i')) } LIMIT 5
Generated SPARQL to get relationships between two entities:
SELECT DISTINCT ?p { <http://dbpedia.org/resource/Bill_Gates> ?p <http://dbpedia.org/resource/Microsoft> . FILTER (!regex(str(?p), 'wikiPage', 'i')) } LIMIT 5

Discovered relationship links:
[['<http://dbpedia.org/resource/Bill_Gates>',
  '<http://dbpedia.org/resource/Microsoft>',
  [['p', 'http://dbpedia.org/ontology/knownFor']]],
 ['<http://dbpedia.org/resource/Bill_Gates>',
  '<http://dbpedia.org/resource/Microsoft>',
  [['p', 'http://dbpedia.org/property/founders']]],
 ['<http://dbpedia.org/resource/Bill_Gates>',
  '<http://dbpedia.org/resource/Microsoft>',
  [['p', 'http://dbpedia.org/ontology/foundedBy']]],
 ['<http://dbpedia.org/resource/Microsoft>',
  '<http://dbpedia.org/resource/Bill_Gates>',
  [['p', 'http://dbpedia.org/property/founders']]],
 ['<http://dbpedia.org/resource/Microsoft>',
  '<http://dbpedia.org/resource/Bill_Gates>',
  [['p', 'http://dbpedia.org/ontology/foundedBy']]],
 ['<http://dbpedia.org/resource/Microsoft>',
  '<http://dbpedia.org/resource/Bill_Gates>',
  [['p', 'http://dbpedia.org/ontology/knownFor']]]]
Enter a list of entities:

To select found entities of interest, type the entity index numbers you want analyzed. In the last example we chose indices 2 and 9:

Enter the numbers of the entities you want to process, separated by commas (e.g., 1, 3): 2,9
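The menu is numbered from 1 while Python lists are indexed from 0, which is presumably why the debug output in the listing above prints 1 and 8 after the user enters 2,9. The file textui.hy is not listed in this chapter, so the following Python sketch of parsing such a selection is an assumption: the name parse_selection and the decision to silently skip blank, non-numeric, or out-of-range entries are illustrative choices, not the book's code.

```python
def parse_selection(text, num_choices):
    """Convert a 1-based, comma-separated menu selection such as "2,9"
    into 0-based list indices, skipping entries that are not valid."""
    indices = []
    for token in text.split(","):
        token = token.strip()
        if not token.isdigit():
            continue  # ignore empty or non-numeric entries
        index = int(token) - 1  # menu numbers start at 1, list indices at 0
        if 0 <= index < num_choices:
            indices.append(index)
    return indices


print(parse_selection("2,9", 11))  # [1, 8]
```

Skipping bad entries instead of raising an error keeps the interactive loop running when the user makes a typo; a stricter design could re-prompt instead.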

Note: if you run this example yourself, you will see that the generated SPARQL queries are colorized for better readability. This colorization does not appear in the listing above.

After listing the generated SPARQL for finding information about the entities in the query, KGN searches for relationships between these entities. The discovered relationships appear at the end of the listing above. Please note that this step makes O(n^2) SPARQL queries, where n is the number of entities. Local caching of SPARQL query results from DBPedia helps make processing many entities practical.
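To make the O(n^2) cost concrete, here is a Python sketch that builds one relationship query per ordered pair of distinct entities, using the query text shown in the listing above. The file relationships.hy is not shown in this chapter, so the function name relationship_queries and the one-query-per-ordered-pair strategy are assumptions; the listing above shows KGN itself issuing four queries (with repeats) for two entities, so its exact count differs.

```python
from itertools import permutations

# Template copied from the generated SPARQL in the listing above.
# Doubled braces are literal { } characters for str.format.
RELATIONSHIP_TEMPLATE = (
    "SELECT DISTINCT ?p {{ {subject} ?p {obj} . "
    "FILTER (!regex(str(?p), 'wikiPage', 'i')) }} LIMIT 5"
)


def relationship_queries(uris):
    """Return one SPARQL query per ordered pair of distinct entity URIs."""
    return [
        RELATIONSHIP_TEMPLATE.format(subject=subject, obj=obj)
        for subject, obj in permutations(uris, 2)
    ]


uris = ["<http://dbpedia.org/resource/Bill_Gates>",
        "<http://dbpedia.org/resource/Microsoft>",
        "<http://dbpedia.org/resource/Seattle>"]
print(len(relationship_queries(uris)))  # 6
```

For n entities this produces n(n-1) queries, so 10 entities already need 90 round trips to DBPedia, which is why caching matters.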

Review of NLP Utilities Used in Application

We covered NLP in a previous chapter, so the following is just a quick review. The NLP code we use is near the top of the file kgn.hy:

(import os)
(import sys)
(import pprint [pprint])

(import textui [select-entities get-query])
(import kgnutils [dbpedia-get-entities-by-name first second])
(import relationships [entity-results->relationship-links])

(import spacy)

(setv nlp-model (spacy.load "en_core_web_sm"))

(defn entities-in-text [s]
  (setv doc (nlp-model s))
  (setv ret {})
  (for
    [[ename etype] (lfor entity doc.ents [entity.text entity.label_])]
    
    (if (in etype ret)
        (setv (get ret etype) (+ (get ret etype) [ename]))
        (setv (get ret etype) [ename])))
  ret)
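The grouping logic in entities-in-text does not depend on spaCy itself. The following Python sketch performs the same grouping on (text, label) pairs, the form produced by [entity.text entity.label_] for each entity in doc.ents, so it can be tried without loading a language model. The function name group_entities is hypothetical.

```python
from collections import defaultdict


def group_entities(entities):
    """Group (text, label) pairs into a dict mapping label -> list of texts."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)  # preserves the order entities appear in
    return dict(grouped)


pairs = [("Bill Gates", "PERSON"), ("Microsoft", "ORG"), ("Seattle", "GPE")]
print(group_entities(pairs))
```

With a loaded spaCy model nlp, the equivalent call would be group_entities((e.text, e.label_) for e in nlp(s).ents). Appending to a list avoids the new list that (+ (get ret etype) [ename]) creates on every match in the Hy version.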

Here is an example use of this function:

=> (kgn.entities-in-text "Bill Gates, Microsoft, Seattle")
{'PERSON': ['Bill Gates'], 'ORG': ['Microsoft'], 'GPE': ['Seattle']}

The entity type “GPE” (geopolitical entity) indicates that the entity is a location such as a country, city, or state.

SPARQL Utilities

We will use the caching code from the last section and also the widely used third-party Python library requests to access the DBPedia servers. The following code is found in the file sparql.hy and also provides support for using both DBPedia and WikiData. We only use DBPedia in this chapter, but when you start incorporating SPARQL queries into applications that you write, you will probably also want to use WikiData.

The function do-query-helper contains generic code for SPARQL queries and is used in functions wikidata-sparql and dbpedia-sparql:

(import json)
(import requests)

(setv wikidata-endpoint "https://query.wikidata.org/bigdata/namespace/wdq/sparql")
(setv dbpedia-endpoint "https://dbpedia.org/sparql")

(defn do-query-helper [endpoint query]
  ;; Construct a request
  (setv params { "query" query "format" "json"})
        
  ;; Call the API
  (setv response (requests.get endpoint :params params))
        
  (setv json-data (response.json))
        
  (setv vars (get (get json-data "head") "vars"))
        
  (setv results (get json-data "results"))
        
  (if (in "bindings" results)
    (do
      (setv bindings (get results "bindings"))
      (setv qr
            (lfor binding bindings
                (lfor var vars
                   [var (get (get binding var) "value")])))
      qr)
    []))

(defn wikidata-sparql [query]
  (do-query-helper wikidata-endpoint query))

(defn dbpedia-sparql [query]
  (do-query-helper dbpedia-endpoint query))
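The result handling in do-query-helper follows the W3C SPARQL 1.1 Query Results JSON Format: head.vars lists the variable names, and results.bindings holds one object per solution that maps each bound variable to an object with a value field. This Python sketch isolates that step so it can be tested on a canned response without network access. The name bindings_to_rows and the sample data are illustrative; the sketch also skips variables that are unbound in a solution (for example from an OPTIONAL clause), which the queries in this chapter never produce.

```python
def bindings_to_rows(json_data):
    """Convert a SPARQL JSON results document into a list of rows,
    each row a list of [variable, value] pairs."""
    variables = json_data["head"]["vars"]
    bindings = json_data.get("results", {}).get("bindings", [])
    rows = []
    for binding in bindings:
        # Unbound variables are simply absent from a binding object.
        rows.append([[var, binding[var]["value"]]
                     for var in variables if var in binding])
    return rows


sample = {
    "head": {"vars": ["s", "comment"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Microsoft"},
         "comment": {"type": "literal", "value": "Example comment text"}},
    ]},
}
print(bindings_to_rows(sample))
# [[['s', 'http://dbpedia.org/resource/Microsoft'], ['comment', 'Example comment text']]]
```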

Here is an example query (manually formatted for page width):

$ hy
=> (import sparql)
=> (sparql.dbpedia-sparql
     "select ?s ?p ?o { ?s ?p ?o } limit 1")
[[['s', 'http://www.openlinksw.com/virtrdf-data-formats#default-iid'],
  ['p', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'],
  ['o', 'http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat']]]
=> 

This is a wild-card SPARQL query that will match any of the 9.5 billion RDF triples in DBPedia and return just one result.

Caching SPARQL query results greatly speeds up my own personal use of KGN. Without caching, queries that contain many entity references simply take too long to run.
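The sparql.hy listing above does not show the caching code itself. As a minimal sketch of the idea, assuming query results are JSON-serializable lists like those returned by dbpedia-sparql, the following Python class stores each result in SQLite keyed by the exact query text. The class name QueryCache, the table layout, and the file name sparql_cache.db mentioned below are illustrative assumptions, not the book's caching code.

```python
import json
import sqlite3


class QueryCache:
    """Wrap a query function so repeated queries are answered from SQLite.

    db_path=":memory:" gives a throwaway cache; a file path such as
    "sparql_cache.db" keeps results between program runs.
    """

    def __init__(self, query_fn, db_path=":memory:"):
        self.query_fn = query_fn  # e.g. a function that calls DBPedia
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (query TEXT PRIMARY KEY, result TEXT)")

    def __call__(self, query):
        row = self.conn.execute(
            "SELECT result FROM cache WHERE query = ?", (query,)).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no network access
        result = self.query_fn(query)  # cache miss: run the real query
        self.conn.execute(
            "INSERT INTO cache (query, result) VALUES (?, ?)",
            (query, json.dumps(result)))
        self.conn.commit()
        return result
```

From Python, after import hy, the Hy function dbpedia-sparql is reachable as sparql.dbpedia_sparql, so a cached version could be created with QueryCache(sparql.dbpedia_sparql, "sparql_cache.db"). Keying on the exact query string means that queries differing only in whitespace are cached separately.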

Utilities to Colorize SPARQL and Generated Output

When I first had the basic functionality of KGN working, I was disappointed by how the application looked as plain text. Every editor and IDE I use colorizes text in an appropriate way, so I used standard ANSI terminal escape sequences to implement color highlighting for SPARQL queries.

The code in the following listing is in the file colorize.hy.

(import io [StringIO])

;; Utilities to add ANSI terminal escape sequences to colorize text.
;; note: the following 5 functions return string values that then need to
;;       be printed.

(defn blue [s] (.format "{}{}{}" "\033[94m" s "\033[0m"))
(defn red [s] (.format "{}{}{}" "\033[91m" s "\033[0m"))
(defn green [s] (.format "{}{}{}" "\033[92m" s "\033[0m"))
(defn pink [s] (.format "{}{}{}" "\033[95m" s "\033[0m"))
(defn bold [s] (.format "{}{}{}" "\033[1m" s "\033[0m"))

(defn tokenize-keep-uris [s]
  (.split s))

(defn colorize-sparql [s]
  (setv tokens
        (tokenize-keep-uris
          (.replace (.replace s "{" " { ") "}" " } ")))
  (setv ret (StringIO)) ;; ret is an output stream for a string buffer
  (for [token tokens]
    (when (> (len token) 0)
        (if (= (get token 0) "?")
            (.write ret (red token))
            (if (in
                  token
                  ["where" "select" "distinct" "optional" "filter"
                    "FILTER" "OPTIONAL" "DISTINCT" "SELECT" "WHERE"])
                (.write ret (blue token))
                (if (= (get token 0) "<")
                    (.write ret (bold token))
                    (.write ret token)))))
    (when (not (= token "?"))
        (.write ret " ")))
  (.seek ret 0)
  (.read ret))
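Here is a Python sketch of the same token-by-token approach: pad braces with spaces, split on whitespace, and wrap variables (tokens starting with ?) in red, SPARQL keywords in blue, and URIs (tokens starting with <) in bold. The keyword set, which here matches case-insensitively and includes limit, is an assumption of this sketch rather than a copy of colorize.hy.

```python
# ANSI escape sequences: bright blue, bright red, bold, and reset.
BLUE, RED, BOLD, RESET = "\033[94m", "\033[91m", "\033[1m", "\033[0m"

# Assumed keyword set, compared in lower case.
KEYWORDS = {"select", "distinct", "where", "filter", "optional", "limit"}


def colorize_sparql(query):
    """Return query with variables red, keywords blue, and URIs bold."""
    spaced = query.replace("{", " { ").replace("}", " } ")
    out = []
    for token in spaced.split():
        if token.startswith("?"):
            out.append(RED + token + RESET)
        elif token.lower() in KEYWORDS:
            out.append(BLUE + token + RESET)
        elif token.startswith("<"):
            out.append(BOLD + token + RESET)
        else:
            out.append(token)
    return " ".join(out)


print(colorize_sparql("select ?s { ?s ?p ?o } limit 1"))
```

Joining the tokens with single spaces avoids the trailing space that the Hy version writes after the last token.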

Text Utilities for Queries and Results

The application's low-level utility functions are in the file kgnutils.hy (imported as the module kgnutils in kgn.hy). The function dbpedia-get-entities-by-name requires two arguments:

  • The name of an entity to search for.
  • A URI representing the entity type that we are looking for.

We embed a SPARQL query that has placeholders for the entity name and type. The filter expression specifies that we only want triple results with comment values in the English language by using (lang(?comment) = 'en'):

(import sparql [dbpedia-sparql])
(import colorize [colorize-sparql])

(import pprint [pprint])

(defn dbpedia-get-entities-by-name [name dbpedia-type]
  ;; {{ and }} are literal braces; the two {} placeholders take name and type
  (setv sparql
        (.format
          (+ "select distinct ?s ?comment {{ ?s ?p \"{}\"@en . "
             "?s <http://www.w3.org/2000/01/rdf-schema#comment> ?comment . "
             "FILTER (lang(?comment) = 'en') . "
             "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> {} . }} limit 15")
          name dbpedia-type))
  (print "Generated SPARQL to get DBPedia entity URIs from a name:")
  (print (colorize-sparql sparql))
  (dbpedia-sparql sparql))

;;(pprint (dbpedia-get-entities-by-name "Bill Gates" "<http://dbpedia.org/ontology/Person>"))

(defn first [a-list]
  (get a-list 0))

(defn second [a-list]
  (get a-list 1))
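The Hy function splices the entity name straight into the query text, so a name containing a double quote would produce malformed SPARQL. The Python sketch below rebuilds the same query and, as an addition not present in the Hy listing above, escapes backslashes and double quotes, which SPARQL allows inside "..." string literals. The names escape_sparql_string and entities_by_name_query are hypothetical.

```python
RDFS_COMMENT = "<http://www.w3.org/2000/01/rdf-schema#comment>"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_sparql_string(s):
    """Escape backslashes first, then double quotes, for a "..." literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def entities_by_name_query(name, dbpedia_type):
    """Build the entity lookup query; {{ and }} are literal braces."""
    return (
        f'select distinct ?s ?comment {{ ?s ?p "{escape_sparql_string(name)}"@en . '
        f"?s {RDFS_COMMENT} ?comment . "
        "FILTER (lang(?comment) = 'en') . "
        f"?s {RDF_TYPE} {dbpedia_type} . }} limit 15"
    )


print(entities_by_name_query("Bill Gates", "<http://dbpedia.org/ontology/Person>"))
```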

Finishing the Main Function for KGN

We already looked at the NLP code near the beginning of the file kgn.hy. Let’s look at the remainder of the implementation.

We need a dictionary (or hash table) to convert spaCy entity type names to DBPedia type URIs:

(setv entity-type-to-type-uri
      {"PERSON" "<http://dbpedia.org/ontology/Person>"
       "GPE" "<http://dbpedia.org/ontology/Place>"
       "ORG" "<http://dbpedia.org/ontology/Organisation>"})
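Any code that looks up every spaCy label in this dictionary has to handle labels that are not keys, because en_core_web_sm also produces labels such as DATE, CARDINAL, and NORP. This Python sketch mirrors the mapping and skips unmapped labels; the generator name lookups_to_run is hypothetical.

```python
ENTITY_TYPE_TO_TYPE_URI = {
    "PERSON": "<http://dbpedia.org/ontology/Person>",
    "GPE": "<http://dbpedia.org/ontology/Place>",
    "ORG": "<http://dbpedia.org/ontology/Organisation>",
}


def lookups_to_run(entities_by_label):
    """Yield (name, type_uri) pairs, skipping labels with no DBPedia type."""
    for label, names in entities_by_label.items():
        type_uri = ENTITY_TYPE_TO_TYPE_URI.get(label)
        if type_uri is None:
            continue  # e.g. DATE or CARDINAL: nothing to look up in DBPedia
        for name in names:
            yield name, type_uri


found = {"PERSON": ["Bill Gates"], "DATE": ["1975"], "ORG": ["Microsoft"]}
print(list(lookups_to_run(found)))
```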

When we get entity results from DBPedia, the comments describing entities can be a few paragraphs of text. We want to shorten the comments so they fit in a single line of the entity selection list that we have seen earlier. The following code defines a comment shortening function and also a global variable that we will use to store the entity URIs for each shortened comment:

1 (setv short-comment-to-uri {})
2 
3 (defn shorten-comment [comment uri]
4   (setv sc (+ (cut comment 0 70) "..."))
5   (setv (get short-comment-to-uri sc) uri)
6   sc)

In line 5, we store the shortened comment as a key, with the entity URI as its value, in the existing global dictionary short-comment-to-uri.
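The round trip from comment to menu entry and back to a URI, which the main loop further below performs by splitting each selected entry at " || ", can be sketched in Python as follows. The function name uri_for_menu_entry and the width parameter are illustrative assumptions.

```python
short_comment_to_uri = {}


def shorten_comment(comment, uri, width=70):
    """Truncate a comment for the menu and remember which URI it came from."""
    short = comment[:width] + "..."  # "..." is appended even to short comments
    short_comment_to_uri[short] = uri
    return short


def uri_for_menu_entry(entry):
    """Recover the URI from a menu entry of the form 'name || short comment'."""
    short = entry[entry.index(" || ") + len(" || "):]
    return short_comment_to_uri[short]


entry = "Microsoft || " + shorten_comment(
    "A long description of a software company.",
    "http://dbpedia.org/resource/Microsoft")
print(uri_for_menu_entry(entry))  # http://dbpedia.org/resource/Microsoft
```

One consequence of keying on the shortened text: two entities whose comments share the same first 70 characters collide, and the later URI wins.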

Finally, let’s look at the main application loop. In line 4 we use the function get-query (defined in file textui.hy) to get a list of entity names from the user. In line 7 we use the function entities-in-text that we saw earlier to map text to entity types and names. In the nested loops in lines 13-27 we build one-line descriptions of people, places, and organizations that we use to show the user a menu for selecting entities found in DBPedia from the original query. This gives the user a chance to select only the discovered entities that they are interested in.

In lines 34-36 we convert the shortened comment strings the user selected back to DBPedia entity URIs. Finally, in line 37 we use the function entity-results->relationship-links to find relationships between the user-selected entities.

 1 (defn kgn []
 2   (while
 3     True
 4     (setv query (get-query))
 5     (when (or (= query "quit") (= query "q"))
 6         (break))
 7     (setv elist (entities-in-text query))
 8     (setv people-found-on-dbpedia [])
 9     (setv places-found-on-dbpedia [])
10     (setv organizations-found-on-dbpedia [])
11     (global short-comment-to-uri)
12     (setv short-comment-to-uri {})
13     (for [key elist]
14       (setv type-uri (.get entity-type-to-type-uri key)) ; None for labels like DATE
15       (for [name (if type-uri (get elist key) [])]
16         (setv dbp (dbpedia-get-entities-by-name name type-uri))
17         (for [d dbp]
18           (setv
19             short-comment
20             (shorten-comment (second (second d)) (second (first d))))
21           (when (= key "PERSON")
22               (.extend people-found-on-dbpedia [(+ name  " || " short-comment)]))
23           (when (= key "GPE")
24               (.extend places-found-on-dbpedia [(+ name  " || " short-comment)]))
25           (when (= key "ORG")
26               (.extend organizations-found-on-dbpedia
27                        [(+ name  " || " short-comment)])))))
28     (setv user-selected-entities
29           (select-entities
30             people-found-on-dbpedia
31             places-found-on-dbpedia
32             organizations-found-on-dbpedia))
33     (setv uri-list [])
34     (for [entity (get user-selected-entities "entities")]
35       (setv short-comment (cut entity (+ 4 (.index entity " || ")) (len entity)))
36       (.extend uri-list [(get short-comment-to-uri short-comment)]))
37     (setv relation-data (entity-results->relationship-links uri-list))
38     (print "\nDiscovered relationship links:")
39     (pprint relation-data)))

If you have not already done so, I hope you experiment with running this example application. The first time you specify an entity name, expect some delay while DBPedia is accessed. Thereafter the cache will make the application more responsive when you use the same name again in a different query.

Wrap-up

If you enjoyed running and experimenting with this example and want to modify it for your own projects then I hope that I provided a sufficient road map for you to do so.

I got the idea for the KGN application because I was spending quite a bit of time manually setting up SPARQL queries for DBPedia and other public sources like WikiData, and I wanted to experiment with partially automating this exploration process.