Knowledge Graph Navigator Common Library Implementation

The Knowledge Graph Navigator (which I will often refer to as KGN) is a tool for processing a set of entity names and automatically exploring the public Knowledge Graph DBPedia using SPARQL queries. I started to write KGN for my own use, to automate some things I used to do manually when exploring Knowledge Graphs, and later thought that KGN might also be useful for educational purposes. KGN shows the user the auto-generated SPARQL queries, so hopefully the user will learn by seeing examples. KGN uses NLP code developed in earlier chapters, and we will reuse that code after a short review of its APIs.

In previous versions of this book, this example was hard-wired to use LispWorks CAPI for the user interface. That old version is in src/kgn in the main GitHub repository for this book, https://github.com/mark-watson/loving-common-lisp, and has a few UI components, like a progress bar, that I have removed since the previous edition. The new version has separate GitHub repositories for:

If you followed the code example setup instructions in the book Preface or in the README file in the main repo https://github.com/mark-watson/loving-common-lisp then all three of these projects are available for loading via Quicklisp on your computer.

After looking at the SPARQL that this example generates for a sample query, we will develop the library bottom up: first writing low-level functions to automate SPARQL queries, then writing the utilities we will need for the UIs developed in later chapters.

Since the DBPedia SPARQL queries are time consuming, we will also implement a caching layer using SQLite that will make the app more responsive. The cache is especially helpful during development when the same queries are repeatedly used for testing.

The code for this reusable library is in the directory src/kgn-common. This is a common library that will be used by the user interfaces developed in later chapters. There is a lot of code in the following program listings, so I will try to provide a roadmap of the code, diving in on code that you might want to reuse for your own projects and on some representative code for generating SPARQL queries.

Let’s start by looking at the files for the common library:

  • Makefile - contains development shortcuts.
  • data - data used to remove stop words from text.
  • kgn-common.lisp - main code file for library.
  • package.lisp - standard Common Lisp package definition.
  • utils.lisp - miscellaneous utility functions.
  • README.txt
  • kgn-common.asd - standard Common Lisp ASDF definition.

Example Output

Before we get started studying the implementation, let’s look at sample output in order to help give meaning to the code we will look at later. Consider a query that a user might type into the top query field in the KGN app:

    Steve Jobs lived near San Francisco and was
    a founder of <http://dbpedia.org/resource/Apple_Inc.>

The system will try to recognize entities in a query. If you know the DBPedia URI of an entity, like the company Apple in this example, you can use that directly. Note that in SPARQL, URIs are surrounded by angle brackets.
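To make the angle bracket convention concrete, here is a minimal sketch of pulling bracketed URIs out of query text. The function name extract-bracketed-uris is hypothetical and is not the library's own get-URIs-in-query, whose implementation is not shown in this section. The sketch assumes that any text between a "<" and the next ">" is a URI.

```lisp
;; Hypothetical helper, not part of kgn-common: collect every
;; <...> delimited substring of a query string, in order.
;; Assumption: any "<" followed later by ">" delimits a URI.
(defun extract-bracketed-uris (text)
  "Return a list of the <...> substrings found in TEXT."
  (loop with start = 0
        for open = (position #\< text :start start)
        for close = (and open (position #\> text :start open))
        while close
        collect (subseq text open (1+ close))
        do (setf start (1+ close))))

;; Example call using the query text shown above:
(extract-bracketed-uris
 "Steve Jobs lived near San Francisco and was a founder of <http://dbpedia.org/resource/Apple_Inc.>")
;; => ("<http://dbpedia.org/resource/Apple_Inc.>")
```

Keeping the brackets in the returned strings means the values can be spliced directly into a SPARQL query string.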

The application prints out automatically generated SPARQL queries. For the above listed example query the following output will be generated (some editing to fit page width):

1 Trying to get entity by name = Steve Jobs using SPARQL with type:
1 select distinct ?s ?comment { ?s ?p "Steve Jobs"@en . 
2  ?s <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
3    FILTER ( lang ( ?comment ) = 'en' ) . 
4  ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
5     <http://dbpedia.org/ontology/Person> . 
6  } LIMIT 15 
1 Trying to get entity by name = San Francisco using SPARQL with type:
1 select distinct ?s ?comment { ?s ?p "San Francisco"@en . 
2  ?s <http://www.w3.org/2000/01/rdf-schema#comment> ?comment . 
3    FILTER ( lang ( ?comment ) = 'en' ) . 
4  ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
5     <http://dbpedia.org/ontology/City> . 
6  } LIMIT 15 
1 SPARQL to get PERSON data for <http://dbpedia.org/resource/Steve_Jobs>:
 1 SELECT DISTINCT ?label ?comment 
 2  ( GROUP_CONCAT ( DISTINCT ?birthplace ; SEPARATOR=' | ' ) AS ?birthplace ) 
 3  ( GROUP_CONCAT ( DISTINCT ?almamater ; SEPARATOR=' | ' ) AS ?almamater ) 
 4  ( GROUP_CONCAT ( DISTINCT ?spouse ; SEPARATOR=' | ' ) AS ?spouse ) { 
 5  <http://dbpedia.org/resource/Steve_Jobs> 
 6    <http://www.w3.org/2000/01/rdf-schema#comment>
 7    ?comment . 
 8  FILTER ( lang ( ?comment ) = 'en' ) . 
 9  OPTIONAL { <http://dbpedia.org/resource/Steve_Jobs>
10             <http://dbpedia.org/ontology/birthPlace>
11             ?birthplace } . 
12  OPTIONAL { <http://dbpedia.org/resource/Steve_Jobs> 
13             <http://dbpedia.org/ontology/almaMater> 
14             ?almamater } . 
15  OPTIONAL { <http://dbpedia.org/resource/Steve_Jobs> 
16             <http://dbpedia.org/ontology/spouse> 
17             ?spouse } . 
18  OPTIONAL { <http://dbpedia.org/resource/Steve_Jobs> 
19             <http://www.w3.org/2000/01/rdf-schema#label> 
20             ?label . 
21  FILTER ( lang ( ?label ) = 'en' ) } 
22  } LIMIT 10 

Remember, the SPARQL is generated by KGN from natural language queries. Some more examples:

1 SPARQL to get CITY data for <http://dbpedia.org/resource/San_Francisco>:
 1 SELECT DISTINCT ?label ?comment 
 2  ( GROUP_CONCAT ( DISTINCT ?latitude_longitude ; SEPARATOR=' | ' ) 
 3      AS ?latitude_longitude ) 
 4  ( GROUP_CONCAT ( DISTINCT ?populationDensity ; SEPARATOR=' | ' ) 
 5      AS ?populationDensity ) 
 6  ( GROUP_CONCAT ( DISTINCT ?country ; SEPARATOR=' | ' ) 
 7      AS ?country ) { 
 8  <http://dbpedia.org/resource/San_Francisco> 
 9    <http://www.w3.org/2000/01/rdf-schema#comment>
10    ?comment . 
11       FILTER ( lang ( ?comment ) = 'en' ) . 
12  OPTIONAL { <http://dbpedia.org/resource/San_Francisco> 
13             <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> 
14             ?latitude_longitude } . 
15  OPTIONAL { <http://dbpedia.org/resource/San_Francisco> 
16             <http://dbpedia.org/ontology/PopulatedPlace/populationDensity> 
17             ?populationDensity } . 
18  OPTIONAL { <http://dbpedia.org/resource/San_Francisco> 
19             <http://dbpedia.org/ontology/country> 
20             ?country } . 
21  OPTIONAL { <http://dbpedia.org/resource/San_Francisco> 
22             <http://www.w3.org/2000/01/rdf-schema#label> 
23             ?label . } 
24  } LIMIT 30 
1 SPARQL to get COMPANY data for <http://dbpedia.org/resource/Apple_Inc.>:
 1 SELECT DISTINCT ?label ?comment ( GROUP_CONCAT ( DISTINCT ?industry ; SEPARATOR=' | ' ) 
 2       AS ?industry ) 
 3  ( GROUP_CONCAT ( DISTINCT ?netIncome ; SEPARATOR=' | ' ) 
 4       AS ?netIncome ) 
 5  ( GROUP_CONCAT ( DISTINCT ?numberOfEmployees ; SEPARATOR=' | ' ) 
 6       AS ?numberOfEmployees ) { 
 7  <http://dbpedia.org/resource/Apple_Inc.> 
 8     <http://www.w3.org/2000/01/rdf-schema#comment> ?comment . 
 9  FILTER ( lang ( ?comment ) = 'en' ) . 
10  OPTIONAL { <http://dbpedia.org/resource/Apple_Inc.> 
11             <http://dbpedia.org/ontology/industry> 
12             ?industry } . 
13  OPTIONAL { <http://dbpedia.org/resource/Apple_Inc.> 
14             <http://dbpedia.org/ontology/netIncome> ?netIncome } . 
15  OPTIONAL { <http://dbpedia.org/resource/Apple_Inc.> 
16             <http://dbpedia.org/ontology/numberOfEmployees> ?numberOfEmployees } . 
17  OPTIONAL { <http://dbpedia.org/resource/Apple_Inc.> 
18             <http://www.w3.org/2000/01/rdf-schema#label> ?label . 
19               FILTER ( lang ( ?label ) = 'en' ) } 
20  } LIMIT 30 

Once KGN has identified DBPedia entity URIs, it also searches for relationships between these entities:

1 DISCOVERED RELATIONSHIP LINKS:
 1 <http://dbpedia.org/resource/Steve_Jobs>    ->
 2     <http://dbpedia.org/ontology/birthPlace>    ->
 3     <http://dbpedia.org/resource/San_Francisco>
 4 <http://dbpedia.org/resource/Steve_Jobs>    -> 
 5     <http://dbpedia.org/ontology/occupation>    -> 
 6     <http://dbpedia.org/resource/Apple_Inc.>
 7 <http://dbpedia.org/resource/Steve_Jobs>    -> 
 8     <http://dbpedia.org/ontology/board>         -> 
 9     <http://dbpedia.org/resource/Apple_Inc.>
10 <http://dbpedia.org/resource/Steve_Jobs>    -> 
11     <http://www.w3.org/2000/01/rdf-schema#seeAlso> -> 
12     <http://dbpedia.org/resource/Apple_Inc.>
13 <http://dbpedia.org/resource/Apple_Inc.>    -> 
14     <http://dbpedia.org/property/founders>      -> 
15     <http://dbpedia.org/resource/Steve_Jobs>

After listing the generated SPARQL for finding information for the entities in the query, KGN searches for relationships between these entities. These discovered relationships can be seen in the last listing. Please note that this step makes O(n^2) SPARQL queries, where n is the number of entities, because pairs of entities are checked against each other. Local caching of SPARQL queries to DBPedia helps make processing many entities practical.
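The quadratic growth is easy to see in a small sketch. The function ordered-entity-pairs below is a hypothetical illustration, not code from the library. It assumes that each ordered pair of distinct entities is checked once, which is consistent with the listing above showing links in both directions between Steve_Jobs and Apple_Inc.

```lisp
;; Hypothetical illustration, not part of kgn-common.
;; For n URIs this returns n * (n - 1) ordered pairs, which is
;; why the number of relationship queries grows as O(n^2).
(defun ordered-entity-pairs (uris)
  "Return every ordered pair (A B) of distinct elements of URIS."
  (loop for a in uris
        append (loop for b in uris
                     unless (equal a b)
                     collect (list a b))))

;; Three entities give six ordered pairs:
(length (ordered-entity-pairs
         '("<http://dbpedia.org/resource/Steve_Jobs>"
           "<http://dbpedia.org/resource/San_Francisco>"
           "<http://dbpedia.org/resource/Apple_Inc.>")))
;; => 6
```

With ten entities the same function yields ninety pairs, which is why the cache matters for larger queries.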

In addition to showing generated SPARQL and discovered relationships in the middle text pane of the application, KGN also generates formatted results that are displayed in the bottom text pane:

 1 - - - ENTITY TYPE: PEOPLE - - -
 2 
 3 LABEL: Steve Jobs
 4 
 5 COMMENT: Steven Paul "Steve" Jobs was an American information technology 
 6 entrepreneur and inventor. He was the co-founder, chairman, and chief 
 7 executive officer (CEO) of Apple Inc.; CEO and majority shareholder
 8 of Pixar Animation Studios; a member of The Walt Disney Company's 
 9 board of directors following its acquisition of Pixar; and founder, 
10 chairman, and CEO of NeXT Inc. Jobs is widely recognized as a pioneer of 
11 the microcomputer revolution of the 1970s and 1980s, along with Apple 
12 co-founder Steve Wozniak. Shortly after his death, Jobs's official 
13 biographer, Walter Isaacson, described him as a "creative entrepreneur 
14 whose passion for perfection and ferocious drive revolutionized six industries:
15 personal computers, animated movies, music, phones
16 
17 BIRTHPLACE: http://dbpedia.org/resource/San_Francisco
18 
19 ALMAMATER: http://dbpedia.org/resource/Reed_College
20 
21 SPOUSE: http://dbpedia.org/resource/Laurene_Powell_Jobs
22 
23 - - - ENTITY TYPE: CITIES - - -
24 
25 LABEL:  San Francisco
26 
27 COMMENT: San Francisco, officially the City and County of San Francisco, is the
28 cultural, commercial, and financial center of Northern California and
29 the only consolidated city-county in California. San Francisco encompasses a
30 land area of about 46.9 square miles (121 km2) on the northern end of the 
31 San Francisco Peninsula, which makes it the smallest county in the state. 
32 It has a density of about 18,451 people per square mile (7,124 people per km2),
33 making it the most densely settled large city (population greater than 
34 200,000) in the state of California and the second-most densely populated 
35 major city in the United States after New York City. San Francisco is 
36 the fourth-most populous city in California, after Los Angeles, San Diego, and 
37 San Jose, and the 13th-most populous cit
38 
39 LATITUDE--LONGITUDE: POINT(-122.41666412354 37.783332824707)
40 
41 POPULATION-DENSITY: 7123.97092726667
42 
43 COUNTRY: http://dbpedia.org/resource/United_States
44 
45 - - - ENTITY TYPE: COMPANIES - - -
46 
47 LABEL: Apple Inc.
48 
49 COMMENT: Apple Inc. is an American multinational technology company headquartered
50 in Cupertino, 
51 California, that designs, develops, and sells consumer electronics, 
52 computer software, and online services. Its hardware products include the 
53 iPhone smartphone, the iPad tablet computer, the Mac personal computer, the 
54 iPod portable media player, the Apple Watch smartwatch, and the Apple TV digital 
55 media player. Apple's consumer software includes the macOS and iOS operating 
56 systems, the iTunes media player, the Safari web browser, and the iLife and 
57 iWork creativity and productivity suites. Its online services include the 
58 iTunes Store, the iOS App Store and Mac App Store, Apple Music, and iCloud.
59 
60 INDUSTRY: http://dbpedia.org/resource/Computer_hardware | 
61           http://dbpedia.org/resource/Computer_software | 
62           http://dbpedia.org/resource/Consumer_electronics | 
63           http://dbpedia.org/resource/Corporate_Venture_Capital | 
64           http://dbpedia.org/resource/Digital_distribution |
65           http://dbpedia.org/resource/Fabless_manufacturing
66 
67 NET-INCOME: 5.3394E10
68 
69 NUMBER-OF-EMPLOYEES: 115000

Hopefully after reading through sample output and seeing the screen shot of the application, you now have a better idea what this example application does. Now we will look at project configuration and then implementation.

Project Configuration and Running the Application

The following listing of kgn-common.asd shows the nine packages this library depends on (five of these are also examples in this book, and four are in the public Quicklisp repository):

    ;;;; kgn-common.asd

    (asdf:defsystem #:kgn-common
      :description "common utilities for Knowledge Graph Navigator"
      :author "Mark Watson <markw@markwatson.com>"
      :license "Apache 2"
      :depends-on (#:sqlite #:cl-json #:alexandria #:drakma #:myutils
                   #:entities #:entity-uris #:kbnlp #:sparql-cache)
      :components ((:file "package")
                   (:file "utils")
                   (:file "kgn-common")))

Listing of package.lisp:

    ;;;; package.lisp

    (defpackage #:kgn-common
      (:use #:cl #:alexandria #:myutils #:sparql-cache
            #:entities #:entity-uris #:kbnlp)
      (:export #:kgn-common #:remove-stop-words
               #:entity-results->relationship-links
               #:get-entity-data-helper #:handle-URIs-in-query
               #:remove-uris-from-query #:get-URIs-in-query #:display-entity-results
               #:string-shorten #:prompt-string #:dbpedia-get-product-detail
               #:dbpedia-get-person-detail #:dbpedia-get-country-detail
               #:dbpedia-get-city-detail #:dbpedia-get-company-detail #:clean-results
               #:dbpedia-get-entities-by-name #:clean-comment))

We use ql:quickload to load the KGN common library and call a few APIs (some output removed for brevity):

 1 $ sbcl
 2 This is SBCL 2.1.10, an implementation of ANSI Common Lisp.
 3 * (ql:quickload :kgn-common)
 4 To load "kgn-common":
 5   Load 1 ASDF system:
 6     kgn-common
 7 ; Loading "kgn-common"
 8 ..................
 9 "Starting to load data...." 
10 "....done loading data." 
11 To load "sqlite":
12   Load 1 ASDF system:
13     sqlite
14 ; Loading "sqlite"
15 
16 To load "cl-json":
17   Load 1 ASDF system:
18     cl-json
19 ; Loading "cl-json"
20 
21 To load "drakma":
22   Load 1 ASDF system:
23     drakma
24 ; Loading "drakma"
25 [package kgn-common]..
26 (:kgn-common)
27 *
28 *  (kgn-common::dbpedia-get-relationships "<http://dbpedia.org/resource/Bill_Gates>" "<http://dbpedia.org/resource/Microsoft>")
29 ("<http://dbpedia.org/ontology/knownFor>")
30 *
31 *  (kgn-common:dbpedia-get-entities-by-name "Bill Gates" "<http://dbpedia.org/ontology/Person>" "<http://schema.org/Person>" :message-stream nil)
32 
33 (((:s "http://dbpedia.org/resource/Bill_Gates")
34   (:comment
35    "William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, author, and philanthropist. He is a co-founder of Microsoft, along with his late childhood friend Paul Allen. During his career at Microsoft, Gates held the positions of chairman, chief executive officer (CEO), president and chief software architect, while also being the largest individual shareholder until May 2014. He is considered one of the best known entrepreneurs of the microcomputer revolution of the 1970s and 1980s."))
36  ((:s "http://dbpedia.org/resource/Harry_R._Lewis")
37   (:comment
38    "Harry Roy Lewis (born 1947) is an American computer scientist, mathe 00ADma 00ADti 00ADcian, and uni 00ADver 00ADsity admin 00ADi 00ADstra 00ADtor known for his research in com 00ADpu 00ADta 00ADtional logic, textbooks in theoretical computer science, and writings on computing, higher education, and technology. He is Gordon McKay Professor of Computer Science at Harvard University, and was Dean of Harvard College from 1995 to 2003. A new professorship in Engineering and Applied Sciences, endowed by a former student, will be named for Lewis and his wife upon their retirements."))
39  ((:s "http://dbpedia.org/resource/Cascade_Investment")
40   (:comment
41    "Cascade Investment, L.L.C. is an American holding company and private investment firm headquartered in Kirkland, Washington, United States. It is controlled by Bill Gates, and managed by Michael Larson. More than half of Gates' fortune is held in assets outside his holding of Microsoft shares. Cascade is the successor company to Dominion Income Management, the former investment vehicle for Gates' holdings, which was managed by convicted felon Andrew Evans."))
42  ((:s "http://dbpedia.org/resource/Jerry_Dyer")
43   (:comment
44    "Jerry P. Dyer (born May 3, 1959) is an American politician and former law enforcement officer. He is the 26th and current mayor of Fresno, California. Previously, he served as the chief of the Fresno Police Department."))) 
45 
46 select distinct ?s ?comment { ?s ?p "Bill Gates"@en . @@ ?s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment  . FILTER  (lang(?comment) = 'en') . @@ ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> . @@ } LIMIT 15
47 (((:s "http://dbpedia.org/resource/Bill_Gates")
48   (:comment
49    "William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, author, and philanthropist. He is a co-founder of Microsoft, along with his late childhood friend Paul Allen. During his career at Microsoft, Gates held the positions of chairman, chief executive officer (CEO), president and chief software architect, while also being the largest individual shareholder until May 2014. He is considered one of the best known entrepreneurs of the microcomputer revolution of the 1970s and 1980s."))
50  ((:s "http://dbpedia.org/resource/Harry_R._Lewis")
51   (:comment
52    "Harry Roy Lewis (born 1947) is an American computer scientist, mathe 00ADma 00ADti 00ADcian, and uni 00ADver 00ADsity admin 00ADi 00ADstra 00ADtor known for his research in com 00ADpu 00ADta 00ADtional logic, textbooks in theoretical computer science, and writings on computing, higher education, and technology. He is Gordon McKay Professor of Computer Science at Harvard University, and was Dean of Harvard College from 1995 to 2003. A new professorship in Engineering and Applied Sciences, endowed by a former student, will be named for Lewis and his wife upon their retirements."))
53  ((:s "http://dbpedia.org/resource/Cascade_Investment")
54   (:comment
55    "Cascade Investment, L.L.C. is an American holding company and private investment firm headquartered in Kirkland, Washington, United States. It is controlled by Bill Gates, and managed by Michael Larson. More than half of Gates' fortune is held in assets outside his holding of Microsoft shares. Cascade is the successor company to Dominion Income Management, the former investment vehicle for Gates' holdings, which was managed by convicted felon Andrew Evans."))
56  ((:s "http://dbpedia.org/resource/Jerry_Dyer")
57   (:comment
58    "Jerry P. Dyer (born May 3, 1959) is an American politician and former law enforcement officer. He is the 26th and current mayor of Fresno, California. Previously, he served as the chief of the Fresno Police Department.")))
59 * 

In this last example, using :message-stream nil effectively turns off printing generated SPARQL queries used by these APIs. You can use :message-stream t to see generated SPARQL.

Every time the KGN common library makes a web service call to DBPedia the query and response are cached in a SQLite database in ~/.kgn_cache.db which can greatly speed up the program, especially in development mode when testing a set of queries. This caching also takes some load off of the public DBPedia endpoint, which is a polite thing to do.

Review of NLP Utilities Used in Application

Here is a quick review of NLP utilities we saw in an earlier chapter:

  • kbnlp:make-text-object
  • kbnlp::text-human-names
  • kbnlp::text-place-names
  • entity-uris:find-entities-in-text
  • entity-uris:pp-entities

The following code snippets show example calls to the relevant NLP functions and the generated output:

 1 KGN 39 > (setf text "Bill Clinton went to Canada")
 2 "Bill Clinton went to Canada"
 3 
 4 KGN 40 > (setf txtobj (kbnlp:make-text-object text))
 5 #S(TEXT :URL "" :TITLE "" :SUMMARY "<no summary>" :CATEGORY-TAGS (("computers_microsoft.txt" 0.00641) ("religion_islam.txt" 0.00357)) :KEY-WORDS NIL :KEY-PHRASES NIL :HUMAN-NAMES ("Bill Clinton") :PLACE-NAMES ("Canada") :COMPANY-NAMES NIL :TEXT #("Bill" "Clinton" "went" "to" "Canada") :TAGS #("NNP" "NNP" "VBD" "TO" "NNP"))
 6 
 7 KGN 41 > (kbnlp::text-human-names txtobj)
 8 ("Bill Clinton")
 9 
10 KGN 42 > 
11 (loop for key being the hash-keys of  (entity-uris:find-entities-in-text text)
12   using (hash-value value)
13   do (format t "key: ~S value: ~S~%" key value))
14 key: "people" value: (("Bill Clinton" "<http://dbpedia.org/resource/Bill_Clinton>"))
15 key: "countries" value: (("Canada" "<http://dbpedia.org/resource/Canada>"))
16 NIL

The loop expression at the end of the last REPL listing, which prints the keys and values of a hash table, is from the Common Lisp Cookbook web site, in the section “Traversing a Hash Table.”
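The same traversal can also be written with the standard function maphash. The following sketch uses hand-built sample data rather than the output of entity-uris:find-entities-in-text, and the names print-entity-table and *sample-entities* are hypothetical.

```lisp
;; Hypothetical sketch: traverse a hash table with MAPHASH.
;; The table contents are hand-written sample data shaped like
;; the REPL output above, not real library output.
(defparameter *sample-entities* (make-hash-table :test #'equal))

(setf (gethash "people" *sample-entities*)
      '(("Bill Clinton" "<http://dbpedia.org/resource/Bill_Clinton>")))
(setf (gethash "countries" *sample-entities*)
      '(("Canada" "<http://dbpedia.org/resource/Canada>")))

(defun print-entity-table (table &optional (stream t))
  "Print each key and value of TABLE to STREAM, one entry per line."
  (maphash (lambda (key value)
             (format stream "key: ~S value: ~S~%" key value))
           table))

(print-entity-table *sample-entities*)
```

The order in which entries are visited is not specified for hash tables, so the two lines may print in either order.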

Developing Low-Level SPARQL Utilities

I use the standard command line curl utility program with the Common Lisp package uiop to make HTTP GET requests to the DBPedia public Knowledge Graph and the package drakma to url-encode parts of a query. The source code is in a separate Quicklisp library located in src/sparql-cache/sparql.lisp. A non-caching library is also available in src/sparql/sparql.lisp.

In the following listing of src/sparql-cache/sparql.lisp, on lines 8, 24, 39, and 55 I use some caching code that we will look at later. The nested replace-all calls that start on lines 12-13 are a kluge to remove Unicode escape sequences that occasionally caused runtime errors in the KGN application.

 1 (in-package #:kgn)
 2 
 3 (ql:quickload "cl-json")
 4 (ql:quickload "drakma")
 5 
 6 (defun sparql-dbpedia (query)
 7   (let* (ret
 8          (cr (fetch-result-dbpedia query))
 9          (response
10           (or
11            cr
12            (replace-all
13             (replace-all
14              (uiop:run-program 
15               (list
16                "curl" 
17                (concatenate 'string
18                             "https://dbpedia.org/sparql?query="
19                             (drakma:url-encode query :utf-8)
20                             "&format=json"))
21               :output :string)
22              "\\u2013" " ")
23             "\\u" " "))))
24     (save-query-result-dbpedia query response)
25     (ignore-errors
26       (with-input-from-string
27           (s response)
28         (let ((json-as-list (json:decode-json s)))
29           (setf
30            ret
31            (mapcar #'(lambda (x)
32                        ;;(pprint x)
33                        (mapcar #'(lambda (y)
34                                    (list (car y) (cdr (assoc :value (cdr y))))) x))
35                    (cdr (cadddr (cadr json-as-list))))))))
36     ret))
37 
38 (defun sparql-ask-dbpedia (query)
39   (let* ((cr (fetch-result-dbpedia query))
40          (response
41           (or
42            cr
43            (replace-all
44             (replace-all
45              (uiop:run-program 
46               (list
47                "curl" 
48                (concatenate 'string
49                             "https://dbpedia.org/sparql?query="
50                             (drakma:url-encode query :utf-8)
51                             "&format=json"))
52               :output :string)
53              "\\u2013" " ")
54             "\\u" " "))))
55     (save-query-result-dbpedia query response)
56     (if  (search "true" response)
57         t
58       nil)))

The code for replacing Unicode characters is messy but prevents problems later when we are using the query results in the example application.
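The replace-all function comes from the myutils library and its source is not shown here. As a rough stand-in, assuming it replaces every occurrence of one substring with another, a version in the style of the Common Lisp Cookbook looks like the following. The name replace-all-sketch is hypothetical.

```lisp
;; Hypothetical stand-in for myutils' replace-all (assumed behavior:
;; replace every occurrence of PART in STRING with REPLACEMENT).
;; PART must be a non-empty string.
(defun replace-all-sketch (string part replacement)
  "Return a copy of STRING with each occurrence of PART replaced."
  (with-output-to-string (out)
    (loop with part-length = (length part)
          for old-pos = 0 then (+ pos part-length)
          for pos = (search part string :start2 old-pos)
          do (write-string string out
                           :start old-pos
                           :end (or pos (length string)))
          when pos do (write-string replacement out)
          while pos)))

;; The first replacement in the listing removes an en dash escape:
(replace-all-sketch "dash\\u2013here" "\\u2013" " ")
;; => "dash here"

;; The second replacement only strips the leading \u, so the
;; four hex digits of any other escape are left behind:
(replace-all-sketch "soft\\u00ADhyphen" "\\u" " ")
;; => "soft 00ADhyphen"
```

The second call appears to explain the fragments such as " 00AD" in the Harry R. Lewis comment text in the earlier REPL listing: the escape prefix is removed but its hex digits remain.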

The code (json-as-list (json:decode-json s)) on line 28 converts a deeply nested JSON response to nested Common Lisp lists. You may want to print out the list to better understand the mapcar expression on lines 31-35. There is no magic to writing expressions like this: in a REPL I set json-as-list to the results of one query, and I spent a minute or two experimenting with the nested mapcar expression to get it to work with my test case.
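To experiment with the nested mapcar expression without calling DBPedia, you can apply it to a hand-written list. The value of *sample-json-as-list* below is my assumption about the shape that decoding produces for a response with head and results keys, where results holds distinct, ordered, and bindings in that order; it is not captured output. The name extract-bindings is hypothetical.

```lisp
;; Hand-written stand-in for a decoded SPARQL JSON response.
;; Assumption: the second top-level entry is :results and its
;; fourth element is the :bindings entry, as the accessor
;; (cdr (cadddr (cadr ...))) in the listing requires.
(defparameter *sample-json-as-list*
  '((:head (:link) (:vars "s" "comment"))
    (:results (:distinct) (:ordered . t)
     (:bindings
      ((:s (:type . "uri")
           (:value . "http://dbpedia.org/resource/Bill_Gates"))
       (:comment (:type . "literal")
                 (:value . "An American business magnate.")))))))

(defun extract-bindings (json-as-list)
  "Reduce each binding to a list of (variable value) pairs."
  (mapcar (lambda (binding)
            (mapcar (lambda (var)
                      ;; (car var) is the variable keyword; the
                      ;; :value entry of its alist is the result.
                      (list (car var) (cdr (assoc :value (cdr var)))))
                    binding))
          (cdr (cadddr (cadr json-as-list)))))

(extract-bindings *sample-json-as-list*)
;; => (((:S "http://dbpedia.org/resource/Bill_Gates")
;;      (:COMMENT "An American business magnate.")))
```

Because the accessor depends on the position of the bindings entry, a response with a different key order would need a lookup by key instead.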

The implementation for sparql-ask-dbpedia in lines 38-58 is simpler because we don’t have to fully parse the returned SPARQL query results. A SPARQL ASK query returns a true/false answer to a query. We will use this to determine the types of entities in query text. While our NLP library identifies entity types, making additional ASK queries to DBPedia to verify entity types will provide better automated results.

Implementing the Caching Layer

Both while developing KGN and while using it as an end user, I found that many SPARQL queries to DBPedia contain repeated entity names, so it makes sense to write a caching layer. We use a SQLite database “~/.kgn_cache.db” to store queries and responses.

The caching layer is implemented in the file src/sparql-cache/sparql.lisp and some of the relevant code is listed here:

    ;;; SQLite caching for SPARQL queries:

    (defvar *db-path* (pathname "~/.kgn_cache.db"))

    (defun create-dbpedia ()
      ;; Create the cache table; ignore-errors swallows the error
      ;; raised when the table already exists.
      (sqlite:with-open-database (d *db-path*)
        (ignore-errors
          (sqlite:execute-single d
            "CREATE TABLE dbpedia (query string  PRIMARY KEY ASC, result string)"))))

    (defun save-query-result-dbpedia (query result)
      ;; Errors (for example, a duplicate primary key when QUERY
      ;; is already cached) are ignored.
      (sqlite:with-open-database (d *db-path*)
        (ignore-errors
          (sqlite:execute-to-list d
                           "insert into dbpedia (query, result) values (?, ?)"
                           query result))))

    (defun fetch-result-dbpedia (query)
      ;; Return the cached result string for QUERY, or NIL on a miss.
      (sqlite:with-open-database (d *db-path*)
        (cadar
         (sqlite:execute-to-list d
                          "select * from dbpedia where query = ?" query))))

This caching layer greatly speeds up my own personal use of KGN. Without caching, queries that contain many entity references simply take too long to run. The UIs for the KGN applications in later chapters have a menu option for clearing the local cache, but I almost never use this option because growing a large cache that is tailored for the types of information I search for makes the entire system much more responsive.
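The cache-first pattern itself does not depend on SQLite. The following sketch shows the same idea with an in-memory hash table so that it runs without any libraries. All of the names here (*memory-cache*, *fetch-count*, slow-fetch, cached-fetch) are hypothetical, and slow-fetch is only a stand-in for the curl call to DBPedia.

```lisp
;; Hypothetical in-memory version of the cache-first pattern.
(defvar *memory-cache* (make-hash-table :test #'equal))
(defvar *fetch-count* 0
  "How many times the slow path has run, for demonstration only.")

(defun slow-fetch (query)
  "Stand-in for the network request; returns a fake response."
  (incf *fetch-count*)
  (format nil "result for ~A" query))

(defun cached-fetch (query)
  "Return the cached response for QUERY, fetching it on a miss."
  (or (gethash query *memory-cache*)
      (setf (gethash query *memory-cache*) (slow-fetch query))))

;; The second call with the same query is served from the cache:
(cached-fetch "select ?s { ?s ?p ?o } limit 1")
(cached-fetch "select ?s { ?s ?p ?o } limit 1")
*fetch-count*
;; => 1
```

One difference from the earlier sparql-dbpedia listing is that this sketch saves only on a miss. That listing calls save-query-result-dbpedia on every request and relies on the primary key constraint plus ignore-errors to discard the duplicate insert.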

Utilities in the Main Library File kgn-common.lisp

The utilities in the file src/kgn-common/kgn-common.lisp can be seen in this complete code listing:

  1 (in-package #:kgn-common)
  2 
  3 (defun pprint-results (results &key (stream t))
  4   (pprint results stream))
  5 
  6 (defun colorize-sparql-local (s  &key (stream nil))
  7   "placeholder - some applications, like the kgn-capi-ui example need to
  8    colorize the sparql output"
  9   (princ s stream))
 10 
 11 (defun check-uri (uri)
 12   "sloppy code fix: URIs have different forms - normalize these"
  13   (when (consp uri) (setf uri (second uri)))
 14   (entity-uris:ensure-uri-brackets uri))
 15 
 16 (defun clean-comment (comment-string)
 17   "When getting comment strings from DBPedia, there are parts
 18    that I remove for display"
 19   (let ((i1 (search "(" comment-string))
 20         (i2 (search ")" comment-string)))
 21     (if (and i1 i2 (> i2 i1) (> i1 0))
 22         (concatenate 'string (subseq comment-string 0 (- i1 1))
 23                              (subseq comment-string (+ i2 1)))
 24         (let ((j1 (search " / " comment-string)))
 25           (if j1
 26               (let ((j2 (search "/" comment-string :start2 (+ j1 2))))
 27                 (if (and j1 j2 (> j2 j1) (< (+ j2 1) (length comment-string)))
 28                     (concatenate 'string (subseq comment-string 0 j1)
 29                                          (subseq comment-string (+ j2 1)))
 30                     comment-string))
  31               ;; no " / " marker found: return the comment unchanged
  32               comment-string)))))
 33 
 34 (defun clean-results (results)
 35   "This function is replaced when we later build GUI apps"
 36   results)
 37 
 38 (defun get-name-and-description-for-uri (uri)
 39   (let* ((sparql
 40            (replace-all
 41             (format nil
 42   "select distinct ?name ?comment { @@ ~
 43     values ?nameProperty {<http://www.w3.org/2000/01/rdf-schema#label>
 44                           <http://xmlns.com/foaf/0.1/name> } . @@ ~
 45     ~A ?nameProperty ?name . @@ ~
 46     ~A <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment  .
 47          FILTER  (lang(?comment) = 'en') . @@ ~
 48    } LIMIT 1" uri uri)
 49              "@@" " "))
 50           (results (sparql-cache:dbpedia sparql)))
 51     (list (second (assoc :name (car results)))
 52           (second (assoc :comment (car results))))))
 53 
 54 ;; (kgn-common::get-name-and-description-for-uri
 55 ;;   "<http://dbpedia.org/resource/Apple_Inc.>")
 56 
 57 (defun ask-is-type-of (entity-uri type-value-uri) ;; both URIs expected to use surrounding < > brackets for SPARQL
 58   (let* ((sparql
 59            (format nil
 60               "ASK { ~A <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ~A }"
 61               (check-uri entity-uri) (check-uri type-value-uri)))
 62          (results (sparql-cache:ask-dbpedia sparql)))
 63     (print sparql)
 64     results))
 65 
 66 ;; (kgn-common::ask-is-type-of
 67 ;;   "<http://dbpedia.org/resource/Apple_Inc.>"
 68 ;;   "<http://dbpedia.org/ontology/Company>")
 69 
 70 
 71 (defun dbpedia-get-entities-by-name (name dbpedia-type schema-org-type
 72           &key (message-stream t)
 73                (colorize-sparql-function #'colorize-sparql-local))
 74   ;; http://www.w3.org/1999/02/22-rdf-syntax-ns#type <http://schema.org/Person>
 75   (let* ((sparql
 76            (format nil
 77 "select distinct ?s ?comment { 
 78    ?s ?p \"~A\"@en . @@ ~
 79    ?s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment  . FILTER  (lang(?comment) = 'en') . @@ ~
 80    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ~A . @@ ~
 81 } LIMIT 15" name dbpedia-type))
 82          (results (sparql-cache:dbpedia (replace-all sparql "@@" " "))))
 83     (print results)
 84     (terpri message-stream)
 85     (format message-stream
 86       "Trying to get entity by name = ~A using SPARQL with type:"
 87       name dbpedia-type)
 88     (terpri message-stream)
 89     (apply colorize-sparql-function (list sparql :stream message-stream))
 90     (if (null results)
 91         (let* ((sparql2
 92                  (format nil
 93 "select distinct ?s ?comment {
 94     ?s ?p \"~A\"@en . @@ ~
 95     ?s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment  . FILTER  (lang(?comment) = 'en') . @@ ~
 96     ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ~A . @@ ~
 97 } LIMIT 15" name schema-org-type)))
 98           (format
 99            t
100            "No results for ~A for last SPARQL query using type ~A so trying type ~A" name dbpedia-type schema-org-type)
101           (terpri message-stream)
102           (setf results (sparql-cache:dbpedia (replace-all sparql2 "@@" " ")))
103           (if (null results)
104               (format
105                t
106                "No results for ~A for last SPARQL query using type ~A"
107                name schema-org-type)
108               (let* ((filtered (remove-if
109                                 #'(lambda (x)
110                                     (or
111                                      (search "," (cadar x))
112                                      (and
113                                       (not (equal (first x) :comment))
114                                       (not (search "/resource/" (cadar x))))))
115                                 results))
116                      (uris (remove-duplicates
117                             (map
118                               'list
119                               #'(lambda (x)
120                                 (list (concatenate 'string "<" (cadar x) ">") (cadadr x)))
121                                  filtered) :test #'equal)))
122                 (format t "~%~%********** dbpedia-get-entities-by-name: uris:~%")
123                 (pprint uris) (terpri)
124                 uris))))
125     results))
126 
127 ;; (kgn-common:dbpedia-get-entities-by-name
128 ;;    "Bill Gates" "<http://dbpedia.org/ontology/Person>"
129 ;;    "<http://schema.org/Person>" :message-stream nil)
130 ;; in above, pass t for message-stream to see generated SPARQL queries
131 
132 (defun dbpedia-get-person-detail (person-uri-raw
133     &key (message-stream t)
134          (colorize-sparql-function #'colorize-sparql-local))
135   ;; http://dbpedia.org/ontology/birthPlace 
136   (let* ((person-uri (check-uri person-uri-raw))
137          (query
138            (format nil
139       "SELECT DISTINCT ?label ?comment@@ ~
140         (GROUP_CONCAT (DISTINCT ?birthplace; SEPARATOR=' | ') AS ?birthplace) @@ ~
141         (GROUP_CONCAT (DISTINCT ?almamater; SEPARATOR=' | ') AS ?almamater) @@ ~
142         (GROUP_CONCAT (DISTINCT ?spouse; SEPARATOR=' | ') AS ?spouse) { @@ ~
143          ~A <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment .@@
144               FILTER  (lang(?comment) = 'en') . @@ ~
145         OPTIONAL { ~A <http://dbpedia.org/ontology/birthPlace> ?birthplace } . @@ ~
146         OPTIONAL { ~A <http://dbpedia.org/ontology/almaMater> ?almamater } . @@ ~
147         OPTIONAL { ~A <http://dbpedia.org/ontology/spouse> ?spouse } . @@ ~
148         OPTIONAL { ~A  <http://www.w3.org/2000/01/rdf-schema#label> ?label .@@ ~
149                 FILTER  (lang(?label) = 'en') } @@ ~
150       } LIMIT 10@@" person-uri person-uri person-uri person-uri person-uri))
151          (results (sparql-cache:dbpedia  (replace-all query "@@" " "))))
152     (format message-stream "~%SPARQL to get PERSON data for ~A:~%~%" person-uri)
153     (apply colorize-sparql-function (list query :stream message-stream))
154     (format message-stream "~%")
155     ;;results))
156     (clean-results results)))
157 
158 ;; (kgn-common:dbpedia-get-person-detail "<http://dbpedia.org/resource/Bill_Gates>")
159 
160 (defun dbpedia-get-company-detail (company-uri-raw
161      &key (message-stream t)
162           (colorize-sparql-function #'colorize-sparql-local))
163   (let* ((company-uri (check-uri company-uri-raw))
164          (query
165            (format nil
166 "SELECT DISTINCT ?label ?comment (GROUP_CONCAT (DISTINCT ?industry; SEPARATOR=' | ') AS ?industry)@@ ~
167   (GROUP_CONCAT (DISTINCT ?netIncome; SEPARATOR=' | ') AS ?netIncome)@@ ~
168   (GROUP_CONCAT (DISTINCT ?numberOfEmployees; SEPARATOR=' | ') AS ?numberOfEmployees) {@@ ~
169   ~A <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment .@@
170           FILTER  (lang(?comment) = 'en') .@@ ~
171   OPTIONAL { ~A <http://dbpedia.org/ontology/industry> ?industry } .@@  ~
172   OPTIONAL { ~A <http://dbpedia.org/ontology/netIncome> ?netIncome } .@@  ~
173   OPTIONAL { ~A <http://dbpedia.org/ontology/numberOfEmployees> ?numberOfEmployees } .@@  ~
174   OPTIONAL { ~A <http://www.w3.org/2000/01/rdf-schema#label> ?label . FILTER (lang(?label) = 'en') } @@ ~
175 } LIMIT 30@@"
176            company-uri company-uri company-uri company-uri company-uri))
177          (results (sparql-cache:dbpedia  (replace-all query "@@" " "))))
178     (format message-stream "~%SPARQL to get COMPANY data for ~A:~%~%" company-uri)
179     (apply colorize-sparql-function (list query :stream message-stream))
180     (format message-stream "~%")
181     (clean-results results)))
182 
183 ;; (kgn-common:dbpedia-get-company-detail "<http://dbpedia.org/resource/Microsoft>")
184 
185 (defun dbpedia-get-country-detail (country-uri-raw
186     &key (message-stream t)
187          (colorize-sparql-function #'colorize-sparql-local))
188   (let* ((country-uri (check-uri country-uri-raw))
189          (query
190            (format nil
191 "SELECT DISTINCT ?label ?comment (GROUP_CONCAT (DISTINCT ?areaTotal; SEPARATOR=' | ') AS ?areaTotal)@@ ~
192   (GROUP_CONCAT (DISTINCT ?populationDensity; SEPARATOR=' | ') AS ?populationDensity) {@@ ~
193   ~A <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment .@@
194       FILTER  (lang(?comment) = 'en') .@@ ~
195   OPTIONAL { ~A <http://dbpedia.org/ontology/areaTotal> ?areaTotal } .@@  ~
196   OPTIONAL { ~A <http://dbpedia.org/ontology/populationDensity> ?populationDensity } .@@  ~
197   OPTIONAL { ~A <http://www.w3.org/2000/01/rdf-schema#label> ?label . }@@ ~
198 } LIMIT 30@@"
199                    country-uri country-uri country-uri country-uri))
200          (results (sparql-cache:dbpedia  (replace-all query "@@" " "))))
201     (format message-stream "~%SPARQL to get COUNTRY data for ~A:~%~%" country-uri)
202     (apply colorize-sparql-function (list query :stream message-stream))
203     (format message-stream "~%")
204     (clean-results results)))
205 
206 ;; (kgn-common:dbpedia-get-country-detail "<http://dbpedia.org/resource/Canada>")
207 
208 (defun dbpedia-get-city-detail (city-uri-raw
209     &key (message-stream t)
210          (colorize-sparql-function #'colorize-sparql-local))
211   (let* ((city-uri (check-uri city-uri-raw))
212          (query
213            (format
214             nil
215 "SELECT DISTINCT ?label ?comment @@ ~
216   (GROUP_CONCAT (DISTINCT ?latitude_longitude; SEPARATOR=' | ')  AS ?latitude_longitude) @@ ~
217   (GROUP_CONCAT (DISTINCT ?populationDensity; SEPARATOR=' | ') AS ?populationDensity) @@ ~
218   (GROUP_CONCAT (DISTINCT ?country; SEPARATOR=' | ') AS ?country) { @@ ~
219   ~A <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment . FILTER  (lang(?comment) = 'en') . @@ ~
220   OPTIONAL { ~A <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?latitude_longitude } . @@ ~
221   OPTIONAL { ~A <http://dbpedia.org/ontology/PopulatedPlace/populationDensity> ?populationDensity } . @@ ~
222   OPTIONAL { ~A <http://dbpedia.org/ontology/country> ?country } .@@ ~
223   OPTIONAL { ~A <http://www.w3.org/2000/01/rdf-schema#label> ?label . } @@ ~
224 } LIMIT 30@@"
225             city-uri city-uri city-uri city-uri city-uri))
226          (results (sparql-cache:dbpedia (replace-all query "@@" " "))))
227     (format message-stream "~%SPARQL to get CITY data for ~A:~%~%" city-uri)
228     (apply colorize-sparql-function (list query :stream message-stream))
229     (format message-stream "~%")
230     (clean-results results)))
231 
232 ;; (kgn-common:dbpedia-get-city-detail "<http://dbpedia.org/resource/London>")
233 
234 (defun dbpedia-get-product-detail (product-uri-raw
235      &key (message-stream t)
236           (colorize-sparql-function #'colorize-sparql-local))
237   (let* ((product-uri (check-uri product-uri-raw))
238          (query
239            (format
240             nil
241 "SELECT DISTINCT ?label ?comment {  @@ ~
242   ~A <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment . FILTER  (lang(?comment) = 'en') . @@ ~
243   OPTIONAL { ~A <http://www.w3.org/2000/01/rdf-schema#label> ?label . } ~
244 } LIMIT 30@@"
245             product-uri product-uri))
246          (results (sparql-cache:dbpedia (replace-all query "@@" " "))))
247     (format message-stream "~%SPARQL to get PRODUCT data for ~A:~%~%" product-uri)
248     (apply colorize-sparql-function (list query :stream message-stream))
249     (format message-stream "~%")
250     (clean-results results)))
251 
252 ;; (kgn-common:dbpedia-get-product-detail "<http://dbpedia.org/resource/Pepsi>")
253 
254 
255 (defun dbpedia-get-relationships (s-uri o-uri) ;;  &key (message-stream t))
256   (let* ((query
257            (format
258             nil
259 "SELECT DISTINCT ?p { ~A ?p ~A . FILTER (!regex(str(?p), 'wikiPage', 'i'))} LIMIT 5"
260             (check-uri s-uri) (check-uri  o-uri)))
261          (results (sparql-cache:dbpedia query)))
262     (alexandria:flatten (map 'list
263                              #'(lambda (x)
264                                  (format nil "~{<~A>~}" (cdar x)))
265                              results))))
266 
267 ;; (kgn-common::dbpedia-get-relationships
268 ;;   "<http://dbpedia.org/resource/Bill_Gates>"
269 ;;   "<http://dbpedia.org/resource/Microsoft>")
270 
271 (defun entities (text)
272   (let ((txt-obj (kbnlp:make-text-object text)))
273     (list (kbnlp::text-human-names txt-obj) (kbnlp::text-place-names txt-obj) (kbnlp::text-company-names txt-obj))))
274 
275 ;; (kgn-common::entities "Bill Clinton went to Canada")
276 
277 (defun entities-dbpedia (text)
278   (let ((e-hash (entity-uris:find-entities-in-text text)))
279     (list
280       (gethash "people" e-hash)
281       (gethash "companies" e-hash)
282       (gethash "countries" e-hash)
283       (gethash "cities" e-hash))))
284 
285 ;; (kgn-common::entities-dbpedia "Bill Clinton went to Canada")
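
All of the detail functions in this listing build their SPARQL text the same way: a format control string in which a tilde at the end of a line joins it to the next line, "@@" markers that are replaced with spaces just before the query is sent, and one ~A directive for each place the entity URI appears. The following is a minimal sketch of that pattern in isolation. The names replace-all-sketch and label-query-sketch are hypothetical and are not part of the kgn-common library; replace-all-sketch is only a stand-in for the library's replace-all helper so that the example can run on its own.

```lisp
;; Hypothetical stand-in for the library's replace-all helper:
;; returns a copy of STRING with every occurrence of PART
;; replaced by REPLACEMENT.
(defun replace-all-sketch (string part replacement)
  (with-output-to-string (out)
    (loop with part-length = (length part)
          for old-pos = 0 then (+ pos part-length)
          for pos = (search part string :start2 old-pos)
          do (write-string string out
                           :start old-pos
                           :end (or pos (length string)))
          when pos do (write-string replacement out)
          while pos)))

;; Hypothetical example query builder (not in kgn-common).
;; A tilde at the end of a line tells FORMAT to skip the newline
;; and the leading whitespace on the next line, so the template
;; can be indented freely in the source code.
(defun label-query-sketch (uri)
  (replace-all-sketch
   (format nil
"SELECT DISTINCT ?label {@@ ~
   ~A <http://www.w3.org/2000/01/rdf-schema#label> ?label .@@ ~
   FILTER (lang(?label) = 'en') } LIMIT 5"
           uri)
   "@@" " "))

;; (label-query-sketch "<http://dbpedia.org/resource/Canada>")
```

The string returned by label-query-sketch is a single line of SPARQL with no "@@" markers left in it. In the listing above, the cleaned-up string goes to sparql-cache:dbpedia, while the original template is what gets passed to the colorize function.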

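The function dbpedia-get-entities-by-name picks result rows apart with cadar and cadadr, which can be hard to read at first glance. The sketch below applies the same filtering and URI wrapping to hand-written sample data. It assumes that each result row is a list of (key value) pairs such as ((:s "uri") (:comment "text")). That shape is inferred from how the listing uses the rows and is not a documented format, and the function name rows-to-uri-comment-pairs is hypothetical.

```lisp
;; Hypothetical helper (not in kgn-common). Assumes each row looks
;; like ((:s "http://...") (:comment "some text")).
;; (cadar row)  is the value of the first pair (the subject URI).
;; (cadadr row) is the value of the second pair (the comment).
(defun rows-to-uri-comment-pairs (rows)
  (remove-duplicates
   (mapcar (lambda (row)
             (list (concatenate 'string "<" (cadar row) ">")
                   (cadadr row)))
           ;; Drop rows whose URI contains a comma or is not a
           ;; DBPedia /resource/ URI.
           (remove-if (lambda (row)
                        (or (search "," (cadar row))
                            (not (search "/resource/" (cadar row)))))
                      rows))
   :test #'equal))

;; Sample data made up for illustration only:
(defparameter *sample-rows*
  '(((:s "http://dbpedia.org/resource/Bill_Gates")
     (:comment "American businessman"))
    ((:s "http://dbpedia.org/resource/Bill_Gates")
     (:comment "American businessman"))
    ((:s "http://example.org/other/Bill_Gates")
     (:comment "not a resource URI"))
    ((:s "http://dbpedia.org/resource/Gates,_Bill")
     (:comment "URI contains a comma"))))

;; (rows-to-uri-comment-pairs *sample-rows*)
```

With this sample data only the Bill_Gates row survives, once, with its URI wrapped in angle brackets so that it can be dropped straight into another SPARQL query.
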
Wrap-up

This is a long example application for a book. It is split between this chapter, which implements the common library, and the next two chapters, which each offer a different user interface implementation.

I got the idea for the KGN application because I was spending quite a bit of time manually setting up SPARQL queries for DBPedia (and other public sources like WikiData), and I wanted to experiment with partially automating this process. In the next two chapters we will write user interfaces for this KGN common library.