Knowledge Graph Navigator

The Knowledge Graph Navigator (which I will often refer to as KGN) is a tool for processing a set of entity names and automatically exploring the public Knowledge Graph DBPedia using SPARQL queries. I started to write KGN for my own use, to automate some things I used to do manually when exploring Knowledge Graphs, and later thought that KGN might also be useful for educational purposes. KGN shows the user the auto-generated SPARQL queries so hopefully the user will learn by seeing examples. KGN reuses code developed in the earlier chapter Resolve Entity Names to DBPedia References, as well as the two Java classes JenaApis and QueryResult (which wrap the Apache Jena library) from the chapter Semantic Web.

I have a web site devoted to different versions of KGN that you might find interesting. The most full featured version of KGN, including a full user interface, is featured in my book Loving Common Lisp, or the Savvy Programmer’s Secret Weapon that you can read free online. That version performs more speculative SPARQL queries to find information compared to the example here that I designed for ease of understanding, modification, and embedding in larger Java projects.

I chose to use DBPedia instead of WikiData for this example because DBPedia URIs are human readable. The following URIs represent the concept of a person. The semantic meanings of DBPedia and FOAF (friend of a friend) URIs are self-evident to a human reader while the WikiData URI is not:

http://www.wikidata.org/entity/Q215627
http://dbpedia.org/ontology/Person
http://xmlns.com/foaf/0.1/name

I frequently use WikiData in my work and WikiData is one of the most useful public knowledge bases. I have both DBPedia and WikiData SPARQL endpoints in the file Sparql.java that we will look at later, with the WikiData endpoint commented out. You can try manually querying WikiData at the WikiData SPARQL endpoint. For example, you might explore the WikiData URI for the person concept using:

select ?p ?o where { <http://www.wikidata.org/entity/Q215627> ?p ?o  } limit 10

For the rest of this chapter we will just use DBPedia.

After looking at an interactive session using the example program for this chapter (which also lists the automatically generated SPARQL queries) we will look at the implementation.

Entity Types Handled by KGN

To keep this example simple we handle just four entity types:

  • People
  • Companies
  • Cities
  • Countries

The entity detection library that we use from an earlier chapter also supports the following entity types that we don’t use here:

  • Broadcast Networks
  • Music Groups
  • Political Parties
  • Trade Unions
  • Universities

In addition to finding detailed information for people, companies, cities, and countries we will also search for relationships between person entities and company entities. This search process consists of generating a series of SPARQL queries and calling the DBPedia SPARQL endpoint.

As we look at the KGN implementation I will point out where and how you can easily add support for more entity types and in the wrap-up I will suggest further projects that you might want to try implementing with this example.

General Design of KGN with Example Output

The example application works by first having the user enter names of people and companies. Using libraries written in two previous chapters, we find entities in the user’s input text, and generate SPARQL queries to DBPedia to find information about the entities and relationships between them.

We will start by looking at sample output so you have some understanding of what this implementation of KGN will and will not do. Here is the console output for the example query “Bill Gates, Melinda Gates and Steve Jobs at Apple Computer, IBM and Microsoft” (with some output removed for brevity). As you remember from the chapter Semantic Web, SPARQL query results are expressed in class QueryResult that contains the variables (labelled as vars) in a query and a list of rows (one query result per row). Starting at line 60 in the following listing we see discovered relationships between entities in the input query.

 1 Enter entities query:
 2 Bill Gates, Melinda Gates and Steve Jobs at Apple Computer, IBM and Microsoft
 3 
 4 Processing query:
 5 Bill Gates, Melinda Gates and Steve Jobs at Apple Computer, IBM and Microsoft
 6 
 7 person  0   1   Bill Gates  <http://dbpedia.org/resource/Bill_Gates>
 8 person  4   5   Melinda Gates   <http://dbpedia.org/resource/Melinda_Gates>
 9 person  7   8   Steve Jobs  <http://dbpedia.org/resource/Steve_Jobs>
10 company 10  11  Apple Computer  <http://dbpedia.org/resource/Apple_Inc.>
11 company 14  15  IBM <http://dbpedia.org/resource/IBM>
12 company 16  17  Microsoft   <http://dbpedia.org/resource/Microsoft>
13 
14 Individual People:
15 
16   Bill Gates                : http://dbpedia.org/resource/Bill_Gates
17 [QueryResult vars:[birthplace, label, comment, almamater, spouse]
18 Rows:
19   [http://dbpedia.org/resource/Seattle, Bill Gates, William Henry \"Bill\" Gates III (born October 28, 1955) is an American business magnate, investor, author and philanthropist. In 1975, Gates and Paul Allen co-founded Microsoft, which became the world's largest PC software company. During his career at Microsoft, Gates held the positions of chairman, CEO and chief software architect, and was the largest individual shareholder until May 2014. Gates has authored and co-authored several books., http://dbpedia.org/resource/Harvard_University, http://dbpedia.org/resource/Melinda_Gates]
20 
21   Melinda Gates             : http://dbpedia.org/resource/Melinda_Gates
22 [QueryResult vars:[birthplace, label, comment, almamater, spouse]
23 Rows:
24   [http://dbpedia.org/resource/Dallas | http://dbpedia.org/resource/Dallas,_Texas, Melinda Gates, Melinda Ann Gates (née French; born August 15, 1964), DBE is an American businesswoman and philanthropist. She is co-founder of the Bill & Melinda Gates Foundation. She worked at Microsoft, where she was project manager for Microsoft Bob, Microsoft Encarta and Expedia., http://dbpedia.org/resource/Duke_University, http://dbpedia.org/resource/Bill_Gates]
25 
26   Steve Jobs                : http://dbpedia.org/resource/Steve_Jobs
27 [QueryResult vars:[birthplace, label, comment, almamater, spouse]
28 Rows:
29   [http://dbpedia.org/resource/San_Francisco, Steve Jobs, Steven Paul \"Steve\" Jobs (/ˈdʒɒbz/; February 24, 1955 – October 5, 2011) was an American information technology entrepreneur and inventor. He was the co-founder, chairman, and chief executive officer (CEO) of Apple Inc.; CEO and majority shareholder of Pixar Animation Studios; a member of The Walt Disney Company's board of directors following its acquisition of Pixar; and founder, chairman, and CEO of NeXT Inc. Jobs is widely recognized as a pioneer of the microcomputer revolution of the 1970s and 1980s, along with Apple co-founder Steve Wozniak. Shortly after his death, Jobs's official biographer, Walter Isaacson, described him as a \"creative entrepreneur whose passion for perfection and ferocious drive revolutionized six industries: personal computers, animated movies, music, phones, tab, http://dbpedia.org/resource/Reed_College, http://dbpedia.org/resource/Laurene_Powell_Jobs]
30 
31 
32 Individual Companies:
33 
34   Apple Computer            : http://dbpedia.org/resource/Apple_Inc.
35 [QueryResult vars:[industry, netIncome, label, comment, numberOfEmployees]
36 Rows:
37   [http://dbpedia.org/resource/Computer_hardware | http://dbpedia.org/resource/Computer_software | http://dbpedia.org/resource/Consumer_electronics | http://dbpedia.org/resource/Corporate_Venture_Capital | http://dbpedia.org/resource/Digital_distribution | http://dbpedia.org/resource/Fabless_manufacturing, 5.3394E10, Apple Inc., Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. Its hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, the Apple Watch smartwatch, and the Apple TV digital media player. Apple's consumer software includes the macOS and iOS operating systems, the iTunes media player, the Safari web browser, and the iLife and iWork creativity and productivity suites. Its online services include the iTunes Store, the iOS App Store and Mac App Store, Apple Music, and iCloud., 115000]
38 
39   IBM                       : http://dbpedia.org/resource/IBM
40 [QueryResult vars:[industry, netIncome, label, comment, numberOfEmployees]
41 Rows:
42   [http://dbpedia.org/resource/Cloud_computing | http://dbpedia.org/resource/Cognitive_computing | http://dbpedia.org/resource/Information_technology, 1.319E10, IBM, International Business Machines Corporation (commonly referred to as IBM) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries. The company originated in 1911 as the Computing-Tabulating-Recording Company (CTR) and was renamed \"International Business Machines\" in 1924., 377757]
43 
44   Microsoft                 : http://dbpedia.org/resource/Microsoft
45 [QueryResult vars:[industry, netIncome, label, comment, numberOfEmployees]
46 Rows:
47   [http://dbpedia.org/resource/Computer_hardware | http://dbpedia.org/resource/Consumer_electronics | http://dbpedia.org/resource/Digital_distribution | http://dbpedia.org/resource/Software, , Microsoft, Microsoft Corporation /ˈmaɪkrəˌsɒft, -roʊ-, -ˌsɔːft/ (commonly referred to as Microsoft or MS) is an American multinational technology company headquartered in Redmond, Washington, that develops, manufactures, licenses, supports and sells computer software, consumer electronics and personal computers and services. Its best known software products are the Microsoft Windows line of operating systems, Microsoft Office office suite, and Internet Explorer and Edge web browsers. Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface tablet lineup. As of 2011, it was the world's largest software maker by revenue, and one of the world's most valuable companies., 114000]
48 
49 
50 Individual Cities:
51 
52   Seattle                   : http://dbpedia.org/resource/Seattle
53 [QueryResult vars:[latitude_longitude, populationDensity, label, comment, country]
54 Rows:
55   [POINT(-122.33305358887 47.609722137451), 3150.979715864901, Seattle, Seattle is a West Coast seaport city and the seat of King County, Washington. With an estimated 684,451 residents as of 2015, Seattle is the largest city in both the state of Washington and the Pacific Northwest region of North America. As of 2015, it is estimated to be the 18th largest city in the United States. In July 2013, it was the fastest-growing major city in the United States and remained in the Top 5 in May 2015 with an annual growth rate of 2.1%. The Seattle metropolitan area is the 15th largest metropolitan area in the United States with over 3.7 million inhabitants. The city is situated on an isthmus between Puget Sound (an inlet of the Pacific Ocean) and Lake Washington, about 100 miles (160 km) south of the Canada–United States border. A major gateway for trade w, ]
56 
57 Individual Countries:
58 
59 
60 Relationships between person Bill Gates person Melinda Gates:
61 [QueryResult vars:[p]
62 Rows:
63   [http://dbpedia.org/ontology/spouse]
64 
65 Relationships between person Melinda Gates person Bill Gates:
66 [QueryResult vars:[p]
67 Rows:
68   [http://dbpedia.org/ontology/spouse]
69 
70 Relationships between person Bill Gates company Microsoft:
71 [QueryResult vars:[p]
72 Rows:
73   [http://dbpedia.org/ontology/board]
74 
75 Relationships between person Steve Jobs company Apple Computer:
76 [QueryResult vars:[p]
77 Rows:
78   [http://www.w3.org/2000/01/rdf-schema#seeAlso]
79   [http://dbpedia.org/ontology/board]
80   [http://dbpedia.org/ontology/occupation]

Since the DBPedia queries are time consuming, we use the caching layer from the earlier chapter Semantic Web when making SPARQL queries to DBPedia. The cache is especially helpful during development when the same queries are repeatedly used for testing.
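The benefit of caching can be sketched with an in-memory map keyed by the query text. This is only an illustration and not the caching layer from the chapter Semantic Web: the class name QueryCacheSketch, the remoteCalls counter, and the Function that stands in for the slow remote call are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: memoize query results by query text.
class QueryCacheSketch {
  private final Map<String, String> cache = new HashMap<>();
  private final Function<String, String> remote; // stands in for a slow endpoint call
  int remoteCalls = 0; // counts how often the remote function actually ran

  QueryCacheSketch(Function<String, String> remote) {
    this.remote = remote;
  }

  String query(String sparql) {
    // computeIfAbsent only invokes the remote function on a cache miss
    return cache.computeIfAbsent(sparql, q -> {
      remoteCalls++;
      return remote.apply(q);
    });
  }

  public static void main(String[] args) {
    QueryCacheSketch c = new QueryCacheSketch(q -> "result for: " + q);
    c.query("select ?p ?o where { ?s ?p ?o } limit 1");
    c.query("select ?p ?o where { ?s ?p ?o } limit 1"); // served from the cache
    System.out.println("remote calls: " + c.remoteCalls);
  }
}
```

Repeating the same query during development costs one remote call in this sketch, which is the behavior that makes a cache worthwhile when testing.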

The KGN user interface loop allows you to enter queries and see the results. There are two special options that you can enter instead of a query:

  • sparql - this will print out all previous SPARQL queries used to present results. After entering this command the buffer of previous SPARQL queries is emptied. This option is useful for learning SPARQL and you might try pasting a few into the input field for the public DBPedia SPARQL web app and modifying them. We will use this command later in an example.
  • demo - this will randomly choose a sample query.
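The two special commands could be dispatched roughly as follows. This is a sketch under assumptions: the class CommandSketch, the method handle, and the placeholder text appended to the buffer are hypothetical, and the real input loop lives in KGN.java.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of how the two special commands might be dispatched.
class CommandSketch {
  // Buffer of generated SPARQL, in the spirit of a shared query log
  static final StringBuilder sparqlLog = new StringBuilder();

  // Sample queries taken from the examples in this chapter
  static final List<String> DEMO_QUERIES = List.of(
      "Bill Gates, Melinda Gates and Steve Jobs at Apple Computer, IBM and Microsoft",
      "Seattle");

  // Returns the text to show the user for one line of input.
  static String handle(String input, Random random) {
    String trimmed = input.trim();
    if (trimmed.equals("sparql")) {
      String saved = sparqlLog.toString();
      sparqlLog.setLength(0); // empty the buffer after showing it
      return saved;
    }
    if (trimmed.equals("demo")) {
      String sample = DEMO_QUERIES.get(random.nextInt(DEMO_QUERIES.size()));
      return "Processing query:\n" + sample;
    }
    // Placeholder: a real implementation would detect entities and run
    // SPARQL queries here, appending each generated query to the buffer.
    sparqlLog.append("# query generated for: ").append(trimmed).append("\n");
    return "Processing query:\n" + trimmed;
  }

  public static void main(String[] args) {
    Random random = new Random();
    System.out.println(handle("Seattle", random));
    System.out.println(handle("sparql", random)); // prints and clears the buffer
    System.out.println(handle("demo", random));
  }
}
```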

UML Class Diagram for Example Application

The following UML Class Diagram for KGN shows you an overview of the Java classes we use and their public methods and fields.

UML Class Diagram for KGN Example Application

Implementation

We will walk through the classes in the UML Class Diagram for KGN in alphabetical order, the exception being that we will look at the main program in KGN.java last.

The class EntityAndDescription contains two strings, a name and a URI reference. We also override the default implementation of toString to format and display the data in an instance of this class:

 1 package com.knowledgegraphnavigator;
 2 
 3 public class EntityAndDescription {
 4   public String entityName;
 5   public String entityUri;
 6   public EntityAndDescription(String entityName, String entityUri) {
 7     this.entityName = entityName;
 8     this.entityUri = entityUri;
 9   }
10   public String toString() {
11     return "[EntityAndDescription name: " + entityName +
12         " description: " + entityUri + "]";
13   }
14 }

The class EntityDetail defines SPARQL query templates in lines 80-154 that have slots (using %s for string replacement) for the URI of an entity. We use different templates for different entity types. Before we look at these SPARQL query templates, let’s learn two additional features of the SPARQL language that we will need to use in these entity templates.

We mentioned the OPTIONAL triple matching patterns in the chapter Semantic Web. Before looking at the Java code, let’s first look at how optional matching works. We will run the KGN application asking for information on the city Seattle and then use the sparql command to print the SPARQL generated by the method cityResults (most output is not shown here for brevity). On line 2 I enter the query string “Seattle” and on line 12 I enter the command “sparql” to print out the generated SPARQL:

 1 Enter entities query:
 2 Seattle
 3 
 4 Individual Cities:
 5 
 6   Seattle                   : http://dbpedia.org/resource/Seattle
 7 [QueryResult vars:[latitude_longitude, populationDensity, label, comment, country]
 8 Rows:
 9   [POINT(-122.33305358887 47.609722137451), 3150.979715864901, Seattle, Seattle is a West Coast seaport city and the seat of King County, Washington. With an estimated 684,451 residents as of 2015, Seattle is the largest city in both the state of Washington and the Pacific Northwest region of North America. As of 2015, it is estimated to be the 18th largest city in the United States. In July 2013, it was the fastest-growing major city in the United States and remained in the Top 5 in May 2015 with an annual growth rate of 2.1%. The Seattle metropolitan area is the 15th largest metropolitan area in the United States with over 3.7 million inhabitants. The city is situated on an isthmus between Puget Sound (an inlet of the Pacific Ocean) and Lake Washington, about 100 miles (160 km) south of the Canada–United States border. A major gateway for trade w, ]
10 
11 Processing query:
12 sparql
13 
14 Generated SPARQL used to get current results:
15 
16 SELECT DISTINCT
17     (GROUP_CONCAT (DISTINCT ?latitude_longitude2; SEPARATOR=' | ') 
18         AS ?latitude_longitude) 
19     (GROUP_CONCAT (DISTINCT ?populationDensity2; SEPARATOR=' | ')
20         AS ?populationDensity) 
21     (GROUP_CONCAT (DISTINCT ?label2; SEPARATOR=' | ') AS ?label) 
22     (GROUP_CONCAT (DISTINCT ?comment2; SEPARATOR=' | ') AS ?comment) 
23     (GROUP_CONCAT (DISTINCT ?country2; SEPARATOR=' | ') AS ?country) { 
24  <http://dbpedia.org/resource/Seattle>
25    <http://www.w3.org/2000/01/rdf-schema#comment>
26    ?comment2 .
27         FILTER  (lang(?comment2) = 'en') . 
28  OPTIONAL { <http://dbpedia.org/resource/Seattle>
29             <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>
30             ?latitude_longitude2 } . 
31  OPTIONAL { <http://dbpedia.org/resource/Seattle>
32             <http://dbpedia.org/ontology/PopulatedPlace/populationDensity>
33             ?populationDensity2 } . 
34  OPTIONAL { <http://dbpedia.org/resource/Seattle>
35             <http://dbpedia.org/ontology/country>
36             ?country2 } . 
37  OPTIONAL { <http://dbpedia.org/resource/Seattle>
38             <http://www.w3.org/2000/01/rdf-schema#label>
39             ?label2 . } 
40  } LIMIT 30

This listing was manually edited to fit the page width. In lines 34-36, we are trying to find a triple stating which country Seattle is in. The generated query has this triple matching pattern on a single line.

The triple matching pattern in lines 24-26 (with its language FILTER on line 27) must match some triple in DBPedia or no results will be returned. In other words this matching pattern is mandatory. The four optional matching patterns in lines 28-39 specify triple patterns that may be matched. In this example there is no triple matching the following statement in the DBPedia knowledge base so the variable country2 is not bound and the query returns no results for the variable country:

<http://dbpedia.org/resource/Seattle> <http://dbpedia.org/ontology/country> ?country2
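The difference between a mandatory pattern and an OPTIONAL pattern can be mimicked with a small in-memory list of triples. This is a toy model, not how a SPARQL engine works internally; the class OptionalMatchSketch, its Triple type, and the short predicate names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model of mandatory versus OPTIONAL triple patterns.
class OptionalMatchSketch {
  // A minimal subject/predicate/object triple.
  static final class Triple {
    final String s, p, o;
    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
  }

  // All objects for the given subject and predicate.
  static List<String> objects(List<Triple> triples, String s, String p) {
    List<String> found = new ArrayList<>();
    for (Triple t : triples) {
      if (t.s.equals(s) && t.p.equals(p)) found.add(t.o);
    }
    return found;
  }

  // One mandatory predicate plus any number of optional predicates.
  static Optional<Map<String, List<String>>> describe(
      List<Triple> triples, String subject, String mandatory, List<String> optionals) {
    List<String> required = objects(triples, subject, mandatory);
    if (required.isEmpty()) {
      return Optional.empty(); // mandatory pattern failed: no result at all
    }
    Map<String, List<String>> row = new LinkedHashMap<>();
    row.put(mandatory, required);
    for (String p : optionals) {
      row.put(p, objects(triples, subject, p)); // empty list means "unbound"
    }
    return Optional.of(row);
  }

  public static void main(String[] args) {
    List<Triple> kb = List.of(
        new Triple("Seattle", "comment", "A West Coast seaport city"),
        new Triple("Seattle", "label", "Seattle"));
    // comment is mandatory; label and country are optional
    System.out.println(describe(kb, "Seattle", "comment", List.of("label", "country")));
    // making country mandatory yields no result for this data
    System.out.println(describe(kb, "Seattle", "country", List.of("label")));
  }
}
```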

Notice also the syntax for GROUP_CONCAT used in lines 17-23, for example:

  (GROUP_CONCAT (DISTINCT ?country2; SEPARATOR=' | ') AS ?country)

This collects all values assigned to the binding variable ?country2 into a string value using the separator string “ | ”. Using DISTINCT with GROUP_CONCAT conveniently discards duplicate bindings for binding variables like ?country2.
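For readers more at home in Java than SPARQL, the effect of GROUP_CONCAT with DISTINCT resembles the following stream pipeline. This is an analogy only; the class and method names are hypothetical, and unlike this sketch a SPARQL endpoint does not promise any particular order for the concatenated values.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Java analogy for GROUP_CONCAT(DISTINCT ?x; SEPARATOR=' | ').
class GroupConcatSketch {
  static String groupConcatDistinct(List<String> bindings) {
    return bindings.stream()
        .distinct()                          // drop duplicate bindings
        .collect(Collectors.joining(" | ")); // join with the separator string
  }

  public static void main(String[] args) {
    List<String> birthplaces = List.of(
        "http://dbpedia.org/resource/Dallas",
        "http://dbpedia.org/resource/Dallas,_Texas",
        "http://dbpedia.org/resource/Dallas");
    System.out.println(groupConcatDistinct(birthplaces));
  }
}
```

An empty list of bindings produces an empty string here, which corresponds to the empty country value in the Seattle results.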

Now that we have looked at SPARQL examples using OPTIONAL and GROUP_CONCAT, the templates at the end of the following listing should be easier to understand.

The methods genericResults and genericAsString in the following listing are not currently used in this example but I leave them as an easy way to get information, given any entity URI. You are likely to use these if you use the code for KGN in your projects.

For each entity type, for example city, I wrote one method like cityResults that returns an instance of QueryResult calculated by using the JenaApis library from the chapter Semantic Web. For each entity type there is another method, like cityAsString that converts an instance of QueryResult to a formatted string for display.

We use the code pattern seen in lines 29-30 for each entity type. We use the static method String.format to replace occurrences of %s in the entity template string with the string representation of entity URIs.
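The String.format pattern just described can be tried in isolation. The short template below is a cut-down stand-in for the real templates, and the class TemplateFillSketch is hypothetical. The sketch also shows Java's indexed format specifier %1$s, which lets a single argument fill every slot; this is an alternative you might consider, not what the KGN code does.

```java
// Hypothetical sketch of filling %s slots in a SPARQL template.
class TemplateFillSketch {
  // Cut-down template with two slots, in the style of the entity templates
  static final String TEMPLATE =
      "SELECT ?comment2 ?label2 {\n" +
      "  %s <http://www.w3.org/2000/01/rdf-schema#comment> ?comment2 .\n" +
      "  OPTIONAL { %s <http://www.w3.org/2000/01/rdf-schema#label> ?label2 }\n" +
      "} LIMIT 30";

  // Same template with an explicit argument index on every slot
  static final String INDEXED_TEMPLATE = TEMPLATE.replace("%s", "%1$s");

  // One argument per %s slot, as in the pattern described above
  static String fill(String entityUri) {
    return String.format(TEMPLATE, entityUri, entityUri);
  }

  // A single argument is reused for every %1$s slot
  static String fillIndexed(String entityUri) {
    return String.format(INDEXED_TEMPLATE, entityUri);
  }

  public static void main(String[] args) {
    // Entity URIs are passed with their angle brackets included
    System.out.println(fill("<http://dbpedia.org/resource/Seattle>"));
  }
}
```

With plain %s slots the number of arguments must be at least the number of slots, otherwise String.format throws MissingFormatArgumentException; the indexed form avoids having to keep the two counts in step.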

  1 package com.knowledgegraphnavigator;
  2 
  3 import com.markwatson.semanticweb.QueryResult;
  4 
  5 import java.sql.SQLException; // Cache layer in JenaApis library throws this
  6 
  7 public class EntityDetail {
  8 
  9   static public QueryResult genericResults(Sparql endpoint, String entityUri)
 10       throws SQLException, ClassNotFoundException {
 11     String query =
 12         String.format(
 13             "select distinct ?p ?o where { %s ?p ?o . " +
 14             "  FILTER (!regex(str(?p), 'wiki', 'i')) . " +
 15             "  FILTER (!regex(str(?p), 'wiki', 'i')) } limit 10",
 16             entityUri);
 17     return endpoint.query(query);
 18   }
 19 
 20   static public String genericAsString(Sparql endpoint, String entityUri)
 21       throws SQLException, ClassNotFoundException {
 22     QueryResult qr = genericResults(endpoint, entityUri);
 23     return qr.toString();
 24   }
 25 
 26   static public QueryResult cityResults(Sparql endpoint, String entityUri)
 27       throws SQLException, ClassNotFoundException {
 28     String query =
 29         String.format(cityTemplate, entityUri, entityUri, entityUri,
 30                       entityUri, entityUri);
 31     return endpoint.query(query);
 32   }
 33 
 34   static public String cityAsString(Sparql endpoint, String entityUri)
 35       throws SQLException, ClassNotFoundException {
 36     QueryResult qr = cityResults(endpoint, entityUri);
 37     return qr.toString();
 38   }
 39 
 40   static public QueryResult countryResults(Sparql endpoint, String entityUri)
 41       throws SQLException, ClassNotFoundException {
 42     String query =
 43         String.format(countryTemplate, entityUri, entityUri, entityUri,
 44                       entityUri, entityUri);
 45     return endpoint.query(query);
 46   }
 47 
 48   static public String countryAsString(Sparql endpoint, String entityUri)
 49       throws SQLException, ClassNotFoundException {
 50     QueryResult qr = countryResults(endpoint, entityUri);
 51     return qr.toString();
 52   }
 53   static public QueryResult personResults(Sparql endpoint, String entityUri)
 54       throws SQLException, ClassNotFoundException {
 55     String query =
 56         String.format(personTemplate, entityUri, entityUri, entityUri,
 57                       entityUri, entityUri);
 58     return endpoint.query(query);
 59   }
 60 
 61   static public String personAsString(Sparql endpoint, String entityUri)
 62       throws SQLException, ClassNotFoundException {
 63     QueryResult qr = personResults(endpoint, entityUri);
 64     return qr.toString();
 65   }
 66   static public QueryResult companyResults(Sparql endpoint, String entityUri)
 67       throws SQLException, ClassNotFoundException {
 68     String query =
 69         String.format(companyTemplate, entityUri, entityUri, entityUri,
 70                       entityUri, entityUri);
 71     return endpoint.query(query);
 72   }
 73 
 74   static public String companyAsString(Sparql endpoint, String entityUri)
 75       throws SQLException, ClassNotFoundException {
 76     QueryResult qr = companyResults(endpoint, entityUri);
 77     return qr.toString();
 78   }
 79 
 80   static private String companyTemplate =
 81     "SELECT DISTINCT" +
 82     "    (GROUP_CONCAT (DISTINCT ?industry2; SEPARATOR=' | ') AS ?industry)\n" +
 83     "    (GROUP_CONCAT (DISTINCT ?netIncome2; SEPARATOR=' | ') AS ?netIncome)\n" +
 84     "    (GROUP_CONCAT (DISTINCT ?label2; SEPARATOR=' | ') AS ?label)\n" +
 85     "    (GROUP_CONCAT (DISTINCT ?comment2; SEPARATOR=' | ') AS ?comment)\n" +
 86     "    (GROUP_CONCAT (DISTINCT ?numberOfEmployees2; SEPARATOR=' | ')\n" +
 87     "         AS ?numberOfEmployees) {\n" +
 88     "  %s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment2 .\n" +
 89     "            FILTER  (lang(?comment2) = 'en') .\n" +
 90     "  OPTIONAL { %s <http://dbpedia.org/ontology/industry> ?industry2 } .\n" +
 91     "  OPTIONAL { %s <http://dbpedia.org/ontology/netIncome> ?netIncome2 } .\n" +
 92     "  OPTIONAL {\n" +
 93     "    %s <http://dbpedia.org/ontology/numberOfEmployees> ?numberOfEmployees2\n" +
 94     "  } .\n" +
 95     "  OPTIONAL { %s <http://www.w3.org/2000/01/rdf-schema#label> ?label2 .\n" +
 96     "            FILTER (lang(?label2) = 'en') } \n" +
 97     "} LIMIT 30";
 98   
 99   static private String personTemplate =
100     "SELECT DISTINCT\n" +
101     "    (GROUP_CONCAT (DISTINCT ?birthplace2; SEPARATOR=' | ') AS ?birthplace)  \n" +
102     "    (GROUP_CONCAT (DISTINCT ?label2; SEPARATOR=' | ') AS ?label)  \n" +
103     "    (GROUP_CONCAT (DISTINCT ?comment2; SEPARATOR=' | ') AS ?comment)  \n" +
104     "    (GROUP_CONCAT (DISTINCT ?almamater2; SEPARATOR=' | ') AS ?almamater)  \n" +
105     "    (GROUP_CONCAT (DISTINCT ?spouse2; SEPARATOR=' | ') AS ?spouse) {  \n" +
106     " %s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment2 .\n" +
107     " FILTER  (lang(?comment2) = 'en') .  \n" +
108     " OPTIONAL { %s <http://dbpedia.org/ontology/birthPlace> ?birthplace2 } .  \n" +
109     " OPTIONAL { %s <http://dbpedia.org/ontology/almaMater> ?almamater2 } .  \n" +
110     " OPTIONAL { %s <http://dbpedia.org/ontology/spouse> ?spouse2 } .  \n" +
111     " OPTIONAL { %s  <http://www.w3.org/2000/01/rdf-schema#label> ?label2 . \n" +
112     "    FILTER  (lang(?label2) = 'en') }  \n" +
113     " } LIMIT 10";
114 
115   static private String countryTemplate =
116    "SELECT DISTINCT" +
117    "   (GROUP_CONCAT (DISTINCT ?areaTotal2; SEPARATOR=' | ') AS ?areaTotal)\n" +
118    "   (GROUP_CONCAT (DISTINCT ?label2; SEPARATOR=' | ') AS ?label)\n" +
119    "   (GROUP_CONCAT (DISTINCT ?comment2; SEPARATOR=' | ') AS ?comment)\n" +
120    "   (GROUP_CONCAT (DISTINCT ?populationDensity2; SEPARATOR=' | ')\n" +
121    "     AS ?populationDensity) {\n" +
122    "  %s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment2 .\n" +
123    "                           FILTER  (lang(?comment2) = 'en') .\n" +
124    "  OPTIONAL { %s <http://dbpedia.org/ontology/areaTotal> ?areaTotal2 } .\n" +
125    "  OPTIONAL {\n" +
126    "       %s <http://dbpedia.org/ontology/populationDensity> ?populationDensity2\n" +
127    "  } .\n" +
128    "  OPTIONAL { %s <http://www.w3.org/2000/01/rdf-schema#label> ?label2 . }\n" +
129    "} LIMIT 30";
130 
131   static private String cityTemplate =
132     "SELECT DISTINCT\n" +
133     "    (GROUP_CONCAT (DISTINCT ?latitude_longitude2; SEPARATOR=' | ') \n" +
134     "              AS ?latitude_longitude) \n" +
135     "    (GROUP_CONCAT (DISTINCT ?populationDensity2; SEPARATOR=' | ')\n" +
136     "              AS ?populationDensity) \n" +
137     "    (GROUP_CONCAT (DISTINCT ?label2; SEPARATOR=' | ') AS ?label) \n" +
138     "    (GROUP_CONCAT (DISTINCT ?comment2; SEPARATOR=' | ') AS ?comment) \n" +
139     "    (GROUP_CONCAT (DISTINCT ?country2; SEPARATOR=' | ') AS ?country) { \n" +
140     " %s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment2 .\n" +
141     "       FILTER  (lang(?comment2) = 'en') . \n" +
142     " OPTIONAL {\n" +
143     "   %s <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>\n" +
144     "      ?latitude_longitude2\n" +
145     " } . \n" +
146     " OPTIONAL {\n" +
147     "    %s <http://dbpedia.org/ontology/PopulatedPlace/populationDensity>\n" +
148     "      ?populationDensity2\n" +
149     " } . \n" +
150     " OPTIONAL { %s <http://dbpedia.org/ontology/country> ?country2 } . \n" +
151     " OPTIONAL { %s <http://www.w3.org/2000/01/rdf-schema#label> ?label2 . " +
152     "             FILTER  (lang(?label2) = 'en') } \n" +
153     "} LIMIT 30\n";
154 }

The class EntityRelationships in the next listing is used to find property relationships between two entity URIs. The FILTER on line 15 rejects statements whose property contains the string “wikiPage” (matched case-insensitively) to avoid wiki page link references. This class would need to be rewritten to handle, for example, the WikiData Knowledge Base instead of the DBPedia Knowledge Base. This class uses the JenaApis library developed in the chapter Semantic Web. The class Sparql that we will look at later wraps the use of the JenaApis library.

 1 package com.knowledgegraphnavigator;
 2 
 3 import com.markwatson.semanticweb.QueryResult;
 4 
 5 import java.sql.SQLException;
 6 
 7 public class EntityRelationships {
 8 
 9   static public QueryResult results(Sparql endpoint,
10                                     String entity1Uri, String entity2Uri)
11       throws SQLException, ClassNotFoundException {
12     String query =
13         String.format(
14          "select ?p where { %s ?p %s . "  +
15          "   FILTER (!regex(str(?p), 'wikiPage', 'i')) } limit 10",
16             entity1Uri, entity2Uri);
17     return endpoint.query(query);
18   }
19 }
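The sample output earlier showed relationships reported in both directions (Bill Gates to Melinda Gates and the reverse), which suggests queries for ordered pairs of entities. The following sketch shows one way to enumerate such pairs and build the query text with the template from the listing above. How KGN itself chooses the pairs is decided in KGN.java, so the class RelationshipPairsSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one relationship query per ordered pair of entity URIs.
class RelationshipPairsSketch {
  // Builds the query text for one ordered pair, using the same template
  // string as EntityRelationships.results.
  static String relationshipQuery(String entity1Uri, String entity2Uri) {
    return String.format(
        "select ?p where { %s ?p %s . " +
        "   FILTER (!regex(str(?p), 'wikiPage', 'i')) } limit 10",
        entity1Uri, entity2Uri);
  }

  // Every ordered pair (a, b) with a != b. Direction matters because the
  // first URI is the subject of the triple pattern and the second the object.
  static List<String> queriesForAllPairs(List<String> uris) {
    List<String> queries = new ArrayList<>();
    for (String a : uris) {
      for (String b : uris) {
        if (!a.equals(b)) queries.add(relationshipQuery(a, b));
      }
    }
    return queries;
  }

  public static void main(String[] args) {
    List<String> uris = List.of(
        "<http://dbpedia.org/resource/Bill_Gates>",
        "<http://dbpedia.org/resource/Melinda_Gates>",
        "<http://dbpedia.org/resource/Microsoft>");
    queriesForAllPairs(uris).forEach(System.out::println);
  }
}
```

The number of queries grows as n(n-1) for n entities, which is one more reason the query cache is useful.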

The class Log in the next listing defines a shorthand out for calling System.out.println, an instance of StringBuilder for storing all generated SPARQL queries made to DBPedia, and a utility method for clearing the stored SPARQL queries. This buffer of SPARQL queries supports the interactive command “sparql” in the KGN application. We used this command in the earlier Seattle example to display the generated SPARQL that demonstrated OPTIONAL and GROUP_CONCAT.

1 package com.knowledgegraphnavigator;
2 
3 public class Log {
4   static public void out(String s) { System.out.println(s); }
5   static public StringBuilder sparql  = new StringBuilder();
6   static public void clearSparql() { sparql.delete(0, sparql.length()); }
7 }

The class PrintEntityResearchResults in the next listing takes results from multiple DBPedia queries, formats the results, and displays them. The class constructor has no use except for the side effect of displaying results to a user. The constructor requires the arguments:

  • Sparql endpoint - we will look at the definition of class Sparql in the next section.
  • List people - a list of person names and URIs.
  • List companies - a list of company names and URIs.
  • List cities - a list of city names and URIs.
  • List countries - a list of country names and URIs.

I define static string values for a few ANSI terminal escape sequences for changing the default color of text written to a terminal. If you are running on Windows you may need to set the initialization values for RESET, GREEN, YELLOW, PURPLE, and CYAN to the empty string "".
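Instead of editing the constants by hand, the color codes can be applied conditionally. The following is a minimal sketch with hypothetical names (AnsiSketch, colorize, colorsLikelySupported). The operating system check is a crude heuristic and an assumption on my part, since many current Windows terminals do render ANSI sequences.

```java
// Hypothetical sketch: apply ANSI colors only when they are enabled.
class AnsiSketch {
  static final String RESET = "\u001B[0m";
  static final String GREEN = "\u001B[32m";

  // Wraps text in a color sequence only when colors are enabled.
  static String colorize(String text, String color, boolean enabled) {
    return enabled ? color + text + RESET : text;
  }

  // Crude heuristic (an assumption): turn colors off when the OS name
  // mentions Windows.
  static boolean colorsLikelySupported() {
    String os = System.getProperty("os.name", "");
    return !os.toLowerCase().contains("windows");
  }

  public static void main(String[] args) {
    System.out.println(
        colorize("Individual People:", GREEN, colorsLikelySupported()));
  }
}
```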

 1 package com.knowledgegraphnavigator;
 2 
 3 import static com.knowledgegraphnavigator.Log.out;
 4 import static com.knowledgegraphnavigator.Utils.removeBrackets;
 5 
 6 import java.sql.SQLException;
 7 import java.util.List;
 8 
 9 public class PrintEntityResearchResults {
10 
11   /**
12    * Note for Windows users: the Windows console may not render the following
13    * ANSI terminal escape sequences correctly. If you have problems, just
14    * change the following to the empty string "":
15    */
16   public static final String RESET  = "\u001B[0m"; // ANSI characters for styling
17   public static final String GREEN  = "\u001B[32m";
18   public static final String YELLOW = "\u001B[33m";
19   public static final String PURPLE = "\u001B[35m";
20   public static final String CYAN   = "\u001B[36m";
21 
22   private PrintEntityResearchResults() { }
23 
24   public PrintEntityResearchResults(Sparql endpoint,
25                                     List<EntityAndDescription> people,
26                                     List<EntityAndDescription> companies,
27                                     List<EntityAndDescription> cities,
28                                     List<EntityAndDescription> countries)
29       throws SQLException, ClassNotFoundException {
30     out("\n" + GREEN + "Individual People:\n" + RESET);
31     for (EntityAndDescription person : people) {
32       out("  " + GREEN + String.format("%-25s", person.entityName) +
33           PURPLE + " : " + removeBrackets(person.entityUri) + RESET);
34       out(EntityDetail.personAsString(endpoint, person.entityUri));
35     }
36     out("\n" + CYAN + "Individual Companies:\n" + RESET);
37     for (EntityAndDescription company : companies) {
38       out("  " + CYAN + String.format("%-25s", company.entityName) +
39           YELLOW + " : " + removeBrackets(company.entityUri) + RESET);
40       out(EntityDetail.companyAsString(endpoint, company.entityUri));
41     }
42     out("\n" + GREEN + "Individual Cities:\n" + RESET);
43     for (EntityAndDescription city : cities) {
44       out("  " + GREEN + String.format("%-25s", city.entityName) +
45           PURPLE + " : " + removeBrackets(city.entityUri) + RESET);
46       out(EntityDetail.cityAsString(endpoint, city.entityUri));
47     }
48     out("\n" + GREEN + "Individual Countries:\n" + RESET);
49     for (EntityAndDescription country : countries) {
50       out("  " + GREEN + String.format("%-25s", country.entityName) +
51           PURPLE + " : " + removeBrackets(country.entityUri) + RESET);
52       out(EntityDetail.countryAsString(endpoint, country.entityUri));
53     }
54     out("");
55   }
56 }
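
The note for Windows users in the listing above suggests editing the constants by hand. As an alternative, here is a minimal sketch (not part of the KGN sources) that picks either the escape sequences or empty strings once, when the object is created. The class name AnsiColors, the paint helper, and the use of the NO_COLOR environment variable are my assumptions for illustration:

```java
// Hypothetical helper, not part of the KGN sources: holds either the ANSI
// escape sequences or empty strings, depending on a flag.
final class AnsiColors {
  final String reset, green, yellow, purple, cyan;

  AnsiColors(boolean enabled) {
    reset  = enabled ? "\u001B[0m"  : "";
    green  = enabled ? "\u001B[32m" : "";
    yellow = enabled ? "\u001B[33m" : "";
    purple = enabled ? "\u001B[35m" : "";
    cyan   = enabled ? "\u001B[36m" : "";
  }

  /** Wraps text in a color followed by a reset; a no-op when styling is off. */
  String paint(String color, String text) {
    return color + text + reset;
  }

  /**
   * Assumption: styling is switched off when the NO_COLOR environment
   * variable is set or when no interactive console is attached.
   */
  static AnsiColors detect() {
    boolean enabled =
        System.getenv("NO_COLOR") == null && System.console() != null;
    return new AnsiColors(enabled);
  }
}
```

With this approach the printing code could call something like colors.paint(colors.green, "Individual People:") and would not need to change when styling is disabled.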

The class Sparql in the next listing wraps the JenaApis class from the chapter Semantic Web. I set the SPARQL endpoint for DBPedia on line 13. The WikiData SPARQL endpoint is defined, but commented out, on lines 11-12. The KGN application will not work with WikiData without some modifications. If you enjoy experimenting with KGN then you might want to clone it and enable it to work simultaneously with DBPedia, WikiData, and local RDF files by using three instances of the class JenaApis.

Notice that on line 5 we statically import the static StringBuffer com.knowledgegraphnavigator.Log.sparql. We use this buffer to store the generated SPARQL queries so that they can be displayed to the user.

 1 package com.knowledgegraphnavigator;
 2 
 3 import com.markwatson.semanticweb.QueryResult;
 4 import com.markwatson.semanticweb.JenaApis;
 5 import static com.knowledgegraphnavigator.Log.sparql;
 6 import static com.knowledgegraphnavigator.Log.out;
 7 
 8 import java.sql.SQLException;
 9 
10 public class Sparql {
11   //static private String endpoint =
12   //      "https://query.wikidata.org/bigdata/namespace/wdq/sparql";
13   static private String endpoint = "https://dbpedia.org/sparql";
14   public Sparql() {
15     this.jenaApis = new JenaApis();
16   }
17 
18   public QueryResult query(String sparqlQuery)
19          throws SQLException, ClassNotFoundException {
20     //out(sparqlQuery); // debug for now...
21     sparql.append(sparqlQuery);
 22     sparql.append("\n\n");
23     return jenaApis.queryRemote(endpoint, sparqlQuery);
24   }
25   private JenaApis jenaApis;
26 }
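
The pattern used in Sparql.query is to append every query to a shared buffer before running it, so that the sparql command can later show the user what was executed. The following is a minimal, self-contained sketch of that pattern. It is a stand-in for illustration only, not the Log class used by KGN. The names QueryLog, record, and drain are hypothetical:

```java
// Hypothetical stand-in illustrating the query-logging pattern; this is
// not the Log class from the KGN sources.
final class QueryLog {
  private static final StringBuilder queries = new StringBuilder();

  private QueryLog() { }

  /** Records one query followed by a blank line, as Sparql.query does. */
  static synchronized void record(String sparqlQuery) {
    queries.append(sparqlQuery).append("\n\n");
  }

  /** Returns everything recorded so far and then empties the log. */
  static synchronized String drain() {
    String all = queries.toString();
    queries.setLength(0);
    return all;
  }
}
```

Combining "show" and "clear" in one drain method is a design choice made for this sketch. KGN keeps them as two separate steps.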

The class Utils contains one utility method removeBrackets that is used to convert a URI in SPARQL RDF statement form:

1 <http://dbpedia.org/resource/Seattle>

to:

1 http://dbpedia.org/resource/Seattle

The single method removeBrackets is only used in the class PrintEntityResearchResults.

1 package com.knowledgegraphnavigator;
2 
3 public class Utils {
4   static public String removeBrackets(String s) {
5     if (s.startsWith("<")) return s.substring(1, s.length() - 1);
6     return s;
7   }
8 }
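
The version of removeBrackets above assumes that any string starting with "<" also ends with ">". That holds for the URIs KGN handles, but a lone "<" would raise a StringIndexOutOfBoundsException and a string like "<abc" would lose its last character. If you reuse the method elsewhere, a stricter variant might look like the following sketch. The class name UriText is hypothetical:

```java
// Sketch of a stricter variant; the class name UriText is hypothetical and
// this code is not part of the KGN sources.
final class UriText {
  private UriText() { }

  /**
   * Strips one pair of enclosing angle brackets. Any other input,
   * including null, is returned unchanged.
   */
  static String removeBrackets(String s) {
    if (s != null && s.length() >= 2
        && s.startsWith("<") && s.endsWith(">")) {
      return s.substring(1, s.length() - 1);
    }
    return s;
  }
}
```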

Finally, we get to the main program implemented in the class KGN. The interactive program is implemented in the class constructor. The heart of the code is the while loop in lines 26-120, which accepts text input from the user, detects entity names and the corresponding entity types in the input text, and uses the Java classes we just looked at to find information on DBPedia for those entities and to find relations between them. Instead of entering a list of entity names, the user can also enter either of the commands sparql (which we saw earlier in an example) or demo (to use a randomly chosen example query).

We use the class TextToDbpediaUris on line 38 to get the entity names and types found in the input text. You can refer back to chapter Resolve Entity Names to DBPedia References for details on using the class TextToDbpediaUris.

The loops in lines 39-71 store entity details that are displayed by calling PrintEntityResearchResults in lines 72-76. The nested loops over person entities in lines 78-91 call EntityRelationships.results to look for relationships between two different person URIs. The nested loops in lines 93-104 do the same to find relationships between people and companies. The nested loops in lines 105-118 find relationships between different company entities.

The static method main in lines 134-136 simply creates an instance of class KGN which has the side effect of running the example KGN program.

  1 package com.knowledgegraphnavigator;
  2 
  3 import com.markwatson.ner_dbpedia.TextToDbpediaUris;
  4 import com.markwatson.semanticweb.QueryResult;
  5 
  6 import static com.knowledgegraphnavigator.Log.out;
  7 import static com.knowledgegraphnavigator.Log.sparql;
  8 import static com.knowledgegraphnavigator.Log.clearSparql;
  9 
 10 import java.util.ArrayList;
 11 import java.util.Arrays;
 12 import java.util.List;
 13 import java.util.Scanner;
 14 
 15 public class KGN {
 16 
 17   private static List<String> demosList =
 18    Arrays.asList(
 19     "Bill Gates and Melinda Gates worked at Microsoft",
 20     "IBM opened an office in Canada",
 21     "Steve Jobs worked at Apple Computer and visited IBM and Microsoft in Seattle");
 22 
 23   public KGN() throws Exception {
 24     Sparql endpoint = new Sparql();
 25 
 26     while (true) {
 27       String query = getUserQueryFromConsole();
 28       out("\nProcessing query:\n" + query + "\n");
 29       if (query.equalsIgnoreCase("sparql")) {
 30         out("Generated SPARQL used to get current results:\n");
 31         out(sparql.toString());
 32         out("\n");
 33         clearSparql();
 34       } else {
 35         if (query.equalsIgnoreCase("demo")) {
 36           query = demosList.get((int) (Math.random() * demosList.size()));
 37         }
 38         TextToDbpediaUris kt = new TextToDbpediaUris(query);
 39         List<EntityAndDescription> userSelectedPeople = new ArrayList<>();
 40         if (kt.personNames.size() > 0) {
 41           for (int i = 0; i < kt.personNames.size(); i++) {
 42             userSelectedPeople.add(
 43                 new EntityAndDescription(kt.personNames.get(i),
 44                                      kt.personUris.get(i)));
 45           }
 46         }
 47         List<EntityAndDescription> userSelectedCompanies = new ArrayList<>();
 48         if (kt.companyNames.size() > 0) {
 49           for (int i = 0; i < kt.companyNames.size(); i++) {
 50             userSelectedCompanies.add(
 51                 new EntityAndDescription(kt.companyNames.get(i),
 52                                      kt.companyUris.get(i)));
 53           }
 54         }
 55         List<EntityAndDescription> userSelectedCities = new ArrayList<>();
 56         if (kt.cityNames.size() > 0) {
 57           out("+++++ kt.cityNames:" + kt.cityNames.toString());
 58           for (int i = 0; i < kt.cityNames.size(); i++) {
 59             userSelectedCities.add(
 60                 new EntityAndDescription(kt.cityNames.get(i), kt.cityUris.get(i)));
 61           }
 62         }
 63         List<EntityAndDescription> userSelectedCountries = new ArrayList<>();
 64         if (kt.countryNames.size() > 0) {
 65           out("+++++ kt.countryNames:" + kt.countryNames.toString());
 66           for (int i = 0; i < kt.countryNames.size(); i++) {
 67             userSelectedCountries.add(
 68                 new EntityAndDescription(kt.countryNames.get(i),
 69                                      kt.countryUris.get(i)));
 70           }
 71         }
 72         new PrintEntityResearchResults(endpoint,
 73             userSelectedPeople,
 74             userSelectedCompanies,
 75             userSelectedCities,
 76             userSelectedCountries);
 77 
 78         for (EntityAndDescription person1 : userSelectedPeople) {
 79           for (EntityAndDescription person2 : userSelectedPeople) {
 80             if (person1 != person2) {
 81               QueryResult qr = 
 82                 EntityRelationships.results(endpoint, person1.entityUri,
 83                                             person2.entityUri);
 84               if (qr.rows.size() > 0) {
 85                 out("Relationships between person " + person1.entityName +
 86                     " person " + person2.entityName + ":");
 87                 out(qr.toString());
 88               }
 89             }
 90           }
 91         }
 92 
 93         for (EntityAndDescription person : userSelectedPeople) {
 94           for (EntityAndDescription company : userSelectedCompanies) {
 95             QueryResult qr = 
 96               EntityRelationships.results(endpoint, person.entityUri,
 97                                           company.entityUri);
 98             if (qr.rows.size() > 0) {
 99               out("Relationships between person " + person.entityName +
100                   " company " + company.entityName + ":");
101               out(qr.toString());
102             }
103           }
104         }
105         for (EntityAndDescription company1 : userSelectedCompanies) {
106           for (EntityAndDescription company2 : userSelectedCompanies) {
107             if (company1 != company2) {
108               QueryResult qr = 
109                 EntityRelationships.results(endpoint, company1.entityUri, 
110                                             company2.entityUri);
111               if (qr.rows.size() > 0) {
112                 out("Relationships between company " + company1.entityName +
113                     " company " + company2.entityName + ":");
114                 out(qr.toString());
115               }
116             }
117           }
118         }
119       }
120     }
121   }
122 
123   private String getUserQueryFromConsole() {
124     out("Enter entities query:");
125     Scanner input = new Scanner(System.in);
126     String ret = "";
127     while (input.hasNext()) {
128       ret = input.nextLine();
129       break;
130     }
131     return ret;
132   }
133 
134   public static void main(String[] args) throws Exception {
135     new KGN();
136   }
137 }
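
The person-to-person and company-to-company loops in the listing above visit every ordered pair of distinct entities, so for n entities they issue up to n × (n - 1) relationship queries, once as (a, b) and once as (b, a). The following sketch isolates that pairing logic so you can experiment with it without calling DBPedia. The class name Pairs and the method orderedPairs are hypothetical. Unlike the listing, which compares object references with !=, this sketch compares list positions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the KGN sources: enumerates the pairs
// that the nested relationship loops would visit.
final class Pairs {
  private Pairs() { }

  /** Returns every ordered pair (a, b) of elements at different positions. */
  static <T> List<List<T>> orderedPairs(List<T> items) {
    List<List<T>> pairs = new ArrayList<>();
    for (int i = 0; i < items.size(); i++) {
      for (int j = 0; j < items.size(); j++) {
        if (i != j) {                     // skip pairing an item with itself
          pairs.add(Arrays.asList(items.get(i), items.get(j)));
        }
      }
    }
    return pairs;
  }
}
```

If you find that the order of the two URIs does not matter for the relationships you care about, starting the inner loop at i + 1 would halve the number of queries.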

I hope this KGN example was both interesting and, because it relies heavily on code from the last two chapters, simple enough that you feel comfortable modifying it and reusing it as part of your own Java applications.

Wrap-up

If you enjoyed running and experimenting with this example and want to modify it for your own projects then I hope that I provided a sufficient road map for you to do so.

I suggest further projects that you might want to try implementing with this example:

  • Write a web application that processes news stories and annotates them with additional data from DBPedia and/or WikiData.
  • In a web or desktop application, detect entities in text and display additional information when the user’s mouse cursor hovers over a word or phrase that is identified as an entity found in DBPedia or WikiData.
  • Clone this KGN example and enable it to work simultaneously with DBPedia, WikiData, and local RDF files by using three instances of the class JenaApis and in the main application loop access all three data sources.
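
As a starting point for the last suggestion, here is a rough sketch of how several data sources might be queried in turn and their results merged. Everything in it is an assumption made for illustration: the class MultiSourceQuery is hypothetical, and each source is modeled as a plain function from a SPARQL query string to a list of result rows. In a real implementation each function would presumably delegate to its own JenaApis instance. Because DBPedia and WikiData use different vocabularies, each source would likely need its own query text as well:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not part of the KGN sources. Each source is a
// function that maps a SPARQL query string to a list of result rows.
final class MultiSourceQuery {
  private final Map<String, Function<String, List<String>>> sources =
      new LinkedHashMap<>();  // preserves the order in which sources are added

  MultiSourceQuery addSource(String name,
                             Function<String, List<String>> source) {
    sources.put(name, source);
    return this;
  }

  /** Queries every source in turn and prefixes each row with its source name. */
  List<String> queryAll(String sparqlQuery) {
    List<String> rows = new ArrayList<>();
    for (Map.Entry<String, Function<String, List<String>>> e
         : sources.entrySet()) {
      for (String row : e.getValue().apply(sparqlQuery)) {
        rows.add(e.getKey() + ": " + row);
      }
    }
    return rows;
  }
}
```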

I had the idea for the KGN application because I was spending quite a bit of time manually setting up SPARQL queries for DBPedia (and other public sources like WikiData) and I wanted to experiment with partially automating this process. I have written versions of KGN in Java, the Hy language (a Lisp that runs on Python, about which I wrote a short book), Swift, and Common Lisp. All four implementations take different approaches because I was experimenting with different ideas. You might want to check out my web site devoted to different versions of KGN: www.knowledgegraphnavigator.com.