DBPedia Question Answering System Using SPARQL and Deep Learning
The Jupyter notebook for this example, DBPedia Question Answering System, is available on Google Colab.
The example is also in the GitHub repository for this book in the jupyter_notesbooks directory (includes both notebook format files and a Python source file):
# DBPedia Question Answering System
# Copyright 2021-2022 Mark Watson. All rights reserved. License: Apache 2

!pip install transformers
!pip install SPARQLWrapper
!pip freeze

from transformers import pipeline

# Extractive question answering: given a question and a context passage,
# the pipeline returns the span of the context most likely to answer it.
qa = pipeline(
    "question-answering",
    #model="NeuML/bert-small-cord19qa",
    model="NeuML/bert-small-cord19-squad2",
    tokenizer="NeuML/bert-small-cord19qa"
)
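The next cell installs spaCy and loads its small English model, which is used later to find named entities in questions: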
!pip install spacy
!python -m spacy download en_core_web_sm

import spacy

# Small English model; it provides the named entity recognizer used later.
nlp_model = spacy.load('en_core_web_sm')
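The following cell creates a SPARQLWrapper client for the public DBPedia endpoint and defines a helper that runs a query and returns the result bindings: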
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

def query(query_text):
    """Run a SPARQL query against DBPedia and return the result bindings."""
    sparql.setQuery(query_text)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()['results']['bindings']
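To make the return value of query concrete, here is a minimal sketch of the SPARQL JSON results layout that the function indexes into. The sample data is hand-written for illustration (it is not a live DBPedia response), and the helper name bindings_to_rows is hypothetical.

```python
# Hand-written sample in the SPARQL 1.1 JSON results layout (illustrative
# only, not a real DBPedia response).
sample_response = {
    "head": {"vars": ["s", "comment"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://dbpedia.org/resource/IBM"},
                "comment": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "IBM is an American technology corporation.",
                },
            }
        ]
    },
}


def bindings_to_rows(response):
    """Flatten each binding to a plain {variable: value} dict (hypothetical helper)."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in response["results"]["bindings"]
    ]


rows = bindings_to_rows(sample_response)
print(rows[0]["comment"])
```

Each binding maps a query variable to a small dictionary, which is why later code reads result['comment']['value'] rather than result['comment'].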
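The function entities_in_text groups the entities that spaCy finds by entity type, and dbpedia_get_entities_by_name looks up DBPedia resources by English label and type, returning each resource's English comment text. The dictionary entity_type_to_type_uri maps spaCy entity labels to DBPedia ontology classes: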
def entities_in_text(s):
    """Return a dict mapping spaCy entity labels to lists of entity names."""
    doc = nlp_model(s)
    ret = {}
    for entity in doc.ents:
        ret.setdefault(entity.label_, []).append(entity.text)
    return ret

def dbpedia_get_entities_by_name(name, dbpedia_type):
    """Find resources with the given English label and type, with comments."""
    # Doubled braces are literal braces in a str.format template.
    sparql_query = (
        "select distinct ?s ?comment where {{ "
        "?s <http://www.w3.org/2000/01/rdf-schema#label> \"{}\"@en . "
        "?s <http://www.w3.org/2000/01/rdf-schema#comment> ?comment . "
        "FILTER (lang(?comment) = 'en') . "
        "?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> {} . "
        "}} limit 15"
    ).format(name, dbpedia_type)
    #print(sparql_query)
    results = query(sparql_query)
    return results

# Map spaCy entity labels to DBPedia ontology classes.
entity_type_to_type_uri = {
    'PERSON': '<http://dbpedia.org/ontology/Person>',
    'GPE': '<http://dbpedia.org/ontology/Place>',
    'ORG': '<http://dbpedia.org/ontology/Organisation>',
}
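Because the entity name is interpolated directly into a quoted SPARQL literal, a name containing a double quote or a backslash would produce a malformed query. The following is a minimal sketch of one way to build the same query with basic escaping. The names build_entity_query and escape_literal are hypothetical, and the escaping handles only backslashes and double quotes rather than the full SPARQL grammar.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def escape_literal(text):
    """Escape backslashes and double quotes for a double-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_entity_query(name, type_uri, limit=15):
    """Return a SPARQL query string; type_uri is expected in <...> form."""
    return (
        "select distinct ?s ?comment where { "
        f'?s <{RDFS}label> "{escape_literal(name)}"@en . '
        f"?s <{RDFS}comment> ?comment . "
        "FILTER (lang(?comment) = 'en') . "
        f"?s <{RDF}type> {type_uri} . "
        f"}} limit {int(limit)}"
    )


print(build_entity_query('Dwayne "The Rock" Johnson',
                         "<http://dbpedia.org/ontology/Person>"))
```

Printing the generated query before sending it is a cheap way to check it by hand, for example by pasting it into the DBPedia SPARQL web form.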
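The function QA finds the entities in a question, collects DBPedia comment text for each person, organization, and place, and passes the combined text as context to the question-answering model: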
def QA(query_text):
    entities = entities_in_text(query_text)

    def helper(entity_type):
        """Concatenate DBPedia comments for all entities of one type."""
        ret = ""
        if entity_type in entities:
            for hname in entities[entity_type]:
                results = dbpedia_get_entities_by_name(
                    hname, entity_type_to_type_uri[entity_type])
                for result in results:
                    ret += result['comment']['value'] + " . "
        return ret

    context_text = helper('PERSON') + helper('ORG') + helper('GPE')
    print("\ncontext text:\n", context_text, "\n")

    print("Answer from transformer model:")
    print("Original query: ", query_text)
    print("Answer:")

    if not context_text:
        # The pipeline rejects an empty context, so stop early.
        print("No DBPedia context found for this question.")
        return

    answer = qa({
        "question": query_text,
        "context": context_text
    })
    print(answer)
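The flow inside QA (find entities, fetch DBPedia comments for each one, concatenate them into a context, then ask the model) can be sketched as a function that takes each stage as a parameter, which makes it possible to try the logic without network access or model downloads. All names here (answer_question, the stub functions, and their canned data) are hypothetical stand-ins and are not part of the notebook.

```python
def answer_question(question, find_entities, fetch_comments, answer_fn,
                    entity_types=("PERSON", "ORG", "GPE")):
    """Run the entity -> comments -> context -> answer flow with injected stages."""
    entities = find_entities(question)
    comments = []
    for entity_type in entity_types:
        for name in entities.get(entity_type, []):
            comments.extend(fetch_comments(name, entity_type))
    if not comments:
        return None  # nothing to use as context
    context = " . ".join(comments) + " . "
    return answer_fn(question, context)


# Stub stages with canned data, standing in for spaCy, DBPedia, and the model.
def stub_find_entities(text):
    return {"ORG": ["IBM"]} if "IBM" in text else {}


def stub_fetch_comments(name, entity_type):
    return ["IBM is headquartered in Armonk, New York"]


def stub_answer(question, context):
    # A real model scores candidate spans; this stub returns a fixed span.
    start = context.find("Armonk")
    end = start + len("Armonk, New York")
    return {"answer": context[start:end], "start": start, "end": end}


result = answer_question("where is IBM headquartered?", stub_find_entities,
                         stub_fetch_comments, stub_answer)
print(result)
```

Returning the answer instead of printing it, and returning None when no context is found, are design choices made for this sketch so that callers can decide how to report the result.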
The following calls try the system on a few questions:
QA("where does Bill Gates work?")
QA("where is IBM is headquartered?")
QA("who is Bill Clinton married to?")
QA("what is the population of Paris?")
Program output is:
context text:
William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, author, and philanthropist. He is a co-founder of Microsoft, along with his late childhood friend Paul Allen. During his career at Microsoft, Gates held the positions of chairman, chief executive officer (CEO), president and chief software architect, while also being the largest individual shareholder until May 2014. He is considered one of the best known entrepreneurs of the microcomputer revolution of the 1970s and 1980s. .

Answer from transformer model:
Original query: where does Bill Gates work?
Answer:
{'score': 0.3949914574623108, 'start': 242, 'end': 251, 'answer': 'Microsoft'}

context text:
International Business Machines Corporation (IBM) is an American multinational technology corporation headquartered in Armonk, New York, with operations in over 171 countries. The company began in 1911, founded in Endicott, New York by trust businessman Charles Ranlett Flint, as the Computing-Tabulating-Recording Company (CTR) and was renamed "International Business Machines" in 1924. IBM is incorporated in New York. .

Answer from transformer model:
Original query: where is IBM is headquartered?
Answer:
{'score': 0.8817499279975891, 'start': 119, 'end': 135, 'answer': 'Armonk, New York'}

context text:
William Jefferson Clinton (né Blythe III; born August 19, 1946) is an American politician and attorney who served as the 42nd president of the United States from 1993 to 2001. He previously served as governor of Arkansas from 1979 to 1981 and again from 1983 to 1992, and as attorney general of Arkansas from 1977 to 1979. A member of the Democratic Party, Clinton became known as a New Democrat, as many of his policies reflected a centrist "Third Way" political philosophy. He is the husband of Hillary Clinton, who was secretary of state from 2009 to 2013 and the Democratic nominee for president in the 2016 presidential election. .

Answer from transformer model:
Original query: who is Bill Clinton married to?
Answer:
{'score': 0.0006518915179185569, 'start': 497, 'end': 512, 'answer': 'Hillary Clinton'}

context text:
Paris (French pronunciation: [paʁi]) is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). Since the 17th century, Paris has been one of Europe's major centres of finance, diplomacy, commerce, fashion, gastronomy, science, and arts. The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017. The Paris Region had a GDP of €709 billion ($808 billion) in 2017. According to the Economist Intelligence Unit Worldwide Cost of Living Survey in 2018, Paris was the second most expensive city in the world, after .

Answer from transformer model:
Original query: what is the population of Paris?
Answer:
{'score': 0.787605345249176, 'start': 119, 'end': 128, 'answer': '2,175,601'}
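The answer dictionaries in the output carry character offsets into the context plus a score. The following minimal sketch shows how start and end index into the context string and how a score cutoff might be applied. The context is the first sentence of the IBM text above, the two dictionaries are copied from the output, and MIN_SCORE and is_confident are hypothetical names with an arbitrary threshold.

```python
context = (
    "International Business Machines Corporation (IBM) is an American "
    "multinational technology corporation headquartered in Armonk, New York, "
    "with operations in over 171 countries."
)

# Answer dictionaries copied from the program output.
ibm_answer = {"score": 0.8817499279975891, "start": 119, "end": 135,
              "answer": "Armonk, New York"}
clinton_answer = {"score": 0.0006518915179185569, "start": 497, "end": 512,
                  "answer": "Hillary Clinton"}

MIN_SCORE = 0.1  # hypothetical cutoff; tune for your application


def is_confident(answer, min_score=MIN_SCORE):
    """Return True when the model's score meets the cutoff."""
    return answer["score"] >= min_score


# start and end are character offsets into the context string.
span = context[ibm_answer["start"]:ibm_answer["end"]]
print(span)
print(is_confident(ibm_answer), is_confident(clinton_answer))
```

Note that the Bill Clinton answer is correct despite its very low score, so a cutoff like this one would discard a right answer; a threshold trades missed answers for fewer wrong ones.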