Table of Contents
- Preface
- A Quick Racket Tutorial
- Datastores
- Implementing a Simple RDF Datastore With Partial SPARQL Support in Racket
- Web Scraping
- Using the OpenAI, Anthropic, Mistral, and Local Hugging Face Large Language Model APIs in Racket
- Retrieval Augmented Generation of Text Using Embeddings
- Natural Language Processing
- Knowledge Graph Navigator
- Conclusions
Preface
I have been using Lisp languages since the 1970s. In 1982 my company bought a Lisp Machine for my use. A Lisp Machine provided an “all batteries included” working environment, but now no one seriously uses Lisp Machines. In this book I try to lead you, dear reader, through a process of creating a “batteries included” working environment using Racket Scheme.
The latest edition is always available for purchase at https://leanpub.com/racket-ai. You can also read free online at https://leanpub.com/racket-ai/read. I offer the purchase option for readers who wish to directly support my work.
If you read my eBooks free online then please consider tipping me https://markwatson.com/#tip.
This is a “live book:” there will never be a second edition. As I add material and make corrections, I simply update the book, and both the free-to-read online copy and all eBook formats for purchase get updated.
I have been developing commercial Artificial Intelligence (AI) tools and applications since the 1980s and I usually use the Lisp languages Common Lisp, Clojure, Racket Scheme, and Gambit Scheme. Here you will find Racket code that I wrote for my own use and I am wrapping in a book in the hopes that it will also be useful to you, dear reader.

I wrote this book for both professional programmers and home hobbyists who already know how to program in Racket (or another Scheme dialect) and who want to learn practical AI programming and information processing techniques. I have tried to make this an enjoyable book to work through. In the style of a “cook book,” the chapters can be read in any order.
You can find the code examples and the files for the manuscript for this book in the following GitHub repository:
https://github.com/mark-watson/Racket-AI-book
Git pull requests with code improvements for either the source code or manuscript markdown files will be appreciated by me and the readers of this book.
License for Book Manuscript: Creative Commons
Copyright 2022-2024 Mark Watson. All rights reserved.
This book may be shared using the Creative Commons “share and share alike, no modifications, no commercial reuse” license.
Book Example Programs
Here are the projects for this book:
- embeddingsdb - Vector database for embeddings. This project implements a vector database for storing and querying embeddings. It provides functionality for creating, storing, and retrieving vector representations of text or other data, which is crucial for many modern AI applications, especially those involving semantic search or similarity comparisons.
- kgn - Knowledge Graph Navigator. The Knowledge Graph Navigator is a tool for exploring and querying knowledge graphs. It includes utilities for querying graph structures, executing queries, and visualizing relationships between entities. This can be particularly useful for working with semantic web technologies or large-scale knowledge bases.
- llmapis - Interfaces to various LLM APIs. This module provides interfaces to various Large Language Model APIs, including OpenAI, Anthropic, Mistral, and others. It offers a uniform way of interacting with different LLM providers, allowing users to easily switch between models or compare outputs from different services. This can be valuable for developers looking to integrate state-of-the-art language models into their applications.
- misc_code - Miscellaneous utility code. This directory contains various utility functions and examples for common programming tasks in Racket. It includes code for working with hash tables, parsing HTML, handling HTTP requests, and interacting with SQLite databases. These utilities can serve as building blocks for larger AI projects or as reference implementations for common tasks.
- nlp - Natural Language Processing utilities. The NLP module provides tools for natural language processing tasks. It includes implementations for part-of-speech tagging and named entity recognition. These fundamental NLP tasks are essential for many text analysis and understanding applications, making this module a valuable resource for developers working with textual data.
- pdf_chat - PDF text extraction. This project focuses on extracting text from PDF documents. It provides utilities for parsing PDF files and converting their content into plain text format. This can be particularly useful for applications that need to process or analyze information contained in PDF documents, such as document summarization or information retrieval systems.
- search_brave - Brave Search API wrapper. The Brave Search API wrapper provides an interface for interacting with the Brave search engine programmatically. It offers functions for sending queries and processing search results, making it easier to integrate Brave’s search capabilities into Racket applications.
- sparql - SPARQL querying utilities for DBpedia. This module focuses on SPARQL querying, particularly for interacting with DBpedia. It includes utilities for constructing SPARQL queries, executing them against DBpedia’s endpoint, and processing the results. This can be valuable for applications that need to extract structured information from large knowledge bases.
- webscrape - Web scraping utilities. The web scraping module provides tools for extracting information from websites. It includes functions for fetching web pages, parsing HTML content, and extracting specific data elements. These utilities can be useful for a wide range of applications, from data collection to automated information gathering and analysis.
The following diagram shows the Racket software examples configured for your local laptop. Several of the combined examples build both a Racket package that gets installed locally and a command line program that gets built and deployed to ~/bin. Other examples are either a command line tool or a Racket package.

Racket, Scheme, and Common Lisp
I like Common Lisp slightly more than Racket and other Scheme dialects, even though Common Lisp is ancient and has defects. Then why do I use Racket? Racket is a family of languages, a very good IDE, and a rich ecosystem supported by many core Racket developers and Racket library authors. Choosing Racket Scheme was an easy decision, but there are also other Scheme dialects that I enjoy using:
- Gambit/C Scheme
- Gerbil Scheme (based on Gambit/C)
- Chez Scheme
Personal Artificial Intelligence Journey: or, Life as a Lisp Developer
I have been interested in AI since reading Bertram Raphael’s excellent book Thinking Computer: Mind Inside Matter in the early 1980s. I have also had the good fortune to work on many interesting AI projects including the development of commercial expert system tools for the Xerox LISP machines and the Apple Macintosh, development of commercial neural network tools, application of natural language and expert systems technology, medical information systems, application of AI technologies to Nintendo and PC video games, and the application of AI technologies to the financial markets. I have also applied statistical natural language processing techniques to analyzing social media data from Twitter and Facebook. I worked at Google on their Knowledge Graph and I managed a deep learning team at Capital One where I was awarded 55 US patents.
I enjoy AI programming, and hopefully this enthusiasm will also infect you.
Acknowledgements
I produced the manuscript for this book using the leanpub.com publishing system and I recommend leanpub.com to other authors.
Editor: Carol Watson
Thanks to the following people who found typos in this and earlier book editions: David Rupp.
A Quick Racket Tutorial
If you are an experienced Racket developer then feel free to skip this chapter! I wrote this tutorial to cover just the aspects of using Racket that you, dear reader, will need in the book example programs.
I assume that you have read the section Racket Essentials in The Racket Guide written by Matthew Flatt, Robert Bruce Findler, and the PLT group. Here I just cover some basics of getting started so you can enjoy the later code examples without encountering “road blocks.”
Installing Packages
The DrRacket IDE lets you interactively install packages. I prefer using the command line so, for example, I would install SQLite support using:
```
raco pkg install sqlite-table
```
We can then require the code in this package in our Racket programs:
```racket
(require sqlite-table)
```
Note that when the argument to require is a symbol (not a string), modules are searched for and loaded from the packages installed on your system. When the argument is a string like “utils.rkt”, a module is loaded from a file in the current directory.
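For example, both forms of require can appear in the same module (here “utils.rkt” stands in for any local file you might have):

```racket
(require sqlite-table)   ; a symbol: loads the installed sqlite-table package
(require "utils.rkt")    ; a string: loads utils.rkt from the current directory
```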
Installing Local Packages In Place
In a later chapter Natural Language Processing (NLP) we define a fairly complicated local package. This package has one unusual requirement that you may or may not need in your own projects: My NLP library requires static linguistic data files that are stored in the directory Racket-AI-book-code/nlp/data. If I am in the directory Racket-AI-book-code/nlp working on the Racket code, it is simple enough to just open the files in ./data/….
The default for installing your own Racket packages is to link to the original source directory on your laptop’s file system. Let’s walk through this. First, I will make sure my library code is compiled and then install the code in the current directory:
```
cd Racket-AI-book-code/nlp/
raco make *.rkt
raco pkg install --scope user
```
Then I can run the racket REPL (or DrRacket) on my laptop and use my NLP package by requiring the code in this package in our Racket programs (shown in a REPL):
```
> (require nlp)
loading lex-hash......done.
> (parts-of-speech (list->vector '("the" "cat" "ran")))
'#("DT" "NN" "VBD")
> (find-place-names
   '#("George" "Bush" "went" "to" "San" "Diego"
      "and" "London") '())
'("London" "San Diego")
> (find-place-names
   '#("George" "Bush" "went" "to" "San" "Diego"
      "and" "London") '())
'("London" "San Diego")
>
```
Mapping Over Lists
We will be using functions that take other functions as arguments:
```
> (range 5)
'(0 1 2 3 4)
> (define (1+ n) (+ n 1))
> (map 1+ (range 5))
'(1 2 3 4 5)
> (map + (range 5) '(100 101 102 103 104))
'(100 102 104 106 108)
>
```
Hash Tables
The following listing shows the file misc_code/hash_tests.rkt:
```racket
#lang racket

(define h1 (hash "dog" '("friendly" 5)
                 "cat" '("not friendly" 2))) ;; not mutable

(define cat (hash-ref h1 "cat"))

(define h2 (make-hash)) ;; mutable
(hash-set! h2 "snake" '("smooth" 4))

;; make-hash also accepts a second argument that is a list of pairs:
(define h3 (make-hash '(("dog" '("friendly" 5))
                        ("cat" '("not friendly" 2)))))
(hash-set! h3 "snake" '("smooth" 4))
(hash-set! h3 "cat" '("sometimes likeable" 3)) ;; overwrite key value

;; for can be used with hash tables:

(for ([(k v) h3])
  (println '("key:" k "value:" v)))
```
Here is a listing of the output window after running this file and then manually evaluating h1, h2, and h3 in the REPL (like all listings in this book, I manually edit the output to fit page width):
```
Welcome to DrRacket, version 8.10 [cs].
Language: racket, with debugging; memory limit: 128 MB.
'("key:" k "value:" v)
'("key:" k "value:" v)
'("key:" k "value:" v)
> h1
'#hash(("cat" . ("not friendly" 2))
       ("dog" . ("friendly" 5)))
> h2
'#hash(("snake" . ("smooth" 4)))
> h3
'#hash(("cat" . ("sometimes likeable" 3))
       ("dog" . ('("friendly" 5)))
       ("snake" . ("smooth" 4)))
>
```
Racket Structure Types
A structure type is like a list that has named elements. When you define a structure, the Racket system generates getter and setter functions to access and change structure attribute values. Racket also generates a constructor function with the structure name. Let’s look at a simple example in a Racket REPL of creating a structure with mutable elements:
```
> (struct person (name age email) #:mutable)
> (define henry (person "Henry Higgans" 45 "henry@higgans.com"))
> (person-age henry)
45
> (set-person-age! henry 46)
> (person-age henry)
46
> (set-person-email! henry "henryh674373551@gmail.com")
> (person-email henry)
"henryh674373551@gmail.com"
>
```
If you don’t add #:mutable to a struct definition, then no set-NAME-ATTRIBUTE! methods are generated.
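By contrast, a struct defined without #:mutable generates only the accessor functions. Here is a short REPL sketch (the point structure is a hypothetical example, not from the book’s code):

```racket
> (struct point (x y))     ; immutable: no set-point-x! is generated
> (define p (point 1 2))
> (point-x p)
1
> (set-point-x! p 10)      ; error: set-point-x! is not defined
```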
Racket also supports object oriented programming style classes with methods. I don’t use classes in the book examples so you, dear reader, can read the official Racket documentation on classes if you want to use Racket in a non-functional way.
Simple HTTP GET and POST Operations
We will be using HTTP GET and POST instructions in later chapters for web scraping and accessing remote APIs, such as those for OpenAI GPT-4, Hugging Face, etc. We will see more detail later but for now, you can try a simple example:
```racket
#lang racket

(require net/http-easy)
(require html-parsing)
(require net/url xml xml/path)
(require racket/pretty)

(define res-stream
  (get "https://markwatson.com" #:stream? #t))

(define lst
  (html->xexp (response-output res-stream)))

(response-close! res-stream)

(displayln "\nParagraph text:\n")

(pretty-print (take (se-path*/list '(p) lst) 8))

(displayln "\nLI text:\n")

(pretty-print (take (se-path*/list '(li) lst) 8))
```
The output is:
```
Paragraph text:

'("My customer list includes: Google, Capital One, Babylist, Olive AI,
   CompassLabs, Mind AI, Disney, SAIC, Americast, PacBell, CastTV,
   Lutris Technology, Arctan Group, Sitescout.com, Embed.ly, and
   Webmind Corporation."
  "I have worked in the fields of general\n"
  "artificial intelligence, machine learning, semantic web and linked data, and\n"
  "natural language processing since 1982."
  "My eBooks are available to read for FREE or you can purchase them at "
  (a (@ (href "https://leanpub.com/u/markwatson")) "leanpub")
  ". "
  "My standard ")

LI text:

'((@ (class "list f4 f3-ns fw4 dib pr3"))
  "\n"
  "  "
  (& nbsp)
  (& nbsp)
  (a
   (@
    (class "hover-white no-underline white-90")
    (href "https://mark-watson.blogspot.com")
    (target "new")
    (title "Mark's Blog on Blogspot"))
   "\n"
   "Read my Blog\n"
   "  ")
  "\n"
  "  ")
```
Using Racket ~/.racketrc Initialization File
In my Racket workflow I don’t usually use ~/.racketrc to define initial forms that are automatically loaded when starting the racket command line tool or the DrRacket application. That said I do like to use ~/.racketrc for temporary initialization forms when working on a specific project to increase the velocity of interactive development.
Here is an example use:
```
$ cat ~/.racketrc
(define (foo-list x)
  (list x x x))
$ racket
Welcome to Racket v8.10 [cs].
> (foo-list 3.14159)
'(3.14159 3.14159 3.14159)
>
```
If you have local and public libraries you frequently load you can permanently keep require forms for them in ~/.racketrc but that will slightly slow down the startup times of racket and DrRacket.
Tutorial Wrap Up
The rest of this book is comprised of example Racket programs that I have written for my own enjoyment that I hope will also be useful to you, dear reader. Please refer to the https://docs.racket-lang.org/guide/ for more technical detail on effectively using the Racket language and ecosystem.
Datastores
For my personal research projects the only datastores that I often use are the embedded relational database and Resource Description Framework (RDF) datastores that might be local to my laptop or public Knowledge Graphs like DBPedia and WikiData. The use of RDF data and the SPARQL query language is part of the fields of the semantic web and linked data.
Accessing Public RDF Knowledge Graphs - a DBPedia Example
I will not cover RDF data and the SPARQL query language in great detail here. Rather, please reference the following link to read the RDF and SPARQL tutorial data in my Common Lisp book: Loving Common Lisp, or the Savvy Programmer’s Secret Weapon.
In the following Racket code example for accessing data on DBPedia using SPARQL, the primary objective is to interact with DBpedia’s SPARQL endpoint to query information regarding a person based on their name or URI. The code is structured into several functions, each encapsulating a specific aspect of the querying process, thereby promoting modular design and ease of maintenance.
Function Definitions:
- `sparql-dbpedia-for-person`: This function takes a `person-uri` as an argument and constructs a SPARQL query to retrieve the comment and website associated with the person. The `@string-append` macro helps in constructing the SPARQL query string by concatenating the literals and the `person-uri` argument.
- `sparql-dbpedia-person-uri`: Similar to the above function, this function accepts a `person-name` argument and constructs a SPARQL query to fetch the URI and comment of the person from DBpedia.
- `sparql-query->hash`: This function encapsulates the logic for sending the constructed SPARQL query to the DBpedia endpoint. It takes a `query` argument, encodes it into a URL format, and sends an HTTP request to the DBpedia SPARQL endpoint. The response, expected in JSON format, is then converted to a Racket expression using `string->jsexpr`.
- `json->listvals`: This function is designed to transform the JSON expression obtained from the SPARQL endpoint into a more manageable list of values. It processes the hash data structure, extracting the relevant bindings and converting them into a list format.
- `gd` (Data Processing Function): This function processes the data structure obtained from `json->listvals`. It defines four inner functions `gg1`, `gg2`, `gg3`, and `gg4`, each designed to handle a specific number of variables returned in the SPARQL query result. It uses a `case` statement to determine which inner function to call based on the length of the data.
- `sparql-dbpedia`: This is the entry function which accepts a `sparql` argument, invokes `sparql-query->hash` to execute the SPARQL query, and then calls `gd` to process the resulting data structure.
Usage Flow:
The typical flow would be to call sparql-dbpedia-person-uri with a person’s name to obtain the person’s URI and comment from DBpedia. Following that, sparql-dbpedia-for-person can be invoked with the obtained URI to fetch more detailed information like websites associated with the person. The results from these queries are then processed through sparql-query->hash, json->listvals, and gd to transform the raw JSON response into a structured list format, making it easier to work with within the Racket environment.
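The usage flow described above reduces to a single nested call in the REPL (this mirrors the commented-out example at the end of the source file):

```racket
;; Look up DBpedia entities matching a name, returning a list of
;; (personuri comment) result pairs:
(sparql-dbpedia (sparql-dbpedia-person-uri "Steve Jobs"))
```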
```racket
#lang at-exp racket

(provide sparql-dbpedia-person-uri)
(provide sparql-query->hash)
(provide json->listvals)
(provide sparql-dbpedia)

(require net/url)
(require net/uri-codec)
(require json)
(require racket/pretty)

(define (sparql-dbpedia-for-person person-uri)
  @string-append{
    SELECT
     (GROUP_CONCAT (DISTINCT ?website; SEPARATOR=" | ")
        AS ?website) ?comment {
     OPTIONAL {
      @person-uri
      <http://www.w3.org/2000/01/rdf-schema#comment>
      ?comment . FILTER (lang(?comment) = 'en')
     } .
     OPTIONAL {
      @person-uri
      <http://dbpedia.org/ontology/wikiPageExternalLink>
      ?website
      . FILTER (!regex(str(?website), "dbpedia", "i"))
     }
    } LIMIT 4})

(define (sparql-dbpedia-person-uri person-name)
  @string-append{
    SELECT DISTINCT ?personuri ?comment {
      ?personuri
        <http://xmlns.com/foaf/0.1/name>
        "@person-name"@"@"en .
      ?personuri
        <http://www.w3.org/2000/01/rdf-schema#comment>
        ?comment .
      FILTER (lang(?comment) = 'en') .
  }})


(define (sparql-query->hash query)
  (call/input-url
   (string->url
    (string-append
     "https://dbpedia.org/sparql?query="
     (uri-encode query)))
   get-pure-port
   (lambda (port)
     (string->jsexpr (port->string port)))
   '("Accept: application/json")))

(define (json->listvals a-hash)
  (let ((bindings (hash->list a-hash)))
    (let* ((head (first bindings))
           (vars (hash-ref (cdr head) 'vars))
           (results (second bindings)))
      (let* ((x (cdr results))
             (b (hash-ref x 'bindings)))
        (for/list
            ([var vars])
          (for/list ([bc b])
            (let ((bcequal
                   (make-hash (hash->list bc))))
              (let ((a-value
                     (hash-ref
                      (hash-ref
                       bcequal
                       (string->symbol var)) 'value)))
                (list var a-value)))))))))


(define gd (lambda (data)

  (let ((jd (json->listvals data)))

    (define gg1
      (lambda (jd) (map list (car jd))))
    (define gg2
      (lambda (jd) (map list (car jd) (cadr jd))))
    (define gg3
      (lambda (jd)
        (map list (car jd) (cadr jd) (caddr jd))))
    (define gg4
      (lambda (jd)
        (map list
             (car jd) (cadr jd)
             (caddr jd) (cadddr jd))))

    (case (length (json->listvals data))
      [(1) (gg1 (json->listvals data))]
      [(2) (gg2 (json->listvals data))]
      [(3) (gg3 (json->listvals data))]
      [(4) (gg4 (json->listvals data))]
      [else
       (error "sparql queries with 1 to 4 vars")]))))


(define sparql-dbpedia
  (lambda (sparql)
    (gd (sparql-query->hash sparql))))

;; (sparql-dbpedia (sparql-dbpedia-person-uri "Steve Jobs"))
```
Let’s try an example in a Racket REPL:
```
> (sparql-dbpedia (sparql-dbpedia-person-uri "Steve Jobs"))
'((("personuri" "http://dbpedia.org/resource/Steve_Jobs")
   ("comment"
    "Steven Paul Jobs (February 24, 1955 – October 5, 2011) was an American
     entrepreneur, industrial designer, media proprietor, and investor. He was
     the co-founder, chairman, and CEO of Apple; the chairman and majority
     shareholder of Pixar; a member of The Walt Disney Company's board of
     directors following its acquisition of Pixar; and the founder, chairman,
     and CEO of NeXT. He is widely recognized as a pioneer of the personal
     computer revolution of the 1970s and 1980s, along with his early business
     partner and fellow Apple co-founder Steve Wozniak."))
  (("personuri" "http://dbpedia.org/resource/Steve_Jobs_(film)")
   ("comment"
    "Steve Jobs is a 2015 biographical drama film directed by Danny Boyle and
     written by Aaron Sorkin. A British-American co-production, it was adapted
     from the 2011 biography by Walter Isaacson and interviews conducted by
     Sorkin, and covers 14 years (1984–1998) in the life of Apple Inc.
     co-founder Steve Jobs. Jobs is portrayed by Michael Fassbender, with Kate
     Winslet as Joanna Hoffman and Seth Rogen, Katherine Waterston, Michael
     Stuhlbarg, and Jeff Daniels in supporting roles."))
  (("personuri" "http://dbpedia.org/resource/Steve_Jobs_(book)")
   ("comment"
    "Steve Jobs is the authorized self-titled biography of American business
     magnate and Apple co-founder Steve Jobs. The book was written at the
     request of Jobs by Walter Isaacson, a former executive at CNN and TIME who
     has written best-selling biographies of Benjamin Franklin and Albert
     Einstein. The book was released on October 24, 2011, by Simon & Schuster
     in the United States, 19 days after Jobs's death. A film adaptation
     written by Aaron Sorkin and directed by Danny Boyle, with Michael
     Fassbender starring in the title role, was released on October 9, 2015.")))
```
In practice, I start exploring data on DBPedia using the SPARQL query web app https://dbpedia.org/sparql. I experiment with different SPARQL queries for whatever application I am working on and then embed those queries in my Racket, Common Lisp, Clojure (link to read my Clojure AI book free online), and other programming languages I use.
In addition to using DBPedia I often also use the WikiData public Knowledge Graph and local RDF data stores hosted on my laptop with Apache Jena. I might add examples for these two use cases in future versions of this live eBook.
Sqlite
Using SQLite in Racket is simple so we will just look at a short example. We will be using the Racket source file sqlite.rkt in the directory Racket-AI-book-code/misc_code for the code snippets in this REPL:
```
$ racket
Welcome to Racket v8.10 [cs].
> (require db)
> (require sqlite-table)
> (define db-file "test.db")
> (define db (sqlite3-connect #:database db-file #:mode 'create))
> (query-exec db
   "create temporary table the_numbers (n integer, d varchar(20))")
> (query-exec db
   "create table person (name varchar(30), age integer, email varchar(20))")
> (query-exec db
   "insert into person values ('Mary', 34, 'mary@test.com')")
> (query-rows db "select * from person")
'(#("Mary" 34 "mary@test.com"))
>
```
Here we see how to interact with a SQLite database using the db and sqlite-table libraries in Racket. The sqlite3-connect function is used to connect to the SQLite database specified by the string value of db-file. The #:mode ‘create keyword argument indicates that a new database should be created if it doesn’t already exist. The database connection object is bound to the identifier db.
The first query-exec call creates a temporary table named the_numbers; the second creates a permanent table named person with three columns: name of type varchar(30), age of type integer, and email of type varchar(20). The next query-exec call inserts a new row into the person table with the values ‘Mary’, 34, and ‘mary@test.com’. There is a function query, not used here, that also returns the types of the columns returned by a query. We use the alternative function query-rows that returns only the query results with no type information.
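Continuing the REPL session above, query-rows works the same way with a WHERE clause (this follow-up query is a hypothetical extension of the example):

```racket
> (query-rows db "select name, email from person where age > 30")
'(#("Mary" "mary@test.com"))
```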
Implementing a Simple RDF Datastore With Partial SPARQL Support in Racket
This chapter explains a Racket implementation of a simple RDF (Resource Description Framework) datastore with partial SPARQL (SPARQL Protocol and RDF Query Language) support. We’ll cover the core RDF data structures, query parsing and execution, helper functions, and the main function with example queries. The file rdf_sparql.rkt can be found online at https://github.com/mark-watson/Racket-AI-book/source-code/simple_RDF_SPARQL.
Before looking at the code we look at sample use and output. The test function demonstrates the usage of the RDF datastore and SPARQL query execution:
```racket
(define (test)
  (set! rdf-store '())

  (add-triple "John" "age" "30")
  (add-triple "John" "likes" "pizza")
  (add-triple "Mary" "age" "25")
  (add-triple "Mary" "likes" "sushi")
  (add-triple "Bob" "age" "35")
  (add-triple "Bob" "likes" "burger")

  (print-all-triples)

  (define (print-query-results query-string)
    (printf "Query: ~a\n" query-string)
    (let ([results (execute-sparql-query query-string)])
      (printf "Final Results:\n")
      (if (null? results)
          (printf "  No results\n")
          (for ([result results])
            (printf "  ~a\n"
                    (string-join
                     (map (lambda (pair)
                            (format "~a: ~a" (car pair) (cdr pair)))
                          result)
                     ", "))))
      (printf "\n")))

  (print-query-results "select * where { ?name age ?age . ?name likes ?food }")
  (print-query-results "select ?s ?o where { ?s likes ?o }")
  (print-query-results "select * where { ?name age ?age . ?name likes pizza }"))
```
The test function:
- Initializes the RDF store with sample data.
- Prints all triples in the datastore.
- Defines a `print-query-results` function to execute and display query results.
- Executes three example SPARQL queries:
  - Query all name-age-food combinations.
  - Query all subject-object pairs for the “likes” predicate.
  - Query all people who like pizza and their ages.
Function test generates this output:
```
All triples in the datastore:
Bob likes burger
Bob age 35
Mary likes sushi
Mary age 25
John likes pizza
John age 30

Query: select * where { ?name age ?age . ?name likes ?food }
Final Results:
  ?age: 35, ?name: Bob, ?food: burger
  ?age: 25, ?name: Mary, ?food: sushi
  ?age: 30, ?name: John, ?food: pizza

Query: select ?s ?o where { ?s likes ?o }
Final Results:
  ?s: Bob, ?o: burger
  ?s: Mary, ?o: sushi
  ?s: John, ?o: pizza

Query: select * where { ?name age ?age . ?name likes pizza }
Final Results:
  ?age: 30, ?name: John
```
1. Core RDF Data Structures and Basic Operations
There are two parts to this example in file rdf_sparql.rkt: a simple unindexed RDF datastore and a partial SPARQL query implementation that supports compound where clause matches like: select * where { ?name age ?age . ?name likes pizza }.
1.1 RDF Triple Structure
The foundation of our RDF datastore is the `triple` structure:
```racket
(struct triple (subject predicate object) #:transparent)
```
This structure represents an RDF triple, consisting of a subject, predicate, and object. The `#:transparent` keyword makes the structure’s fields accessible for easier debugging and printing.
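Because of #:transparent, a triple instance prints with its field values visible, as this REPL sketch shows (an opaque struct would print as #&lt;triple&gt; instead):

```racket
> (triple "John" "likes" "pizza")
(triple "John" "likes" "pizza")
```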
1.2 RDF Datastore
The RDF datastore is implemented as a simple list:
```racket
(define rdf-store '())
```
1.3 Basic Operations
Two fundamental operations are defined for the datastore:
- Adding a triple:
```racket
(define (add-triple subject predicate object)
  (set! rdf-store (cons (triple subject predicate object) rdf-store)))
```
- Removing a triple:
```racket
(define (remove-triple subject predicate object)
  (set! rdf-store
        (filter (lambda (t)
                  (not (and (equal? (triple-subject t) subject)
                            (equal? (triple-predicate t) predicate)
                            (equal? (triple-object t) object))))
                rdf-store)))
```
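A quick sanity check of the two operations, starting from an empty store (this check is my sketch, not part of the book’s test code):

```racket
(add-triple "John" "age" "30")
(length rdf-store)                 ; => 1
(remove-triple "John" "age" "30")
(length rdf-store)                 ; => 0
```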
2. Query Parsing and Execution
2.1 SPARQL Query Structure
A simple SPARQL query is represented by the `sparql-query` structure:
```racket
(struct sparql-query (select-vars where-patterns) #:transparent)
```
2.2 Query Parsing
The `parse-sparql-query` function takes a query string and converts it into a `sparql-query` structure:
```racket
(define (parse-sparql-query query-string)
  (define tokens
    (filter (lambda (token) (not (member token '("{" "}") string=?)))
            (split-string query-string)))
  (define select-index (index-of tokens "select" string-ci=?))
  (define where-index (index-of tokens "where" string-ci=?))
  (define (sublist lst start end)
    (take (drop lst start) (- end start)))
  (define select-vars (sublist tokens (add1 select-index) where-index))
  (define where-clause (drop tokens (add1 where-index)))
  (define where-patterns (parse-where-patterns where-clause))
  (sparql-query select-vars where-patterns))
```
2.3 Query Execution
The main query execution function is `execute-sparql-query`:
```racket
(define (execute-sparql-query query-string)
  (let* ([query (parse-sparql-query query-string)]
         [where-patterns (sparql-query-where-patterns query)]
         [select-vars (sparql-query-select-vars query)]
         [results (execute-where-patterns where-patterns)]
         [projected-results (project-results results select-vars)])
    projected-results))
```
This function parses the query, executes the WHERE patterns, and projects the results based on the SELECT variables.
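Assuming the sample triples added by the test function shown earlier are in the store, the function can also be called directly; each result is a list of (variable . value) pairs:

```racket
(execute-sparql-query "select ?s ?o where { ?s likes ?o }")
;; => a list of bindings, one per matching triple: ?s bound to "Bob"
;;    with ?o bound to "burger", and so on for Mary and John
```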
3. Helper Functions and Utilities
Several helper functions are implemented to support query execution:
- `variable?`: Checks if a string is a SPARQL variable (starts with ‘?’).
- `triple-to-binding`: Converts a triple to a binding based on a pattern.
- `query-triples`: Filters triples based on a given pattern.
- `apply-bindings`: Applies bindings to a pattern.
- `merge-bindings`: Merges two sets of bindings.
- `project-results`: Projects the final results based on the SELECT variables.
```racket
(define (variable? str)
  (and (string? str)
       (> (string-length str) 0)
       (char=? (string-ref str 0) #\?)))

(define (triple-to-binding t [pattern #f])
  (define binding '())
  (when (and pattern (variable? (first pattern)))
    (set! binding (cons (cons (first pattern) (triple-subject t)) binding)))
  (when (and pattern (variable? (second pattern)))
    (set! binding (cons (cons (second pattern) (triple-predicate t)) binding)))
  (when (and pattern (variable? (third pattern)))
    (set! binding (cons (cons (third pattern) (triple-object t)) binding)))
  binding)

(define (query-triples subject predicate object)
  (filter
   (lambda (t)
     (and
      (or (not subject) (variable? subject)
          (equal? (triple-subject t) subject))
      (or (not predicate) (variable? predicate)
          (equal? (triple-predicate t) predicate))
      (or (not object) (variable? object)
          (equal? (triple-object t) object))))
   rdf-store))

(define (apply-bindings pattern bindings)
  (map (lambda (item)
         (if (variable? item)
             (or (dict-ref bindings item #f) item)
             item))
       pattern))

(define (merge-bindings binding1 binding2)
  (append binding1 binding2))

(define (project-results results select-vars)
  (if (equal? select-vars '("*"))
      (map remove-duplicate-bindings results)
      (map (lambda (result)
             (remove-duplicate-bindings
              (map (lambda (var)
                     (cons var (dict-ref result var #f)))
                   select-vars)))
           results)))
```
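The two binding helpers are easiest to understand in isolation. A quick sketch (the association lists below are illustrative):

```racket
;; Substitute any bound variables into a triple pattern:
(apply-bindings '("?s" "knows" "?o")
                '(("?s" . "mary")))
;; → '("mary" "knows" "?o")

;; Combining two sets of bindings is a simple append:
(merge-bindings '(("?s" . "mary")) '(("?o" . "harry")))
;; → '(("?s" . "mary") ("?o" . "harry"))
```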
Conclusion
This implementation provides a basic framework for an RDF datastore with partial SPARQL support in Racket. While it lacks many features of a full-fledged RDF database and SPARQL engine, it demonstrates the core concepts and can serve as a starting point for more complex implementations. The code is simple and fun to experiment with.
Web Scraping
I often write software to automatically collect and use data from the web and other sources. As a practical matter, much of the data that many people use for machine learning comes from either the web or from internal data sources. This section provides some guidance and examples for getting text data from the web.
Before we start a technical discussion about web scraping I want to point out that much of the information on the web is copyrighted, so first you should read the terms of service for web sites to ensure that your use of “scraped” or “spidered” data conforms with the wishes of the persons or organizations who own the content and pay to run the scraped web sites.
We start with low-level Racket code examples in the GitHub repository for this book in the directory Racket-AI-book-code/misc_code. We will then implement a standalone library in the directory Racket-AI-book-code/webscrape.
Getting Started Web Scraping
All of the examples in this section can be found in the Racket code snippet files in the directory Racket-AI-book-code/misc_code.
I have edited the output for brevity in the following REPL output:
```
$ racket
Welcome to Racket v8.10 [cs].
> (require net/http-easy)
> (require html-parsing)
> (require net/url xml xml/path)
> (require racket/pretty)
> (define res-stream
    (get "https://markwatson.com" #:stream? #t))
> res-stream
#<response>
> (define lst
    (html->xexp (response-output res-stream)))
> lst
'(*TOP*
  (*DECL* DOCTYPE html)
  "\n"
  (html
   (@ (lang "en-us"))
   "\n"
   " "
   (head
    (title
     "Mark Watson: AI Practitioner and Author of 20+ AI Books | Mark Watson")
  ...
```
Different element types are html, head, p, h1, h2, etc. If you are familiar with XPATH operations for XML data, then the function se-path*/list will make more sense to you. The function se-path*/list takes a list of element types and recursively searches an input s-expression for lists starting with one of the target element types. In the following example we extract all elements of type p:
```
> (se-path*/list '(p) lst) ;; get text from all p elements
'("My customer list includes: Google, Capital One, Babylist, Olive AI, CompassLabs, Mind AI, Disney, SAIC, Americast, PacBell, CastTV, Lutris Technology, Arctan Group, Sitescout.com, Embed.ly, and Webmind Corporation."
  "I have worked in the fields of general\n"
  " artificial intelligence, machine learning, semantic web and linked data, and\n"
  " natural language processing since 1982."
  "My eBooks are available to read for FREE or you can purchase them at "
  (a (@ (href "https://leanpub.com/u/markwatson")) "leanpub")
  ...
```
```
> (define lst-p (se-path*/list '(p) lst))
> (filter (lambda (s) (string? s)) lst-p) ;; keep only text strings
'("My customer list includes: Google, Capital One, Babylist, Olive AI, CompassLabs, Mind AI, Disney, SAIC, Americast, PacBell, CastTV, Lutris Technology, Arctan Group, Sitescout.com, Embed.ly, and Webmind Corporation."
  ...
```
```
> string-normalize-spaces
#<procedure:string-normalize-spaces>
> (string-normalize-spaces
   (string-join
    (filter (lambda (s) (string? s)) lst-p)
    "\n"))
"My customer list includes: Google, Capital One, Babylist, Olive AI, CompassLabs, Mind AI, Disney, SAIC, Americast, PacBell, CastTV, Lutris Technology, Arctan Group, Sitescout.com, Embed.ly, and Webmind Corporation. I have worked in the fields of general artificial intelligence, machine learning, semantic web and linked data, and natural language processing since 1982.
...
"
```
Now we will extract HTML anchor links:
```
> (se-path*/list '(href) lst) ;; get all links from HTML as a list
'("/index.css"
  "https://mark-watson.blogspot.com"
  "#fun"
  "#books"
  "#opensource"
  "https://markwatson.com/privacy.html"
  "https://leanpub.com/u/markwatson"
  "/nda.txt"
  "https://mastodon.social/@mark_watson"
  "https://twitter.com/mark_l_watson"
  "https://github.com/mark-watson"
  "https://www.linkedin.com/in/marklwatson/"
  "https://markwatson.com/index.rdf"
  "https://www.wikidata.org/wiki/Q18670263"
  ...
  )
```
Implementation of a Racket Web Scraping Library
The web scraping library listed below can be found in the directory Racket-AI-book-code/webscrape. The following listing of webscrape.rkt should look familiar after reading the code snippets in the previous section.
The provided Racket Scheme code defines three functions to interact with and process web resources: web-uri->xexp, web-uri->text, and web-uri->links.

web-uri->xexp:

- Requires three libraries: net/http-easy, html-parsing, and net/url xml xml/path.
- Given a URI (a-uri), it creates a stream (a-stream) using the get function from the net/http-easy library to fetch the contents of the URI.
- Converts the HTML content of the URI to an S-expression (xexp) using the html->xexp function from the html-parsing library.
- Closes the response stream using response-close! and returns the xexp.

web-uri->text:

- Calls web-uri->xexp to convert the URI content to an xexp.
- Utilizes se-path*/list from the xml/path library to extract all paragraph elements (p) from the xexp.
- Filters the paragraph elements to retain only strings (excluding nested tags or other structures).
- Joins these strings with a newline separator, normalizing spaces using string-normalize-spaces from the srfi/13 library.

web-uri->links:

- Similar to web-uri->text, it starts by converting URI content to an xexp.
- Utilizes se-path*/list to extract all href attributes from the xexp.
- Filters these href attributes to retain only those that are external links (those beginning with “http”).
In summary, these functions collectively enable the extraction and processing of HTML content from a specified URI, converting HTML to a more manageable S-expression format, and then extracting text and links as required.
```racket
#lang racket

(require net/http-easy)
(require html-parsing)
(require net/url xml xml/path)
(require srfi/13) ;; for strings

(define (web-uri->xexp a-uri)
  (let* ((a-stream
          (get a-uri #:stream? #t))
         (lst (html->xexp (response-output a-stream))))
    (response-close! a-stream)
    lst))

(define (web-uri->text a-uri)
  (let* ((a-xexp
          (web-uri->xexp a-uri))
         (p-elements (se-path*/list '(p) a-xexp))
         (lst-strings
          (filter
           (lambda (s) (string? s))
           p-elements)))
    (string-normalize-spaces
     (string-join lst-strings "\n"))))

(define (web-uri->links a-uri)
  (let* ((a-xexp
          (web-uri->xexp a-uri)))
    ;; we want only external links so filter out local links:
    (filter
     (lambda (s) (string-prefix? "http" s))
     (se-path*/list '(href) a-xexp))))
```
Here are a few examples in a Racket REPL (most output omitted for brevity):
```
> (web-uri->xexp "https://knowledgebooks.com")
'(*TOP*
  (*DECL* DOCTYPE html)
  "\n"
  (html
   "\n"
   "\n"
   (head
    "\n"
    " "
    (title "KnowledgeBooks.com - research on the Knowledge Management, and the Semantic Web ")
    "\n"
  ...

> (web-uri->text "https://knowledgebooks.com")
"With the experience of working on Machine Learning and Knowledge Graph applications
...

> (web-uri->links "https://knowledgebooks.com")
'("http://markwatson.com"
  "https://oceanprotocol.com/"
  "https://commoncrawl.org/"
  "http://markwatson.com/consulting/"
  "http://kbsportal.com")
```
If you want to install this library on your laptop as a linked package (so that requiring the library accesses a link to the source code in the directory Racket-AI-book-code/webscrape), run the following command in the library source directory Racket-AI-book-code/webscrape:

```
raco pkg install --scope user
```
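Once the package is installed, it can be required by name in any Racket program. A short sketch, assuming the package name is webscrape (taken from the source directory name):

```racket
(require webscrape)  ; assumed package name from the install step
(web-uri->text "https://markwatson.com")
```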
Using the OpenAI, Anthropic, Mistral, and Local Hugging Face Large Language Model APIs in Racket
Note: November 21, 2024 change: added examples using William J. Bowman’s Racket language llm to the end of this chapter.
As I write the first version of this chapter in October 2023, Peter Norvig and Blaise Agüera y Arcas just wrote an article Artificial General Intelligence Is Already Here making the case that we might already have Artificial General Intelligence (AGI) because of the capabilities of Large Language Models (LLMs) to solve new tasks.
In the development of practical AI systems, LLMs like those provided by OpenAI, Anthropic, and Hugging Face have emerged as pivotal tools for numerous applications including natural language processing, generation, and understanding. These models, powered by deep learning architectures, encapsulate a wealth of knowledge and computational capabilities. As a Racket Scheme enthusiast embarking on the journey of intertwining the elegance of Racket with the power of these modern language models, you are opening a gateway to a realm of possibilities that we begin to explore here.
The OpenAI and Anthropic APIs serve as gateways to some of the most advanced language models available today. By accessing these APIs, developers can harness the power of these models for a variety of applications. Here, we delve deeper into the distinctive features and capabilities that these APIs offer, which could be harnessed through a Racket interface.
OpenAI provides an API for developers to access models like GPT-4. The OpenAI API is designed with simplicity and ease of use in mind, making it a favorable choice for developers. It provides endpoints for different types of interactions, be it text completion, translation, or semantic search among others. We will use the completion API in this chapter. The robustness and versatility of the OpenAI API make it a valuable asset for anyone looking to integrate advanced language understanding and generation capabilities into their applications.
On the other hand, Anthropic is a newer entrant in the field but with a strong emphasis on building models that are not only powerful but also understandable and steerable. The Anthropic API serves as a portal to access their language models. While the detailed offerings and capabilities might evolve, the core ethos of Anthropic is to provide models that developers can interact with in a more intuitive and controlled manner. This aligns with a growing desire within the AI community for models that are not black boxes, but instead, offer a level of interpretability and control that makes them safer and more reliable to use in different contexts. We will use the Anthropic completion API.
What if you want the total control of running open LLMs on your own computers? The company Hugging Face maintains a huge repository of pre-trained models. Some of these models are licensed for research only but many are licensed (e.g., using Apache 2) for any commercial use. Many of the Hugging Face models are derived from Meta and other companies. We will use the llama.cpp server at the end of this chapter to run our own LLM on a laptop and access it via Racket code.
Lastly, this chapter will delve into practical examples showing the synergy between systems developed in Racket and the LLMs. Whether it’s automating creative writing, conducting semantic analysis, or building intelligent chatbots, the fusion of Racket with OpenAI, Anthropic, and Hugging Face’s LLMs provides many opportunities for you, dear reader, to write innovative software that utilizes the power of LLMs.
Introduction to Large Language Models
Large Language Models (LLMs) represent a huge advance in the evolution of artificial intelligence, particularly in the domain of natural language processing (NLP). They are trained on vast corpora of text data, learning to predict subsequent words in a sequence, which imbues them with the ability to generate human-like text, comprehend the semantics of language, and perform a variety of language-related tasks. The architecture of these models, typically based on deep learning paradigms such as Transformer, empowers them to encapsulate intricate patterns and relationships within language. These models are trained utilizing substantial computational resources.
The utility of LLMs extends across a broad spectrum of applications including but not limited to text generation, translation, summarization, question answering, and sentiment analysis. Their ability to understand and process natural language makes them indispensable tools in modern AI-driven solutions. However, with great power comes great responsibility. The deployment of LLMs raises imperative considerations regarding ethics, bias, and the potential for misuse. Moreover, the black-box nature of these models presents challenges in interpretability and control, which are active areas of research in the quest to make LLMs more understandable and safe. The advent of LLMs has undeniably propelled the field of NLP to new heights, yet the journey towards fully responsible and transparent utilization of these powerful models is an ongoing endeavor. I recommend reading material at Center for Humane Technology for issues of the safe use of AI. You might also be interested in a book I wrote in April 2023 Safe For Humans AI: A “humans-first” approach to designing and building AI systems (link for reading my book free online).
Using the OpenAI APIs in Racket
We will now have some fun using Racket Scheme and OpenAI’s APIs. The combination of Racket’s language features and programming environment with OpenAI’s linguistic models opens up many possibilities for developing sophisticated AI-driven applications.
Our goal is straightforward interaction with OpenAI’s APIs. The communication between your Racket code and OpenAI’s models is orchestrated through well-defined API requests and responses, allowing for a seamless exchange of data. The following sections will show the technical aspects of interfacing Racket with OpenAI’s APIs, showcasing how requests are formulated, transmitted, and how the JSON responses are handled. Whether your goal is to automate content generation, perform semantic analysis on text data, or build intelligent systems capable of engaging in natural language interactions, the code snippets and explanations provided will serve as a valuable resource in understanding and leveraging the power of AI through Racket and OpenAI’s APIs.
The Racket code listed below defines the functions question-openai and completion-openai, which interact with the OpenAI API to use the GPT-4o model for text generation. The shared helper function helper-openai accepts a prefix and a prompt argument and constructs a JSON payload following OpenAI’s chat model schema: a prompt-data string containing a user message consisting of the prefix (for example, “Answer the question”) followed by the provided prompt. The auth lambda function within helper-openai sets the necessary headers for the HTTP request, including the authorization header populated with the OpenAI API key obtained from the environment variable OPENAI_API_KEY. The function post from the net/http-easy library is employed to issue a POST request to the OpenAI API endpoint “https://api.openai.com/v1/chat/completions” with the crafted JSON payload and authentication headers. The response from the API is then parsed as JSON, and the content of the message from the first choice is extracted and returned.
The function completion-openai, on the other hand, serves the specific use case of continuing text from a given prompt. It prepends the phrase “Continue writing from the following text: ” to the provided text and then calls helper-openai with this modified prompt. This setup encapsulates the task of text continuation in a separate function, making it straightforward to request text extensions from the OpenAI API by merely providing the initial text. Through these functions, the code provides a structured mechanism to generate responses or text continuations.
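The listing below builds the JSON request body by hand with string-append, which breaks if the prompt itself contains quote characters. As a side note, here is a sketch of a safer alternative using Racket’s json library (this helper is my illustration, not part of the book’s code):

```racket
(require json)

;; Build the same chat payload, letting jsexpr->string handle all escaping:
(define (make-chat-payload prefix prompt)
  (jsexpr->string
   (hasheq 'model "gpt-4o"
           'messages (list (hasheq 'role "user"
                                   'content (string-append prefix ": " prompt))))))
```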
This example was updated May 13, 2024 when OpenAI released the new GPT-4o model.
```racket
#lang racket

(require net/http-easy)
(require racket/set)
(require racket/pretty)

(provide question-openai completion-openai embeddings-openai)

(define (helper-openai prefix prompt)
  (let* ((prompt-data
          (string-join
           (list
            (string-append
             "{\"messages\": [ {\"role\": \"user\","
             "\"content\": \"" prefix ": "
             prompt
             "\"}], \"model\": \"gpt-4o\"}"))))
         (auth (lambda (uri headers params)
                 (values
                  (hash-set*
                   headers
                   'authorization
                   (string-join
                    (list
                     "Bearer "
                     (getenv "OPENAI_API_KEY")))
                   'content-type "application/json")
                  params)))
         (p
          (post
           "https://api.openai.com/v1/chat/completions"
           #:auth auth
           #:data prompt-data))
         (r (response-json p)))
    ;;(pretty-print r)
    (hash-ref
     (hash-ref (first (hash-ref r 'choices)) 'message)
     'content)))

(define (question-openai prompt)
  (helper-openai "Answer the question: " prompt))

(define (completion-openai prompt)
  (helper-openai "Continue writing from the following text: "
                 prompt))

(define (embeddings-openai text)
  (let* ((prompt-data
          (string-join
           (list
            (string-append
             "{\"input\": \"" text "\","
             "\"model\": \"text-embedding-ada-002\"}"))))
         (auth (lambda (uri headers params)
                 (values
                  (hash-set*
                   headers
                   'authorization
                   (string-join
                    (list
                     "Bearer "
                     (getenv "OPENAI_API_KEY")))
                   'content-type "application/json")
                  params)))
         (p
          (post
           "https://api.openai.com/v1/embeddings"
           #:auth auth
           #:data prompt-data))
         (r (response-json p)))
    (hash-ref
     (first (hash-ref r 'data))
     'embedding)))
```
The output looks like (output from the second example shortened for brevity):
```
> (question "Mary is 30 and Harry is 25. Who is older?")
"Mary is older than Harry."
> (displayln
   (completion
    "Frank bought a new sports car. Frank drove"))
Frank bought a new sports car. Frank drove it out of the dealership with a wide grin on his face. The sleek, aerodynamic design of the car hugged the road as he accelerated, feeling the power under his hands. The adrenaline surged through his veins, and he couldn't help but let out a triumphant shout as he merged onto the highway.

As he cruised down the open road, the wind whipping through his hair, Frank couldn't help but reflect on how far he had come. It had been a lifelong dream of his to own a sports car, a symbol of success and freedom in his eyes. He had worked tirelessly, saving every penny, making sacrifices along the way to finally make this dream a reality.
...
>
```
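The listing also provides embeddings-openai, which is not exercised above. A brief sketch of calling it (requires OPENAI_API_KEY to be set; the 1536-element result length is the documented dimensionality of text-embedding-ada-002):

```racket
;; Returns a list of floats representing the input text:
(define emb (embeddings-openai "Racket is a Lisp dialect."))
(length emb)  ; 1536 for text-embedding-ada-002
```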
Using the Anthropic APIs in Racket
The Racket code listed below defines two functions, question and completion, which facilitate interaction with the Anthropic API to access a language model named claude-instant-1 for text generation purposes. The function question takes two arguments: a prompt and a max-tokens value, which are used to construct a JSON payload that will be sent to the Anthropic API. Inside the function, several Racket libraries are utilized for handling HTTP requests and processing data. A POST request is initiated to the Anthropic API endpoint “https://api.anthropic.com/v1/complete” with the crafted JSON payload. This payload includes the prompt text, maximum tokens to sample, and specifies the model to be used. The auth lambda function is used to inject necessary headers for authentication and specifying the API version. Upon receiving the response from the API, it extracts the completion field from the JSON response, trims any leading or trailing whitespace, and returns it.
The function completion is defined to provide a more specific use-case scenario, where it is intended to continue text from a given prompt. It also accepts a max-tokens argument to limit the length of the generated text. This function internally calls the function question with a modified prompt that instructs the model to continue writing from the provided text. By doing so, it encapsulates the common task of text continuation, making it easy to request text extensions by simply providing the initial text and desired maximum token count. Through these defined functions, the code offers a structured way to interact with the Anthropic API for generating text responses or completions in a Racket Scheme environment.
```racket
#lang racket

(require net/http-easy)
(require racket/set)
(require pprint)

(provide question completion)

(define (question prompt max-tokens)
  (let* ((prompt-data
          (string-join
           (list
            (string-append
             "{\"prompt\": \"\\n\\nHuman: "
             prompt
             "\\n\\nAssistant:\", \"max_tokens_to_sample\": "
             (number->string max-tokens)
             ", \"model\": \"claude-instant-1\"}"))))
         (auth (lambda (uri headers params)
                 (values
                  (hash-set*
                   headers
                   'x-api-key
                   (getenv "ANTHROPIC_API_KEY")
                   'anthropic-version "2023-06-01"
                   'content-type "application/json")
                  params)))
         (p
          (post
           "https://api.anthropic.com/v1/complete"
           #:auth auth
           #:data prompt-data))
         (r (response-json p)))
    (string-trim (hash-ref r 'completion))))

(define (completion prompt max-tokens)
  (question
   (string-append
    "Continue writing from the following text: "
    prompt)
   max-tokens))
```
We will try the same examples we used with OpenAI APIs in the previous section:
```
$ racket
> (require "anthropic.rkt")
> (question "Mary is 30 and Harry is 25. Who is older?" 20)
"Mary is older than Harry. Mary is 30 years old and Harry is 25 years old."
> (completion "Frank bought a new sports car. Frank drove" 200)
"Here is a possible continuation of the story:\n\nFrank bought a new sports car. Frank drove excitedly to show off his new purchase. The sleek red convertible turned heads as he cruised down the street with the top down. While stopping at a red light, Frank saw his neighbor Jane walking down the sidewalk. He pulled over and called out to her, \"Hey Jane, check out my new ride! Want to go for a spin?\" Jane smiled and said \"Wow that is one nice car! I'd love to go for a spin.\" She hopped in and they sped off down the road, the wind in their hair. Frank was thrilled to show off his new sports car and even more thrilled to share it with his beautiful neighbor Jane. Little did he know this joyride would be the beginning of something more between them."
>
```
While I usually use the OpenAI APIs, I always like to have alternatives when I am using 3rd party infrastructure, even for personal research projects. The Anthropic LLMs definitely have a different “feel” than the OpenAI models, and I enjoy using both.
Using a Local Hugging Face Llama2-13b-orca Model with Llama.cpp Server
Now we look at an approach to run LLMs locally on your own computers.
Diving into AI unveils many ways where modern language models play a pivotal role in bridging the gap between machines and human language. Among the many open and public models, I chose Hugging Face’s Llama2-13b-orca model because of its support for natural language processing tasks. To truly harness the potential of Llama2-13b-orca, an interface to Racket code is essential. This is where we use the Llama.cpp Server as a conduit between the local instance of the Hugging Face model and the applications that seek to utilize it. The combination of Llama2-13b-orca with the llama.cpp server code will meet our requirements for local deployment and ease of installation and use.
Installing and Running Llama.cpp server with a Llama2-13b-orca Model
The llama.cpp server acts as a conduit for translating REST API requests to the respective language model APIs. By setting up and running the llama.cpp server, a channel of communication is established, allowing Racket code to interact with these language models in a seamless manner. There is also a Python library to encapsulate running models inside a Python program (a subject I leave to my Python AI books).
I run the llama.cpp service easily on a M2 Mac with 16G of memory. Start by cloning the llama.cpp project and building it:
```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
mkdir models
```
Then get a model file from https://huggingface.co/TheBloke/OpenAssistant-Llama2-13B-Orca-8K-3319-GGUF and copy to ./models directory:
```
$ ls -lh models
8.6G openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf
```
Note that there are many different variations of this model that trade off quality for memory use. I am using one of the larger models. If you only have 8G of memory try a smaller model.
Run the REST server:
```
./server -m models/openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf -c 2048
```
We can test the REST server using the curl utility:
```
$ curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Answer the question: Mary is 30 years old and Sam is 25. Who is older and by how much?","n_predict": 128, "top_k": 1}'
{"content":"\nAnswer: Mary is older than Sam by 5 years.","generation_settings":
{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],
"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,
"model":"models/openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
"n_ctx":2048,"n_keep":0,"n_predict":128,"n_probs":0,"penalize_nl":true,
"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,
"seed":4294967295,"stop":[],"stream":false,"temp":0.800000011920929,
"tfs_z":1.0,"top_k":1,"top_p":0.949999988079071,"typical_p":1.0},
"model":"models/openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
"prompt":"Answer the question: Mary is 30 years old and Sam is 25. Who is older and by how much?",
"stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,
"stopping_word":"","timings":{"predicted_ms":960.595,"predicted_n":13,
"predicted_per_second":13.53327885321077,"predicted_per_token_ms":73.89192307692308,
"prompt_ms":539.3580000000001,"prompt_n":27,"prompt_per_second":50.05951520140611,
"prompt_per_token_ms":19.976222222222223},"tokens_cached":40,"tokens_evaluated":27,
"tokens_predicted":13,"truncated":false}
```
The important part of the output is:
```
"content":"\nAnswer: Mary is older than Sam by 5 years."
```
In the next section we will write a simple library to extract data from Llama.cpp server responses.
A Racket Library for Using a Local Llama.cpp server with a Llama2-13b-orca Model
The following Racket code is designed to interface with a local instance of a Llama.cpp server to interact with a language model for generating text completions. This setup is particularly beneficial when there’s a requirement to have a local language model server, reducing latency and ensuring data privacy. We start by requiring libraries for handling HTTP requests and responses. The functionality of this code is encapsulated in three functions: helper, question, and completion, each serving a unique purpose in the interaction with the Llama.cpp server.
The helper function provides common functionality, handling the core logic of constructing the HTTP request, sending it to the Llama.cpp server, and processing the response. It accepts a prompt argument which forms the basis of the request payload. A JSON string is constructed with three key fields: prompt, n_predict, and top_k, which respectively contain the text prompt, the number of tokens to generate, and a parameter to control the diversity of the generated text. A debug line with displayln
is used to output the constructed JSON payload to the console, aiding in troubleshooting. The function post is employed to send a POST request to the Llama.cpp server hosted locally on port 8080 at the /completion endpoint, with the constructed JSON payload as the request body. Upon receiving the response, it’s parsed into a Racket hash data structure, and the content field, which contains the generated text, is extracted and returned.
The question and completion functions serve as specialized interfaces to the helper function, crafting specific prompts aimed at answering a question and continuing a text, respectively. The question function prefixes the provided question text with “Answer: “ to guide the model’s response, while the completion function prefixes the provided text with a phrase instructing the model to continue from the given text. Both functions then pass these crafted prompts to the helper function, which in turn handles the interaction with the Llama.cpp server and extracts the generated text from the response.
The following code is in the file llama_local.rkt:
```racket
#lang racket

(require net/http-easy)
(require racket/set)
(require pprint)

(define (helper prompt)
  (let* ((prompt-data
          (string-join
           (list
            (string-append
             "{\"prompt\": \""
             prompt
             "\", \"n_predict\": 256, \"top_k\": 1}"))))
         (ignore (displayln prompt-data))
         (p
          (post
           "http://localhost:8080/completion"
           #:data prompt-data))
         (r (response-json p)))
    (hash-ref r 'content)))

(define (question question)
  (helper (string-append "Answer: " question)))

(define (completion prompt)
  (helper
   (string-append
    "Continue writing from the following text: "
    prompt)))
```
We can try this in a Racket REPL (output of the second example is edited for brevity):
```
> (question "Mary is 30 and Harry is 25. Who is older?")
{"prompt": "Answer: Mary is 30 and Harry is 25. Who is older?", "n_predict": 256, "top_k": 1}
"\nAnswer: Mary is older than Harry."
> (completion "Frank bought a new sports car. Frank drove")
{"prompt": "Continue writing from the following text: Frank bought a new sports car. Frank drove", "n_predict": 256, "top_k": 1}
" his new sports car to work every day. He was very happy with his new sports car. One day, while he was driving his new sports car, he saw a beautiful girl walking on the side of the road. He stopped his new sports car and asked her if she needed a ride. The beautiful girl said yes, so Frank gave her a ride in his new sports car. They talked about many things during the ride to work. When they arrived at work, Frank asked the beautiful girl for her phone number. She gave him her phone number, and he promised to call her later that day...."
> (question "Mary is 30 and Harry is 25. Who is older and by how much?")
{"prompt": "Answer: Mary is 30 and Harry is 25. Who is older and by how much?", "n_predict": 256, "top_k": 1}
"\nAnswer: Mary is older than Harry by 5 years."
>
```
Using a Local Mistral-7B Model with Ollama.ai
Now we look at another approach to run LLMs locally on your own computers. The Ollama.ai project supplies a simple-to-install application for macOS and Linux (Windows support expected soon). When you download and run the application, it will install a command line tool ollama that we use here.
Installing and Running Ollama.ai server with a Mistral-7B Model
The Mistral model is the best 7B LLM that I have used (as I write this chapter in October 2023). When you run the ollama command line tool it will download and cache for future use the requested model.
For example, the first time we run ollama requesting the mistral LLM, you see that it is downloading the model:
$ ollama run mistral
pulling manifest
pulling 6ae280299950... 100% |███████████████████████████████████████████████| (4.1/4.1 GB, 13 MB/s)
pulling fede2d8d6c1f... 100% |██████████████████████████████████████████████████████| (29/29 B, 20 B/s)
pulling b96850d2e482... 100% |███████████████████████████████████████████████████| (307/307 B, 170 B/s)
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Mary is 30 and Bill is 25. Who is older and by how much?
Mary is older than Bill by 5 years.

>>> /?
Available Commands:
  /set       Set session variables
  /show      Show model information
  /bye       Exit
  /?, /help  Help for a command

Use """ to begin a multi-line message.

>>>
When you run the ollama command line tool, it also runs a REST API server which we use later. The next time you run the mistral model, there is no download delay:
$ ollama run mistral
>>> ^D
$ ollama run mistral
>>> If I am driving between Sedona Arizona and San Diego, what sites should I visit as a tourist?

There are many great sites to visit when driving from Sedona, Arizona to San Diego. Here are some suggestions:

* Grand Canyon National Park - A must-see attraction in the area, the Grand Canyon is a massive and awe-inspiring natural wonder that offers countless opportunities for outdoor activities such as hiking, camping, and rafting.
* Yuma Territorial Prison State Historic Park - Located in Yuma, Arizona, this former prison was once the largest and most secure facility of its kind in the world. Today, visitors can explore the site and learn about its history through exhibits and guided tours.
* Joshua Tree National Park - A unique and otherworldly landscape in southern California, Joshua Tree National Park is home to a variety of natural wonders, including towering trees, giant boulders, and scenic trails for hiking and camping.
* La Jolla Cove - Located just north of San Diego, La Jolla Cove is a beautiful beach and tidal pool area that offers opportunities for snorkeling, kayaking, and exploring marine life.
* Balboa Park - A cultural and recreational hub in the heart of San Diego, Balboa Park is home to numerous museums, gardens, theaters, and other attractions that offer a glimpse into the city's history and culture.

>>>
While we use the mistral LLM here, there are many more available models listed in the GitHub repository for Ollama.ai: https://github.com/jmorganca/ollama.
A Racket Library for Using a Local Ollama.ai REST Server with a Mistral-7B Model
The example code in the file ollama_ai_local.rkt is very similar to the example code in the last section. The main changes are a different REST service URI and the format of the returned JSON response:
(require net/http-easy)
(require racket/set)
(require pprint)

(define (helper prompt)
  (let* ((prompt-data
          (string-join
           (list
            (string-append
             "{\"prompt\": \""
             prompt
             "\", \"model\": \"mistral\", \"stream\": false}"))))
         (ignore (displayln prompt-data))
         (p
          (post
           "http://localhost:11434/api/generate"
           #:data prompt-data))
         (r (response-json p)))
    (hash-ref r 'response)))

(define (question-ollama-ai-local question)
  (helper (string-append "Answer: " question)))

(define (completion-ollama-ai-local prompt)
  (helper
   (string-append
    "Continue writing from the following text: "
    prompt)))

;; EMBEDDINGS:

(define (embeddings-ollama text)
  (let* ((prompt-data
          (string-join
           (list
            (string-append
             "{\"prompt\": \"" text "\","
             "\"model\": \"mistral\"}"))))
         (p
          (post
           "http://localhost:11434/api/embeddings"
           #:data prompt-data))
         (r (response-json p)))
    (hash-ref r 'embedding)))

;; (embeddings-ollama "Here is an article about llamas...")
The function embeddings-ollama can be used to create embedding vectors from text input. Embeddings are used for chat with local documents, web sites, etc. We will run the same examples we used in the last section for comparison:
> (question "Mary is 30 and Harry is 25. Who is older and by how much?")
{"prompt": "Answer: Mary is 30 and Harry is 25. Who is older and by how much?", "model": "mistral", "stream": false}
"Answer: Mary is older than Harry by 5 years."
> (completion "Frank bought a new sports car. Frank drove")
{"prompt": "Continue writing from the following text: Frank bought a new sports car. Frank drove", "model": "mistral", "stream": false}
"Frank drove his new sports car around town, enjoying the sleek design and powerful engine. The car was a bright red, which caught the attention of everyone on the road. Frank couldn't help but smile as he cruised down the highway, feeling the wind in his hair and the sun on his face.\n\nAs he drove, Frank couldn't resist the urge to test out the car's speed and agility. He weaved through traffic, expertly maneuvering the car around curves and turns. The car handled perfectly, and Frank felt a rush of adrenaline as he pushed it to its limits.\n\nEventually, Frank found himself at a local track where he could put his new sports car to the test. He revved up the engine and took off down the straightaway, hitting top speeds in no time. The car handled like a dream, and Frank couldn't help but feel a sense of pride as he crossed the finish line.\n\nAfterwards, Frank parked his sports car and walked over to a nearby café to grab a cup of coffee. As he sat outside, sipping his drink and watching the other cars drive by, he couldn't help but think about how much he loved his new ride. It was the perfect addition to his collection of cars, and he knew he would be driving it for years to come."
>
While I often use larger and more capable proprietary LLMs like Claude 2.1 and GPT-4, smaller open models from Mistral are very capable and sufficient for most of my experiments embedding LLMs in application code. As I write this, you can run Mistral models locally and through commercially hosted APIs.
Examples Using William J. Bowman’s Racket Language LLM
I implemented the code in this chapter using REST API interfaces for LLM providers like OpenAI and Anthropic and also for running local models using Ollama.
Since I wrote my LLM client libraries, William J. Bowman wrote a very interesting new Racket language for LLMs that can be used with DrRacket’s language support for interactively experimenting with LLMs and alternatively used in Racket programs using the standard Racket language. I added three examples to the directory Racket-AI-book-code/racket_llm_language:
- test_lang_mode_llm_openai.rkt - uses #lang llm
- test_llm_openai.rkt - uses #lang racket
- test_llm_ollama.rkt - uses #lang racket
For the Ollama example, make sure you have Ollama installed and the phi3:latest model downloaded.
The documentation for the LLM language can be found here: https://docs.racket-lang.org/llm/index.html and the GitHub repository for the project can be found here: https://github.com/wilbowma/llm-lang.
LLM Language Example
In the listing of file test_lang_mode_llm_openai.rkt notice that Racket statements are escaped using @ and plain text is treated as a prompt to send to a LLM:
#lang llm

@(require llm/openai/gpt4o-mini)

What is 13 + 7?
Evaluating this in a DrRacket buffer produces output like this:
Welcome to DrRacket, version 8.12 [cs].
Language: llm, with debugging; memory limit: 128 MB.
13 + 7 equals 20.
> What is 66 + 2?
66 + 2 equals 68.
> What is the radius of the moon?
The average radius of the Moon is approximately 1,737.4 kilometers (about 1,079.6 miles).
>
This makes a DrRacket edit buffer a convenient way to experiment with models. Also, once the example buffer is loaded, the DrRacket REPL can be used to enter LLM prompts since the REPL is also using #lang llm.
Using the LLM Language as a Library Using the Standard Racket Language Mode
Here we look at examples for accessing the OpenAI gpt4o-mini model and the phi3 model running locally on your laptop using Ollama.
Install the llm package:
raco pkg install llm
Here is the example file test_llm_openai.rkt:
#lang racket

(require llm/openai/gpt4o-mini)

(gpt4o-mini-send-prompt! "What is 13 + 7?" '())
The output looks like this:
Welcome to DrRacket, version 8.12 [cs].
Language: racket, with debugging; memory limit: 128 MB.
"13 + 7 equals 20."
>
This is a simple way to use the OpenAI gpt4o-mini model in your Racket programs. A similar example supports local models running on Ollama; here is the example file test_llm_ollama.rkt:
#lang racket

(require llm/ollama/phi3)

(phi3-send-prompt! "What is 13 + 7? Be concise." '())
That generates the output text:
Welcome to DrRacket, version 8.12 [cs].
Language: racket, with debugging; memory limit: 128 MB.
"20."
> (phi3-send-prompt! "Mary is 37 years old, Bill is 28, and Sam is 52. List the pairwise age differences. Be concise." '())
"- Mary vs Bill: 9 years (37 - 28)\n\n- Mary vs Sam: 15 years (37 - 52)\n\n- Bill vs Sam: 24 years (52 - 28)"
> (display (phi3-send-prompt! "Mary is 37 years old, Bill is 28, and Sam is 52. List the pairwise age differences. Be concise." '()))
- Mary vs Bill: 9 years (37 - 28)

- Mary vs Sam: 15 years (37 - 52)

- Bill vs Sam: 24 years (52 - 28)
>
Here I entered more examples in the DrRacket REPL.
For general work and experimentation with LLMs I like the flexibility of using my own Racket LLM client code, but the llm package makes it simple to experiment with prompts, and if you only need to generate text from a prompt, the llm package lets you do so in just two lines of Racket code using the standard #lang racket language.
Retrieval Augmented Generation of Text Using Embeddings
Retrieval-Augmented Generation (RAG) is a framework that combines the strengths of pre-trained large language models (LLMs) with retrievers. Retrievers are system components for accessing knowledge from external sources of text data. In RAG a retriever selects relevant documents or passages from a corpus, and a generator produces a response based on both the retrieved information and the input query. The process typically follows these steps that we will use in the example Racket code:
- Query Encoding: The input query is encoded into a vector representation.
- Document Retrieval: A retriever system uses the query representation to fetch relevant documents or passages from an external corpus.
- Document Encoding: The retrieved documents are encoded into vector representations.
- Joint Encoding: The query and document representations are combined, often concatenated or mixed via attention mechanisms.
- Generation: A generator, usually an LLM, is used to produce a response based on the joint representation.
RAG enables the LLM to access and leverage external text data sources, which is crucial for tasks that require information beyond what the LLM has been trained on. It’s a blend of retrieval-based and generation-based approaches, aimed at boosting the factual accuracy and informativeness of generated responses.
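The steps above can be sketched as a tiny pipeline. Everything in this sketch is a stand-in: encode is a toy word-overlap "embedding", corpus is an in-memory list, and generate simply returns its prompt where a real system would call an LLM:

```racket
#lang racket
;; Sketch of the RAG control flow with hypothetical stub components.

(define (encode text)              ; stand-in "embedding": word list
  (map string-downcase (string-split text)))

(define (score query-vec doc-vec)  ; overlap count as a toy similarity
  (length (filter (lambda (w) (member w doc-vec)) query-vec)))

(define corpus
  '("Racket is a Scheme dialect" "Llamas are pack animals"))

(define (retrieve query k)         ; rank corpus by similarity, keep top k
  (take (sort corpus >
              #:key (lambda (doc) (score (encode query) (encode doc))))
        k))

(define (generate prompt) prompt)  ; a real system would call an LLM here

(define (rag query)
  (generate (string-append "Context: "
                           (string-join (retrieve query 1) " ")
                           " Question: " query)))

(rag "is Racket a Scheme dialect")
```

The full example later in this chapter replaces each stub with a real component: OpenAI embeddings for encoding, a SQLite table for the corpus, and a GPT completion call for generation.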
Example Implementation
In the following short Racket example program (file Racket-AI-book-code/embeddingsdb/embeddingsdb.rkt) I implement some ideas of a RAG architecture. At file load time the text files in the subdirectory data are read, split into “chunks”, and each chunk along with its parent file name and OpenAI text embedding is stored in a local SQLite database. When a user enters a query, the OpenAI embedding is calculated, and this embedding is matched against the embeddings of all chunks using the dot product of two 1536 element embedding vectors. The “best” chunks are concatenated together and this “context” text is passed to GPT-4 along with the user’s original query. Here I describe the code in more detail:
The provided Racket code uses a local SQLite database and OpenAI’s APIs for calculating text embeddings and for text completions.
Utility Functions:
- floats->string and string->floats are utility functions for converting between a list of floats and its string representation.
- read-file reads a file’s content.
- join-strings joins a list of strings with a specified separator.
- truncate-string truncates a string to a specified length.
- interleave merges two lists by interleaving their elements.
- break-into-chunks breaks a text into chunks of a specified size.
- string-to-list and decode-row are utility functions for parsing and processing database rows.
Database Setup:
- Database connection is established to “test.db” and a table named “documents” is created with columns for document_path, content, and embedding.
Document Management:
- insert-document inserts a document and its associated information into the database.
- get-document-by-document-path and all-documents are utility functions for querying documents from the database.
- create-document reads a document from a file path, breaks it into chunks, computes embeddings for each chunk via the function embeddings-openai, and inserts these into the database.
Semantic Matching and Interaction:
- execute-to-list and dot-product are utility functions for database queries and vector operations.
- semantic-match performs a semantic search by calculating the dot product of the embeddings of the query and of each document in the database. It then aggregates the contexts of documents with a similarity score above a certain threshold and sends a new query constructed from these contexts to OpenAI for further processing.
- QA is a wrapper around semantic-match for querying.
- CHAT initiates a loop for user interaction where each user input is processed through semantic-match to generate a response, maintaining a context of the previous chat.
Test Code:
- The test function creates documents by reading from specified file paths and performs some queries using the QA function.
The code uses a local SQLite database to store and manage document embeddings and the OpenAI API for generating embeddings and performing semantic searches based on user queries. Two functions are exported in case you want to use this example as a library: create-document and QA. Note: in the test code at the bottom of the listing, change the absolute path to reflect where you cloned the GitHub repository for this book.
#lang racket

(require db)
(require llmapis)

(provide create-document QA)

; Function to convert list of floats to string representation
(define (floats->string floats)
  (string-join (map number->string floats) " "))

; Function to convert string representation back to list of floats
(define (string->floats str)
  (map string->number (string-split str)))

(define (read-file infile)
  (with-input-from-file infile
    (lambda ()
      (let ((contents (read)))
        contents))))

(define (join-strings separator list)
  (string-join list separator))

(define (truncate-string string length)
  (substring string 0 (min length (string-length string))))

(define (interleave list1 list2)
  (if (or (null? list1) (null? list2))
      (append list1 list2)
      (cons (car list1)
            (cons (car list2)
                  (interleave (cdr list1) (cdr list2))))))

(define (break-into-chunks text chunk-size)
  (let loop ((start 0) (chunks '()))
    (if (>= start (string-length text))
        (reverse chunks)
        (loop (+ start chunk-size)
              (cons (substring text start
                               (min (+ start chunk-size)
                                    (string-length text)))
                    chunks)))))

(define (string-to-list str)
  (map string->number (string-split str)))

(define (decode-row row)
  (let ((id (vector-ref row 0))
        (context (vector-ref row 1))
        (embedding
         (string-to-list
          (read-line (open-input-string (vector-ref row 2))))))
    (list id context embedding)))

(define
  db
  (sqlite3-connect #:database "test.db" #:mode 'create
                   #:use-place #t))

(with-handlers ([exn:fail? (lambda (ex) (void))])
  (query-exec
   db
   "CREATE TABLE documents (document_path TEXT, content TEXT, embedding TEXT);"))

(define (insert-document document-path content embedding)
  (printf "~%insert-document:~% content:~a~%~%" content)
  (query-exec
   db
   "INSERT INTO documents (document_path, content, embedding) VALUES (?, ?, ?);"
   document-path content (floats->string embedding)))

(define (get-document-by-document-path document-path)
  (map decode-row
       (query-rows
        db
        "SELECT * FROM documents WHERE document_path = ?;"
        document-path)))

(define (all-documents)
  (map
   decode-row
   (query-rows
    db
    "SELECT * FROM documents;")))

(define (create-document fpath)
  (let ((contents (break-into-chunks (file->string fpath) 200)))
    (for-each
     (lambda (content)
       (with-handlers ([exn:fail? (lambda (ex) (void))])
         (let ((embedding (embeddings-openai content)))
           (insert-document fpath content embedding))))
     contents)))

;; Assuming a function to fetch documents from database
(define (execute-to-list db query)
  (query-rows db query))

;; dot product of two lists of floating point numbers:
(define (dot-product a b)
  (cond
    [(or (null? a) (null? b)) 0]
    [else
     (+ (* (car a) (car b))
        (dot-product (cdr a) (cdr b)))]))

(define (semantic-match query custom-context [cutoff 0.7])
  (let ((emb (embeddings-openai query))
        (ret '()))
    (for-each
     (lambda (doc)
       (let* ((context (second doc))
              (embedding (third doc))
              (score (dot-product emb embedding)))
         (when (> score cutoff)
           (set! ret (cons context ret)))))
     (all-documents))
    (printf "~%semantic-search: ret=~a~%" ret)
    (let* ((context (string-join (reverse ret) " . "))
           (query-with-context
            (string-join (list context custom-context "Question:" query)
                         " ")))
      (question-openai query-with-context))))

(define (QA query [quiet #f])
  (let ((answer (semantic-match query "")))
    (unless quiet
      (printf "~%~%** query: ~a~%** answer: ~a~%~%" query answer))
    answer))

(define (CHAT)
  (let ((messages '(""))
        (responses '("")))
    (let loop ()
      (printf "~%Enter chat (STOP or empty line to stop) >> ")
      (let ((string (read-line)))
        (cond
          ((or (string=? string "STOP")
               (< (string-length string) 1))
           (list (reverse messages) (reverse responses)))
          (else
           (let* ((custom-context
                   (string-append
                    "PREVIOUS CHAT: "
                    (string-join (reverse messages) " ")))
                  (response (semantic-match string custom-context)))
             (set! messages (cons string messages))
             (set! responses (cons response responses))
             (printf "~%Response: ~a~%" response)
             (loop))))))))

;; ... test code ...

(define (test)
  "Test Semantic Document Search Using GPT APIs and local vector database"
  (create-document
   "/Users/markw/GITHUB/Racket-AI-book-code/embeddingsdb/data/sports.txt")
  (create-document
   "/Users/markw/GITHUB/Racket-AI-book-code/embeddingsdb/data/chemistry.txt")
  (QA "What is the history of the science of chemistry?")
  (QA "What are the advantages of engaging in sports?"))
Let’s look at a few examples from a Racket REPL:
> (QA "What is the history of the science of chemistry?")

** query: What is the history of the science of chemistry?
** answer: The history of the science of chemistry dates back thousands of years. Ancient civilizations such as the Egyptians, Greeks, and Chinese were experimenting with various substances and observing chemical reactions even before the term "chemistry" was coined.

The foundations of modern chemistry can be traced back to the works of famous scholars such as alchemists in the Middle Ages. Alchemists sought to transform common metals into gold and discover elixirs of eternal life. Although their practices were often based on mysticism and folklore, it laid the groundwork for the understanding of chemical processes and experimentation.

In the 17th and 18th centuries, significant advancements were made in the field of chemistry. Prominent figures like Robert Boyle and Antoine Lavoisier began to understand the fundamental principles of chemical reactions and the concept of elements. Lavoisier is often referred to as the "father of modern chemistry" for his work in establishing the law of conservation of mass and naming and categorizing elements.

Throughout the 19th and 20th centuries, chemistry continued to progress rapidly. The development of the periodic table by Dmitri Mendeleev in 1869 revolutionized the organization of elements. The discovery of new elements, the formulation of atomic theory, and the understanding of chemical bonding further expanded our knowledge.

Chemistry also played a crucial role in various industries and technologies, such as the development of synthetic dyes, pharmaceuticals, plastics, and materials. The emergence of quantum mechanics and spectroscopy in the early 20th century opened up new avenues for understanding the behavior of atoms and molecules.

Today, chemistry is an interdisciplinary science that encompasses various fields such as organic chemistry, inorganic chemistry, physical chemistry, analytical chemistry, and biochemistry. It continues to evolve and make significant contributions to society, from developing sustainable materials to understanding biological processes and addressing global challenges such as climate change.

In summary, the history of the science of chemistry spans centuries, starting from ancient civilizations to the present day, with numerous discoveries and advancements shaping our understanding of the composition, properties, and transformations of matter.
This output is the combination of data found in the text files in the directory Racket-AI-book-code/embeddingsdb/data and the data that OpenAI GPT-4 was trained on. Since the local “document” file chemistry.txt is very short, most of this output is derived from the innate knowledge GPT-4 has from its training data.
In order to show that this example is also using data in the local “document” text files, I manually edited the file data/chemistry.txt adding the following made-up organic compound:
ZorroOnian Alcohol is another organic compound with the formula C6H10O.
GPT-4 was never trained on my made-up data so it has no idea what the non-existent compound ZorroOnian Alcohol is. The following answer is retrieved via RAG from the local document data (for brevity, most of the output for adding the local document files to the embedding index is not shown):
> (create-document
   "/Users/markw/GITHUB/Racket-AI-book-code/embeddingsdb/data/chemistry.txt")

insert-document:
 content:Amyl alcohol is an organic compound with the formula C5H12O. ZorroOnian Alcohol is another organic compound with the formula C6H10O. All eight isomers of amyl alcohol are known.

...

> (QA "what is the formula for ZorroOnian Alcohol")

** query: what is the formula for ZorroOnian Alcohol
** answer: The formula for ZorroOnian Alcohol is C6H10O.
There is also a chat interface:
Enter chat (STOP or empty line to stop) >> who is the chemist Robert Boyle

Response: Robert Boyle was an Irish chemist and physicist who is known as one of the pioneers of modern chemistry. He is famous for Boyle's Law, which describes the inverse relationship between the pressure and volume of a gas, and for his experiments on the properties of gases. He lived from 1627 to 1691.

Enter chat (STOP or empty line to stop) >> Where was he born?

Response: Robert Boyle was born in Lismore Castle, County Waterford, Ireland.

Enter chat (STOP or empty line to stop) >>
Retrieval Augmented Generation Wrap Up
Retrieval Augmented Generation (RAG) is one of the best use cases for semantic search. Another way to write RAG applications is to use a web search API to get context text for a query, and add this context data to whatever context data you have in a local embeddings data store.
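However the context chunks are obtained, ranking them against a query comes down to vector similarity. The example code in this chapter uses a raw dot product, which works because OpenAI embedding vectors are normalized to unit length; for arbitrary vectors the general form is cosine similarity. A minimal sketch, independent of any embeddings API:

```racket
#lang racket
;; Cosine similarity between two equal-length numeric vectors.
;; For unit-length embedding vectors this reduces to the dot product.

(define (dot a b) (for/sum ([x a] [y b]) (* x y)))

(define (magnitude v) (sqrt (dot v v)))

(define (cosine-similarity a b)
  (/ (dot a b) (* (magnitude a) (magnitude b))))

(cosine-similarity '(3.0 4.0) '(3.0 4.0)) ; same direction => 1.0
(cosine-similarity '(1.0 0.0) '(0.0 1.0)) ; orthogonal => 0.0
```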
Natural Language Processing
I have a Natural Language Processing (NLP) library that I wrote in Common Lisp. Here we will use code that I wrote in pure Scheme and converted to Racket.
The NLP library is still a work in progress so please check for future updates to this live eBook.
Since we will use the example code in this chapter as a library we start by defining a main.rkt file:
1
#lang
racket/base
2
3
(
require
"fasttag.rkt"
)
4
(
require
"names.rkt"
)
5
6
(
provide
parts-of-speech
)
7
(
provide
find-human-names
)
8
(
provide
find-place-names
)
There are two main source files for the NLP library: fasttag.rkt and names.rkt.
The following listing of fasttag.rkt is a conversion of original code I wrote in Java and later translated to Common Lisp. The provided Racket Scheme code is designed to perform part-of-speech tagging for a given list of words. The code begins by loading a hash table (lex-hash) from a data file (“data/tag.dat”), where each key-value pair maps a word to its possible part of speech. Then it defines several helper functions and transformation rules for categorizing words based on various syntactic and morphological criteria.
The core function, parts-of-speech, takes a vector of words and returns a vector of corresponding parts of speech. Inside this function, a number of rules are applied to each word in the list to refine its part of speech based on both its individual characteristics and its context within the list. For instance, Rule 1 changes the part of speech to “NN” (noun) if the previous word is “DT” (determiner) and the current word is categorized as a verb form (“VBD”, “VBP”, or “VB”). Rule 2 changes a word to a cardinal number (“CD”) if it contains a period, and so on. The function applies these rules in sequence, updating the part of speech for each word accordingly.
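As an isolated illustration of this style of rule, here is Rule 1 written as a standalone function over a previous tag and a candidate tag (a sketch for exposition, separate from the book's tagger):

```racket
#lang racket
;; Sketch of transformation Rule 1: a verb tag that follows a
;; determiner ("DT") is retagged as a noun ("NN"), so that a phrase
;; like "the run" is tagged DT NN rather than DT VB.

(define (apply-rule-1 prev-tag tag)
  (if (and (equal? prev-tag "DT")
           (member tag '("VBD" "VBP" "VB")))
      "NN"
      tag))

(apply-rule-1 "DT" "VB") ; => "NN"
(apply-rule-1 "NN" "VB") ; => "VB"
```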
The parts-of-speech function iterates over each word in the input vector, checks it against lex-hash, and then applies the predefined rules. The result is a new vector of tags, one for each input word, where each tag represents the most likely part of speech for that word, based on the rules and the original lexicon.
#lang racket

(require srfi/13) ; the string SRFI
(require racket/runtime-path)

(provide parts-of-speech)

;; FastTag.lisp
;;
;; Conversion of KnowledgeBooks.com Java FastTag to Scheme
;;
;; Copyright 2002 by Mark Watson. All rights reserved.
;;

(define-runtime-path my-data-path "data") ; location of the tag.dat data file

(display "loading lex-hash...")
(log-info "loading lex-hash" "starting")
(define lex-hash
  (let ((hash (make-hash)))
    (with-input-from-file
      (string-append (path->string my-data-path) "/tag.dat")
      (lambda ()
        (let loop ()
          (let ((p (read)))
            (if (list? p) (hash-set! hash (car p) (cadr p)) #f)
            (if (eof-object? p) #f (loop))))))
    hash))
(display "...done.")
(log-info "loading lex-hash" "ending")

(define (string-suffix? pattern str)
  (let loop ((i (- (string-length pattern) 1))
             (j (- (string-length str) 1)))
    (cond ((negative? i) #t)
          ((negative? j) #f)
          ((char=? (string-ref pattern i) (string-ref str j))
           (loop (- i 1) (- j 1)))
          (else #f))))

;;
; parts-of-speech
;
; input: a vector of words (each a string)
; output: a vector of parts of speech
;;
(define (parts-of-speech words)
  (display "\n+ tagging:")
  (display words)
  (let ((ret '())
        (r #f)
        (lastRet #f)
        (lastWord #f))
    (for-each
     (lambda (w)
       (set! r (hash-ref lex-hash w #f))
       ;; if this word is not in the hash table, default to noun:
       (if (not r)
           (set! r '("NN"))
           #f)
       ;; apply transformation rules:

       ; rule 1: DT, {VBD, VBP, VB} --> DT, NN
       (if (equal? lastRet "DT")
           (if (or (member "VBD" r)
                   (member "VBP" r)
                   (member "VB" r))
               (set! r '("NN"))
               #f)
           #f)
       ; rule 2: convert a noun to a number if a "." appears in the word
       (if (string-contains w ".")
           (set! r '("CD"))
           #f)
       ; rule 3: convert a noun to a past participle if word ends with "ed"
       (if (member "NN" r)
           (let* ((slen (string-length w)))
             (if (and (> slen 1)
                      (equal? (substring w (- slen 2)) "ed"))
                 (set! r '("VBN"))
                 #f))
           #f)
       ; rule 4: convert any type to an adverb if it ends with "ly"
       (if (string-suffix? "ly" w)
           (set! r '("RB"))
           #f)
       ; rule 5: convert a common noun (NN or NNS) to an adjective
       ;         if it ends with "al"
       (if (or (member "NN" r) (member "NNS" r))
           (if (string-suffix? "al" w)
               (set! r '("JJ"))
               #f)
           #f)
       ; rule 6: convert a noun to a verb if the preceding word is "would"
       (if (member "NN" r)
           (if (equal? lastWord "would")
               (set! r '("VB"))
               #f)
           #f)
       ; rule 7: if a word has been categorized as a common noun and it
       ;         ends with "s", then set its type to a plural noun (NNS)
       (if (member "NN" r)
           (if (string-suffix? "s" w)
               (set! r '("NNS"))
               #f)
           #f)
       ; rule 8: convert a common noun to a present participle verb
       ;         (i.e., a gerund) if it ends with "ing"
       (if (member "NN" r)
           (if (string-suffix? "ing" w)
               (set! r '("VBG"))
               #f)
           #f)
       (set! lastRet (first r))
       (set! lastWord w)
       (set! ret (cons (first r) ret)))
     (vector->list words))
    ;; not very efficient !!
    (list->vector (reverse ret))))
The following listing of the file names.rkt identifies human and place names in text. The Racket Scheme code is a script for Named Entity Recognition (NER), specifically designed to recognize human names and place names in input text:

- It provides two main functions: find-human-names and find-place-names.
- It uses two kinds of data, human names and place names, loaded from text files.
- It performs Part-of-Speech tagging through the external fasttag.rkt module.
- It uses hash tables and lists for efficient look-up.
- It handles names with various components (prefixes, first name, last name, etc.).

The function process-one-word-per-line reads each line of a file and applies a given function func to it.

Initial data preparation consists of populating the hash tables *last-name-hash*, *first-name-hash*, and *place-name-hash* with last names, first names, and place names, respectively, read from the data files.

We define two Named Entity Recognition (NER) functions:

- find-human-names: takes a word vector and an exclusion list.
  - Utilizes parts-of-speech tags.
  - Checks for names that have 1 to 4 words.
  - Adds names to the ret list if conditions are met, considering the exclusion list.
  - Returns the processed names (ret2).
- find-place-names: similar to find-human-names, but specifically for place names.
  - Works on 1 to 3 word place names.
  - Returns the processed place names.

We define one helper function not-in-list-find-names-helper that ensures an identified name does not overlap with another name or an entry in the exclusion list.

Overall, the code is fairly well optimized for its purpose, utilizing hash tables for constant-time look-up and lists to store identified entities.
#lang racket

(require "fasttag.rkt")
(require racket/runtime-path)

(provide find-human-names)
(provide find-place-names)

(define (process-one-word-per-line file-path func)
  (with-input-from-file
      file-path
    (lambda ()
      (let loop ()
        (let ([l (read-line)])
          (if (equal? l #f) #f (func l))
          (if (eof-object? l) #f (loop)))))))

(define *last-name-hash* (make-hash))
(process-one-word-per-line
 (string-append (path->string my-data-path) "/human_names/names.last")
 (lambda (x) (hash-set! *last-name-hash* x #t)))

(define *first-name-hash* (make-hash))
(process-one-word-per-line
 (string-append (path->string my-data-path) "/human_names/names.male")
 (lambda (x) (hash-set! *first-name-hash* x #t)))
(process-one-word-per-line
 (string-append (path->string my-data-path) "/human_names/names.female")
 (lambda (x) (hash-set! *first-name-hash* x #t)))

(define *place-name-hash* (make-hash))
(process-one-word-per-line
 (string-append (path->string my-data-path) "/placenames.txt")
 (lambda (x) (hash-set! *place-name-hash* x #t)))
(define *name-prefix-list*
  '("Mr" "Mrs" "Ms" "Gen" "General" "Maj" "Major" "Doctor" "Vice" "President"
    "Lt" "Premier" "Senator" "Congressman" "Prince" "King" "Representative"
    "Sen" "St" "Dr"))

(define (not-in-list-find-names-helper a-list start end)
  (let ((rval #t))
    (do ((x a-list (cdr x)))
        ((or (null? x)
             (let ()
               (if (or (and (>= start (caar x)) (<= start (cadar x)))
                       (and (>= end (caar x)) (<= end (cadar x))))
                   (set! rval #f)
                   #f)
               (not rval)))))
    rval))
;; return a list of sublists, each sublist looks like:
;; (("John" "Smith") (11 12) 0.75) ; last number is an importance rating
(define (find-human-names word-vector exclusion-list)
  (define (score result-list)
    (- 1.0 (* 0.2 (- 4 (length result-list)))))
  (let ((tags (parts-of-speech word-vector))
        (ret '()) (ret2 '()) (x '())
        (len (vector-length word-vector))
        (word #f))
    (display "\ntags: ") (display tags)
    ;;(dotimes (i len)
    (for/list ([i (in-range len)])
      (set! word (vector-ref word-vector i))
      (display "\nword: ") (display word)
      ;; process 4 word names: HUMAN NAMES
      (if (< i (- len 3))
          ;; case #1: single element from '*name-prefix-list*'
          (if (and
               (not-in-list-find-names-helper ret i (+ i 4))
               (not-in-list-find-names-helper exclusion-list i (+ i 4))
               (member word *name-prefix-list*)
               (equal? "." (vector-ref word-vector (+ i 1)))
               (hash-ref *first-name-hash* (vector-ref word-vector (+ i 2)) #f)
               (hash-ref *last-name-hash* (vector-ref word-vector (+ i 3)) #f))
              (if (and
                   (string-prefix? (vector-ref tags (+ i 2)) "NN")
                   (string-prefix? (vector-ref tags (+ i 3)) "NN"))
                  (set! ret (cons (list i (+ i 4)) ret))
                  #f)
              #f)
          ;; case #2: two elements from '*name-prefix-list*'
          (if (and
               (not-in-list-find-names-helper ret i (+ i 4))
               (not-in-list-find-names-helper exclusion-list i (+ i 4))
               (member word *name-prefix-list*)
               (member (vector-ref word-vector (+ i 1)) *name-prefix-list*)
               (hash-ref *first-name-hash* (vector-ref word-vector (+ i 2)) #f)
               (hash-ref *last-name-hash* (vector-ref word-vector (+ i 3)) #f))
              (if (and
                   (string-prefix? (vector-ref tags (+ i 2)) "NN")
                   (string-prefix? (vector-ref tags (+ i 3)) "NN"))
                  (set! ret (cons (list i (+ i 4)) ret))
                  #f)
              #f))
      ;; process 3 word names: HUMAN NAMES
      (if (< i (- len 2))
          (if (and
               (not-in-list-find-names-helper ret i (+ i 3))
               (not-in-list-find-names-helper exclusion-list i (+ i 3)))
              (if (or
                   (and
                    (member word *name-prefix-list*)
                    (hash-ref *first-name-hash* (vector-ref word-vector (+ i 1)) #f)
                    (hash-ref *last-name-hash* (vector-ref word-vector (+ i 2)) #f)
                    (string-prefix? (vector-ref tags (+ i 1)) "NN")
                    (string-prefix? (vector-ref tags (+ i 2)) "NN"))
                   (and
                    (member word *name-prefix-list*)
                    (member (vector-ref word-vector (+ i 1)) *name-prefix-list*)
                    (hash-ref *last-name-hash* (vector-ref word-vector (+ i 2)) #f)
                    (string-prefix? (vector-ref tags (+ i 1)) "NN")
                    (string-prefix? (vector-ref tags (+ i 2)) "NN"))
                   (and
                    (member word *name-prefix-list*)
                    (equal? "." (vector-ref word-vector (+ i 1)))
                    (hash-ref *last-name-hash* (vector-ref word-vector (+ i 2)) #f)
                    (string-prefix? (vector-ref tags (+ i 2)) "NN"))
                   (and
                    (hash-ref *first-name-hash* word #f)
                    (hash-ref *first-name-hash* (vector-ref word-vector (+ i 1)) #f)
                    (hash-ref *last-name-hash* (vector-ref word-vector (+ i 2)) #f)
                    (string-prefix? (vector-ref tags i) "NN")
                    (string-prefix? (vector-ref tags (+ i 1)) "NN")
                    (string-prefix? (vector-ref tags (+ i 2)) "NN")))
                  (set! ret (cons (list i (+ i 3)) ret))
                  #f)
              #f)
          #f)
      ;; process 2 word names: HUMAN NAMES
      (if (< i (- len 1))
          (if (and
               (not-in-list-find-names-helper ret i (+ i 2))
               (not-in-list-find-names-helper exclusion-list i (+ i 2)))
              (if (or
                   (and
                    (member word '("Mr" "Mrs" "Ms" "Doctor" "President" "Premier"))
                    (string-prefix? (vector-ref tags (+ i 1)) "NN")
                    (hash-ref *last-name-hash* (vector-ref word-vector (+ i 1)) #f))
                   (and
                    (hash-ref *first-name-hash* word #f)
                    (hash-ref *last-name-hash* (vector-ref word-vector (+ i 1)) #f)
                    (string-prefix? (vector-ref tags i) "NN")
                    (string-prefix? (vector-ref tags (+ i 1)) "NN")))
                  (set! ret (cons (list i (+ i 2)) ret))
                  #f)
              #f)
          #f)
      ;; 1 word names: HUMAN NAMES
      (if (hash-ref *last-name-hash* word #f)
          (if (and
               (string-prefix? (vector-ref tags i) "NN")
               (not-in-list-find-names-helper ret i (+ i 1))
               (not-in-list-find-names-helper exclusion-list i (+ i 1)))
              (set! ret (cons (list i (+ i 1)) ret))
              #f)
          #f))
    ;; TBD: calculate importance rating based on number of occurrences of name in text:
    (set! ret2
          (map
           (lambda (index-pair)
             (string-replace
              (string-join
               (vector->list
                (vector-copy word-vector (car index-pair) (cadr index-pair))))
              " ." "."))
           ret))
    ret2))
(define (find-place-names word-vector exclusion-list) ;; PLACE
  (define (score result-list)
    (- 1.0 (* 0.2 (- 4 (length result-list)))))
  (let ((tags (parts-of-speech word-vector))
        (ret '()) (ret2 '()) (x '())
        (len (vector-length word-vector))
        (word #f))
    (display "\ntags: ") (display tags)
    ;;(dotimes (i len)
    (for/list ([i (in-range len)])
      (set! word (vector-ref word-vector i))
      (display "\nword: ") (display word)
      (display "\n")
      ;; process 3 word names: PLACE
      (if (< i (- len 2))
          (if (and
               (not-in-list-find-names-helper ret i (+ i 3))
               (not-in-list-find-names-helper exclusion-list i (+ i 3)))
              (let ((p-name (string-append word " "
                                           (vector-ref word-vector (+ i 1)) " "
                                           (vector-ref word-vector (+ i 2)))))
                (if (hash-ref *place-name-hash* p-name #f)
                    (set! ret (cons (list i (+ i 3)) ret))
                    #f))
              #f)
          #f)
      ;; process 2 word names: PLACE
      (if (< i (- len 1))
          (if (and
               (not-in-list-find-names-helper ret i (+ i 2))
               (not-in-list-find-names-helper exclusion-list i (+ i 2)))
              (let ((p-name (string-append word " "
                                           (vector-ref word-vector (+ i 1)))))
                (if (hash-ref *place-name-hash* p-name #f)
                    (set! ret (cons (list i (+ i 2)) ret))
                    #f))
              #f)
          #f)
      ;; 1 word names: PLACE
      (if (hash-ref *place-name-hash* word #f)
          (if (and
               (string-prefix? (vector-ref tags i) "NN")
               (not-in-list-find-names-helper ret i (+ i 1))
               (not-in-list-find-names-helper exclusion-list i (+ i 1)))
              (set! ret (cons (list i (+ i 1)) ret))
              #f)
          #f))
    ;; TBD: calculate importance rating based on number of occurrences of
    ;; name in text: can use (count-substring..) defined in utils.rkt
    (set! ret2
          (map
           (lambda (index-pair)
             (string-join
              (vector->list
               (vector-copy word-vector (car index-pair) (cadr index-pair)))
              " "))
           ret))
    ret2))

#|
(define nn (find-human-names '#("President" "George" "Bush" "went" "to" "San" "Diego" "to" "meet" "Ms" "." "Jones" "and" "Gen" "." "Pervez" "Musharraf" ".") '()))
(display (find-place-names '#("George" "Bush" "went" "to" "San" "Diego" "and" "London") '()))
|#
Let’s try some examples in a Racket REPL:
Racket-AI-book-code/nlp $ racket
Welcome to Racket v8.10 [cs].
> (require nlp)
loading lex-hash......done.
#f
> (find-human-names '#("President" "George" "Bush" "went" "to" "San" "Diego" "to" "meet" "Ms" "." "Jones" "and" "Gen" "." "Pervez" "Musharraf" ".") '())

+ tagging: #(President George Bush went to San Diego to meet Ms . Jones and Gen . Pervez Musharraf .)
tags: #(NNP NNP NNP VBD TO NNP NNP TO VB NNP CD NNP CC NNP CD NN NN CD)
word: President
word: George
word: Bush
word: went
word: to
word: San
word: Diego
word: to
word: meet
word: Ms
word: .
word: Jones
word: and
word: Gen
word: .
word: Pervez
word: Musharraf
word: .
'("Gen. Pervez Musharraf" "Ms. Jones" "San" "President George Bush")
> (find-place-names '#("George" "Bush" "went" "to" "San" "Diego" "and" "London") '())

+ tagging: #(George Bush went to San Diego and London)
tags: #(NNP NNP VBD TO NNP NNP CC NNP)
word: George
word: Bush
word: went
word: to
word: San
word: Diego
word: and
word: London
'("London" "San Diego")
>
NLP Wrap Up
The NLP library is still a work in progress so please check for updates to this live eBook and the GitHub repository for this book:
https://github.com/mark-watson/Racket-AI-book-code
Knowledge Graph Navigator
The Knowledge Graph Navigator (which I will often refer to as KGN) is a tool for processing a set of entity names and automatically exploring the public Knowledge Graph DBPedia using SPARQL queries. I started to write KGN for my own use, to automate some things I used to do manually when exploring Knowledge Graphs, and later thought that KGN might also be useful for educational purposes. KGN shows the user the auto-generated SPARQL queries, so hopefully the user will learn SPARQL by seeing examples.
I cover SPARQL and linked data/Knowledge Graphs in previous books I have written. While I give you a brief background here, I ask interested readers to look at either of the following for more details:
- The chapter Knowledge Graph Navigator in my book Loving Common Lisp, or the Savvy Programmer’s Secret Weapon
- The chapters Background Material for the Semantic Web and Knowledge Graphs, Knowledge Graph Navigator in my book Practical Artificial Intelligence Programming With Clojure
We use the Natural Language Processing (NLP) library from the last chapter to find human and place names in input text and then construct SPARQL queries to access data from DBPedia.
The KGN application is still a work in progress so please check for updates to this live eBook. The following screenshots show the current version of the application:

I have implemented parts of KGN in several languages: Common Lisp, Java, Clojure, Racket Scheme, Swift, Python, and Hy. The most full-featured version of KGN, including a full user interface, is featured in my book Loving Common Lisp, or the Savvy Programmer’s Secret Weapon that you can read free online. That version performs more speculative SPARQL queries to find information compared to the example here, which I designed for ease of understanding and modification. I am not covering the basics of RDF data and SPARQL queries here. While I provide sufficient background material to understand the code, please read the relevant chapters in my Common Lisp book for more background material.

We will be running an example using data containing three person entities, one company entity, and one place entity. The following figure shows a very small part of the DBPedia Knowledge Graph that is centered around these entities. The data for this figure was collected by an example Knowledge Graph Creator from my Common Lisp book:

I chose to use DBPedia instead of WikiData for this example because DBPedia URIs are human readable. The following URIs represent the concept of a person. The semantic meanings of DBPedia and FOAF (friend of a friend) URIs are self-evident to a human reader while the WikiData URI is not:
http://www.wikidata.org/entity/Q215627
http://dbpedia.org/ontology/Person
http://xmlns.com/foaf/0.1/name
I frequently use WikiData in my work and WikiData is one of the most useful public knowledge bases. I have both DBPedia and WikiData SPARQL endpoints in the example code that we will look at later, with the WikiData endpoint commented out. You can try manually querying WikiData at the WikiData SPARQL endpoint. For example, you might explore the WikiData URI for the person concept using:
select ?p ?o where {
  <http://www.wikidata.org/entity/Q215627> ?p ?o .
} limit 10
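The example code in this book keeps its WikiData endpoint commented out, so here is a minimal sketch of how the query above could be run from Racket. This is an assumption-laden illustration, not code from the book: it assumes network access and uses the standard public endpoint URL https://query.wikidata.org/sparql with the format=json request parameter.

```racket
#lang racket
(require net/url net/uri-codec json)

;; Minimal sketch (not from the book's code): run a SPARQL query against
;; the public WikiData endpoint and parse the JSON response into a jsexpr.
;; Error handling and retries are omitted for brevity.
(define (wikidata-query->hash query)
  (call/input-url
   (string->url
    (string-append "https://query.wikidata.org/sparql?format=json&query="
                   (uri-encode query)))
   get-pure-port
   (lambda (port) (string->jsexpr (port->string port)))
   '("Accept: application/json")))

;; Example call (requires network access):
;; (wikidata-query->hash
;;  "select ?p ?o where { <http://www.wikidata.org/entity/Q215627> ?p ?o . } limit 10")
```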
For the rest of this chapter we will just use DBPedia or data copied from DBPedia.
After looking at an interactive session using the example program for this chapter we will look at the implementation.
Entity Types Handled by KGN
To keep this example simple we handle just two entity types:
- People
- Places
The Common Lisp version of KGN also searches for relationships between entities. This search process consists of generating a series of SPARQL queries and calling the DBPedia SPARQL endpoint. I may add this feature to the Racket version of KGN in the future.
KGN Implementation
The example application works by processing a list of Person, Place, and Organization names. We generate SPARQL queries to DBPedia to find information about the entities and relationships between them.
We are using two libraries developed for this book that can be found in the directories Racket-AI-book-code/sparql and Racket-AI-book-code/nlp to supply support for SPARQL queries and natural language processing.
SPARQL Client Library
We already looked at code examples for making simple SPARQL queries in the chapter Datastores and here we continue with more examples that we need for the KGN application.
The following listing shows Racket-AI-book-code/sparql/sparql.rkt where we implement several functions for interacting with DBPedia’s SPARQL endpoint. There are two functions sparql-dbpedia-for-person and sparql-dbpedia-person-uri crafted for constructing SPARQL queries. The function sparql-dbpedia-for-person takes a person URI and formulates a query to fetch associated website links and comments, limiting the results to four. On the other hand, the function sparql-dbpedia-person-uri takes a person name and builds a query to obtain the person’s URI and comments from DBpedia. Both functions utilize string manipulation to embed the input parameters into the SPARQL query strings. There are similar functions for places.
Another function sparql-query->hash executes SPARQL queries against the DBPedia endpoint. It takes a SPARQL query string as an argument, sends an HTTP request to the DBpedia SPARQL endpoint, and expects a JSON response. The call/input-url function is used to send the request, with uri-encode ensuring the query string is URL-encoded. The response is read from the port, converted to a JSON expression using the function string->jsexpr, and is expected to be in a hash form which is returned by this function.
Lastly, there are two functions json->listvals and gd for processing the JSON response from DBPedia. The function json->listvals extracts the variable bindings from the SPARQL result and organizes them into lists. The function gd further processes these lists based on the number of variables in the query result, creating lists of lists which represent the variable bindings in a structured way. The sparql-dbpedia function serves as an interface to these functionalities, taking a SPARQL query string, executing the query via sparql-query->hash, and processing the results through gd to provide a structured output. This arrangement encapsulates the process of querying DBPedia and formatting the results, making it convenient for further use within a Racket program.
We already saw most of the following code listing in the previous chapter Datastores. The following listings in this chapter will be updated in future versions of this live eBook when I finish writing the KGN application.
Part of solving this problem is constructing SPARQL queries as strings. We will look in some detail at one utility function sparql-dbpedia-for-person that constructs a SPARQL query string for fetching data from DBpedia about a specific person. The function takes one parameter, person-uri, which is expected to be the URI of a person in the DBpedia dataset. The query string is built by appending strings, including the dynamic insertion of the person-uri parameter value. Here’s a breakdown of how the code works:
- Function Definition: The function sparql-dbpedia-for-person is defined with one parameter, person-uri. This parameter is used to dynamically insert the person’s URI into the SPARQL query.
- String Appending (@string-append): The @string-append{ ... } form is not ordinary Scheme; it is Racket’s at-expression syntax (enabled by #lang at-exp racket). It concatenates the literal text between the braces with any @-escaped Racket values, forming the complete SPARQL query from static parts plus the dynamically inserted person-uri.
- SPARQL Query Construction: The function constructs a SPARQL query with the following key components:
  - SELECT Clause: This part of the query specifies what information to return. It uses GROUP_CONCAT to aggregate multiple ?website values into a single string, separated by " | ", and also selects the ?comment variable.
  - OPTIONAL Clauses: Two OPTIONAL blocks are included:
    - The first block attempts to fetch English comments (?comment) associated with the person, filtering to ensure the language of the comment is English (lang(?comment) = 'en').
    - The second block fetches external links (?website) associated with the person but filters out any URLs containing “dbpedia” (case-insensitive), to avoid self-references within DBpedia.
  - Dynamic URI Insertion: Each @person-uri placeholder is replaced with the actual person-uri passed to the function, targeting the SPARQL query at a specific DBpedia resource.
  - LIMIT Clause: The query is limited to return at most 4 results with LIMIT 4.
- Usage of the @person-uri Placeholder: Inside an at-expression, @person-uri splices the value of the variable person-uri directly into the surrounding text, so no separate string-replacement step is needed; the final query string automatically includes the specific URI of the person of interest.
In summary, the sparql-dbpedia-for-person function dynamically constructs a SPARQL query to fetch English comments and external links (excluding DBpedia links) for a given person from DBpedia, with the results limited to a maximum of 4 entries. The at-expression form of string-append allows the person’s URI to be inserted directly into the query text.
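To make the @string-append{...} notation concrete, here is a tiny self-contained sketch of Racket’s at-expression syntax. The greeting function and the name passed to it are hypothetical, purely for illustration:

```racket
#lang at-exp racket

;; Under the at-exp reader, @string-append{...} expands to a call to
;; string-append whose arguments are the literal text between the braces
;; plus any @-escaped Racket values, in order.
(define (greeting name)
  @string-append{Hello @name, welcome!})

(greeting "Mark") ; evaluates to "Hello Mark, welcome!"
```

The same mechanism is what lets @person-uri appear inside the SPARQL query text in the listings below.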
(provide sparql-dbpedia-person-uri)
(provide sparql-query->hash)
(provide json->listvals)
(provide sparql-dbpedia)

(require net/url)
(require net/uri-codec)
(require json)
(require racket/pretty)

(define (sparql-dbpedia-for-person person-uri)
  @string-append{
     SELECT
      (GROUP_CONCAT (DISTINCT ?website; SEPARATOR=" | ")
         AS ?website) ?comment {
      OPTIONAL {
       @person-uri
       <http://www.w3.org/2000/01/rdf-schema#comment>
       ?comment . FILTER (lang(?comment) = 'en')
      } .
      OPTIONAL {
       @person-uri
       <http://dbpedia.org/ontology/wikiPageExternalLink>
       ?website
       . FILTER (!regex(str(?website), "dbpedia", "i"))
      }
     } LIMIT 4})

(define (sparql-dbpedia-person-uri person-name)
  @string-append{
     SELECT DISTINCT ?personuri ?comment {
      ?personuri
       <http://xmlns.com/foaf/0.1/name>
       "@person-name"@"@"en .
      ?personuri
       <http://www.w3.org/2000/01/rdf-schema#comment>
       ?comment .
      FILTER (lang(?comment) = 'en') .
     }})


(define (sparql-query->hash query)
  (call/input-url
   (string->url
    (string-append
     "https://dbpedia.org/sparql?query="
     (uri-encode query)))
   get-pure-port
   (lambda (port)
     (string->jsexpr (port->string port)))
   '("Accept: application/json")))

(define (json->listvals a-hash)
  (let ((bindings (hash->list a-hash)))
    (let* ((head (first bindings))
           (vars (hash-ref (cdr head) 'vars))
           (results (second bindings)))
      (let* ((x (cdr results))
             (b (hash-ref x 'bindings)))
        (for/list ([var vars])
          (for/list ([bc b])
            (let ((bcequal
                   (make-hash (hash->list bc))))
              (let ((a-value
                     (hash-ref
                      (hash-ref bcequal (string->symbol var))
                      'value)))
                (list var a-value)))))))))


(define gd
  (lambda (data)
    (let ((jd (json->listvals data)))
      (define gg1
        (lambda (jd) (map list (car jd))))
      (define gg2
        (lambda (jd) (map list (car jd) (cadr jd))))
      (define gg3
        (lambda (jd)
          (map list (car jd) (cadr jd) (caddr jd))))
      (define gg4
        (lambda (jd)
          (map list
               (car jd) (cadr jd)
               (caddr jd) (cadddr jd))))
      (case (length (json->listvals data))
        [(1) (gg1 (json->listvals data))]
        [(2) (gg2 (json->listvals data))]
        [(3) (gg3 (json->listvals data))]
        [(4) (gg4 (json->listvals data))]
        [else
         (error "sparql queries with 1 to 4 vars")]))))


(define sparql-dbpedia
  (lambda (sparql)
    (gd (sparql-query->hash sparql))))
The function gd converts JSON data to Scheme nested lists and then extracts the values for up to four variables.
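As a usage sketch of the library just listed (this requires network access to the DBPedia endpoint, and the particular query text here is my own illustration, not from the book):

```racket
#lang racket
(require "sparql.rkt") ; the library listed above

;; Ask DBPedia for one English comment about a specific resource.
;; sparql-dbpedia runs the query and gd shapes the result into a list of
;; per-variable lists of (variable-name value) pairs.
(sparql-dbpedia
 (string-append
  "SELECT ?comment { <http://dbpedia.org/resource/Steve_Jobs> "
  "<http://www.w3.org/2000/01/rdf-schema#comment> ?comment . "
  "FILTER (lang(?comment) = 'en') } LIMIT 1"))
```

With one variable in the SELECT clause, the result is a single list whose entries look like ("comment" "...comment text...").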
NLP Library
We implemented a library in the chapter Natural Language Processing that we use here.
Please make sure you have read that chapter before the following sections.
Implementation of KGN Application Code
The file Racket-AI-book-code/kgn/main.rkt contains library boilerplate and the file Racket-AI-book-code/kgn/kgn.rkt contains the application code. The Racket Scheme code is structured for interacting with the DBPedia SPARQL endpoint to retrieve information about persons or places based on a user’s string query. The code is organized into several defined functions aimed at handling different steps of the process:
Query Parsing and Entity Recognition:
The parse-query function takes a string query-str and tokenizes it into a list of words after replacing certain characters (like “.” and “?”). It then checks for keywords like “who” or “where” to infer the type of query: person or place. Using find-human-names and find-place-names (from the NLP library described in the previous chapter), it extracts the entity names from the tokens. Depending on the type of query and the entities found, it returns a list indicating the type and name of the entity, or unknown if no relevant entities are identified.
SPARQL Query Construction and Execution:
The functions get-person-results and get-place-results take a name string, construct a SPARQL query to get information about the entity from DBPedia, execute the query, and process the results. They utilize the sparql-dbpedia-person-uri, sparql-query->hash, and json->listvals functions that we listed previously to construct the query, execute it, and convert the returned JSON data to a list, respectively.
Query Interface:
The ui-query-helper function acts as the top-level utility for processing a string query to generate a SPARQL query, execute it, and return the results. It first calls parse-query to understand the type of query and the entity in question. Depending on whether the query is about a person or a place, it invokes get-person-results or get-place-results, respectively, to get the relevant information from DBPedia. It then returns a list containing the SPARQL query and the results, or #f if the query type is unknown.
This code structure facilitates the breakdown of a user’s natural language query into actionable SPARQL queries to retrieve and present information about identified entities from a structured data source like DBPedia.
(require racket/pretty)
(require nlp)

(provide get-person-results)
(provide ui-query-helper)

(require "sparql-utils.rkt")

(define (get-person-results person-name-string)
  (let ((person-uri (sparql-dbpedia-person-uri person-name-string)))
    (let* ((hash-data (sparql-query->hash person-uri)))
      (list
       person-uri
       (extract-name-uri-and-comment
        (first (json->listvals hash-data))
        (second (json->listvals hash-data)))))))


(define (get-place-results place-name-string)
  (let ((place-uri (sparql-dbpedia-place-uri place-name-string)))
    (let* ((hash-data (sparql-query->hash place-uri)))
      (list
       place-uri
       (extract-name-uri-and-comment
        (first (json->listvals hash-data))
        (second (json->listvals hash-data)))))))


(define (parse-query query-str)
  (let ((cleaned-query-tokens
         (string-split
          (string-replace (string-replace query-str "." " ") "?" " "))))
    (printf "\n+ + + cleaned-query-tokens:~a\n" cleaned-query-tokens)
    (if (member "who" cleaned-query-tokens)
        (let ((person-names
               (find-human-names (list->vector cleaned-query-tokens) '())))
          (printf "\n+ + person-names= ~a\n" person-names)
          (if (> (length person-names) 0)
              (list 'person (first person-names)) ;; for now, return the first name found
              #f))
        (if (member "where" cleaned-query-tokens)
            (let ((place-names
                   (find-place-names (list->vector cleaned-query-tokens) '())))
              (printf "\n+ + place-names= ~a\n" place-names)
              (if (> (length place-names) 0)
                  (list 'place (first place-names)) ;; for now, return the first place name found
                  (list 'unknown query-str)))
            (list 'unknown query-str))))) ;; no person or place name match so just return original query

(define (ui-query-helper query-str) ;; top level utility function for string query -> 1) generated sparql 2) query function
  (display "in ui-query-helper: query-str=") (display query-str)
  (let* ((parse-results (parse-query query-str))
         (question-type (first parse-results))
         (entity-name (second parse-results)))
    (display (list parse-results question-type entity-name))
    (if (equal? question-type 'person)
        (let* ((results2 (get-person-results entity-name))
               (sparql (car results2))
               (results (second results2)))
          (printf "\n++ results: ~a\n" results)
          (list sparql results))
        (if (equal? question-type 'place)
            (let* ((results2 (get-place-results entity-name))
                   (sparql (car results2))
                   (results (second results2)))
              (list sparql results))
            #f))))
The file Racket-AI-book-code/kgn/dialog-utils.rkt contains the user interface specific code for implementing a dialog box.
(require htdp/gui)
(require racket/gui/base)
(require racket/pretty)

(provide make-selection-functions)

(define (make-selection-functions parent-frame title)
  (let* ((dialog
          (new dialog%
               [label title]
               [parent parent-frame]
               [width 440]
               [height 480]))
         (close-callback
          (lambda (button event)
            (send dialog show #f)))
         (entity-chooser-dialog-list-box
          (new list-box%
               [label ""]
               [choices (list "aaaa" "bbbb")]
               [parent dialog]
               [callback (lambda (click event)
                           (if (equal? (send event get-event-type)
                                       'list-box-dclick)
                               (close-callback click event)
                               #f))]))
         (quit-button
          (new button% [parent dialog]
               [label "Select an entity"]
               [callback close-callback]))
         (set-new-items-and-show-dialog
          (lambda (a-list)
            (send entity-chooser-dialog-list-box set-selection 0)
            (send entity-chooser-dialog-list-box set a-list)
            (send dialog show #t)))
         (get-selection-index
          (lambda ()
            (first (send entity-chooser-dialog-list-box get-selections)))))
    (list set-new-items-and-show-dialog get-selection-index)))
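A hypothetical usage sketch of this dialog helper follows. The frame label and list items are made up for illustration, and because the call opens a modal dialog it only makes sense inside a running GUI session:

```racket
#lang racket/gui
(require "dialog-utils.rkt")

;; make-selection-functions returns two closures: one to (re)populate the
;; list box and show the dialog, and one to read back the selected index.
(define frame (new frame% [label "Demo"]))

(match-define (list show-selection-dialog get-selection-index)
  (make-selection-functions frame "Pick an entity"))

;; Show three choices; the dialog blocks until the user selects one.
(show-selection-dialog (list "alpha" "beta" "gamma"))
(get-selection-index) ; index of the item the user chose
```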
The local file sparql-utils.rkt contains additional utility functions for accessing information in DBPedia.
1
(
provide
sparql-dbpedia-for-person)
(provide sparql-dbpedia-person-uri)
(provide sparql-query->hash)
(provide json->listvals)
(provide extract-name-uri-and-comment)

(require net/url)
(require net/uri-codec)
(require json)
(require racket/pretty)

(define ps-encoded-by "ps:P702")
(define wdt-instance-of "wdt:P31")
(define wdt-in-taxon "wdt:P703")
(define wd-human "wd:Q15978631")
(define wd-mouse "wd:Q83310")
(define wd-rat "wd:Q184224")
(define wd-gene "wd:Q7187")

(define (sparql-dbpedia-for-person person-uri)
  @string-append{
    SELECT
     (GROUP_CONCAT (DISTINCT ?website; SEPARATOR=" | ") AS ?website) ?comment {
      OPTIONAL { @person-uri <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
                 FILTER (lang(?comment) = 'en') } .
      OPTIONAL { @person-uri <http://dbpedia.org/ontology/wikiPageExternalLink> ?website .
                 FILTER (!regex(str(?website), "dbpedia", "i")) } .
    } LIMIT 4})

(define (sparql-dbpedia-person-uri person-name)
  @string-append{
    SELECT DISTINCT ?personuri ?comment {
      ?personuri <http://xmlns.com/foaf/0.1/name> "@person-name"@"@"en .
      ?personuri <http://www.w3.org/2000/01/rdf-schema#comment> ?comment .
      FILTER (lang(?comment) = 'en') .
    }})

(define (sparql-query->hash query)
  (call/input-url
   (string->url
    (string-append "https://dbpedia.org/sparql?query=" (uri-encode query)))
   get-pure-port
   (lambda (port)
     (string->jsexpr (port->string port)))
   '("Accept: application/json")))

(define (json->listvals a-hash)
  (let ((bindings (hash->list a-hash)))
    (let* ((head (first bindings))
           (vars (hash-ref (cdr head) 'vars))
           (results (second bindings)))
      (let* ((x (cdr results))
             (b (hash-ref x 'bindings)))
        (for/list ([var vars])
          (for/list ([bc b])
            (let ((bcequal (make-hash (hash->list bc))))
              (let ((a-value (hash-ref (hash-ref bcequal (string->symbol var))
                                       'value)))
                (list var a-value)))))))))

(define extract-name-uri-and-comment
  (lambda (l1 l2)
    (map ;; perform a "zip" action on two lists
     (lambda (a b)
       (list (second a) (second b)))
     l1 l2)))
The local file kgn.rkt is the main program for this application.
(require htdp/gui) ;; note: when building an executable, choose GRacket, not Racket, to get a *.app bundle
(require racket/gui/base)
(require racket/match)
(require racket/pretty)
(require scribble/text/wrap)

;; Sample queries:
;;   who is Bill Gates
;;   where is San Francisco
;;   (only who/where queries are currently handled)

;;(require "utils.rkt")
(require nlp)
(require "main.rkt")
(require "dialog-utils.rkt")

(define count-substring
  (compose length regexp-match*))

(define (short-string s)
  (if (< (string-length s) 75)
      s
      (substring s 0 73)))

(define dummy ;; this will be redefined after UI objects are created
  (lambda (button event) (display "\ndummy\n")))

(let ((query-callback (lambda (button event) (dummy button event))))
  (match-let* ([frame (new frame% [label "Knowledge Graph Navigator"]
                           [height 400] [width 608] [alignment '(left top)])]
               [(list set-new-items-and-show-dialog get-selection-index)
                ;; returns a list of 2 callback functions
                (make-selection-functions frame "Test selection list")]
               [query-field (new text-field%
                                 [label " Query:"] [parent frame]
                                 [callback
                                  (lambda (k e)
                                    (if (equal? (send e get-event-type) 'text-field-enter)
                                        (query-callback k e)
                                        #f))])]
               [a-container (new pane%
                                 [parent frame] [alignment '(left top)])]
               [a-message (new message%
                               [parent frame] [label " Generated SPARQL:"])]
               [sparql-canvas (new text-field%
                                   (parent frame) (label "")
                                   [min-width 380] [min-height 200]
                                   [enabled #f])]
               [a-message-2 (new message% [parent frame] [label " Results:"])]
               [results-canvas (new text-field%
                                    (parent frame) (label "")
                                    [min-height 200] [enabled #f])]
               [a-button (new button% [parent a-container]
                              [label "Process: query -> generated SPARQL -> results from DBPedia"]
                              [callback query-callback])])
    (display "\nbefore setting new query-callback\n")
    (set!
     dummy ;; override dummy lambda defined earlier
     (lambda (button event)
       (display "\n+ in query-callback\n")
       (let ((query-helper-results-all (ui-query-helper (send query-field get-value))))
         (if (equal? query-helper-results-all #f)
             (let ()
               (send sparql-canvas set-value "no generated SPARQL")
               (send results-canvas set-value "no results"))
             (let* ((sparql-results (first query-helper-results-all))
                    (query-helper-results-uri-and-description (cadr query-helper-results-all))
                    (uris (map first query-helper-results-uri-and-description))
                    (query-helper-results (map second query-helper-results-uri-and-description)))
               (display "\n++ query-helper-results:\n")
               (display query-helper-results)
               (display "\n")
               (if (= (length query-helper-results) 1)
                   (let ()
                     (send sparql-canvas set-value sparql-results)
                     (send results-canvas set-value
                           (string-append
                            (string-join (wrap-line (first query-helper-results) 95) "\n")
                            "\n\n" (first uris))))
                   (if (> (length query-helper-results) 1)
                       (let ()
                         (set-new-items-and-show-dialog (map short-string query-helper-results))
                         (set! query-helper-results
                               (let ((sel-index (get-selection-index)))
                                 (if (> sel-index -1)
                                     (list-ref query-helper-results sel-index)
                                     '(""))))
                         (set! uris (list-ref uris (get-selection-index)))
                         (display query-helper-results)
                         (send sparql-canvas set-value sparql-results)
                         (send results-canvas set-value
                               (string-append
                                (string-join (wrap-line query-helper-results 95) "\n")
                                "\n\n" uris)))
                       (send results-canvas set-value
                             (string-append "No results for: "
                                            (send query-field get-value))))))))))
    (send frame show #t)))
The two screen shot figures seen earlier show the GUI application running.
Knowledge Graph Navigator Wrap Up
This KGN example was hopefully both interesting to you and simple enough in its implementation to use as a jumping off point for your own projects.
I had the idea for the KGN application because I was spending quite a bit of time manually setting up SPARQL queries for DBPedia (and other public sources like WikiData) and I wanted to experiment with partially automating this process. I have written versions of KGN in Java, the Hy language (a Lisp that runs on Python, about which I wrote a short book), Swift, and Common Lisp; each of the four implementations takes a different approach as I experimented with different ideas.
Conclusions
The material in this book was informed by my own work interests and experiences. If you enjoyed reading it and you make practical use of at least some of the material I covered, then I consider my effort to be worthwhile.
Racket is a language that many people use for both fun personal projects and for professional development. I have tried, dear reader, to make the case here that Racket is a practical language that integrates well with my workflows on both Linux and macOS.
Writing software is a combination of a business activity, promoting good for society, and an exploration to try out new ideas for self improvement. I believe that there is sometimes a fine line between spending too many resources tracking many new technologies versus getting stuck using old technologies at the expense of lost opportunities. My hope is that reading this book was an efficient and pleasurable use of your time, letting you try some new techniques and technologies that you had not considered before.
If we never get to meet in person or talk on the telephone, then I would like to thank you now for taking the time to read my book.