LangChain and LlamaIndex Projects Lab Book: Hooking Large Language Models Up to the Real World
Mark Watson



I have been working in the field of artificial intelligence since 1982 and without a doubt Large Language Models (LLMs) like GPT-4 represent the greatest advance in practical AI technology that I have experienced. Infrastructure projects like LangChain and LlamaIndex make it simpler to use LLMs and provide some level of abstraction to facilitate switching between LLMs. This book will use the LangChain and LlamaIndex projects along with the OpenAI GPT-3.5, GPT-4, GPT-4.5 APIs, and local models run on your computer to solve a series of interesting problems.

If you read my eBooks free online then please consider tipping me

As I write this new edition in May 2024, I mostly run local LLMs, but I still use OpenAI APIs, as well as APIs from Anthropic for the Claude 2 model and APIs from the French company Mistral.

Harrison Chase started the LangChain project in October 2022. When I wrote the first edition of this book in February 2023, the GitHub repository for LangChain had 171 contributors; as I write this new edition, LangChain has 2,100+ contributors on GitHub. Jerry Liu started the GPT Index project (recently renamed to LlamaIndex) at the end of 2022, and the GitHub repository for LlamaIndex currently has 54 contributors.

The examples for this book are in its GitHub repository. Please note that I update the code in the examples repository fairly frequently for library version updates, etc.

While the documentation and examples online for LangChain and LlamaIndex are excellent, I am still motivated to write this book to solve interesting problems that I like to work on involving information retrieval, natural language processing (NLP), dialog agents, and the semantic web/linked data fields. I hope that you, dear reader, will be delighted with these examples and that at least some of them will inspire your future projects.

LLM Hallucinations and RAG, Summarization, Structured Data Conversion, and Fact/Relationship Extraction LLM Applications

When we chat with LLMs and rely on the innate information they provide, the results often contain so-called “hallucinations,” which can occur when a model does not know an answer and instead generates plausible-sounding text. Many of the examples in this book are Retrieval Augmented Generation (RAG) style applications. RAG applications supply context text along with chat prompts or queries so that LLMs prioritize the provided information when answering questions. The latest Google Gemini model has an input context size of one million tokens, and even smaller models you can run on your own computer using tools like Ollama often support context sizes of up to 128K tokens.
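A minimal sketch of the prompt-assembly step in a RAG application, in plain Python: retrieved passages are placed in front of the user's question together with an instruction to rely on them. The function name build_rag_prompt and the exact instruction wording are illustrative assumptions, not an API from LangChain or LlamaIndex.

```python
def build_rag_prompt(context_passages: list[str], question: str) -> str:
    """Combine retrieved context passages and a question into one prompt string."""
    # Number the passages so the model (and a human reader) can tell them apart
    context = "\n".join(f"[{i + 1}] {p.strip()}" for i, p in enumerate(context_passages))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    passages = [
        "Mary has blond hair and lives in town.",
        "John has brown hair and lives in the country.",
    ]
    print(build_rag_prompt(passages, "Where does John live?"))
```

Asking the model to admit when the context lacks an answer is one common way to discourage hallucinated answers; the LangChain RAG example later in this chapter uses a similar “based only on the following context” instruction.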

Beyond RAG, other LLM applications that are effective at minimizing hallucinations include text summarization, structured data conversion and extraction, and fact/relationship extraction. In text summarization, LLMs condense lengthy documents into concise summaries that focus on the core information. Structured data conversion and extraction involve transforming unstructured text into organized formats or pinpointing specific information. Fact/relationship extraction allows LLMs to identify key connections within text. These tasks are less prone to misinterpretation and hallucination because the model works from text it has been given rather than from its own stored knowledge.

Comparing LangChain and LlamaIndex

LangChain and LlamaIndex are two distinct frameworks designed to use the capabilities of large language models (LLMs) by integrating them into various applications. LangChain is a more general-purpose framework that provides extensive flexibility and control, allowing developers to build a wide range of applications. It is particularly noted for its ability to handle complex tasks by offering granular control over components and the ability to optimize performance across different use cases. LangChain’s architecture is designed to be modular, enabling the chaining of different components to address specific requirements, which makes it suitable for creating sophisticated, context-aware query engines and semantic search applications.

On the other hand, LlamaIndex is specifically tailored for indexing and retrieval tasks focusing on enhancing the performance of LLMs in these areas. It provides a streamlined interface for connecting custom data sources to LLMs, making it ideal for developers looking to build powerful search and retrieval systems. LlamaIndex optimizes the indexing and retrieval process, which results in increased speed and accuracy in data search and summarization tasks. This framework is particularly effective in environments where quick and accurate extraction of information from large datasets is crucial.

Both frameworks have their own strengths and cater to different aspects of LLM application development. LangChain’s versatility makes it suitable for a broader range of applications, giving developers the ability to customize and extend their LLM capabilities extensively. In contrast, LlamaIndex is the go-to choice for use cases centered on efficient data retrieval, making it highly effective for applications that require robust search functionality. Each framework thus serves distinct needs in the ecosystem of LLM-powered applications: LangChain provides a comprehensive toolkit for diverse applications, and LlamaIndex offers specialized tools for optimized search and retrieval.

About the Author

I have written over 20 books, I have over 50 US patents, and I have worked at interesting companies like Google, Capital One, SAIC, Mind AI, and others. You can find links for reading most of my recent books free on my web site. If I had to summarize my career, the short take would be that I have had a lot of fun and enjoyed my work. I hope that what you learn here will be both enjoyable and helpful in your work.

If you would like to support my work, please consider purchasing my books on Leanpub and starring the git repositories of mine on GitHub that you find useful. You can also interact with me on social media on Mastodon and Twitter. I am also available as a consultant.

Book Cover

I live in Sedona, Arizona. I took the book cover photo in January 2023 from the street that I live on.


This picture shows me and my wife Carol who helps me with book production and editing.

Figure 1. Mark and Carol Watson

I would also like to thank the following readers who reported errors or typos in this book: Armando Flores, Peter Solimine, and David Rupp.

Requirements for Running and Modifying Book Examples

I show full source code and a fair amount of example output for each book example so if you don’t want to get access to some of the following APIs then you can still read along in the book.

To use OpenAI’s GPT-3 and ChatGPT models, you will need to sign up for an API key (the free tier is OK) on the OpenAI platform website and set the environment variable OPENAI_API_KEY to your key value.
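A missing or misspelled key usually shows up only as an authentication error inside the first API call. A small fail-fast check you might run at the top of a script is sketched below; missing_env_vars is a hypothetical helper, not part of the openai or langchain libraries.

```python
import os


def missing_env_vars(*names: str) -> list:
    """Return the names of any environment variables that are unset or blank."""
    return [name for name in names if not os.environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars("OPENAI_API_KEY")
    if missing:
        # Report the problem clearly instead of failing later with an authentication error
        print(f"Please set these environment variables first: {', '.join(missing)}")
    else:
        print("OPENAI_API_KEY is set")
```

In a real script you would probably raise SystemExit with that message instead of printing it, and you should never print the key value itself.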

You will need to get an API key for examples using Google’s Knowledge Graph APIs.

The example programs using Google’s Knowledge Graph APIs assume that you have the file ~/.google_api_key in your home directory containing your key from the Google Cloud Console.
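A minimal sketch of reading that key file, assuming it contains only the key (perhaps followed by a newline). load_google_api_key is a hypothetical helper name for illustration; the book’s repository may load the key differently.

```python
from pathlib import Path


def load_google_api_key(key_file: Path = Path.home() / ".google_api_key") -> str:
    """Read the Google API key from a one-line text file, stripping whitespace."""
    if not key_file.is_file():
        raise FileNotFoundError(f"Expected a Google API key in {key_file}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    return key
```

Keeping the key in a file outside the repository, rather than in source code, avoids accidentally committing it to GitHub.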

You will need to install SerpApi for examples integrating web search. Please see its PyPI project page.

You can sign up for a free, non-commercial account (100 searches per month) with an email address and phone number on the SerpApi website.

You will also need a Zapier account for the Gmail and Google Calendar examples.

After reading through this book, you can review the LangChainHub website, which contains prompts, chains, and agents that are useful for building LLM applications.

Issues and Workarounds for Using the Material in this Book

The libraries that I use in this book are frequently updated and sometimes the documentation or code links change, invalidating links in this book. I will try to keep everything up to date. Please report broken links to me.

In some cases you will need to use specific versions of libraries for some of the code examples.

Because the Python code listings use colorized text you may find that copying code from this eBook may drop space characters. All of the code listings are in the GitHub repository for this book so you should clone the repository to experiment with the example code.

Large Language Model Overview

Large language models are a subset of artificial intelligence that use deep learning and neural networks to process natural language. Transformers are a type of neural network architecture that can learn context in sequential data using self-attention mechanisms. They were introduced in 2017 by a team at Google Brain and have become popular for LLM research. Some older examples of transformer-based LLMs are BERT, GPT-3, T5 and Megatron-LM.

The main points we will discuss in this book are:

  • LLMs are deep learning algorithms that can understand and generate natural language based on massive datasets.
  • LLMs use techniques such as self-attention, masking, and fine-tuning to learn complex patterns and relationships in language. LLMs can understand and generate natural language because they use transformer models, which are a type of neural network that can process sequential data such as text using attention mechanisms. Attention mechanisms allow the model to focus on relevant parts of the input and output sequences while ignoring irrelevant ones.
  • LLMs can perform various natural language processing (NLP) and natural language generation (NLG) tasks, such as summarization, translation, prediction, classification, and question answering.
  • Even though LLMs were initially developed for NLP applications, LLMs have also shown potential in other domains such as computer vision and computational biology by leveraging their generalizable knowledge and transfer learning abilities.
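To make “attention” less abstract, here is a minimal pure-Python sketch of scaled dot-product self-attention, the core operation in transformer models: each output is a weighted average of value vectors, with weights softmax(q · k / sqrt(d)). It deliberately omits the learned query/key/value projection matrices, multiple heads, and masking, and it uses toy 2-dimensional vectors, so it is an illustration rather than a model implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over one sequence.

    queries, keys, values: lists of equal-length vectors, one per token.
    Returns one output vector per query: a weighted average of the value vectors.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Three toy "token" vectors; a real transformer derives Q, K and V via learned projections
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in self_attention(x, x, x):
        print([round(v, 3) for v in row])
```

Because the weights for each query sum to 1, every output is a blend of the input tokens, weighted toward the tokens most relevant to that query.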

BERT models were among the first widely used transformer models. BERT was developed by Google AI Language in 2018. BERT models are a family of masked language models that use the transformer architecture to learn bidirectional representations of natural language, so they can resolve the meaning of ambiguous words by using the surrounding text as context. The “magic trick” here is that training data comes almost for free: in masked language modeling, you programmatically choose random words, replace them with a mask token, and train the model to predict the missing words. This process is repeated over massive amounts of training text from the web, books, etc.
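A simplified sketch of how masked-language-model training pairs can be generated from unlabeled text. The 15% default masking rate follows the BERT paper; however, real BERT preprocessing works on subword tokens and, for the chosen positions, sometimes substitutes a random token or keeps the original instead of always inserting [MASK]. This sketch skips those details and just splits on whitespace; make_masked_example is a hypothetical name.

```python
import random

MASK = "[MASK]"


def make_masked_example(text, mask_prob=0.15, seed=None):
    """Turn raw text into a (masked_tokens, targets) training pair.

    targets maps each masked position to the original word the model must predict.
    """
    rng = random.Random(seed)
    tokens = text.split()
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # the label comes from the text itself, no human annotation
        else:
            masked.append(tok)
    return masked, targets


if __name__ == "__main__":
    masked, targets = make_masked_example(
        "the model is trained to predict the missing words", mask_prob=0.3, seed=1)
    print(" ".join(masked))
    print(targets)
```

The labels are simply the words that were hidden, which is why this kind of training data is nearly free to produce at web scale.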

Here are some “papers with code” links for BERT (links are for code, paper links in the code repositories):

Technological Change is Increasing at an Exponential Rate

When I wrote the first edition of this book, it was difficult to run LLMs locally on my own computers. Now, in May 2024, I can use Ollama to run very useful models on the old M1 8GB MacBook I am writing this on:

$ ollama list
NAME                    ID              SIZE    MODIFIED
dolphin-mistral:latest  5dc8c5a2be65    4.1 GB  4 weeks ago
dolphin-mistral:v2.8    5dc8c5a2be65    4.1 GB  4 weeks ago
gemma:2b                b50d6c999e59    1.7 GB  2 weeks ago
llama3:latest           71a106a91016    4.7 GB  8 days ago
phi3:latest             a2c89ceaed85    2.3 GB  5 days ago

The llama3 model recently released by Meta is arguably more powerful than any model I could run on my M2 32GB system in late February. The good news is that the techniques you learn now for incorporating LLMs into your own applications, along with your growing skill at writing effective prompts, will remain useful even as models become more powerful.

What LLMs Are and What They Are Not

Large Language Models are text predictors. Given a prompt, or context text plus a prompt or question, an LLM predicts a highly likely text completion. As human beings we tend to ascribe deep intelligence and world knowledge to LLMs, a misconception I try to avoid. A year ago I asked ChatGPT to write a poem, in the style of the poet Elizabeth Bishop, about my pet parrot escaping out the window. When a friend asked ChatGPT to rewrite the poem in the style of the more modern poet Billy Collins, we were both surprised by how closely it mimicked the styles of both poets. Surely this must be some deep form of intelligence, right? No, this phenomenon is text prediction by a model trained on an enormous collection of books and web content.
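The phrase “text predictor” can be made concrete with a deliberately tiny model. The sketch below counts which word follows which in a training text (a bigram model) and greedily predicts the most frequent next word. An LLM is vastly more capable, conditioning on thousands of previous tokens with a learned neural network, but the training objective of predicting the next token is the same in spirit. The corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """For each word, count how often every other word immediately follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent next word (ties go to the first one seen), or None."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(follows, start, n_words=5):
    """Greedy generation: repeatedly append the single most likely next word."""
    out = [start.lower()]
    for _ in range(n_words):
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    corpus = ("John drove his car to work . John drove his car home . "
              "Mary drove to town .")
    model = train_bigrams(corpus)
    print(generate(model, "John"))
```

Nothing in these counts “understands” cars or commuting; the model only reproduces statistical patterns in its training text, which is the point of the paragraph above.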

LLMs compress knowledge of language and some knowledge of the world into a compact representation. Clever software developers can certainly build useful and interesting systems using LLMs, and this is the main topic of this book. My hope is that by experimenting with writing prompts, learning the differences between available models, and practicing applying LLMs to transform textual data, you will develop your own good ideas and build applications that you and other people find useful.

Big Tech Businesses vs. Small Startups Using Large Language Models

Both Microsoft and Google play both sides of this business game: they want to sell cloud LLM services to developers and small startup companies and they would also like to achieve lock-in for their consumer services like Office 365, Google Docs and Sheets, etc.

Microsoft has been integrating AI technology into workplace emails, slideshows, and spreadsheets as part of its ongoing partnership with OpenAI, the company behind ChatGPT. Microsoft’s Azure OpenAI service offers a powerful tool to enable these outcomes when leveraged with their data lake of more than two billion metadata and transactional elements.

Google has opened access to their Gemini Model based AI/chat search service. I have used various Google APIs for years in code I write. I have no favorites in the battle between tech giants, rather I am mostly interested in what they build that I can use in my own projects.

Hugging Face, which creates LLMs and also hosts those developed by other companies, is working on open-source rivals to ChatGPT and will use AWS for that as well. Cohere AI, Anthropic, Hugging Face, and Stability AI are some of the startups that are competing with OpenAI. Hugging Face is a great source of specialized models, that is, standard models that have been fine-tuned for specific applications. I love that Hugging Face models can be run via their APIs, self-hosted on our own servers, and sometimes even run on our laptops. Hugging Face is a fantastic resource, and even though I use their models much less frequently in this book than OpenAI APIs, you should embrace the hosting and open source flexibility of Hugging Face. Starting in late 2023, I also started heavily using the Ollama platform for downloading and running models on my laptop. There is a chapter in this book on using Ollama. In this book I most frequently use OpenAI APIs because they are so widely used.

Dear reader, I didn’t write this book for developers working at established AI companies (although I hope such people find the material here useful). I wrote this book for small developers who want to scratch their own itch by writing tools that save them time. I also wrote this book hoping that it would help developers build capabilities into the programs they design and write that rival what the big tech companies are doing.

Getting Started With LangChain

LangChain is a framework for building applications with large language models (LLMs) by chaining different components together. Some applications of LangChain are chatbots, generative question answering, summarization, data-augmented generation, and more. LangChain can save time in building chatbots and other systems by providing a standard interface for chains, agents, and memory, as well as integrations with other tools and end-to-end examples. We refer to “chains” as sequences of calls (to LLMs and to other program utilities, cloud services, etc.) that go beyond a single LLM API call. Often you will find existing chains already written that meet the requirements for your applications.

For example, one can create a chain that takes user input, formats it using a PromptTemplate, and then passes the formatted prompt to a Large Language Model (LLM) for processing.
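The chain just described (user input, then a prompt template, then an LLM) is essentially function composition. A minimal pure-Python sketch of that idea follows; fake_llm is a hypothetical stand-in for a real model call so the example runs offline, and none of these names are LangChain APIs. In LangChain itself you compose a prompt template, a model, and an output parser with the | operator, as in the RAG example later in this chapter.

```python
from typing import Callable


def make_chain(*steps: Callable) -> Callable:
    """Compose steps left to right: the output of each step is the input of the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def format_prompt(user_input: str) -> str:
    """Prompt-template step: insert the user's text into a fixed instruction."""
    return f"How do I {user_input}?"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real chain would call a model API here."""
    return f"(model answer to: {prompt!r})"


def strip_output(text: str) -> str:
    """Output-parser step: clean up the raw model text."""
    return text.strip()


chain = make_chain(format_prompt, fake_llm, strip_output)

if __name__ == "__main__":
    print(chain("get to the store"))
```

Swapping fake_llm for a real model call changes nothing else in the chain, which is the main benefit of treating each stage as an interchangeable component.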

LLMs are very general in nature: while they can perform many tasks effectively, they often cannot directly provide specific answers to questions or tasks that require deep domain knowledge or expertise. Agents, which use an LLM to decide which tools (search, APIs, etc.) to call, help fill this gap. LangChain provides a standard interface for agents, a library of agents to choose from, and examples of end-to-end agents.

LangChain Memory is the concept of persisting state between calls of a chain or agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory. LangChain also provides a large collection of common utilities to use in your applications.
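Memory in this sense can be as simple as keeping a transcript and resending it with every call, because the model itself keeps no state between API calls. A minimal sketch under that assumption follows; ConversationBuffer and chat are hypothetical names, not LangChain classes, and the optional max_turns window is one simple way to keep the prompt within the model’s context size.

```python
class ConversationBuffer:
    """Store (speaker, text) turns and render them as context for the next LLM call."""

    def __init__(self, max_turns=None):
        self.turns = []  # (speaker, text) pairs, oldest first
        self.max_turns = max_turns

    def add(self, speaker, text):
        """Append a turn, dropping the oldest turns if a max_turns limit is set."""
        self.turns.append((speaker, text))
        while self.max_turns is not None and len(self.turns) > self.max_turns:
            self.turns.pop(0)

    def render(self):
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


def chat(buffer, llm, user_text):
    """Send the conversation so far plus the new message, then remember both sides."""
    buffer.add("Human", user_text)
    reply = llm(buffer.render() + "\nAI:")
    buffer.add("AI", reply)
    return reply


if __name__ == "__main__":
    # Hypothetical LLM stand-in: reports how many lines of context it received
    def echo_llm(prompt):
        return f"I can see {len(prompt.splitlines())} lines of conversation."

    memory = ConversationBuffer(max_turns=4)
    print(chat(memory, echo_llm, "My name is Carol."))
    print(chat(memory, echo_llm, "What is my name?"))
```

Because the whole transcript is resent each time, a windowed buffer like this trades long-term recall for bounded prompt size.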

LangChain can be integrated with one or more model providers, data stores, APIs, etc. LangChain can be used for in-depth question-and-answer chat sessions, API interaction, or action-taking. LangChain can be integrated with Zapier’s platform through a natural language API interface (we have an entire chapter dedicated to Zapier integrations).

Installing Necessary Packages

For the purposes of examples in this book, you might want to create a new Anaconda or other Python environment and install:

pip install -U langchain langchain-openai faiss-cpu tiktoken
pip install -U langchain llama_index langchain-openai
pip install -U langchain-community langchainhub
pip install -U kor pydrive pandas rdflib
pip install -U google-search-results SPARQLWrapper

For the rest of this chapter we will use the subdirectory langchain_getting_started and in the next chapter use llama-index_case_study in the GitHub repository for this book.

Creating a New LangChain Project

Simple LangChain projects are often just a very short Python script file. As you read this book, when any example looks interesting or useful, I suggest copying the requirements.txt and Python source files to a new directory and making your own GitHub private repository to work in. Please make the examples in this book “your code,” that is, freely reuse any code or ideas you find here.

Basic Usage and Examples

While I try to make the material in this book self-contained, something you can enjoy with no external references, you should also take advantage of the high-quality LangChain Quickstart Guide documentation and the individual detailed guides for prompts, chat, document loading, indexes, etc.

As we work through some examples please keep in mind what it is like to use the ChatGPT web application: you enter text and get responses. The way you prompt ChatGPT is obviously important if you want to get useful responses. In code examples we automate and formalize this manual process.

You need to choose an LLM to use. We will usually choose the GPT-4 API from OpenAI because it is general purpose, although it is more expensive than the older GPT-3.5 APIs. You will need to sign up for an API key and set it as an environment variable:


Both the libraries openai and langchain will look for this environment variable and use it. We will look at a few simple examples in a Python REPL. We will start by just using OpenAI’s text prediction API:

 1 $ python
 2 >>> from langchain_openai import OpenAI
 3 >>> llm = OpenAI(temperature=0.8)
 4 >>> s = llm.invoke("John got into his new sports car, and he drove it")
 5 >>> s
 6 ' to work\n\nJohn started up his new sports car and drove it to work. He had a huge \
 7 smile on his face as he drove, excited to show off his new car to his colleagues. Th
 8 e wind blowing through his hair, and the sun on his skin, he felt a sense of freedom
 9  and joy as he cruised along the road. He arrived at work in no time, feeling refres
10 hed and energized.'
11 >>> s = llm.invoke("John got into his new sports car, and he drove it")
12 >>> s
13 " around town\n\nJohn drove his new sports car around town, enjoying the feeling of \
14 the wind in his hair. He stopped to admire the view from a scenic lookout, and then 
15 sped off to the mall to do some shopping. On the way home, he took a detour down a w
16 inding country road, admiring the scenery and enjoying the feel of the car's powerfu
17 l engine. By the time he arrived back home, he had a huge smile on his face."

Notice that when we run the same prompt twice, we get different results. Setting the temperature in line 3 to a higher value increases the randomness of the generated text.
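Temperature rescales the model’s next-token scores before sampling. A minimal sketch with invented scores for three possible continuations shows the effect: dividing the scores by a low temperature sharpens the probability distribution toward the top choice, while a high temperature flattens it, so less likely continuations are sampled more often. The words and logits are made up for illustration; real models apply this over their whole vocabulary at every generated token.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words, logits, temperature, rng=random):
    """Pick one word at random according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


if __name__ == "__main__":
    words = ["to work", "around town", "to the beach"]
    logits = [2.0, 1.0, 0.5]  # invented scores for three possible continuations
    for t in (0.2, 0.8, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs], sample_next_word(words, logits, t))
```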

Our next example is in the source file in the directory langchain_getting_started and uses the PromptTemplate class. A prompt template is a reproducible way to generate a prompt. It contains a text string (“the template”), that can take in a set of parameters from the end user and generate a prompt. The prompt template may contain language model instructions, few-shot examples to improve the model’s response, or specific questions for the model to answer.

from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

llm = OpenAI(temperature=0.9)

def get_directions(thing_to_do):
    prompt = PromptTemplate(
        input_variables=["thing_to_do"],
        template="How do I {thing_to_do}?",
    )
    prompt_text = prompt.format(thing_to_do=thing_to_do)
    print(f"\n{prompt_text}:")
    # invoke() replaces the deprecated llm(prompt_text) call style
    return llm.invoke(prompt_text)

print(get_directions("get to the store"))
print(get_directions("hang a picture on the wall"))

You could just write Python string manipulation code to create a prompt but using the utility class PromptTemplate is more legible and works with any number of prompt input variables.

The output is:

$ python

How do I get to the store?:

To get to the store, you will need to use a mode of transportation such as walking, driving, biking, or taking public transportation. Depending on the location of the store, you may need to look up directions or maps to determine the best route to take.

How do I hang a picture on the wall?:

1. Find a stud in the wall, or use two or three wall anchors for heavier pictures.
2. Measure and mark the wall where the picture hanger will go.
3. Pre-drill the holes and place wall anchors if needed.
4. Hammer the picture hanger into the holes.
5. Hang the picture on the picture hanger.
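For comparison with the PromptTemplate example above, here is roughly what a bare-bones template helper looks like in plain Python. It uses the standard library’s string.Formatter to discover the {placeholders} in a template and reports any missing values, which is part of the bookkeeping PromptTemplate does for you. SimplePromptTemplate is a hypothetical class for illustration, not LangChain’s implementation.

```python
from string import Formatter


class SimplePromptTemplate:
    """Minimal prompt template: finds {field} names and fills them with str.format."""

    def __init__(self, template: str):
        self.template = template
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
        # field_name is None for trailing literal text or escaped {{braces}}
        self.input_variables = sorted(
            {field for _, field, _, _ in Formatter().parse(template) if field}
        )

    def format(self, **values) -> str:
        missing = [v for v in self.input_variables if v not in values]
        if missing:
            raise KeyError(f"Missing prompt variables: {missing}")
        return self.template.format(**values)


if __name__ == "__main__":
    prompt = SimplePromptTemplate("How do I {thing_to_do}?")
    print(prompt.input_variables)
    print(prompt.format(thing_to_do="hang a picture on the wall"))
```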

The next example in the file is derived from an example in the LangChain documentation. In this example we use a PromptTemplate that contains the pattern we would like the LLM to follow when returning a response.

from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

llm = OpenAI(temperature=0.9)

def get_country_information(country_name):
    print(f"\nProcessing {country_name}:")
    global prompt
    # Create the prompt template only once and reuse it on later calls
    if "prompt" not in globals():
        print("Creating prompt...")
        prompt = PromptTemplate(
            input_variables=["country_name"],
            template="""
Predict the capital and population of a country.

Country: {country_name}
Capital:
Population:""",
        )
    prompt_text = prompt.format(country_name=country_name)
    print(prompt_text)
    return llm.invoke(prompt_text)

print(get_country_information("Canada"))
print(get_country_information("Germany"))

You can use the ChatGPT web interface to experiment with prompts; when you find a pattern that works well, write a Python script like the last example that varies the data you supply to the PromptTemplate instance.

The output of the last example is:

$ python

Processing Canada:
Creating prompt...

Predict the capital and population of a country.

Country: Canada
Capital:
Population:


Capital: Ottawa
Population: 37,058,856 (as of July 2020)

Processing Germany:

Predict the capital and population of a country.

Country: Germany
Capital:
Population:


Capital: Berlin
Population: 83,02 million (est. 2019)
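Because the prompt asks the model to follow a fixed “Capital: … / Population: …” pattern, the completion can be turned into a dictionary with a few lines of code. A minimal sketch, assuming each field appears on its own line as in the output above; parse_labeled_fields is a hypothetical helper, and since real model output may not always follow the pattern, missing fields come back as None rather than raising an error.

```python
def parse_labeled_fields(text, labels=("Capital", "Population")):
    """Extract 'Label: value' lines from an LLM completion into a dict."""
    result = {label: None for label in labels}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in result and value.strip():
            result[key] = value.strip()
    return result


if __name__ == "__main__":
    completion = "\n\nCapital: Ottawa\nPopulation: 37,058,856 (as of July 2020)"
    print(parse_labeled_fields(completion))
```

Empty values are ignored, so passing in the prompt text itself (which contains bare “Capital:” and “Population:” lines) does not overwrite anything.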

Creating Embeddings

We will reference the LangChain embeddings documentation. We can use a Python REPL to see what text to vector space embeddings might look like:

$ python
Python 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from langchain_openai import OpenAIEmbeddings
>>> embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
>>> text = "Mary has blond hair and John has brown hair. Mary lives in town and John lives in the country."
>>> doc_embeddings = embeddings.embed_documents([text])
>>> doc_embeddings
[[0.007727328687906265, 0.0009025644976645708, -0.0033224383369088173, -0.01794492080807686, -0.017969949170947075, 0.028506645932793617, -0.013414892368018627, 0.0046676816418766975, -0.0024965214543044567, -0.02662956342101097,
...]]
>>> query_embedding = embeddings.embed_query("Does John live in the city?")
>>> query_embedding
[0.028048301115632057, 0.011499025858938694, -0.00944007933139801, -0.020809611305594444, -0.023904507979750633, 0.018750663846731186, -0.01626438833773136, 0.018129095435142517,
...]
>>>

Notice that doc_embeddings is a list in which each element is the embedding vector for one input text document. The query_embedding is a single embedding vector. Please read the embedding documentation linked above.
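Embedding vectors are compared with a similarity measure, and cosine similarity is a common choice for text embeddings. A minimal pure-Python sketch that ranks documents by cosine similarity to a query follows. The 3-dimensional vectors are invented for illustration (real OpenAI embeddings have hundreds or thousands of dimensions), and in practice a vector store such as FAISS performs this search for you, far more efficiently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat similarity with a zero vector as 0 in this sketch
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=2):
    """Return (index, score) pairs for the top_k document vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    docs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.7, 0.3, 0.1]]  # toy document embeddings
    query = [1.0, 0.0, 0.0]                                     # toy query embedding
    print(rank_documents(query, docs))
```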

We will use vector stores to store calculated embeddings for future use. In the next chapter we will see a document database search example using LangChain and LlamaIndex.

Using LangChain Vector Stores to Query Documents: a Simple RAG Application

We will reference the LangChain Vector Stores documentation.

We use a utility function in the file read_text_files.py to convert the text files in a directory to a list of strings:

import os

# Generated by Perplexity

def read_text_files(directory_path):
    """
    Reads all .txt files in the specified directory and returns their contents as a list of strings.

    :param directory_path: The path to the directory containing .txt files
    :return: A list of strings where each string is the content of a .txt file
    """
    txt_contents = []
    # Check if the directory exists
    if not os.path.isdir(directory_path):
        print(f"The directory {directory_path} does not exist.")
        return txt_contents

    # Iterate over all files in the directory
    for filename in os.listdir(directory_path):
        # Check for .txt extension
        if filename.endswith(".txt"):
            # Construct full file path
            file_path = os.path.join(directory_path, filename)
            # Open and read the contents of the file
            try:
                with open(file_path, 'r') as file:
                    txt_contents.append(file.read())
            except IOError as e:
                print(f"Failed to read file {filename}: {e}")

    return txt_contents

The example script is:

from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from read_text_files import read_text_files

model = ChatOpenAI()

# Embed the contents of each text file and index the vectors with FAISS
vectorstore = FAISS.from_texts(read_text_files("../data/"), embedding=OpenAIEmbeddings())

# The retriever returns the stored texts most similar to a query
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# The first step runs the retriever on the question to fill {context}
# and passes the question through unchanged to fill {question}
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

print(chain.invoke("who tried to define what Chemistry is?"))
print(chain.invoke("What kinds of equipment are in a chemistry laboratory?"))
print(chain.invoke("What is Austrian School of Economics?"))
print(chain.invoke("Why do people engage in sports?"))
print(chain.invoke("What is the effect of body chemistry on exercise?"))

The output is:

$ python
Georg Ernst Stahl, Jean-Baptiste Dumas, Linus Pauling, and Professor Raymond Chang tried to define what Chemistry is.
Various forms of laboratory glassware are typically used in a chemistry laboratory. However, it is mentioned that a great deal of experimental and applied/industrial chemistry can also be done without the use of glassware.
The Austrian School of Economics is a school of economic thought that emphasizes the spontaneous organizing power of the price mechanism, advocates a laissez-faire approach to the economy, and holds that commercial transactions should be subject to minimal government intervention.
People engage in sports for physical athleticism, physical dexterity, and for leisure activities that they find amusing or entertaining.
Body chemistry can affect exercise in terms of how energy is transferred from one substance to another, with heat being transferred more easily than other forms of energy. Additionally, the body's ability to learn to fight specific infections after exposure to germs, known as adaptive immunity, can impact exercise performance and recovery.
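The first step of the chain above, the dictionary {"context": retriever, "question": RunnablePassthrough()}, sends the same input question to each entry and collects the results under the key names that the prompt template expects. A rough pure-Python sketch of that data flow follows, with a hypothetical keyword-overlap retriever standing in for the FAISS retriever. It only mimics the behavior and is not how LangChain implements this step internally.

```python
import re


def keyword_retriever(question, documents, k=2):
    """Hypothetical stand-in retriever: rank documents by words shared with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))

    def overlap(doc):
        return len(q_words & set(re.findall(r"\w+", doc.lower())))

    # sorted() is stable, so documents with equal overlap keep their original order
    return sorted(documents, key=overlap, reverse=True)[:k]


def passthrough(value):
    """Like RunnablePassthrough: return the input unchanged."""
    return value


def run_parallel(steps, value):
    """Apply every named step to the same input and collect the results in a dict."""
    return {name: step(value) for name, step in steps.items()}


if __name__ == "__main__":
    docs = [
        "Chemistry laboratories use glassware such as beakers and flasks.",
        "People engage in sports for exercise and entertainment.",
        "The Austrian School is a school of economic thought.",
    ]
    steps = {
        "context": lambda q: "\n".join(keyword_retriever(q, docs, k=1)),
        "question": passthrough,
    }
    filled = run_parallel(steps, "Why do people engage in sports?")
    template = ("Answer the question based only on the following context:\n"
                "{context}\n\nQuestion: {question}\n")
    print(template.format(**filled))
```

Seen this way, the prompt never receives the whole document collection, only the retrieved passages, which is what keeps RAG prompts within the model’s context size.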

NOTE: This example is deprecated.

The example shown here is in a source file in the directory from_langchain_docs, and the relevant documentation is in the LangChain Integrations documentation.

# make sure SERPER_API_KEY is set in your environment

from langchain_community.utilities import GoogleSerperAPIWrapper

search_helper = GoogleSerperAPIWrapper()

def search(query):
    # run() sends the query to the Serper API and returns the results as text
    return search_helper.run(query)

print(search("What is the capital of Arizona?"))
#print(search("Sedona Arizona?"))

You will need a Serper API key from the Serper website. Currently you can get a free key for 2,500 API calls. After that, the paid tier starts at $50 for 50K API calls, and these credits must be used within a six-month period.

OpenAI Model GPT-4o Example

Here we use the new OpenAI model GPT-4o released May 13, 2024. The example file from_langchain_docs/ uses the latest (May 2024) LangChain APIs:

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization? Be concise."),
]

results = llm.invoke(messages)
print(results.content)
print("\n")
print(results)

The output is:

$ python
The purpose of model regularization is to prevent overfitting by adding a penalty for larger coefficients in the model, thereby improving its generalization to new, unseen data.

content='The purpose of model regularization is to prevent overfitting by adding a penalty for larger coefficients in the model, thereby improving its generalization to new, unseen data.' response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 27, 'total_tokens': 61}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'stop', 'logprobs': None} id='run-f2fa3e57-eeff-4496-b6e4-d06af8f230d0-0'

This new (as of May 2024) model costs half as much as the previous GPT-4 API and has much lower API call latency.

LangChain Overview Wrap Up

We will continue using LangChain for the rest of this book as well as the LlamaIndex library that we introduce in the next chapter.

In this book I cover just the subset of LangChain that I use in my own projects. I urge you to read the LangChain documentation and to explore public LangChain chains that users have written on LangChain Hub.

Overview of LlamaIndex

The popular LlamaIndex project used to be called GPT-Index but has been generalized to work with many LLMs, including OpenAI's GPT-4, models hosted on Hugging Face, Anthropic's models, local models run using Ollama, and many others.

LlamaIndex is a project that provides a central interface to connect your language models with external data. It was created by Jerry Liu and his team in the fall of 2022. It consists of a set of data structures designed to make it easier to use large external knowledge bases with language models. Some of its uses are:

  • Querying structured data such as tables or databases using natural language
  • Retrieving relevant facts or information from large text corpora
  • Enhancing language models with domain-specific knowledge

LlamaIndex supports a variety of document types, including:

  • Text documents are the most common type of document. They can be stored in a variety of formats, such as .txt, .doc, and .pdf.
  • XML documents are a type of text document that is used to store data in a structured format.
  • JSON documents are a type of text document that is used to store data in a lightweight format.
  • HTML documents are a type of text document that is used to create web pages.
  • PDF documents are a type of text document that is used to store documents in a fixed format.

LlamaIndex can also index data that is stored in a variety of databases, including:

  • SQL databases such as MySQL, PostgreSQL, and Oracle.
  • NoSQL databases such as MongoDB, Cassandra, and CouchDB.
  • Solr is a popular open-source search engine that provides high performance and scalability.
  • Elasticsearch is another popular open-source search engine that offers a variety of features, including full-text search, geospatial search, and machine learning.
  • Apache Cassandra is a NoSQL database that can be used to store large amounts of data.
  • MongoDB is another NoSQL database that is easy to use and scale.
  • PostgreSQL is a relational database that is widely used in enterprise applications.

LlamaIndex is a flexible framework that can be used to index a variety of document types and data sources.

Compared to LangChain, LlamaIndex offers a focused advantage for indexing and retrieval tasks, which makes it an efficient choice for applications that prioritize these functions. Its design is tailored to ingesting, structuring, and accessing private or domain-specific data. That matters for applications that depend on quickly retrieving accurate and relevant information from large datasets. The streamlined interface of LlamaIndex simplifies connecting custom data sources to large language models (LLMs), which reduces complexity and development time for search-centric applications. This focus on indexing and retrieval can improve the speed and accuracy of search and summarization tasks, making LlamaIndex a strong choice for developers building intelligent search tools.

Another significant advantage of LlamaIndex is its integration with a wide array of tools and services, which enhances the functionality and versatility of LLM-powered applications. Integration with vector stores like Pinecone and Milvus supports efficient document search and retrieval. Compatibility with tracing tools such as Graphsignal gives insight into how LLM-powered applications operate, while integration with application frameworks like LangChain and Streamlit makes applications easier to build and deploy. These integrations extend to data loaders, agent tools, and observability tools, which enhances the capabilities of data agents and offers various structured output formats for consuming application results. This integration ecosystem lets developers create powerful, versatile applications with minimal effort.

Lastly, LlamaIndex’s specialized focus on indexing and retrieval is complemented by its simplicity and ease of use, making it an attractive option for developers seeking to build efficient and straightforward search experiences. The framework’s optimization for these specific tasks, in comparison to more general-purpose frameworks like LangChain, results in a tool that is not only more efficient for search and retrieval applications but also easier to learn and implement. This simplicity is particularly beneficial for projects with tight deadlines or for developers new to working with LLMs, as it allows for the quick deployment of high-performance applications without the need for extensive customization or complex setup processes.

We will look at a short example derived from the LlamaIndex documentation.

Using LlamaIndex for Question Answering from a Web Site

In this example we use the trafilatura library to get text from a web page that we will index and search (the install command below also installs html2text). The function trafilatura.fetch_url downloads the page, trafilatura.extract pulls out its main text, and we wrap that text in a LlamaIndex Document object. The index class VectorStoreIndex then builds a local index that uses OpenAI API calls to implement search.

pip install -U trafilatura html2text

The following listing shows the file

# Derived from examples in llama_index documentation

# pip install llama-index html2text trafilatura

import trafilatura
from llama_index.core import Document, VectorStoreIndex

def query_website(url, *questions):
    # Download the page and extract its main text content
    downloaded = trafilatura.fetch_url(url)
    text = trafilatura.extract(downloaded)
    #print(text)
    # Wrap the extracted text in a Document and build a vector index over it
    list_of_documents = [Document(text=text)]
    index = VectorStoreIndex.from_documents(list_of_documents)
    engine = index.as_query_engine()
    for question in questions:
        print(f"\n== QUESTION: {question}\n")
        response = engine.query(question)
        print(f"== RESPONSE: {response}")

if __name__ == "__main__":
    url = ""  # URL of the web page to index
    query_website(url, "What instruments does Mark play?",
                       "How many books has Mark written?")

This example is not efficient because we create a new index for each web page we want to search. That said, this example (which was derived from an example in the LlamaIndex documentation) implements a pattern that you can use, for example, to build a reusable index of your company's web site and an end-user web search app.
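One way to avoid rebuilding an index every time the same page is queried is to cache indexes by URL. The following minimal sketch shows that pattern. Here build_index is a hypothetical callable that stands in for the trafilatura download and VectorStoreIndex.from_documents steps in query_website. A stand-in builder lets the sketch run without network access. For a cache that survives restarts, LlamaIndex can also write an index to disk with index.storage_context.persist():

```python
from typing import Any, Callable


class IndexCache:
    """Build an index once per URL and reuse it for later questions."""

    def __init__(self, build_index: Callable[[str], Any]):
        # build_index is a hypothetical callable: url -> index object
        self._build_index = build_index
        self._indexes: dict[str, Any] = {}

    def get(self, url: str) -> Any:
        if url not in self._indexes:
            self._indexes[url] = self._build_index(url)
        return self._indexes[url]


# Stand-in builder that records each call instead of downloading anything
build_calls = []


def fake_build_index(url: str) -> str:
    build_calls.append(url)
    return f"index for {url}"


cache = IndexCache(fake_build_index)
cache.get("https://example.com")
cache.get("https://example.com")  # served from the cache, no rebuild
print(len(build_calls))
```

With a real builder that returns a VectorStoreIndex, cache.get(url).as_query_engine() would reuse the existing index for every question about that page.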

The output for the two test questions in the last code example is:

$ python

== QUESTION: What instruments does Mark play?

== RESPONSE: Mark plays the guitar, didgeridoo, and American Indian flute.

== QUESTION: How many books has Mark written?

== RESPONSE: Mark has written 9 books.

Note that the answer to the second question is strictly incorrect. The model counted the books mentioned in the extracted text, and it counted them correctly. However, the Trafilatura library skipped the text in the header block of my web site that says I have written over 20 books. This inaccuracy comes from my use of the Trafilatura library, not from the LLM.

LlamaIndex/GPT-Index Case Study Wrap Up

LlamaIndex is a set of data structures and library code designed to make it easier to use large external knowledge bases such as Wikipedia. LlamaIndex creates a vectorized index from your document data, making it highly efficient to query. It then uses this index to identify the most relevant sections of the document based on the query.

LlamaIndex is useful because it provides a central interface for connecting your LLMs with external data and offers data connectors to your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.). It provides a simple, flexible interface between your external data and LLMs.

Some projects that use LlamaIndex include building personal assistants with LlamaIndex and GPT-4, using LlamaIndex for document retrieval, and combining answers across documents.

Extraction of Facts and Relationships from Text Data

Traditional methods for extracting email addresses, names, addresses, etc. from text included the use of hand-crafted regular expressions and custom software. LLMs are text processing engines with knowledge of grammar, sentence structure, and some real world embedded knowledge. Using LLMs can reduce the development time of information extraction systems.
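For comparison, here is the kind of hand-crafted regular expression the paragraph above refers to. This sketch uses a deliberately simplified email pattern (an assumption, not a complete RFC 5322 matcher). It finds well-formed addresses but has no idea whose address each one is, and that kind of context is what an LLM can supply:

```python
import re

# Simplified email pattern: local part, "@", then a domain ending in a dot-suffix
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return email-like substrings in order of appearance."""
    return EMAIL_RE.findall(text)


text = ("Contact Jane at jane.smith@example.com or the office at "
        "info@example.org. Bob has no email listed.")
print(extract_emails(text))
```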

Key Capabilities of LLMs for Fact and Relationship Extraction

  • Named Entity Recognition (NER): LLMs excel at identifying and classifying named entities within text. This includes pinpointing people, organizations, locations, dates, quantities, etc. NER forms the basis of any fact extraction process, as entities are the core elements around which facts are organized.
  • Relationship Extraction (RE): LLMs are adept at understanding the grammatical structure of sentences and the contextual meaning of words. This enables them to identify relationships between the entities they've identified, such as employment relationships (“Jane Smith works for Microsoft”), ownership (“Apple acquired Beats Electronics”), and location-based relationships (“The Louvre Museum is located in Paris”).
  • Semantic Understanding: LLMs possess a deep understanding of language semantics. This allows them to go beyond simple keyword matching and grasp the nuances and implicit meanings within text, leading to more accurate fact extraction.
  • Knowledge Base Augmentation: Pre-trained LLMs draw on their vast knowledge bases (from being trained on massive text datasets) to fill in gaps when text is incomplete and support the disambiguation of entities or relationships.

Techniques and Approaches

  • Fine-tuned Question Answering: LLMs can be fine-tuned to directly answer factual questions about a text. For example, given a news article and the question “When did the event occur?”, the LLM can pin down the relevant date within the text.
  • Knowledge Graph Construction: LLMs play a crucial role in automatically constructing knowledge graphs. These graphs are structured representations of facts and relationships extracted from text. LLMs identify the entities and relationships and help enrich the graphs with relevant attributes.
  • Zero-shot or Few-shot Learning: Advanced LLMs can extract certain facts and relationships with minimal or no additional training examples. This is especially valuable in scenarios where manually labelled data is scarce or time-consuming to create.
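Few-shot prompts like the two-shot prompt used later in this chapter can also be assembled programmatically from (text, output) example pairs. The following sketch assumes the Text:/Output:/Process Text: layout used in this chapter's prompt files. build_few_shot_prompt is a hypothetical helper, not a library function:

```python
import json


def build_few_shot_prompt(instruction: str, examples: list[tuple[str, dict]],
                          text: str) -> str:
    """Assemble an instruction, worked examples, and the text to process."""
    parts = [instruction, ""]
    for i, (example_text, output) in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(f'Text: "{example_text}"')
        parts.append("Output:")
        # json.dumps writes Python None as JSON null
        parts.append(json.dumps(output, indent=2))
        parts.append("")
    parts.append(f'Process Text: "{text}"')
    parts.append("Output:")
    return "\n".join(parts)


examples = [
    ("Jane Smith has recently moved to 5678 Oak Avenue, Anytown. "
     "She hasn't updated her email yet.",
     {"name": "Jane Smith", "address": "5678 Oak Avenue, Anytown", "email": None}),
]
prompt = build_few_shot_prompt(
    "Extract the name, address, and email of the person in Process Text as JSON.",
    examples,
    "Mark Johnson lives at 102 Dunston Street, Berkeley California.",
)
print(prompt)
```

Keeping the examples as data makes it easy to swap examples in and out while experimenting with prompts.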

Benefits

  • Accuracy: LLMs often surpass traditional rule-based systems in accuracy, particularly when working with complex or varied text formats.
  • Scalability: LLMs can process vast amounts of text data to efficiently extract facts and relationships, enabling the analysis of large-scale datasets.
  • Time-saving: The ability of LLMs to adapt and learn reduces the need for extensive manual rule creation or feature engineering, leading to faster development of fact extraction systems.

Applications

  • Financial Analysis: Identifying key facts and relationships within financial reports and news articles to support investment decisions.
  • Legal Research: Extracting relevant clauses, case law, and legal relationships from complex legal documents.
  • Scientific Literature Analysis: Building databases of scientific findings and discoveries by extracting relationships and networks from research papers.
  • Customer Support: Analyzing customer feedback and queries to understand product issues, sentiment, and commonly reported problems.

Example Prompts for Getting Information About a Person from Text and Generating JSON

Before using LLMs directly in application code, I like to experiment with prompts. Here we will use a two-shot approach: we provide two examples of text and the extracted JSON data as context, followed by the text we want to process. Consider the following prompt that I ran with Ollama on my old M1 MacBook with 8GB of memory:

Given the two examples below, extract the names, addresses, and email addresses of individuals mentioned later as Process Text. Format the extracted information in JSON, with keys for "name", "address", and "email". If any information is missing, use "null" for that field.

Example 1:
Text: "John Doe lives at 1234 Maple Street, Springfield. His email is johndoe@exampl…"
Output: 
{
  "name": "John Doe",
  "address": "1234 Maple Street, Springfield",
  "email": ""
}

Example 2:
Text: "Jane Smith has recently moved to 5678 Oak Avenue, Anytown. She hasn't updated her email yet."
Output: 
{
  "name": "Jane Smith",
  "address": "5678 Oak Avenue, Anytown",
  "email": null
}

Process Text: "Mark Johnson enjoys living in Berkeley California at 102 Dunston Street and use for contacting him."
28 Output:

This prompt is in the file prompt_examples/two-shot-2.txt.

The output can be overly verbose:

$ ollama run llama3:instruct < two-shot-2.txt
Here is the extracted information in JSON format:

{
"name": "Mark Johnson",
"address": "102 Dunston Street, Berkeley California",
"email": ""
}

Note that I used the address format from Example 1, which combines the street address with the city and state. If you want to separate these fields into different keys (e.g., "street", "city", "state"), let me know!

While the comments the llama3-8b-instruct model makes are interesting, let’s modify the prompt to ask for concise output that only includes the generated JSON:

Given the two examples below, extract the names, addresses, and email addresses of individuals mentioned later as Process Text. Format the extracted information in JSON, with keys for "name", "address", and "email". If any information is missing, use "null" for that field. Be concise in your output by providing only the output JSON.

The rest of the prompt is unchanged. Now the output is:

$ ollama run llama3:instruct < two-shot-2.txt
{
  "name": "Mark Johnson",
  "address": "102 Dunston Street, Berkeley California",
  "email": ""
}
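Even with the concise instruction, a model may sometimes wrap the JSON in commentary, as in the first run above. The following defensive sketch decodes the first JSON object found in a reply. It uses json.JSONDecoder.raw_decode from the standard library and retries at the next "{" when a candidate fails to parse. The sample reply is abbreviated from the verbose output above:

```python
import json


def first_json_object(reply: str):
    """Return the first JSON object embedded in an LLM reply, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON starting at this brace; try the next one
            start = reply.find("{", start + 1)
    return None


reply = """Here is the extracted information in JSON format:

{
"name": "Mark Johnson",
"address": "102 Dunston Street, Berkeley California",
"email": ""
}

Note that I used the address format from Example 1."""
print(first_json_object(reply))
```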

Example Code

To use this prompt from code, we keep the same prompt except that the Process Text becomes a variable that is replaced before the prompt is sent to an LLM. We copy the file two-shot-2.txt to two-shot-2-var.txt and change the second-to-last line in the file:

Process Text: "{input_text}"
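Because the prompt template also contains literal JSON braces from the examples, Python's str.format would try to interpret those braces as replacement fields and fail. Replacing the exact placeholder text "{input_text}", braces included, avoids that problem. It also avoids leaving stray braces around the substituted text. The following short sketch uses a made-up, shortened template:

```python
template = 'Example output: {"name": "Jane Smith"}\nProcess Text: "{input_text}"\nOutput:'

# str.format treats every brace pair as a field, so the JSON example breaks it
try:
    template.format(input_text="Mark Johnson lives in Berkeley.")
except (KeyError, ValueError, IndexError) as err:
    print("format failed:", repr(err))

# Plain string replacement of the full placeholder leaves other braces alone
prompt = template.replace("{input_text}", "Mark Johnson lives in Berkeley.")
print(prompt)
```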

Now let’s wrap these ideas up in a short Python example in the file extraction/

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Read the prompt template from a text file
with open('prompt.txt', 'r') as file:
    prompt_template = file.read()

# Substitute a string variable for the {input_text} placeholder
input_text = ("Mark Johnson enjoys living in Berkeley California at 102 Dunston "
              "Street and use for contacting him.")
prompt = prompt_template.replace("{input_text}", input_text)

# Use the OpenAI chat completion API to generate a response with GPT-4
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": prompt,
        },
    ],
)

print(completion.choices[0].message.content)

The output looks like:

$ python
{
  "name": "Mark Johnson",
  "address": "102 Dunston Street, Berkeley California",
  "email": ""
}

For reference, the complete completion object looks like this:

ChatCompletion(id='chatcmpl-9LBZao4hFMmw7VrYcRbQIR2EGzvCj', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='{\n  "name": "Mark Johnson",\n  "address": "102 Dunston Street, Berkeley California",\n  "email": ""\n}', role='assistant', function_call=None, tool_calls=None))], created=1714836402, model='gpt-4-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=34, prompt_tokens=223, total_tokens=257))

Using LLMs to Summarize Text

LLMs bring a new level of ability to text summarization tasks. With their ability to process massive amounts of information and “understand” natural language, they’re able to capture the essence of lengthy documents and distill them into concise summaries. Two main types of summarization dominate with LLMs: extractive and abstractive. Extractive summarization pinpoints the most important sentences within the original text, while abstractive summarization requires the LLM to paraphrase or generate new text to represent the core ideas. If you are interested in extractive summarization, there is a chapter on this topic in my Common Lisp AI book, which you can read online.
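To make the extractive side of this distinction concrete, here is a minimal frequency-based extractive summarizer. This is a classic non-LLM technique and is not the approach used in this chapter's code. It scores each sentence by the average frequency of its words across the whole text, then keeps the top-scoring sentences in their original order. The naive sentence splitting and the tiny stop-word list are simplifying assumptions:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "it",
              "that", "for", "on", "with", "as", "by"}


def extractive_summary(text: str, num_sentences: int = 2) -> list[str]:
    """Pick the highest-scoring sentences, returned in document order."""
    # Naive sentence split after ., !, or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOP_WORDS]
        # Average word frequency, so long sentences are not automatically favored
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]


text = ("Inflation raises prices. Inflation reduces the value of money. "
        "Cats sleep a lot. Central banks fight inflation with interest rates.")
print(extractive_summary(text, num_sentences=2))
```

An abstractive summary from an LLM, by contrast, may contain sentences that appear nowhere in the source text.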

LLMs excel in text summarization for several reasons. Their deep understanding of language semantics allows them to identify key themes, even when wording varies across a document. Additionally, they can maintain logical consistency within summaries, ensuring that the condensed version makes sense as a cohesive unit. Modern LLMs are also trained on massive datasets encompassing diverse writing styles, helping them adapt to different sources and generate summaries tailored to specific audiences.

The applications of LLM-powered text summarization are vast. They can help researchers digest lengthy scientific reports quickly, allow businesses to analyze customer feedback efficiently, or provide concise news briefs for busy individuals. LLM-based summarization also has the potential to improve accessibility, creating summaries for those with reading difficulties or summarizing complex information into simpler language. As LLM technology continues to advance, we can expect even more innovative and accurate summarization tools in the future.

Example Prompt

In this example, the prompt is simply:

Summarize the following text: "{input_text}"
Output:

Code Example

The example in file summarization/ reads a prompt file and substitutes the text from the test file ../data/economics.txt:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Read the prompt template from a text file
with open('prompt.txt', 'r') as file:
    prompt_template = file.read()

# Substitute the contents of the test file for the {input_text} placeholder
with open('../data/economics.txt', 'r') as file:
    input_text = file.read()
prompt = prompt_template.replace("{input_text}", input_text)

# Use the OpenAI chat completion API to generate a response with GPT-4
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": prompt,
        },
    ],
)

print(completion.choices[0].message.content)

The length of the output summary is about 20% of the length of the original text.

LLM Techniques for Structured Data Conversion

Here we look at a simple example of converting CSV spreadsheet files to JSON, but the idea of data conversion using LLMs is general purpose.

Using LLMs helps handle ambiguity. Traditional Symbolic AI methods often struggle with the nuance of human language. LLMs, with their understanding of context, can resolve ambiguity and provide more accurate extraction.

LLMs are also effective at handling complex or previously unseen formats, often from a single example (one-shot prompting). LLMs are trained on vast amounts of diverse text data, making them more adaptable to unexpected variations in data formats than rule-based approaches.

Using LLMs for application development can reduce manual effort by automating many parts of the conversion process that traditionally required significant human intervention and the creation of detailed extraction rules.

Example Prompt for Converting CSV Files to JSON

In the prompt we supply one example, with two rows of data, for converting between these two formats:

Given the example below, convert a CSV spreadsheet text file to a JSON text file:

Example:
CSV:
name,address, email
John Doe, 1234 Maple Street, Springfield,
"Jane Smith", "5678 Oak Avenue, Anytown",
Output: 
{
  "name": "John Doe",
  "address": "1234 Maple Street, Springfield",
  "email": ""
}
{
  "name": "Jane Smith",
  "address": "5678 Oak Avenue, Anytown",
  "email": null
}

Process Text: "{input_csv}"
Output:

Example Code for Converting CSV Files to JSON

The example in file structured_data_conversion/ reads the prompt template file and substitutes the CSV data from the test file test.csv. The modified prompt is passed to the OpenAI completion API:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Read the prompt template from a text file
with open('prompt.txt', 'r') as file:
    prompt_template = file.read()

# Substitute the CSV data for the {input_csv} placeholder
with open('test.csv', 'r') as file:
    input_csv = file.read()
prompt = prompt_template.replace("{input_csv}", input_csv)

# Use the OpenAI chat completion API to generate a response with GPT-4
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": prompt,
        },
    ],
)

print(completion.choices[0].message.content)

Here is the test CSV input file:

last_name,first_name,email
"Jackson",Michael,
Jordan,Michael,""
Smith, John,

Notice that this file is not consistent in quoting strings, hopefully making this a more general example of data you might see in the wild. The generated JSON looks like:

{
  "last_name": "Jackson",
  "first_name": "Michael",
  "email": ""
}
{
  "last_name": "Jordan",
  "first_name": "Michael",
  "email": ""
}
{
  "last_name": "Smith",
  "first_name": "John",
  "email": ""
}
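For CSV that Python's standard csv module can parse, that module gives a deterministic baseline, which is useful for checking an LLM's output. Note that the LLM output above is a sequence of separate JSON objects rather than one valid JSON document. This sketch emits a single JSON array instead. The skipinitialspace option drops the stray space before " John". The CSV text reproduces the test file as printed above:

```python
import csv
import io
import json

csv_text = '''last_name,first_name,email
"Jackson",Michael,
Jordan,Michael,""
Smith, John,
'''


def csv_to_json(text: str) -> str:
    """Convert CSV text with a header row to a JSON array of objects."""
    # skipinitialspace ignores blanks right after a delimiter, e.g. " John"
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return json.dumps(list(reader), indent=2)


print(csv_to_json(csv_text))
```

Comparing this baseline with the LLM's output is a cheap sanity check. The LLM approach earns its keep on messier inputs that csv cannot parse cleanly.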

Retrieval Augmented Generation (RAG) Applications

Note to readers: this chapter was added May 2024 and supersedes some of the older material in other chapters on indexing and chatting about data from local text and PDF documents as well as data from web sites.

Retrieval Augmented Generation (RAG) applications work by pre-processing a user's query to search for indexed document fragments that are semantically similar to the query. These fragments are concatenated into context text that is attached to the user's query and then passed to an LLM. The LLM can preferentially use information in this context text, as well as innate knowledge stored in the LLM, to process user queries.
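The flow described above can be sketched without any external services. In this sketch, a bag-of-words cosine similarity stands in for the learned embeddings that a real vector store would use (a deliberate simplification). build_rag_prompt is a hypothetical helper that concatenates the top fragments as context ahead of the user's query:

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, fragments: list[str], k: int = 2) -> list[str]:
    """Return the k fragments most similar to the query."""
    q = vectorize(query)
    return sorted(fragments, key=lambda f: cosine(q, vectorize(f)), reverse=True)[:k]


def build_rag_prompt(query: str, fragments: list[str], k: int = 2) -> str:
    # Concatenate the retrieved fragments as context ahead of the question
    context = "\n".join(retrieve(query, fragments, k))
    return f"Use this context to answer the question.\nContext:\n{context}\n\nQuestion: {query}"


fragments = [
    "LlamaIndex builds vector indexes over documents.",
    "Basketball and tennis are popular sports.",
    "A vector index stores embeddings of document fragments.",
]
print(build_rag_prompt("What does a vector index store?", fragments))
```

The resulting string is what would be sent to the LLM. The libraries used in this chapter perform the same retrieve-then-augment steps with real embeddings.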

Figure 2. RAG System Overview

Simple RAG Example Using LlamaIndex

We will start with an example that only uses a vector store to find documents similar to the text in a query. Here is a listing of rag/

The code imports VectorStoreIndex and Document from the llama_index.core module. It then defines a list of strings, each describing aspects of LlamaIndex, and converts these strings into Document objects. These documents are then used to create an index using VectorStoreIndex.from_documents(documents), which builds an index from the provided documents. The index stores vector embeddings of the text so that it can be efficiently queried by semantic similarity.

Following the index creation, the code initializes a query engine with index.as_query_engine(), which allows for querying the indexed data. The query “What is LlamaIndex?” is passed to the retrieve method of the query engine. This method processes the query against the indexed documents to find relevant information. The results are then printed. This demonstrates a basic use case of LlamaIndex for text retrieval where the system identifies and retrieves information directly related to the user’s query from the indexed data. In practice we usually combine the use of vector stores with LLM chat models, as we do in later examples.

from llama_index.core import VectorStoreIndex
from llama_index.core import Document

text_list = ["LlamaIndex is a powerful tool for LLM applications.",
             "It helps in structuring and retrieving data efficiently."]
documents = [Document(text=t) for t in text_list]

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

retrieved_docs = query_engine.retrieve("What is LlamaIndex?")
print(retrieved_docs)

The output looks like:

$ python
[NodeWithScore(node=TextNode(id_='504482a2-e633-4d08-9bbe-363b42e67cde', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='876b1b31-116e-4c31-a65d-1d9ae3bbe6a4', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='7dd73651a571de869137f2fc94f74d158ecbd7901ed81576ad7172de96394464')}, text='LlamaIndex is a powerful tool for LLM applications.', start_char_idx=0, end_char_idx=51, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9153632840140399),
 NodeWithScore(node=TextNode(id_='694d0719-b6cc-4053-9c29-2cba0fc1f564', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='b9411d72-f945-4553-8cb4-5b55754e9b56', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='2813e9c7d682d7dec242363505166e9696fbf7a4958ec6e1c2ed8e47389f70c6')}, text='It helps in structuring and retrieving data efficiently.', start_char_idx=0, end_char_idx=56, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.7442419154564265)]

I almost never use just a vector index by itself. The next example is in the file

This code demonstrates the use of the LlamaIndex library to process, index, and query text data, specifically focusing on extracting information about sports from a dataset. The code imports necessary classes from the llama_index library. VectorStoreIndex is used for creating a searchable index of documents.

SimpleDirectoryReader reads documents from a specified directory. OpenAIEmbedding is used to convert text into numerical embeddings using OpenAI’s models. SentenceSplitter and TitleExtractor preprocess the text: SentenceSplitter splits it into chunks that respect sentence boundaries, and TitleExtractor extracts titles. The ingestion pipeline configuration object IngestionPipeline is configured with three transformations:

  • SentenceSplitter breaks the text into chunks of at most chunk_size tokens, with chunk_overlap tokens shared between neighboring chunks, and tries to avoid splitting sentences. This helps in managing the granularity of text processing.
  • TitleExtractor uses the LLM to infer a title for each document and attaches it to the text chunks as metadata, which can be useful for summarizing or categorizing documents.
  • OpenAIEmbedding converts the processed text into vector embeddings. These embeddings represent the text in a high-dimensional space, capturing semantic meanings which are crucial for effective searching and querying.
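To see what chunk_size and chunk_overlap control, here is a simplified word-based chunker. It is only an illustration. LlamaIndex's SentenceSplitter measures chunks in tokens and prefers to break at sentence boundaries, while this sketch counts whitespace-separated words and ignores sentences:

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into chunks of chunk_size words sharing chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = "one two three four five six seven"
print(chunk_words(text, chunk_size=3, chunk_overlap=1))
```

Overlap repeats a little text at each boundary, so a fact that straddles two chunks is less likely to be lost. The cost is that the overlapping text is embedded twice.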

We use SimpleDirectoryReader to convert the text files in a directory to a list of document objects:

documents = SimpleDirectoryReader("../data_small").load_data()

As in the last example, we create a vector store index from the list of document objects:

index = VectorStoreIndex.from_documents(documents)

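Note that VectorStoreIndex.from_documents applies LlamaIndex's default transformations rather than the pipeline object defined above. With default settings, the index is built using OpenAI embeddings, which allows for semantic search.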

The statement

query_engine = index.as_query_engine()

sets up a query engine from the index. This engine can process queries to find relevant documents based on their semantic content.

Finally we are ready to make a test query:

response = query_engine.query("List a few sports")

The engine searches through the indexed documents and retrieves information that semantically matches the query about sports.

Here is a complete code listing for this example:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.core.ingestion import IngestionPipeline

# Ingestion pipeline: sentence-aware chunking, LLM-generated titles, embeddings.
# Note: this pipeline is defined but not applied below; from_documents uses
# LlamaIndex defaults. To apply it you could use (possibly with a larger
# chunk_size): nodes = pipeline.run(documents=documents)
#               index = VectorStoreIndex(nodes)
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
        OpenAIEmbedding(),
    ]
)

# Load every file in ../data_small as a Document and index it
documents = SimpleDirectoryReader("../data_small").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("List a few sports")

print(response)

The output looks like:

$ python
Football, basketball, tennis, swimming, and cycling.

RAG With Reranking Example

Reranking in the context of Retrieval-Augmented Generation (RAG) for LLMs refers to a process of adjusting the order of documents retrieved by an initial search query to improve the relevance and quality of the results before they are used for generating responses. This step is crucial because the initial retrieval might fetch a broad set of documents, not all of which are equally relevant to the user’s query.

The primary function of reranking is to refine the selection of documents based on their actual relevance to the query. This is typically achieved by employing more sophisticated models that can better understand the context and nuances of the query and the documents. For instance, cross-encoder models are commonly used in reranking due to their ability to process the query and document simultaneously, providing a more accurate evaluation of relevance.

Just as LLMs have the flexibility to handle a broad range of text topics, different programming languages, etc., reranking mechanisms have the flexibility to handle a wide range of source information as well as a wide range of query types.

In your RAG applications, you should often see less noise and fewer irrelevant passages in system responses to user queries.
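The two-stage idea behind the next example can be sketched in plain Python: retrieve a generous candidate set, then rescore only those candidates with a more careful model and keep the best few. Here the hypothetical word_overlap and phrase_bonus functions stand in for vector similarity and a cross-encoder, respectively. The parameter names mirror the similarity_top_k and top_n settings in the next listing:

```python
from typing import Callable


def retrieve_then_rerank(query: str, docs: list[str],
                         cheap_score: Callable[[str, str], float],
                         careful_score: Callable[[str, str], float],
                         similarity_top_k: int = 10, top_n: int = 3) -> list[str]:
    # Stage 1: fast scoring over all documents keeps a broad candidate set
    candidates = sorted(docs, key=lambda d: cheap_score(query, d),
                        reverse=True)[:similarity_top_k]
    # Stage 2: slower, more accurate scoring reorders only the candidates
    reranked = sorted(candidates, key=lambda d: careful_score(query, d), reverse=True)
    return reranked[:top_n]


def word_overlap(query: str, doc: str) -> float:
    """Cheap stand-in score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def phrase_bonus(query: str, doc: str) -> float:
    """Stand-in 'careful' score: rewards documents containing the whole phrase."""
    return word_overlap(query, doc) + (5.0 if query.lower() in doc.lower() else 0.0)


docs = ["sports team news and team updates", "team sports for kids", "cooking recipes"]
print(retrieve_then_rerank("team sports", docs, word_overlap, phrase_bonus,
                           similarity_top_k=2, top_n=1))
```

Both candidates tie in the first stage, and the second stage promotes the one that matches the query as a phrase. A cross-encoder plays the same role with far better judgment.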

The code example is very similar to the last example but we now add a reranker as a query engine postprocessor:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.core.ingestion import IngestionPipeline
# SentenceTransformerRerank requires: pip install sentence-transformers
from llama_index.core.postprocessor import SentenceTransformerRerank

# Set up the ingestion pipeline with transformations
# (note: defined but not applied; from_documents below uses LlamaIndex defaults)
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
        OpenAIEmbedding(),
    ]
)

# Load documents using a directory reader
documents = SimpleDirectoryReader("../data_small").load_data()

# Create an index from the documents
index = VectorStoreIndex.from_documents(documents)

# Initialize the reranker with a cross-encoder model
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-12-v2",  # Example model, adjust as needed
    top_n=3  # Number of nodes to keep after reranking
)

# Set up the query engine with the reranker as a postprocessor
query_engine = index.as_query_engine(
    similarity_top_k=10,  # How many results to retrieve before reranking
    node_postprocessors=[reranker]  # Add the reranker to the postprocessing steps
)

# Perform a query
response = query_engine.query("List a few sports")

# Print the response
print(response)

The output is similar to before:

List a few sports: basketball, soccer, tennis, swimming, and cycling.

Let’s try a more complex query. Instead of the document directory ../data_small, which only contains information about sports, we will now use the text documents in ../data, which cover more general topics. We make two code changes. First, we use a different document directory:

documents = SimpleDirectoryReader("../data").load_data()

We also change the query:

response = query_engine.query("Compare sports with the study of health issues")

The output looks like:

Sports are activities based on physical athleticism or dexterity, often governed by rules to ensure fair competition. On the other hand, the study of health issues involves analyzing the production, distribution, and consumption of goods and services related to health and well-being. While sports focus on physical activities and exercise for leisure or competition, the study of health issues delves into understanding and addressing various aspects of physical and mental well-being, including diseases, treatments, and preventive measures.

RAG on CSV Spreadsheet Files


Using Google’s Knowledge Graph APIs With LangChain

Google’s Knowledge Graph (KG) is a knowledge base that Google uses to serve relevant information in an info-box beside its search results. It lets the user see the answer at a glance, as an instant answer. The data is generated automatically from a variety of sources, covering places, people, businesses, and more. I worked at Google in 2013 on an internal project that used their KG.

Google’s public Knowledge Graph Search API lets you find entities in the Google Knowledge Graph. The API uses standard types and is compliant with the JSON-LD specification. It supports entity search and lookup.

You can use the Knowledge Graph Search API to build applications that make use of Google’s Knowledge Graph. For example, you can use the API to build a search engine that returns results based on the entities in the Knowledge Graph.

In the next chapter we also use the public KGs DBPedia and Wikidata. One limitation of Google’s KG APIs is that they are designed for entity (people, places, organizations, etc.) lookup. When using DBPedia and Wikidata it is possible to find a wider range of information using the SPARQL query language, such as relationships between entities. You can use the Google KG APIs to find some entity relationships, e.g., all the movies directed by a particular director, or all the books written by a particular author. You can also use the API to find information like all the people who have worked on a particular movie, or all the actors who have appeared in a particular TV show.

Setting Up To Access Google Knowledge Graph APIs

To get an API key for Google’s Knowledge Graph Search API, you need to go to the Google API Console, enable the Google Knowledge Graph Search API, and create an API key to use in your project. You can then use this API key to make requests to the Knowledge Graph Search API.

To create your application’s API key, follow these steps:

  • Go to the API Console.
  • From the projects list, select a project or create a new one.
  • If the APIs & services page isn’t already open, open the left side menu and select APIs & services.
  • On the left, choose Credentials.
  • Click Create credentials and then select API key.

You can then use this API key to make requests to the Knowledge Graph Search APIs.

When I use Google’s APIs I set the access key in ~/.google_api_key and read in the key using:

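from pathlib import Path
# strip() removes the trailing newline that most editors add to the file
api_key = open(str(Path.home()) + "/.google_api_key").read().strip()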
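A slightly more defensive way to load the key is sketched below. It first checks an environment variable (the name GOOGLE_API_KEY is an assumption; pick any name you like) and falls back to the ~/.google_api_key file; in both cases strip() removes surrounding whitespace such as a trailing newline, which would otherwise end up URL-encoded in the request. The function name load_google_api_key is hypothetical:

```python
import os
from pathlib import Path
from typing import Optional


def load_google_api_key(env_var: str = "GOOGLE_API_KEY",
                        key_file: Optional[Path] = None) -> str:
    """Return the API key from env_var, else from key_file (default ~/.google_api_key)."""
    # 1. Prefer the environment variable (GOOGLE_API_KEY is an assumed name).
    value = os.environ.get(env_var)
    if value:
        return value.strip()
    # 2. Fall back to the key file used in this chapter.
    path = key_file if key_file is not None else Path.home() / ".google_api_key"
    # strip() drops the trailing newline that would otherwise be sent as %0A.
    return path.read_text().strip()
```

Usage would be api_key = load_google_api_key(); the calling code stays the same whether the key lives in the environment or in the file.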

You can also use environment variables to store access keys. Here is a code snippet for making an API call to get information about me:

import json
from urllib.parse import urlencode
from urllib.request import urlopen
from pathlib import Path
from pprint import pprint

api_key = open(str(Path.home()) + "/.google_api_key").read().strip()
query = "Mark Louis Watson"
service_url = ""
params = {
    "query": query,
    "limit": 10,
    "indent": True,
    "key": api_key,
}
url = service_url + "?" + urlencode(params)
response = json.loads(urlopen(url).read())
pprint(response)

The JSON-LD output would look like:

{'@context': {'@vocab': '',
              'EntitySearchResult':
              'goog:EntitySearchResult',
              'detailedDescription':
              'goog:detailedDescription',
              'goog': '',
              'kg': '',
              'resultScore': 'goog:resultScore'},
 '@type': 'ItemList',
 'itemListElement': [{'@type': 'EntitySearchResult',
                      'result': {'@id': 'kg:/m/0b6_g82',
                                 '@type': ['Thing',
                                           'Person'],
                                 'description': 'Author',
                                 'name':
                                 'Mark Louis Watson',
                                 'url':
                                 ''},
                      'resultScore': 43}]}
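Once json.loads has parsed it, the JSON-LD response is ordinary nested Python data. As a sketch (the helper name summarize_kg_results is made up here), this pulls the name, description, and resultScore out of each itemListElement entry; the sample dictionary is a trimmed copy of the output above, with the empty URL strings left out:

```python
from typing import Any, Dict, List, Tuple


def summarize_kg_results(response: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Return (name, description, resultScore) for each entity in a KG Search response."""
    rows = []
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        rows.append((result.get("name", ""),
                     result.get("description", ""),
                     element.get("resultScore", 0)))
    # Highest-scoring entities first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


sample = {
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "EntitySearchResult",
         "result": {"@id": "kg:/m/0b6_g82",
                    "@type": ["Thing", "Person"],
                    "description": "Author",
                    "name": "Mark Louis Watson"},
         "resultScore": 43},
    ],
}

if __name__ == "__main__":
    for name, description, score in summarize_kg_results(sample):
        print(f"{score:>6}  {name}: {description}")
```

Using .get() with defaults keeps the helper from failing when an entity lacks a description, which can happen for less well-known entities.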

To avoid repeating the code for getting entity information from the Google KG, I wrote a utility that encapsulates the previous code and generalizes it into a mini-library:

"""Client for calling Knowledge Graph Search API."""

import json
from urllib.parse import urlencode
from urllib.request import urlopen
from pathlib import Path
from pprint import pprint

api_key = open(str(Path.home()) + "/.google_api_key").read().strip()

# use Google search API to get information
# about a named entity:

def get_entity_info(entity_name):
    service_url = ""
    params = {
        "query": entity_name,
        "limit": 1,
        "indent": True,
        "key": api_key,
    }
    url = service_url + "?" + urlencode(params)
    response = json.loads(urlopen(url).read())
    return response

def tree_traverse(a_dict):
    # collect the values of all 'name', 'description', and
    # 'articleBody' keys found anywhere in the nested JSON
    ret = []
    def recur(dict_2, a_list):
        if isinstance(dict_2, dict):
            for key, value in dict_2.items():
                if key in ['name', 'description',
                           'articleBody']:
                    a_list += [value]
                recur(value, a_list)
        if isinstance(dict_2, list):
            for x in dict_2:
                recur(x, a_list)
    recur(a_dict, ret)
    return ret


def get_context_text(entity_name):
    json_data = get_entity_info(entity_name)
    return ' '.join(tree_traverse(json_data))

if __name__ == "__main__":
    print(get_context_text("Bill Clinton"))

The main test script is in the file

"""Example of Python client calling the
   Knowledge Graph Search API."""

from llama_index.core.schema import Document
from llama_index.core import VectorStoreIndex
import Google_KG_helper

def kg_search(entity_name, *questions):
    ret = ""
    context_text = Google_KG_helper.get_context_text(entity_name)
    print(f"Context text: {context_text}")
    doc = Document(text=context_text)
    index = VectorStoreIndex.from_documents([doc])
    query_engine = index.as_query_engine()
    for question in questions:
        response = query_engine.query(question)
        ret += f"QUESTION:  {question}\nRESPONSE: {response}\n"
    return ret

if __name__ == "__main__":
    s = kg_search("Bill Clinton",
                  "When was Bill president?")
    print(s)

The example output is:

$ python
Context text: William Jefferson Clinton is an American politician who served as the 42nd president of the United States from 1993 to 2001. A member of the Democratic Party, he previously served as Governor of Arkansas from 1979 to 1981 and again from 1983 to 1992.  42nd U.S. President Bill Clinton
QUESTION:  When was Bill president?
RESPONSE: Bill Clinton was president from 1993 to 2001.

Accessing Knowledge Graphs from Google, DBPedia, and Wikidata allows you to integrate real-world facts and knowledge with your applications. While I mostly work in the field of deep learning, I frequently also use Knowledge Graphs in my work and in my personal research. I think that you, dear reader, might find accessing highly structured data in KGs to be more reliable and in many cases simpler than using web scraping.

Using DBPedia and WikiData as Knowledge Sources

Both DBPedia and Wikidata are public Knowledge Graphs (KGs) that store data as Resource Description Framework (RDF) and are accessed through the SPARQL Query Language for RDF. The examples for this project are in the GitHub repository for this book in the directory kg_search.

I am not going to spend much time here discussing RDF and SPARQL. Instead I ask you to read online the introductory chapter Linked Data, the Semantic Web, and Knowledge Graphs in my book A Lisp Programmer Living in Python-Land: The Hy Programming Language.

As we saw in the last chapter, a Knowledge Graph (which I often abbreviate as KG) is a graph database that uses a schema to define types (both objects and relationships between objects) and properties that link property values to objects. The term “Knowledge Graph” is a general term, but it also sometimes refers to the specific Knowledge Graph used at Google, which I worked with while at Google in 2013. Here, we use KG to refer to the general technology of storing knowledge in graph databases.

DBPedia and Wikidata are similar, with some important differences. Here is a summary of some similarities and differences between DBPedia and Wikidata:

  • Both projects aim to provide structured data from Wikipedia in various formats and languages. Wikidata also has data from other sources so it contains more data and more languages.
  • Both projects use RDF as a common data model and SPARQL as a query language.
  • DBPedia extracts data from the infoboxes in Wikipedia articles, while Wikidata collects data entered through its interfaces by both users and automated bots.
  • Wikidata requires sources for its data, while DBPedia does not.
  • DBpedia is more popular in the Semantic Web and Linked Open Data communities, while Wikidata is more integrated with Wikimedia projects.

To the last point: I personally prefer DBPedia when experimenting with the semantic web and linked data, mostly because DBPedia URIs are human-readable while Wikidata URIs are abstract. The following URIs represent the town I live in, Sedona, Arizona:

  • DBPedia:,_Arizona
  • Wikidata:

In RDF we enclose URIs in angle brackets like <>.

If you read the chapter on RDF and SPARQL in my book that I mentioned previously, then you know that RDF data is represented by triples where each part is named:

  • subject
  • property
  • object
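To make the subject/property/object idea concrete, here is a small sketch that stores triples as Python tuples and matches them against a pattern in which None acts as a wildcard, roughly what a single SPARQL triple pattern such as ?s :locatedIn ?o does. The property names and data are made up for illustration; rdflib’s Graph.triples() method accepts the same kind of (subject, property, object) pattern with None wildcards:

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match(triples: Iterable[Triple], pattern: Pattern) -> List[Triple]:
    """Return the triples whose parts equal the pattern's non-None parts."""
    return [t for t in triples
            if all(p is None or p == part for p, part in zip(pattern, t))]


# Hypothetical data: (subject, property, object)
triples = [
    ("Sedona", "locatedIn", "Arizona"),
    ("Sedona", "label", "Sedona, Arizona"),
    ("Arizona", "locatedIn", "United States"),
]

if __name__ == "__main__":
    # Like the SPARQL pattern:  ?s :locatedIn ?o
    for s, p, o in match(triples, (None, "locatedIn", None)):
        print(s, p, o)
```

A SPARQL WHERE clause is essentially several such patterns joined on their shared variables, which is why the same small data model supports much richer queries.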

We will look at two similar examples in this chapter, one using DBPedia and one using Wikidata. Both services have SPARQL endpoint web applications that you will want to use for exploring both KGs. We will look at the DBPedia web interface later. Here is the Wikidata web interface:

In this SPARQL query the prefix wd: is used for Wikidata entities (items) while the prefix wdt: is used for Wikidata direct (“truthy”) properties. The prefix rdfs: stands for RDF Schema.
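Prefixes are just abbreviations for namespace IRIs. The sketch below expands prefixed names into full IRIs using the standard namespaces for wd:, wdt:, and rdfs:; the helper names are hypothetical. The Wikidata query service predefines these prefixes, so you normally do not need to declare them yourself, but the generated PREFIX lines are handy for endpoints that do not:

```python
from typing import Dict

PREFIXES: Dict[str, str] = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(prefixed_name: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name like 'wdt:P31' into an angle-bracketed IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return f"<{prefixes[prefix]}{local}>"


def prefix_declarations(prefixes: Dict[str, str] = PREFIXES) -> str:
    """SPARQL PREFIX lines for endpoints that do not predefine them."""
    return "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())


if __name__ == "__main__":
    print(expand("wdt:P31"))  # the "instance of" property
    print(prefix_declarations())
```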

Using DBPedia as a Data Source

DBpedia is a community-driven project that extracts structured content from Wikipedia and makes it available on the web as a Knowledge Graph (KG). Using SPARQL queries against DBpedia as a data source, we can write a variety of applications, including natural language processing, machine learning, and data analytics. In this chapter we demonstrate DBpedia as a data source with examples that illustrate its use in real-world applications. In my experience, DBpedia is a valuable resource for researchers and developers who need to access structured data from Wikipedia.

In general, you will start projects using DBPedia by exploring the available data with the web app shown in this screenshot:

The following listing of file shows Python code for making a SPARQL query to DBPedia and saving the results as RDF triples in NT format in a local text file:

from SPARQLWrapper import SPARQLWrapper
from rdflib import Graph

sparql = SPARQLWrapper("")
sparql.setQuery("""
    PREFIX dbpedia-owl: <>
    PREFIX dbpedia: <>
    PREFIX dbpprop: <>

    CONSTRUCT {
        ?city dbpedia-owl:country ?country .
        ?city rdfs:label ?citylabel .
        ?country rdfs:label ?countrylabel .
        <> rdfs:label "country"@en .
    }
    WHERE {
        ?city rdf:type dbpedia-owl:City .
        ?city rdfs:label ?citylabel .
        ?city dbpedia-owl:country ?country .
        ?country rdfs:label ?countrylabel .
        FILTER (lang(?citylabel) = 'en')
        FILTER (lang(?countrylabel) = 'en')
    }
    LIMIT 50
""")
# For a CONSTRUCT query with an RDF return format,
# convert() returns an rdflib graph
sparql.setReturnFormat("rdf")
results = sparql.query().convert()

g = Graph()
g.parse(data=results.serialize(format="xml"), format="xml")

print("\nresults:\n")
# with rdflib 6+ serialize() already returns a str
results = g.serialize(format="nt")
print(results)

with open("sample.nt", "w") as text_file:
    text_file.write(results)

Here is the printed output from running this script (most output not shown, and manually edited to fit page width):

$ python
results:

<>
    <>
    "Ethiopia"@en .
<,_Buenos_Aires>
    <>
    "Valentin Alsina, Buenos Aires"@en .
<>
    <>
    <> .
<>
    <>
    "Davyd-Haradok"@en .
<>
    <>
    "Belarus"@en .
 ...

This output was written to the local file sample.nt. I divided this example into two separate Python scripts because I thought it would be easier for you, dear reader, to experiment with fetching RDF data separately from using an LLM to process the RDF data. In production you may want to combine KG queries with semantic analysis.

This code example demonstrates the use of the GPTSimpleVectorIndex for querying RDF data and retrieving information about countries. The function download_loader loads data importers by string name. Loading a Python class by name using a string is not type safe: if you misspell the name of the class in the call to download_loader, a Python ValueError (“Loader class name not found in library”) is raised at run time. The GPTSimpleVectorIndex class represents an index data structure that can be used to efficiently search and retrieve information from the RDF data. This is similar to the other LlamaIndex vector index types used for different kinds of data sources.
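The trade-off of looking classes up by a string name can be illustrated with a plain dictionary registry. This is a hypothetical sketch of the general pattern (register_loader, load_loader, and ToyRDFReader are made-up names), not LlamaIndex’s implementation:

```python
from typing import Dict, Type

_LOADERS: Dict[str, Type] = {}


def register_loader(cls: Type) -> Type:
    """Class decorator that makes cls loadable by its class name."""
    _LOADERS[cls.__name__] = cls
    return cls


def load_loader(name: str) -> Type:
    """Look up a loader class by name; a typo only fails here, at run time."""
    try:
        return _LOADERS[name]
    except KeyError:
        raise ValueError(f"Loader class name not found: {name!r}") from None


@register_loader
class ToyRDFReader:
    """Placeholder loader used only to illustrate the registry."""

    def load_data(self, path: str) -> str:
        return f"would parse {path}"


if __name__ == "__main__":
    reader_class = load_loader("ToyRDFReader")
    print(reader_class().load_data("sample.nt"))
```

A static type checker cannot tell that a string such as "ToyRDFReadr" is misspelled, so the mistake only surfaces as a ValueError when the lookup runs. That is the type-safety cost of selecting loaders by name.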

Here is the script

"Example from documentation"

from llama_index import GPTSimpleVectorIndex, Document
from llama_index import download_loader

RDFReader = download_loader("RDFReader")
doc = RDFReader().load_data("sample.nt")
index = GPTSimpleVectorIndex(doc)

result = index.query("list all countries in a quoted Python array, then explain why")

print(result.response)

Here is the output:

$ python
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 761 tokens
INFO:root:> [query] Total LLM token usage: 921 tokens
INFO:root:> [query] Total embedding token usage: 12 tokens

['Argentina', 'French Polynesia', 'Democratic Republic of the Congo', 'Benin', 'Ethiopia', 'Australia', 'Uzbekistan', 'Tanzania', 'Albania', 'Belarus', 'Vanuatu', 'Armenia', 'Syria', 'Andorra', 'Venezuela', 'France', 'Vietnam', 'Azerbaijan']

This is a list of all the countries mentioned in the context information. All of the countries are listed in the context information, so this list is complete.

Why are there only 18 countries listed? In the script that performs the SPARQL query on DBPedia, we put LIMIT 50 at the end of the query, so only 50 query results (city and country matches) were used to build the RDF triples written to the file sample.nt, and those results only contain data for 18 countries.
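You can check a count like this without an LLM. This sketch reads N-Triples lines, such as those in sample.nt, and collects the distinct objects of the country property. It assumes the dbpedia-owl: prefix in the query stands for http://dbpedia.org/ontology/ (so the predicate is <http://dbpedia.org/ontology/country>), and it uses a simple whitespace split that only works when the subject and predicate are IRIs without spaces; for general N-Triples parsing use rdflib. The helper name distinct_objects is hypothetical:

```python
from pathlib import Path
from typing import Iterable, Set

COUNTRY = "<http://dbpedia.org/ontology/country>"  # assumed predicate IRI


def distinct_objects(lines: Iterable[str], predicate: str = COUNTRY) -> Set[str]:
    """Collect the distinct objects of `predicate` from N-Triples lines."""
    found = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # subject and predicate are IRIs without spaces; the rest is "object ."
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        _subject, pred, rest = parts
        if pred == predicate:
            found.add(rest.rstrip().rstrip(".").strip())
    return found


if __name__ == "__main__":
    path = Path("sample.nt")
    if path.exists():
        countries = distinct_objects(path.read_text().splitlines())
        print(len(countries), "distinct countries")
```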

Using Wikidata as a Data Source

Exploring Wikidata is slightly more difficult than exploring DBPedia. Let’s revisit getting information about my home town of Sedona, Arizona.

In writing this example, I experimented with SPARQL queries using the Wikidata SPARQL web app.

We can start by finding RDF statements with the object value being “Sedona” using the Wikidata web app:

select * where {
  ?s ?p "Sedona"@en
} LIMIT 30

First we write a helper utility to gather prompt text for an entity name (e.g., name of a person, place, etc.) in the file

from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph
import pandas as pd

def get_possible_entity_uris_from_wikidata(entity_name):
    sparql = SPARQLWrapper("")
    sparql.setQuery("""
      SELECT ?entity ?entityLabel WHERE {
         ?entity rdfs:label "%s"@en .
      } limit 5
    """ % entity_name)

    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    results = pd.json_normalize(results['results']['bindings']).values.tolist()
    results = ["<" + x[1] + ">" for x in results]
    return [*set(results)]  # remove duplicates

def wikidata_query_to_df(entity_uri):
    sparql = SPARQLWrapper("")
    sparql.setQuery("""
      SELECT ?description ?is_a_type_of WHERE {
        %s schema:description ?description FILTER (lang(?description) = 'en') .
        %s wdt:P31 ?instanceOf .
        ?instanceOf rdfs:label ?is_a_type_of FILTER (lang(?is_a_type_of) = 'en') .
      } limit 10
    """ % (entity_uri, entity_uri))

    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    results2 = pd.json_normalize(results['results']['bindings'])
    prompt_text = ""
    for index, row in results2.iterrows():
        prompt_text += (row['description.value'] + " is a type of "
                        + row['is_a_type_of.value'] + "\n")
    return prompt_text

def generate_prompt_text(entity_name):
    entity_uris = get_possible_entity_uris_from_wikidata(entity_name)
    prompt_text = ""
    for entity_uri in entity_uris:
        p = wikidata_query_to_df(entity_uri)
        if "disambiguation page" not in p:
            # reuse p instead of querying Wikidata a second time
            prompt_text += entity_name + " is " + p
    return prompt_text

if __name__ == "__main__":
    print("Sedona:", generate_prompt_text("Sedona"))
    print("California:",
          generate_prompt_text("California"))
    print("Bill Clinton:",
          generate_prompt_text("Bill Clinton"))
    print("Donald Trump:",
          generate_prompt_text("Donald Trump"))

This utility does most of the work in getting prompt text for an entity.
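Two details in this utility are worth hardening. The entity name is pasted into the query with %, so a name containing a double quote would break the SPARQL string, and the results are read by column position after pd.json_normalize. The sketch below (helper names are hypothetical) escapes the name as a SPARQL string literal and reads values by variable name from the standard SPARQL JSON results layout (results, then bindings, then variable, then value):

```python
from typing import Any, Dict, List


def sparql_literal(text: str, lang: str = "en") -> str:
    """Quote text as a SPARQL string literal with a language tag."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def label_query(entity_name: str, limit: int = 5) -> str:
    """Entity-lookup query similar to the one in get_possible_entity_uris_from_wikidata."""
    return ("SELECT ?entity WHERE {\n"
            f"  ?entity rdfs:label {sparql_literal(entity_name)} .\n"
            f"}} LIMIT {limit}")


def column(results: Dict[str, Any], var: str) -> List[str]:
    """Values bound to ?var in a SPARQL JSON result, skipping unbound rows."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


if __name__ == "__main__":
    print(label_query('A "quoted" name'))
```

With SPARQLWrapper and setReturnFormat(JSON), sparql.query().convert() returns this dictionary layout, so column(results, "entity") could replace the positional x[1] lookup.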

The GPTTreeIndex class is similar to other LlamaIndex index classes. This class builds a tree-based index of the prompt texts, which can be used to retrieve information based on the input question. When querying a GPTTreeIndex, the LLM is used to select the child node(s) to send the query down to at each level of the tree. A GPTKeywordTableIndex uses keyword matching, and a GPTVectorStoreIndex uses embedding cosine similarity. The choice of which index class to use depends on how much text is being indexed, the granularity of the subject matter in the text, and whether you want summarization.

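GPTTreeIndex stores the data in a hierarchical tree structure, so a query only visits the nodes along a few paths from the root down to the leaves instead of every node, as a linear list index such as GPTListIndex does. Keep in mind that each step down the tree requires an LLM call, so a tree index is not necessarily faster or cheaper to query than an embedding-based index like GPTSimpleVectorIndex.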
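As a rough illustration of tree traversal with a branching factor (compare child_branching_factor=2 in wd_query below), here is a toy sketch in which a word-overlap score stands in for the LLM call that LlamaIndex uses to choose children. The scoring function, the Node class, and the tree contents are assumptions made so the example runs offline:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str  # a summary (inner node) or a text chunk (leaf)
    children: List["Node"] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Stand-in for the LLM's judgment of which child is most relevant."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def tree_query(root: Node, query: str, branching_factor: int = 1) -> List[str]:
    """Descend from the root, following the best `branching_factor` children
    at each level, and return the texts of the leaves reached."""
    frontier, leaves = [root], []
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:
                leaves.append(node.text)
                continue
            ranked = sorted(node.children, key=lambda c: overlap(query, c.text),
                            reverse=True)
            next_frontier.extend(ranked[:branching_factor])
        frontier = next_frontier
    return leaves


root = Node("summary of all documents", [
    Node("places sedona arizona city", [
        Node("sedona is a city in arizona"),
        Node("sedona is a 2012 film"),
    ]),
    Node("people presidents clinton trump", [
        Node("bill clinton was president 1993 to 2001"),
        Node("donald trump was president 2017 to 2021"),
    ]),
])

if __name__ == "__main__":
    print(tree_query(root, "when was clinton president"))
    print(tree_query(root, "when was clinton president", branching_factor=2))
```

With a branching factor of 1 only one leaf is reached; with 2, every node in this small tree is visited. A larger branching factor trades more selection calls for a lower chance of missing relevant text.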

The LlamaIndex code is relatively easy to implement in the script (edited to fit page width):

from llama_index import StringIterableReader, GPTTreeIndex
from wikidata_generate_prompt_text import generate_prompt_text

def wd_query(question, *entity_names):
    prompt_texts = []
    for entity_name in entity_names:
        prompt_texts += [generate_prompt_text(entity_name)]
    documents = StringIterableReader().load_data(texts=prompt_texts)
    index = GPTTreeIndex(documents)
    query_engine = index.as_query_engine(child_branching_factor=2)
    return query_engine.query(question)

if __name__ == "__main__":
    print("Sedona:", wd_query("What is Sedona?", "Sedona"))
    print("California:",
          wd_query("What is California?", "California"))
    print("Bill Clinton:",
          wd_query("When was Bill Clinton president?",
                   "Bill Clinton"))
    print("Donald Trump:",
          wd_query("Who is Donald Trump?",
                   "Donald Trump"))

Here is the test output (with some lines removed):

$ python
Total LLM token usage: 162 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] INFO:llama_index.indices.query.tree.leaf_query:> Starting query: What is Sedona?
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 154 tokens
Sedona: Sedona is a city in the United States located in the counties of Yavapai and Coconino, Arizona. It is also the title of a 2012 film, a company, and a 2015 single by Houndmouth.

Total LLM token usage: 191 tokens
INFO:llama_index.indices.query.tree.leaf_query:> Starting query: What is California?
California: California is a U.S. state in the United States of America.

Total LLM token usage: 138 tokens
INFO:llama_index.indices.query.tree.leaf_query:> Starting query: When was Bill Clinton president?
Bill Clinton: Bill Clinton was the 42nd President of the United States from 1993 to 2001.

Total LLM token usage: 159 tokens
INFO:llama_index.indices.query.tree.leaf_query:> Starting query: Who is Donald Trump?
Donald Trump: Donald Trump is the 45th President of the United States, serving from 2017 to 2021.

Using LLMs To Organize Information in Our Google Drives

My digital life consists of writing, working as an AI practitioner, and learning activities that I justify with my self-image of a “gentleman scientist.” Cloud storage like GitHub, Google Drive, Microsoft OneDrive, and iCloud are central to my activities.

About ten years ago I spent two months writing a system in Clojure that I planned to be my own personal Dropbox, augmented with various NLP tools and a Firefox plugin to send web clippings directly to my personal system. To be honest, I stopped using my own project after a few months because the time it took to organize my information was a greater opportunity cost than the value I received.

In this chapter I am going to walk you through parts of a new system that I am developing for my own personal use to help me organize my material on Google Drive (and eventually other cloud services). Don’t be surprised if the completed project is an additional example in a future edition of this book!

After following the Google setup directions listed below, you will get a pop-up web browser window with a warning like the following (this shows my Gmail address; you should see your own Gmail address here, assuming that you have recently logged into Gmail using your default web browser):

You will need to first click Advanced, then click the Go to GoogleAPIExamples (unsafe) link in the lower left corner, and then temporarily authorize this example for your Gmail account.

Setting Up Requirements

You need to create a credential at (copied from the PyDrive documentation, changing application type to “Desktop”):

  • Search for ‘Google Drive API’, select the entry, and click ‘Enable’.
  • Select ‘Credentials’ from the left menu, click ‘Create Credentials’, select ‘OAuth client ID’.
  • Now, the product name and consent screen need to be set -> click ‘Configure consent screen’ and follow the instructions. Once finished:
  • Select ‘Application type’ to be Desktop application.
  • Enter an appropriate name.
  • Input http://localhost:8080 for ‘Authorized JavaScript origins’.
  • Input http://localhost:8080/ for ‘Authorized redirect URIs’.
  • Click ‘Save’.
  • Click ‘Download JSON’ on the right side of Client ID to download client_secret_.json. Rename the file to client_secrets.json (the file name PyDrive looks for by default) and copy it to the example directory google_drive_llm for this chapter.

Write Utility To Fetch All Text Files From Top Level Google Drive Folder

For this example we will just authenticate our test script with Google, and copy all top level text files with names ending with “.txt” to the local file system in subdirectory data. The code is in the directory google_drive_llm in file (edited to fit page width):

from pydrive.auth import GoogleAuth
from import GoogleDrive
from pathlib import Path

# good GD search docs:
#

# Authenticate with Google
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

def get_txt_files(dir_id='root'):
    " get all plain text files with .txt extension in top level Google Drive directory "

    file_list = drive.ListFile({'q': f"'{dir_id}' in parents and trashed=false"}).GetList()
    for file1 in file_list:
        print('title: %s, id: %s' % (file1['title'], file1['id']))
    return [[file1['title'], file1['id'], file1.GetContentString()]
            for file1 in file_list
            if file1['title'].endswith(".txt")]

def create_test_file():
    " not currently used, but useful for testing. "

    # Create GoogleDriveFile instance with title 'Hello.txt':
    file1 = drive.CreateFile({'title': 'Hello.txt'})
    file1.SetContentString('Hello World!')
    file1.Upload()

def test():
    fl = get_txt_files()
    Path("data").mkdir(exist_ok=True)  # make sure the output directory exists
    for f in fl:
        print(f)
        with open("data/" + f[0], "w") as file1:
            file1.write(f[2])

if __name__ == '__main__':
    test()

For testing I just have one text file with the file extension “.txt” on my Google Drive so my output from running this script looks like the following listing. I edited the output to change my file IDs and to only print a few lines of the debug printout of file titles.

$ python
Your browser has been opened to visit:
180%2F&
ponse_type=code

Authentication successful.

title: testdata, id: 1TZ9bnL5XYQvKACJw8VoKWdVJ8jeCszJ
title: sports.txt, id: 18RN4ojvURWt5yoKNtDdAJbh4fvmRpzwb
title: Anaconda blog article, id: 1kpLaYQA4Ao8ZbdFaXU209hg-z0tv1xA7YOQ4L8y8NbU
title: backups_2023, id: 1-k_r1HTfuZRWN7vwWWsYqfssl0C96J2x
title: Work notes, id: 1fDyHyZtKI-0oRNabA_P41LltYjGoek21
title: Sedona Writing Group Contact List, id: 1zK-5v9OQUfy8Sw33nTCl9vnL822hL1w
 ...
['sports.txt', '18RN4ojvURWt5yoKNtDdAJbh4fvmRpzwb', 'Sport is generally recognised as activities based in physical athleticism or physical dexterity.[3] Sports are usually governed by rules to ensure fair competition and consistent adjudication of the winner.\n\n"Sport" comes from the Old French desport meaning "leisure", with the oldest definition in English from around 1300 being "anything humans find amusing or entertaining".[4]\n\nOther bodies advocate widening the definition of sport to include all physical activity and exercise. For instance, the Council of Europe include all forms of physical exercise, including those completed just for fun.\n\n']
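Drive titles are arbitrary strings, so before writing them into the local data directory it is worth making sure a title cannot point outside that directory. The sketch below (hypothetical helper names, independent of PyDrive) builds the same search query string used in get_txt_files, selects .txt titles case-insensitively (get_txt_files matches only a lower-case .txt suffix), and keeps just the final path component of each title:

```python
from pathlib import Path
from typing import Iterable, List


def folder_query(dir_id: str = "root") -> str:
    """Drive search query for non-trashed files directly inside dir_id."""
    return f"'{dir_id}' in parents and trashed=false"


def txt_titles(titles: Iterable[str]) -> List[str]:
    """Titles ending in .txt, ignoring case."""
    return [t for t in titles if t.lower().endswith(".txt")]


def local_path_for(title: str, data_dir: Path = Path("data")) -> Path:
    """Local file path for a Drive title, dropping any directory parts."""
    return data_dir / Path(title).name


if __name__ == "__main__":
    titles = ["testdata", "sports.txt", "Work notes", "NOTES.TXT"]
    for title in txt_titles(titles):
        print(title, "->", local_path_for(title))
```

In the test() function, open(local_path_for(f[0]), "w") could replace the string concatenation "data/" + f[0].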

Generate Vector Indices for Files in Specific Google Drive Directories

The example script in the last section should have created local copies, in the subdirectory data, of the text files in your top-level Google Drive folder whose names end with “.txt”. Here, we use the same LlamaIndex test code that we used in a previous chapter. The test script is listed here:

# make sure the following environment variable is set:
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex(documents)

# save to disk
index.save_to_disk('index.json')
# load from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')

# search for a document
print(index.query("What is the definition of sport?"))

For my test file, the output looks like:

$ python
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 111 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 202 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 7 tokens

Sport is generally recognised as activities based in physical athleticism or physical dexterity that are governed by rules to ensure fair competition and consistent adjudication of the winner. It is anything humans find amusing or entertaining, and can include all forms of physical exercise, even those completed just for fun.

It is interesting to see how the query result is rewritten in a nice form, compared to the raw text in the file sports.txt on my Google Drive:

$ cat data/sports.txt
Sport is generally recognised as activities based in physical athleticism or physical dexterity.[3] Sports are usually governed by rules to ensure fair competition and consistent adjudication of the winner.

"Sport" comes from the Old French desport meaning "leisure", with the oldest definition in English from around 1300 being "anything humans find amusing or entertaining".[4]

Other bodies advocate widening the definition of sport to include all physical activity and exercise. For instance, the Council of Europe include all forms of physical exercise, including those completed just for fun.
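To demystify what building, saving, and reloading a vector index involves, here is a toy sketch: word-count vectors stand in for OpenAI embeddings, cosine similarity ranks the texts, and JSON is used for persistence. The class name ToyVectorIndex is made up, and its file format is not LlamaIndex’s index.json format:

```python
import json
import math
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional


def embed(text: str) -> Dict[str, int]:
    """Toy 'embedding': word counts (a real index calls an embedding model)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    def __init__(self, texts: List[str],
                 vectors: Optional[List[Dict[str, int]]] = None):
        self.texts = list(texts)
        # Reuse stored vectors when loading; otherwise "embed" every text.
        self.vectors = vectors if vectors is not None else [embed(t) for t in self.texts]

    def query(self, question: str, top_k: int = 1) -> List[str]:
        q = embed(question)
        order = sorted(range(len(self.texts)),
                       key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.texts[i] for i in order[:top_k]]

    def save(self, path: Path) -> None:
        path.write_text(json.dumps({"texts": self.texts, "vectors": self.vectors}))

    @classmethod
    def load(cls, path: Path) -> "ToyVectorIndex":
        data = json.loads(path.read_text())
        return cls(data["texts"], data["vectors"])


if __name__ == "__main__":
    index = ToyVectorIndex(["sport is physical activity governed by rules",
                            "pasta is cooked in boiling water"])
    print(index.query("what is sport"))
```

The toy query returns stored text verbatim. A real LlamaIndex query additionally passes the retrieved text to the LLM, which is why the answer above reads like a rewritten summary rather than the raw file.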

Google Drive Example Wrap Up

If you already use Google Drive to store your working notes and other documents, then you might want to expand the simple example in this chapter to build your own query system for your documents. In addition to Google Drive, I also use Microsoft Office 365 and OneDrive in my work and personal projects.

I haven’t written my own connectors yet for OneDrive but this is on my personal to-do list using the Microsoft library

Using Zapier Integrations With GMail and Google Calendar

Zapier is a service for writing integrations with hundreds of cloud services. Here we will write some demos for writing automatic integrations with Gmail and Google Calendar.

Using the Zapier service is simple. You need to register the services you want to interact with on the Zapier developer web site and then you can express how you want to interact with services using natural language prompts.

Set Up Development Environment

You will need a developer key for the Zapier Natural Language Actions API. Go to this linked web page and look for “Dev App” in the “Provider Name” column. If a key does not exist, you’ll need to set up an action to create a key. Click “Set up Actions” and follow the instructions. Your key will be in the Personal API Key column for the “Dev App.” Click to reveal and copy your key. You can also read the documentation.

When I set up my Zapier account I set up three Zapier Natural Language Actions:

  • Gmail: Find Email
  • Gmail: Send Email
  • Google Calendar: Find Event

If you do the same then you will see the Zapier registered actions:

Sending a Test GMail

In the following example replace TEST_EMAIL_ADDRESS with an email address that you can use for testing.

from langchain.llms import OpenAI
from langchain.agents import initialize_agent
from langchain.agents.agent_toolkits import ZapierToolkit
from langchain.utilities.zapier import ZapierNLAWrapper

# OpenAI reads OPENAI_API_KEY and ZapierNLAWrapper reads
# ZAPIER_NLA_API_KEY from the environment
llm = OpenAI(temperature=0)
zapier = ZapierNLAWrapper()
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
agent = initialize_agent(toolkit.get_tools(), llm,
                         agent="zero-shot-react-description", verbose=True)

"Send an Email to TEST_EMAIL_ADDRESS via gmail that is a pitch for hiring Mark Watson as a consultant for deep learning and large language models")

Here is the sample output:

$ python

> Entering new AgentExecutor chain...
 I need to use the Gmail: Send Email tool
Action: Gmail: Send Email
Action Input: Send an email to TEST_EMAIL_ADDRESS with the subject "Pitch for Hiring Mark Watson as a Consultant for Deep Learning and Large Language Models" and the body "Dear Mark Watson, I am writing to you to pitch the idea of hiring you as a consultant for deep learning and large language models. I believe you have the expertise and experience to help us achieve our goals. Please let me know if you are interested in discussing further. Thank you for your time."
Cc: not enough information provided in the instruction, missing Cc
Observation: {"labelIds": "SENT"}
Thought: I now know the final answer
Final Answer: An email has been sent to TEST_EMAIL_ADDRESS with the subject "Pitch for Hiring Mark Watson as a Consultant for Deep Learning and Large Language Models" and the body "Dear Mark Watson, I am writing to you to pitch the idea of hiring you as a consultant for deep learning and large language models. I believe you have the expertise and experience to help us achieve our goals. Please let me know if you are interested in discussing further. Thank you for your time."

> Finished chain.

Google Calendar Integration Example

If you configured the Zapier Natural Language Action “Google Calendar: Find Event”, the same code we used to send an email in the last section also works for checking calendar entries; you just need to change the natural language prompt:

from langchain.llms import OpenAI
from langchain.agents import initialize_agent
from langchain.agents.agent_toolkits import ZapierToolkit
from langchain.utilities.zapier import ZapierNLAWrapper

llm = OpenAI(temperature=0)
zapier = ZapierNLAWrapper()
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
agent = initialize_agent(toolkit.get_tools(), llm,
                         agent="zero-shot-react-description", verbose=True)

# Only the natural language request differs from the Gmail example
agent.run("Get my Google Calendar entries for tomorrow")

And the output looks like:

$ python

> Entering new AgentExecutor chain...
 I need to find events in my Google Calendar
Action: Google Calendar: Find Event
Action Input: Find events in my Google Calendar tomorrow
Observation: {"location": "Greg to call Mark on (928) XXX-ZZZZ", "kind": "calendar#event", "end__dateTime": "2023-03-23T10:00:00-07:00", "status": "confirmed", "end__dateTime_pretty": "Mar 23, 2023 10:00AM", "htmlLink": ""}
Thought: I now know the final answer
Final Answer: I have an event in my Google Calendar tomorrow at 10:00AM.

> Finished chain.

I edited this output to remove some private information.

Natural Language SQLite Database Queries With LangChain

LangChain's support for SQLite databases uses the Python library SQLAlchemy for database connections. Because SQLAlchemy is an abstraction layer, LangChain can use the same logic and models for other relational databases.

I have a long work history of writing natural language interfaces for relational databases that I will review in the chapter wrap up. For now, I invite you to be amazed at how simple it is to write the LangChain scripts for querying a database in natural language.

We will use the SQLite sample database from the SQLite Tutorial web site:


This database has 11 tables. The above URI has documentation for this database so please take a minute to review the table schema diagram and text description.

This example is derived from the LangChain documentation. We use three classes from the langchain library:

  • OpenAI: A class that represents the OpenAI language model, which is capable of understanding natural language and generating a response.
  • SQLDatabase: A class that represents a connection to an SQL database.
  • SQLDatabaseChain: A class that connects the OpenAI language model with the SQL database to allow natural language querying.

The temperature parameter is set to 0 in this example. The temperature parameter controls the randomness of the generated output: a lower value (like 0) makes the model’s output more deterministic and focused, while a higher value introduces more randomness (or “creativity”). The run method of the db_chain object translates the natural language query into an appropriate SQL query, executes it on the connected database, and returns the result converted into natural language.
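To make the effect of temperature concrete, here is a small sketch of temperature-scaled softmax, the usual way a sampling temperature is applied to a model's token scores. The logits are made-up numbers, and treating a temperature of 0 as greedy selection is a common convention, not a description of how the OpenAI API is implemented internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores (logits) to probabilities, scaled by temperature.

    A temperature of 0 is treated as greedy decoding: all probability
    mass goes to the highest-scoring token.
    """
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution, which is why temperature=0 is a good choice when we want repeatable SQL generation.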

# SQLite NLP Query Demo Script

from langchain import OpenAI, SQLDatabase
from langchain import SQLDatabaseChain

# SQLDatabase wraps a SQLAlchemy connection URI
db = SQLDatabase.from_uri("sqlite:///chinook.db")
llm = OpenAI(temperature=0)

db_chain = SQLDatabaseChain(llm=llm, database=db,
                            verbose=True)

# Each call translates the question to SQL, runs it, and summarizes the result
db_chain.run("How many employees are there?")
db_chain.run("What is the name of the first employee?")
db_chain.run("Which customer has the most invoices?")
db_chain.run("List all music genres in the database")

The output (edited for brevity) shows the generated SQL queries and the query results:

$ python

> Entering new SQLDatabaseChain chain...
How many employees are there? 
 SELECT COUNT(*) FROM employees;
SQLResult: [(8,)]
Answer: There are 8 employees.
> Finished chain.

> Entering new SQLDatabaseChain chain...
What is the name of the first employee? 
 SELECT FirstName, LastName FROM employees WHERE EmployeeId = 1;
SQLResult: [('Andrew', 'Adams')]
Answer: The first employee is Andrew Adams.
> Finished chain.

> Entering new SQLDatabaseChain chain...
Which customer has the most invoices? 
 SELECT customers.FirstName, customers.LastName, COUNT(invoices.InvoiceId) AS NumberOfInvoices FROM customers INNER JOIN invoices ON customers.CustomerId = invoices.CustomerId GROUP BY customers.CustomerId ORDER BY NumberOfInvoices DESC LIMIT 5;
SQLResult: [('Luis', 'Goncalves', 7), ('Leonie', 'Kohler', 7), ('Francois', 'Tremblay', 7), ('Bjorn', 'Hansen', 7), ('Frantisek', 'Wichterlova', 7)]
Answer: Luis Goncalves has the most invoices with 7.
> Finished chain.

> Entering new SQLDatabaseChain chain...
List all music genres in the database 
SQLQuery: SELECT Name FROM genres
SQLResult: [('Rock',), ('Jazz',), ('Metal',), ('Alternative & Punk',), ('Rock And Roll',), ('Blues',), ('Latin',), ('Reggae',), ('Pop',), ('Soundtrack',), ('Bossa Nova',), ('Easy Listening',), ('Heavy Metal',), ('R&B/Soul',), ('Electronica/Dance',), ('World',), ('Hip Hop/Rap',), ('Science Fiction',), ('TV Shows',), ('Sci Fi & Fantasy',), ('Drama',), ('Comedy',), ('Alternative',), ('Classical',), ('Opera',)]
Answer: Rock, Jazz, Metal, Alternative & Punk, Rock And Roll, Blues, Latin, Reggae, Pop, Soundtrack, Bossa Nova, Easy Listening, Heavy Metal, R&B/Soul, Electronica/Dance, World, Hip Hop/Rap, Science Fiction, TV Shows, Sci Fi & Fantasy, Drama, Comedy, Alternative, Classical, Opera
> Finished chain.
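Notice that the SQLResult for the third question lists five customers tied at seven invoices, yet the generated answer names only Luis Goncalves, and the LIMIT 5 could hide further ties. When ties matter, a query can compare each count against the maximum count instead. This sketch uses Python's built-in sqlite3 module with a tiny made-up dataset (not the real Chinook data) whose table and column names follow the generated query above.

```python
import sqlite3

# Tiny hypothetical dataset: Ann and Bo tie with two invoices each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Ray');
INSERT INTO invoices (CustomerId) VALUES (1), (1), (2), (2), (3);
""")

# Return every customer whose invoice count equals the maximum count,
# so all tied customers are reported instead of an arbitrary first row.
TIE_AWARE_QUERY = """
WITH counts AS (
    SELECT c.FirstName, c.LastName, COUNT(i.InvoiceId) AS NumberOfInvoices
    FROM customers c JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, c.FirstName, c.LastName
)
SELECT FirstName, LastName, NumberOfInvoices
FROM counts
WHERE NumberOfInvoices = (SELECT MAX(NumberOfInvoices) FROM counts)
ORDER BY LastName;
"""

rows = conn.execute(TIE_AWARE_QUERY).fetchall()
print(rows)  # [('Bo', 'Kim', 2), ('Ann', 'Lee', 2)]
```

Phrasing the natural language question as "Which customers are tied for the most invoices?" may nudge the LLM toward this kind of query, although there is no guarantee it will generate it.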

Natural Language Database Query Wrap Up

I wrote a natural language database interface example for the first two editions of my Java AI book (I later removed it because the code was too long and too difficult to follow). I then reworked this example in Common Lisp and used both versions in several consulting projects in the late 1990s and early 2000s.

My previous book, Practical Python Artificial Intelligence Programming, used an OpenAI example that shows relatively simple code (compared to my older hand-written Java and Common Lisp code) for an NLP database interface.

Compared to the elegant support for NLP database queries in LangChain, the previous examples had limited power and required a lot more code. As I write this in March 2023, it is a good feeling that for the rest of my career, NLP database access is now a solved problem!

Examples Using Hugging Face Open Source Models

To start, create a free account on the Hugging Face Hub, get an API key, and install the huggingface_hub library:

pip install --upgrade huggingface_hub

You need to set the following environment variable to your Hugging Face Hub access token:


So far in this book we have been using the OpenAI LLM wrapper:

from langchain.llms import OpenAI

Here we will use the alternative Hugging Face wrapper class:

from langchain import HuggingFaceHub

The LangChain library hides most of the details of using both APIs. This is a really good thing. I have had a few discussions on tech social media with people who object to the non-open-source nature of OpenAI. While I like the convenience of using OpenAI’s APIs, I always like to have alternatives for the proprietary technology I use.

The Hugging Face Hub endpoint in LangChain connects to the Hugging Face Hub and runs models via its free inference endpoints. We need a Hugging Face account and API key to use these endpoints (footnote 3). There are two Hugging Face LLM wrappers: one for a local pipeline and one for a model hosted on the Hugging Face Hub. Note that these wrappers only work for models that support the text2text-generation and text-generation tasks. Text2text-generation is the task of generating a text sequence from another text sequence, for example, generating a summary of a long article. Text-generation is the task of generating new text that continues a prompt.

Using LangChain as a Wrapper for Hugging Face Prediction Model APIs

We will start with a simple example using the prompt text support in LangChain. The following example is in the script

from langchain import HuggingFaceHub, LLMChain
from langchain.prompts import PromptTemplate

# A model hosted on the Hugging Face Hub; a tiny temperature gives near-deterministic output
hub_llm = HuggingFaceHub(
    repo_id='google/flan-t5-xl',
    model_kwargs={'temperature':1e-6}
)

prompt = PromptTemplate(
    input_variables=["name"],
    template="What year did {name} get elected as president?",
)

llm_chain = LLMChain(prompt=prompt, llm=hub_llm)

print(llm_chain.run("George Bush"))
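LangChain's PromptTemplate uses f-string style {placeholders} by default, so the text sent to the model for this chain should match what Python's str.format produces. This minimal sketch uses only the standard library (no LangChain), and the render helper is hypothetical; it simply shows the prompt the model receives.

```python
template = "What year did {name} get elected as president?"


def render(template: str, **values: str) -> str:
    """Fill {placeholders} in the template, as an f-string style template would."""
    return template.format(**values)


print(render(template, name="George Bush"))
# What year did George Bush get elected as president?
```

Seeing the fully rendered prompt is handy when a model gives a surprising answer, since the problem is often in the prompt text rather than in the model.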

By changing just a few lines of code, you can run many of the examples in this book using the Hugging Face APIs in place of the OpenAI APIs.

The LangChain documentation lists the source code for a wrapper to use local Hugging Face embeddings here.

Creating a Custom LlamaIndex Hugging Face LLM Wrapper Class That Runs on Your Laptop

We will be downloading the Hugging Face model facebook/opt-iml-1.3b, which is a 2.6 gigabyte file. This model is downloaded the first time it is requested and is then cached in ~/.cache/huggingface/hub for later reuse.

This example is modified from an example for custom LLMs in the LlamaIndex documentation. Note that I have used a much smaller model in this example and reduced the prompt and output text size.

# Derived from example:
#

import time
import torch
from langchain.llms.base import LLM
from llama_index import SimpleDirectoryReader, LangchainEmbedding
from llama_index import GPTListIndex, PromptHelper
from llama_index import LLMPredictor
from transformers import pipeline

max_input_size = 512
num_output = 64
max_chunk_overlap = 10
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

class CustomLLM(LLM):
    model_name = "facebook/opt-iml-1.3b"
    # I am not using a GPU, but you can add device="cuda:0"
    # to the pipeline call if you have a local GPU or
    # are running this on Google Colab.
    # Note: this pipeline is created (and the model loaded) when the
    # class body executes, which happens before time1 is recorded below.
    pipeline = pipeline("text-generation", model=model_name,
                        model_kwargs={"torch_dtype":torch.bfloat16})

    def _call(self, prompt, stop=None):
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)
        first_response = response[0]["generated_text"]
        # only return newly generated tokens
        returned_text = first_response[prompt_length:]
        return returned_text

    @property
    def _identifying_params(self):
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self):
        return "custom"

time1 = time.time()

# define our LLM
llm_predictor = LLMPredictor(llm=CustomLLM())

# Load your data
documents = SimpleDirectoryReader('../data_small').load_data()
index = GPTListIndex(documents, llm_predictor=llm_predictor,
                     prompt_helper=prompt_helper)
index = index.as_query_engine(llm_predictor=llm_predictor)

time2 = time.time()
print(f"Time to load model from disk: {time2 - time1} seconds.")

# Query and print response
response = index.query("What is the definition of sport?")
print(response)

time3 = time.time()
print(f"Time for query/prediction: {time3 - time2} seconds.")
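The _call method above removes the prompt by slicing generated_text at len(prompt), which assumes the pipeline echoes the prompt back verbatim at the start of its output. A slightly more defensive helper (a hypothetical name, not part of LangChain or transformers) checks that assumption first. The Hugging Face text-generation pipeline also accepts a return_full_text=False argument that asks it to return only the new text, which would be another way to avoid the slicing.

```python
def new_text_only(prompt: str, generated: str) -> str:
    """Return only the text the model added after the prompt.

    If the generated text does not start with the exact prompt (for example,
    because whitespace was normalized), return it unchanged rather than
    cutting it at an arbitrary character position.
    """
    if generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


print(repr(new_text_only("Q: 1+1?\nA:", "Q: 1+1?\nA: 2")))  # ' 2'
```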

When running on my M1 MacBook Pro using only the CPU (no GPU or Neural Engine configuration) we can read the model from disk quickly but it takes a while to process queries:

$ python
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 0 tokens
Time to load model from disk: 1.5303528308868408 seconds.
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 182 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens

"Sport" comes from the Old French desport meaning "leisure", with the oldest definition in English from around 1300 being "anything humans find amusing or entertaining".[4]
Time for query/prediction: 228.8184850215912 seconds.

Even though my M1 MacBook does fairly well when I configure TensorFlow and PyTorch to use the Apple Silicon GPUs and Neural Engines, I usually do my model development using Google Colab.

Let’s rerun the last example on Colab:

Using a standard Colab GPU, the query/prediction time is much faster. Here is a link to my Colab notebook if you would prefer to run this example on Colab instead of on your laptop.

Running Local LLMs Using Llama.cpp and LangChain

We saw an example at the end of the last chapter running a local LLM. Here we use the Llama.cpp project to run a local model with LangChain. I am writing this in October 2023, about six months after I wrote the previous chapter. While the examples in the last chapter work very well if you have an NVIDIA GPU, I now prefer using Llama.cpp because it also works very well with Apple Silicon. My Mac has an M2 SoC with 32G of internal memory, which is suitable for running fairly large LLMs efficiently.

Installing Llama.cpp with a Llama2-13b-orca Model

Now we look at an approach to running LLMs locally on your own computer.

Among the many open and public models, I chose the Llama2-13b-orca model available on Hugging Face because of its support for natural language processing tasks. The combination of Llama2-13b-orca and the llama.cpp library is well supported by LangChain and meets our requirements for local deployment and ease of installation and use.

Start by cloning the llama.cpp project and building it:

git clone
make
mkdir models

Then get a model file from and copy to ./models directory:

$ ls -lh models
8.6G openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf

You don't strictly need to clone Llama.cpp from GitHub, because LangChain supports Llama.cpp through the llama-cpp-python library. That said, Llama.cpp can also be run from the command line and includes a REST server option, so I find it useful beyond the requirements of the example in this chapter.

Note that this model is available in many quantized variations that trade off quality against memory use. I am using one of the larger ones. If you only have 8G of memory, try a smaller model.
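A rough rule of thumb for sizing a quantized model file is: bytes ≈ number of parameters × bits per weight / 8. This sketch assumes about 13 billion parameters for Llama 2 13B and roughly 5.68 bits per weight for the Q5_K_M quantization; both figures are approximations, yet the estimate lands close to the 8.6G file listed above. The 7B model at about 4.5 bits per weight is likewise an assumed example of a smaller, more aggressively quantized choice.

```python
def estimate_model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in GiB of a model's weights at a given quantization level."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed: ~13e9 parameters at ~5.68 bits/weight (Q5_K_M, approximate)
print(round(estimate_model_size_gib(13e9, 5.68), 1))  # 8.6
# Assumed: a 7B model with a ~4.5 bits/weight quantization
print(round(estimate_model_size_gib(7e9, 4.5), 1))  # 3.7
```

A running model also needs some memory beyond its file size (for example, for the context), so leave headroom when choosing a model for a machine with 8G of memory.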

Python Example

The following script is in the file langchain-book-examples/llama.cpp/ and is derived from the LangChain documentation:

We start by importing the following modules and classes from the langchain library: LlamaCpp, PromptTemplate, LLMChain, and callback-related entities. An instance of PromptTemplate is then created with a template that structures the input question and answer format. A CallbackManager instance is created with a StreamingStdOutCallbackHandler to support token-wise streaming during the model’s inference, which is useful for seeing text as it is generated.

We then create an instance of the LlamaCpp class with parameters including the model path, temperature, maximum tokens, and top_p, along with the CallbackManager instance created earlier. The verbose parameter is set to True so that the callback manager receives detailed output during the model’s operation. The script then defines a new plain-text prompt about comparing ages, which replaces the earlier template (so the PromptTemplate and LLMChain are not used in this run), and calls the LlamaCpp instance directly with this prompt to generate and print a response.

from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

prompt = PromptTemplate(template=template, input_variables=["question"])

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/Users/markw/llama.cpp/models/openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True, # Verbose for callback manager
)

# This plain string replaces the PromptTemplate above; the LLM is called directly
prompt = """
Question: If Mary is 30 years old and Bob is 25, who is older and by how much?
"""
print(llm(prompt))

Here is example output (with output shortened for brevity):

$ python
llama_model_loader: loaded meta data with 20 key-value pairs and 363 tensors from /Users/markw/llama.cpp/models/openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf (version GGUF V2 (latest))

My Answer: Mary is older by 5 years.
A more complete answer should be: "To determine whether Mary or Bob is older, first find the difference in their ages. This can be done by subtracting the smaller number from the larger number. 
For example, let's say Mary is 30 years old and Bob is 25 years old. To find out who is older, we need to subtract Bob's age (25) from Mary's age (30). The answer is 5. Therefore, Mary is 5 years older than Bob."
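The StreamingStdOutCallbackHandler in this script prints text as the model produces it: the model runtime calls a handler method once per new token. This sketch imitates that pattern in plain Python with a fake token generator. The method name mirrors LangChain's on_llm_new_token callback hook, but the class and function here are hypothetical illustrations, not LangChain's implementation.

```python
import sys
from typing import Iterable, List


class PrintTokensHandler:
    """Hypothetical handler: echoes each token as soon as it arrives."""

    def __init__(self, stream=sys.stdout):
        self.stream = stream
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str) -> None:
        self.tokens.append(token)
        self.stream.write(token)
        self.stream.flush()


def fake_generate(tokens: Iterable[str], handlers) -> str:
    """Stand-in for a model: emits tokens one at a time and notifies every handler."""
    output = []
    for token in tokens:
        for handler in handlers:
            handler.on_llm_new_token(token)
        output.append(token)
    return "".join(output)


handler = PrintTokensHandler()
text = fake_generate(["Mary", " is", " older", " by", " 5", " years."], [handler])
print()  # end the streamed line
```

Because handlers are notified per token, the user sees partial output long before a slow local model finishes the full response.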

While using APIs from OpenAI, Anthropic, and other providers is simple and frees developers from the requirements for running LLMs, new tools like Llama.cpp make it easier and less expensive to run and deploy LLMs yourself. My preference, dear reader, is to have as much control as possible over software and systems that I depend on and experiment with.

Running Local LLMs Using Ollama

We saw an example at the end of the last chapter using the Llama.cpp project to run a local model with LangChain. As I update this chapter in April 2024, I now most often use the Ollama app (downloads, documentation, and a list of supported models are on the Ollama web site). Ollama has a good command line interface and also runs a REST service that the examples in this chapter use.

Ollama works very well with Apple Silicon, systems with an NVIDIA GPU, and high-end CPU-only systems. My Mac has an M2 SoC with 32G of internal memory, which is suitable for running fairly large LLMs efficiently, but most of the examples here run fine with 16G of memory.

Most of this chapter involves Python code examples using Ollama to run local LLMs. However, the Ollama command line interface is useful for interactive experiments. Another useful development technique is to write prompts in individual text files like p1.txt, p2.txt, etc. and run a prompt (on macOS and Linux) using:

$ ollama run llama3:instruct < p1.txt

And after the response is printed either stay in the Ollama REPL or type /bye to exit.
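Besides the command line interface, Ollama's REST service can be called without LangChain. This sketch uses only the Python standard library. It assumes Ollama's default address http://localhost:11434 (the same base_url used later in this chapter) and its /api/generate endpoint with streaming turned off, which returns JSON containing a response field; the helper function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address


def build_generate_request(model: str, prompt: str,
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the model's text (requires `ollama serve`)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Usage, with "ollama serve" running and the model pulled:
# print(generate("mistral:7b-instruct", "how much is 1 + 2?"))
```

Separating request construction from sending keeps the code easy to test without a running server.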

Simple Use of a Local Mistral Model Using LangChain

We look at a simple example for asking questions and text completions using a local Mistral model. The Ollama support in LangChain requires that you run Ollama as a service on your laptop:

ollama serve

Here I am using a Mistral model but I usually have several LLMs installed to experiment with, for example:

$ ollama list
NAME                            ID              SIZE    MODIFIED
everythinglm:13b                bf6610a21b1e    7.4 GB  8 days ago
llama2:13b                      b3f03629d9a6    7.4 GB  4 weeks ago
llama2:latest                   fe938a131f40    3.8 GB  4 weeks ago
llava:latest                    e4c3eb471fd8    4.5 GB  4 days ago
meditron:7b                     ad11a6250f54    3.8 GB  8 days ago
mistral:7b-instruct             d364aa8d131e    4.1 GB  3 weeks ago
mistral:instruct                d364aa8d131e    4.1 GB  5 weeks ago
mistral:latest                  d364aa8d131e    4.1 GB  3 weeks ago
mixtral:8x7b-instruct-v0.1-q2_K 4278058671b6    15 GB   20 hours ago
openhermes2.5-mistral:latest    ca4cd4e8a562    4.1 GB  3 weeks ago
orca2:13b                       a8dcfac3ac32    7.4 GB  8 days ago
samantha-mistral:latest         f7c8c9be1da0    4.1 GB  8 days ago
wizard-vicuna-uncensored:30b    5a7102e25304    18 GB   4 weeks ago
yi:34b                          5f8365d57cb8    19 GB   8 days ago

Here is the file ollama_langchain/

# requires "ollama serve" to be running in a terminal

from langchain.llms import Ollama

llm = Ollama(
    model="mistral:7b-instruct",
    verbose=False,
)

s = llm("how much is 1 + 2?")
print(s)

s = llm("If Sam is 27, Mary is 42, and Jerry is 33, what are their age differences?")
print(s)

Here is the output:

$ python
1 + 2 = 3.

To calculate their age differences, we simply subtract the younger person's age from the older person's age. Here are the calculations:
- Sam is 27 years old, and Mary is 42 years old, so their age difference is 42 - 27 = 15 years.
- Mary is 42 years old, and Jerry is 33 years old, so their age difference is 42 - 33 = 9 years.
- Jerry is 33 years old, and Sam is 27 years old, so their age difference is 33 - 27 = 6 years.
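Small models can slip on arithmetic, so it is worth checking answers like this one. A few lines of Python confirm the three pairwise differences in the output above (15, 9, and 6 years):

```python
from itertools import combinations

ages = {"Sam": 27, "Mary": 42, "Jerry": 33}

# Absolute age difference for every pair of people.
differences = {
    (a, b): abs(ages[a] - ages[b]) for a, b in combinations(ages, 2)
}
for (a, b), diff in differences.items():
    print(f"{a} and {b}: {diff} years")
```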

Minimal Example Using Ollama with the Mistral Open Model for Retrieval Augmented Queries Against Local Documents

The following listing of file ollama_langchain/ demonstrates creating a persistent embeddings datastore and reusing it. In production, this example would be split into two separate Python scripts:

  • Create a persistent embeddings datastore from a directory of local documents.
  • Open a persisted embeddings datastore and use it for queries against local documents.

Creating a local persistent embeddings datastore for the example text files in ../data/*.txt takes about 90 seconds on my Mac Mini.

# requires "ollama serve" to be running in another terminal

from langchain.llms import Ollama
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain.chains import RetrievalQA

from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader

# Create index (can be reused):

loader = DirectoryLoader('../data', glob='**/*.txt')

data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100)
all_splits = text_splitter.split_documents(data)

persist_directory = 'cache'

vectorstore = Chroma.from_documents(
    documents=all_splits,
    embedding=OllamaEmbeddings(model="mistral:instruct"),
    persist_directory=persist_directory)

vectorstore.persist()

# Try reloading index from disk and using for search:

persist_directory = 'cache'

vectorstore = Chroma(
  persist_directory=persist_directory,
  embedding_function=OllamaEmbeddings(model="mistral:instruct")
)

llm = Ollama(base_url="http://localhost:11434",
             model="mistral:instruct",
             verbose=False,
            )

retriever = vectorstore.as_retriever()

# 'stuff' places all retrieved chunks into a single prompt
qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type='stuff',
            retriever=retriever,
            verbose=True,)

while True:
    query = input("Ask a question: ")
    response = qa_chain(query)
    print(response['result'])
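The RecursiveCharacterTextSplitter call above uses chunk_size=1000 and chunk_overlap=100, so neighboring chunks share some text and content near a chunk boundary appears in both chunks. LangChain's splitter is smarter than this (it tries to split on paragraph breaks, then newlines, then spaces), so treat the fixed-window chunker below as a simplified stand-in that only shows what the two parameters mean; the function name is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 25  # 250 characters of dummy text
pieces = chunk_text(sample, chunk_size=100, chunk_overlap=20)
print([len(p) for p in pieces])  # [100, 100, 90]
```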

Here is an example using this script. The first question uses the innate knowledge contained in the Mistral-7B model while the second question uses the text files in the directory ../data as local documents. The test input file economics.txt has been edited to add the name of a fictional economist. I added this data to show that the second question is answered from the local document store.

$ python 
> Entering new RetrievalQA chain...

> Finished chain.

11 + 2 = 13
Ask a question: Who says that economics is bullshit?


> Entering new RetrievalQA chain...

> Finished chain.

Pauli Blendergast, an economist who teaches at the University of Krampton Ohio, is known for saying that economics is bullshit.
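With chain_type='stuff', the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question, which is how the fictional economist from economics.txt reaches the model. This sketch shows the idea; the prompt wording, the helper name, and the placeholder chunks are assumptions for illustration, not LangChain's actual default prompt.

```python
from typing import List


def build_stuff_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Concatenate retrieved document chunks into one context block for the LLM."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


chunks = ["First retrieved chunk of text.", "Second retrieved chunk of text."]
print(build_stuff_prompt("Who says that economics is bullshit?", chunks))
```

The trade-off with stuffing is that all retrieved chunks must fit in the model's context window; LangChain's other chain types, such as map_reduce, exist for cases where they do not.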

Wrap Up for Running Local LLMs Using Ollama

As I write this chapter in December 2023 most of my personal LLM experiments involve running models locally on my Mac mini (or sometimes in Google Colab) even though models available through OpenAI, Anthropic, etc. APIs are more capable. I find that the Ollama project is currently the easiest and most convenient way to run local models as REST services or embedded in Python scripts as in the two examples here.

Using Large Language Models to Write Recipes

If you ask the ChatGPT web app to write a recipe using a user-supplied ingredient list and a description, it does a fairly good job of generating recipes. For the example in this chapter I am taking a different approach:

  • Use the recipe and ingredient files from my web app to create context text, given a user prompt for a recipe.
  • Treat this as a text prediction problem.
  • Format the response for display.

This approach has an advantage (for me!) that the generated recipes will be more similar to the recipes I enjoy cooking since the context data will be derived from my own recipe files.

Preparing Recipe Data

I am using the JSON recipe files from my web app. The following Python script converts my JSON data to text descriptions, one per file:

import json
import os

def process_json(fpath):
    with open(fpath, 'r') as f:
        data = json.load(f)

    os.makedirs("text_data", exist_ok=True)  # output directory for the text files
    # Write one text file per recipe
    for d in data:
        with open(f"text_data/{d['name']}.txt", 'w') as out:
            out.write("Recipe name: " + d['name'] + '\n\n')
            out.write("Number of servings: " +
                      str(d['num_served']) + '\n\n')
            ingredients = ["  " + str(ii['amount']) +
                           ' ' + ii['units'] + ' ' +
                           ii['description']
                           for ii in d['ingredients']]
            out.write("Ingredients:\n" +
                      "\n".join(ingredients) + '\n\n')
            out.write("Directions: " +
                      ' '.join(d['directions']) + '\n')

if __name__ == "__main__":
    process_json('data/vegetarian.json')
    process_json('data/desert.json')
    process_json('data/fish.json')
    process_json('data/meat.json')
    process_json('data/misc.json')
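Factoring the formatting into a pure function makes it testable without touching the file system, and a small filename sanitizer guards against recipe names containing characters such as "/". This is a hypothetical refactoring, not code from the book's repository. It assumes each ingredient has amount, units, and description fields, as the script above reads, and the sample dictionary is abbreviated from the Black Bean Dip recipe shown next.

```python
from typing import Any, Dict


def recipe_to_text(d: Dict[str, Any]) -> str:
    """Render one recipe dict in the same text layout as the script above."""
    ingredient_lines = [
        "  " + str(ii["amount"]) + " " + ii["units"] + " " + ii["description"]
        for ii in d["ingredients"]
    ]
    return (
        "Recipe name: " + d["name"] + "\n\n"
        + "Number of servings: " + str(d["num_served"]) + "\n\n"
        + "Ingredients:\n" + "\n".join(ingredient_lines) + "\n\n"
        + "Directions: " + " ".join(d["directions"]) + "\n"
    )


def safe_filename(name: str) -> str:
    """Replace characters other than letters, digits, spaces, '-' and '_' with '_'."""
    return "".join(c if c.isalnum() or c in " -_" else "_" for c in name).strip()


bean_dip = {  # abbreviated sample data
    "name": "Black Bean Dip",
    "num_served": 6,
    "ingredients": [
        {"amount": "2", "units": "cup", "description": "Refried Black Beans"},
        {"amount": "1/4", "units": "cup", "description": "Sour cream"},
        {"amount": "1", "units": "teaspoon", "description": "Ground cumin"},
        {"amount": "1/2", "units": "cup", "description": "Salsa"},
    ],
    "directions": ["Blend the black beans and cumin.", "Stir in salsa and sour cream."],
}

print(recipe_to_text(bean_dip))
print(safe_filename("Beans/Rice Bowl"))  # Beans_Rice Bowl
```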

Here is a listing of one of the shorter generated recipe files (i.e., text recipe data converted from raw JSON recipe data from my web site):

Recipe name: Black Bean Dip

Number of servings: 6

Ingredients:
  2 cup Refried Black Beans
  1/4 cup Sour cream
  1 teaspoon Ground cumin
  1/2 cup Salsa

Directions: Use either a food processor or a mixing bowl and hand mixer to make this appetizer. Blend the black beans and cumin for at least one minute until the mixture is fairly smooth. Stir in salsa and sour cream and lightly mix. Serve immediately or store in the refrigerator.

I have generated 41 individual recipe files that will be used for the remainder of this chapter.

When we use an LLM to generate a recipe in the next section, the directions will be numbered steps and the formatting will differ from my original recipe document files.

A Prediction Model Using the OpenAI text-embedding-3-large Model

Here we use the DirectoryLoader class from previous examples to load the recipe text files and then create an embedding index.

Here is the listing for the script

from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader
from langchain import OpenAI, VectorDBQA

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Load the recipe text files and split them into chunks
loader = DirectoryLoader('./text_data/', glob="**/*.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=2500,
                                      chunk_overlap=0)

texts = text_splitter.split_documents(documents)

# Embed the chunks and store them in an in-memory Chroma vector store
docsearch = Chroma.from_documents(texts, embeddings)

qa = VectorDBQA.from_chain_type(
    llm=OpenAI(temperature=0, model_name="text-davinci-002"),
    chain_type="stuff",
    vectorstore=docsearch)

def query(q):
    print(f"\n\nRecipe creation request: {q}\n")
    print(f"{qa.run(q)}\n\n")

query("Create a new recipe using Broccoli and Chicken")
query("Create a recipe using Beans, Rice, and Chicken")

This generated two recipes. Here is the output for the first request:

$ python
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.

Recipe creation request: Create a new recipe using both Broccoli and Chicken

Recipe name: Broccoli and Chicken Teriyaki
Number of servings: 4

Ingredients:
1 cup broccoli
1 pound chicken meat
2 tablespoons soy sauce
1 tablespoon honey
1 tablespoon vegetable oil
1 clove garlic, minced
1 teaspoon rice vinegar

Directions:

1. In a large bowl, whisk together soy sauce, honey, vegetable oil, garlic, and rice vinegar.
2. Cut the broccoli into small florets. Add the broccoli and chicken to the bowl and toss to coat.
3. Preheat a grill or grill pan over medium-high heat.
4. Grill the chicken and broccoli for 5-7 minutes per side, or until the chicken is cooked through and the broccoli is slightly charred.
5. Serve immediately.

If you examine the text recipe files I indexed, you will see that the prediction model merged information from multiple indexed recipes while creating new, original text for the directions that is loosely based on the directions I wrote and on information encoded in the OpenAI text-davinci-002 model.

Here is the output for the second request:

Recipe creation request: Create a recipe using Beans, Rice, and Chicken

Recipe name: Beans and Rice with Chicken
Number of servings: 4
Ingredients:
1 cup white rice
1 cup black beans
1 chicken breast, cooked and shredded
1/2 teaspoon cumin
1/2 teaspoon chili powder
1/4 teaspoon salt
1/4 teaspoon black pepper
1 tablespoon olive oil
1/2 cup salsa
1/4 cup cilantro, chopped

Directions:
1. Cook rice according to package instructions.
2. In a medium bowl, combine black beans, chicken, cumin, chili powder, salt, and black pepper.
3. Heat olive oil in a large skillet over medium heat. Add the bean mixture and cook until heated through, about 5 minutes.
4. Stir in salsa and cilantro. Serve over cooked rice.
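Both recipes depend on a retrieval step: the vector store embeds the request and returns the stored recipe chunks whose vectors are closest to it, and those chunks become the LLM's context. This sketch shows that step with cosine similarity over tiny made-up vectors and hypothetical recipe names. Chroma's actual distance metric and indexing are configurable and more sophisticated, and real embedding vectors have far more dimensions, so treat this as an illustration only.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k document names whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda name: cosine_similarity(query_vec, index[name]),
                    reverse=True)
    return ranked[:k]


# Made-up 3-dimensional "embeddings" for hypothetical recipes.
index = {
    "Black Bean Dip": [0.9, 0.1, 0.0],
    "Chicken Stir Fry": [0.1, 0.9, 0.2],
    "Broccoli Soup": [0.0, 0.3, 0.9],
}
print(top_k([0.1, 0.8, 0.5], index))  # ['Chicken Stir Fry', 'Broccoli Soup']
```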

Cooking Recipe Generation Wrap Up

Cooking is one of my favorite activities (in addition to hiking, kayaking, and playing a variety of musical instruments). I originally wrote the web app to scratch a personal itch: due to a medical issue I had to closely monitor and regulate my vitamin K intake. I used the US Government’s USDA Nutrition Database to estimate the amounts of vitamins and nutrients in some recipes that I use.

When I wanted to experiment with generative models that create recipes backed by my personal recipe data, having the recipe data from my previous project, as well as tools like the OpenAI APIs and LangChain, made this experiment simple to set up and run. A common theme in this book is that it is now relatively easy to create personal projects based on our own data and interests.

LangChain Agents

LangChain agent tools act as glue that maps natural language input from a human into sequences of actions. In effect, we use the real-world knowledge contained in the text used to train an LLM so that the LLM can act as a reasoning agent.

The LangChain Agents Documentation provides everything you need to get started. Here we will dive a bit deeper into using local Python scripts in agents and look at an interesting example using SPARQL queries and the public DBPedia Knowledge Base. We will concentrate on just a few topics:

  • Understanding what LangChain tools are and using pre-built tools.
  • Getting an overview of ReAct reasoning. You should bookmark the original paper ReAct: Synergizing Reasoning and Acting in Language Models for reference. This paper inspired the design and implementation of the agent tool code in LangChain.
  • Writing custom functions for OpenAI: how to write a custom tool. We will write a tool that uses SPARQL queries to the DBPedia public Knowledge Graph.

Overview of LangChain Tools

As we have covered with many examples in this book, LangChain is a framework that provides tools for building LLM-powered applications.

Here we look at using built-in LangChain agent tools, look at how ReAct agents work, and end the chapter with a custom tool agent application.

LangChain tools are interfaces that an agent can use to interact with the world. They can be generic utilities (e.g. search), other chains, or even other agents. The interface API of a tool has a single text input and a single text output, and includes a name and description that communicate to the model what the tool does and when to use it.
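To make the tool interface concrete, here is a framework-free sketch of the idea: a tool is a name, a description, and a function from one string to another string. The `SimpleTool` class and the `render_tool_descriptions` helper are hypothetical names for illustration, not LangChain's API; rendering names and descriptions as text is only roughly analogous to how an agent tells the model which tools exist.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: one text input, one text output."""
    name: str
    description: str
    func: Callable[[str], str]

    def run(self, text: str) -> str:
        return self.func(text)


def render_tool_descriptions(tools: list) -> str:
    """Format tool names and descriptions as text an LLM could read in a prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


shout = SimpleTool(
    name="shout",
    description="Useful when you need text converted to upper case.",
    func=lambda s: s.upper(),
)
word_count = SimpleTool(
    name="word_count",
    description="Useful when you need to count the words in a piece of text.",
    func=lambda s: str(len(s.split())),
)

tools = [shout, word_count]
print(render_tool_descriptions(tools))

# A (hypothetical) agent would choose a tool by name and call its run method:
by_name = {t.name: t for t in tools}
print(by_name["word_count"].run("LangChain tools map text to text"))
```

The descriptions matter as much as the code: they are the only information the model has when deciding which tool to call.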

Some tools can be used as-is, and some tools (e.g. chains, agents) require a base LLM to initialize them. In that case, you can pass in an LLM as well:

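from langchain.agents import load_tools
from langchain.chat_models import ChatOpenAI

# Example tool name: "llm-math" is a pre-built tool that needs an LLM
tool_names = ["llm-math"]
llm = ChatOpenAI(temperature=0)
tools = load_tools(tool_names, llm=llm)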

To implement your own tool, you can subclass the BaseTool class and implement the _run method, as we do in the DBPedia example later in this chapter. The _run method is called with the tool's input and should return the output text. The BaseTool superclass implements the public run method, which takes care of calling the right callback manager methods before and after calling your _run method. When an error occurs, the _run method should, when possible, return a string describing the error rather than raising an exception. This allows the error to be passed to the LLM, which can then decide how to handle it.
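The advice to return an error string rather than raise can be illustrated without LangChain: the function below is the kind of body you might put inside a custom tool's _run method. The `safe_calculator` function is a hypothetical example that evaluates simple arithmetic with Python's `ast` module and turns every failure into text that the LLM can read and react to.

```python
import ast
import operator

# Binary operators supported by this hypothetical calculator tool.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST) -> float:
    """Recursively evaluate a restricted arithmetic expression tree."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_calculator(text: str) -> str:
    """Tool body: always return a string, reporting errors instead of raising."""
    try:
        tree = ast.parse(text, mode="eval")
        return str(_eval(tree))
    except ZeroDivisionError:
        return "Error: division by zero. Try a different expression."
    except (SyntaxError, ValueError) as e:
        return f"Error: could not evaluate {text!r} ({e}). Use only numbers and + - * /."


print(safe_calculator("2 * (3 + 4)"))
print(safe_calculator("1 / 0"))      # an error message, not an exception
print(safe_calculator("import os"))  # an error message, not an exception
```

Including a hint in the error text (here, which operators are allowed) gives the model something concrete to try on its next step.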

LangChain also provides pre-built tools that provide a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.

In summary, LangChain tools are interfaces that agents can use to interact with the world. They can be generic utilities or other chains or agents. Here is a list of some of the available LangChain agent tools:

  • AWSLambda - A wrapper around the AWS Lambda API, invoked via the Amazon Web Services Node.js SDK. Useful for invoking serverless functions with any behavior that you need to provide to an agent.
  • BingSerpAPI - A wrapper around the Bing Search API.
  • BraveSearch - A wrapper around the Brave Search API.
  • Calculator - Useful for getting the result of a math expression.
  • GoogleCustomSearch - A wrapper around the Google Custom Search API.
  • IFTTTWebHook - A wrapper around the IFTTT Web-hook API.
  • OpenAI - A wrapper around the OpenAI API.
  • OpenWeatherMap - A wrapper around the OpenWeatherMap API.
  • Random - Useful for generating random numbers, strings, and other values.
  • Wikipedia - A wrapper around the Wikipedia API.
  • WolframAlpha - A wrapper around the WolframAlpha API.

Overview of the ReAct Framework for Implementing Reasoning in LLM Applications

Most of the material in this section is based on the paper ReAct: Synergizing Reasoning and Acting in Language Models. The ReAct framework attempts to solve the basic problem of getting LLMs to perform tasks accurately: we want an LLM to understand us and actually do what we want. I take a different but related approach in an example in my book Safe For Humans AI: A “humans-first” approach to designing and building AI systems (link for reading free online), where I use two different LLMs, one to generate answers to questions and another to judge how well the first model did. That example is fairly ad hoc because I was experimenting with an idea. Here we do much the same thing using a pre-built framework.

ReAct is an extension of the idea that LLMs perform better when we ask not only for an answer but also for the reasoning steps to generate an answer. The authors of the ReAct paper refer to these reasoning steps as “reasoning traces.”

Another approach to using LLMs in applications is to ask an LLM which actions to take, perform those actions, and report the results of the actions back to the LLM. This action loop can be repeated.

The ReAct paper combines reasoning traces and action loops. To paraphrase the paper:

Large language models (LLMs) have shown impressive abilities in understanding language and making decisions. However, their capabilities for reasoning and for taking actions have mostly been studied separately. Here we look at using LLMs to generate both reasoning traces and task-specific actions in an interleaved manner. This allows for better synergy between the two: reasoning traces help the model create and update action plans, while actions let it gather more information from external sources. For question answering and fact verification tasks, ReAct reduces errors by interacting with a simple Wikipedia API and generates human-like solutions. On interactive decision-making tasks, ReAct achieves higher success rates than other methods, even when prompted with only one or two examples.

A ReAct prompt consists of example solutions to tasks, including reasoning traces, actions, and observations of the environment. ReAct prompting is easy to design and achieves excellent performance on various tasks, from answering questions to online shopping.
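The reason, act, observe cycle can be sketched in a few lines: the model emits a Thought and an Action, the program runs the action and appends an Observation to the transcript, and the loop repeats until the model emits a final answer. This is a simplified illustration, not LangChain's implementation. The `Action: Tool[input]` and `Finish[answer]` formats are modeled on the style of the ReAct paper's prompts, and `scripted_llm` is a hypothetical stand-in that replays canned responses so the example runs offline.

```python
import re
from typing import Callable, Dict, List

# Matches lines such as "Action: Lookup[Berlin]" or "Action: Finish[Germany]".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Alternate model reasoning/actions with tool observations until Finish."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # the model adds a Thought and an Action
        transcript += step.rstrip() + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: no valid Action found.\n"
            continue
        name, arg = match.group(1), match.group(2)
        if name == "Finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "No answer within the step limit."


def scripted_llm(responses: List[str]) -> Callable[[str], str]:
    """Hypothetical LLM stand-in that returns canned responses in order."""
    it = iter(responses)
    return lambda prompt: next(it)


facts = {"Berlin": "Berlin is the capital and largest city of Germany."}
llm = scripted_llm([
    "Thought: I should look up Berlin.\nAction: Lookup[Berlin]",
    "Thought: The observation says Germany.\nAction: Finish[Germany]",
])
print(react_loop("What country is Berlin in?", llm,
                 {"Lookup": lambda k: facts.get(k, "not found")}))
```

With a real model, the transcript (question, thoughts, actions, and observations so far) is the prompt for each step, which is why the observations must be appended as plain text.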

The ReAct paper serves as the basis for the design and implementation of support for LangChain agent tools. We look at an example application using a custom tool in the next section.

LangChain Agent Tool Example Using DBPedia SPARQL Queries

Before we look at the LangChain agent custom tool code, let’s look at some utility code from my Colab notebook Question Answering Example using DBPedia and SPARQL (link to shared Colab notebook). I extracted just the code we need into the file (edited to fit page width):

 1 # Copyright 2021-2023 Mark Watson
 3 import spacy
 5 nlp_model = spacy.load("en_core_web_sm")
 7 from SPARQLWrapper import SPARQLWrapper, JSON
 9 sparql = SPARQLWrapper("")
11 def query(query):
12     sparql.setQuery(query)
13     sparql.setReturnFormat(JSON)
14     return sparql.query().convert()["results"]["bindings"]
16 def entities_in_text(s):
17     doc = nlp_model(s)
18     ret = {}
19     for [ename, etype] in [[entity.text, entity.label_] for entity in doc.ents]:
20         if etype in ret:
21             ret[etype] = ret[etype] + [ename]
22         else:
23             ret[etype] = [ename]
24     return ret
26 # NOTE: !! note "{{" .. "}}" double curly brackets: this is to escape for Python
27 # string format method:
29 sparql_query_template = """
30 select distinct ?s ?comment where {{
31   ?s
32     <>
33     '{name}'@en .
34   ?s
35     <>
36     ?comment  .
37    FILTER  (lang(?comment) = 'en') .
38    ?s
39      <>
40      {dbpedia_type} .
41 }} limit 15
42 """
44 def dbpedia_get_entities_by_name(name, dbpedia_type):
45     print(f"{name=} {dbpedia_type=}")
46     s_query = \
47       sparql_query_template.format(
48         name=name,
49         dbpedia_type=dbpedia_type
50       )
51     print(s_query)
52     results = query(s_query)
53     return results
55 entity_type_to_type_uri = {
56     "PERSON": "<>",
57     "GPE": "<>",
58     "ORG": "<>",
59 }
61 def get_context_text(query_text):
62     entities = entities_in_text(query_text)
64     def helper(entity_type):
65         ret = ""
66         if entity_type in entities:
67             for hname in entities[entity_type]:
68                 results = dbpedia_get_entities_by_name(
69                     hname,
70                     entity_type_to_type_uri[entity_type]
71                 )
72                 for result in results:
73                     ret += \
74                         result["comment"]["value"] + \
75                         " . "
76         return ret
78     context_text = helper("PERSON") + \
79                    helper("ORG") + \
80                    helper("GPE")
81     #print("\ncontext text:\n", context_text, "\n")
82     return context_text

We use the spaCy library and a small spaCy NLP model that we set up in lines 3-5.

I have written two books dedicated to SPARQL queries, and I provide SPARQL overviews and examples in my Haskell, Common Lisp, and Hy Language books, so I am not going to repeat that discussion here. You can read the semantic web and SPARQL material free online using this link.

Lines 7-14 contain Python code for querying DBPedia.

Lines 16-24 use the spaCy library to identify both the entities in a user’s query and their entity types.
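The grouping logic in entities_in_text does not depend on spaCy itself. Here is a sketch of the same grouping written with `collections.defaultdict`; it takes hard-coded (text, label) pairs in place of spaCy's `doc.ents` so it runs without downloading a model, and the sample pairs are made up for illustration.

```python
from collections import defaultdict


def group_entities(pairs):
    """Group (entity_text, entity_label) pairs into {label: [texts...]}."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# Hypothetical stand-in for [(e.text, e.label_) for e in doc.ents]:
pairs = [("Berlin", "GPE"), ("Angela Merkel", "PERSON"), ("Germany", "GPE")]
print(group_entities(pairs))
# {'GPE': ['Berlin', 'Germany'], 'PERSON': ['Angela Merkel']}
```

The result has the same shape as the dictionary returned by entities_in_text, so either version can feed the per-type lookups that follow.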

We define a SPARQL query template in lines 29-42 that uses the Python str.format placeholders name and dbpedia_type.
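The doubled curly brackets matter because `str.format` treats single braces as placeholders. This sketch uses a simplified, hypothetical template (not the book's DBPedia query) to show the escaping. It also adds a hypothetical `sparql_string_literal` helper, which is not part of the original code, that escapes backslashes and single quotes so a name such as O'Brien cannot break a single-quoted SPARQL literal.

```python
TEMPLATE = """select ?s where {{
  ?s ?p {literal}@en .
}} limit {limit}"""


def sparql_string_literal(value: str) -> str:
    """Quote a Python string as a single-quoted SPARQL literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


query = TEMPLATE.format(literal=sparql_string_literal("O'Brien"), limit=15)
print(query)
```

After formatting, each `{{` and `}}` becomes a single brace in the query text, while `{literal}` and `{limit}` are replaced by values.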

The function dbpedia_get_entities_by_name defined in lines 44-53 substitutes values for the variables in the SPARQL query template and makes a SPARQL query to DBPedia.

The function get_context_text, which is the function in this file that we will call directly later, is defined in lines 61-82. We get entities and entity types in line 62. We define an internal helper function in lines 64-76 that we call once for each of the three entity types used in this example (people, organizations, and geopolitical entities such as countries and cities).
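The way get_context_text combines results can be checked without a network connection. In this sketch, `build_context` is a hypothetical, loop-based equivalent of the helper, and `fake_lookup` is a hypothetical stand-in for dbpedia_get_entities_by_name that returns results shaped like SPARQLWrapper JSON bindings, with a made-up comment string. Each matching comment is appended once, followed by " . ", in the same PERSON, ORG, GPE order.

```python
def build_context(entities, lookup, entity_types=("PERSON", "ORG", "GPE")):
    """Concatenate DBPedia-style comment values for each entity, by type."""
    parts = []
    for etype in entity_types:
        for name in entities.get(etype, []):
            for result in lookup(name, etype):
                parts.append(result["comment"]["value"] + " . ")
    return "".join(parts)


def fake_lookup(name, etype):
    """Hypothetical stand-in returning bindings shaped like SPARQLWrapper JSON."""
    comments = {"Berlin": "Berlin is the capital of Germany."}
    if name in comments:
        return [{"comment": {"value": comments[name]}}]
    return []


print(build_context({"GPE": ["Berlin", "Atlantis"]}, fake_lookup))
```

Collecting the pieces in a list and joining once is a common Python idiom; it also makes it easy to see that each comment appears exactly once in the context text.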

Here we use spaCy, so install the library and a small NLP model (the SPARQLWrapper library is also required):

pip install spacy sparqlwrapper
python -m spacy download en_core_web_sm

The agent custom tool example is short so I list the source file first and then we will dive into the code (edited to fit page width):

 1 from QA import get_context_text
 3 def get_context_data(query_text):
 4     """
 5        Method to get context text for entities from
 6        DBPedia using SPARQL query
 7     """
 9     query_text_data = get_context_text(query_text)
10     return {"context_text": query_text_data}
12 ## Custom function example using DBPedia
14 from typing import Type
15 from pydantic import BaseModel, Field
16 from import BaseTool
18 class GetContextTextFromDbPediaInput(BaseModel):
19     """Inputs for get_context_data"""
21     query_text: str = \
22       Field(
23         description="query_text user supplied query text"
24       )
26 class GetContextTextFromDbPediaTool(BaseTool):
27     name: str = "get_context_data"
28     description: str = \
29       """
30         Useful when you want to make a query and get
31         context text from DBPedia. You should enter
32         any text containing entity names.
33       """
34     args_schema: Type[BaseModel] = \
35           GetContextTextFromDbPediaInput
37     def _run(self, query_text: str):
38         text = get_context_data(query_text)
39         return text
41     def _arun(self, query_text: str):
42         raise NotImplementedError(
43             "get_context_data does not "
44             "support async"
45         )
47 ## Create agent
49 from langchain.agents import AgentType
50 from langchain.chat_models import ChatOpenAI
51 from langchain.agents import initialize_agent
53 llm = ChatOpenAI(model="gpt-3.5-turbo-0613",
54                  temperature=0)
56 tools = [GetContextTextFromDbPediaTool()]
58 agent = initialize_agent(tools, llm,
59                          agent=AgentType.OPENAI_FUNCTIONS,
60                          verbose=True)
62 ## Run agent
65     """
66       What country is Berlin in and what other
67       information about the city do you have?
68     """
69 )

The class GetContextTextFromDbPediaInput defined in lines 18-24 defines a tool input variable with an English language description that the LLM can use. The class GetContextTextFromDbPediaTool defined in lines 26-45 defines the tool name, a description for the LLM to use, and the required method _run. Method _run calls the function get_context_data defined in lines 3-10, which in turn calls the utility function get_context_text imported from the source file listed earlier.

We define a GPT-3.5 model in lines 53-54. Our example uses only one tool (our custom tool). We define the tools list in line 56 and set up the agent in lines 58-60.

The example output is (edited to fit page width):

> Entering new AgentExecutor chain...

Invoking: `get_context_data` with `{'query_text': 'Berlin'}`

name='Berlin'
dbpedia_type='<>'

 select distinct ?s ?comment where {
    ?s
      <> 
      'Berlin'@en .
    ?s
      <>
      ?comment  .
    FILTER  (lang(?comment) = 'en') .
    ?s
      <>
      <> .
 } limit 15

{'context_text': "Berlin (/bɜːrˈlɪn/ bur-LIN, German: [bɛʁˈliːn]) is the capital and largest city of Germany by both area and population. Its 3.6 million inhabitants make it the European Union's most populous city, according to population within city limits. One of Germany's sixteen constituent states, Berlin is surrounded by the State of Brandenburg and contiguous with Potsdam, Brandenburg's capital. Berlin's urban area, which has a population of around 4.5 million, is the second most populous urban area in Germany after the Ruhr. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's third-largest metropolitan region after the Rhine-Ruhr and Rhine-Main regions. . "}

Berlin is the capital and largest city of Germany. It is located in the northeastern part of the country. Berlin has a population of approximately 3.6 million people, making it the most populous city in the European Union. It is surrounded by the State of Brandenburg and is contiguous with Potsdam, the capital of Brandenburg. The urban area of Berlin has a population of around 4.5 million, making it the second most populous urban area in Germany after the Ruhr. The Berlin-Brandenburg capital region has a population of approximately 6.2 million, making it Germany's third-largest metropolitan region after the Rhine-Ruhr and Rhine-Main regions.

> Finished chain.

LangChain Agent Tools Wrap Up

Writing custom agent tools is a great way to revisit the implementation of existing applications and improve them with the real-world knowledge and reasoning abilities of LLMs. Both for practice in effectively using LLMs in your applications and to extend your personal libraries and tools, I suggest that you look over your existing projects with an eye toward either improving them using LLMs or refactoring them into reusable agent tools for your future projects.

Multi-prompt Search using LLMs, the Duckduckgo Search API, and Local Ollama Models

The short example we develop here is inspired by commercial LLM apps like Perplexity. I subscribe to the Perplexity Pro plan and find it useful, so I wanted to implement my own simple, minimalist Python library that makes the same kind of multiple LLM passes over a query and its search results, finally producing a relevant summary.

We will start by looking at example uses of this library and then, dear reader, you can decide if you want to hack on this example code and make it your own.

The example code uses three simple prompt templates: one to filter out search results that are not useful, one to summarize the text fetched from each search result's web link, and one to write a final summary:

prompt1 = ("return concisely either 'Y' or 'N' if this query | %s | "
           "is matched well by the following text: %s")
prompt2 = ("Using the query | %s | summarize the following text including "
           "only material relevant to the query:\n%s")
prompt3 = ("Using the query | %s | summarize in multiple paragraphs the "
           "following text including only material relevant to the query:\n%s")
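These templates use Python's `%` string formatting, and the filtering step relies on the model answering with a leading Y or N. Local models do not always answer cleanly (for example, `'Y'` in quotes or a lowercase `yes`), so here is a sketch of a slightly more tolerant check. The `is_yes` helper is a hypothetical addition, not part of the example code, which simply tests whether the stripped response starts with `Y`.

```python
prompt1 = ("return concisely either 'Y' or 'N' if this query | %s | "
           "is matched well by the following text: %s")


def is_yes(response: str) -> bool:
    """Treat responses like 'Y', "'Y'", 'yes.' or ' Yes, it matches' as a match."""
    cleaned = response.strip().strip("'\"`*").lower()
    return cleaned.startswith("y")


p = prompt1 % ("Common Lisp consultant", "Mark Watson writes Common Lisp books.")
print(p)
for r in ["Y", " 'Y'", "yes.", "N", "No, the text is unrelated."]:
    print(repr(r), is_yes(r))
```

Being a little lenient here trades a small risk of false positives for fewer useful results being discarded because of formatting noise.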

Example 1: “Write a business plan for a new startup using LLMs and expertise in medical billing.”

The example code prints a lot of debug output, so here we only look at the final summary:

Title: Business Plan for Mercury Concierge Medical Practice Startup

The business plan for the new startup, Mercury Concierge Medical Practice, is designed to transition an existing medical practice to a concierge model, leveraging the expertise of the Mercury Advisory Group. The primary objective is to provide personalized and comprehensive healthcare services while ensuring financial sustainability.

To achieve this, the startup will develop a mission statement, explore strategic options, and create a budget. Key areas of focus include market entry strategy, operations plan, HIM technology requirements, revenue management strategy, organizational design, goal setting, and business unit strategies.

Mercury Concierge Medical Practice offers an initial flat-fee consultation service priced at $3500. This service includes a full day of private coaching, Q&A sessions, travel costs, a workbook, and a signed copy of the Handbook of Concierge Medical Practice Design.

Beyond the initial consultation, Mercury Advisory Group provides additional services such as conducting SWOT (Strengths, Weaknesses, Opportunities, Threats), PEST (Political, Economic, Social, Technological) analyses, and Porter's Five Forces assessments to develop a business sustainability strategy. They also assist with the development of a lean/continuous improvement strategy, staffing plan, promotional strategy, and operations plan.

The startup caters to various situations, including physicians who wish to jettison non-converting patients, those starting a new practice from scratch, or those transitioning while bound by contracts with health plans. The service is currently available in the continental USA, with additional travel surcharges for Alaska, Guam, Puerto Rico, Hawaii, and USVI.

In summary, Mercury Concierge Medical Practice aims to provide high-quality, personalized healthcare services through a concierge model. By leveraging the expertise of the Mercury Advisory Group, the startup will create a sustainable business strategy, develop an effective operations plan, and offer valuable consulting services to medical professionals.

Example 2: “Common Lisp and Deep Learning consultant”

Here we only look at the final output summary:

Mark Watson is a professional deep learning and artificial intelligence consultant who specializes in various programming languages such as Common Lisp, Clojure, Python, Java, Haskell, and Ruby. He has authored over 20 books on AI, deep learning, and other related topics, with his clients including notable companies like Google, Capital One, Disney, and Olive AI.

For those aiming to start with AI, Mark recommends several free courses: Generative AI by Google, AI for Beginners by Microsoft, and Artificial Intelligence (6.034) by MIT. These courses cover topics such as generative AI, neural networks, deep learning, computer vision, natural language processing, and more. Each course is text-based and offers exercises to help solidify understanding.

Example Code for Multi-prompt Search using LLMs, the Duckduckgo Search API, and Local Ollama Models

The example script uses LangChain’s Ollama interface, the Duckduckgo library for accessing DuckDuckGo’s internal quick results data (for low-bandwidth, non-commercial use), and the Trafilatura library for fetching plain text from a web URI:

from ddg import Duckduckgo
from langchain_community.llms.ollama import Ollama

# pip install llama-index html2text trafilatura
import trafilatura
from pprint import pprint

ddg_api = Duckduckgo()

llm = Ollama(
    model="mistral:v0.3",
    verbose=False,
)

prompt1 = ("return concisely either 'Y' or 'N' if this query | %s | "
           "is matched well by the following text: %s")
prompt2 = ("Using the query | %s | summarize the following text including "
           "only material relevant to the query:\n%s")
prompt3 = ("Using the query | %s | summarize in multiple paragraphs the "
           "following text including only material relevant to the query:\n%s")

def llm_search(query):
    results =
    data = results['data']
    good_results = []
    good_summaries = []
    for d in data:
        description = d['description']
        p = prompt1 % (query, description)
        s = llm.invoke(p)
        print(f"Prompt: {p}\nResponse: {s}\n\n")
        if s.strip()[0:1] == 'Y':
            good_results.append(d)
            uri = d['url']
            downloaded = trafilatura.fetch_url(uri)
            text = trafilatura.extract(downloaded)
            p2 = prompt2 % (query, text)
            s2 = llm.invoke(p2)
            good_summaries.append(s2)
    p3 = prompt3 % (query, "\n\n".join(good_summaries))
    final_summary = llm.invoke(p3)

    return (good_results, good_summaries, final_summary)
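Because llm_search mixes network calls with the filtering and summarizing logic, it is hard to test. One option, sketched here, is to pass the search, fetch, and LLM functions in as parameters so the same three-pass flow can run against stubs. The function `llm_search_with` and the `fake_*` stubs are hypothetical; in real use you would pass small wrappers around the Duckduckgo search call used in llm_search, `trafilatura.fetch_url` plus `trafilatura.extract`, and `llm.invoke`.

```python
from typing import Callable, List, Optional

PROMPT1 = ("return concisely either 'Y' or 'N' if this query | %s | "
           "is matched well by the following text: %s")
PROMPT2 = ("Using the query | %s | summarize the following text including "
           "only material relevant to the query:\n%s")
PROMPT3 = ("Using the query | %s | summarize in multiple paragraphs the "
           "following text including only material relevant to the query:\n%s")


def llm_search_with(query: str,
                    search: Callable[[str], List[dict]],
                    fetch_text: Callable[[str], Optional[str]],
                    llm: Callable[[str], str]):
    """Three passes: filter hits, summarize kept pages, summarize the summaries."""
    good_results, good_summaries = [], []
    for hit in search(query):
        # Pass 1: ask the LLM whether the short description matches the query.
        verdict = llm(PROMPT1 % (query, hit["description"]))
        if verdict.strip()[0:1] != "Y":
            continue
        good_results.append(hit)
        # Pass 2: fetch the page text and summarize it for this query.
        text = fetch_text(hit["url"])
        if not text:  # skip pages for which no text came back
            continue
        good_summaries.append(llm(PROMPT2 % (query, text)))
    # Pass 3: combine the per-page summaries into one final summary.
    final_summary = llm(PROMPT3 % (query, "\n\n".join(good_summaries)))
    return good_results, good_summaries, final_summary


# Hypothetical stubs so the flow can run offline.
def fake_search(q):
    return [{"url": "https://example.com/a", "description": "Common Lisp consulting"},
            {"url": "https://example.com/b", "description": "Gardening tips"}]


def fake_fetch(url):
    return "Page text about Common Lisp consulting." if url.endswith("/a") else None


def fake_llm(prompt):
    if prompt.startswith("return concisely"):
        return "Y" if "Lisp consulting" in prompt else "N"
    if "multiple paragraphs" in prompt:
        return "FINAL SUMMARY"
    return "page summary"


results, summaries, final = llm_search_with(
    "Common Lisp consultant", fake_search, fake_fetch, fake_llm)
print(results, summaries, final, sep="\n")
```

Apart from the injected dependencies, one small behavioral difference from llm_search is deliberate: this sketch skips any page whose fetched text is empty or None instead of sending it to the summarization prompt.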

Here is an example use of this example code:

(results, summaries, final_summary) = llm_search("Write a business plan for a new startup using LLMs and expertise in medical billing.")

print(f"\n\n****** Good Results ******\n\n")
print(results)

print(f"\n\n****** Good Summaries ******\n\n")
print(summaries)

print(f"\n\n****** Final Summary ******\n\n")
print(final_summary)

More Useful Libraries for Working with Unstructured Text Data

Here we look at examples using two libraries that I find useful for my work: EmbedChain and Kor.

EmbedChain Wrapper for LangChain Simplifies Application Development

Taranjeet Singh developed a very nice wrapper library EmbedChain that simplifies writing “query your own data” applications by choosing good defaults for the LangChain library.

I will show one simple example that I run on my laptop to search the contents of all of the books I have written as well as a large number of research papers. You can find my example in the GitHub repository for this book in the directory langchain-book-examples/embedchain_test. As usual, you will need an OpenAI API account and set the environment variable OPENAI_API_KEY to the value of your key.

I have copied PDF files for all of this content to the directory ~/data on my laptop. It takes a short while to build a local vector embedding data store, so I use two Python scripts. The first script, shown here, builds the data store:

# reference:

from embedchain import App
import os

test_chat = App()

my_books_dir = "/Users/mark/data/"

for filename in os.listdir(my_books_dir):
    if filename.endswith('.pdf'):
        print("processing filename:", filename)
        test_chat.add("pdf_file",
                      os.path.join(my_books_dir, filename))

Here is a demo Python script that makes three queries:

from embedchain import App

test_chat = App()

def test(q):
    print(q)
    print(test_chat.query(q), "\n")

test("How can I iterate over a list in Haskell?")
test("How can I edit my Common Lisp files?")
test("How can I scrape a website using Common Lisp?")

The output looks like:

$ python
How can I iterate over a list in Haskell?
To iterate over a list in Haskell, you can use recursion or higher-order functions like `map` or `foldl`.

How can I edit my Common Lisp files?
To edit Common Lisp files, you can use Emacs with the Lisp editing mode. By setting the default auto-mode-alist in Emacs, whenever you open a file with the extensions ".lisp", ".lsp", or ".cl", Emacs will automatically use the Lisp editing mode. You can search for an "Emacs tutorial" online to learn how to use the basic Emacs editing commands.

How can I scrape a website using Common Lisp?
One way to scrape a website using Common Lisp is to use the Drakma library. Paul Nathan has written a library using Drakma called web-trotter.lisp, which is available under the AGPL license at This library can be a good starting point for your scraping project. Additionally, you can use the wget utility to make local copies of a website. The command "wget -m -w 2 http:/knowle" can be used to mirror a site with a two-second delay between HTTP requests for resources. The option "-m" indicates to recursively follow all links on the website, and the option "-w 2" adds a two-second delay between requests. Another option, "wget -mk -w 2 http:/", converts URI references to local file references on your local mirror. Concatenating all web pages into one file can also be a useful trick.

Kor Library

The Kor library was written by Eugene Yurtsev. Kor is useful for using LLMs to extract structured data from unstructured text. Kor works by generating appropriate prompt text to explain to GPT-3.5 what information to extract and adding in the text to be processed.

The GitHub repository for Kor is under active development so please check the project for updates. Here is the documentation.

For the following example, I modified an example in the Kor documentation for extracting dates in text.

# From documentation:

from kor.extraction import create_extraction_chain
from kor.nodes import Object, Text, Number
from langchain.chat_models import ChatOpenAI
from pprint import pprint
import warnings ; warnings.filterwarnings('ignore')

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
    max_tokens=2000,
    frequency_penalty=0,
    presence_penalty=0,
    top_p=1.0,
)

schema = Object(
    id="date",
    description=(
        "Any dates found in the text. Should be output in the format:"
        " January 12, 2023"
    ),
    attributes = [
        # Each example is (input text, expected value for this attribute)
        Text(id = "month",
             description = "The month of the date",
             examples=[("Someone met me on December 21, 1995", "December"),
                       ("Let's meet up on January 12, 2023 and discuss our yearly budget",
                        "January")])
    ],
)

chain = create_extraction_chain(llm, schema, encoder_or_encoder_class='json')

pred = chain.predict_and_parse(text="I will go to California May 1, 2024")['data']
print("* month mentioned in text=", pred)

Sample output:

$ python
* month mentioned in text= {'date': {'month': 'May'}}

Kor is a library focused on extracting data from text. You can get the same effects by writing your own prompts manually for GPT-style LLMs, but using Kor can save development time.
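For comparison, here is a rough sketch of the hand-written approach: build a prompt that describes the fields to extract, ask for a JSON reply, and parse it defensively. The prompt wording, the `build_extraction_prompt` and `parse_extraction` helpers, and the canned reply are hypothetical; Kor generates its own, more elaborate prompts (including your examples) and handles many details this sketch ignores.

```python
import json


def build_extraction_prompt(fields: dict, text: str) -> str:
    """Describe the fields to extract and ask for a JSON object in reply."""
    field_lines = "\n".join(f'- "{name}": {desc}' for name, desc in fields.items())
    return (
        "Extract the following fields from the text and reply with only a JSON "
        "object using these keys (use null when a field is missing):\n"
        f"{field_lines}\n\nText: {text}\n"
    )


def parse_extraction(reply: str) -> dict:
    """Parse the first {...} span in the reply; return {} if it is not valid JSON."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        return json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {}


prompt = build_extraction_prompt({"month": "The month of any date in the text"},
                                 "I will go to California May 1, 2024")
print(prompt)
# A canned reply standing in for an LLM response:
print(parse_extraction('Sure! {"month": "May"}'))
```

Even this small version shows where the development time goes: prompt wording, examples, and tolerant parsing of replies that are not pure JSON.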

Book Wrap Up

This book has been fun to write, but it has also been somewhat frustrating.

It was fun because I have never been as excited by new technology as I have by LLMs and utility software like LangChain and LlamaIndex for building personalized applications.

This book was frustrating in the sense that it is now so very easy to build applications that just a few years ago would have been impossible to write. When I write books I usually have two criteria: I only write about things that I am personally interested in and use, and I also hope to figure out non-obvious edge cases and make it easier for my readers to use new tech. Here my frustration is that I am writing about something that is increasingly simple to do, so I feel like my value is diminished.

All that said I hope, dear reader, that you found this book to be worth your time reading.

What am I going to do next? Although I am not fond of programming in JavaScript (I find TypeScript to be somewhat better), I want to explore the possibilities of writing an open source Persistent Storage Web App Personal Knowledge Base Management System. I might get pushback on this, but I would probably make it Apple Safari specific so I can use Apple’s CloudKit JS to make its use seamless across macOS, iPadOS, and iOS. If I get the right kind of feedback on social media I might write a book around this project.

Thank you for reading my book!

Best regards, Mark Watson