Table of Contents
- Preface
- Newer Commercial LangChain Integrations That Are Not Covered In This Book
- Deprecated Book Examples
- LLM Hallucinations and RAG, Summarization, Structured Data Conversion, and Fact/Relationship Extraction LLM Applications
- Comparing LangChain and LlamaIndex
- About the Author
- Book Cover
- Acknowledgements
- Requirements for Running and Modifying Book Examples
- Issues and Workarounds for Using the Material in this Book
- Large Language Model Overview
- Getting Started With LangChain
- Installing Necessary Packages
- Creating a New LangChain Project
- Basic Usage and Examples
- Creating Embeddings
- Using LangChain Vector Stores to Query Documents: a Simple RAG Application
- Example Using LangChain Integrations: Using Serper APIs for Google Search
- OpenAI Model GPT-4o Example
- LangChain Overview Wrap Up
- Overview of LlamaIndex
- Extraction of Facts and Relationships from Text Data
- Using LLMs to Summarize Text
- LLM Techniques for Structured Data Conversion
- Retrieval Augmented Generation (RAG) Applications
- Using Google’s Knowledge Graph APIs With LangChain
- Using DBPedia and WikiData as Knowledge Sources
- Using LLMs To Organize Information in Our Google Drives
- Natural Language SQLite Database Queries With LangChain
- Examples Using Hugging Face Open Source Models
- Running Local LLMs Using Llama.cpp and LangChain
- Running Local LLMs Using Ollama
- Using Large Language Models to Write Recipes
- LangChain Agents
- Multi-prompt Search using LLMs, the Duckduckgo Search API, and Local Ollama Models
- More Useful Libraries for Working with Unstructured Text Data
- Book Wrap Up
Preface
I have been working in the field of artificial intelligence since 1982 and without a doubt Large Language Models (LLMs) like GPT-4 represent the greatest advance in practical AI technology that I have experienced. Infrastructure projects like LangChain and LlamaIndex make it simpler to use LLMs and provide some level of abstraction to facilitate switching between LLMs. This book will use the LangChain and LlamaIndex projects, the OpenAI GPT-4o APIs, and local models run on your computer using Ollama to solve a series of interesting problems.
If you read my eBooks free online then please consider tipping me https://markwatson.com/#tip.
As I write this new edition in October 2024, I mostly run local LLMs but I still use OpenAI APIs, as well as APIs from Anthropic for the Claude 3.5 model and APIs from the French company Mistral. Most of the newest examples are in the chapter on RAG.
Harrison Chase started the LangChain project in October 2022. When I wrote the first edition of this book in February 2023 the GitHub repository for LangChain had 171 contributors, and as I write this new edition LangChain has 2100+ contributors on GitHub. Jerry Liu started the GPT Index project (recently renamed to LlamaIndex) at the end of 2022 and the GitHub repository for LlamaIndex https://github.com/jerryjliu/gpt_index currently has 54 contributors.
The GitHub repository for examples in this book is https://github.com/mark-watson/langchain-book-examples.git. Please note that I usually update the code in the examples repository fairly frequently for library version updates, etc.
While the documentation and examples online for LangChain and LlamaIndex are excellent, I am still motivated to write this book to solve interesting problems that I like to work on involving information retrieval, natural language processing (NLP), dialog agents, and the semantic web/linked data fields. I hope that you, dear reader, will be delighted with these examples and that at least some of them will inspire your future projects.
Newer Commercial LangChain Integrations That Are Not Covered In This Book
LangChain has increasingly integrated proprietary services and developed features dependent on commercial APIs. Some of these services include:
- LangSmith: A commercial platform tied to LangChain that focuses on debugging, logging, and testing. It helps manage and improve prompt artifacts, and integrates with the LangChain Hub.
- LangChain Hub: This hub is designed as a repository where developers can share prompts, chains, agents, and more. It allows collaboration on LLM artifacts, making it easier for teams to build and improve applications. It supports private, organization-specific repositories alongside the public ones. Users can interact with prompts in a playground using their own API keys (for services like OpenAI or Anthropic) and can download or upload new artifacts using an SDK. LangChain is also extending this hub to include support for chains and agents.
I limit the examples in this book to the core LangChain library. Most of the examples have been updated to LangChain version 0.3.2.
Deprecated Book Examples
As I add new examples to this book I sometimes remove older examples that no longer seem relevant. The book text can be found in the GitHub repository for the code examples. Check out https://github.com/mark-watson/langchain-book-examples/tree/main/CHAPTERS_and_CODE_no_longer_in-book.
LLM Hallucinations and RAG, Summarization, Structured Data Conversion, and Fact/Relationship Extraction LLM Applications
When we chat with LLMs and rely only on the innate information they contain, the results often include so-called “hallucinations” that occur when a model does not know an answer and so generates plausible sounding text. Many of the examples in this book are Retrieval Augmented Generation (RAG) style applications. RAG applications provide context text with chat prompts or queries so that LLMs prioritize the provided information when answering questions. The latest Google Gemini model has an input context size of one million tokens, and even smaller models you can run on your own computer using tools like Ollama often support a 128K context size.
Beyond RAG, other LLM applications effective at minimizing hallucinations include text summarization, structured data conversion and extraction, and fact/relationship extraction. In text summarization, LLMs condense lengthy documents into concise summaries, focusing on core information. Structured data conversion and extraction involve transforming unstructured text into organized formats or pinpointing specific information. Fact/relationship extraction allows LLMs to identify and understand key connections within text, making them less prone to misinterpretation and hallucinations.
Comparing LangChain and LlamaIndex
LangChain and LlamaIndex are two distinct frameworks designed to use the capabilities of large language models (LLMs) by integrating them into various applications. LangChain is a more general-purpose framework that provides extensive flexibility and control, allowing developers to build a wide range of applications. It is particularly noted for its ability to handle complex tasks by offering granular control over components and the ability to optimize performance across different use cases. LangChain’s architecture is designed to be modular, enabling the chaining of different components to address specific requirements, which makes it suitable for creating sophisticated, context-aware query engines and semantic search applications.
On the other hand, LlamaIndex is specifically tailored for indexing and retrieval tasks focusing on enhancing the performance of LLMs in these areas. It provides a streamlined interface for connecting custom data sources to LLMs, making it ideal for developers looking to build powerful search and retrieval systems. LlamaIndex optimizes the indexing and retrieval process, which results in increased speed and accuracy in data search and summarization tasks. This framework is particularly effective in environments where quick and accurate extraction of information from large datasets is crucial.
Both frameworks have their unique strengths and cater to different aspects of LLM application development. LangChain’s versatility makes it suitable for a broader range of applications offering developers the ability to customize and extend their LLM capabilities extensively. In contrast LlamaIndex is the go-to choice for specific use cases centered around efficient data retrieval making it highly effective for applications that require robust search functionalities. Each framework thus serves distinct needs in the ecosystem of LLM-powered applications, with LangChain providing a comprehensive toolkit for diverse applications and LlamaIndex offering specialized tools for optimized search and retrieval.
About the Author
I have written over 20 books, I have over 50 US patents, and I have worked at interesting companies like Google, Capital One, SAIC, Mind AI, and others. You can find links for reading most of my recent books free on my web site https://markwatson.com. If I had to summarize my career the short take would be that I have had a lot of fun and enjoyed my work. I hope that what you learn here will be both enjoyable and help you in your work.
If you would like to support my work please consider purchasing my books on Leanpub and star my git repositories that you find useful on GitHub. You can also interact with me on social media on Mastodon and Twitter. I am also available as a consultant: https://markwatson.com.
Book Cover
I live in Sedona, Arizona. I took the book cover photo in January 2023 from the street that I live on.
Acknowledgements
This picture shows me and my wife Carol who helps me with book production and editing.
I would also like to thank the following readers who reported errors or typos in this book: Armando Flores, Peter Solimine, and David Rupp.
Requirements for Running and Modifying Book Examples
I show full source code and a fair amount of example output for each book example so if you don’t want to get access to some of the following APIs then you can still read along in the book.
To use OpenAI’s GPT-3 and ChatGPT models you will need to sign up for an API key (free tier is OK) at https://openai.com/api/ and set the environment variable OPENAI_API_KEY to your key value.
You will need to get an API key for examples using Google’s Knowledge Graph APIs.
Reference: Google Knowledge Graph APIs.
The example programs using Google’s Knowledge Graph APIs assume that you have the file ~/.google_api_key in your home directory that contains your key from https://console.cloud.google.com/apis.
You will need to install SerpApi for examples integrating web search. Please reference: PyPi project page.
You can sign up for a free non-commercial 100 searches/month account with an email address and phone number at https://serpapi.com/users/welcome.
You will also need a Zapier account for the GMail and Google Calendar examples.
After reading through this book, you can review the website LangChainHub which contains prompts, chains and agents that are useful for building LLM applications.
Issues and Workarounds for Using the Material in this Book
The libraries that I use in this book are frequently updated and sometimes the documentation or code links change, invalidating links in this book. I will try to keep everything up to date. Please report broken links to me.
In some cases you will need to use specific versions of libraries for some of the code examples.
Because the Python code listings use colorized text you may find that copying code from this eBook may drop space characters. All of the code listings are in the GitHub repository for this book so you should clone the repository to experiment with the example code.
Large Language Model Overview
Large language models are a subset of artificial intelligence that use deep learning and neural networks to process natural language. Transformers are a type of neural network architecture that can learn context in sequential data using self-attention mechanisms. They were introduced in 2017 by a team at Google Brain and have become popular for LLM research. Some older examples of transformer-based LLMs are BERT, GPT-3, T5 and Megatron-LM.
The main points we will discuss in this book are:
- LLMs are deep learning algorithms that can understand and generate natural language based on massive datasets.
- LLMs use techniques such as self-attention, masking, and fine-tuning to learn complex patterns and relationships in language. LLMs can understand and generate natural language because they use transformer models, which are a type of neural network that can process sequential data such as text using attention mechanisms. Attention mechanisms allow the model to focus on relevant parts of the input and output sequences while ignoring irrelevant ones.
- LLMs can perform various natural language processing (NLP) and natural language generation (NLG) tasks, such as summarization, translation, prediction, classification, and question answering.
- Even though LLMs were initially developed for NLP applications, LLMs have also shown potential in other domains such as computer vision and computational biology by leveraging their generalizable knowledge and transfer learning abilities.
BERT models are one of the first types of transformer models that were widely used. BERT was developed by Google AI Language in 2018. BERT models are a family of masked language models that use the transformer architecture to learn bidirectional representations of natural language. BERT models can understand the meaning of ambiguous words by using the surrounding text as context. The “magic trick” here is that training data comes almost free: in masked language modeling, you programmatically choose random words, replace them with a mask token, and the model is trained to predict the missing words. This process is repeated with massive amounts of training data from the web, books, etc.
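As a concrete illustration of masked-token prediction (this snippet is not from the book's example repository; the model name and the example sentence are just for demonstration), the Hugging Face transformers pipeline makes it easy to watch BERT fill in a masked word:

```python
from transformers import pipeline

# BERT was pre-trained to predict masked-out tokens from their surrounding context
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The capital of France is [MASK]."):
    print(f'{prediction["token_str"]:>10}  score={prediction["score"]:.3f}')
```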
Here are some “papers with code” links for BERT (links are for code, paper links in the code repositories):
Technological Change is Increasing at an Exponential Rate
When I wrote the first edition of this book it was difficult to run LLMs locally on my own computers. Now in May 2024, I can use Ollama to run very useful models on the old M1 8G MacBook I am writing this on:
The llama3 model released recently by Meta is arguably more powerful than any model I could run on my M2 32G system in late February. The good news is that the techniques you learn now for incorporating LLMs into your own applications, along with your increased knowledge of LLMs and skill at writing effective prompts, will remain useful even as models become more powerful.
What LLMs Are and What They Are Not
Large Language Models are text predictors. Given a prompt, or context text and a prompt or question, an LLM predicts a highly likely text completion. As human beings we have a tendency to ascribe deep intelligence and world knowledge to LLMs. I try to avoid this misconception. A year ago I asked ChatGPT to write a poem about my pet parrot escaping out the window in the style of poet Elizabeth Bishop. When a friend asked ChatGPT to rewrite the poem in the style of the more modern poet Billy Collins, we were both surprised at how closely it mimicked the styles of both poets. Surely this must be some deep form of intelligence, right? No, this phenomenon is text prediction by a model trained on most books and most web content.
LLMs compress knowledge of language and some knowledge of the world into a compact representation. Clever software developers can certainly build useful and interesting systems using LLMs and this is the main topic of this book. My hope is that by experimenting with writing prompts, learning the differences between available models, and practicing applying LLMs to transform textual data that you will develop your own good ideas and build your own applications that you and other people find useful.
Big Tech Businesses vs. Small Startups Using Large Language Models
Both Microsoft and Google play both sides of this business game: they want to sell cloud LLM services to developers and small startup companies and they would also like to achieve lock-in for their consumer services like Office 365, Google Docs and Sheets, etc.
Microsoft has been integrating AI technology into workplace emails, slideshows, and spreadsheets as part of its ongoing partnership with OpenAI, the company behind ChatGPT. Microsoft’s Azure OpenAI service offers a powerful tool to enable these outcomes when leveraged with their data lake of more than two billion metadata and transactional elements.
Google has opened access to their Gemini Model based AI/chat search service. I have used various Google APIs for years in code I write. I have no favorites in the battle between tech giants, rather I am mostly interested in what they build that I can use in my own projects.
Hugging Face, which creates LLMs and also hosts models developed by other companies, is working on open-source rivals to ChatGPT and will use AWS for that as well. Cohere AI, Anthropic, Hugging Face, and Stability AI are some of the startups that are competing with OpenAI. Hugging Face is a great source of specialized models, that is, standard models that have been fine tuned for specific applications. I love that Hugging Face models can be run via their APIs and also self-hosted on our own servers and sometimes even on our laptops. Hugging Face is a fantastic resource and even though I use their models much less frequently in this book than OpenAI APIs, you should embrace the hosting and open source flexibility of Hugging Face. Starting in late 2023 I also started heavily using the Ollama platform for downloading and running models on my laptop. There is a chapter in this book on using Ollama. In this book I most frequently use OpenAI APIs because they are so widely used.
Dear reader, I didn’t write this book for developers working at established AI companies (although I hope such people find the material here useful). I wrote this book for small developers who want to scratch their own itch by writing tools that save them time. I also wrote this book hoping that it would help developers build capabilities into the programs they design and write that rival what the big tech companies are doing.
Getting Started With LangChain
LangChain is a framework for building applications with large language models (LLMs) by chaining different components together. Some of the applications of LangChain are chatbots, generative question-answering, summarization, data-augmented generation and more. LangChain can save time in building chatbots and other systems by providing a standard interface for chains, agents and memory, as well as integrations with other tools and end-to-end examples. We refer to “chains” as sequences of calls (to an LLM, other program utilities, cloud services, etc.) that go beyond just one LLM API call. LangChain provides a standard interface for chains, many integrations with other tools, and end-to-end chains for common applications. Often you will find existing chains already written that meet the requirements for your applications.
For example, one can create a chain that takes user input, formats it using a PromptTemplate, and then passes the formatted response to a Large Language Model (LLM) for processing.
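A minimal sketch of such a chain, using LangChain's current (0.3.x) composition syntax (the prompt wording and model name here are illustrative, not taken from the book's repository):

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

prompt = PromptTemplate.from_template(
    "Write a one sentence product description for {product}.")
chain = prompt | ChatOpenAI(model="gpt-4o")   # format the user input, then call the model
print(chain.invoke({"product": "solar powered garden lights"}).content)
```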
LLMs are very general in nature: while they can perform many tasks effectively, they often cannot directly provide specific answers to questions or tasks that require deep domain knowledge or expertise. LangChain provides a standard interface for agents, a library of agents to choose from, and examples of end-to-end agents.
LangChain Memory is the concept of persisting state between calls of a chain or agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory. LangChain also provides a large collection of common utilities to use in your applications.
LangChain can be integrated with one or more model providers, data stores, APIs, etc. LangChain can be used for in-depth question-and-answer chat sessions, API interaction, or action-taking. LangChain can be integrated with Zapier’s platform through a natural language API interface (we have an entire chapter dedicated to Zapier integrations).
Installing Necessary Packages
For the purposes of examples in this book, you might want to create a new Anaconda or other Python environment and install:
For the rest of this chapter we will use the subdirectory langchain_getting_started and in the next chapter use llama-index_case_study in the GitHub repository for this book.
Creating a New LangChain Project
Simple LangChain projects are often just a very short Python script file. As you read this book, when any example looks interesting or useful, I suggest copying the requirements.txt and Python source files to a new directory and making your own GitHub private repository to work in. Please make the examples in this book “your code,” that is, freely reuse any code or ideas you find here.
Basic Usage and Examples
While I try to make the material in this book independent, something you can enjoy with no external references, you should also take advantage of the high quality Langchain Quickstart Guide documentation and the individual detailed guides for prompts, chat, document loading, indexes, etc.
As we work through some examples please keep in mind what it is like to use the ChatGPT web application: you enter text and get responses. The way you prompt ChatGPT is obviously important if you want to get useful responses. In code examples we automate and formalize this manual process.
You need to choose an LLM to use. We will usually choose the GPT-4 API from OpenAI because it is general purpose, but it is more expensive than the older GPT-3.5 APIs. You will need to sign up for an API key and set it as an environment variable:
Both the libraries openai and langchain will look for this environment variable and use it. We will look at a few simple examples in a Python REPL. We will start by just using OpenAI’s text prediction API:
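The original listing is not reproduced here; a minimal sketch of the idea (the prompt text is illustrative) looks something like this:

```python
from langchain_openai import OpenAI

llm = OpenAI(temperature=0.9)  # a higher temperature gives more varied completions
prompt = "Suggest a good name for a company that makes colorful socks."
print(llm.invoke(prompt))
print(llm.invoke(prompt))  # running the same prompt twice gives different results
```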
Notice that when we run the same input text prompt twice we see different results. Setting the temperature to a higher value increases the randomness.
Our next example is in the source file directions_template.py in the directory langchain_getting_started and uses the PromptTemplate class. A prompt template is a reproducible way to generate a prompt. It contains a text string (“the template”), that can take in a set of parameters from the end user and generate a prompt. The prompt template may contain language model instructions, few-shot examples to improve the model’s response, or specific questions for the model to answer.
You could just write Python string manipulation code to create a prompt but using the utility class PromptTemplate is more legible and works with any number of prompt input variables.
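The original directions_template.py is not shown here; a sketch of the same idea, assuming two input variables for the start and destination locations, might look like this:

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

template = PromptTemplate(
    input_variables=["start", "destination"],
    template="Give me driving directions from {start} to {destination}.",
)
llm = OpenAI(temperature=0.0)
print(llm.invoke(template.format(start="Sedona", destination="Flagstaff")))
```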
The output is:
The next example in the file country_information.py is derived from an example in the LangChain documentation. In this example we use PromptTemplate that contains the pattern we would like the LLM to use when returning a response.
You can use the ChatGPT web interface to experiment with prompts and when you find a pattern that works well then write a Python script like the last example, but changing the data you supply in the PromptTemplate instance.
The output of the last example is:
Creating Embeddings
We will reference the LangChain embeddings documentation. We can use a Python REPL to see what text to vector space embeddings might look like:
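A short REPL-style sketch (the sample sentences are illustrative) using the langchain-openai embeddings wrapper:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
doc_embeddings = embeddings.embed_documents(
    ["The cat sat on the mat.", "Economics is the study of scarcity."])
query_embedding = embeddings.embed_query("What do economists study?")
print(len(doc_embeddings), len(doc_embeddings[0]))  # 2 documents, each a long vector of floats
print(query_embedding[:5])                          # first few floats of the query embedding
```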
Notice that the doc_embeddings is a list where each list element is the embeddings for one input text document. The query_embedding is a single embedding. Please read the above linked embedding documentation.
We will use vector stores to store calculated embeddings for future use. In the next chapter we will see a document database search example using LangChain and Llama-Index.
Using LangChain Vector Stores to Query Documents: a Simple RAG Application
We will reference the LangChain Vector Stores documentation.
We use a utility in file read_text_files.py to convert text files in a directory to a list of strings:
The example script is doc_search.py:
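The original script is not reproduced here; a minimal sketch of the same RAG pattern, assuming a FAISS vector store (requires the faiss-cpu package) and *.txt files in ../data, could look like this:

```python
from pathlib import Path
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# read the text files into a list of strings (a stand-in for read_text_files.py)
texts = [p.read_text() for p in Path("../data").glob("*.txt")]

db = FAISS.from_texts(texts, OpenAIEmbeddings())
query = "What sports are discussed in these documents?"
context = "\n".join(d.page_content for d in db.similarity_search(query, k=2))

llm = ChatOpenAI(model="gpt-4o")
answer = llm.invoke(
    f"Answer the question using only this context:\n{context}\n\nQuestion: {query}")
print(answer.content)
```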
The output is:
Example Using LangChain Integrations: Using Serper APIs for Google Search
NOTE: This example is deprecated.
The example shown here is in the directory from_langchain_docs in the source file search_simple.py. The relevant LangChain Integrations documentation page is https://python.langchain.com/docs/integrations/tools/google_serper.
You will need a Serper API key from https://serper.dev. Currently you can get a free key for 2500 API calls. After that the paid tier currently starts at $50 for 50K API calls, and these credits must be used within a 6 month period.
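A minimal sketch of this integration, assuming the langchain-community package is installed and a SERPER_API_KEY environment variable is set:

```python
from langchain_community.utilities import GoogleSerperAPIWrapper

search = GoogleSerperAPIWrapper()  # reads SERPER_API_KEY from the environment
print(search.run("What is the elevation of Sedona, Arizona?"))
```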
OpenAI Model GPT-4o Example
Here we use the new OpenAI model GPT-4o released May 13, 2024. The example file from_langchain_docs/gpt_4o_test.py uses the latest (May 2024) LangChain APIs:
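The original test file is not reproduced here; a sketch of calling GPT-4o through the langchain-openai chat wrapper (the prompt is illustrative):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
response = llm.invoke("In one or two sentences, what is the LangChain library?")
print(response.content)
```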
The output is:
This new (as of May 2024) model costs half as much as the previous GPT-4 API and has much lower API call latency.
LangChain Overview Wrap Up
We will continue using LangChain for the rest of this book as well as the LlamaIndex library that we introduce in the next chapter.
I cover just the subset of LangChain that I use in my own projects in this book. I urge you to read the LangChain documentation and to explore public LangChain chains that users have written on Langchain-hub.
Overview of LlamaIndex
The popular LlamaIndex project used to be called GPT-Index but has been generalized to work with many LLM models like GPT-4, Hugging Face, Anthropic, local models run using Ollama, and many other models.
LlamaIndex is a project that provides a central interface to connect your language models with external data. It was created by Jerry Liu and his team in the fall of 2022. It consists of a set of data structures designed to make it easier to use large external knowledge bases with language models. Some of its uses are:
- Querying structured data such as tables or databases using natural language
- Retrieving relevant facts or information from large text corpora
- Enhancing language models with domain-specific knowledge
LlamaIndex supports a variety of document types, including:
- Text documents are the most common type of document. They can be stored in a variety of formats, such as .txt, .doc, and .pdf.
- XML documents are a type of text document that is used to store data in a structured format.
- JSON documents are a type of text document that is used to store data in a lightweight format.
- HTML documents are a type of text document that is used to create web pages.
- PDF documents are a type of text document that is used to store documents in a fixed format.
LlamaIndex can also index data that is stored in a variety of databases, including:
- SQL databases such as MySQL, PostgreSQL, and Oracle.
- NoSQL databases such as MongoDB, Cassandra, and CouchDB.
- Solr is a popular open-source search engine that provides high performance and scalability.
- Elasticsearch is another popular open-source search engine that offers a variety of features, including full-text search, geospatial search, and machine learning.
- Apache Cassandra is a NoSQL database that can be used to store large amounts of data.
- MongoDB is another NoSQL database that is easy to use and scale.
- PostgreSQL is a relational database that is widely used in enterprise applications.
LlamaIndex is a flexible framework that can be used to index a variety of document types and data sources.
Compared to LangChain, LlamaIndex presents a focused advantage in the realm of indexing and retrieval tasks, making it a highly efficient choice for applications that prioritize these functions. Its design is tailored specifically for the efficient ingestion, structuring, and accessing of private or domain-specific data, which is crucial for applications that rely heavily on the quick retrieval of accurate and relevant information from large datasets. The streamlined interface of LlamaIndex simplifies the process of connecting custom data sources to large language models (LLMs), thereby reducing the complexity and development time for search-centric applications. This focus on indexing and retrieval leads to increased speed and accuracy in search and summarization tasks, setting LlamaIndex apart as the go-to framework for developers building intelligent search tools.
Another significant advantage of LlamaIndex is its integration capabilities with a wide array of tools and services, which enhances the functionality and versatility of LLM-powered applications. The framework’s ability to merge with vector stores like Pinecone and Milvus facilitates efficient document search and retrieval. Additionally, its compatibility with tracing tools such as Graphsignal offers insights into LLM-powered application operations, while integration with application frameworks like LangChain and Streamlit enables easier building and deployment. These integrations extend to data loaders, agent tools, and observability tools, thus enhancing the capabilities of data agents and offering various structured output formats to facilitate the consumption of application results. This extensive integration ecosystem empowers developers to create powerful, versatile applications with minimal effort.
Lastly, LlamaIndex’s specialized focus on indexing and retrieval is complemented by its simplicity and ease of use, making it an attractive option for developers seeking to build efficient and straightforward search experiences. The framework’s optimization for these specific tasks, in comparison to more general-purpose frameworks like LangChain, results in a tool that is not only more efficient for search and retrieval applications but also easier to learn and implement. This simplicity is particularly beneficial for projects with tight deadlines or for developers new to working with LLMs, as it allows for the quick deployment of high-performance applications without the need for extensive customization or complex setup processes.
We will look at a short example derived from the LlamaIndex documentation.
Using LlamaIndex for Question Answering from a Web Site
In this example we use the trafilatura and html2text libraries to get text from a web page that we will index and search. The class TrafilaturaWebReader does the work of creating local documents from a list of web page URIs and the index class VectorStoreIndex builds a local index for use with OpenAI API calls to implement search.
The following listing shows the file web_page_QA.py:
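The original listing is not reproduced here; a sketch of the same pattern follows (the URL and question are illustrative, and in recent LlamaIndex releases the web reader lives in the llama-index-readers-web package):

```python
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import TrafilaturaWebReader

def answer_questions(url: str, questions: list[str]) -> None:
    # create a new index for this one web page (inefficient, but simple)
    documents = TrafilaturaWebReader().load_data([url])
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    for question in questions:
        print(question, "->", query_engine.query(question))

answer_questions("https://markwatson.com",
                 ["What programming languages does Mark Watson use?"])
```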
This example is not efficient because we create a new index for each web page we want to search. That said, this example (that was derived from an example in the LlamaIndex documentation) implements a pattern that you can use, for example, to build a reusable index of your company’s web site and build an end-user web search app.
The output for these three test questions in the last code example is:
Note that the answer to the second question is strictly incorrect: the model correctly counted the books mentioned in the text it was given, but the Trafilatura library skipped the text in the header block of my web site that states I have written over 20 books. This inaccuracy is from my use of the Trafilatura library, not from the LLM.
LlamaIndex/GPT-Index Case Study Wrap Up
LlamaIndex is a set of data structures and library code designed to make it easier to use large external knowledge bases such as Wikipedia. LlamaIndex creates a vectorized index from your document data, making it highly efficient to query. It then uses this index to identify the most relevant sections of the document based on the query.
LlamaIndex is useful because it provides a central interface to connect your LLMs with external data and offers data connectors to your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.). It provides a simple, flexible interface between your external data and LLMs.
Some projects that use LlamaIndex include building personal assistants with LlamaIndex and GPT-4, using LlamaIndex for document retrieval, and combining answers across documents.
Extraction of Facts and Relationships from Text Data
Traditional methods for extracting email addresses, names, addresses, etc. from text included the use of hand-crafted regular expressions and custom software. LLMs are text processing engines with knowledge of grammar, sentence structure, and some real world embedded knowledge. Using LLMs can reduce the development time of information extraction systems.
Key Capabilities of LLMs for Fact and Relationship Extraction
- Named Entity Recognition (NER): LLMs excel at identifying and classifying named entities within text. This includes pinpointing people, organizations, locations, dates, quantities, etc. NER forms the basis of any fact extraction process, as entities are the core elements around which facts are organized.
- Relationship Extraction (RE): LLMs are adept at understanding the grammatical structure of sentences and the contextual meaning of words. This enables them to identify relationships between the entities they’ve identified, such as employment relationships (“Jane Smith works for Microsoft”), ownership (“Apple acquired Beats Electronics”), and location-based relationships (“The Louvre Museum is located in Paris”).
- Semantic Understanding: LLMs possess a deep understanding of language semantics. This allows them to go beyond simple keyword matching and grasp the nuances and implicit meanings within text, leading to more accurate fact extraction.
- Knowledge Base Augmentation: Pre-trained LLMs draw on their vast knowledge bases (from being trained on massive text datasets) to fill in gaps when text is incomplete and support the disambiguation of entities or relationships.
Techniques and Approaches
- Fine-tuned Question Answering: LLMs can be fine-tuned to directly answer factual questions about a text. For example, given a news article and the question, “When did the event occur?”, the LLM can pin down the relevant date within the text.
- Knowledge Graph Construction: LLMs play a crucial role in automatically constructing knowledge graphs. These graphs are structured representations of facts and relationships extracted from text. LLMs identify the entities, relationships, and help enrich the graphs with relevant attributes.
- Zero-shot or Few-shot Learning: Advanced LLMs can extract certain facts and relationships with minimal or no additional training examples. This is especially valuable in scenarios where manually labelled data is scarce or time-consuming to create.
Benefits
- Accuracy: LLMs often surpass traditional rule-based systems in accuracy, particularly when working with complex or varied text formats.
- Scalability: LLMs can process vast amounts of text data to efficiently extract facts and relationships, enabling the analysis of large-scale datasets.
- Time-saving: The ability of LLMs to adapt and learn reduces the need for extensive manual rule creation or feature engineering, leading to faster development of fact extraction systems.
Applications
- Financial Analysis: Identifying key facts and relationships within financial reports and news articles to support investment decisions.
- Legal Research: Extracting relevant clauses, case law, and legal relationships from complex legal documents.
- Scientific Literature Analysis: Building databases of scientific findings and discoveries by extracting relationships and networks from research papers.
- Customer Support: Analyzing customer feedback and queries to understand product issues, sentiment, and commonly reported problems.
Example Prompts for Getting Information About a Person from Text and Generating JSON
Before using LLMs directly in application code I like to experiment with prompts. Here we will use a two-shot approach of providing as context two examples of text and the extracted JSON data, followed by text we want to process. Consider the following that I ran on my old M1 8G MacBook:
This prompt is in the file prompt_examples/two-shot-2.txt.
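The prompt file is not reproduced here, but a two-shot prompt of this kind generally looks something like the following (the example sentences and JSON fields are illustrative, not the actual file contents):

```text
Example Text: John Smith lives in Boston and works as a software engineer at Acme Corp.
Extracted JSON: {"name": "John Smith", "location": "Boston", "occupation": "software engineer", "employer": "Acme Corp"}

Example Text: Maria Garcia, a 34 year old doctor, moved to Denver in 2021.
Extracted JSON: {"name": "Maria Garcia", "age": 34, "occupation": "doctor", "location": "Denver"}

Process Text: Sam Jones is a 41 year old chemist who lives in Portland and works for Intel.
```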
The output can be overly verbose:
While the comments the llama3-8b-instruct model makes are interesting, let’s modify the prompt to ask for concise output that only includes the generated JSON:
The rest of the prompt is unchanged, now the output is:
Example Code
To use this example in application code we would use the same prompt except that we make the Process Text a variable that is replaced before processing by an LLM. We copy the file two-shot-2.txt to two-shot-2-var.txt and change the second-to-last line in the file:
Now let’s wrap these ideas up in a short Python example in the file extraction/person_data.py:
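The original script is not reproduced here; a sketch of the idea follows. The file path, the {TEXT} placeholder name, and the model name are assumptions, and the same code also works against a local Ollama server's OpenAI-compatible endpoint by passing base_url to the client.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_person_data(text: str) -> str:
    # "{TEXT}" is an assumed placeholder for the "Process Text" variable in the prompt file
    prompt = open("../prompt_examples/two-shot-2-var.txt").read().replace("{TEXT}", text)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

print(extract_person_data(
    "Sam Jones is a 41 year old chemist who lives in Portland and works for Intel."))
```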
The output looks like:
For reference, the complete completion object looks like this:
Using LLMs to Summarize Text
LLMs bring a new level of ability to text summarization tasks. With their ability to process massive amounts of information and “understand” natural language, they’re able to capture the essence of lengthy documents and distill them into concise summaries. Two main types of summarization dominate with LLMs: extractive and abstractive. Extractive summarization pinpoints the most important sentences within the original text, while abstractive summarization requires the LLM to paraphrase or generate new text to represent the core ideas. If you are interested in extractive summarization there is a chapter on this topic in my Common Lisp AI book (link to read online).
LLMs excel in text summarization for several reasons. Their deep understanding of language semantics allows them to identify key themes, even when wording varies across a document. Additionally, they are good at maintaining logical consistency within summaries, ensuring that the condensed version makes sense as a cohesive unit. Modern LLMs are also trained on massive datasets encompassing diverse writing styles, helping them adapt to different sources and generate summaries tailored to specific audiences.
The applications of LLM-powered text summarization are vast. They can help researchers digest lengthy scientific reports quickly, allow businesses to analyze customer feedback efficiently, or provide concise news briefs for busy individuals. LLM-based summarization also has the potential to improve accessibility, creating summaries for those with reading difficulties or summarizing complex information into simpler language. As LLM technology continues to advance, we can expect even more innovative and accurate summarization tools in the future.
Example Prompt
In this example, the prompt is simply:
Code Example
The example in file summarization/summarization_example.py reads a prompt file and substitutes the text from the test file ../data/economics.txt:
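The original script is not reproduced here; a sketch follows, with the prompt wording inlined instead of read from a prompt file (the prompt wording and model name are assumptions):

```python
from openai import OpenAI

client = OpenAI()
text = open("../data/economics.txt").read()
prompt = "Summarize the following text:\n\n" + text  # the book's prompt file may word this differently

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
summary = completion.choices[0].message.content
print(summary)
print(f"summary length is {len(summary) / len(text):.0%} of the original text length")
```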
The length of the output summary is about 20% of the length of the original text.
LLM Techniques for Structured Data Conversion
Here we look at a simple example of converting CSV spreadsheet files to JSON but the idea of data conversion using LLMs is general purpose.
Using LLMs helps handle ambiguity. Traditional Symbolic AI methods often struggle with the nuance of human language. LLMs, with their understanding of context, can resolve ambiguity and provide more accurate extraction.
LLMs are also effective at handling complex or previously unseen formats (one shot). LLMs are trained on vast amounts of diverse text data, making them more adaptable to unexpected variations in data formats than rule-based approaches.
Using LLMs for application development can reduce manual effort by automating many parts of the conversion process that traditionally required significant human intervention and the creation of detailed extraction rules.
Example Prompt for Converting CSV Files to JSON
In the prompt we supply a few examples for converting between these two formats:
Example Code for Converting CSV Files to JSON
The example in file structured_data_conversion/person_data.py reads the prompt template file and substitutes the CSV data from the test file test.csv. The modified prompt is passed to the OpenAI completion API:
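The original script is not reproduced here; a sketch of the pattern follows (the prompt file name and the {CSV} placeholder are assumptions):

```python
from openai import OpenAI

client = OpenAI()
csv_data = open("test.csv").read()
prompt = open("csv_to_json_prompt.txt").read().replace("{CSV}", csv_data)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)  # prints the generated JSON
```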
Here is the test CSV input file:
Notice that this file is not consistent in quoting strings, hopefully making this a more general example of data you might see in the wild. The generated JSON looks like:
Retrieval Augmented Generation (RAG) Applications
Note: August 27, 2024: pinned library versions.
Retrieval Augmented Generation (RAG) applications work by pre-processing a user’s query to search for indexed document fragments that are semantically similar to the user’s query. These fragments are concatenated together as context text that is attached to the user’s query and then passed off to an LLM. The LLM can preferentially use information in this context text, as well as innate knowledge stored in the model, to process user queries.
Simple RAG Example Using LlamaIndex
We will start with an example that only uses a vector store to find documents similar to the text in a query. Here is a listing of rag/simple_llama_index_retrieve_docs.py.
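The listing is not reproduced here; a sketch of the same retrieval-only pattern follows (the sample sentences are illustrative, and retrieval is shown through an explicit retriever, which is the retrieval-only equivalent of calling retrieve on the query engine):

```python
from llama_index.core import Document, VectorStoreIndex

texts = [
    "LlamaIndex is a data framework for connecting custom data sources to LLMs.",
    "LlamaIndex builds vector indices over documents for semantic retrieval.",
    "LangChain chains together LLM calls, tools, and memory.",
]
documents = [Document(text=t) for t in texts]
index = VectorStoreIndex.from_documents(documents)

retriever = index.as_retriever(similarity_top_k=2)
for node_with_score in retriever.retrieve("What is LlamaIndex?"):
    print(node_with_score.score, node_with_score.node.text)
```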
The code imports VectorStoreIndex and Document from the llama_index.core module. It then defines a list of strings, each describing aspects of LlamaIndex, and converts these strings into Document objects. These documents are then used to create an index using VectorStoreIndex.from_documents(documents), which builds an index from the provided documents. This index is capable of understanding and storing the text data in a structured form that can be efficiently queried.
Following the index creation, the code initializes a query engine with index.as_query_engine(), which allows for querying the indexed data. The query “What is LlamaIndex?” is passed to the retrieve method of the query engine. This method processes the query against the indexed documents to find relevant information. The results are then printed. This demonstrates a basic use case of LlamaIndex for text retrieval where the system identifies and retrieves information directly related to the user’s query from the indexed data. In practice we usually combine the use of vector stores with LLM chat models, as we do in later examples.
The output looks like:
I almost never use just a vector index by itself. The next example is in the file simple_rag_llama_index.py.
This code demonstrates the use of the LlamaIndex library to process, index, and query text data, specifically focusing on extracting information about sports from a dataset. The code imports necessary classes from the llama_index library. VectorStoreIndex is used for creating a searchable index of documents.
SimpleDirectoryReader reads documents from a specified directory. OpenAIEmbedding is used to convert text into numerical embeddings using OpenAI’s models. SentenceSplitter and TitleExtractor are used for preprocessing the text by splitting it into sentences and extracting titles, respectively. The ingestion pipeline configuration object IngestionPipeline is configured with three transformations:
- SentenceSplitter breaks the text into smaller chunks or sentences with specified chunk_size and chunk_overlap. This helps in managing the granularity of text processing.
- TitleExtractor pulls out titles from the text, which can be useful for summarizing or categorizing documents.
- OpenAIEmbedding converts the processed text into vector embeddings. These embeddings represent the text in a high-dimensional space, capturing semantic meanings which are crucial for effective searching and querying.
We use SimpleDirectoryReader to convert the text files in a directory to a list of document objects:
As in the last example, we create a vector store object from the list of document objects:
The index is built using the vector embeddings generated by OpenAIEmbedding, allowing for semantic search capabilities.
The statement index.as_query_engine() sets up a query engine from the index. This engine can process queries to find relevant documents based on their semantic content.
Finally we are ready to make a test query:
The engine searches through the indexed documents and retrieves information that semantically matches the query about sports.
Here is a complete code listing for this example:
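A sketch that follows the steps described above (the directory name matches the surrounding discussion; the query text and chunking parameters are assumptions rather than the book's exact code):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.extractors import TitleExtractor
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader("../data_small").load_data()

pipeline = IngestionPipeline(transformations=[
    SentenceSplitter(chunk_size=512, chunk_overlap=20),  # break documents into chunks
    TitleExtractor(),                                    # add title metadata to each chunk
    OpenAIEmbedding(),                                   # embed each chunk
])
nodes = pipeline.run(documents=documents)

index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
print(query_engine.query("What sports are discussed in these documents?"))
```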
The output looks like:
RAG With Reranking Example
Reranking in the context of Retrieval-Augmented Generation (RAG) for LLMs refers to a process of adjusting the order of documents retrieved by an initial search query to improve the relevance and quality of the results before they are used for generating responses. This step is crucial because the initial retrieval might fetch a broad set of documents, not all of which are equally relevant to the user’s query.
The primary function of reranking is to refine the selection of documents based on their actual relevance to the expanded query. This is typically achieved by employing more sophisticated models that can better understand the context and nuances of the query and the documents. For instance, cross-encoder models are commonly used in reranking due to their ability to process the query and document simultaneously, providing a more accurate evaluation of relevance.
Just as LLMs have the flexibility to handle a broad range of text topics, different programming languages, etc., reranking mechanisms have the flexibility to handle a wide range of source information as well as a wide range of query types.
For your RAG applications, you should notice a reduction of noise and irrelevant text in RAG system responses to user queries.
The code example is very similar to the last example but we now add a reranker as a query engine postprocessor:
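A sketch of this pattern follows; the book's example may use a different reranker, and the use of LLMRerank here (which reranks retrieved nodes with an extra LLM call) is an assumption:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import LLMRerank

documents = SimpleDirectoryReader("../data_small").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(
    similarity_top_k=10,                        # over-retrieve candidate nodes
    node_postprocessors=[LLMRerank(top_n=3)],   # keep only the 3 most relevant after reranking
)
print(query_engine.query("What sports are discussed in these documents?"))
```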
The output is similar to before:
Let’s try a more complex query and instead of just using the document directory ../data_small that only contains information about sports, we will now use the text documents in ../data that cover more general topics. We make two code changes. First we use a different document directory:
We also change the query:
The output looks like:
RAG on CSV Spreadsheet Files
TBD
Using Google’s Knowledge Graph APIs With LangChain
Google’s Knowledge Graph (KG) is a knowledge base that Google uses to serve relevant information in an info-box beside its search results. It allows the user to see the answer at a glance, as an instant answer. The data is generated automatically from a variety of sources, covering places, people, businesses, and more. I worked at Google in 2013 on a project that used their KG for an internal project.
Google’s public Knowledge Graph Search API lets you find entities in the Google Knowledge Graph. The API uses standard schema.org types and is compliant with the JSON-LD specification. It supports entity search and lookup.
You can use the Knowledge Graph Search API to build applications that make use of Google’s Knowledge Graph. For example, you can use the API to build a search engine that returns results based on the entities in the Knowledge Graph.
In the next chapter we also use the public KGs DBPedia and Wikidata. One limitation of Google’s KG APIs is that they are designed for entity (people, places, organizations, etc.) lookup. When using DBPedia and Wikidata it is possible to find a wider range of information using the SPARQL query language, such as relationships between entities. You can use the Google KG APIs to find some entity relationships, e.g., all the movies directed by a particular director, or all the books written by a particular author. You can also use the API to find information like all the people who have worked on a particular movie, or all the actors who have appeared in a particular TV show.
Setting Up To Access Google Knowledge Graph APIs
To get an API key for Google’s Knowledge Graph Search API, you need to go to the Google API Console, enable the Google Knowledge Graph Search API, and create an API key to use in your project. You can then use this API key to make requests to the Knowledge Graph Search API.
To create your application’s API key, follow these steps:
- Go to the API Console.
- From the projects list, select a project or create a new one.
- If the APIs & services page isn’t already open, open the left side menu and select APIs & services.
- On the left, choose Credentials.
- Click Create credentials and then select API key.
You can then use this API key to make requests to the Knowledge Graph Search APIs.
When I use Google’s APIs I set the access key in the file ~/.google_api_key and read the key from that file; you can also use environment variables to store access keys. Here is a code snippet for making an API call to get information about me:
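A sketch of such a call (the query string is illustrative) using the public Knowledge Graph Search endpoint:

```python
import json
from pathlib import Path
import requests

api_key = Path("~/.google_api_key").expanduser().read_text().strip()
response = requests.get(
    "https://kgsearch.googleapis.com/v1/entities:search",
    params={"query": "Mark Watson artificial intelligence author",
            "key": api_key, "limit": 1, "indent": True},
)
print(json.dumps(response.json(), indent=2))  # JSON-LD search results
```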
The JSON-LD output would look like:
In order to not repeat the code for getting entity information from the Google KG, I wrote a utility Google_KG_helper.py that encapsulates the previous code and generalizes it into a mini-library.
The main test script is in the file Google_Knowledge_Graph_Search.py:
The example output is:
Accessing Knowledge Graphs from Google, DBPedia, and Wikidata allows you to integrate real world facts and knowledge with your applications. While I mostly work in the field of deep learning I frequently also use Knowledge Graphs in my work and in my personal research. I think that you, dear reader, might find accessing highly structured data in KGs to be more reliable and in many cases simpler than using web scraping.
Using DBPedia and WikiData as Knowledge Sources
Both DBPedia and Wikidata are public Knowledge Graphs (KGs) that store data as Resource Description Framework (RDF) and are accessed through the SPARQL Query Language for RDF. The examples for this project are in the GitHub repository for this book in the directory kg_search.
I am not going to spend much time here discussing RDF and SPARQL. Instead I ask you to read online the introductory chapter Linked Data, the Semantic Web, and Knowledge Graphs in my book A Lisp Programmer Living in Python-Land: The Hy Programming Language.
As we saw in the last chapter, a Knowledge Graph (that I often abbreviate as KG) is a graph database using a schema to define types (both objects and relationships between objects) and properties that link property values to objects. The term “Knowledge Graph” is both a general term and also sometimes refers to the specific Knowledge Graph used at Google which I worked with while working there in 2013. Here, we use KG to reference the general technology of storing knowledge in graph databases.
DBPedia and Wikidata are similar, with some important differences. Here is a summary of some similarities and differences between DBPedia and Wikidata:
- Both projects aim to provide structured data from Wikipedia in various formats and languages. Wikidata also has data from other sources so it contains more data and more languages.
- Both projects use RDF as a common data model and SPARQL as a query language.
- DBPedia extracts data from the infoboxes in Wikipedia articles, while Wikidata collects data entered through its interfaces by both users and automated bots.
- Wikidata requires sources for its data, while DBPedia does not.
- DBpedia is more popular in the Semantic Web and Linked Open Data communities, while Wikidata is more integrated with Wikimedia projects.
To the last point: I personally prefer DBPedia when experimenting with the semantic web and linked data, mostly because DBPedia URIs are human readable while Wikidata URIs are abstract. The following URIs represent the town I live in, Sedona Arizona:
- DBPedia: https://dbpedia.org/page/Sedona,_Arizona
- Wikidata: https://www.wikidata.org/wiki/Q80041
In RDF we enclose URIs in angle brackets like <https://www.wikidata.org/wiki/Q80041>.
If you read the chapter on RDF and SPARQL in my book link that I mentioned previously, then you know that RDF data is represented by triples where each part is named:
- subject
- property
- object
We will look at two similar examples in this chapter, one using DBPedia and one using Wikidata. Both services have SPARQL endpoint web applications that you will want to use for exploring both KGs. We will look at the DBPedia web interface later. Here is the Wikidata web interface:
In this SPARQL query the prefix wd: stands for Wikidata data while the prefix wdt: stands for Wikidata type (or property). The prefix rdfs: stands for RDF Schema.
Using DBPedia as a Data Source
DBpedia is a community-driven project that extracts structured content from Wikipedia and makes it available on the web as a Knowledge Graph (KG). The KG is a valuable resource for researchers and developers who need to access structured data from Wikipedia. With the use of SPARQL queries to DBpedia as a data source we can write a variety of applications, including natural language processing, machine learning, and data analytics. We demonstrate the effectiveness of DBpedia as a data source by presenting several examples that illustrate its use in real-world applications. In my experience, DBpedia is a valuable resource for researchers and developers who need to access structured data from Wikipedia.
In general you will start projects using DBPedia by exploring available data using the web app https://dbpedia.org/sparql that can be seen in this screen shot:
The following listing of file dbpedia_generate_rdf_as_nt.py shows Python code for making a SPARQL query to DBPedia and saving the results as RDF triples in NT format in a local text file:
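The original script is not reproduced here; a sketch of the same idea using the SPARQLWrapper library follows (the query selecting country labels is an assumption, apart from the LIMIT 50 mentioned later):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?country ?label WHERE {
      ?country a dbo:Country ;
               rdfs:label ?label .
      FILTER (lang(?label) = 'en')
    } LIMIT 50
""")
sparql.setReturnFormat(JSON)
rows = sparql.query().convert()["results"]["bindings"]

with open("sample.nt", "w") as f:
    for row in rows:
        triple = (f'<{row["country"]["value"]}> '
                  f'<http://www.w3.org/2000/01/rdf-schema#label> '
                  f'"{row["label"]["value"]}"@en .')
        print(triple)
        f.write(triple + "\n")
```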
Here is the printed output from running this script (most output not shown, and manually edited to fit page width):
This output was written to a local file sample.nt. I divided this example into two separate Python scripts because I thought it would be easier for you, dear reader, to experiment with fetching RDF data separately from using a LLM to process the RDF data. In production you may want to combine KG queries with semantic analysis.
This code example demonstrates the use of the GPTSimpleVectorIndex for querying RDF data and retrieving information about countries. The function download_loader loads data importers by string name. While it is not type safe to load a Python class by name using a string, if you misspell the class name passed to download_loader then a Python ValueError(“Loader class name not found in library”) is thrown. The GPTSimpleVectorIndex class represents an index data structure that can be used to efficiently search and retrieve information from the RDF data. This is similar to other LlamaIndex vector index types for different types of data sources.
Here is the script dbpedia_rdf_query.py:
Here is the output:
Why are there only 18 countries listed? In the script used to perform a SPARQL query on DBPedia, we had a statement LIMIT 50 at the end of the query so only 50 RDF triples were written to the file sample.nt that only contains data for 18 countries.
Using Wikidata as a Data Source
It is slightly more difficult exploring Wikidata compared to DBPedia. Let’s revisit getting information about my home town of Sedona Arizona.
In writing this example, I experimented with SPARQL queries using the Wikidata SPARQL web app.
We can start by finding RDF statements with the object value being “Sedona” using the Wikidata web app:
First we write a helper utility to gather prompt text for an entity name (e.g., name of a person, place, etc.) in the file wikidata_generate_prompt_text.py:
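The original helper is not reproduced here; a sketch of one way to gather short descriptions for an entity name from Wikidata's SPARQL endpoint follows (the query shape and function name are assumptions):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def wikidata_prompt_text(entity_name: str, limit: int = 20) -> str:
    """Return English labels and descriptions for Wikidata entities matching a name."""
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                           agent="langchain-book-example/0.1")
    sparql.setQuery(f"""
        SELECT ?item ?itemLabel ?itemDescription WHERE {{
          ?item rdfs:label "{entity_name}"@en .
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    lines = []
    for row in rows:
        label = row["itemLabel"]["value"]
        description = row.get("itemDescription", {}).get("value", "")
        lines.append(f"{label}: {description}")
    return "\n".join(lines)

print(wikidata_prompt_text("Sedona"))
```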
This utility does most of the work in getting prompt text for an entity.
The GPTTreeIndex class is similar to other LlamaIndex index classes. This class builds a tree-based index of the prompt texts, which can be used to retrieve information based on the input question. In LlamaIndex, a GPTTreeIndex is used to select the child node(s) to send the query down to. A GPTKeywordTableIndex uses keyword matching, and a GPTVectorStoreIndex uses embedding cosine similarity. The choice of which index class to use depends on how much text is being indexed, what the granularity of subject matter in the text is, and if you want summarization.
GPTTreeIndex is also more efficient than GPTSimpleVectorIndex because it uses a tree structure to store the data. This allows for faster searching and retrieval of data compared to a linear list index class like GPTSimpleVectorIndex.
The LlamaIndex code is relatively easy to implement in the script wikidata_query.py (edited to fit page width):
Here is the test output (with some lines removed):
Using LLMs To Organize Information in Our Google Drives
My digital life consists of writing, working as an AI practitioner, and learning activities that I justify with my self-image of a “gentleman scientist.” Cloud storage like GitHub, Google Drive, Microsoft OneDrive, and iCloud are central to my activities.
About ten years ago I spent two months of my time writing a system in Clojure that was planned to be my own custom and personal Dropbox, augmented with various NLP tools and a Firefox plugin to send web clippings directly to my personal system. To be honest, I stopped using my own project after a few months because the time it took to organize my information was a greater opportunity cost than the value I received.
In this chapter I am going to walk you through parts of a new system that I am developing for my own personal use to help me organize my material on Google Drive (and eventually other cloud services). Don’t be surprised if the completed project is an additional example in a future edition of this book!
With the Google setup directions listed below, you will get a pop-up web browser window with a warning like the following (the warning shows my Gmail address; you should see your own Gmail address, assuming that you have recently logged into Gmail using your default web browser):
You will need to first click Advanced, then click the Go to GoogleAPIExamples (unsafe) link in the lower left corner, and then temporarily authorize this example on your Gmail account.
Setting Up Requirements
You need to create a credential at https://console.cloud.google.com/cloud-resource-manager (copied from the PyDrive documentation, changing application type to “Desktop”):
- Search for ‘Google Drive API’, select the entry, and click ‘Enable’.
- Select ‘Credentials’ from the left menu, click ‘Create Credentials’, select ‘OAuth client ID’.
- Now, the product name and consent screen need to be set -> click ‘Configure consent screen’ and follow the instructions. Once finished:
- Select ‘Application type’ to be Desktop application.
- Enter an appropriate name.
- Input http://localhost:8080 for ‘Authorized JavaScript origins’.
- Input http://localhost:8080/ for ‘Authorized redirect URIs’.
- Click ‘Save’.
- Click ‘Download JSON’ on the right side of Client ID to download client_secret_.json. Copy the downloaded JSON credential file to the example directory google_drive_llm for this chapter.
Write Utility To Fetch All Text Files From Top Level Google Drive Folder
For this example we will just authenticate our test script with Google, and copy all top level text files with names ending with “.txt” to the local file system in subdirectory data. The code is in the directory google_drive_llm in file fetch_txt_files.py (edited to fit page width):
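A sketch of the fetch step, assuming the PyDrive library that the setup directions above configure (the script in the book's repository handles more details):

```python
# fetch_txt_files.py (sketch): authenticate with Google and copy top level *.txt files
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()   # opens a browser window for the OAuth consent flow
drive = GoogleDrive(gauth)

# list top level (root) files and copy the text files into the local subdirectory data
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for f in file_list:
    print("title:", f['title'])
    if f['title'].endswith(".txt"):
        f.GetContentFile("data/" + f['title'])
```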
For testing I just have one text file with the file extension “.txt” on my Google Drive so my output from running this script looks like the following listing. I edited the output to change my file IDs and to only print a few lines of the debug printout of file titles.
Generate Vector Indices for Files in Specific Google Drive Directories
The example script in the last section should have created local copies of the text files ending with “.txt” from your top level Google Drive folder. Here, we use the same LlamaIndex test code that we used in a previous chapter. The test script index_and_QA.py is listed here:
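In sketch form, assuming the early-2023 LlamaIndex API (the question shown is a placeholder for whatever your own notes contain):

```python
# index_and_QA.py (sketch): index the fetched files in ./data and query them
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents)
print(index.query("What sports results are mentioned in my notes?"))
```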
For my test file, the output looks like:
It is interesting to see how the query result is rewritten in a nice form, compared to the raw text in the file sports.txt on my Google Drive:
Google Drive Example Wrap Up
If you already use Google Drive to store your working notes and other documents, then you might want to expand the simple example in this chapter to build your own query system for your documents. In addition to Google Drive, I also use Microsoft Office 365 and OneDrive in my work and personal projects.
I haven’t written my own connectors yet for OneDrive but this is on my personal to-do list using the Microsoft library https://github.com/OneDrive/onedrive-sdk-python.
Natural Language SQLite Database Queries With LangChain
Note: this chapter updated October 16, 2025 for LangChain 0.3.2.
LangChain's support for SQLite databases uses the Python SQLAlchemy library for database connections. This abstraction layer allows LangChain to use the same logic and models for other relational databases.
I have a long work history of writing natural language interfaces for relational databases that I will review in the chapter wrap up. For now, I invite you to be amazed at how simple it is to write the LangChain scripts for querying a database in natural language.
We will use the SQLite sample database from the SQLite Tutorial web site:
This database has 11 tables. The above URI has documentation for this database so please take a minute to review the table schema diagram and text description.
This example is derived from the LangChain documentation. We use three classes from the LangChain library:
- OpenAI: A class that represents the OpenAI language model, which is capable of understanding natural language and generating a response.
- SQLDatabase: A class that represents a connection to an SQL database.
- SQLDatabaseChain: A class that connects the OpenAI language model with the SQL database to allow natural language querying.
The temperature parameter is set to 0 in this example. The temperature parameter controls the randomness of the generated output: a lower value (like 0) makes the model’s output more deterministic and focused, while a higher value introduces more randomness (or “creativity”). The run method of the db_chain object translates the natural language query into an appropriate SQL query, executes it on the connected database, and then returns the result, converting the output back into natural language.
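A sketch of the chain setup follows; the import paths vary across LangChain versions (in recent releases SQLDatabaseChain lives in the langchain_experimental package), and the database file name chinook.db and the sample question are assumptions:

```python
# Natural language SQL queries (sketch)
from langchain_openai import OpenAI
from langchain_community.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain

db = SQLDatabase.from_uri("sqlite:///chinook.db")   # the SQLite Tutorial sample database
llm = OpenAI(temperature=0)                         # deterministic output
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

print(db_chain.run("How many customers are there, and how many live in Canada?"))
```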
The output (edited for brevity) shows the generated SQL queries and the query results:
Natural Language Database Query Wrap Up
I had an example I wrote for the first two editions of my Java AI book (I later removed this example because the code was too long and too difficult to follow). I later reworked this example in Common Lisp and used both versions in several consulting projects in the late 1990s and early 2000s.
The last book I wrote, Practical Python Artificial Intelligence Programming, used an OpenAI example https://github.com/openai/openai-cookbook/blob/main/examples/Backtranslation_of_SQL_queries.py that shows relatively simple code (relative to my older hand-written Java and Common Lisp code) for an NLP database interface.
Compared to the elegant support for NLP database queries in LangChain, the previous examples have limited power and required a lot more code. As I write this in March 2023, it is a good feeling that for the rest of my career, NLP database access is now a solved problem!
Examples Using Hugging Face Open Source Models
To start, you will need to create a free account on the Hugging Face Hub, get an API key, and install:
You need to set the following environment variable to your Hugging Face Hub access token:
So far in this book we have been using the OpenAI LLM wrapper:
Here we will use the alternative Hugging Face wrapper class:
The LangChain library hides most of the details of using both APIs. This is a really good thing. I have had a few discussions on social tech media with people who object to the non open source nature of OpenAI. While I like the convenience of using OpenAI’s APIs, I always like to have alternatives for proprietary technology I use.
The Hugging Face Hub endpoint in LangChain connects to the Hugging Face Hub and runs the models via their free inference endpoints. We need a Hugging Face account and API key to use these endpoints. There exist two Hugging Face LLM wrappers, one for a local pipeline and one for a model hosted on Hugging Face Hub. Note that these wrappers only work for models that support the text2text-generation and text-generation tasks. Text2text-generation refers to the task of generating a text sequence from another text sequence, for example generating a summary of a long article. Text-generation refers to the task of generating a text sequence from scratch.
Using LangChain as a Wrapper for Hugging Face Prediction Model APIs
We will start with a simple example using the prompt text support in LangChain. The following example is in the script simple_example.py:
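A sketch of simple_example.py, assuming the HuggingFaceHub wrapper and a text2text-generation model (the repo_id and the prompt here are placeholders):

```python
# simple_example.py (sketch): use a Hugging Face Hub hosted model through LangChain
# requires: export HUGGINGFACEHUB_API_TOKEN=your-access-token
from langchain.llms import HuggingFaceHub
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = HuggingFaceHub(repo_id="google/flan-t5-large",
                     model_kwargs={"temperature": 0.1, "max_length": 128})

prompt = PromptTemplate(input_variables=["question"],
                        template="Answer this question: {question}")
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run("What are the largest cities in Arizona?"))
```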
By changing just a few lines of code, you can run many of the examples in this book using the Hugging Face APIs in place of the OpenAI APIs.
The LangChain documentation lists the source code for a wrapper to use local Hugging Face embeddings here.
Creating a Custom LlamaIndex Hugging Face LLM Wrapper Class That Runs on Your Laptop
We will be downloading the Hugging Face model facebook/opt-iml-1.3b that is a 2.6 gigabyte file. This model is downloaded the first time it is requested and is then cached in ~/.cache/huggingface/hub for later reuse.
This example is modified from an example for custom LLMs in the LlamaIndex documentation. Note that I have used a much smaller model in this example and reduced the prompt and output text size.
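The heart of the example is simply loading the model locally with the Hugging Face transformers library; here is a sketch of that step (the full custom wrapper class follows the LlamaIndex custom-LLM example mentioned above, and the prompt is a placeholder):

```python
# Local text generation with facebook/opt-iml-1.3b (sketch)
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/opt-iml-1.3b")

# keep the prompt and output sizes small for CPU-only laptops
result = generator("Sedona Arizona is a good vacation spot because",
                   max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```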
When running on my M1 MacBook Pro using only the CPU (no GPU or Neural Engine configuration) we can read the model from disk quickly but it takes a while to process queries:
Even though my M1 MacBook does fairly well when I configure TensorFlow and PyTorch to use the Apple Silicon GPUs and Neural Engines, I usually do my model development using Google Colab.
Let’s rerun the last example on Colab:
Using a standard Colab GPU, the query/prediction time is much faster. Here is a link to my Colab notebook if you would prefer to run this example on Colab instead of on your laptop.
Running Local LLMs Using Llama.cpp and LangChain
We saw an example at the end of the last chapter running a local LLM. Here we use the Llama.cpp project to run a local model with LangChain. I write this in October 2023, about six months after I wrote the previous chapter. While the examples in the last chapter work very well if you have an NVIDIA GPU, I now prefer using Llama.cpp because it also works very well with Apple Silicon. My Mac has an M2 SoC with 32G of internal memory, which is suitable for running fairly large LLMs efficiently.
Installing Llama.cpp with a Llama2-13b-orca Model
Now we look at an approach to run LLMs locally on your own computers.
Among the many open and public models, I chose the Llama2-13b-orca model hosted on Hugging Face because of its support for natural language processing tasks. The combination of Llama2-13b-orca with the llama.cpp library is well supported by LangChain and meets our requirements for local deployment and ease of installation and use.
Start by cloning the llama.cpp project and building it:
Then get a model file from https://huggingface.co/TheBloke/OpenAssistant-Llama2-13B-Orca-8K-3319-GGUF and copy it to the ./models directory:
It is not strictly required for you to clone Llama.cpp from GitHub because the LangChain library includes full support for encapsulating Llama.cpp via the llama-cpp-python library. That said, you can also run Llama.cpp from the command line, and it includes a REST server option that I find useful beyond the requirements for the example in this chapter.
Note that there are many different variations of this model that trade off quality for memory use. I am using one of the larger models. If you only have 8G of memory try a smaller model.
Python Example
The following script is in the file langchain-book-examples/llama.cpp/test.py and is derived from the LangChain documentation: https://python.langchain.com/docs/integrations/llms/llamacpp.
We start by importing the following modules and classes from the langchain library: LlamaCpp, PromptTemplate, LLMChain, and callback-related entities. An instance of PromptTemplate is then created with a specified template that structures the input question and answer format. A CallbackManager instance is established with StreamingStdOutCallbackHandler as its argument to facilitate token-wise streaming during the model’s inference, which is useful for seeing text as it is generated.
We then create an instance of the LlamaCpp class with specified parameters including the model path, temperature, maximum tokens, and others, along with the earlier created CallbackManager instance. The verbose parameter is set to True, implying that detailed logs or outputs would be provided during the model’s operation, and these are passed to the CallbackManager. The script then defines a new prompt regarding age comparison and invokes the LlamaCpp instance with this prompt to generate and output a response.
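A sketch reconstructed from this description (the model file name is whichever GGUF file you copied into ./models, and the question is a placeholder):

```python
# test.py (sketch): run a local GGUF model through LangChain's LlamaCpp wrapper
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(
    model_path="./models/openassistant-llama2-13b-orca-8k-3319.Q4_K_M.gguf",
    temperature=0.1,
    max_tokens=512,
    callback_manager=callback_manager,   # stream tokens to stdout as they are generated
    verbose=True,
)

llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run("Mary is 30 years older than her son Sam. Is Sam older than Mary?"))
```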
Here is example output (with output shortened for brevity):
While using APIs from OpenAI, Anthropic, and other providers is simple and frees developers from the requirements for running LLMs, new tools like Llama.cpp make it easier and less expensive to run and deploy LLMs yourself. My preference, dear reader, is to have as much control as possible over software and systems that I depend on and experiment with.
Running Local LLMs Using Ollama
We saw an example at the end of the last chapter using the Llama.cpp project to run a local model with LangChain. As I update this chapter in April 2024 I now most often use the Ollama app (download, documentation, and list of supported models at https://ollama.ai). Ollama has a good command line interface and also runs a REST service that the examples in this chapter use.
Ollama works very well with Apple Silicon, systems with an NVIDIA GPU, and high end CPU-only systems. My Mac has an M2 SoC with 32G of internal memory, which is suitable for running fairly large LLMs efficiently, but most of the examples here run fine with 16G of memory.
Most of this chapter involves Python code examples using Ollama to run local LLMs. However the Ollama command line interface is useful for interactive experiments. Another useful development technique is to write prompts in individual text files like p1.txt, p2.txt, etc. and run a prompt (on macOS and Linux) using:
And after the response is printed either stay in the Ollama REPL or type /bye to exit.
Simple Use of a Local Mistral Model Using LangChain
We look at a simple example for asking questions and text completions using a local Mistral model. The Ollama support in LangChain requires that you run Ollama as a service on your laptop:
Here I am using a Mistral model but I usually have several LLMs installed to experiment with, for example:
Here is the file ollama_langchain/test.py:
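In sketch form (the Ollama wrapper class has moved between the langchain and langchain_community packages across versions, and the prompt shown is a placeholder):

```python
# ollama_langchain/test.py (sketch): use a local Mistral model through LangChain
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")   # requires the Ollama service running locally
print(llm.invoke("Briefly compare Sedona Arizona and Flagstaff Arizona."))
```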
Here is the output:
Minimal Example Using Ollama with the Mistral Open Model for Retrieval Augmented Queries Against Local Documents
The following listing of file ollama_langchain/rag_test.py demonstrates creating a persistent embeddings datastore and reusing it. In production, this example would be split into two separate Python scripts (both steps are sketched after this list):
- Create a persistent embeddings datastore from a directory of local documents.
- Open a persisted embeddings datastore and use it for queries against local documents.
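A sketch of both steps, assuming Ollama embeddings and the Chroma vector store as the persistence layer (the store choice, directory names, and question are assumptions; the script in the book's repository may differ):

```python
# rag_test.py (sketch): build a persistent embeddings datastore, then reuse it
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

embeddings = OllamaEmbeddings(model="mistral")

# Step 1: create the persistent datastore from the local text files
docs = DirectoryLoader("../data", glob="*.txt", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000,
                                        chunk_overlap=100).split_documents(docs)
Chroma.from_documents(chunks, embeddings, persist_directory="./embeddings_db")

# Step 2: reopen the persisted datastore and use it for retrieval augmented queries
store = Chroma(persist_directory="./embeddings_db", embedding_function=embeddings)
qa = RetrievalQA.from_chain_type(llm=Ollama(model="mistral"),
                                 retriever=store.as_retriever())
print(qa.invoke({"query": "Which economist is mentioned in the economics notes?"}))
```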
Creating a local persistent embeddings datastore for the example text files in ../data/*.txt takes about 90 seconds on my Mac Mini.
Here is an example using this script. The first question uses the innate knowledge contained in the Mistral-7B model while the second question uses the text files in the directory ../data as local documents. The test input file economics.txt has been edited to add the name of a fictional economist. I added this data to show that the second question is answered from the local document store.
Wrap Up for Running Local LLMs Using Ollama
As I write this chapter in December 2023 most of my personal LLM experiments involve running models locally on my Mac mini (or sometimes in Google Colab) even though models available through OpenAI, Anthropic, etc. APIs are more capable. I find that the Ollama project is currently the easiest and most convenient way to run local models as REST services or embedded in Python scripts as in the two examples here.
Using Large Language Models to Write Recipes
If you ask the ChatGPT web app to write a recipe using a user-supplied ingredient list and a description, it does a fairly good job of generating recipes. For the example in this chapter I am taking a different approach:
- Use the recipe and ingredient files from my web app http://cookingspace.com to create context text, given a user prompt for a recipe.
- Treat this as a text prediction problem.
- Format the response for display.
This approach has the advantage (for me!) that the generated recipes will be more similar to the recipes I enjoy cooking, since the context data will be derived from my own recipe files.
Preparing Recipe Data
I am using the JSON Recipe files from my web app http://cookingspace.com. The following Python script converts my JSON data to text descriptions, one per file:
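A sketch of the conversion step (the JSON field names and directory names here are assumptions, not the actual schema of my recipe files):

```python
# Convert JSON recipes to plain text files, one recipe per file (sketch)
import json
from pathlib import Path

Path("recipe_texts").mkdir(exist_ok=True)
for json_path in Path("recipe_json").glob("*.json"):
    recipe = json.loads(json_path.read_text())
    lines = [recipe["name"], "", "Ingredients:"]
    lines += [f"- {ingredient}" for ingredient in recipe["ingredients"]]
    lines += ["", "Directions:", recipe["directions"]]
    (Path("recipe_texts") / (json_path.stem + ".txt")).write_text("\n".join(lines))
```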
Here is a listing of one of the shorter generated recipe files (i.e., text recipe data converted from raw JSON recipe data from my CookingSpace.com web site):
I have generated 41 individual recipe files that will be used for the remainder of this chapter.
In the next section when we use a LLM to generate a recipe, the directions are numbered steps and the formatting is different than my original recipe document files.
A Prediction Model Using the OpenAI text-embedding-3-large Model
Here we use the DirectoryLoader class that we have used in previous examples to load and then create an embedding index.
Here is the listing for the script recipe_generator.py:
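A sketch of the indexing and generation step (the directory name, query text, and choice of generation model are placeholders; the heading above names the embedding model):

```python
# recipe_generator.py (sketch): index the recipe text files and generate a recipe
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.indexes import VectorstoreIndexCreator

loader = DirectoryLoader("recipe_texts", glob="*.txt", loader_cls=TextLoader)
index = VectorstoreIndexCreator(
    embedding=OpenAIEmbeddings(model="text-embedding-3-large")
).from_loaders([loader])

print(index.query("Write a recipe for a spicy bean and rice soup, with an "
                  "ingredient list and numbered directions.",
                  llm=OpenAI(temperature=0.7)))
```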
This generated two recipes. Here is the output for the first request:
If you examine the text recipe files I indexed, you will see that the prediction model merged information from multiple training data recipes while creating new original text for directions that is loosely based on the directions that I wrote and on information encoded in the OpenAI text-davinci-002 model.
Here is the output for the second request:
Cooking Recipe Generation Wrap Up
Cooking is one of my favorite activities (in addition to hiking, kayaking, and playing a variety of musical instruments). I originally wrote the CookingSpace.com web app to scratch a personal itch: due to a medical issue I had to closely monitor and regulate my vitamin K intake. I used the US Government’s USDA Nutrition Database to estimate the amounts of vitamins and nutrients in some recipes that I use.
When I wanted to experiment with generative models, backed by my personal recipe data, to create recipes, having available recipe data from my previous project as well as tools like OpenAI APIs and LangChain made this experiment simple to set up and run. It is a common theme in this book that it is now relatively easy to create personal projects based on our data and our interests.
LangChain Agents
LangChain agent tools act as a glue to map natural language human input into different sequences of actions. We are effectively using the real world knowledge in the text used to train LLMs to act as a reasoning agent.
The LangChain Agents Documentation provides everything you need to get started. Here we will dive a bit deeper into using local Python scripts in agents and look at an interesting example using SPARQL queries and the public DBPedia Knowledge Base. We will concentrate on just a few topics:
- Understanding what LangChain tools are and using pre-built tools.
- Get an overview of ReAct reasoning. You should bookmark the original paper ReAct: Synergizing Reasoning and Acting in Language Models for reference. This paper inspired the design and implementation of the agent tool code in LangChain.
- Writing custom functions for OpenAI: how to write a custom tool. We will write a tool that uses SPARQL queries to the DBPedia public Knowledge Graph.
Overview of LangChain Tools
As we have covered with many examples in this book, LangChain is a framework that provides tools for building LLM-powered applications.
Here we look at using built in LangChain agent tools, understand reactive agents, and end the chapter with a custom tool agent application.
LangChain tools are interfaces that an agent can use to interact with the world. They can be generic utilities (e.g. search), other chains, or even other agents. The interface API of a tool has a single text input and a single text output, and includes a name and description that communicate to the model what the tool does and when to use it.
Some tools can be used as-is and some tools (e.g. chains, agents) may require a base LLM to use to initialize them. In that case, you can pass in an LLM as well:
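For example (a small sketch; llm-math is a pre-built tool that needs an LLM to do its work, and import paths vary with the LangChain version):

```python
from langchain.llms import OpenAI
from langchain.agents import load_tools

llm = OpenAI(temperature=0)
tools = load_tools(["llm-math"], llm=llm)   # the tool uses the LLM internally
```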
To implement your own tool, you can subclass the BaseTool class and implement the _run method. The _run method is called with the input text and should return the output text. The BaseTool superclass implements the run method, which takes care of calling the right CallbackManager methods before and after calling your _run method. When an error occurs, the _run method should, when possible, return a string representing the error rather than throwing an exception. This allows the error to be passed to the LLM so the LLM can decide how to handle it.
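A minimal sketch of this pattern (the tool name, description, and behavior here are made up for illustration; the DBPedia tool later in this chapter follows the same structure):

```python
from langchain.tools import BaseTool

class WordCountTool(BaseTool):
    name: str = "word_counter"
    description: str = "Counts the number of words in the input text."

    def _run(self, text: str) -> str:
        try:
            return str(len(text.split()))
        except Exception as e:
            # return an error string so the LLM can decide what to do next
            return f"error: {e}"

    async def _arun(self, text: str) -> str:
        return self._run(text)
```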
LangChain also provides pre-built tools that provide a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
In summary, LangChain tools are interfaces that agents can use to interact with the world. They can be generic utilities or other chains or agents. Here is a list of some of the available LangChain agent tools:
- AWSLambda - A wrapper around the AWS Lambda API, invoked via the Amazon Web Services Node.js SDK. Useful for invoking serverless functions with any behavior that you need to provide to an agent.
- BingSerpAPI - A wrapper around the Bing Search API.
- BraveSearch - A wrapper around the Brave Search API.
- Calculator - Useful for getting the result of a math expression.
- GoogleCustomSearch - A wrapper around the Google Custom Search API.
- IFTTTWebHook - A wrapper around the IFTTT Web-hook API.
- OpenAI - A wrapper around the OpenAI API.
- OpenWeatherMap - A wrapper around the OpenWeatherMap API.
- Random - Useful for generating random numbers, strings, and other values.
- Wikipedia - A wrapper around the Wikipedia API.
- WolframAlpha - A wrapper around the WolframAlpha API.
Overview of the ReAct Library for Implementing Reasoning in LLM Applications
Most of the material in this section is referenced from the paper ReAct: Synergizing Reasoning and Acting in Language Models. The ReAct framework attempts to solve the basic problem of getting LLMs to accurately perform tasks: we want an LLM to understand us and actually do what we want. I take a different but related approach for an example in my book Safe For Humans AI: A “humans-first” approach to designing and building AI systems (link for reading free online) where I use two different LLMs, one to generate answers to questions and another LLM to judge how well the first model did. That example is fairly ad-hoc because I was experimenting with an idea. Here we do much the same thing using a pre-built framework.
ReAct is an extension of the idea that LLMs perform better when we ask not only for an answer but also for the reasoning steps to generate an answer. The authors of the ReAct paper refer to these reasoning steps as “reasoning traces.”
Another approach to using LLMs in applications is to ask an LLM for directions on which actions to take, take those actions, and report the results of the actions back to the LLM. This action loop can be repeated.
The ReAct paper combines reasoning traces and action loops. To paraphrase the paper:
Large language models (LLMs) have shown impressive abilities in understanding language and making decisions. However, combining their capabilities for reasoning and for taking action is newer work with some promising results. Here we look at using LLMs to generate both reasoning traces and task-specific actions together. This allows for better synergy between the two: reasoning traces help the model create and update action plans, while actions let it gather more information from external sources. For question answering and fact verification tasks, ReAct avoids errors by using a simple Wikipedia API and generates human-like solutions. On interactive decision making tasks, ReAct has higher success rates compared to other methods, even with limited examples.
A ReAct prompt consists of example solutions to tasks, including reasoning traces, actions, and observations of the environment. ReAct prompting is easy to design and achieves excellent performance on various tasks, from answering questions to online shopping.
The ReAct paper serves as the basis for the design and implementation of support for LangChain agent tools. We look at an example application using a custom tool in the next section.
LangChain Agent Tool Example Using DBPedia SPARQL Queries
Before we look at the LangChain agent custom tool code, let’s look at some utility code from my Colab notebook Question Answering Example using DBPedia and SPARQL (link to shared Colab notebook). I extracted just the code we need into the file QA.py (edited to fit page width):
We use the spaCy library and a small spaCy NLP model that we set up in lines 3-5.
I have written two books dedicated to SPARQL queries as well as providing SPARQL overviews and examples in my Haskell, Common Lisp, and Hy Language books, so I am not going to repeat that discussion here. You can read the semantic web and SPARQL material free online using [this link](https://leanpub.com/lovinglisp/read#leanpub-auto-semantic-web-and-linked-data).
Lines 7-14 contain Python code for querying DBPedia.
Lines 16-25 use the spaCy library to identify both the entities in a user’s query as well as the entity type.
We define a SPARQL query template in lines 30-43 that uses Python F string variables name and dbpedia_type.
The function dbpedia_get_entities_by_name defined in lines 45-54 replaces variables with values in the SPARQL query template and makes a SPARQL query to DBPedia.
The function get_context_text, which is the function in this file that we will directly call later, is defined in lines 65-83. We get entities and entity types in line 63. We define an internal helper function in lines 65-77 that we will call once for each of the three DBPedia entity types that we use in this example (people, places, and organizations).
Here we use spaCy so install the library and a small NLP model:
The agent custom tool example is short so I list the source file custom_func_dbpedia.py first and then we will dive into the code (edited to fit page width):
The class GetContextTextFromDbPediaInput defined in lines 18-24 defines a tool input variable with an English language description of the variable that the LLM can use. The class GetContextTextFromDbPediaTool defined in lines 26-45 defines the tool name, a description for the use of an LLM, and the definition of the required method _run. Method _run uses the utility function get_context_text defined in the source file QA.py.
We define a GPT-3.5 model in lines 53-54. Our example only uses one tool (our custom tool). We define the tools list in line 56 and set up the agent in lines 58-60.
The example output is (edited to fit page width):
Another ReAct Agent Tool Use Example: Search and Math Expressions
The example for this section is found in langchain-book-examples/tool_search_math_example.
In this section, we explore the development of a multi-tool intelligent agent by leveraging the power of LangChain and OpenAI’s GPT models. With recent advancements in Large Language Models (LLMs) and agent frameworks, it’s possible to create interactive tools capable of dynamic actions like web searches or evaluating mathematical expressions. This script demonstrates how to integrate two example tools, a DuckDuckGo-based search engine and a simple calculator, into a ReAct-style agent capable of performing multi-step reasoning.
This example highlights the practicalities of creating a modular agent that combines natural language capabilities with functional tools. It illustrates the use of:
- LLMs for conversational abilities (via OpenAI’s GPT-4o mini model).
- Third-party API tools, like DuckDuckGo search, for retrieving real-time information.
- Custom tools, such as a calculator, for handling specific operations.
By the end of this chapter, readers will understand how to define custom tools, set up a prompt template, and integrate these tools into an interactive agent capable of multi-step tasks. This example demonstrates the potential of tool-enhanced AI agents to solve both simple and complex problems dynamically.
Tools Example Implementation
The script imports key modules from LangChain, OpenAI, and a DuckDuckGo wrapper to build the agent framework. The OpenAI API key is pulled from an environment variable to maintain security.
Defining Tools
The agent is equipped with two tools:
- DuckDuckGo Search Tool: Uses the DuckDuckGoSearchRun API to fetch real-time web search results.
- Simple Calculator Tool: Evaluates mathematical expressions using Python’s eval() function.
Each tool is encapsulated in its own class with a call() method to enable easy invocation.
The two tools are initialized using LangChain’s Tool class. Each tool is assigned a name, func, and description.
A prompt template defines how the agent should format its responses. The agent follows a structured reasoning process, using thought-action-observation loops. Note that this example is written in a general purpose style supporting chain of thought and memory but here the tool use examples are one-shot, not multiple prompts.
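A simplified sketch of the pieces described above (the repository version defines its own prompt template and tool classes; here I use LangChain's built-in zero-shot ReAct agent as a stand-in, and the model name gpt-4o-mini and the sample question are assumptions):

```python
# Multi-tool ReAct-style agent (sketch)
from langchain_openai import ChatOpenAI
from langchain.agents import Tool, initialize_agent, AgentType
from langchain_community.tools import DuckDuckGoSearchRun

class SimpleCalculator:
    def call(self, expression: str) -> str:
        try:
            return str(eval(expression))   # fine for a demo, not safe for untrusted input
        except Exception as e:
            return f"error: {e}"

search = DuckDuckGoSearchRun()
calculator = SimpleCalculator()

tools = [
    Tool(name="Search", func=search.run,
         description="Search the web for current information."),
    Tool(name="Calculator", func=calculator.call,
         description="Evaluate a simple math expression such as '2 * (3 + 4)'."),
]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
agent = initialize_agent(tools, llm, verbose=True,
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
print(agent.run("What year did the Apollo 11 mission land on the Moon, "
                "and what is that year multiplied by 2?"))
```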
Sample Output
LangChain Agent Tools Wrap Up
Writing custom agent tools is a great way to revisit the implementation of existing applications and improve them with the real world knowledge and reasoning abilities of LLMs. Both for practice in effectively using LLMs in your applications and also to extend your personal libraries and tools, I suggest that you look over your existing projects with an eye for either improving them using LLMs or refactoring them into reusable agent tools for your future projects.
Multi-prompt Search using LLMs, the Duckduckgo Search API, and Local Ollama Models
The short example we develop is inspired by commercial LLM apps like Perplexity. I subscribe to the Perplexity Pro plan and find it useful, so I wanted to implement my own simple minimalist Python library that performs the same kind of multiple LLM passes over a query and its search results, finally producing a relevant summary.
We will start by looking at example uses of this library and then, dear reader, you can decide if you want to hack on this example code and make it your own.
The example code uses three simple prompt templates used to filter out non-useful search results, to summarize the text from fetch search result web links, and to write a final summary:
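The templates below are illustrative placeholders for the three roles just described (the actual templates in the example code are different):

```python
# Illustrative prompt templates (placeholders, not the templates from the repository)
FILTER_TEMPLATE = """You are given a search query and a list of search results.
Return only the results that are directly useful for answering the query.

Query: {query}
Results: {results}"""

SUMMARIZE_TEMPLATE = """Summarize the following web page text in a few sentences,
keeping only information relevant to the query '{query}':

{page_text}"""

FINAL_SUMMARY_TEMPLATE = """Using the per-page summaries below, write a concise,
well organized answer to the query '{query}':

{summaries}"""
```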
Example 1: “Write a business plan for a new startup using LLMs and expertise in medical billing.“
The example code has a ton of debug printout so here we only look at the final output summary:
Example 2: “Common Lisp and Deep Learning consultant”
Here we only look at the final output summary:
Example Code for Multi-prompt Search using LLMs, the Duckduckgo Search API, and Local Ollama Models
The example code script uses LangChain’s Ollama interface, the Duckduckgo library for accessing Duckduckgo’s internal quick results data (for low bandwidth non-commercial use), and the Trafilatura library for fetching plain text from a web URI:
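A sketch of the moving parts, using the three libraries named above (the function and variable names are my own; the library in the book's repository adds debug output and the prompt templates discussed earlier):

```python
# Multi-prompt search (sketch): search, fetch page text, summarize, then combine
from duckduckgo_search import DDGS
import trafilatura
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")

def search_and_summarize(query, max_results=5):
    summaries = []
    for result in DDGS().text(query, max_results=max_results):
        downloaded = trafilatura.fetch_url(result["href"])
        page_text = trafilatura.extract(downloaded) if downloaded else None
        if not page_text:
            continue
        summaries.append(llm.invoke(
            f"Summarize this text as it relates to '{query}':\n\n{page_text[:4000]}"))
    return llm.invoke(
        f"Write a final summary answering '{query}' from these notes:\n\n"
        + "\n\n".join(summaries))

print(search_and_summarize("Common Lisp and Deep Learning consultant"))
```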
Here is an example use of this example code:
More Useful Libraries for Working with Unstructured Text Data
Here we look at examples using two libraries that I find useful for my work: EmbedChain and Kor.
EmbedChain Wrapper for LangChain Simplifies Application Development
Taranjeet Singh developed a very nice wrapper library EmbedChain https://github.com/embedchain/embedchain that simplifies writing “query your own data” applications by choosing good defaults for the LangChain library.
I will show one simple example that I run on my laptop to search the contents of all of the books I have written as well as a large number of research papers. You can find my example in the GitHub repository for this book in the directory langchain-book-examples/embedchain_test. As usual, you will need an OpenAI API account and set the environment variable OPENAI_API_KEY to the value of your key.
I have copied PDF files for all of this content to the directory ~/data on my laptop. It takes a short while to build a local vector embedding data store, so I use two Python scripts. The first script, process_pdfs.py, is shown here:
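In sketch form (the add() call signature has changed across EmbedChain versions; this form matches recent releases and assumes the default local vector store, which persists between runs):

```python
# process_pdfs.py (sketch): add every PDF under ~/data to the local EmbedChain store
from pathlib import Path
from embedchain import App

app = App()
for pdf_path in Path("~/data").expanduser().glob("*.pdf"):
    print("adding:", pdf_path)
    app.add(str(pdf_path), data_type="pdf_file")
```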
Here is a demo Python script app.py that makes three queries:
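In sketch form, reusing the store created by process_pdfs.py (the three questions are placeholders):

```python
# app.py (sketch): make three queries against the previously built store
from embedchain import App

app = App()
for question in [
    "What is retrieval augmented generation?",
    "How can LLMs be used to summarize text?",
    "What are good techniques for extracting relationships from text?",
]:
    print(question)
    print(app.query(question), "\n")
```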
The output looks like:
Kor Library
The Kor library was written by Eugene Yurtsev. Kor is useful for using LLMs to extract structured data from unstructured text. Kor works by generating appropriate prompt text to explain to GPT-3.5 what information to extract and adding in the text to be processed.
The GitHub repository for Kor is under active development so please check the project for updates. Here is the documentation.
For the following example, I modified an example in the Kor documentation for extracting dates in text.
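In sketch form (simplified from the style of the Kor documentation; the exact schema, examples, and test text in the book's code differ):

```python
# Kor date extraction (sketch)
from langchain.chat_models import ChatOpenAI
from kor import create_extraction_chain, Object, Text

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

schema = Object(
    id="dates",
    description="Dates mentioned in the text",
    attributes=[Text(id="date", description="A date, as written in the text")],
    examples=[
        ("The meeting was moved from June 1, 2023 to July 15, 2023.",
         [{"date": "June 1, 2023"}, {"date": "July 15, 2023"}]),
    ],
)

chain = create_extraction_chain(llm, schema)
print(chain.run("I was born on January 1, 1960 and I retired on March 3, 2020."))
```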
Sample output:
Kor is a library focused on extracting structured data from text. You can get the same effect by writing your own prompts manually for GPT style LLMs, but using Kor can save development time.
Book Wrap Up
This book has been fun to write but it has also been somewhat frustrating.
It was fun because I have never been as excited by new technology as I have by LLMs and utility software like LangChain and LlamaIndex for building personalized applications.
This book was frustrating in the sense that it is now so very easy to build applications that just a few years ago would have been impossible to write. Usually when I write books I have two criteria: I only write about things that I am personally interested in and use, and I also hope to figure out non-obvious edge cases and make it easier for my readers to use new tech. Here my frustration is that I am writing about something that is increasingly simple to do, so I feel like the value I add is diminished.
All that said I hope, dear reader, that you found this book to be worth your time reading.
What am I going to do next? Although I am not fond of programming in JavaScript (though I find TypeScript to be somewhat better), I want to explore the possibilities of writing an open source Persistent Storage Web App Personal Knowledge Base Management System. I might get pushback on this but I would probably make it Apple Safari specific so I can use Apple’s CloudKit JS to make its use seamless across macOS, iPadOS, and iOS. If I get the right kind of feedback on social media I might write a book around this project.
Thank you for reading my book!
Best regards, Mark Watson