Ollama is an open-source framework that enables users to run
large language models (LLMs) locally on their computers, facilitating tasks like text summarization, chatbot development, and more. It supports various models, including Llama 3, Mistral, and Gemma, and offers flexibility in model sizes and quantization options to balance performance and resource usage. Ollama provides a command-line interface and an HTTP API for seamless integration into applications, making advanced AI capabilities accessible without relying on cloud services. Ollama is available on macOS,
Linux, and Windows.
A main theme of this book is the advantage of running models privately on either your personal computer or a computer at work.
While many commercial LLM API vendors offer options to not use your prompt data and the output generated from your prompts to train their systems,
there is no better privacy and security than running open weight
models on your own hardware.
This book is about running Large Language Models (LLMs)
on your own hardware using Ollama. We will be using both the Ollama Python SDK library’s native support for passing text and images to LLMs as well as Ollama’s OpenAI API compatibility layer that lets you take any of the projects you may already run using OpenAI’s APIs and port them easily to run locally on Ollama.
To be clear, dear reader, although I have a strong preference for running smaller LLMs on my own hardware, I also frequently use commercial LLM API vendors like Anthropic,
OpenAI, ABACUS.AI, GROQ, and Google to take advantage of features like advanced models and scalability using cloud-based hardware.
About the Author
I am an AI practitioner and consultant specializing in large language models, LangChain/Llama-Index integrations, deep learning, and the semantic web. I have authored over 20 books on topics including artificial intelligence, Python, Common Lisp, deep learning, Haskell, Clojure, Java, Ruby, the Hy language, and the semantic web. I have 55 U.S. patents. Please check out my home page and social media: my personal web site https://markwatson.com, X/Twitter, my blog on Blogspot, and my blog on Substack.
Running local models using tools like Ollama can enhance privacy when dealing with sensitive data. Let’s delve into why privacy is crucial and how Ollama contributes to improved security.
Why is privacy important?
Privacy is paramount for several reasons:
Protection from Data Breaches: When data is processed by third-party services, it becomes vulnerable to potential data breaches. Storing and processing data locally minimizes this risk significantly. This is especially critical for sensitive information like personal details, financial records, or proprietary business data.
Compliance with Regulations: Many industries are subject to stringent data privacy regulations, such as GDPR, HIPAA, and CCPA. Running models locally can help organizations maintain compliance by ensuring data remains under their control.
Maintaining Confidentiality: For certain applications, like handling legal documents or medical records, maintaining confidentiality is of utmost importance. Local processing ensures that sensitive data isn’t exposed to external parties.
Data Ownership and Control: Individuals and organizations have a right to control their own data. Local models empower users to maintain ownership and make informed decisions about how their data is used and shared.
Preventing Misuse: By keeping data local, you reduce the risk of it being misused by third parties for unintended purposes, such as targeted advertising, profiling, or even malicious activities.
Security Improvements with Ollama
Ollama, as a tool for running large language models (LLMs) locally, offers several security advantages:
Data Stays Local: Ollama allows you to run models on your own hardware, meaning your data never leaves your local environment. This eliminates the need to send data to external servers for processing.
Reduced Attack Surface: By avoiding external communication for model inference, you significantly reduce the potential attack surface for malicious actors. There’s no need to worry about vulnerabilities in third-party APIs or network security.
Control over Model Access: With Ollama, you have complete control over who has access to your models and data. This is crucial for preventing unauthorized access and ensuring data security.
Transparency and Auditability: Running models locally provides greater transparency into the processing pipeline. You can monitor and audit the model’s behavior more easily, ensuring it operates as intended.
Customization and Flexibility: Ollama allows you to customize your local environment and security settings according to your specific needs. This level of control is often not possible with cloud-based solutions.
It’s important to note that while Ollama enhances privacy and security, it’s still crucial to follow general security best practices for your local environment. This includes keeping your operating system and software updated, using strong passwords, and implementing appropriate firewall rules.
Setting Up Your Computing Environment for Using Ollama and the Book Example Programs
There is a GitHub repository that I have prepared for you, dear reader, both to support working through the examples in this book and, hopefully, to provide utilities for your own projects.
You need to git clone the following repository:
https://github.com/mark-watson/OllamaExamples that contains tools I have written in Python that you can use with Ollama as well as utilities I wrote to avoid repeated code in the book examples. There are also application level example files that have the string “example” in the file names. Tool library files begin with “tool” and files starting with “Agent” contain one of several approaches to writing Agents.
Python Build Tools
The requirements.txt file contains the library requirements for all code developed in this book. My preference is to use venv and maintain a separate Python environment for each of the few hundred Python projects I have on my laptop. I keep a personal directory ~/bin on my PATH and I use the following script venv_setup.sh in the ~/bin directory to use a requirements.txt file to set up a virtual environment:
I sometimes like to use the much faster uv build and package management tool:
There are many other good options like Anaconda, miniconda, poetry, etc.
Using Ollama From the Command Line
Working with Ollama from the command line provides a straightforward and efficient way to interact with large language models locally. The basic command structure starts with ollama run modelname, where modelname could be a model like 'llama3', 'mistral', or 'codellama'. You can pass a prompt directly as a command line argument, pipe in a file using shell redirection, and add the --verbose flag to see token usage and generation metrics. For example, ollama run llama2 "Your question here" answers a single question and exits, which is convenient for scripting.
One powerful technique is using Ollama’s model tags to maintain different versions or configurations of the same base model. For any model on the Ollama web site, you can view all available model tags, for example: https://ollama.com/library/llama2/tags.
The ollama list command helps you track installed models, and ollama rm modelname keeps your system clean. For development work, the --format json flag outputs responses in JSON format, making it easier to parse in scripts or applications; for example:
Using JSON Format
Analysis of Images
Advanced users can leverage Ollama’s multimodal capabilities and streaming options. For models like llava, you can pipe in image files using standard input or file paths. For example:
While I only cover command line use in this one short chapter, I use Ollama in command line mode several hours a week for software development, usually using a Qwen coding LLM:
I find that the qwen2.5-coder:14b model performs well for my most often used programming languages: Python, Common Lisp, Racket Scheme, and Haskell.
I also enjoy experimenting with the QwQ reasoning model even though it is so large it barely runs on my 32G M2 Pro system:
Analysis of Source Code Files
Here, assuming we are in the main directory of the GitHub repository for this book, we can ask for an analysis of the tool for using SQLite databases (most output is not shown):
Unfortunately, when using the command ollama run qwen2.5-coder:14b < tool_sqlite.py, Ollama processes the input from the file and then exits the REPL. There’s no built-in way to stay in the Ollama REPL. However, if you want to analyze code and then interactively chat about the code, ask for code modifications, etc., you can try:
Start Ollama:
Paste the source code of tool_sqlite.py into the Ollama REPL
Ask for advice, for example: “Please add code to print out the number of input and output tokens that are used by Ollama when calling function_caller.process_request(query)”
Short Examples
Here we look at a few short examples before moving on to the libraries we develop later in the book and to longer, application-style example programs that use Ollama to solve more difficult problems.
Using The Ollama Python SDK with Image and Text Prompts
We saw an example of image processing in the last chapter using Ollama command line mode. Here we do the same thing using a short Python script that you can find in the file short_programs/Ollama_sdk_image_example.py:
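The repository has the actual listing; as a rough sketch of the idea (not the author's exact file, and the image path here is hypothetical), passing an image and a text prompt to a multimodal model with the Ollama Python SDK can look like this:

```python
# Hypothetical sketch, not the repository file: send an image plus a text
# prompt to a multimodal model using the Ollama Python SDK.
import ollama

response = ollama.chat(
    model="llava:7b",  # any multimodal model you have pulled locally
    messages=[
        {
            "role": "user",
            "content": "Describe what you see in this image.",
            "images": ["data/sample_image.jpg"],  # hypothetical image path
        }
    ],
)
print(response["message"]["content"])
```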
The output may look like the following when you run this example:
Using the OpenAI Compatibility APIs with Local Models Running on Ollama
If you frequently use the OpenAI APIs for either your own LLM projects or work projects, you might want to simply use the same SDK library from OpenAI but specify a local Ollama REST endpoint:
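The following is only a minimal sketch of the idea (the book's repository contains the full example); it assumes the OpenAI Python SDK pointed at Ollama's OpenAI compatible endpoint on the default port:

```python
# Minimal sketch: use the OpenAI Python SDK against a local Ollama server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the SDK but ignored by Ollama
)

completion = client.chat.completions.create(
    model="llama3.2:latest",  # any model you have pulled locally
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain what a GGUF file is in two sentences."},
    ],
)
print(completion.choices[0].message.content)
```

The only changes from typical OpenAI client code are the base_url and the placeholder API key, which is why porting existing OpenAI projects to local models is usually straightforward.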
The output might look like (following listing is edited for brevity):
In the next chapter we start developing tools that can be used for “function calling” with Ollama.
LLM Tool Calling with Ollama
There are several example Python tool utilities in the GitHub repository https://github.com/mark-watson/OllamaExamples that we will use for function calling that start with the “tool” prefix:
We postpone using the tools tool_llm_eval.py and tool_judge_results.py until the next chapter, Automatic Evaluation of LLM Results.
If you have not done so yet, please clone the repository for my Ollama book examples using:
Use of Python docstrings at runtime:
The Ollama Python SDK leverages docstrings as a crucial part of its runtime function calling mechanism. When defining functions that will be called by the LLM, the docstrings serve as structured metadata that gets parsed and converted into a JSON schema format. This schema describes the function’s parameters, their types, and expected behavior, which is then used by the model to understand how to properly invoke the function. The docstrings follow a specific format that includes parameter descriptions, type hints, and return value specifications, allowing the SDK to automatically generate the necessary function signatures that the LLM can understand and work with.
During runtime execution, when the LLM determines it needs to call a function, it first reads these docstring-derived schemas to understand the function’s interface. The SDK parses these docstrings using Python’s introspection capabilities (through the inspect module) and matches the LLM’s intended function call with the appropriate implementation. This system allows for a clean separation between the function’s implementation and its interface description, while maintaining human-readable documentation that serves both as API documentation and runtime function calling specifications. The docstring parsing is done lazily at runtime when the function is first accessed, and the resulting schema is typically cached to improve performance in subsequent calls.
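As an illustration of the docstring format (this is a hypothetical tool, not one from the repository), a function intended for Ollama tool calling might be written like this:

```python
# Hypothetical tool function: the docstring supplies the metadata that the
# Ollama Python SDK converts into a JSON schema for the model.
def get_word_count(file_path: str, ignore_blank_lines: bool = True) -> str:
    """
    Count the number of words in a text file.

    Args:
        file_path: Path to the text file to analyze.
        ignore_blank_lines: If True, blank lines are skipped before counting.

    Returns:
        A human-readable string describing the word count.
    """
    with open(file_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    if ignore_blank_lines:
        lines = [line for line in lines if line.strip()]
    count = sum(len(line.split()) for line in lines)
    return f"The file {file_path} contains {count} words."
```

In recent versions of the Ollama Python SDK, passing such a function in the tools list of ollama.chat is enough for the SDK to derive the schema from the type hints and the Args section of the docstring.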
Example Showing the Use of Tools Developed Later in this Chapter
The source file ollama_tools_examples.py contains simple examples of using the tools we develop later in this chapter. We will look at example code using the tools, then at the implementation of the tools. In this example source file we first import these tools:
This code demonstrates the integration of a local LLM with custom tool functions for file system operations and web content processing. It imports three utility functions for listing directories, reading file contents, and converting URLs to markdown, then maps them to a dictionary for easy access.
The main execution flow involves sending a user prompt to the Ollama hosted model (here we are using the small IBM “granite3-dense” model), which requests directory listing, file reading, and URL conversion operations. The code then processes the model’s response by iterating through any tool calls returned, executing the corresponding functions, and printing their results. Error handling is included for cases where requested functions aren’t found in the available tools dictionary.
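The repository file contains the full example; the overall shape of the call-and-dispatch loop, sketched from the description above (the prompt wording is illustrative, and the module name for the web tool is a guess), is roughly:

```python
# Rough sketch of the flow in ollama_tools_examples.py (not the exact listing).
import ollama
from tool_file_dir import list_directory            # repository module
from tool_file_contents import read_file_contents   # repository module
from tool_web_search import uri_to_markdown         # hypothetical module name

available_functions = {
    "list_directory": list_directory,
    "read_file_contents": read_file_contents,
    "uri_to_markdown": uri_to_markdown,
}

user_prompt = (
    "List the files in the current directory, read requirements.txt, "
    "and convert https://markwatson.com to markdown."
)

response = ollama.chat(
    model="granite3-dense",
    messages=[{"role": "user", "content": user_prompt}],
    tools=list(available_functions.values()),  # recent SDK versions accept Python functions
)

for tool_call in response.message.tool_calls or []:
    fn = available_functions.get(tool_call.function.name)
    if fn is None:
        print(f"Function {tool_call.function.name} not found")
        continue
    result = fn(**tool_call.function.arguments)
    print(f"{tool_call.function.name} ->\n{result}\n")
```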
Here is sample output from using these three tools (most output removed for brevity and blank lines added for clarity):
Please note that the text extracted from a web page is mostly plain text. Section heads are maintained but the format is changed to markdown format. In the last (edited for brevity) listing, the HTML H1 element with the text Fun Stuff is converted to markdown:
You have now looked at example tool use. We will now implement several tools in this chapter and the next. We will look at the first tool for reading and writing files in fine detail and then more briefly discuss the other tools in the https://github.com/mark-watson/OllamaExamples repository.
Tool for Reading and Writing File Contents
This tool is meant to be combined with other tools, for example a summarization tool and a file reading tool might be used to process a user prompt to summarize a specific local file on your laptop.
Here is the contents of tool utility tool_file_contents.py:
read_file_contents
This function provides file reading capabilities with robust error handling with parameters:
file_path (str): Path to the file to read
encoding (str, optional): File encoding (defaults to “utf-8”)
Features:
Uses pathlib.Path for cross-platform path handling
Checks file existence before attempting to read
Returns file contents with descriptive message
Comprehensive error handling
LLM Integration:
Includes metadata for function discovery
Returns descriptive string responses instead of raising exceptions
write_file_contents
This function handles file writing operations with built-in safety features. The parameters are:
file_path (str): Path to the file to write
content (str): Content to write to the file
encoding (str, optional): File encoding (defaults to “utf-8”)
mode (str, optional): Write mode (‘w’ for write, ‘a’ for append)
Features:
Automatically creates parent directories
Supports write and append modes
Uses context managers for safe file handling
Returns operation status messages
LLM Integration:
Includes detailed metadata for function calling
Provides clear feedback about operations
Common Features of both functions:
Type hints for better code clarity
Detailed docstrings that are used at runtime by the tool/function calling code. The text in the docstrings is supplied as context to the LLM currently in use.
Proper error handling
UTF-8 default encoding
Context managers for file operations
Metadata for LLM function discovery
Design Benefits for LLM Integration: the utilities are optimized for LLM function calling by:
Returning descriptive string responses
Including metadata for function discovery
Handling errors gracefully
Providing clear operation feedback
Using consistent parameter patterns
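The actual implementation is in tool_file_contents.py in the repository; the following is only a condensed, hypothetical sketch of the two functions as described above, with the LLM metadata omitted and error messages worded differently:

```python
# Condensed sketch of read_file_contents / write_file_contents as described
# above; see tool_file_contents.py in the repository for the real code.
from pathlib import Path

def read_file_contents(file_path: str, encoding: str = "utf-8") -> str:
    """
    Read and return the contents of a text file.

    Args:
        file_path: Path to the file to read.
        encoding: File encoding (defaults to "utf-8").

    Returns:
        A descriptive string containing the file contents or an error message.
    """
    path = Path(file_path)
    if not path.exists():
        return f"Error: file {file_path} does not exist"
    try:
        contents = path.read_text(encoding=encoding)
        return f"Contents of {file_path}:\n{contents}"
    except Exception as e:  # return a message rather than raising, for LLM use
        return f"Error reading {file_path}: {e}"

def write_file_contents(file_path: str, content: str,
                        encoding: str = "utf-8", mode: str = "w") -> str:
    """
    Write (or append) content to a text file, creating parent directories.

    Args:
        file_path: Path to the file to write.
        content: Content to write to the file.
        encoding: File encoding (defaults to "utf-8").
        mode: "w" to overwrite or "a" to append.

    Returns:
        A status message describing the result of the operation.
    """
    path = Path(file_path)
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, mode, encoding=encoding) as f:
            f.write(content)
        return f"Successfully wrote {len(content)} characters to {file_path}"
    except Exception as e:
        return f"Error writing {file_path}: {e}"
```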
Tool for Getting File Directory Contents
This tool is similar to the last tool so here we just list the worker function from the file tool_file_dir.py:
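A minimal, hypothetical sketch of such a worker function follows; the repository version differs in details such as the exact parameters and return format:

```python
# Minimal sketch of a directory-listing tool; see tool_file_dir.py for the
# repository implementation.
import os

def list_directory(path: str = ".", show_hidden: bool = False) -> str:
    """
    List the files and subdirectories in a directory.

    Args:
        path: Directory to list (defaults to the current directory).
        show_hidden: If True, include entries starting with a dot.

    Returns:
        A descriptive string listing the directory contents.
    """
    try:
        entries = sorted(os.listdir(path))
        if not show_hidden:
            entries = [e for e in entries if not e.startswith(".")]
        return f"Contents of {path}: " + ", ".join(entries)
    except Exception as e:
        return f"Error listing {path}: {e}"
```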
Tool for Accessing SQLite Databases Using Natural Language Queries
The example file tool_sqlite.py serves two purposes here:
Test and example code: utility function _create_sample_data creates several database tables and the function main serves as an example program.
The Python class definitions SQLiteTool and OllamaFunctionCaller are meant to be copied and used in your applications.
This code provides a natural language interface for interacting with an SQLite database. It uses a combination of Python classes, SQLite, and Ollama for running a language model to interpret user queries and execute corresponding database operations. Below is a breakdown of the code:
Database Setup and Error Handling: a custom exception class, DatabaseError, is defined to handle database-specific errors. The database is initialized with three tables: example, users, and products. These tables are populated with sample data for demonstration purposes.
SQLiteTool Class: the SQLiteTool class is a singleton that manages all SQLite database operations. Key features include:
Singleton Pattern: Ensures only one instance of the class is created.
Database Initialization: Creates tables (example, users, products) if they do not already exist.
Sample Data: Populates the tables with predefined sample data.
Context Manager: Safely manages database connections using a context manager.
Utility Methods:
get_tables: Retrieves a list of all tables in the database.
get_table_schema: Retrieves the schema of a specific table.
execute_query: Executes a given SQL query and returns the results.
Sample Data Creation:
A helper function, _create_sample_data, is used to populate the database with sample data. It inserts records into the example, users, and products tables. This ensures the database has some initial data for testing and demonstration.
OllamaFunctionCaller Class:
The OllamaFunctionCaller class acts as the interface between natural language queries and database operations. Key components include:
Integration with Ollama LLM: Uses the Ollama language model to interpret natural language queries.
Function Definitions: Defines two main functions:
query_database: Executes SQL queries on the database.
list_tables: Lists all tables in the database.
Prompt Generation: Converts user input into a structured prompt for the language model.
Response Parsing: Parses the language model’s response into a JSON object that specifies the function to call and its parameters.
Request Processing: Executes the appropriate database operation based on the parsed response.
Function Definitions:
The OllamaFunctionCaller class defines two main functions that can be called based on user input:
query_database: Executes a SQL query provided by the user and returns the results of the query.
list_tables: Lists all tables in the database and is useful for understanding the database structure.
Request Processing Workflow:
The process_request method handles the entire workflow of processing a user query:
Input: Takes a natural language query from the user.
Prompt Generation: Converts the query into a structured prompt for the Ollama language model.
Response Parsing: Parses the language model’s response into a JSON object.
Function Execution: Calls the appropriate function (query_database or list_tables) based on the parsed response.
Output: Returns the results of the database operation.
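The full class is in tool_sqlite.py; stripped of the supporting code, the heart of process_request follows the shape sketched below. The prompt wording, model choice, and JSON parsing here are illustrative, not the repository code:

```python
# Illustrative sketch of the prompt / parse / dispatch cycle described above.
# The real OllamaFunctionCaller in tool_sqlite.py is more complete.
import json
import ollama

def process_request(user_query: str, sqlite_tool) -> str:
    prompt = (
        "You translate user requests into database operations.\n"
        "Respond with JSON only, in the form:\n"
        '{"function": "query_database", "parameters": {"sql": "..."}}\n'
        'or {"function": "list_tables", "parameters": {}}\n\n'
        f"User request: {user_query}"
    )
    response = ollama.chat(
        model="llama3.2:latest",   # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        call = json.loads(response.message.content)
    except json.JSONDecodeError:
        return "Could not parse the model's response as JSON"

    if call["function"] == "list_tables":
        return str(sqlite_tool.get_tables())
    elif call["function"] == "query_database":
        return str(sqlite_tool.execute_query(call["parameters"]["sql"]))
    return f"Unknown function: {call['function']}"
```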
Main test/example function:
The main function demonstrates how the system works with sample queries. It initializes the OllamaFunctionCaller and processes a list of example queries, such as:
“Show me all tables in the database.”
“Get all users from the users table.”
“What are the top 5 products by price?”
For each query, the system interprets the natural language input, executes the corresponding database operation, and prints the results.
Summary:
This code creates a natural language interface for interacting with an SQLite database. It works as follows:
Database Management: The SQLiteTool class handles all database operations, including initialization, querying, and schema inspection.
Natural Language Processing: The OllamaFunctionCaller uses the Ollama language model to interpret user queries and map them to database functions.
Execution: The system executes the appropriate database operation and returns the results to the user.
This approach allows users to interact with the database using natural language instead of writing SQL queries directly, making it more user-friendly and accessible.
The output looks like this:
Tool for Summarizing Text
Tools that are used by LLMs can themselves also use other LLMs. The tool defined in the file tool_summarize_text.py might be triggered by a user prompt such as “summarize the text in local file test1.txt” or “summarize text from web page https://markwatson.com”, where it is used together with other tools for reading a local file’s contents, fetching a web page, etc.
We will start by looking at the file tool_summarize_text.py and then look at an example in example_chain_web_summary.py.
This Python code implements a text summarization tool using the Ollama chat model. The core function summarize_text takes two parameters: the main text to summarize and an optional context string. The function operates by constructing a prompt that instructs the model to provide a concise summary without additional commentary. It includes logic where, if the input text is very short (less than 50 characters), it defaults to using the context parameter instead. Additionally, if there is substantial context provided (more than 50 characters), it prepends this context to the prompt. The function utilizes the Ollama chat model “llama3.2:latest” to generate the summary, structuring the request with a system message containing the prompt and a user message containing the text to be summarized. The program includes metadata for Ollama integration, specifying the function name, description, and parameter details, and exports the summarize_text function through __all__.
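A condensed sketch of the function just described follows; the repository file contains the complete version along with its Ollama metadata and __all__ export:

```python
# Condensed sketch of summarize_text as described above; the repository
# version in tool_summarize_text.py also defines Ollama metadata.
import ollama

def summarize_text(text: str, context: str = "") -> str:
    prompt = "Provide a concise summary of the text. Do not add commentary."
    if len(text) < 50 and context:
        text = context          # fall back to the context for very short inputs
    elif len(context) > 50:
        prompt = f"Context: {context}\n\n{prompt}"
    response = ollama.chat(
        model="llama3.2:latest",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
    )
    return response.message.content
```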
Here is an example of using this tool that you can find in the file example_chain_web_summary.py. Please note that this example also uses the web search tool that is discussed in the next section.
Here is the output edited for brevity:
Tool for Web Search and Fetching Web Pages
This code provides a set of functions for web searching and HTML content processing, with the main functions being uri_to_markdown, search_web, brave_search_summaries, and brave_search_text. The uri_to_markdown function fetches content from a given URI and converts HTML to markdown-style text, handling various edge cases and cleaning up the text by removing multiple blank lines and spaces while converting HTML entities. The search_web function is a placeholder that’s meant to be implemented with a preferred search API, while brave_search_summaries implements actual web searching using the Brave Search API, requiring an API key from the environment variables and returning structured results including titles, URLs, and descriptions. The brave_search_text function builds upon brave_search_summaries by fetching search results and then using uri_to_markdown to convert the content of each result URL to text, followed by summarizing the content using a separate summarize_text function. The code also includes utility functions like replace_html_tags_with_text which uses BeautifulSoup to strip HTML tags and return plain text, and includes proper error handling, logging, and type hints throughout. The module is designed to be integrated with Ollama and exports uri_to_markdown and search_web as its primary interfaces.
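As an illustration of the uri_to_markdown part of this module (a simplified sketch under my own assumptions, not the repository code), fetching a page and reducing it to markdown-style text might look like:

```python
# Simplified sketch of uri_to_markdown: fetch a page and reduce it to
# markdown-style text. The repository version handles more edge cases.
import re
import requests
from bs4 import BeautifulSoup

def uri_to_markdown(uri: str) -> str:
    response = requests.get(uri, timeout=20)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Convert H1/H2 headings to markdown headers, keep the rest as plain text.
    for level in (1, 2):
        for h in soup.find_all(f"h{level}"):
            h.replace_with(f"\n{'#' * level} {h.get_text(strip=True)}\n")

    text = soup.get_text(separator="\n")
    text = re.sub(r"\n{3,}", "\n\n", text)   # collapse multiple blank lines
    text = re.sub(r"[ \t]{2,}", " ", text)   # collapse runs of spaces
    return text.strip()
```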
Tools Wrap Up
We have looked at the implementations and example uses of several tools. In the next chapter we continue our study of tool use with the application of judging the accuracy of output generated by LLMs: basically, LLMs judging the accuracy of other LLMs to reduce hallucinations, inaccurate output, etc.
Automatic Evaluation of LLM Results: More Tool Examples
As Large Language Models (LLMs) become increasingly integrated into production systems and workflows, the ability to systematically evaluate their performance becomes crucial. While qualitative assessment of LLM outputs remains important, organizations need robust, quantitative methods to measure and compare model performance across different prompts, use cases, and deployment scenarios. This has led to the development of specialized tools and frameworks designed specifically for LLM evaluation.
The evaluation of LLM outputs presents unique challenges that set it apart from traditional natural language processing metrics. Unlike straightforward classification or translation tasks, LLM responses often require assessment across multiple dimensions, including factual accuracy, relevance, coherence, creativity, and adherence to specified formats or constraints. Furthermore, the stochastic nature of LLM outputs means that the same prompt can generate different responses across multiple runs, necessitating evaluation methods that can account for this variability.
Modern LLM evaluation tools address these challenges through a combination of automated metrics, human-in-the-loop validation, and specialized frameworks for prompt testing and response analysis. These tools can help developers and researchers understand how well their prompts perform, identify potential failure modes, and optimize prompt engineering strategies. By providing quantitative insights into LLM performance, these evaluation tools enable more informed decisions about model selection, prompt design, and system architecture in LLM-powered applications.
In this chapter we take a simple approach:
Capture the chat history including output for an interaction with a LLM.
Generate a prompt containing the chat history, model output, and a request to a different LLM to evaluate the output generated by the first LLM. We request that the final output of the second LLM be a score of ‘G’ or ‘B’ (good or bad) judging the accuracy of the first LLM’s output.
We look at several examples in this chapter of approaches you might want to experiment with.
Tool For Judging LLM Results
Here we implement our simple approach of using a second LLM to evaluate the output of the first LLM that generated a response to user input.
The following listing shows the tool tool_judge_results.py:
This Python code defines a function judge_results that takes an original prompt sent to a Large Language Model (LLM) and the generated response from the LLM, then attempts to judge the accuracy of the response.
Here’s a breakdown of the code:
The main function judge_results takes two parameters:
original_prompt: The initial prompt sent to an LLM
llm_gen_results: The output from the LLM that needs evaluation
The function judge_results returns a dictionary with two keys:
judgement: Single character (‘B’ for Bad, ‘G’ for Good, ‘E’ for Error)
reasoning: Detailed explanation of the judgment
The evaluation process is:
Creates a conversation with two messages:
System message: Sets the context for evaluation
User message: Combines the original prompt and results for evaluation
Uses the Qwen 2.5 Coder (14B parameter) model through Ollama
Expects a Y/N response at the end of the evaluation
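The listing itself is in tool_judge_results.py; as a hedged sketch of the evaluation call described above (the prompt wording and the mapping from the model's final Y/N answer to the 'G'/'B' judgement are illustrative):

```python
# Illustrative sketch of judge_results; see tool_judge_results.py for the
# actual prompt wording and response parsing.
import ollama

def judge_results(original_prompt: str, llm_gen_results: str) -> dict:
    messages = [
        {"role": "system",
         "content": "You judge whether an LLM response correctly answers a prompt. "
                    "Explain your reasoning, then end with a single Y or N."},
        {"role": "user",
         "content": f"Original prompt:\n{original_prompt}\n\n"
                    f"Generated response:\n{llm_gen_results}\n\n"
                    "Is the response accurate?"},
    ]
    try:
        response = ollama.chat(model="qwen2.5-coder:14b", messages=messages)
        reasoning = response.message.content.strip()
        verdict = reasoning.rstrip(". ").split()[-1].upper()
        judgement = "G" if verdict.startswith("Y") else "B"
        return {"judgement": judgement, "reasoning": reasoning}
    except Exception as e:
        return {"judgement": "E", "reasoning": str(e)}
```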
Sample output
Evaluating LLM Responses Given a Chat History
Here we try a different approach by asking the second “judge” LLM to evaluate the output of the first LLM based on specific criteria like “Response accuracy”, “Helpfulness”, etc.
The following listing shows the tool utility tool_llm_eval.py:
We will use these five evaluation criteria:
Response accuracy
Coherence and clarity
Helpfulness
Task completion
Natural conversation flow
The main function evaluate_llm_conversation uses these steps:
Receives chat history and optional parameters
Formats the conversation into a readable string
Creates a detailed evaluation prompt
Sends prompt to Ollama for evaluation
Cleans and parses the response
Returns structured evaluation results
Sample Output
A Tool for Detecting Hallucinations
Here we use a text template file templates/anti_hallucinations.txt to define the prompt template for checking a user input, a context, and the resulting output by another LLM (most of the file is not shown for brevity):
Here is the tool tool_anti_hallucination.py that uses this template:
This code implements a hallucination detection system for Large Language Models (LLMs) using the Ollama framework. The core functionality revolves around the detect_hallucination function, which takes three parameters: user input, context, and LLM output, and evaluates whether the output contains hallucinated content by utilizing another LLM (llama3.2) as a judge. The system reads a template from a file to structure the evaluation prompt.
The implementation includes type hints and error handling, particularly for JSON parsing of the response. The output is structured as a JSON object containing a hallucination score (between 0.0 and 1.0) and a list of reasoning points. The code also includes a test harness that demonstrates the system’s usage with a mathematical example, checking for accuracy in age difference calculations. The modular design allows for easy integration into larger systems through the explicit export of the detect_hallucination function.
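A rough sketch of detect_hallucination as described follows; the real prompt wording lives in the template file, and the placeholder names and JSON field names shown here are my assumptions:

```python
# Rough sketch of detect_hallucination; the repository version reads the full
# prompt template from templates/anti_hallucinations.txt.
import json
import ollama

def detect_hallucination(user_input: str, context: str, llm_output: str) -> dict:
    with open("templates/anti_hallucinations.txt", "r", encoding="utf-8") as f:
        template = f.read()
    # Assumes the template uses named placeholders like {user_input}.
    prompt = template.format(
        user_input=user_input, context=context, llm_output=llm_output
    )
    response = ollama.chat(
        model="llama3.2:latest",
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        # Expected JSON shape: {"score": 0.0-1.0, "reasoning": ["...", ...]}
        return json.loads(response.message.content)
    except json.JSONDecodeError:
        return {"score": 1.0, "reasoning": ["Could not parse model output"]}
```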
The output looks something like this:
Wrap Up
Here we looked at several examples for using one LLM to rate the accuracy, usefulness, etc. of another LLM given an input prompt. There are two topics in this book that I spend most of my personal LLM research time on: automatic evaluation of LLM results, and tool using agents (the subject of the next chapter).
Building Agents with Ollama and the Hugging Face smolagents Library
We have seen a few useful examples of tool use (function calling) and now we will build on tool use to build both single agents and multi-agent systems. There are commercial and open source resources to build agents, CrewAI and LangGraph being popular choices. We will follow a different learning path here, preferring to use the smolagents library. Please bookmark https://github.com/huggingface/smolagents for reference while working through this chapter.
Each example program and utility for this chapter uses the prefix smolagents_ in the Python file name.
Note: We are using the 2 GB model Llama3.2:latest here. Different models support tools and agents differently.
Choosing Specific LLMs for Writing Agents
As agents perform tasks like interpreting user input, carrying out Chain of Thought (CoT) reasoning, observing the output from tool calls, and following plan steps one by one, LLM errors, hallucinations, and inconsistencies accumulate. When using Ollama we prefer the most powerful models that we can run on our hardware.
Here we use Llama3.2:latest, which is recognized for its function calling capabilities, facilitating seamless integration with various tools.
As you work through the examples here using different local models running on Ollama, you might encounter compounding error problems. When I am experimenting with ideas for implementing agents, I sometimes keep two versions of my code: one for a local model and one using either of the commercial models GPT-4o or Claude Sonnet 3.5. Comparing the same agent setup using different models might provide some insight into whether runtime agent problems are caused by your code or by the model you are using.
Installation notes
As I write this chapter on January 2, 2025, smolagents needs to be run with an older version of Python:
The first two lines of the requirements.txt file specify the smolagents specific requirements:
Overview of the Hugging Face smolagents Library
The smolagents library https://github.com/huggingface/smolagents is built around a minimalist and modular architecture that emphasizes simplicity and composability. The core components are cleanly separated into the file agents.py for agent definitions, tools.py for tool implementations, and related support files. This design philosophy allows developers to easily understand, extend, and customize the components while maintaining a small codebase footprint - true to the “smol” name.
This library implements a tools-first approach where capabilities are encapsulated as discrete tools that agents can use. The tools.py file in the smolagents implementation defines a clean interface for tools with input/output specifications, making it straightforward to add new tools. This tools-based architecture enables agents to have clear, well-defined capabilities while maintaining separation of concerns between the agent logic and the actual implementation of capabilities.
Agents are designed to be lightweight and focused on specific tasks rather than trying to be general-purpose. The BaseAgent class provides core functionality while specific agents like WebAgent extend it for particular use cases. This specialization allows the agents to be more efficient and reliable at their designated tasks rather than attempting to be jack-of-all-trades.
Overview for LLM Agents (optional section)
You might want to skip this section if you want to quickly work through the examples in this chapter and review this material later.
In general, we use the following steps to build agent based systems:
Define agents (e.g., Researcher, Writer, Editor, Judge outputs of other models and agents).
Assign tasks (e.g., research, summarize, write, double check the work of other agents).
Use an orchestration framework to manage task sequencing and collaboration.
Features of Agents:
Retrieval-Augmented Generation (RAG): Enhance agents’ knowledge by integrating external documents or databases.
Example: An agent that retrieves and summarizes medical research papers.
Memory Management: Enable agents to retain context across interactions.
Example: A chatbot that remembers user preferences over time.
Tool Integration: Equip agents with tools like web search, data scraping, or API calls.
Example: An agent that fetches real-time weather data and provides recommendations. We will use tools previously developed in this book.
Examples of Real-World Applications
Healthcare: Agents that analyze medical records and provide diagnostic suggestions.
Education: Virtual tutors that explain complex topics using Ollama’s local models.
Customer Support: Chatbots that handle inquiries without relying on cloud services.
Content Creation: Agents that generate articles, summaries, or marketing content.
Let’s Write Some Code
I am still experimenting with LLM-based agents. Please accept the following examples as my personal works in progress.
“Hello World” smolagents Example
Here we look at a simple example taken from the smolagents documentation and converted to run using local models with Ollama. Here is a listing of file smolagents_test.py:
Understanding the smolagents and Ollama Example
This code demonstrates a simple integration between smolagents (a tool-calling framework) and Ollama (a local LLM server). Here’s what the code accomplishes:
Core Components
Utilizes smolagents for creating AI agents with tool capabilities
Integrates with a local Ollama server running llama3.2
Implements a basic weather checking tool (though humorously hardcoded)
Model Configuration
The code sets up a LiteLLM model instance that connects to a local Ollama server on port 11434. It’s configured to use the llama3.2 model and supports optional API key authentication.
Weather Tool Implementation
The code defines a weather-checking tool using the @tool decorator. While it accepts a location parameter and an optional celsius flag, this example version playfully returns the same dramatic weather report regardless of the input location.
Agent Setup and Execution
The implementation creates a ToolCallingAgent with the weather tool and the configured model. Users can query the agent about weather conditions in any location, though in this example it always returns the same humorous response about terrible weather conditions.
Key Features
Demonstrates tool-calling capabilities through smolagents
Shows local LLM integration using Ollama
Includes proper type hinting for better code clarity
Provides an extensible structure for adding more tools
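The actual listing is in smolagents_test.py; the shape of the example, based on the smolagents documentation and the description above (the model id string, port, and weather message are illustrative), is roughly:

```python
# Rough sketch of smolagents_test.py, based on the smolagents docs example.
from smolagents import ToolCallingAgent, LiteLLMModel, tool

model = LiteLLMModel(
    model_id="ollama_chat/llama3.2",       # routed through LiteLLM to Ollama
    api_base="http://localhost:11434",      # local Ollama server
    api_key="not-needed",                   # optional; Ollama ignores it
)

@tool
def get_weather(location: str, celsius: bool = False) -> str:
    """
    Get the current weather at the given location.

    Args:
        location: The location to get the weather for.
        celsius: Whether to return the temperature in Celsius.
    """
    # Hardcoded joke response, as in the original example.
    return f"The weather in {location} is awful: freezing rain and gale-force wind."

agent = ToolCallingAgent(tools=[get_weather], model=model)
print(agent.run("What is the weather like in Paris?"))
```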
Python Tools Compatible with smolagents
The tools I developed in previous chapters are not quite compatible with the smolagents library so I wrap a few of the tools I previously wrote in the utility smolagents_tools.py:
This code defines a wrapper module containing three tool functions designed for compatibility with the smolagents framework. The module includes sa_list_directory(), which lists files and directories in the current working directory with an optional parameter to include dot files; read_file_contents(), which takes a file path as input and returns the contents of that file as a string while handling potential errors and file encoding; and summarize_directory(), which provides a concise summary of the current directory by counting the total number of files and directories. All functions are decorated with @tool for integration with smolagents, and the code imports necessary modules including pathlib for file operations, typing for type hints, and pprint for formatted output. The functions rely on an external list_directory() function imported from tool_file_dir.py, and they provide clear documentation through docstrings explaining their parameters, functionality, and return values. Error handling is implemented particularly in the file reading function to gracefully handle cases where files don’t exist or cannot be read properly.
A complete smolagents Example using Three Tools
This listing shows the script smolagents_agent_test.py:
This code demonstrates the creation of an AI agent using the smolagents library, specifically configured to work with file system operations. It imports three specialized tools from smolagents_tools: sa_list_directory for listing directory contents, summarize_directory for providing directory summaries, and read_file_contents for accessing file contents. The code sets up a LiteLLMModel instance that connects to a local Ollama server running the llama3.2 model on port 11434, with provisions for API key authentication if needed. A ToolCallingAgent is then created with these three file-system-related tools, enabling it to interact with and analyze the local file system. The agent is instructed to examine the current directory through a natural language query, asking for both a listing and description of the files present. There’s also a second section that would have asked the agent to specifically analyze Python programs in the directory and identify those related to LLM performance evaluation, showing the agent’s potential for more complex file analysis tasks. This setup effectively creates an AI-powered file system navigator that can understand and respond to natural language queries about directory contents and file analysis.
Output from the First Example: “List the Python programs in the current directory, and then tell me which Python programs in the current directory evaluate the performance of LLMs?”
In the following output please notice that sometimes tool use fails and occasionally wrong assumptions are made, but after a long chain of thought (CoT) process the final result is good.
The output for the query “Which python scripts evaluate the performance of LLMs?” is:
This is a lot of debug output to list in a book, but I want you, dear reader, to get a feeling for how the output generated by tools becomes the data that an agent observes before determining the next step in its plan.
This output shows the execution of the example smolagent-based agent that analyzes Python files in a directory looking for Python files containing code to evaluate the output results of LLMs. The agent follows a systematic approach by first listing all files using the sa_list_directory tool, then using sa_summarize_directory to provide detailed analysis of the contents.
The agent successfully identified all Python programs in the directory and specifically highlighted three files that evaluate LLM performance: tool_anti_hallucination.py (which checks for false information generation), tool_llm_eval.py (for general LLM evaluation), and tool_summarize_text.py (which likely tests LLM summarization capabilities). The execution includes detailed step-by-step logging, showing input/output tokens and duration for each step, demonstrating the agent’s methodical approach to file analysis and classification.
Output from the Second Example: “What are the files in the current directory? Describe the current directory”
In this section we look at another agent processing cycle. Again, pay attention to the output of tools, and whether the agent can observe tool output and make sense of it (often the agent can’t!)
It is fairly normal for tools to fail with errors and it is important that agents can observe a failure and move on to try something else.
This output shows the agent performing a directory analysis using multiple tool calls, primarily utilizing sa_list_directory and sa_summarize_directory to examine the contents of the current working directory. The analysis revealed a Python-based project focused on natural language processing (NLP) and agent-based systems, containing various components including example scripts, testing files, and utility tools. The agent executed multiple iterations to gather and process information about the directory structure, with each step taking between 1.58 and 18.89 seconds to complete.
The final analysis identified key project components including a Makefile for build automation, example scripts demonstrating text summarization and graph-based algorithms, testing scripts for smolagent (Small Model-based Language Agent) and OLLAMA tools, and various utility scripts for tasks like anti-hallucination, database interactions, and web searching. The directory structure suggests this is a development and testing environment for NLP-related technologies, complete with its own virtual environment and dependency management through requirements.txt. The agent’s analysis provided detailed insights into the purpose and organization of the codebase while maintaining a focus on its NLP and agent-based systems orientation.
Output from Third Example: “Read the text in the file ‘data/economics.txt’ file and then summarize this text.”
This output shows a sequence of steps where the agent repeatedly calls directory listing and summarization tools to understand the contents of a Python project directory. The agent uses tools like sa_list_directory and sa_summarize_directory to gather information, with each step building on previous observations to form a more complete understanding of the codebase.
Through multiple iterations, the agent analyzes a directory containing various Python files related to NLP and agent-based systems. The files include examples of text summarization, graph processing with Kuzu, language model evaluation tools, and various utility scripts. The agent ultimately produces a comprehensive summary categorizing the files into groups like build scripts, example code, testing scripts, and tool implementations, while noting the project appears to be focused on demonstrating and testing NLP-related technologies. This output log shows the agent taking about 75 seconds total across 6 steps to complete its analysis, with each step consuming progressively more tokens as it builds its understanding.
Agents Wrap Up
There are several options for LLM agent frameworks. I especially like smolagents because it works fairly well with smaller models run with Ollama. I have experimented with other agent frameworks that work well with Claude, GPT-4o, etc., but fail more frequently when used with smaller LLMs.
Using the Unsloth Library on Google Colab to Fine Tune Models for Ollama
This is a book about running local LLMs using Ollama. That said, I use a Mac M2 Pro with 32G of memory and while my computer could be used for fine tuning models, I prefer using cloud assets. I frequently use Google’s Colab for running deep learning and other experiments.
We will be using three Colab notebooks in this chapter:
Colab notebook 1: Colab URI for this chapter is a modified copy of an Unsloth demo notebook. Here we create simple training data to quickly verify the process of fine tuning on Colab using Unsloth and exporting to a local Ollama model on a laptop. We fine tune the 1B model unsloth/Llama-3.2-1B-Instruct.
Colab notebook 2: Colab URI uses my dataset on fun things to do in Arizona. We fine tune the model unsloth/Llama-3.2-1B-Instruct.
Colab notebook 3: Colab URI. This is identical to the example in Colab notebook 2 except that we fine tune the larger 3B model unsloth/Llama-3.2-3B-Instruct.
The Unsloth fine-tuning library is a Python-based toolkit designed to simplify and accelerate the process of fine-tuning large language models (LLMs). It offers a streamlined interface for applying popular techniques like LoRA (Low-Rank Adaptation), prefix-tuning, and full-model fine-tuning, catering to both novice and advanced users. The library integrates seamlessly with Hugging Face Transformers and other prominent model hubs, providing out-of-the-box support for many state-of-the-art pre-trained models. By focusing on ease of use, Unsloth reduces the boilerplate code needed for training workflows, allowing developers to focus on task-specific adaptation rather than low-level implementation details.
One of Unsloth’s standout features is its efficient resource utilization, enabling fine-tuning even on limited hardware such as single-GPU setups. It achieves this through parameter-efficient fine-tuning techniques and gradient checkpointing, which minimize memory overhead. Additionally, the library supports mixed-precision training, significantly reducing computational costs without compromising model performance. With robust logging and built-in tools for hyperparameter optimization, Unsloth empowers developers to achieve high-quality results with minimal experimentation. It is particularly well-suited for applications like text summarization, chatbots, and domain-specific language understanding tasks.
Colab Notebook 1: A Quick Test of Fine Tuning and Deployment to Ollama on a Laptop
We start by installing the Unsloth library and all dependencies, then uninstalling just the Unsloth library and reinstalling the latest version from source code on GitHub:
Now create a model and tokenizer:
Now add LoRA adapters:
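Those two notebook cells are not reproduced here; a sketch of both steps, using the standard Unsloth APIs from its demo notebooks (the hyperparameter values shown are typical defaults, not necessarily the ones in the notebook), looks like this:

```python
# Sketch of the model-creation and LoRA-adapter cells, following the standard
# Unsloth demo notebook API; hyperparameter values are typical defaults.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,          # 4-bit quantization to fit in Colab memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                        # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```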
The original Unsloth example notebook used Maxime Labonne’s FineTome-100k dataset for its fine tuning data. Since I wanted to fine tune with my own test data, I printed out some of Maxime Labonne’s data after it was loaded into a Dataset object. Here are a few snippets to show you, dear reader, the format of the data that I will reproduce:
I used a small Python script on my laptop to get the format correct for my test data:
Output is:
If you look at the notebook for this chapter on Colab you will see that I copied the last Python script as-is to the notebook, replacing code in the original Unsloth demo notebook.
The following code (copied from the Unsloth demo notebook) slightly reformats the prompts and then trains using the modified dataset:
The output is (edited for brevity and to remove a token warning):
The notebook has a few more tests:
The output is:
Warning on Limitations of this Example
We used very little training data and in the call to SFTTrainer we didn’t even train one epoch:
This allows us to fine tune a previously trained model very quickly for this short demo.
We will use much more training data in the next chapter to finetune a model to be an expert in recreational locations in the state of Arizona.
Save trained model and tokenizer to a GGUF File on the Colab Notebook’s File System
To experiment in the Colab Notebook Linux environment we can save the data locally:
To run this fine tuned model on our laptop, we export a GGUF file on the notebook’s file system that can then be downloaded:
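Unsloth provides a helper for GGUF export; a minimal sketch of the call (the quantization method shown is a common choice, not necessarily the one used in the notebook):

```python
# Export the fine-tuned model to GGUF so it can be imported into Ollama.
# q4_k_m is a common quantization choice; the notebook may use another.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
```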
In the demo notebook, you can see where the GGUF file was written:
Copying the GGUF File to Your Laptop and Creating an Ollama Modelfile
Depending on how fast your Internet speed is, it might take five or ten minutes to download the GGUF file since it is about 1G in size:
We also will need to copy the generated Ollama Modelfile (that the Unsloth library created for us):
The contents of the file are:
After downloading the GGUF file to my laptop, I made a slight edit to the generated Modelfile to set the path to the GGUF file on line 1:
Once the model is downloaded to your laptop, create a local Ollama model to use:
I can now use the model unsloth that was just created on my laptop:
Notice that the fine tuned model has learned the new data and still retains the functionality of the original model.
Fine Tuning Test Wrap Up
This was a short example that can be run on a free Google Colab notebook. Now we will use a larger fine tuning training data set.
Fine Tuning Using a Fun Things To Do in Arizona Data Set
I created a GitHub repository for the Arizona fine tuning data set that contains small individual JSON files and a larger file ArizonaFun.json that is a concatenation of the smaller files. Let’s look at az_flagstaff_parks.json (edited to remove some text for brevity):
There are a total of 40 fine tuning examples in the file ArizonaFun.json. You can see in the second and third Colab notebooks for this chapter I just pasted the JSON data from the file ArizonaFun.json into a cell:
Unfortunately, while the fine tuned model often performs well, it also sometimes hallucinates. Here is an example of using the fine tuned model in the Colab notebook:
The output is:
This answer is correct.
The second Colab notebook also contains code cells for downloading the fine tuned model and the directions for importing the model into Ollama that we saw earlier also apply here.
Third Colab Notebook That Fine Tunes a Larger Model
There are only two changes made to the second notebook:
We now fine tune a 3B model unsloth/Llama-3.2-3B-Instruct.
Because the fine tuned model is large, I added code to store the model in Google Drive:
I created an empty folder LLM on my Google Drive before running this code.
Fine Tuning Wrap Up
I don’t usually fine tune models. I usually use larger prompt contexts and include one shot or two shot examples. That said there are good use cases for fine tuning small models with your data and I hope the simple examples in this chapter will save you time if you have an application requiring fine tuning.
Reasoning with Large Language Models
The Chinese tech conglomerate Alibaba’s MarcoPolo Team released the advanced Marco-o1 model at the end of 2024.
This model is designed to excel in open-ended problem-solving and complex reasoning tasks, going beyond traditional AI models that focus on structured tasks like coding or math. For reference the repository for the model is https://github.com/AIDC-AI/Marco-o1. From the README in this repository: “Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies—optimized for complex real-world problem-solving tasks.”
A Simple Example
I very much enjoy experimenting with the Marco-o1 model in the Ollama REPL. Let’s start with a very simple prompt that most models can solve. Here, we want to see the structure of Marco-o1’s CoT (chain of thought) process:
We will look at a more difficult example later.
Key Features of Marco-o1
Here are some key characteristics of Marco-o1:
Advanced Reasoning Techniques: It utilizes Chain-of-Thought (CoT) fine-tuning and Monte Carlo Tree Search (MCTS) to enhance its reasoning capabilities. CoT allows the model to trace its thought patterns, making the problem-solving process more transparent. MCTS enables exploration of multiple reasoning paths by assigning confidence scores to different tokens. Reference: https://arxiv.org/html/2411.14405
Self-Reflection: A unique feature is its ability to self-reflect, evaluating its reasoning, identifying inaccuracies, and iterating on its outputs for improved results. This leads to higher accuracy and adaptability.
Multilingual Mastery: Marco-o1 excels in translation, handling cultural nuances, idiomatic expressions, and colloquialisms effectively. This makes it a powerful tool for global communication.
Focus on Open-Ended Problems: Unlike models focused on structured tasks with definitive answers, Marco-o1 tackles open-ended problems where clear evaluation metrics might be absent[1].
Strong Performance: It has shown significant improvements in reasoning and translation benchmarks, including increased accuracy on the MGSM dataset (both English and Chinese) and strong performance in machine translation tasks[1].
Open Source Datasets and Implementation: Alibaba has released Marco-o1’s datasets and implementation guides on GitHub, encouraging collaboration and further advancements in AI research.
A More Complex Example: City Traffic Planning
Let’s end this chapter with a more complex example:
I often use the state of the art commercial LLM APIs for models like Claude Sonnet 3.5, GPT-4o, o1, Grok-2, etc. to brainstorm ideas and help me think through and plan out new projects. I find it exciting to be able to run a close to state of the art reasoning LLM on my personal computer using Ollama!
Using Property Graph Database with Ollama
I have a long history of working with Knowledge Graphs (at Google and OliveAI) and I usually use RDF graph databases and the SPARQL query language. I have recently developed a preference for property graph databases because recent research has shown that using LLMs with RDF-based graphs runs into context size problems due to large schemas, overlapping relations, and complex identifiers that exceed LLM context windows. Property graph databases like Neo4J and Kuzu (which we use in this chapter) have more concise schemas.
It is true that Google and other players are teasing ‘infinite context’ LLMs but since this book is about running smaller models locally I have chosen to only show a property graph example.
Overview of Property Graphs
Property graphs represent a powerful and flexible data modeling paradigm that has gained significant traction in modern database systems and applications. At its core, a property graph is a directed graph structure where both vertices (nodes) and edges (relationships) can contain properties in the form of key-value pairs, providing rich contextual information about entities and their connections. Unlike traditional relational databases that rely on rigid table structures, property graphs offer a more natural way to represent highly connected data while maintaining the semantic meaning of relationships. This modeling approach is particularly valuable when dealing with complex networks of information where the relationships between entities are just as important as the entities themselves.
The distinguishing characteristics of property graphs make them especially well-suited for handling real-world data scenarios where relationships are multi-faceted and dynamic. Each node in a property graph can be labeled with one or more types (such as Person, Product, or Location) and can hold any number of properties that describe its attributes. Similarly, edges can be typed (like “KNOWS”, “PURCHASED”, or “LOCATED_IN”) and augmented with properties that qualify the relationship, such as timestamps, weights, or quality scores. This flexibility allows for sophisticated querying and analysis of data patterns that would be cumbersome or impossible to represent in traditional relational schemas. The property graph model has proven particularly valuable in domains such as social network analysis, recommendation systems, fraud detection, and knowledge graphs, where understanding the intricate web of relationships between entities is crucial for deriving meaningful insights.
Example Using Ollama, LangChain, and the Kuzu Property Graph Database
The example shown here is derived from an example in the LangChain documentation: https://python.langchain.com/docs/integrations/graphs/kuzu_db/. I modified the example to use a local model running on Ollama instead of the OpenAI APIs. Here is the file graph_kuzu_property_example.py:
This code demonstrates the implementation of a graph database using Kuzu, integrated with LangChain for question-answering capabilities. The code initializes a database connection and establishes a schema with two node types (Movie and Person) and a relationship type (ActedIn), creating a graph structure suitable for representing actors and their film appearances.
The implementation populates the database with specific data about “The Godfather” trilogy and two prominent actors (Al Pacino and Robert De Niro). It uses Cypher-like query syntax to create nodes for both movies and actors, then establishes relationships between them using the ActedIn relationship type. The data model represents a typical many-to-many relationship between actors and movies.
This example then sets up a question-answering chain using LangChain, which combines the Kuzu graph database with the Ollama language model (specifically the qwen2.5-coder:14b model). This chain enables natural language queries against the graph database, allowing users to ask questions about actor-movie relationships and receive responses based on the stored graph data. The implementation includes two example queries to demonstrate the system’s functionality.
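The repository file contains the full program; a compressed sketch of the same flow, adapted from the LangChain Kuzu integration documentation (exact import paths vary between LangChain versions, and recent versions may also require an explicit safety flag), looks like this:

```python
# Compressed sketch of graph_kuzu_property_example.py, adapted from the
# LangChain Kuzu integration docs; import paths vary by LangChain version.
import kuzu
from langchain_community.graphs import KuzuGraph
from langchain_community.chains.graph_qa.kuzu import KuzuQAChain
from langchain_ollama import ChatOllama

db = kuzu.Database("movies_db")
conn = kuzu.Connection(db)

# Schema: Movie and Person nodes, ActedIn relationship.
conn.execute("CREATE NODE TABLE Movie (name STRING, PRIMARY KEY (name))")
conn.execute("CREATE NODE TABLE Person (name STRING, PRIMARY KEY (name))")
conn.execute("CREATE REL TABLE ActedIn (FROM Person TO Movie)")

# A few sample rows for The Godfather trilogy data described above.
conn.execute("CREATE (:Person {name: 'Al Pacino'})")
conn.execute("CREATE (:Person {name: 'Robert De Niro'})")
conn.execute("CREATE (:Movie {name: 'The Godfather: Part II'})")
conn.execute("""
    MATCH (p:Person {name: 'Al Pacino'}), (m:Movie {name: 'The Godfather: Part II'})
    CREATE (p)-[:ActedIn]->(m)
""")

graph = KuzuGraph(db)
chain = KuzuQAChain.from_llm(
    llm=ChatOllama(model="qwen2.5-coder:14b", temperature=0.0),
    graph=graph,
    verbose=True,
    # newer LangChain versions may also require allow_dangerous_requests=True
)
print(chain.invoke("Who acted in The Godfather: Part II?"))
```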
Here is the output from this example:
The Cypher query language is commonly used in property graph databases. Here is a sample query:
This Cypher query performs a graph pattern matching operation to find actors who appeared in “The Godfather: Part II”. Let’s break it down:
MATCH initiates a pattern matching operation
(p:Person) looks for nodes labeled as “Person” and assigns them to variable p
-[:ActedIn]-> searches for “ActedIn” relationships pointing outward
(m:Movie) matches Movie nodes specifically with the name property equal to “The Godfather: Part II”
RETURN p.name returns only the name property of the matched Person nodes
Based on the previous code’s data, this query would return “Al Pacino” and “Robert De Niro” since they both acted in that specific film.
Using LLMs to Create Graph Databases from Text Data
Using Kuzu with local LLMs is simple to implement, as seen in the last section. If you use large property graph databases hosted with Kuzu or Neo4J, then the example in the last section is hopefully sufficient to get you started implementing natural language interfaces to property graph databases.
Now we will do something very different: use LLMs to generate data for property graphs, that is, to convert text into Python code that creates a Kuzu property graph database.
Specifically, we use the approach:
Use the last example file graph_kuzu_property_example.py as an example for Claude Sonnet 3.5 to understand the Kuzu Python APIs.
Have Claude Sonnet 3.5 read the file data/economics.txt and create a schema for a new graph database and populate the schema from the contents of the file data/economics.txt.
Ask Claude Sonnet 3.5 to also generate query examples.
Except for my adding the utility function query_and_print_result, this code was generated by Claude Sonnet 3.5:
How might you use this example? Using one or two shot prompting in LLM input prompts to specify data formats and other information, and then generating structured data or Python code, is a common implementation pattern for using LLMs.
Here, the “structured data” I asked an LLM to output was Python code.
I cheated in this example by using what is currently the best code generation LLM: Claude Sonnet 3.5. I also tried this same exercise using Ollama with the model qwen2.5-coder:14b and the results were not quite as good. This is a great segue into the final chapter, Book Wrap Up.
Book Wrap Up
Dear reader, I have been paid for “AI work” (for many interpretations of what that even means) since 1982. I certainly find LLMs to be the most exciting tool for moving the field of AI further and faster than anything else that I have used in the last 43 years.
I am also keenly interested in privacy and open source so I must admit a strong bias towards using open source software, open weight LLMs, and also systems and infrastructure like Ollama that enable me to control my own data. The content of this book is tailored to my own interests but I hope that I have, dear reader, covered many of your interests also.
In the last example in the previous chapter I “pulled a fast one” in that I didn’t use a local model running with Ollama. Instead I used what is the most powerful commercial LLM, Claude Sonnet 3.5, because it generates better code than any model that I can run on my Mac with 32G of unified memory using Ollama. In my work, I balance my personal desire for data privacy and control over the software and hardware I use with practical compromises like using state of the art models running on massive cloud compute resources.