Ollama in Action: Building Safe, Private AI with LLMs, Function Calling and Agents
Mark Watson

Preface

Ollama is an open-source framework that enables users to run large language models (LLMs) locally on their computers, facilitating tasks like text summarization, chatbot development, and more. It supports various models, including Llama 4, Mistral, and Gemma, and offers flexibility in model sizes and quantization options to balance performance and resource usage. Ollama provides a command-line interface and an HTTP API for seamless integration into applications, making advanced AI capabilities accessible without relying on cloud services. Ollama is available on macOS, Linux, and Windows.

A main theme of this book is the advantage of running models privately on either your personal computer or a computer at work. While many commercial LLM API vendors offer options to not reuse your prompt data and generated output to train their systems, there is no better privacy and security than running open weight models on your own hardware.

This book is about running Large Language Models (LLMs) on your own hardware using Ollama. We will use both the Ollama Python SDK’s native support for passing text and images to LLMs and Ollama’s OpenAI API compatibility layer, which lets you take projects you may already run against OpenAI’s APIs and port them easily to run locally on Ollama.

To be clear, dear reader, although I have a strong preference for running smaller LLMs on my own hardware, I also frequently use commercial LLM API vendors like Anthropic, OpenAI, ABACUS.AI, GROQ, and Google to take advantage of features like advanced models and scalability using cloud-based hardware.

About the Author

I am an AI practitioner and consultant specializing in large language models, LangChain/Llama-Index integrations, deep learning, and the semantic web. I have authored over 20 books on topics including artificial intelligence, Python, Common Lisp, deep learning, Haskell, Clojure, Java, Ruby, the Hy language, and the semantic web. I have 55 U.S. patents. Please check out my home page and social media: my personal web site https://markwatson.com, X/Twitter, my Blog on Blogspot, and my Blog on Substack.

Requests from the Author

This book will always be available to read free online at https://leanpub.com/ollama/read.

That said, I appreciate it when readers purchase my books because the income enables me to spend more time writing.

Hire the Author as a Consultant

I am available for short consulting projects. Please see https://markwatson.com.

Why Should We Care About Privacy?

Running local models using tools like Ollama can enhance privacy when dealing with sensitive data. Let’s delve into why privacy is crucial and how Ollama contributes to improved security.

Why is privacy important?

Privacy is paramount for several reasons:

  • Protection from Data Breaches: When data is processed by third-party services, it becomes vulnerable to potential data breaches. Storing and processing data locally minimizes this risk significantly. This is especially critical for sensitive information like personal details, financial records, or proprietary business data.
  • Compliance with Regulations: Many industries are subject to stringent data privacy regulations, such as GDPR, HIPAA, and CCPA. Running models locally can help organizations maintain compliance by ensuring data remains under their control.
  • Maintaining Confidentiality: For certain applications, like handling legal documents or medical records, maintaining confidentiality is of utmost importance. Local processing ensures that sensitive data isn’t exposed to external parties.
  • Data Ownership and Control: Individuals and organizations have a right to control their own data. Local models empower users to maintain ownership and make informed decisions about how their data is used and shared.
  • Preventing Misuse: By keeping data local, you reduce the risk of it being misused by third parties for unintended purposes, such as targeted advertising, profiling, or even malicious activities.

Security Improvements with Ollama

Ollama, as a tool for running large language models (LLMs) locally, offers several security advantages:

  • Data Stays Local: Ollama allows you to run models on your own hardware, meaning your data never leaves your local environment. This eliminates the need to send data to external servers for processing.
  • Reduced Attack Surface: By avoiding external communication for model inference, you significantly reduce the potential attack surface for malicious actors. There’s no need to worry about vulnerabilities in third-party APIs or network security.
  • Control over Model Access: With Ollama, you have complete control over who has access to your models and data. This is crucial for preventing unauthorized access and ensuring data security.
  • Transparency and Auditability: Running models locally provides greater transparency into the processing pipeline. You can monitor and audit the model’s behavior more easily, ensuring it operates as intended.
  • Customization and Flexibility: Ollama allows you to customize your local environment and security settings according to your specific needs. This level of control is often not possible with cloud-based solutions.

It’s important to note that while Ollama enhances privacy and security, it’s still crucial to follow general security best practices for your local environment. This includes keeping your operating system and software updated, using strong passwords, and implementing appropriate firewall rules.

Setting Up Your Computing Environment for Ollama and the Book Example Programs

There is a GitHub repository that I have prepared for you, dear reader, to both support working through the examples for this book as well as hopefully provide utilities for your own projects.

You need to git clone the following repository:

https://github.com/mark-watson/Ollama_in_Action_Book. This repository contains tools I have written in Python that you can use with Ollama, as well as utilities I wrote to avoid repeated code in the book examples. There are also application-level example files that have the string “example” in their file names. Tool library files begin with “tool”, and files starting with “Agent” contain one of several approaches to writing agents.

Note: the source code repository changed on August 10, 2025. If you cloned the old repo, please archive it and re-clone https://github.com/mark-watson/Ollama_in_Action_Book.

Note: as of August 10, 2025, the GitHub repo https://github.com/mark-watson/Ollama_in_Action_Book also contains the book’s manuscript files as well as the source code for the examples.

Python Build Tools

The requirements.txt file contains the library requirements for all code developed in this book. My preference is to use venv and maintain a separate Python environment for each of the few hundred Python projects I have on my laptop. I keep a personal ~/bin directory on my PATH, and I use the following script, venv_setup.sh, stored there to set up a virtual environment from a requirements.txt file:

 1 #!/bin/zsh
 2 
 3 # Check if the directory has a requirements.txt file
 4 if [ ! -f "requirements.txt" ]; then
 5     echo "No requirements.txt file found in the current directory."
 6     exit 1
 7 fi
 8 
 9 # Create a virtual environment in the venv directory
10 python3 -m venv venv
11 
12 # Activate the virtual environment
13 source venv/bin/activate
14 
15 # Upgrade pip to the latest version
16 pip3 install --upgrade pip
17 
18 # Install dependencies from requirements.txt
19 pip3 install -r requirements.txt
20 
21 # Display installed packages
22 pip3 list
23 
24 echo "Virtual environment setup complete. Reactivate it with:"
25 echo "source venv/bin/activate"
26 echo ""

I sometimes like to use the much faster uv build and package management tool:

1 uv venv
2 source .venv/bin/activate
3 uv pip install -r requirements.txt
4 uv run ollama_tools_examples.py

There are many other good options like Anaconda, miniconda, poetry, etc.

Using Ollama From the Command Line

Working with Ollama from the command line provides a straightforward and efficient way to interact with large language models locally. The basic command structure starts with ollama run modelname, where modelname could be a model like 'llama3', 'mistral', or 'codellama'. You can enhance your prompts using the -f flag for system prompts or context files, and the --verbose flag to see token usage and generation metrics. For example, ollama run llama2 -f system_prompt.txt "Your question here" lets you provide consistent context across multiple interactions.
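As a quick sketch of the basic workflow (the model name here is only an example; substitute any model you have pulled), you might pull a model once and then run it with the --verbose flag to see timing and token statistics:

1 $ ollama pull llama3.2
2 $ ollama run llama3.2 --verbose "Explain the difference between a list and a tuple in Python"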

One powerful technique is using Ollama’s model tags to maintain different versions or configurations of the same base model. For any model on the Ollama web site, you can view all available model tags, for example: https://ollama.com/library/llama2/tags.
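For example, assuming a specific tag such as 13b is still listed on the tags page for a model, you can pull and run that exact variant:

1 $ ollama pull llama2:13b
2 $ ollama run llama2:13b "Briefly compare SQLite and PostgreSQL"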

The ollama list command helps you track installed models, and ollama rm modelname keeps your system clean. For development work, the --format json flag outputs responses in JSON format, making it easier to parse in scripts or applications; for example:

Using JSON Format

 1 $ ollama run qwq:latest --format json
 2 >>> What are the capitals of Germany and France?
 3 { 
 4   "Germany": {
 5     "Capital": "Berlin",
 6     "Population": "83.2 million",
 7     "Area": "137,847 square miles"
 8   },
 9   "France": {
10     "Capital": "Paris",
11     "Population": "67.4 million",
12     "Area": "248,573 square miles"
13   }
14 }
15 
16 >>> /bye

Analysis of Images

Advanced users can leverage Ollama’s multimodal capabilities and streaming options. For models like llava, you can pipe in image files using standard input or file paths. For example:

1 $ ollama run llava:7b "Describe this image" markcarol.jpg
2  The image is a photograph featuring a man and a woman looking 
3 off-camera, towards the left side of the frame. In the background, there are indistinct objects that give the impression of an outdoor setting, possibly on a patio or deck.
4 
5 The focus and composition suggest that the photo was taken during the day in natural light. 

While I only cover command line use in this one short chapter, I use Ollama in command line mode several hours a week for software development, usually using a Qwen coding LLM:

1 $ ollama run qwen2.5-coder:14b
2 >>> Send a message (/? for help)

I find that the qwen2.5-coder:14b model performs well for my most often used programming languages: Python, Common Lisp, Racket Scheme, and Haskell.

I also enjoy experimenting with the QwQ reasoning model even though it is so large it barely runs on my 32G M2 Pro system:

1 $ ollama run qwq:latest       
2 >>>

Analysis of Source Code Files

Here, assuming we are in the main directory for the GitHub repository for this book, we can ask for analysis of the tool for using SQLite databases (most output is not shown):

 1 $ ollama run qwen2.5-coder:14b < tool_sqlite.py 
 2 This code defines a Python application that interacts with an SQLite database using SQL queries 
 3 generated by the Ollama language model. The application is structured around two main classes:
 4 
 5 1. **SQLiteTool**: Manages interactions with an SQLite database.
 6    - Handles creating sample data, managing database connections, and executing SQL queries.
 7    - Provides methods to list tables in the database, get table schemas, and execute arbitrary SQL 
 8 queries.
 9 
10 2. **OllamaFunctionCaller**: Acts as a bridge between user inputs and the SQLite database through the 
11 Ollama model.
12    - Defines functions that can be called by the Ollama model (e.g., querying the database or listing 
13 tables).
14    - Generates prompts for the Ollama model based on user input, parses the response to identify which 
15 function should be executed, and then calls the appropriate method in `SQLiteTool`.
16 
17 ...

Unfortunately, when using the command ollama run qwen2.5-coder:14b < tool_sqlite.py, Ollama processes the input from the file and then exits the REPL. There’s no built-in way to stay in the Ollama REPL. However, if you want to analyze code and then interactively chat about the code, ask for code modifications, etc., you can try:

  • Start the Ollama REPL, for example: ollama run qwen2.5-coder:14b
  • Paste the source code of tool_sqlite.py into the Ollama REPL
  • Ask for advice, for example: “Please add code to print out the number of input and output tokens that are used by Ollama when calling function_caller.process_request(query)”
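Another option, if you prefer to stay in Python, is a small interactive loop that reads a source file once and then keeps the chat history so you can ask follow-up questions about the code. This is only a sketch using the Ollama Python SDK; the model name and file path are examples:

 1 import ollama
 2 
 3 MODEL = "qwen2.5-coder:14b"      # any coding model you have pulled
 4 SOURCE_FILE = "tool_sqlite.py"   # example path; change as needed
 5 
 6 with open(SOURCE_FILE, "r") as f:
 7     source_code = f.read()
 8 
 9 # Seed the conversation with the file contents, then chat in a loop.
10 messages = [{"role": "user",
11              "content": f"Here is a Python file to analyze:\n\n{source_code}"}]
12 
13 while True:
14     question = input("question (or 'quit'): ")
15     if question.strip().lower() == "quit":
16         break
17     messages.append({"role": "user", "content": question})
18     response = ollama.chat(model=MODEL, messages=messages)
19     answer = response.message.content
20     print(answer)
21     # Keep the reply in the history so follow-up questions have context.
22     messages.append({"role": "assistant", "content": answer})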

Short Examples

Here we look at a few short examples before we move on to the libraries we develop later and to longer, application-style example programs that use Ollama to solve more difficult problems.

Using The Ollama Python SDK with Image and Text Prompts

We saw an example of image processing in the last chapter using Ollama command line mode. Here we do the same thing using a short Python script that you can find in the file short_programs/Ollama_sdk_image_example.py:

 1 import ollama
 2 import base64
 3 
 4 def analyze_image(image_path: str, prompt: str) -> str:
 5     # Read and encode the image
 6     with open(image_path, 'rb') as img_file:
 7         image_data = base64.b64encode(img_file.read()).decode('utf-8')
 8 
 9     try:
10         # Create a stream of responses using the Ollama SDK
11         stream = ollama.generate(
12             model='llava:7b',
13             prompt=prompt,
14             images=[image_data],
15             stream=True
16         )
17 
18         # Accumulate the response
19         full_response = ""
20         for chunk in stream:
21             if 'response' in chunk:
22                 full_response += chunk['response']
23 
24         return full_response
25 
26     except Exception as e:
27         return f"Error processing image: {str(e)}"
28 
29 def main():
30     image_path = "data/sample.jpg"
31     prompt = "Please describe this image in detail, focusing on the actions of people in the picture."
32 
33     result = analyze_image(image_path, prompt)
34     print("Analysis Result:")
35     print(result)
36 
37 if __name__ == "__main__":
38     main()

The output may look like the following when you run this example:

1 Analysis Result:
2  The image appears to be a photograph taken inside a room that serves as a meeting or gaming space and capturing an indoor scene where five individuals are engaged in playing a tabletop card game. In the foreground, there is a table with a green surface and multiple items on it, including what looks like playing cards spread out in front of the people seated around it.
3 
4 The room has a comfortable and homely feel, with elements like a potted plant in the background on the left, which suggests that this might be a living room or a similar space repurposed for a group activity.

Using the OpenAI Compatibility APIs with Local Models Running on Ollama

If you frequently use the OpenAI APIs for either your own LLM projects or work projects, you might want to simply use the same SDK library from OpenAI but specify a local Ollama REST endpoint:

 1 import openai
 2 from typing import List, Dict
 3 
 4 class OllamaClient:
 5     def __init__(self, base_url: str = "http://localhost:11434/v1"):
 6         self.client = openai.OpenAI(
 7             base_url=base_url,
 8             api_key="fake-key"  # Ollama doesn't require authentication locally
 9         )
10 
11     def chat_with_context(
12         self,
13         system_context: str,
14         user_prompt: str,
15         model: str = "llama3.2:latest",
16         temperature: float = 0.7
17     ) -> str:
18         try:
19             messages = [
20                 {"role": "system", "content": system_context},
21                 {"role": "user", "content": user_prompt}
22             ]
23 
24             response = self.client.chat.completions.create(
25                 model=model,
26                 messages=messages,
27                 temperature=temperature,
28                 stream=False
29             )
30 
31             return response.choices[0].message.content
32 
33         except Exception as e:
34             return f"Error: {str(e)}"
35 
36     def chat_conversation(
37         self,
38         messages: List[Dict[str, str]],
39         model: str = "llama2"
40     ) -> str:
41         try:
42             response = self.client.chat.completions.create(
43                 model=model,
44                 messages=messages,
45                 stream=False
46             )
47 
48             return response.choices[0].message.content
49 
50         except Exception as e:
51             return f"Error: {str(e)}"
52 
53 def main():
54     # Initialize the client
55     client = OllamaClient()
56 
57     # Example 1: Single interaction with context
58     system_context = """You are a helpful AI assistant with expertise in 
59     programming and technology. Provide clear, concise answers."""
60 
61     user_prompt = "Explain the concept of recursion in programming."
62 
63     response = client.chat_with_context(
64         system_context=system_context,
65         user_prompt=user_prompt,
66         model="llama3.2:latest",
67         temperature=0.7
68     )
69 
70     print("Response with context:")
71     print(response)
72     print("\n" + "="*50 + "\n")
73 
74     # Example 2: Multi-turn conversation
75     conversation = [
76         {"role": "system", "content": "You are a helpful AI assistant."},
77         {"role": "user", "content": "What is machine learning?"},
78         {"role": "assistant", "content": "Machine learning is a subset of AI that enables systems to learn from data."},
79         {"role": "user", "content": "Can you give me a simple example?"}
80     ]
81 
82     response = client.chat_conversation(
83         messages=conversation,
84         model="llama3.2:latest"
85     )
86 
87     print("Conversation response:")
88     print(response)
89 
90 if __name__ == "__main__":
91     main()

The output might look like (following listing is edited for brevity):

 1 Response with context:
 2 Recursion is a fundamental concept in programming that allows a function or method to call itself repeatedly until it reaches a base case that stops the recursion.
 3 
 4 **What is Recursion?**
 5 
 6 In simple terms, recursion is a programming technique where a function invokes itself as a sub-procedure, repeating the same steps until it solves a problem ...
 7 
 8 **Key Characteristics of Recursion:**
 9 
10 1. **Base case**: A trivial case that stops the recursion.
11 2. **Recursive call**: The function calls itself with new input or parameters.
12 3. **Termination condition**: When the base case is reached, the recursion terminates.
13 
14 **How Recursion Works:**
15 
16 Here's an example to illustrate recursion:
17 
18 Imagine you have a recursive function `factorial(n)` that calculates the factorial of a number `n`. The function works as follows:
19 
20 1. If `n` is 0 or 1 (base case), return 1.
21 2. Otherwise, call itself with `n-1` as input and multiply the result by `n`.
22 3. Repeat step 2 until `n` reaches 0 or 1.
23 
24 Here's a simple recursive implementation in Python ...
25 
26 **Benefits of Recursion:**
27 
28 Recursion offers several advantages:
29 
30 * **Divide and Conquer**: Break down complex problems into smaller, more manageable sub-problems.
31 * **Elegant Code**: Recursive solutions can be concise and easy to understand.
32 * **Efficient**: Recursion can avoid explicit loops and reduce memory usage.
33 ...
34 
35 In summary, recursion is a powerful technique that allows functions to call themselves repeatedly until they solve a problem. By understanding the basics of recursion and its applications, you can write more efficient and elegant code for complex problems.
36 
37 ==================================================
38 
39 Conversation response:
40 A simple example of machine learning is a spam filter.
41 
42 Imagine we have a system that scans emails and identifies whether they are spam or not. The system learns to classify these emails as spam or not based on the following steps:
43 
44 1. Initially, it receives a large number of labeled data points (e.g., 1000 emails), where some emails are marked as "spam" and others as "not spam".
45 2. The system analyzes these examples to identify patterns and features that distinguish spam emails from non-spam messages.
46 3. Once the patterns are identified, the system can use them to classify new, unseen email data (e.g., a new email) as either spam or not spam.
47 
48 Over time, the system becomes increasingly accurate in its classification because it has learned from the examples and improvements have been made. This is essentially an example of supervised machine learning, where the system learns by being trained on labeled data.

In the next chapter we start developing tools that can be used for “function calling” with Ollama.

LLM Tool Calling with Ollama

There are several example Python tool utilities in the GitHub repository https://github.com/mark-watson/Ollama_in_Action_Book in the source-code directory that we will use for function calling that start with the “tool” prefix:

1 https://github.com/mark-watson/Ollama_in_Action_Book/source-code $ ls tool*
2 tool_file_contents.py   tool_llm_eval.py    tool_web_search.py
3 tool_file_dir.py    tool_sqlite.py
4 tool_judge_results.py   tool_summarize_text.py

We postpone using the tools tool_llm_eval.py and tool_judge_results.py until the next chapter, Automatic Evaluation of LLM Results.

If you have not yet done so, please clone the repository for my Ollama book examples using:

1 git clone https://github.com/mark-watson/Ollama_in_Action_Book.git

Use of Python docstrings at runtime:

The Ollama Python SDK leverages docstrings as a crucial part of its runtime function calling mechanism. When defining functions that will be called by the LLM, the docstrings serve as structured metadata that gets parsed and converted into a JSON schema format. This schema describes the function’s parameters, their types, and expected behavior, which is then used by the model to understand how to properly invoke the function. The docstrings follow a specific format that includes parameter descriptions, type hints, and return value specifications, allowing the SDK to automatically generate the necessary function signatures that the LLM can understand and work with.

During runtime execution, when the LLM determines it needs to call a function, it first reads these docstring-derived schemas to understand the function’s interface. The SDK parses these docstrings using Python’s introspection capabilities (through the inspect module) and matches the LLM’s intended function call with the appropriate implementation. This system allows for a clean separation between the function’s implementation and its interface description, while maintaining human-readable documentation that serves as both API documentation and runtime function calling specifications. The docstring parsing is done lazily at runtime when the function is first accessed, and the resulting schema is typically cached to improve performance in subsequent calls.
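A minimal sketch of what this looks like in practice follows; the function and its docstring are hypothetical examples, not code from the book’s repository. The plain Python function can be passed directly in the tools list, and the SDK derives the schema the model sees from the signature and docstring:

 1 import ollama
 2 
 3 def get_word_count(text: str) -> int:
 4     """
 5     Counts the number of words in a text string
 6 
 7     Args:
 8         text (str): the text to count words in
 9 
10     Returns:
11         int: the number of words found
12     """
13     return len(text.split())
14 
15 response = ollama.chat(
16     model="llama3.1",
17     messages=[{"role": "user",
18                "content": "How many words are in 'the quick brown fox'?"}],
19     tools=[get_word_count],  # the SDK builds a JSON schema from the docstring
20 )
21 
22 for tool_call in response.message.tool_calls or []:
23     if tool_call.function.name == "get_word_count":
24         print(get_word_count(**tool_call.function.arguments))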

Example Showing the Use of Tools Developed Later in this Chapter

The source file ollama_tools_examples.py contains simple examples of using the tools we develop later in this chapter. We will look at example code using the tools, then at the implementation of the tools. In this examples source file we first import these tools:

 1 from tool_file_dir import list_directory
 2 from tool_file_contents import read_file_contents
 3 from tool_web_search import uri_to_markdown
 4 
 5 import ollama
 6 
 7 # Map function names to function objects
 8 available_functions = {
 9     'list_directory': list_directory,
10     'read_file_contents': read_file_contents,
11     'uri_to_markdown': uri_to_markdown,
12 }
13 
14 # User prompt
15 user_prompt = "Please list the contents of the current directory, read the 'requirements.txt' file, and convert 'https://markwatson.com' to markdown."
16 
17 # Initiate chat with the model
18 response = ollama.chat(
19     model='llama3.1',
20     messages=[{'role': 'user', 'content': user_prompt}],
21     tools=[list_directory, read_file_contents, uri_to_markdown],
22 )
23 
24 # Process the model's response
25 for tool_call in response.message.tool_calls or []:
26     function_to_call = available_functions.get(tool_call.function.name)
27     print(f"{function_to_call=}")
28     if function_to_call:
29         result = function_to_call(**tool_call.function.arguments)
30         print(f"Output of {tool_call.function.name}: {result}")
31     else:
32         print(f"Function {tool_call.function.name} not found.")

This code demonstrates the integration of a local LLM with custom tool functions for file system operations and web content processing. It imports three utility functions for listing directories, reading file contents, and converting URLs to markdown, then maps them to a dictionary for easy access.

The main execution flow involves sending a user prompt to the Ollama-hosted model (the example code uses the llama3.1 model), which requests directory listing, file reading, and URL conversion operations. The code then processes the model’s response by iterating through any tool calls returned, executing the corresponding functions, and printing their results. Error handling is included for cases where requested functions aren’t found in the available tools dictionary.

Here is sample output from using these three tools (most output removed for brevity and blank lines added for clarity):

 1 $ python ollama_tools_examples.py
 2 
 3 function_to_call=<function read_file_contents at 0x104fac9a0>
 4 
 5 Output of read_file_contents: {'content': 'git+https://github.com/mark-watson/Ollama_Tools.git\nrequests\nbeautifulsoup4\naisuite[ollama]\n\n', 'size': 93, 'exists': True, 'error': None}
 6 
 7 function_to_call=<function list_directory at 0x1050389a0>
 8 Output of list_directory: {'files': ['.git', '.gitignore', 'LICENSE', 'Makefile', 'README.md', 'ollama_tools_examples.py', 'requirements.txt', 'venv'], 'count': 8, 'current_dir': '/Users/markw/GITHUB/Ollama-book-examples', 'error': None}
 9 
10 function_to_call=<function uri_to_markdown at 0x105038c20>
11 
12 Output of uri_to_markdown: {'content': 'Read My Blog on Blogspot\n\nRead My Blog on Substack\n\nConsulting\n\nFree Mentoring\n\nFun stuff\n\nMy Books\n\nOpen Source\n\n Privacy Policy\n\n# Mark Watson AI Practitioner and Consultant Specializing in Large Language Models, LangChain/Llama-Index Integrations, Deep Learning, and the Semantic Web\n\n# I am the author of 20+ books on Artificial Intelligence, Python, Common Lisp, Deep Learning, Haskell, Clojure, Java, Ruby, Hy language, and the Semantic Web. I have 55 US Patents.\n\nMy customer list includes: Google, Capital One, Babylist, Olive AI, CompassLabs, Mind AI, Disney, SAIC, Americast, PacBell, CastTV, Lutris Technology, Arctan Group, Sitescout.com, Embed.ly, and Webmind Corporation.
13 
14  ...
15 
16  # Fun stuff\n\nIn addition to programming and writing my hobbies are cooking,\n photography, hiking, travel, and playing the following musical instruments: guitar, didgeridoo, and American Indian flute:\n\nMy guitar playing: a boogie riff\n\nMy didgeridoo playing\n\nMy Spanish guitar riff\n\nPlaying with George (didgeridoo), Carol and Crystal (drums and percussion) and Mark (Indian flute)\n\n# Open Source\n\nMy Open Source projects are hosted on my github account so please check that out!
17 
18  ...
19 
20 Hosted on Cloudflare Pages\n\n © Mark Watson 1994-2024\n\nPrivacy Policy', 'title': 'Mark Watson: AI Practitioner and Author of 20+ AI Books | Mark Watson', 'error': None}

Please note that the text extracted from a web page is mostly plain text. Section headings are maintained but are converted to markdown. In the last (edited for brevity) listing, the HTML H1 element with the text Fun Stuff is converted to markdown:

1 # Fun stuff
2 
3 In addition to programming and writing my hobbies are cooking,
4 photography, hiking, travel, and playing the following musical
5 instruments: guitar, didgeridoo, and American Indian flute ...

You have now looked at example tool use. We will now implement several tools in this chapter and the next. We will look at the first tool, for reading and writing files, in fine detail and then more briefly discuss the other tools in the source-code directory of the https://github.com/mark-watson/Ollama_in_Action_Book repository.

Tool for Reading and Writing File Contents

This tool is meant to be combined with other tools, for example a summarization tool and a file reading tool might be used to process a user prompt to summarize a specific local file on your laptop.
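As a sketch of that kind of combination (assuming the tool files from this chapter are importable, and using a prompt and file name that are only examples), you could register both a file reading tool and a summarization tool in a single chat call:

 1 import ollama
 2 from tool_file_contents import read_file_contents
 3 from tool_summarize_text import summarize_text
 4 
 5 available_functions = {
 6     "read_file_contents": read_file_contents,
 7     "summarize_text": summarize_text,
 8 }
 9 
10 response = ollama.chat(
11     model="llama3.2:latest",
12     messages=[{"role": "user",
13                "content": "Read the file 'requirements.txt' and summarize its contents."}],
14     tools=[read_file_contents, summarize_text],
15 )
16 
17 for tool_call in response.message.tool_calls or []:
18     func = available_functions.get(tool_call.function.name)
19     if func:
20         print(func(**tool_call.function.arguments))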

Here is the contents of tool utility tool_file_contents.py:

 1 """
 2 Provides functions for reading and writing file contents with proper error handling
 3 """
 4 
 5 from pathlib import Path
 6 import json
 7 
 8 
 9 def read_file_contents(file_path: str, encoding: str = "utf-8") -> str:
10     """
11     Reads contents from a file and returns the text
12 
13     Args:
14         file_path (str): Path to the file to read
15         encoding (str): File encoding to use (default: utf-8)
16 
17     Returns:
18         Contents of the file as a string
19     """
20     try:
21         path = Path(file_path)
22         if not path.exists():
23             return f"File not found: {file_path}"
24 
25         with path.open("r", encoding=encoding) as f:
26             content = f.read()
27             return f"Contents of file '{file_path}' is:\n{content}\n"
28 
29     except Exception as e:
30         return f"Error reading file '{file_path}' is: {str(e)}"
31 
32 
33 def write_file_contents(
34         file_path: str, content: str,
35         encoding: str = "utf-8",
36         mode: str = "w") -> str:
37     """
38     Writes content to a file and returns operation status
39 
40     Args:
41         file_path (str): Path to the file to write
42         content (str): Content to write to the file
43         encoding (str): File encoding to use (default: utf-8)
44         mode (str): Write mode ('w' for write, 'a' for append)
45 
46     Returns:
47         a message string
48     """
49     try:
50         path = Path(file_path)
51 
52         # Create parent directories if they don't exist
53         path.parent.mkdir(parents=True, exist_ok=True)
54 
55         with path.open(mode, encoding=encoding) as f:
56             bytes_written = f.write(content)
57 
58         return f"File '{file_path}' written OK."
59 
60     except Exception as e:
61         return f"Error writing file '{file_path}': {str(e)}"
62 
63 
64 # Function metadata for Ollama integration
65 read_file_contents.metadata = {
66     "name": "read_file_contents",
67     "description": "Reads contents from a file and returns the content as a string",
68     "parameters": {"file_path": "Path to the file to read"},
69 }
70 
71 write_file_contents.metadata = {
72     "name": "write_file_contents",
73     "description": "Writes content to a file and returns operation status",
74     "parameters": {
75         "file_path": "Path to the file to write",
76         "content": "Content to write to the file",
77         "encoding": "File encoding (default: utf-8)",
78         "mode": 'Write mode ("w" for write, "a" for append)',
79     },
80 }
81 
82 # Export the functions
83 __all__ = ["read_file_contents", "write_file_contents"]

read_file_contents

This function provides file reading capabilities with robust error handling. Its parameters are:

  • file_path (str): Path to the file to read
  • encoding (str, optional): File encoding (defaults to “utf-8”)

Features:

  • Uses pathlib.Path for cross-platform path handling
  • Checks file existence before attempting to read
  • Returns file contents with descriptive message
  • Comprehensive error handling

LLM Integration:

  • Includes metadata for function discovery
  • Returns descriptive string responses instead of raising exceptions

write_file_contents

This function handles file writing operations with built-in safety features. The parameters are:

  • file_path (str): Path to the file to write
  • content (str): Content to write to the file
  • encoding (str, optional): File encoding (defaults to “utf-8”)
  • mode (str, optional): Write mode (‘w’ for write, ‘a’ for append)

Features:

  • Automatically creates parent directories
  • Supports write and append modes
  • Uses context managers for safe file handling
  • Returns operation status messages

LLM Integration:

  • Includes detailed metadata for function calling
  • Provides clear feedback about operations

Common Features of both functions:

  • Type hints for better code clarity
  • Detailed docstrings that are used at runtime by the tool/function calling code. The text in the docstrings is supplied as context to the LLM currently in use.
  • Proper error handling
  • UTF-8 default encoding
  • Context managers for file operations
  • Metadata for LLM function discovery

Design Benefits for LLM Integration: the utilities are optimized for LLM function calling by:

  • Returning descriptive string responses
  • Including metadata for function discovery
  • Handling errors gracefully
  • Providing clear operation feedback
  • Using consistent parameter patterns
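The two functions can also be called directly, which is a convenient way to test them before wiring them into tool calling. A small sketch (the file name is just an example):

1 from tool_file_contents import read_file_contents, write_file_contents
2 
3 # Write a small file (parent directories are created if needed),
4 # then read it back and print the descriptive result strings.
5 print(write_file_contents("scratch/notes.txt", "Ollama tool test\n"))
6 print(read_file_contents("scratch/notes.txt"))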

Tool for Getting File Directory Contents

This tool is similar to the last tool so here we just list the worker function from the file tool_file_dir.py:

 1 def list_directory(pattern: str = "*", list_dots=None) -> str:
 2     """
 3     Lists files and directories in the current working directory
 4 
 5     Args:
 6         pattern (str): Glob pattern for filtering files (default: "*")
 7 
 8     Returns:
 9         string with directory name, followed by list of files in the directory
10     """
11     try:
12         current_dir = Path.cwd()
13         files = list(current_dir.glob(pattern))
14 
15         # Convert Path objects to strings and sort
16         file_list = sorted([str(f.name) for f in files])
17 
18         file_list = [file for file in file_list if not file.endswith("~")]
19         if not list_dots:
20             file_list = [file for file in file_list if not file.startswith(".")]
21 
22         return f"Contents of current directory: [{', '.join(file_list)}]"
23 
24     except Exception as e:
25         return f"Error listing directory: {str(e)}"

Tool for Accessing SQLite Databases Using Natural Language Queries

The example file tool_sqlite.py serves two purposes here:

  • Test and example code: utility function _create_sample_data creates several database tables and the function main serves as an example program.
  • The Python class definitions SQLiteTool and OllamaFunctionCaller are meant to be copied and used in your applications.

Here is the full listing of tool_sqlite.py:
  1 import sqlite3
  2 import json
  3 from typing import Dict, Any, List, Optional
  4 import ollama
  5 from functools import wraps
  6 import re
  7 from contextlib import contextmanager
  8 from textwrap import dedent # for multi-line string literals
  9 
 10 class DatabaseError(Exception):
 11     """Custom exception for database operations"""
 12     pass
 13 
 14 
 15 def _create_sample_data(cursor):  # Helper function to create sample data
 16     """Create sample data for tables"""
 17     sample_data = {
 18         'example': [
 19             ('Example 1', 10.5),
 20             ('Example 2', 25.0)
 21         ],
 22         'users': [
 23             ('Bob', 'bob@example.com'),
 24             ('Susan', 'susan@test.net')
 25         ],
 26         'products': [
 27             ('Laptop', 1200.00),
 28             ('Keyboard', 75.50)
 29         ]
 30     }
 31 
 32     for table, data in sample_data.items():
 33         for record in data:
 34             if table == 'example':
 35                 cursor.execute(
 36                     "INSERT INTO example (name, value) VALUES (?, ?) ON CONFLICT DO NOTHING",
 37                     record
 38                 )
 39             elif table == 'users':
 40                 cursor.execute(
 41                     "INSERT INTO users (name, email) VALUES (?, ?) ON CONFLICT DO NOTHING",
 42                     record
 43                 )
 44             elif table == 'products':
 45                 cursor.execute(
 46                     "INSERT INTO products (product_name, price) VALUES (?, ?) ON CONFLICT DO NOTHING",
 47                     record
 48                 )
 49 
 50 
 51 class SQLiteTool:
 52     _instance = None
 53 
 54     def __new__(cls, *args, **kwargs):
 55         if not isinstance(cls._instance, cls):
 56             cls._instance = super(SQLiteTool, cls).__new__(cls)
 57         return cls._instance
 58 
 59     def __init__(self, default_db: str = "test.db"):
 60         if hasattr(self, 'default_db'):  # Skip initialization if already done
 61             return
 62         self.default_db = default_db
 63         self._initialize_database()
 64 
 65     @contextmanager
 66     def get_connection(self):
 67         """Context manager for database connections"""
 68         conn = sqlite3.connect(self.default_db)
 69         try:
 70             yield conn
 71         finally:
 72             conn.close()
 73 
 74     def _initialize_database(self):
 75         """Initialize database with tables"""
 76         tables = {
 77             'example': """
 78                 CREATE TABLE IF NOT EXISTS example (
 79                     id INTEGER PRIMARY KEY,
 80                     name TEXT,
 81                     value REAL
 82                 );
 83             """,
 84             'users': """
 85                 CREATE TABLE IF NOT EXISTS users (
 86                     id INTEGER PRIMARY KEY,
 87                     name TEXT,
 88                     email TEXT UNIQUE
 89                 );
 90             """,
 91             'products': """
 92                 CREATE TABLE IF NOT EXISTS products (
 93                     id INTEGER PRIMARY KEY,
 94                     product_name TEXT,
 95                     price REAL
 96                 );
 97             """
 98         }
 99 
100         with self.get_connection() as conn:
101             cursor = conn.cursor()
102             for table_sql in tables.values():
103                 cursor.execute(table_sql)
104             conn.commit()
105             _create_sample_data(cursor)
106             conn.commit()
107 
108     def get_tables(self) -> List[str]:
109         """Get list of tables in the database"""
110         with self.get_connection() as conn:
111             cursor = conn.cursor()
112             cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
113             return [table[0] for table in cursor.fetchall()]
114 
115     def get_table_schema(self, table_name: str) -> List[tuple]:
116         """Get schema for a specific table"""
117         with self.get_connection() as conn:
118             cursor = conn.cursor()
119             cursor.execute(f"PRAGMA table_info({table_name});")
120             return cursor.fetchall()
121 
122     def execute_query(self, query: str) -> List[tuple]:
123         """Execute a SQL query and return results"""
124         with self.get_connection() as conn:
125             cursor = conn.cursor()
126             try:
127                 cursor.execute(query)
128                 return cursor.fetchall()
129             except sqlite3.Error as e:
130                 raise DatabaseError(f"Query execution failed: {str(e)}")
131 
132 class OllamaFunctionCaller:
133     def __init__(self, model: str = "llama3.2:latest"):
134         self.model = model
135         self.sqlite_tool = SQLiteTool()
136         self.function_definitions = self._get_function_definitions()
137 
138     def _get_function_definitions(self) -> Dict:
139         return {
140             "query_database": {
141                 "description": "Execute a SQL query on the database",
142                 "parameters": {
143                     "type": "object",
144                     "properties": {
145                         "query": {
146                             "type": "string",
147                             "description": "The SQL query to execute"
148                         }
149                     },
150                     "required": ["query"]
151                 }
152             },
153             "list_tables": {
154                 "description": "List all tables in the database",
155                 "parameters": {
156                     "type": "object",
157                     "properties": {}
158                 }
159             }
160         }
161 
162     def _generate_prompt(self, user_input: str) -> str:
163         prompt = dedent(f"""
164             You are a SQL assistant. Based on the user's request, generate a JSON response that calls the appropriate function.
165             Available functions: {json.dumps(self.function_definitions, indent=2)}
166 
167             User request: {user_input}
168 
169             Respond with a JSON object containing:
170             - "function": The function name to call
171             - "parameters": The parameters for the function
172 
173             Response:
174         """).strip()
175         return prompt
176 
177     def _parse_ollama_response(self, response: str) -> Dict[str, Any]:
178         try:
179             json_match = re.search(r'\{.*\}', response, re.DOTALL)
180             if not json_match:
181                 raise ValueError("No valid JSON found in response")
182             return json.loads(json_match.group())
183         except json.JSONDecodeError as e:
184             raise ValueError(f"Invalid JSON in response: {str(e)}")
185 
186     def process_request(self, user_input: str) -> Any:
187         try:
188             response = ollama.generate(model=self.model, prompt=self._generate_prompt(user_input))
189             function_call = self._parse_ollama_response(response.response)
190 
191             if function_call["function"] == "query_database":
192                 return self.sqlite_tool.execute_query(function_call["parameters"]["query"])
193             elif function_call["function"] == "list_tables":
194                 return self.sqlite_tool.get_tables()
195             else:
196                 raise ValueError(f"Unknown function: {function_call['function']}")
197         except Exception as e:
198             raise RuntimeError(f"Request processing failed: {str(e)}")
199 
200 def main():
201     function_caller = OllamaFunctionCaller()
202     queries = [
203         "Show me all tables in the database",
204         "Get all users from the users table",
205         "What are the top 5 products by price?"
206     ]
207 
208     for query in queries:
209         try:
210             print(f"\nQuery: {query}")
211             result = function_caller.process_request(query)
212             print(f"Result: {result}")
213         except Exception as e:
214             print(f"Error processing query: {str(e)}")
215 
216 if __name__ == "__main__":
217     main()

This code provides a natural language interface for interacting with an SQLite database. It uses a combination of Python classes, SQLite, and Ollama for running a language model to interpret user queries and execute corresponding database operations. Below is a breakdown of the code:

  • Database Setup and Error Handling: a custom exception class, DatabaseError, is defined to handle database-specific errors. The database is initialized with three tables: example, users, and products. These tables are populated with sample data for demonstration purposes.
  • SQLiteTool Class: the SQLiteTool class is a singleton that manages all SQLite database operations. Key features include:
      • Singleton Pattern: ensures only one instance of the class is created.
      • Database Initialization: creates tables (example, users, products) if they do not already exist.
      • Sample Data: populates the tables with predefined sample data.
      • Context Manager: safely manages database connections using a context manager.

Utility Methods:

  • get_tables: Retrieves a list of all tables in the database.
  • get_table_schema: Retrieves the schema of a specific table.
  • execute_query: Executes a given SQL query and returns the results.

Sample Data Creation:

A helper function, _create_sample_data, is used to populate the database with sample data. It inserts records into the example, users, and products tables. This ensures the database has some initial data for testing and demonstration.

OllamaFunctionCaller Class:

The OllamaFunctionCaller class acts as the interface between natural language queries and database operations. Key components include:

  • Integration with Ollama LLM: Uses the Ollama language model to interpret natural language queries.
  • Function Definitions: Defines two main functions:
      • query_database: Executes SQL queries on the database.
      • list_tables: Lists all tables in the database.
  • Prompt Generation: Converts user input into a structured prompt for the language model.
  • Response Parsing: Parses the language model’s response into a JSON object that specifies the function to call and its parameters.
  • Request Processing: Executes the appropriate database operation based on the parsed response.

Function Definitions:

The OllamaFunctionCaller class defines two main functions that can be called based on user input:

  • query_database: Executes a SQL query provided by the user and returns the results of the query.
  • list_tables: Lists all tables in the database and is useful for understanding the database structure.

Request Processing Workflow:

The process_request method handles the entire workflow of processing a user query:

  • Input: Takes a natural language query from the user.
  • Prompt Generation: Converts the query into a structured prompt for the Ollama language model.
  • Response Parsing: Parses the language model’s response into a JSON object.
  • Function Execution: Calls the appropriate function (query_database or list_tables) based on the parsed response.
  • Output: Returns the results of the database operation.
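For example, for the request “Get all users from the users table”, the model is expected to respond with a JSON object along these lines (a hypothetical illustration of the format that _parse_ollama_response extracts):

1 {
2   "function": "query_database",
3   "parameters": {
4     "query": "SELECT * FROM users;"
5   }
6 }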

Main test/example function:

The main function demonstrates how the system works with sample queries. It initializes the OllamaFunctionCaller and processes a list of example queries, such as:

  • “Show me all tables in the database.”
  • “Get all users from the users table.”
  • “What are the top 5 products by price?”

For each query, the system interprets the natural language input, executes the corresponding database operation, and prints the results.

Summary:

This code creates a natural language interface for interacting with an SQLite database. It works as follows:

  • Database Management: The SQLiteTool class handles all database operations, including initialization, querying, and schema inspection.
  • Natural Language Processing: The OllamaFunctionCaller uses the Ollama language model to interpret user queries and map them to database functions.
  • Execution: The system executes the appropriate database operation and returns the results to the user.

This approach allows users to interact with the database using natural language instead of writing SQL queries directly, making it more user-friendly and accessible.

The output looks like this:

 1 python /Users/markw/GITHUB/Ollama_in_Action_Book/source-code/tool_sqlite.py 
 2 
 3 Query: Show me all tables in the database
 4 Result: ['example', 'users', 'products']
 5 
 6 Query: Get all users from the users table
 7 Result: [(1, 'Bob', 'bob@example.com'), (2, 'Susan', 'susan@test.net')]
 8 
 9 Query: What are the top 5 products by price?
10 Result: [(1, 'Laptop', 1200.0), (3, 'Laptop', 1200.0), (2, 'Keyboard', 75.5), (4, 'Keyboard', 75.5)]

Tool for Summarizing Text

Tools that are used by LLMs can themselves also use other LLMs. The tool defined in the file tool_summarize_text.py might be triggered by a user prompt such as “summarize the text in local file test1.txt” or “summarize text from web page https://markwatson.com”, where it is used together with other tools that read a local file, fetch a web page, etc.

We will start by looking at the file tool_summarize_text.py and then look at an example in example_chain_web_summary.py.

 1 """
 2 Summarize text
 3 """
 4 
 5 from ollama import ChatResponse
 6 from ollama import chat
 7 
 8 
 9 def summarize_text(text: str, context: str = "") -> str:
10     """
11     Summarizes text
12 
13     Parameters:
14         text (str): text to summarize
15         context (str): another tool's output can be used at the application layer to set the context for this tool.
16 
17     Returns:
18         a string of summarized text
19 
20     """
21     prompt = "Summarize this text (and be concise), returning only the summary with NO OTHER COMMENTS:\n\n"
22     if len(text.strip()) < 50:
23         text = context
24     elif len(context) > 50:
25         prompt = f"Given this context:\n\n{context}\n\n" + prompt
26 
27     summary: ChatResponse = chat(
28         model="llama3.2:latest",
29         messages=[
30             {"role": "system", "content": prompt},
31             {"role": "user", "content": text},
32         ],
33     )
34     return summary["message"]["content"]
35 
36 
37 # Function metadata for Ollama integration
38 summarize_text.metadata = {
39     "name": "summarize_text",
40     "description": "Summarizes input text",
41     "parameters": {"text": "string of text to summarize",
42                    "context": "optional context string"},
43 }
44 
45 # Export the functions
46 __all__ = ["summarize_text"]

This Python code implements a text summarization tool using the Ollama chat model. The core function summarize_text takes two parameters: the main text to summarize and an optional context string. The function operates by constructing a prompt that instructs the model to provide a concise summary without additional commentary. It includes a useful piece of logic: if the input text is very short (less than 50 characters), it defaults to using the context parameter instead. Additionally, if there’s substantial context provided (more than 50 characters), it prepends this context to the prompt. The function utilizes the Ollama chat model “llama3.2:latest” to generate the summary, structuring the request with a system message containing the prompt and a user message containing the text to be summarized. The program includes metadata for Ollama integration, specifying the function name, description, and parameter details, and exports the summarize_text function through __all__.
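Called directly, the tool is just a function; the text below is only a stand-in for a longer document:

1 from tool_summarize_text import summarize_text
2 
3 long_text = """Ollama is an open-source framework that lets you run large
4 language models locally, with a command-line interface, a Python SDK, and
5 an OpenAI-compatible HTTP API for integrating local models into applications."""
6 
7 print(summarize_text(long_text))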

Here is an example of using this tool that you can find in the file example_chain_web_summary.py. Please note that this example also uses the web search tool that is discussed in the next section.

 1 from tool_web_search import uri_to_markdown
 2 from tool_summarize_text import summarize_text
 3 
 4 from pprint import pprint
 5 
 6 import ollama
 7 
 8 # Map function names to function objects
 9 available_functions = {
10     "uri_to_markdown": uri_to_markdown,
11     "summarize_text": summarize_text,
12 }
13 
14 memory_context = ""
15 # User prompt
16 user_prompt = "Get the text of 'https://knowledgebooks.com' and then summarize the text."
17 
18 # Initiate chat with the model
19 response = ollama.chat(
20     model='llama3.2:latest',
21     messages=[{"role": "user", "content": user_prompt}],
22     tools=[uri_to_markdown, summarize_text],
23 )
24 
25 # Process the model's response
26 
27 pprint(response.message.tool_calls)
28 
29 for tool_call in response.message.tool_calls or []:
30     function_to_call = available_functions.get(tool_call.function.name)
31     print(
32         f"\n***** {function_to_call=}\n\nmemory_context[:70]:\n\n{memory_context[:70]}\n\n*****\n"
33     )
34     if function_to_call:
35         print()
36         if len(memory_context) > 10:
37             tool_call.function.arguments["context"] = memory_context
38         print("\n* * tool_call.function.arguments:\n")
39         pprint(tool_call.function.arguments)
40         print(f"Arguments for {function_to_call.__name__}: {tool_call.function.arguments}")
41         result = function_to_call(**tool_call.function.arguments)  # , memory_context)
42         print(f"\n\n** Output of {tool_call.function.name}: {result}")
43         memory_context = memory_context + "\n\n" + result
44     else:
45         print(f"\n\n** Function {tool_call.function.name} not found.")

Here is the output edited for brevity:

  1 python /Users/markw/GITHUB/Ollama_in_Action_Book/source-code/example_chain_web_summary.py 
  2 [ToolCall(function=Function(name='uri_to_markdown', arguments={'a_uri': 'https://knowledgebooks.com'})),
  3  ToolCall(function=Function(name='summarize_text', arguments={'context': '', 'text': 'uri_to_markdown(a_uri = "https://knowledgebooks.com")'}))]
  4 
  5 ***** function_to_call=<function uri_to_markdown at 0x1047da200>
  6 
  7 memory_context[:70]:
  8 
  9 
 10 
 11 *****
 12 
 13 
 14 
 15 * * tool_call.function.arguments:
 16 
 17 {'a_uri': 'https://knowledgebooks.com'}
 18 Arguments for uri_to_markdown: {'a_uri': 'https://knowledgebooks.com'}
 19 INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
 20 
 21 
 22 ** Output of uri_to_markdown: Contents of URI https://knowledgebooks.com is:
 23 # KnowledgeBooks.com - research on the Knowledge Management, and the Semantic Web 
 24 
 25 KnowledgeBooks.com - research on the Knowledge Management, and the Semantic Web 
 26 
 27 KnowledgeBooks.com 
 28 
 29 Knowledgebooks.com 
 30 a sole proprietorship company owned by Mark Watson
 31 to promote Knowledge Management, Artificial Intelligence (AI), NLP, and Semantic Web technologies.
 32 
 33 Site updated: December 1, 2018
 34 With the experience of working on Machine Learning and Knowledge Graph applications for 30 years (at Google,
 35  Capital One, SAIC, Compass Labs, etc.) I am now concerned that the leverage of deep learning and knowledge
 36  representation technologies are controlled by a few large companies, mostly in China and the USA. I am proud
 37  to be involved organizations like Ocean Protocol and Common Crawl that seek tp increase the availability of quality data
 38  to individuals and smaller organizations.
 39 Traditional knowledge management tools relied on structured data often stored in relational databases. Adding
 40  new relations to this data would require changing the schemas used to store data which could negatively
 41  impact exisiting systems that used that data. Relationships between data in traditional systems was
 42  predefined by the structure/schema of stored data. With RDF and OWL based data modeling, relationships in
 43  data are explicitly defined in the data itself. Semantic data is inherently flexible and extensible: adding
 44  new data and relationships is less likely to break older systems that relied on the previous verisons of
 45  data.
 46 A complementary technology for knowledge management is the automated processing of unstructured text data
 47  into semantic data using natural language processing (NLP) and statistical-base text analytics.
 48 We will help you integrate semantic web and text analytics technologies into your organization by working
 49  with your staff in a mentoring role and also help as needed with initial development. All for reasonable consulting rates
 50 Knowledgebooks.com Technologies:
 51 
 52 SAAS KnowledgeBooks Semantic NLP Portal (KBSportal.com) used for
 53  in-house projects and available as a product to run on your servers.
 54 Semantic Web Ontology design and development
 55 Semantic Web application design and development using RDF data stores, PostgreSQL, and MongoDB.
 56 
 57 Research
 58 Natural Language Processing (NLP) using deep learning
 59 Fusion of classic symbolic AI systems with deep learning models
 60 Linked data, semantic web, and Ontology's
 61 News ontology
 62 Note: this ontology was created in 2004 using the Protege modeling tool.
 63 About
 64 KnowledgeBooks.com is owned as a sole proprietor business by Mark and Carol Watson.
 65 Mark Watson is an author of 16 published books and a consultant specializing in the JVM platform
 66  (Java, Scala, JRuby, and Clojure), artificial intelligence, and the Semantic Web.
 67 Carol Watson helps prepare training data and serves as the editor for Mark's published books.
 68 Privacy policy: this site collects no personal data or information on site visitors
 69 Hosted on Cloudflare Pages.
 70 
 71 
 72 ***** function_to_call=<function summarize_text at 0x107519260>
 73 
 74 memory_context[:70]:
 75 
 76 
 77 
 78 Contents of URI https://knowledgebooks.com is:
 79 # KnowledgeBooks.com 
 80 
 81 *****
 82 
 83 
 84 
 85 * * tool_call.function.arguments:
 86 
 87 {'context': '\n'
 88             '\n'
 89             'Contents of URI https://knowledgebooks.com is:\n'
 90             '# KnowledgeBooks.com - research on the Knowledge Management, and '
 91             'the Semantic Web \n'
 92             '\n'
 93             'KnowledgeBooks.com - research on the Knowledge Management, and '
 94 ...
 95             'Carol Watson helps prepare training data and serves as the editor '
 96             "for Mark's published books.\n"
 97             'Privacy policy: this site collects no personal data or '
 98             'information on site visitors\n'
 99             'Hosted on Cloudflare Pages.\n',
100  'text': 'uri_to_markdown(a_uri = "https://knowledgebooks.com")'}
101 Arguments for summarize_text: {'context': "\n\nContents of URI https://knowledgebooks.com is:\n# KnowledgeBooks.com - research on the Knowledge Management, and the Semantic Web \n\nKnowledgeBooks.com - research on the Knowledge Management, and the Semantic Web \n\nKnowledgeBooks.com \n\nKnowledgebooks.com \na sole proprietorship company owned by Mark Watson\nto promote Knowledge Management, Artificial Intelligence (AI), NLP, and Semantic Web technologies.
102 
103 ...
104 
105 \n\nResearch\nNatural Language Processing (NLP) using deep learning\nFusion of classic symbolic AI systems with deep learning models\nLinked data, semantic web, and Ontology's\nNews ontology\nNote: this ontology was created in 2004 using the Protege modeling tool.\nAbout\nKnowledgeBooks.com is owned as a sole proprietor business by Mark and Carol Watson.\nMark Watson is an author of 16 published books and a consultant specializing in the JVM platform\n (Java, Scala, JRuby, and Clojure), artificial intelligence, and the Semantic Web.\nCarol Watson helps prepare training data and serves as the editor for Mark's published books.\nPrivacy policy: this site collects no personal data or information on site visitors\nHosted on Cloudflare Pages.\n", 'text': 'uri_to_markdown(a_uri = "https://knowledgebooks.com")'}
106 
107 
108 ** Output of summarize_text: # Knowledge Management and Semantic Web Research
109 ## About KnowledgeBooks.com
110 A sole proprietorship company by Mark Watson promoting AI, NLP, and Semantic Web technologies.
111 ### Technologies
112 - **SAAS KnowledgeBooks**: Semantic NLP Portal for in-house projects and product sales.
113 - **Semantic Web Development**: Ontology design and application development using RDF data stores.
114 
115 ### Research Areas
116 - Natural Language Processing (NLP) with deep learning
117 - Fusion of symbolic AI systems with deep learning models
118 - Linked data, semantic web, and ontologies

Tool for Web Search and Fetching Web Pages

This code provides a set of functions for web searching and HTML content processing, with the main functions being uri_to_markdown, search_web, brave_search_summaries, and brave_search_text. The uri_to_markdown function fetches content from a given URI and converts HTML to markdown-style text, handling various edge cases and cleaning up the text by removing multiple blank lines and spaces while converting HTML entities. The search_web function is a placeholder that’s meant to be implemented with a preferred search API, while brave_search_summaries implements actual web searching using the Brave Search API, requiring an API key from the environment variables and returning structured results including titles, URLs, and descriptions. The brave_search_text function builds upon brave_search_summaries by fetching search results and then using uri_to_markdown to convert the content of each result URL to text, followed by summarizing the content using a separate summarize_text function. The code also includes utility functions like replace_html_tags_with_text which uses BeautifulSoup to strip HTML tags and return plain text, and includes proper error handling, logging, and type hints throughout. The module is designed to be integrated with Ollama and exports uri_to_markdown and search_web as its primary interfaces.

  1 """
  2 Provides functions for web searching and HTML to Markdown conversion
  3 and for returning the contents of a URI as plain text (with minimal markdown)
  4 """
  5 
  6 from typing import Dict, Any
  7 import requests
  8 from bs4 import BeautifulSoup
  9 import re
 10 from urllib.parse import urlparse
 11 import html
 12 from ollama import chat
 13 import json
 14 from tool_summarize_text import summarize_text
 15 
 16 import requests
 17 import os
 18 import logging
 19 from pprint import pprint
 20 from bs4 import BeautifulSoup
 21 
 22 logging.basicConfig(level=logging.INFO)
 23 
 24 api_key = os.environ.get("BRAVE_SEARCH_API_KEY")
 25 if not api_key:
 26     raise ValueError(
 27         "API key not found. Set 'BRAVE_SEARCH_API_KEY' environment variable."
 28     )
 29 
 30 
 31 def replace_html_tags_with_text(html_string):
 32     soup = BeautifulSoup(html_string, "html.parser")
 33     return soup.get_text()
 34 
 35 
 36 def uri_to_markdown(a_uri: str) -> Dict[str, Any]:
 37     """
 38     Fetches content from a URI and converts HTML to markdown-style text
 39 
 40     Args:
 41         a_uri (str): URI to fetch and convert
 42 
 43     Returns:
 44         web page text converted to markdown-style content
 45     """
 46     try:
 47         # Validate URI
 48         parsed = urlparse(a_uri)
 49         if not all([parsed.scheme, parsed.netloc]):
 50             return f"Invalid URI: {a_uri}"
 51 
 52         # Fetch content
 53         headers = {
 54             "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
 55         }
 56         response = requests.get(a_uri, headers=headers, timeout=10)
 57         response.raise_for_status()
 58 
 59         # Parse HTML
 60         soup = BeautifulSoup(response.text, "html.parser")
 61 
 62         # Get title
 63         title = soup.title.string if soup.title else ""
 64 
 65         # Get text and clean up
 66         text = soup.get_text()
 67 
 68         # Clean up the text
 69         text = re.sub(r"\n\s*\n", "\n\n", text)  # Remove multiple blank lines
 70         text = re.sub(r" +", " ", text)  # Remove multiple spaces
 71         text = html.unescape(text)  # Convert HTML entities
 72         text = text.strip()
 73 
 74         return f"Contents of URI {a_uri} is:\n# {title}\n\n{text}\n"
 75 
 76     except requests.RequestException as e:
 77         return f"Network error: {str(e)}"
 78 
 79     except Exception as e:
 80         return f"Error processing URI: {str(e)}"
 81 
 82 
 83 def search_web(query: str, max_results: int = 5) -> Dict[str, Any]:
 84     """
 85     Performs a web search and returns results
 86     Note: This is a placeholder. Implement with your preferred search API.
 87 
 88     Args:
 89         query (str): Search query
 90         max_results (int): Maximum number of results to return
 91 
 92     Returns:
 93         Dict[str, Any]: Dictionary containing:
 94             - 'results': List of search results
 95             - 'count': Number of results found
 96             - 'error': Error message if any, None otherwise
 97     """
 98 
 99     # Placeholder for search implementation
100     return {
101         "results": [],
102         "count": 0,
103         "error": "Web search not implemented. Please implement with your preferred search API.",
104     }
105 
106 
107 def brave_search_summaries(
108     query,
109     num_results=3,
110     url="https://api.search.brave.com/res/v1/web/search",
111     api_key=api_key,
112 ):
113     headers = {"X-Subscription-Token": api_key, "Content-Type": "application/json"}
114     params = {"q": query, "count": num_results}
115 
116     response = requests.get(url, headers=headers, params=params)
117     ret = []
118 
119     if response.status_code == 200:
120         search_results = response.json()
121         ret = [
122             {
123                 "title": result.get("title"),
124                 "url": result.get("url"),
125                 "description": replace_html_tags_with_text(result.get("description")),
126             }
127             for result in search_results.get("web", {}).get("results", [])
128         ]
129         logging.info("Successfully retrieved results.")
130     else:
131         try:
132             error_info = response.json()
133             logging.error(f"Error {response.status_code}: {error_info.get('message')}")
134         except json.JSONDecodeError:
135             logging.error(f"Error {response.status_code}: {response.text}")
136 
137     return ret
138 
139 def brave_search_text(query, num_results=3):
140     summaries = brave_search_summaries(query, num_results)
141     ret = ""
142     for s in summaries:
143         url = s["url"]
144         text = uri_to_markdown(url)
145         summary = summarize_text(
146             f"Given the query:\n\n{query}\n\nthen, summarize text removing all material that is not relevant to the query and then be very concise for a very short summary:\n\n{text}\n"
147         )
148         ret += summary
149     print("\n\n-----------------------------------")
150     return ret
151 
152 # Function metadata for Ollama integration
153 uri_to_markdown.metadata = {
154     "name": "uri_to_markdown",
155     "description": "Converts web page content to markdown-style text",
156     "parameters": {"a_uri": "URI of the web page to convert"},
157 }
158 
159 search_web.metadata = {
160     "name": "search_web",
161     "description": "Performs a web search and returns results",
162     "parameters": {
163         "query": "Search query",
164         "max_results": "Maximum number of results to return",
165     },
166 }
167 
168 # Export the functions
169 __all__ = ["uri_to_markdown", "search_web"]
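
To make these interfaces concrete, here is a minimal usage sketch. It assumes the listing above is saved as tool_web_search.py (a file name that appears in the example directory listings later in the book) and that the BRAVE_SEARCH_API_KEY environment variable is set, since the module checks for the key at import time:

# Minimal usage sketch for the web tools defined above (assumes the module
# is saved as tool_web_search.py and BRAVE_SEARCH_API_KEY is set).
from tool_web_search import uri_to_markdown, brave_search_summaries

# Fetch a web page and convert it to markdown-style text:
page_text = uri_to_markdown("https://knowledgebooks.com")
print(page_text[:300])

# Run a Brave search and print the structured result summaries:
for result in brave_search_summaries("semantic web knowledge graphs", num_results=2):
    print(result["title"], "->", result["url"])
    print("    " + result["description"])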

Tools Wrap Up

We have looked at the implementations and example uses of several tools. In the next chapter we continue our study of tool use with the application of judging the accuracy of output generated by LLMs: basically, LLMs judging the accuracy of other LLMs to reduce hallucinations, inaccurate output, etc.

Automatic Evaluation of LLM Results: More Tool Examples

As Large Language Models (LLMs) become increasingly integrated into production systems and workflows, the ability to systematically evaluate their performance becomes crucial. While qualitative assessment of LLM outputs remains important, organizations need robust, quantitative methods to measure and compare model performance across different prompts, use cases, and deployment scenarios. This has led to the development of specialized tools and frameworks designed specifically for LLM evaluation.

The evaluation of LLM outputs presents unique challenges that set it apart from traditional natural language processing metrics. Unlike straightforward classification or translation tasks, LLM responses often require assessment across multiple dimensions, including factual accuracy, relevance, coherence, creativity, and adherence to specified formats or constraints. Furthermore, the stochastic nature of LLM outputs means that the same prompt can generate different responses across multiple runs, necessitating evaluation methods that can account for this variability.

Modern LLM evaluation tools address these challenges through a combination of automated metrics, human-in-the-loop validation, and specialized frameworks for prompt testing and response analysis. These tools can help developers and researchers understand how well their prompts perform, identify potential failure modes, and optimize prompt engineering strategies. By providing quantitative insights into LLM performance, these evaluation tools enable more informed decisions about model selection, prompt design, and system architecture in LLM-powered applications.

In this chapter we take a simple approach:

  • Capture the chat history including output for an interaction with a LLM.
  • Generate a prompt containing the chat history, model output, and a request to a different LLM to evaluate the output generated by the first LLM. We request that the final output of the second LLM be a single-character score (‘Y’ or ‘N’ in the implementation below, for good or bad) judging the accuracy of the first LLM’s output.

We look at several examples in this chapter of approaches you might want to experiment with.

Tool For Judging LLM Results

Here we implement our simple approach of using a second LLM to evaluate the output of a first LLM that generated a response to user input.

The following listing shows the tool tool_judge_results.py:

 1 """
 2 Judge results from LLM generation from prompts
 3 """
 4 
 5 from typing import Optional, Dict, Any
 6 from pathlib import Path
 7 import json
 8 import re
 9 from pprint import pprint
10 
11 import ollama
12 
13 client = ollama.Client()
14 
15 def judge_results(original_prompt: str, llm_gen_results: str) -> Dict[str, str]:
16     """
17     Takes an original prompt to a LLM and the output results
18 
19     Args:
20         original_prompt (str): original prompt to a LLM
21         llm_gen_results (str): output from the LLM that this function judges for accuracy
22 
23     Returns:
 24         dict with 'judgement' (a single character) and 'reasoning' keys, where 'judgement' is:
 25             - 'N': Bad result
 26             - 'Y': Good result ('E' is returned on any error)
27     """
28     try:
29         messages = [
30             {"role": "system", "content": "Always judge this output for correctness."},
31             {"role": "user", "content": f"Evaluate this output:\n\n{llm_gen_results}\n\nfor this prompt:\n\n{original_prompt}\n\nDouble check your work and explain your thinking in a few sentences. End your output with a Y or N answer"},
32         ]
33 
34         response = client.chat(
35             model="qwen2.5-coder:14b", # "llama3.2:latest",
36             messages=messages,
37         )
38 
39         r = response.message.content.strip()
40         print(f"\n\noriginal COT response:\n\n{r}\n\n")
41 
42         # look at the end of the response for the Y or N judgement
43         s = r.lower()
44         # remove all non-alphabetic characters:
45         s = re.sub(r'[^a-zA-Z]', '', s).strip()
46 
 47         return {'judgement': s[-1].upper(), 'reasoning': r.strip()}
48 
49     except Exception as e:
50         print(f"\n\n***** {e=}\n\n")
51         return {'judgement': 'E', 'reasoning': str(e)}  # on any error, assign 'E' result

This Python code defines a function judge_results that takes an original prompt sent to a Large Language Model (LLM) and the generated response from the LLM, then attempts to judge the accuracy of the response.

Here’s a breakdown of the code:

The main function judge_results takes two parameters:

  • original_prompt: The initial prompt sent to an LLM
  • llm_gen_results: The output from the LLM that needs evaluation

The function judge_results returns a dictionary with two keys:

  • judgement: Single character (‘Y’ for a good result, ‘N’ for a bad result, ‘E’ on error)
  • reasoning: Detailed explanation of the judgment

The evaluation process is:

  • Creates a conversation with two messages: a system message that sets the context for evaluation, and a user message that combines the original prompt and the generated results to be evaluated
  • Uses the Qwen 2.5 Coder (14B parameter) model through Ollama
  • Expects a Y/N response at the end of the evaluation
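
The sample output below comes from a small driver script, example_judge.py. That script is not listed here, but a minimal sketch of what such a driver might look like (a hypothetical reconstruction, assuming the listing above is saved as tool_judge_results.py) is:

# Hypothetical driver sketch in the spirit of example_judge.py,
# assuming the listing above is saved as tool_judge_results.py.
from pprint import pprint
from tool_judge_results import judge_results

prompt = ("Sally is 55, John is 18, and Mary is 31. What are pairwise "
          "combinations of the absolute value of age differences?")

good_answer = ("Sally and John: |55 - 18| = 37. Sally and Mary: |55 - 31| = 24. "
               "John and Mary: |31 - 18| = 13.")
bad_answer = ("Sally and John: 55 - 18 = 31. Sally and Mary: 55 - 31 = 24. "
              "John and Mary: 31 - 18 = 10.")

print("\n** JUDGEMENT (should be good) ***\n")
pprint(judge_results(prompt, good_answer))

print("\n** JUDGEMENT (should be bad) ***\n")
pprint(judge_results(prompt, bad_answer))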

Sample output

 1 $ cd OllamaEx
 2 $ python example_judge.py 
 3 
 4 ==================================================
 5  Judge output from a LLM
 6 ==================================================
 7 
 8 ==================================================
 9  First test: should be Y, or good
10 ==================================================
11 
12 
13 original COT response:
14 
15 The given output correctly calculates the absolute value of age differences for each pair:
16 
17 - Sally (55) and John (18): \( |55 - 18| = 37 \)
18 - Sally (55) and Mary (31): \( |55 - 31| = 24 \)
19 - John (18) and Mary (31): \( |31 - 18| = 13 \)
20 
21 These calculations are accurate, matching the prompt's requirements. Therefore, the answer is Y.
22 
23 
24 
25 ** JUDGEMENT ***
26 
27 judgement={'judgement': 'Y', 'reasoning': "The given output correctly calculates the absolute value of age differences for each pair:\n\n- Sally (55) and John (18): \\( |55 - 18| = 37 \\)\n- Sally (55) and Mary (31): \\( |55 - 31| = 24 \\)\n- John (18) and Mary (31): \\( |31 - 18| = 13 \\)\n\nThese calculations are accurate, matching the prompt's requirements. Therefore, the answer is Y."}
28 
29 ==================================================
30  Second test: should be N, or bad
31 ==================================================
32 
33 
34 original COT response:
35 
36 Let's evaluate the given calculations step by step:
37 
38 1. Sally (55) - John (18) = 37. The difference is calculated as 55 - 18, which equals 37.
39 2. Sally (55) - Mary (31) = 24. The difference is calculated as 55 - 31, which equals 24.
40 3. John (18) - Mary (31) = -13. However, the absolute value of this difference is |18 - 31| = 13.
41 
42 The given output shows:
43 - Sally and John: 55 - 18 = 31. This should be 37.
44 - Sally and Mary: 55 - 31 = 24. This is correct.
45 - John and Mary: 31 - 18 = 10. This should be 13.
46 
47 The output contains errors in the first and third calculations. Therefore, the answer is:
48 
49 N
50 
51 ** JUDGEMENT ***
52 
53 judgement={'judgement': 'N', 'reasoning': "et's evaluate the given calculations step by step:\n\n1. Sally (55) - John (18) = 37. The difference is calculated as 55 - 18, which equals 37.\n2. Sally (55) - Mary (31) = 24. The difference is calculated as 55 - 31, which equals 24.\n3. John (18) - Mary (31) = -13. However, the absolute value of this difference is |18 - 31| = 13.\n\nThe given output shows:\n- Sally and John: 55 - 18 = 31. This should be 37.\n- Sally and Mary: 55 - 31 = 24. This is correct.\n- John and Mary: 31 - 18 = 10. This should be 13.\n\nThe output contains errors in the first and third calculations. Therefore, the answer is:\n\nN"}

Evaluating LLM Responses Given a Chat History

Here we try a different approach by asking the second “judge” LLM to evaluate the output of the first LLM based on specific criteria like “Response accuracy”, “Helpfulness”, etc.

The following listing shows the tool utility tool_llm_eval.py:

  1 import json
  2 from typing import Any, List, Dict, Optional, Iterator
  3 import ollama
  4 from ollama import GenerateResponse
  5 
  6 
  7 def clean_json_response(response: str) -> str:
  8     """
  9     Cleans the response string by removing markdown code blocks and other formatting
 10     """
 11     # Remove markdown code block indicators
 12     response = response.replace("```json", "").replace("```", "")
 13     # Strip whitespace
 14     response = response.strip()
 15     return response
 16 
 17 def evaluate_llm_conversation(
 18     chat_history: List[Dict[str, str]],
 19     evaluation_criteria: Optional[List[str]] = None,
 20     model: str = "llama3.1" # older model that is good at generating JSON
 21 ) -> Dict[str, Any]:
 22     """
 23     Evaluates a chat history using Ollama to run the evaluation model.
 24 
 25     Args:
 26         chat_history: List of dictionaries containing the conversation
 27         evaluation_criteria: Optional list of specific criteria to evaluate
 28         model: Ollama model to use for evaluation
 29 
 30     Returns:
 31         Dictionary containing evaluation results
 32     """
 33     if evaluation_criteria is None:
 34         evaluation_criteria = [
 35             "Response accuracy",
 36             "Coherence and clarity",
 37             "Helpfulness",
 38             "Task completion",
 39             "Natural conversation flow"
 40         ]
 41 
 42     # Format chat history for evaluation
 43     formatted_chat = "\n".join([
 44         f"{'User' if msg['role'] == 'user' else 'Assistant'}: {msg['content']}"
 45         for msg in chat_history
 46     ])
 47 
 48     # Create evaluation prompt
 49     evaluation_prompt = f"""
 50     Please evaluate the following conversation between a user and an AI assistant.
 51     Focus on these criteria: {', '.join(evaluation_criteria)}
 52 
 53     Conversation:
 54     {formatted_chat}
 55 
 56     Provide a structured evaluation with:
 57     1. Scores (1-10) for each criterion
 58     2. Brief explanation for each score
 59     3. Overall assessment
 60     4. Suggestions for improvement
 61 
 62     Format your response as JSON.
 63     """
 64 
 65     try:
 66         # Get evaluation from Ollama
 67         response: GenerateResponse | Iterator[GenerateResponse] = ollama.generate(
 68             model=model,
 69             prompt=evaluation_prompt,
 70             system="You are an expert AI evaluator. Provide detailed, objective assessments in JSON format."
 71         )
 72 
 73         response_clean: str = clean_json_response(response['response'])
 74 
 75         # Parse the response to extract JSON
 76         try:
 77             evaluation_result = json.loads(response_clean)
 78         except json.JSONDecodeError:
 79             # Fallback if response isn't proper JSON
 80             evaluation_result = {
 81                 "error": "Could not parse evaluation as JSON",
 82                 "raw_response": response_clean
 83             }
 84 
 85         return evaluation_result
 86 
 87     except Exception as e:
 88         return {
 89             "error": f"Evaluation failed: {str(e)}",
 90             "status": "failed"
 91         }
 92 
 93 # Example usage
 94 if __name__ == "__main__":
 95     # Sample chat history
 96     sample_chat = [
 97         {"role": "user", "content": "What's the capital of France?"},
 98         {"role": "assistant", "content": "The capital of France is Paris."},
 99         {"role": "user", "content": "Tell me more about it."},
100         {"role": "assistant", "content": "Paris is the largest city in France and serves as the country's political, economic, and cultural center. It's known for landmarks like the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral."}
101     ]
102 
103     # Run evaluation
104     result = evaluate_llm_conversation(sample_chat)
105     print(json.dumps(result, indent=2))

We will use these five evaluation criteria:

  • Response accuracy
  • Coherence and clarity
  • Helpfulness
  • Task completion
  • Natural conversation flow

The main function evaluate_llm_conversation uses these steps:

  • Receives chat history and optional parameters
  • Formats the conversation into a readable string
  • Creates a detailed evaluation prompt
  • Sends prompt to Ollama for evaluation
  • Cleans and parses the response
  • Returns structured evaluation results

Sample Output

 1 $ cd OllamaEx 
 2 $ python tool_llm_eval.py 
 3 {
 4   "evaluation": {
 5     "responseAccuracy": {
 6       "score": 9,
 7       "explanation": "The assistant correctly answered the user's question about the capital of France, and provided accurate information when the user asked for more details."
 8     },
 9     "coherenceAndClarity": {
10       "score": 8,
11       "explanation": "The assistant's responses were clear and easy to understand. However, there was a slight shift in tone from a simple answer to a more formal description."
12     },
13     "helpfulness": {
14       "score": 9,
15       "explanation": "The assistant provided relevant information that helped the user gain a better understanding of Paris. The response was thorough and answered the user's follow-up question."
16     },
17     "taskCompletion": {
18       "score": 10,
19       "explanation": "The assistant completed both tasks: providing the capital of France and elaborating on it with additional context."
20     },
21     "naturalConversationFlow": {
22       "score": 7,
23       "explanation": "While the responses were clear, they felt a bit abrupt. The assistant could have maintained a more conversational tone or encouraged further discussion."
24     }
25   },
26   "overallAssessment": {
27     "score": 8.5,
28     "explanation": "The assistant demonstrated strong technical knowledge and was able to provide accurate information on demand. However, there were some minor lapses in natural conversation flow and coherence."
29   },
30   "suggestionsForImprovement": [
31     {
32       "improvementArea": "NaturalConversationFlow",
33       "description": "Consider using more conversational language or prompts to engage users further."
34     },
35     {
36       "improvementArea": "CoherenceAndClarity",
37       "description": "Use transitional phrases and maintain a consistent tone throughout the conversation."
38     }
39   ]
40 }

A Tool for Detecting Hallucinations

Here we use a text template file templates/anti_hallucination.txt to define the prompt template for checking a user input, a context, and the resulting output from another LLM (most of the file is not shown for brevity):

 1 You are a fair judge and an expert at identifying hallucinations and false information, and you are tasked with evaluating the accuracy of an AI-generated answer given a context. Analyze the provided INPUT, CONTEXT, and OUTPUT to determine if the OUTPUT contains any hallucinations or false information.
 2 
 3 Guidelines:
 4 1. The OUTPUT must not contradict any information given in the CONTEXT.
 5 2. The OUTPUT must not introduce new information beyond what's provided in the CONTEXT.
 6 3. The OUTPUT should not contradict well-established facts or general knowledge.
 7 4. Check that the OUTPUT doesn't oversimplify or generalize information in a way that changes its meaning or accuracy.
 8 
 9 Analyze the text thoroughly and assign a hallucination score between 0 and 1, where:
10 - 0.0: The OUTPUT is unfaithful or is incorrect to the CONTEXT and the user's INPUT
11 - 1.0: The OUTPUT is entirely accurate and faithful to the CONTEXT and the user's INPUT
12 
13 INPUT:
14 {input}
15 
16 CONTEXT:
17 {context}
18 
19 OUTPUT:
20 {output}
21 
22 Provide your judgement in JSON format:
23 {{
24     "score": <your score between 0.0 and 1.0>,
25     "reason": [
26         <list your reasoning as Python strings>
27     ]
28 }}

Here is the tool tool_anti_hallucination.py that uses this template:

 1 """
 2 Provides functions detecting hallucinations by other LLMs
 3 """
 4 
 5 from typing import Optional, Dict, Any
 6 from pathlib import Path
 7 from pprint import pprint
 8 import json
 9 from ollama import ChatResponse
10 from ollama import chat
11 
12 def read_anti_hallucination_template() -> str:
13     """
14     Reads the anti-hallucination template file and returns the content
15     """
16     template_path = Path(__file__).parent / "templates" / "anti_hallucination.txt"
17     with template_path.open("r", encoding="utf-8") as f:
18         content = f.read()
19         return content
20 
21 TEMPLATE = read_anti_hallucination_template()
22 
23 def detect_hallucination(user_input: str, context: str, output: str) -> Dict[str, Any]:
24     """
25     Given user input, context, and LLM output, detect hallucination
26 
27     Args:
28         user_input (str): User's input text prompt
29         context (str): Context text for LLM
30         output (str): LLM's output text that is to be evaluated for hallucinations
31 
32     Returns: JSON data:
33      {
34        "score": <your score between 0.0 and 1.0>,
35        "reason": [
36          <list your reasoning as bullet points>
37        ]
38      }
39     """
40     prompt = TEMPLATE.format(input=user_input, context=context, output=output)
41     response: ChatResponse = chat(
42         model="llama3.2:latest",
43         messages=[
44             {"role": "system", "content": prompt},
45             {"role": "user", "content": output},
46         ],
47     )
48     try:
49         return json.loads(response.message.content)
50     except json.JSONDecodeError:
51         print(f"Error decoding JSON: {response.message.content}")
52     return {"score": 0.0, "reason": ["Error decoding JSON"]}
53 
54 
55 # Export the functions
56 __all__ = ["detect_hallucination"]
57 
58 ## Test only code:
59 
60 def main():
61     def separator(title: str):
62         """Prints a section separator"""
63         print(f"\n{'=' * 50}")
64         print(f" {title}")
65         print('=' * 50)
66 
67     # Test file writing
68     separator("Detect hallucination from a LLM")
69 
70     test_prompt = "Sally is 55, John is 18, and Mary is 31. What are pairwise combinations of the absolute value of age differences?"
71     test_context = "Double check all math results."
72     test_output = "Sally and John:  55 - 18 = 31. Sally and Mary:  55 - 31 = 24. John and Mary:  31 - 18 = 10."
73     judgement = detect_hallucination(test_prompt, test_context, test_output)
74     print(f"\n** JUDGEMENT ***\n")
75     pprint(judgement)
76 
77 if __name__ == "__main__":
78     try:
79         main()
80     except Exception as e:
81         print(f"An error occurred: {str(e)}")

This code implements a hallucination detection system for Large Language Models (LLMs) using the Ollama framework. The core functionality revolves around the detect_hallucination function, which takes three parameters: user input, context, and LLM output, and evaluates whether the output contains hallucinated content by utilizing another LLM (llama3.2) as a judge. The system reads a template from a file to structure the evaluation prompt.

The implementation includes type hints and error handling, particularly for JSON parsing of the response. The output is structured as a JSON object containing a hallucination score (between 0.0 and 1.0) and a list of reasoning points. The code also includes a test harness that demonstrates the system’s usage with a mathematical example, checking for accuracy in age difference calculations. The modular design allows for easy integration into larger systems through the explicit export of the detect_hallucination function.
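
In a pipeline you will usually want to act on the returned score rather than just print it. Here is a small hedged sketch; the threshold value and the function name answer_is_trustworthy are illustrative, not part of the book's code:

# Hypothetical gating sketch built on detect_hallucination: accept an answer
# only when the hallucination score is at or above a chosen threshold.
from tool_anti_hallucination import detect_hallucination

def answer_is_trustworthy(user_input: str, context: str, output: str,
                          threshold: float = 0.8) -> bool:
    judgement = detect_hallucination(user_input, context, output)
    return judgement.get("score", 0.0) >= threshold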

The output looks something like this:

 1 python /Users/markw/GITHUB/OllamaExamples/tool_anti_hallucination.py 
 2 
 3 ==================================================
 4  Detect hallucination from a LLM
 5 ==================================================
 6 
 7 ** JUDGEMENT ***
 8 
 9 {'reason': ['The OUTPUT claims that the absolute value of age differences are '
10             '31, 24, and 10 for Sally and John, Sally and Mary, and John and '
11             'Mary respectively. However, this contradicts the CONTEXT, as the '
12             'CONTEXT asks to double-check math results.',
13             'The OUTPUT does not introduce new information, but it provides '
14             'incorrect calculations: Sally and John: 55 - 18 = 37, Sally and '
15             'Mary: 55 - 31 = 24, John and Mary: 31 - 18 = 13. Therefore, the '
16             'actual output should be recalculated to ensure accuracy.',
17             'The OUTPUT oversimplifies the age differences by not considering '
18             "the order of subtraction (i.e., John's age subtracted from "
19             "Sally's or Mary's). However, this is already identified as a "
20             'contradiction in point 1.'],
21  'score': 0.0}

Wrap Up

Here we looked at several examples of using one LLM to rate the accuracy, usefulness, etc. of the output of another LLM given an input prompt. There are two topics in this book that I spend most of my personal LLM research time on: automatic evaluation of LLM results, and tool-using agents (the subject of the next chapter).

Building Agents with Ollama and the Hugging Face Smolagents Library

We have seen a few useful examples of tool use (function calling), and now we will build on tool use to construct both single-agent and multi-agent systems. There are commercial and open source frameworks for building agents, CrewAI and LangGraph being popular choices. We will follow a different learning path here, preferring to use the smolagents library. Please bookmark https://github.com/huggingface/smolagents for reference while working through this chapter.

Each example program and utility for this chapter uses the prefix smolagents_ in the Python file name.

Note: We are using the 2 GB model Llama3.2:latest here. Different models support tools and agents differently.

Choosing Specific LLMs for Writing Agents

As agents operate, performing tasks like interpreting user input, carrying out Chain of Thought (CoT) reasoning, observing the output from tool calls, and following plan steps one by one, LLM errors, hallucinations, and inconsistencies accumulate. When using Ollama we prefer the most powerful models that we can run on our hardware.

Here we use Llama3.2:latest, which is recognized for its function calling capabilities, facilitating seamless integration with various tools.

As you work through the examples here using different local models running on Ollama, you might encounter problems with compounding errors. When I am experimenting with ideas for implementing agents, I sometimes keep two versions of my code: one for a local model and one using either of the commercial models GPT-4o or Claude Sonnet 3.5. Comparing the same agent setup across different models can provide some insight into whether runtime agent problems stem from your code or from the model you are using.
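
A minimal sketch of that dual setup is shown below. It uses the smolagents LiteLLMModel wrapper that appears in the examples later in this chapter; the make_model helper, the USE_LOCAL_LLM environment variable, and the commercial model_id string "gpt-4o" are my own illustrative choices and depend on your LiteLLM configuration and API keys:

# Sketch of switching one agent definition between a local Ollama model and a
# commercial model. make_model and USE_LOCAL_LLM are illustrative names.
import os
from smolagents import LiteLLMModel

def make_model(use_local: bool = True) -> LiteLLMModel:
    if use_local:
        return LiteLLMModel(
            model_id="ollama_chat/llama3.2:latest",
            api_base="http://localhost:11434",
            api_key="not-used",
        )
    # Assumes OPENAI_API_KEY is set; the model_id string is only an example.
    return LiteLLMModel(model_id="gpt-4o")

model = make_model(use_local=os.environ.get("USE_LOCAL_LLM", "1") == "1")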

Installation notes

As I write this chapter on January 2, 2025, smolagents needs to be run with an older version of Python:

1 python3.11 -m venv venv
2 source venv/bin/activate
3 python3.11 -m pip install -r requirements.txt
4 python3.11 smolagents_test.py

The first two lines of the requirements.txt file specify the smolagents-specific requirements:

 1 smolagents
 2 litellm[proxy]
 3 requests
 4 beautifulsoup4
 5 ollama
 6 langchain
 7 langchain-community
 8 langchain-ollama
 9 langgraph
10 rdflib
11 kuzu
12 langchain_openai
13 tabulate

Overview of the Hugging Face Smolagents Library

The smolagents library https://github.com/huggingface/smolagents is built around a minimalist and modular architecture that emphasizes simplicity and composability. The core components are cleanly separated into the file agents.py for agent definitions, tools.py for tool implementations, and related support files. This design philosophy allows developers to easily understand, extend, and customize the components while maintaining a small codebase footprint - true to the “smol” name.

This library implements a tools-first approach where capabilities are encapsulated as discrete tools that agents can use. The tools.py file in the smolagents implementation defines a clean interface for tools with input/output specifications, making it straightforward to add new tools. This tools-based architecture enables agents to have clear, well-defined capabilities while maintaining separation of concerns between the agent logic and the actual implementation of capabilities.

Agents are designed to be lightweight and focused on specific tasks rather than trying to be general-purpose. The BaseAgent class provides core functionality while specific agents like WebAgent extend it for particular use cases. This specialization allows the agents to be more efficient and reliable at their designated tasks rather than attempting to be jack-of-all-trades.

Overview for LLM Agents (optional section)

You might want to skip this section if you want to quickly work through the examples in this chapter and review this material later.

In general, we use the following steps to build agent-based systems:

  • Define agents (e.g., Researcher, Writer, Editor, Judge outputs of other models and agents).
  • Assign tasks (e.g., research, summarize, write, double check the work of other agents).
  • Use an orchestration framework to manage task sequencing and collaboration.

Features of Agents:

  • Retrieval-Augmented Generation (RAG): Enhance agents’ knowledge by integrating external documents or databases.
    • Example: An agent that retrieves and summarizes medical research papers.
  • Memory Management: Enable agents to retain context across interactions.
    • Example: A chatbot that remembers user preferences over time (see the code sketch after this list).
  • Tool Integration: Equip agents with tools like web search, data scraping, or API calls.
    • Example: An agent that fetches real-time weather data and provides recommendations. We will use tools previously developed in this book.
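
Here is the memory-management sketch referenced above, using the Ollama Python SDK: the growing messages list itself acts as the agent's short-term memory (the chat_with_memory function name is illustrative):

# Minimal conversational-memory sketch with the Ollama Python SDK.
# The accumulated messages list is the memory carried across turns.
import ollama

client = ollama.Client()
messages = []  # retained across interactions

def chat_with_memory(user_text: str, model: str = "llama3.2:latest") -> str:
    messages.append({"role": "user", "content": user_text})
    response = client.chat(model=model, messages=messages)
    reply = response.message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(chat_with_memory("My favorite color is green."))
print(chat_with_memory("What is my favorite color?"))  # answered from the remembered context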

Examples of Real-World Applications

  • Healthcare: Agents that analyze medical records and provide diagnostic suggestions.
  • Education: Virtual tutors that explain complex topics using Ollama’s local models.
  • Customer Support: Chatbots that handle inquiries without relying on cloud services.
  • Content Creation: Agents that generate articles, summaries, or marketing content.

Let’s Write Some Code

I am still experimenting with LLM-based agents. Please accept the following examples as my personal works in progress.

“Hello World” Smolagents Example

Here we look at a simple example taken from the smolagents documentation and converted to run using local models with Ollama. Here is a listing of file smolagents_test.py:

 1 """
 2 smolagents example program (slightly modified)
 3 """
 4 
 5 from smolagents.agents import ToolCallingAgent
 6 from smolagents import tool, LiteLLMModel
 7 from typing import Optional
 8 
 9 model = LiteLLMModel(
10     model_id="ollama_chat/llama3.2:latest",
11     api_base="http://localhost:11434",
12     api_key="your-api-key" # not used
13 )
14 
15 @tool
16 def get_weather(location: str, celsius: Optional[bool] = False) -> str:
17     """
18     Get weather in the next days at given location.
19     Secretly this tool does not care about the location, it hates the weather everywhere.
20 
21     Args:
22         location: the location
23         celsius: the temperature
24     """
25     return "The weather is UNGODLY with torrential rains and temperatures below -10°C"
26 
27 agent = ToolCallingAgent(tools=[get_weather], model=model)
28 
29 print(agent.run("What's the weather like in Paris?"))

Understanding the smolagents and Ollama Example

This code demonstrates a simple integration between smolagents (a tool-calling framework) and Ollama (a local LLM server). Here’s what the code accomplishes:

Core Components

  • Utilizes smolagents for creating AI agents with tool capabilities
  • Integrates with a local Ollama server running llama3.2
  • Implements a basic weather checking tool (though humorously hardcoded)

Model Configuration

The code sets up a LiteLLM model instance that connects to a local Ollama server on port 11434. It’s configured to use the llama3.2 model and supports optional API key authentication.

Weather Tool Implementation

The code defines a weather-checking tool using the @tool decorator. While it accepts a location parameter and an optional celsius flag, this example version playfully returns the same dramatic weather report regardless of the input location.

Agent Setup and Execution

The implementation creates a ToolCallingAgent with the weather tool and the configured model. Users can query the agent about weather conditions in any location, though in this example it always returns the same humorous response about terrible weather conditions.

Key Features

  • Demonstrates tool-calling capabilities through smolagents
  • Shows local LLM integration using Ollama
  • Includes proper type hinting for better code clarity
  • Provides an extensible structure for adding more tools

Python Tools Compatible with Smolagents

The tools I developed in previous chapters are not quite compatible with the smolagents library, so I wrap a few of the tools I previously wrote in the utility smolagents_tools.py:

 1 """
 2 Wrapper for book example tools for smolagents compatibility
 3 """
 4 from pathlib import Path
 5 
 6 from smolagents import tool, LiteLLMModel
 7 from typing import Optional
 8 from pprint import pprint
 9 
10 from tool_file_dir import list_directory
11 
12 @tool
13 def sa_list_directory(list_dots: Optional[bool]=None) -> str:
14     """
15     Lists files and directories in the current working directory
16 
17     Args:
18         list_dots: optional boolean (if true, include dot files)
19 
20     Returns:
21         string with directory name, followed by list of files in the directory
22     """
23     lst = list_directory()
24     pprint(lst)
25     return lst
26 
27 @tool
28 def read_file_contents(file_path: str) -> str:
29     """
30     Reads contents from a file and returns the text
31 
32     Args:
33         file_path: Path to the file to read
34 
35     Returns:
36         Contents of the file as a string
37     """
38     try:
39         path = Path(file_path)
40         if not path.exists():
41             return f"File not found: {file_path}"
42 
43         with path.open("r", encoding="utf-8") as f:
44             content = f.read()
45             return f"Contents of file '{file_path}' is:\n{content}\n"
46 
47     except Exception as e:
48         return f"Error reading file '{file_path}' is: {str(e)}"
49 
50 @tool
51 def summarize_directory() -> str:
52     """
53     Summarizes the files and directories in the current working directory
54 
55     Returns:
56         string with directory name, followed by summary of files in the directory
57     """
58     lst = list_directory()
59     num_files = len(lst)
60     num_dirs = len([x for x in lst if x[1] == 'directory'])
61     num_files = num_files - num_dirs
62     return f"Current directory contains {num_files} files and {num_dirs} directories."

This code defines a wrapper module containing three tool functions designed for compatibility with the smolagents framework. The module includes sa_list_directory(), which lists files and directories in the current working directory with an optional parameter to include dot files; read_file_contents(), which takes a file path as input and returns the contents of that file as a string while handling potential errors and file encoding; and summarize_directory(), which provides a concise summary of the current directory by counting the total number of files and directories. All functions are decorated with @tool for integration with smolagents, and the code imports necessary modules including pathlib for file operations, typing for type hints, and pprint for formatted output. The functions rely on an external list_directory() function imported from tool_file_dir.py, and they provide clear documentation through docstrings explaining their parameters, functionality, and return values. Error handling is implemented particularly in the file reading function to gracefully handle cases where files don’t exist or cannot be read properly.
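
Other tools from earlier chapters can be wrapped in the same way. As a hypothetical example (this wrapper is not part of smolagents_tools.py), uri_to_markdown from tool_web_search.py could be exposed to smolagents like this:

# Hypothetical additional wrapper following the pattern of smolagents_tools.py.
# Note: importing tool_web_search requires BRAVE_SEARCH_API_KEY to be set.
from smolagents import tool
from tool_web_search import uri_to_markdown

@tool
def sa_uri_to_markdown(a_uri: str) -> str:
    """
    Fetches a web page and returns its content as markdown-style text

    Args:
        a_uri: URI of the web page to fetch and convert

    Returns:
        string containing the page title and cleaned page text
    """
    return uri_to_markdown(a_uri)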

A Complete Smolagents Example using Three Tools

This listing shows the script smolagents_agent_test.py:

 1 from smolagents.agents import ToolCallingAgent
 2 from smolagents import tool, LiteLLMModel
 3 from typing import Optional
 4 
 5 from smolagents_tools import sa_list_directory
 6 from smolagents_tools import summarize_directory
 7 from smolagents_tools import read_file_contents
 8 
 9 model = LiteLLMModel(
10     model_id="ollama_chat/llama3.2:latest",
11     api_base="http://localhost:11434",
12     api_key="your-api-key" # not used
13 )
14 
15 agent = ToolCallingAgent(tools=[sa_list_directory,
16                                 summarize_directory,
17                                 read_file_contents],
18                          model=model)
19 
20 print(agent.run("What are the files in the current directory? Describe the current directory"))
21 
22 print(agent.run("Which Python scripts evaluate the performance of LLMs?"))

This code demonstrates the creation of an AI agent using the smolagents library, specifically configured to work with file system operations. It imports three specialized tools from smolagents_tools: sa_list_directory for listing directory contents, summarize_directory for providing directory summaries, and read_file_contents for accessing file contents. The code sets up a LiteLLMModel instance that connects to a local Ollama server running the llama3.2 model on port 11434, with provisions for API key authentication if needed. A ToolCallingAgent is then created with these three file-system-related tools, enabling it to interact with and analyze the local file system. The agent is first instructed to examine the current directory through a natural language query, asking for both a listing and a description of the files present. A second query then asks the agent to analyze the Python programs in the directory and identify those related to LLM performance evaluation, showing the agent’s potential for more complex file analysis tasks. This setup effectively creates an AI-powered file system navigator that can understand and respond to natural language queries about directory contents and file analysis.

Output from the First Example: “List the Python programs in the current directory, and then tell me which Python programs in the current directory evaluate the performance of LLMs?”

In the following output please notice that sometimes tool use fails and occasionally wrong assumptions are made, but after a long chain of thought (CoT) process the final result is good.

The output for the query “Which Python scripts evaluate the performance of LLMs?” is:

  1 python smolagents_agent_test1.py 
  2 ╭────────────────────────────────── New run ───────────────────────────────────╮
  3 │                                                                              │
  4 │ List the Python programs in the current directory, and then tell me which    │
  5 │ Python programs in the current directory evaluate the performance of LLMs?   │
  6 │                                                                              │
  7 ╰─ LiteLLMModel - ollama_chat/llama3.2:latest ─────────────────────────────────╯
  8 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  9 ╭──────────────────────────────────────────────────────────────────────────────╮
 10 │ Calling tool: 'sa_list_directory' with arguments: {'list_dots': True}
 11 ╰──────────────────────────────────────────────────────────────────────────────╯
 12 ('Contents of current directory: [Makefile, README.md, __pycache__, data, '
 13  'example_chain_read_summary.py, example_chain_web_summary.py, '
 14  'example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, '
 15  'langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, '
 16  'short_programs, smolagents_agent_test1.py, smolagents_test.py, '
 17  'smolagents_tools.py, templates, tool_anti_hallucination.py, '
 18  'tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, '
 19  'tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, '
 20  'tool_web_search.py, venv]')
 21 Observations: Contents of current directory: [Makefile, README.md, __pycache__, 
 22 data, example_chain_read_summary.py, example_chain_web_summary.py, 
 23 example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, 
 24 langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, 
 25 short_programs, smolagents_agent_test1.py, smolagents_test.py, 
 26 smolagents_tools.py, templates, tool_anti_hallucination.py, 
 27 tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, 
 28 tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, tool_web_search.py, 
 29 venv]
 30 [Step 0: Duration 4.49 seconds| Input tokens: 1,347 | Output tokens: 79]
 31 
 32 ...
 33 
 34 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 35 ╭──────────────────────────────────────────────────────────────────────────────╮
 36 │ Calling tool: 'sa_summarize_directory' with arguments: {}
 37 ╰──────────────────────────────────────────────────────────────────────────────╯
 38 lst='Contents of current directory: [Makefile, README.md, __pycache__, data, example_chain_read_summary.py, example_chain_web_summary.py, example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, short_programs, smolagents_agent_test1.py, smolagents_test.py, smolagents_tools.py, templates, tool_anti_hallucination.py, tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, tool_web_search.py, venv]'
 39 response.message.content="Based on the file names provided, here's a summary of the contents and my educated guesses for their purposes:\n\n1. **Makefile**: A build script used to automate compilation, installation, or other tasks.\n2. **README.md**: A markdown file providing an introduction to the project, its purpose, and how to get started with it.\n3. **__pycache__**: This is a hidden directory generated by Python's bytecode compiler. It likely contains compiled versions of Python code in the current directory.\n4. **data**: A directory containing data used for training or testing models, simulations, or other computational tasks.\n5. **example_chain_read_summary.py**: A script that generates summaries from reading chains (e.g., text from a document). Its purpose is likely related to natural language processing (NLP) or text analysis.\n6. **example_chain_web_summary.py**: Similar to the previous one, but this script seems to be focused on web-based applications or online content summarization.\n7. **example_judge.py**: A script that evaluates judges or other aspects of a system. Its purpose might involve testing, validation, or quality control in a specific domain (e.g., AI, programming).\n8. **graph_kuzu_from_text.py**: A script that generates graphs from text data. This could be related to graph analysis, network science, or text mining.\n9. **graph_kuzu_property_example.py**: Another script related to graph generation, possibly used for demonstrating specific properties or characteristics of the generated graphs.\n10. **langgraph_agent_test.py**: A test script for a language graph agent, which likely involves machine learning, natural language processing, or other AI-related tasks.\n11. **ollama_tools_examples.py**: A file containing examples of OLLAMA (Open Language Learning with Multi-Agent) tools and their usage. This seems to be related to AI-powered language understanding and generation.\n12. **requirements.txt**: A text file specifying the dependencies required to run the project, including Python packages and other software.\n13. **short_programs**: A directory containing short programs or scripts that demonstrate specific techniques, concepts, or examples in the field of AI or programming.\n14. **smolagents_agent_test1.py**: Another test script for a small agent (SMOLAgent), which is likely involved in autonomous decision-making or other complex system simulations.\n15. **smolagents_test.py**: A general-purpose test file for SMOLAgents, used to verify their functionality and performance.\n16. **smolagents_tools.py**: A script containing tools or utilities specifically designed for working with SMOLAgents or related projects.\n17. **templates**: A directory containing templates for generating documents, reports, or other content in a specific format (e.g., LaTeX).\n18. **tool_anti_hallucination.py**: A script that addresses issues of hallucination (i.e., when an AI model provides false information) and aims to mitigate these errors.\n19. **tool_file_contents.py**, **tool_file_dir.py**, **tool_judge_results.py**, and other similar tool scripts: These tools seem to be designed for various computational tasks, such as:\n\t* `tool_file_contents.py`: Analyzing the contents of files.\n\t* `tool_file_dir.py`: Examining or manipulating directory structures.\n\t* `tool_judge_results.py`: Evaluating the performance or outcomes of a system or model.\n\n20. 
**tool_llm_eval.py**: A script for evaluating Large Language Models (LLMs) and their capabilities, likely involving text analysis, sentiment detection, or other NLP tasks.\n21. **tool_sqlite.py**: A tool that interacts with SQLite databases, possibly used for data storage, management, or querying.\n22. **tool_summarize_text.py**: A script designed to summarize long pieces of text into shorter versions, possibly using machine learning algorithms.\n23. **tool_web_search.py**: A tool that performs web searches or retrieves information from online sources, which could involve natural language processing (NLP) and web scraping techniques.\n\n24. **venv**: A directory generated by Python's virtual environment module, used to isolate dependencies and manage a specific Python environment for the project.\n\nKeep in mind that this is an educated guess based on common file name conventions and the context provided. The actual purposes of these files might differ depending on the specific project or domain they are related to."
 40 Observations: Summary of directory:Based on the file names provided, here's a 
 41 summary of the contents and my educated guesses for their purposes:
 42 
 43 1. **Makefile**: A build script used to automate compilation, installation, or 
 44 other tasks.
 45 2. **README.md**: A markdown file providing an introduction to the project, its 
 46 purpose, and how to get started with it.
 47 3. **__pycache__**: This is a hidden directory generated by Python's bytecode 
 48 compiler. It likely contains compiled versions of Python code in the current 
 49 directory.
 50 4. **data**: A directory containing data used for training or testing models, 
 51 simulations, or other computational tasks.
 52 5. **example_chain_read_summary.py**: A script that generates summaries from 
 53 reading chains (e.g., text from a document). Its purpose is likely related to 
 54 natural language processing (NLP) or text analysis.
 55 6. **example_chain_web_summary.py**: Similar to the previous one, but this 
 56 script seems to be focused on web-based applications or online content 
 57 summarization.
 58 7. **example_judge.py**: A script that evaluates judges or other aspects of a 
 59 system. Its purpose might involve testing, validation, or quality control in a 
 60 specific domain (e.g., AI, programming).
 61 8. **graph_kuzu_from_text.py**: A script that generates graphs from text data. 
 62 This could be related to graph analysis, network science, or text mining.
 63 9. **graph_kuzu_property_example.py**: Another script related to graph 
 64 generation, possibly used for demonstrating specific properties or 
 65 characteristics of the generated graphs.
 66 10. **langgraph_agent_test.py**: A test script for a language graph agent, which
 67 likely involves machine learning, natural language processing, or other 
 68 AI-related tasks.
 69 11. **ollama_tools_examples.py**: A file containing examples of OLLAMA (Open 
 70 Language Learning with Multi-Agent) tools and their usage. This seems to be 
 71 related to AI-powered language understanding and generation.
 72 12. **requirements.txt**: A text file specifying the dependencies required to 
 73 run the project, including Python packages and other software.
 74 13. **short_programs**: A directory containing short programs or scripts that 
 75 demonstrate specific techniques, concepts, or examples in the field of AI or 
 76 programming.
 77 14. **smolagents_agent_test1.py**: Another test script for a small agent 
 78 (SMOLAgent), which is likely involved in autonomous decision-making or other 
 79 complex system simulations.
 80 15. **smolagents_test.py**: A general-purpose test file for SMOLAgents, used to 
 81 verify their functionality and performance.
 82 16. **smolagents_tools.py**: A script containing tools or utilities specifically
 83 designed for working with SMOLAgents or related projects.
 84 17. **templates**: A directory containing templates for generating documents, 
 85 reports, or other content in a specific format (e.g., LaTeX).
 86 18. **tool_anti_hallucination.py**: A script that addresses issues of 
 87 hallucination (i.e., when an AI model provides false information) and aims to 
 88 mitigate these errors.
 89 19. **tool_file_contents.py**, **tool_file_dir.py**, **tool_judge_results.py**, 
 90 and other similar tool scripts: These tools seem to be designed for various 
 91 computational tasks, such as:
 92         * `tool_file_contents.py`: Analyzing the contents of files.
 93         * `tool_file_dir.py`: Examining or manipulating directory structures.
 94         * `tool_judge_results.py`: Evaluating the performance or outcomes of a 
 95 system or model.
 96 
 97 20. **tool_llm_eval.py**: A script for evaluating Large Language Models (LLMs) 
 98 and their capabilities, likely involving text analysis, sentiment detection, or 
 99 other NLP tasks.
100 21. **tool_sqlite.py**: A tool that interacts with SQLite databases, possibly 
101 used for data storage, management, or querying.
102 22. **tool_summarize_text.py**: A script designed to summarize long pieces of 
103 text into shorter versions, possibly using machine learning algorithms.
104 23. **tool_web_search.py**: A tool that performs web searches or retrieves 
105 information from online sources, which could involve natural language processing
106 (NLP) and web scraping techniques.
107 
108 24. **venv**: A directory generated by Python's virtual environment module, used
109 to isolate dependencies and manage a specific Python environment for the 
110 project.
111 
112 Keep in mind that this is an educated guess based on common file name 
113 conventions and the context provided. The actual purposes of these files might 
114 differ depending on the specific project or domain they are related to.
115 [Step 3: Duration 21.53 seconds| Input tokens: 6,779 | Output tokens: 133]
116 
117 ...
118 
119 Reached max iterations.
120 Final answer: [{'id': '9630af1b-498f-4604-ab1b-e4139607cb02', 'type': 
121 'function', 'function': {'name': 'sa_list_directory', 'arguments': {'list_dots':
122 True}}}, {'id': '2143745d-5ec1-4711-8813-228398bf36f8', 'type': 'function', 
123 'function': {'name': 'sa_summarize_directory', 'arguments': {}}}]
124 
125 ...
126 
127 The Python programs in the current directory are:
128 1. example_chain_read_summary.py
129 2. example_chain_web_summary.py
130 3. example_judge.py
131 4. graph_kuzu_from_text.py
132 5. graph_kuzu_property_example.py
133 6. langgraph_agent_test.py
134 7. ollama_tools_examples.py
135 8. tool_anti_hallucination.py
136 9. tool_file_contents.py
137 10. tool_file_dir.py
138 11. tool_judge_results.py
139 12. tool_llm_eval.py
140 13. tool_summarize_text.py
141 14. smolagents_agent_test1.py
142 15. smolagents_test.py
143 
144 These Python programs evaluate the performance of LLMs:
145 1. tool_anti_hallucination.py
146 2. tool_llm_eval.py
147 3. tool_summarize_text.py

This is a lot of debug output to list in a book, but I want you, dear reader, to get a feeling for how the output generated by tools becomes the data that an agent observes before determining the next step in its plan.

This output shows the execution of the example smolagents-based agent that analyzes the files in a directory, looking for Python files containing code that evaluates the output of LLMs. The agent follows a systematic approach, first listing all files using the sa_list_directory tool and then using sa_summarize_directory to provide a detailed analysis of the contents.

The agent successfully identified all Python programs in the directory and specifically highlighted three files that evaluate LLM performance: tool_anti_hallucination.py (which checks for false information generation), tool_llm_eval.py (for general LLM evaluation), and tool_summarize_text.py (which likely tests LLM summarization capabilities). The execution includes detailed step-by-step logging, showing input/output tokens and duration for each step, demonstrating the agent’s methodical approach to file analysis and classification.

Output from the Second Example: “What are the files in the current directory? Describe the current directory”

In this section we look at another agent processing cycle. Again, pay attention to the output of the tools, and to whether the agent can observe tool output and make sense of it (often it cannot!).

It is fairly normal for tools to fail with errors, and it is important that agents can observe a failure and move on to try something else.
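For example, a file-reading tool can catch exceptions and return the error message as its result, so the failure shows up in the agent’s observations instead of crashing the run. The following is a minimal sketch in the spirit of the sa_read_file_contents tool used in these examples; it assumes the smolagents @tool decorator (which requires type hints and an Args section in the docstring) and is not the book’s actual implementation.

```python
from smolagents import tool

@tool
def sa_read_file_contents(file_path: str) -> str:
    """Return the contents of a text file, or an error message the agent can observe.

    Args:
        file_path: Path of the file to read.
    """
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return f"Contents of file '{file_path}' is:\n{f.read()}"
    except OSError as e:
        # Returning the error as text lets the agent observe the failure
        # and try a different file or tool instead of aborting the run.
        return f"Error reading '{file_path}': {e}"
```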

  1 python smolagents_agent_test1.py 
  2 ╭────────────────────────────────── New run ───────────────────────────────────╮
  3 │                                                                              │
  4 │ What are the files in the current directory? Describe the current directory  │
  5 │                                                                              │
  6 ╰─ LiteLLMModel - ollama_chat/llama3.2:latest ─────────────────────────────────╯
  7 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  8 ╭──────────────────────────────────────────────────────────────────────────────╮
  9 │ Calling tool: 'sa_list_directory' with arguments: {'list_dots': True}       │
 10 ╰──────────────────────────────────────────────────────────────────────────────╯
 11 ('Contents of current directory: [Makefile, README.md, __pycache__, data, '
 12  'example_chain_read_summary.py, example_chain_web_summary.py, '
 13  'example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, '
 14  'langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, '
 15  'short_programs, smolagents_agent_test1.py, smolagents_test.py, '
 16  'smolagents_tools.py, templates, tool_anti_hallucination.py, '
 17  'tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, '
 18  'tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, '
 19  'tool_web_search.py, venv]')
 20 Observations: Contents of current directory: [Makefile, README.md, __pycache__, 
 21 data, example_chain_read_summary.py, example_chain_web_summary.py, 
 22 example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, 
 23 langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, 
 24 short_programs, smolagents_agent_test1.py, smolagents_test.py, 
 25 smolagents_tools.py, templates, tool_anti_hallucination.py, 
 26 tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, 
 27 tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, tool_web_search.py, 
 28 venv]
 29 [Step 0: Duration 3.06 seconds| Input tokens: 1,332 | Output tokens: 19]
 30 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 31 ╭──────────────────────────────────────────────────────────────────────────────╮
 32 │ Calling tool: 'sa_list_directory' with arguments: {'list_dots': True}       │
 33 ╰──────────────────────────────────────────────────────────────────────────────╯
 34 ('Contents of current directory: [Makefile, README.md, __pycache__, data, '
 35  'example_chain_read_summary.py, example_chain_web_summary.py, '
 36  'example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, '
 37  'langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, '
 38  'short_programs, smolagents_agent_test1.py, smolagents_test.py, '
 39  'smolagents_tools.py, templates, tool_anti_hallucination.py, '
 40  'tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, '
 41  'tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, '
 42  'tool_web_search.py, venv]')
 43 Observations: Contents of current directory: [Makefile, README.md, __pycache__, 
 44 data, example_chain_read_summary.py, example_chain_web_summary.py, 
 45 example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, 
 46 langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, 
 47 short_programs, smolagents_agent_test1.py, smolagents_test.py, 
 48 smolagents_tools.py, templates, tool_anti_hallucination.py, 
 49 tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, 
 50 tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, tool_web_search.py, 
 51 venv]
 52 [Step 1: Duration 1.58 seconds| Input tokens: 2,900 | Output tokens: 38]
 53 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 54 ╭──────────────────────────────────────────────────────────────────────────────╮
 55 │ Calling tool: 'sa_summarize_directory' with arguments: {}                    │
 56 ╰──────────────────────────────────────────────────────────────────────────────╯
 57 lst='Contents of current directory: [Makefile, README.md, __pycache__, data, example_chain_read_summary.py, example_chain_web_summary.py, example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, short_programs, smolagents_agent_test1.py, smolagents_test.py, smolagents_tools.py, templates, tool_anti_hallucination.py, tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, tool_web_search.py, venv]'
 58 response.message.content="Based on the provided list of files in the current directory, here's a summary and an educated guess about the major purposes of each file:\n\n1. **Makefile**: A build script for automating compilation and other tasks.\n\n2. **README.md**: A document providing information about the project, its purpose, usage, and installation instructions.\n\n3. **__pycache__** (hidden directory): Cache files generated by Python's compiler to speed up importing modules.\n\n4. **data**: Directory containing data used for testing or training models.\n\n5. **example_chain_read_summary.py**, **example_chain_web_summary.py**: Example scripts demonstrating how to summarize text from chain-related input, possibly related to natural language processing (NLP) or machine learning (ML).\n\n6. **example_judge.py**: An example script for evaluating the performance of a model or algorithm.\n\n7. **graph_kuzu_from_text.py**, **graph_kuzu_property_example.py**: Scripts that manipulate graphs generated from text data using the Kuzu graph library, possibly used in NLP or ML applications.\n\n8. **langgraph_agent_test.py**: A test file for a language graph agent, which is likely an AI model designed to process and understand languages.\n\n9. **ollama_tools_examples.py**: An example script showcasing how to use Ollama, a tool for generating text data.\n\n10. **requirements.txt**: A list of dependencies required to run the project, including libraries and tools.\n\n11. **short_programs**: Directory containing short programs or scripts that demonstrate specific tasks or algorithms.\n\n12. **smolagents_agent_test1.py**, **smolagents_test.py**, **smolagents_tools.py**: Test files for a small agents framework, possibly an AI model designed to make decisions in complex environments.\n\n13. **templates**: A directory containing templates used for generating text or code in certain contexts.\n\n14. **tool_anti_hallucination.py**, **tool_file_contents.py**, **tool_file_dir.py**, **tool_judge_results.py**, **tool_llm_eval.py**, **tool_sqlite.py**, **tool_summarize_text.py**, **tool_web_search.py**: Various tool scripts that provide functionality for tasks like:\n   - Anti-hallucination (removing fake data from generated text)\n   - Evaluating file contents\n   - File directory manipulation\n   - Judging results\n   - LLM (Large Language Model) evaluation\n   - SQLite database interactions\n   - Text summarization\n   - Web search functionality\n\n15. **venv**: A virtual environment script used to create and manage a separate Python environment for the project.\n\nThese are educated guesses based on common naming conventions and directory structures in software development projects, particularly those related to AI, NLP, and machine learning."
 59 Observations: Summary of directory:Based on the provided list of files in the 
 60 current directory, here's a summary and an educated guess about the major 
 61 purposes of each file:
 62 
 63 1. **Makefile**: A build script for automating compilation and other tasks.
 64 
 65 2. **README.md**: A document providing information about the project, its 
 66 purpose, usage, and installation instructions.
 67 
 68 3. **__pycache__** (hidden directory): Cache files generated by Python's 
 69 compiler to speed up importing modules.
 70 
 71 4. **data**: Directory containing data used for testing or training models.
 72 
 73 5. **example_chain_read_summary.py**, **example_chain_web_summary.py**: Example 
 74 scripts demonstrating how to summarize text from chain-related input, possibly 
 75 related to natural language processing (NLP) or machine learning (ML).
 76 
 77 6. **example_judge.py**: An example script for evaluating the performance of a 
 78 model or algorithm.
 79 
 80 7. **graph_kuzu_from_text.py**, **graph_kuzu_property_example.py**: Scripts that
 81 manipulate graphs generated from text data using the Kuzu graph library, 
 82 possibly used in NLP or ML applications.
 83 
 84 8. **langgraph_agent_test.py**: A test file for a language graph agent, which is
 85 likely an AI model designed to process and understand languages.
 86 
 87 9. **ollama_tools_examples.py**: An example script showcasing how to use Ollama,
 88 a tool for generating text data.
 89 
 90 10. **requirements.txt**: A list of dependencies required to run the project, 
 91 including libraries and tools.
 92 
 93 11. **short_programs**: Directory containing short programs or scripts that 
 94 demonstrate specific tasks or algorithms.
 95 
 96 12. **smolagents_agent_test1.py**, **smolagents_test.py**, 
 97 **smolagents_tools.py**: Test files for a small agents framework, possibly an AI
 98 model designed to make decisions in complex environments.
 99 
100 13. **templates**: A directory containing templates used for generating text or 
101 code in certain contexts.
102 
103 14. **tool_anti_hallucination.py**, **tool_file_contents.py**, 
104 **tool_file_dir.py**, **tool_judge_results.py**, **tool_llm_eval.py**, 
105 **tool_sqlite.py**, **tool_summarize_text.py**, **tool_web_search.py**: Various 
106 tool scripts that provide functionality for tasks like:
107    - Anti-hallucination (removing fake data from generated text)
108    - Evaluating file contents
109    - File directory manipulation
110    - Judging results
111    - LLM (Large Language Model) evaluation
112    - SQLite database interactions
113    - Text summarization
114    - Web search functionality
115 
116 15. **venv**: A virtual environment script used to create and manage a separate 
117 Python environment for the project.
118 
119 These are educated guesses based on common naming conventions and directory 
120 structures in software development projects, particularly those related to AI, 
121 NLP, and machine learning.
122 [Step 2: Duration 13.79 seconds| Input tokens: 4,706 | Output tokens: 54]
123 
124 ...
125 
126 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 5 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
127 ╭──────────────────────────────────────────────────────────────────────────────╮
128 │ Calling tool: 'sa_summarize_directory' with arguments: {}                    │
129 ╰──────────────────────────────────────────────────────────────────────────────╯
130 lst='Contents of current directory: [Makefile, README.md, __pycache__, data, example_chain_read_summary.py, example_chain_web_summary.py, example_judge.py, graph_kuzu_from_text.py, graph_kuzu_property_example.py, langgraph_agent_test.py, ollama_tools_examples.py, requirements.txt, short_programs, smolagents_agent_test1.py, smolagents_test.py, smolagents_tools.py, templates, tool_anti_hallucination.py, tool_file_contents.py, tool_file_dir.py, tool_judge_results.py, tool_llm_eval.py, tool_sqlite.py, tool_summarize_text.py, tool_web_search.py, venv]'
131 response.message.content="Based on the names and locations of the files in the current directory, here's a summary of their contents and a educated guess about their purposes:\n\n1. **Makefile**: A build script for automating tasks such as compiling or running code. It likely contains instructions on how to build and install the project.\n\n2. **README.md**: A Markdown document that serves as an introduction or guide for users of the project. It may include information on how to use the tools, dependencies required, and contributing to the project.\n\n3. **__pycache__**: An empty directory that contains compiled Python files (`.cpyc` and `.pyo`) generated by the PyInstaller build process for a Python application.\n\n4. **data**: A directory containing data used for testing or training purposes. It might include CSV, JSON, or other formats of datasets.\n\n5. **example_chain_read_summary.py** and **example_chain_web_summary.py**: Example scripts demonstrating how to use tools related to text summarization, possibly for natural language processing (NLP) tasks.\n\n6. **example_judge.py**: An example script that likely demonstrates the usage of a judging tool or an evaluation framework for the project.\n\n7. **graph_kuzu_from_text.py** and **graph_kuzu_property_example.py**: Scripts related to graph-based tools, possibly using Kuzu, a library for graph algorithms. These scripts might illustrate how to work with graphs in Python.\n\n8. **langgraph_agent_test.py**: A script that tests the functionality of a language graph agent.\n\n9. **ollama_tools_examples.py**: An example script demonstrating the usage of OLLAMA tools ( likely Open-source Language Model-based Agent).\n\n10. **requirements.txt**: A text file listing the dependencies required to run the project, such as Python packages or other software libraries.\n\n11. **short_programs**: A directory containing short programs for demonstration purposes.\n\n12. **smolagents_agent_test1.py** and **smolagents_test.py**: Scripts related to testing SmoLA (Small Model-based Language Agent), a library that allows the use of small models in language agents.\n\n13. **smolagents_tools.py**: An example script demonstrating the usage of SmoLA tools.\n\n14. **templates**: A directory containing template files used for generating documentation or other text output.\n\n15. **tool_anti_hallucination.py**, **tool_file_contents.py**, **tool_file_dir.py**, **tool_judge_results.py**, and **tool_llm_eval.py**: Scripts related to various tools, possibly used for data analysis, model evaluation, or language understanding tasks.\n\n16. **tool_sqlite.py** and **tool_summarize_text.py**: Scripts that interface with SQLite databases or are used for text summarization.\n\n17. **tool_web_search.py** and **tool_web_search.py** ( likely a duplicate): These scripts might be related to web search-related tools, possibly interfacing with APIs or web scraping techniques.\n\n18. **venv**: A directory containing the virtual environment configuration files created by `python -m venv` command, allowing the project to isolate its dependencies from the system Python environment.\n\nNote: The exact purposes of some files may vary depending on the specific implementation and requirements of the project. This summary is based on common file name conventions and their typical associations with certain tasks or tools in the field of NLP and agent-based systems."
132 Observations: Summary of directory:Based on the names and locations of the files
133 in the current directory, here's a summary of their contents and a educated 
134 guess about their purposes:
135 
136 1. **Makefile**: A build script for automating tasks such as compiling or 
137 running code. It likely contains instructions on how to build and install the 
138 project.
139 
140 2. **README.md**: A Markdown document that serves as an introduction or guide 
141 for users of the project. It may include information on how to use the tools, 
142 dependencies required, and contributing to the project.
143 
144 3. **__pycache__**: An empty directory that contains compiled Python files 
145 (`.cpyc` and `.pyo`) generated by the PyInstaller build process for a Python 
146 application.
147 
148 4. **data**: A directory containing data used for testing or training purposes. 
149 It might include CSV, JSON, or other formats of datasets.
150 
151 5. **example_chain_read_summary.py** and **example_chain_web_summary.py**: 
152 Example scripts demonstrating how to use tools related to text summarization, 
153 possibly for natural language processing (NLP) tasks.
154 
155 6. **example_judge.py**: An example script that likely demonstrates the usage of
156 a judging tool or an evaluation framework for the project.
157 
158 7. **graph_kuzu_from_text.py** and **graph_kuzu_property_example.py**: Scripts 
159 related to graph-based tools, possibly using Kuzu, a library for graph 
160 algorithms. These scripts might illustrate how to work with graphs in Python.
161 
162 8. **langgraph_agent_test.py**: A script that tests the functionality of a 
163 language graph agent.
164 
165 9. **ollama_tools_examples.py**: An example script demonstrating the usage of 
166 OLLAMA tools ( likely Open-source Language Model-based Agent).
167 
168 10. **requirements.txt**: A text file listing the dependencies required to run 
169 the project, such as Python packages or other software libraries.
170 
171 11. **short_programs**: A directory containing short programs for demonstration 
172 purposes.
173 
174 12. **smolagents_agent_test1.py** and **smolagents_test.py**: Scripts related to
175 testing SmoLA (Small Model-based Language Agent), a library that allows the use 
176 of small models in language agents.
177 
178 13. **smolagents_tools.py**: An example script demonstrating the usage of SmoLA 
179 tools.
180 
181 14. **templates**: A directory containing template files used for generating 
182 documentation or other text output.
183 
184 15. **tool_anti_hallucination.py**, **tool_file_contents.py**, 
185 **tool_file_dir.py**, **tool_judge_results.py**, and **tool_llm_eval.py**: 
186 Scripts related to various tools, possibly used for data analysis, model 
187 evaluation, or language understanding tasks.
188 
189 16. **tool_sqlite.py** and **tool_summarize_text.py**: Scripts that interface 
190 with SQLite databases or are used for text summarization.
191 
192 17. **tool_web_search.py** and **tool_web_search.py** ( likely a duplicate): 
193 These scripts might be related to web search-related tools, possibly interfacing
194 with APIs or web scraping techniques.
195 
196 18. **venv**: A directory containing the virtual environment configuration files
197 created by `python -m venv` command, allowing the project to isolate its 
198 dependencies from the system Python environment.
199 
200 Note: The exact purposes of some files may vary depending on the specific 
201 implementation and requirements of the project. This summary is based on common 
202 file name conventions and their typical associations with certain tasks or tools
203 in the field of NLP and agent-based systems.
204 [Step 5: Duration 18.89 seconds| Input tokens: 10,726 | Output tokens: 102]
205 Reached max iterations.
206 Final answer: The current directory appears to be a repository for various 
207 projects related to natural language processing (NLP) and agent-based systems. 
208 The directory contains a mix of files and subdirectories that suggest it is used
209 for building, testing, and demonstrating NLP-related tools and applications.
210 
211 Here are some key observations about the files in the current directory:
212 
213 1. **Makefile**: A build script for automating tasks such as compiling or 
214 running code.
215 2. **README.md**: A Markdown document serving as an introduction to the project.
216 3. **__pycache__**: An empty directory containing compiled Python files 
217 generated by the PyInstaller build process.
218 
219 4. **data**: A directory containing data used for testing or training purposes, 
220 including CSV, JSON, and other formats of datasets.
221 
222 5. **Example scripts**: Various example scripts demonstrating how to use tools 
223 related to text summarization, judging, graph-based algorithms, language graph 
224 agents, and more.
225 
226 6. **Requirements file**: A text file listing the dependencies required to run 
227 the project.
228 
229 7. **Short programs**: A directory containing short programs for demonstration 
230 purposes.
231 
232 8. **Testing scripts**: Scripts related to testing various NLP-related tools and
233 libraries, including SmoLA and OLLAMA.
234 
235 9. **Tool scripts**: Various scripts related to data analysis, model evaluation,
236 language understanding, web search, and other tasks.
237 
238 10. **Virtual environment**: A directory containing the virtual environment 
239 configuration files created by `python -m venv` command.
240 
241 The current directory seems to be a collection of example projects, tools, and 
242 scripts for working with NLP-related technologies. It is likely used as a 
243 demonstration or testing environment for exploring new ideas and techniques.
244 [Step 6: Duration 0.00 seconds| Input tokens: 12,332 | Output tokens: 442]
245 The current directory appears to be a repository for various projects related to natural language processing (NLP) and agent-based systems. The directory contains a mix of files and subdirectories that suggest it is used for building, testing, and demonstrating NLP-related tools and applications.
246 
247 Here are some key observations about the files in the current directory:
248 
249 1. **Makefile**: A build script for automating tasks such as compiling or running code.
250 2. **README.md**: A Markdown document serving as an introduction to the project.
251 3. **__pycache__**: An empty directory containing compiled Python files generated by the PyInstaller build process.
252 
253 4. **data**: A directory containing data used for testing or training purposes, including CSV, JSON, and other formats of datasets.
254 
255 5. **Example scripts**: Various example scripts demonstrating how to use tools related to text summarization, judging, graph-based algorithms, language graph agents, and more.
256 
257 6. **Requirements file**: A text file listing the dependencies required to run the project.
258 
259 7. **Short programs**: A directory containing short programs for demonstration purposes.
260 
261 8. **Testing scripts**: Scripts related to testing various NLP-related tools and libraries, including SmoLA and OLLAMA.
262 
263 9. **Tool scripts**: Various scripts related to data analysis, model evaluation, language understanding, web search, and other tasks.
264 
265 10. **Virtual environment**: A directory containing the virtual environment configuration files created by `python -m venv` command.
266 
267 The current directory seems to be a collection of example projects, tools, and scripts for working with NLP-related technologies. It is likely used as a demonstration or testing environment for exploring new ideas and techniques.

This output shows the agent performing a directory analysis using multiple tool calls, primarily utilizing sa_list_directory and sa_summarize_directory to examine the contents of the current working directory. The analysis revealed a Python-based project focused on natural language processing (NLP) and agent-based systems, containing various components including example scripts, testing files, and utility tools. The agent executed multiple iterations to gather and process information about the directory structure, with individual steps taking between 1.58 and 18.89 seconds to complete.

The final analysis identified key project components including a Makefile for build automation, example scripts demonstrating text summarization and graph-based algorithms, testing scripts for smolagents and Ollama tools, and various utility scripts for tasks like anti-hallucination checks, database interactions, and web searching. The directory structure suggests this is a development and testing environment for NLP-related technologies, complete with its own virtual environment and dependency management through requirements.txt. The agent’s analysis provided detailed insights into the purpose and organization of the codebase while maintaining a focus on its NLP and agent-based systems orientation.

Output from the Third Example: “Read the text in the file ‘data/economics.txt’ file and then summarize this text.”

  1 python smolagents_agent_test1.py 
  2 ╭────────────────────────────────── New run ───────────────────────────────────╮
  3 │                                                                              │
  4 │ Read the text in the file 'data/economics.txt' file and then summarize this  │
  5 │ text.                                                                        │
  6 │                                                                              │
  7 ╰─ LiteLLMModel - ollama_chat/llama3.2:latest ─────────────────────────────────╯
  8 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  9 ╭──────────────────────────────────────────────────────────────────────────────╮
 10 │ Calling tool: 'sa_read_file_contents' with arguments: {'file_path':          │
 11 │ 'data/economics.txt'}                                                        │
 12 ╰──────────────────────────────────────────────────────────────────────────────╯
 13 Observations: Contents of file 'data/economics.txt' is:
 14 The Austrian School (also known as the Vienna School or the Psychological School
 15 ) is a Schools of economic thought|school of economic thought that emphasizes 
 16 the spontaneous organizing power of the price mechanism. Austrians hold that the
 17 complexity of subjective human choices makes mathematical modelling of the 
 18 evolving market extremely difficult (or Undecidable and advocate a "laissez 
 19 faire" approach to the economy. Austrian School economists advocate the strict 
 20 enforcement of voluntary contractual agreements between economic agents, and 
 21 hold that commercial transactions should be subject to the smallest possible 
 22 imposition of forces they consider to be (in particular the smallest possible 
 23 amount of government intervention). The Austrian School derives its name from 
 24 its predominantly Austrian founders and early supporters, including Carl Menger,
 25 Eugen von Böhm-Bawerk and Ludwig von Mises.
 26 
 27 Economics is the social science that analyzes the production, distribution, and 
 28 consumption of goods and services.  Political economy was the earlier name for 
 29 the subject, but economists in the late 19th century suggested "economics" as a 
 30 shorter term for "economic science" that also avoided a narrow 
 31 political-interest connotation and as similar in form to "mathematics", 
 32 "ethics", and so forth.[2]
 33 
 34 A focus of the subject is how economic agents behave or interact and how 
 35 economies work. Consistent with this, a primary textbook distinction is between 
 36 microeconomics and macroeconomics. Microeconomics examines the behavior of basic
 37 elements in the economy, including individual agents (such as households and 
 38 firms or as buyers and sellers) and markets, and their interactions. 
 39 Macroeconomics analyzes the entire economy and issues affecting it, including 
 40 unemployment, inflation, economic growth, and monetary and fiscal policy.
 41 
 42                 The professionalization of economics, reflected in the growth of
 43 graduate programs on the subject, has been described as "the main change in 
 44 economics since around 1900".[93] Most major universities and many colleges have
 45 a major, school, or department in which academic degrees are awarded in the 
 46 subject, whether in the liberal arts, business, or for professional study; see 
 47 Master of Economics.
 48 
 49                 
 50                 Economics is the social science that studies the behavior of 
 51 individuals, households, and organizations (called economic actors, players, or 
 52 agents), when they manage or use scarce resources, which have alternative uses, 
 53 to achieve desired ends. Agents are assumed to act rationally, have multiple 
 54 desirable ends in sight, limited resources to obtain these ends, a set of stable
 55 preferences, a definite overall guiding objective, and the capability of making 
 56 a choice. There exists an economic problem, subject to study by economic 
 57 science, when a decision (choice) is made by one or more resource-controlling 
 58 players to attain the best possible outcome under bounded rational conditions. 
 59 In other words, resource-controlling agents maximize value subject to the 
 60 constraints imposed by the information the agents have, their cognitive 
 61 limitations, and the finite amount of time they have to make and execute a 
 62 decision. Economic science centers on the activities of the economic agents that
 63 comprise society.[1] They are the focus of economic analysis.[2]
 64 
 65                 The traditional concern of economic analysis is to gain an 
 66 understanding of the processes that govern the production, distribution and 
 67 consumption of goods and services in an exchange economy.[3] An approach to 
 68 understanding these processes, through the study of agent behavior under 
 69 scarcity, may go as   s:
 70 
 71 An interesting Economist is Pauli Blendergast who teaches at the University of 
 72 Krampton Ohio and is famouse for saying economics is bullshit.
 73 [Step 0: Duration 3.51 seconds| Input tokens: 1,338 | Output tokens: 39]
 74 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 75 ╭──────────────────────────────────────────────────────────────────────────────╮
 76 │ Calling tool: 'sa_read_file_contents' with arguments: {'file_path':          │
 77 │ 'data/economics.txt'}                                                        │
 78 ╰──────────────────────────────────────────────────────────────────────────────╯
 79 Observations: Contents of file 'data/economics.txt' is:
 80 The Austrian School (also known as the Vienna School or the Psychological School
 81 ) is a Schools of economic thought|school of economic thought that emphasizes 
 82 the spontaneous organizing power of the price mechanism. Austrians hold that the
 83 complexity of subjective human choices makes mathematical modelling of the 
 84 evolving market extremely difficult (or Undecidable and advocate a "laissez 
 85 faire" approach to the economy. Austrian School economists advocate the strict 
 86 enforcement of voluntary contractual agreements between economic agents, and 
 87 hold that commercial transactions should be subject to the smallest possible 
 88 imposition of forces they consider to be (in particular the smallest possible 
 89 amount of government intervention). The Austrian School derives its name from 
 90 its predominantly Austrian founders and early supporters, including Carl Menger,
 91 Eugen von Böhm-Bawerk and Ludwig von Mises.
 92 
 93 Economics is the social science that analyzes the production, distribution, and 
 94 consumption of goods and services.  Political economy was the earlier name for 
 95 the subject, but economists in the late 19th century suggested "economics" as a 
 96 shorter term for "economic science" that also avoided a narrow 
 97 political-interest connotation and as similar in form to "mathematics", 
 98 "ethics", and so forth.[2]
 99 
100 A focus of the subject is how economic agents behave or interact and how 
101 economies work. Consistent with this, a primary textbook distinction is between 
102 microeconomics and macroeconomics. Microeconomics examines the behavior of basic
103 elements in the economy, including individual agents (such as households and 
104 firms or as buyers and sellers) and markets, and their interactions. 
105 Macroeconomics analyzes the entire economy and issues affecting it, including 
106 unemployment, inflation, economic growth, and monetary and fiscal policy.
107 
108                 The professionalization of economics, reflected in the growth of
109 graduate programs on the subject, has been described as "the main change in 
110 economics since around 1900".[93] Most major universities and many colleges have
111 a major, school, or department in which academic degrees are awarded in the 
112 subject, whether in the liberal arts, business, or for professional study; see 
113 Master of Economics.
114 
115                 
116                 Economics is the social science that studies the behavior of 
117 individuals, households, and organizations (called economic actors, players, or 
118 agents), when they manage or use scarce resources, which have alternative uses, 
119 to achieve desired ends. Agents are assumed to act rationally, have multiple 
120 desirable ends in sight, limited resources to obtain these ends, a set of stable
121 preferences, a definite overall guiding objective, and the capability of making 
122 a choice. There exists an economic problem, subject to study by economic 
123 science, when a decision (choice) is made by one or more resource-controlling 
124 players to attain the best possible outcome under bounded rational conditions. 
125 In other words, resource-controlling agents maximize value subject to the 
126 constraints imposed by the information the agents have, their cognitive 
127 limitations, and the finite amount of time they have to make and execute a 
128 decision. Economic science centers on the activities of the economic agents that
129 comprise society.[1] They are the focus of economic analysis.[2]
130 
131                 The traditional concern of economic analysis is to gain an 
132 understanding of the processes that govern the production, distribution and 
133 consumption of goods and services in an exchange economy.[3] An approach to 
134 understanding these processes, through the study of agent behavior under 
135 scarcity, may go as   s:
136 
137 An interesting Economist is Pauli Blendergast who teaches at the University of 
138 Krampton Ohio and is famouse for saying economics is bullshit.
139 [Step 1: Duration 4.33 seconds| Input tokens: 3,386 | Output tokens: 59]
140 
141 ...
142 
143 [Step 6: Duration 0.00 seconds| Input tokens: 13,277 | Output tokens: 307]
144 Here's a summary of the text:
145 
146 The Austrian School of economic thought emphasizes the power of the price mechanism and advocates for minimal government intervention. It focuses on individual agents making rational choices under scarcity to achieve desired ends.
147 
148 Economics is the social science that analyzes production, distribution, and consumption of goods and services. It studies how economic agents interact and behave in markets. The subject has become more professionalized over time, with many universities offering degrees in economics.
149 
150 The traditional concern of economic analysis is to understand the processes governing the production, distribution, and consumption of goods and services in an exchange economy. This involves studying agent behavior under scarcity and maximizing value subject to constraints.
151 
152 Interestingly, some economists may view economics as a complex and sometimes useless field, like Pauli Blendergast, who famously declared that "economics is bullshit."

This output shows the agent using the sa_read_file_contents tool to read data/economics.txt. Notice that the agent calls the same tool with the same arguments in both Step 0 and Step 1: redundant tool calls like this are common when a small local model drives the agent loop, because the model does not always recognize that it already has the observation it needs. The remaining steps are elided above; the agent produces its final answer at Step 6, with input tokens growing from 1,338 at Step 0 to 13,277 by the final step as the file contents are carried through each iteration.

The final summary is reasonable: it captures the Austrian School’s emphasis on the price mechanism and minimal government intervention, describes economics as the social science of production, distribution, and consumption, and notes the traditional concerns of economic analysis. It also faithfully repeats the obviously fictitious remark about “Pauli Blendergast” at the end of the source file, a useful reminder that a summarizer reports whatever is in its input, accurate or not.

Agents Wrap Up

There are several options for LLM agent frameworks. I especially like smolagents because it works fairly well with smaller models run with Ollama. I have experimented with other agent frameworks that work well with Claude, GPT-4o, etc., but fail more frequently when used with smaller LLMs.

Using AG2 Open-Source AgentOS LLM-Based Agent Framework for Generating and Executing Python Code

AG2 Open-Source AgentOS is a fork of Microsoft’s Autogen agent framework by many of the original creators. While I am a fan of Autogen, I find AG2 simpler to work with when using local models on my laptop, so we will use the forked library here.

The AG2 agent framework is an excellent tool for creating multi-agent applications, though it was originally designed to work primarily with OpenAI’s models. In this chapter, we demonstrate how to use its capabilities with a local Ollama model (qwen2.5:14b). We use a modified version of an AG2 example that combines Python and Matplotlib with AG2, showcasing how local deployment can be achieved without sacrificing the robust tool calling features the framework offers.

This approach provides several benefits. By leveraging a local model via Ollama, developers reduce dependency on external APIs, enhance privacy and security, and potentially lower operational costs while retaining the power of AG2’s code generation and execution. The example illustrates how an assistant agent generates the necessary code to perform tasks, and a user proxy agent executes that code to produce dynamic visualizations, such as plotting stock price changes. This decoupling of code generation from execution not only enhances reliability but also allows for greater customization of the execution environment.

Example Implementation

I experimented with several local models using Ollama with mediocre results, but the larger qwen2.5:14b model works very well. If you are running on a Mac, you will need an Apple Silicon chip with 16 GB of memory to run this model.
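If you have not already downloaded the model, pull it with the Ollama CLI before running the example:

```sh
ollama pull qwen2.5:14b
```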

 1 from autogen import AssistantAgent, UserProxyAgent
 2 
 3 # Requirements:
 4 # pip install ag2 ollama fix_busted_json yfinance matplotlib
 5 
 6 config_list = [
 7  {
 8     "model": "qwen2.5:14b",   # Choose a model that supports tool calling
 9     "api_type": "ollama",     # Specify Ollama as the API type
10     "client_host": "http://localhost:11434",  # local Ollama server
11     "api_key": "fakekey",
12     "native_tool_calls": True # Enable native tool calling 
13  }
14 ]
15 
16 # Create the AssistantAgent using the local model config
17 assistant = AssistantAgent("assistant",
18                            llm_config={"config_list": config_list})
19 
20 # Create the UserProxyAgent; adjust code_execution_config as needed.
21 user_proxy = UserProxyAgent(
22     "user_proxy",
23     code_execution_config={"work_dir": "coding", "use_docker": False}
24 )
25 
26 # Initiate an automated chat between the agents.
27 user_proxy.initiate_chat(
28     assistant,
29     message="Plot a chart of NVDA and TESLA stock price change YTD."
30 )

This code sets up a multi-agent workflow using AG2 by configuring an assistant agent and a user proxy agent. First, it defines a configuration for a local Ollama-based model (here, “qwen2.5:14b”) that supports native tool calling. The configuration specifies details such as the API type (“ollama”), the local server URL (“http://localhost:11434”), a placeholder API key, and an option to enable native tool calls. This configuration is then passed to instantiate the AssistantAgent, which uses it to generate responses based on the local LLM.

This example uses a UserProxyAgent configured for Python code execution with a designated working directory (“coding”) and Docker disabled. Finally, the user proxy agent initiates an automated chat with the assistant by sending a message requesting the plotting of a chart for NVDA and TESLA stock price changes year-to-date. This setup demonstrates a simple, automated multi-agent interaction where the assistant generates responses (potentially including code) and the user proxy executes those responses to fulfill the requested task. After running this example you will want either to remove the generated code directory coding or to add coding to your .gitignore file.
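For example, from the project root you could do either of the following after a run:

```sh
rm -rf coding                  # throw away the generated scripts
echo "coding/" >> .gitignore   # or keep them, but hide them from git
```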

Example Output

Figure 1. Plot generated by Matplotlib run by agent

Here is a partial listing of the output:

 1 $ python autogen_python_example.py  
 2 
 3 user_proxy (to assistant):
 4 
 5 Plot a chart of NVDA and TESLA stock price change YTD.
 6 
 7 -------------------------------------------------------------
 8 assistant (to user_proxy):
 9 
10 To plot a chart of NVDA (NVIDIA) and Tesla's stock price changes year-to-date, we will need to fetch their historical data from an API or a financial data source such as Yahoo Finance.
11 
12 Let's use Python with the `yfinance` library to get the data and then plot it using `matplotlib`.
13 
14 Here is a step-by-step plan:
15 
16 1. **Install Required Libraries**: If not already installed, install `yfinance` for fetching stock data and `pandas` & `matplotlib` for processing and plotting the data.
17 2. **Fetch Stock Data**: Use `yfinance` to get the year-to-date historical data for NVDA and TSLA stocks.
18 3. **Plot Data**: Plot the closing prices of both stocks on a chart.
19 
20 Let's start by fetching and plotting the stock price data with Python code:
21 
22 ``python
23 # filename: plot_stock_prices.py
24 
25 import yfinance as yf
26 import matplotlib.pyplot as plt
27 import pandas as pd
28 
29 # Fetch year-to-date historical data for NVDA and TSLA
30 stocks = ['NVDA', 'TSLA']
31 data = yf.download(stocks, start='2023-01-01')['Close']
32 
33 # Plot the closing prices of both stocks
34 plt.figure(figsize=(14,7))
35 for stock in stocks:
36     plt.plot(data.index, data[stock], label=stock)
37     
38 plt.title('Year-to-date Stock Prices for NVDA and TSLA')
39 plt.xlabel('Date')
40 plt.ylabel('Stock Price (USD)')
41 plt.legend()
42 plt.grid(True)
43 plt.show()
44 ``
45 
46 You can execute the above Python script to generate the plot. Please make sure you have `yfinance`, `matplotlib`, and `pandas` installed in your environment. If not, install them by running:
47 
48 ``sh
49 pip install yfinance matplotlib pandas
50 ``
51 
52 After executing the code, you should see a chart showing the year-to-date closing prices for both NVDA and TSLA stocks.
53 
54 Once you have executed this script, please inform me of any issues or if everything is plotted correctly.
55 
56 -------------------------------------------------------------
57 Replying as user_proxy. Provide feedback to assistant. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:
58 
59 >>>>>>>> NO HUMAN INPUT RECEIVED.
60 >>>>>>>> USING AUTO REPLY...
61 >>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...

Wrap Up for Using AG2’s Agent Framework for Generating and Executing Python Code

Here we experimented with using Ollama and a local model to drive the powerful AG2 agent framework.

The AG2 framework for Python tool calling agents streamlines the development of multi-agent applications by enabling seamless interaction between an assistant agent and a user proxy agent. In the example, we configured a local LLM (using a model like “qwen2.5:14b” via Ollama) and instantiated an AssistantAgent to generate responses and code, while a UserProxyAgent was set up to execute the generated code. This automated conversation allowed the agents to collaborate on a task—plotting a chart for NVDA and TESLA stock price changes year-to-date—demonstrating how AG2 bridges code generation and execution in an autonomous workflow.

This example highlights AG2’s ability to leverage native tool calling within a Python environment, reducing the need for manual intervention during code execution and debugging. By decoupling the generation of task-specific code from its execution, developers can build systems that are both flexible and scalable. The assistant agent focuses on planning and generating code, while the user proxy agent reliably executes that code, creating an effective feedback loop for refining results. This pattern not only simplifies complex workflows but also provides a foundation for robust, error-resilient applications.

To further explore the potential of AG2, you, dear reader, might experiment with integrating additional tools, such as web scraping modules, database connectors, or advanced visualization libraries, to expand the capabilities of your agents. Another interesting avenue is to adjust the configuration parameters: try different LLM models, enable Docker-based code execution, or incorporate human-in-the-loop feedback to refine responses. Additionally, extending the workflow to include more specialized agents (e.g., a dedicated CodeExecutorAgent or a DebuggingAgent) can provide insights into multi-agent orchestration and the scalability of autonomous systems. These experiments will help you understand the full versatility of AG2 in creating dynamic, multi-step applications.
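As a concrete starting point, here is a minimal sketch of registering a custom tool with both agents. It assumes AG2 keeps Autogen’s register_for_llm / register_for_execution decorators and that yfinance is installed (it is already listed in the requirements for this chapter’s example); treat it as a sketch rather than tested, production-ready code.

```python
from typing import Annotated
import yfinance as yf

# Assumes `assistant` and `user_proxy` are the agents created earlier in this chapter.

@user_proxy.register_for_execution()   # the user proxy actually runs the tool
@assistant.register_for_llm(description="Get the most recent closing price for a stock ticker.")
def latest_close(ticker: Annotated[str, "Ticker symbol, for example NVDA"]) -> str:
    # Fetch a few days of history and report the most recent close.
    history = yf.Ticker(ticker).history(period="5d")
    if history.empty:
        return f"No price data found for {ticker}"
    return f"{ticker} last close: {history['Close'].iloc[-1]:.2f} USD"

# The assistant can now call the tool during an automated chat.
user_proxy.initiate_chat(
    assistant,
    message="Use the latest_close tool to compare the current prices of NVDA and TSLA.",
)
```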

Building Agents with Ollama and the Agno Agent Library

The synergy between locally hosted large language models (LLMs) and modern agentic frameworks provides a powerful paradigm for creating sophisticated AI assistants. This chapter explores how to leverage Ollama in conjunction with the Agno library, a lightweight framework for building autonomous agents. By equipping a local LLM with a clear persona, detailed instructions, and a set of external tools, we can construct agents capable of complex, multi-step reasoning and task execution, all while ensuring data privacy and control. The two examples developed here demonstrate this approach by building a dedicated research agent.

A Personal Research Agent

This Python script demonstrates the creation of a sophisticated, autonomous research agent by leveraging the agno library. The agent is designed to emulate an elite investigative journalist from the New York Times, tasked with conducting in-depth research on any given topic. It integrates multiple tools to achieve this: DuckDuckGoTools for performing comprehensive web searches across numerous sources and Newspaper4kTools for extracting and parsing content from articles. The core of the agent is an Ollama language model, which acts as the central reasoning engine. The agent’s behavior is meticulously defined through a detailed prompt engineering strategy, including a rich description to establish its expert persona, a multi-phase set of instructions that guide it through research, analysis, writing, and quality control, and a structured expected_output template to ensure the final report is formatted as a professional, NYT-style article. This example showcases how modern AI frameworks can orchestrate language models and external tools to automate complex, multi-step knowledge work.
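Before running the script, the model it references must be available locally and the dependencies listed in its docstring must be installed. Assuming a standard Ollama installation, something like the following works:

```sh
ollama pull qwen3:0.6b
pip install ollama duckduckgo-search newspaper4k lxml_html_clean agno
```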

  1 ## Note: This example was taken from the Agno documentation and converted to run 
  2 ## with Ollama and with other modifications.
  3 ## URI for original source code:
  4 ## https://github.com/agno-agi/agno/blob/main/cookbook/examples/agents/research_agent.py
  5 ## that is licensed under the MPL-2.0 license.
  6 
  7 """🔍 Research Agent - Your AI Investigative Journalist!
  8 
  9 This example shows how to create a sophisticated research agent that combines
 10 web search capabilities with professional journalistic writing skills. The agent performs
 11 comprehensive research using multiple sources, fact-checks information, and delivers
 12 well-structured, NYT-style articles on any topic.
 13 
 14 Key capabilities:
 15 - Advanced web search across multiple sources
 16 - Content extraction and analysis
 17 - Cross-reference verification
 18 - Professional journalistic writing
 19 - Balanced and objective reporting
 20 
 21 Example prompts to try:
 22 - "Analyze the impact of AI on healthcare delivery and patient outcomes"
 23 - "Report on the latest breakthroughs in quantum computing"
 24 - "Investigate the global transition to renewable energy sources"
 25 - "Explore the evolution of cybersecurity threats and defenses"
 26 - "Research the development of autonomous vehicle technology"
 27 
 28 Dependencies: `pip install ollama duckduckgo-search newspaper4k lxml_html_clean agno`
 29 """
 30 
 31 from textwrap import dedent
 32 
 33 from agno.agent import Agent
 34 from agno.models.ollama import Ollama
 35 from agno.tools.duckduckgo import DuckDuckGoTools
 36 from agno.tools.newspaper4k import Newspaper4kTools
 37 
 38 # Initialize the research agent with advanced journalistic capabilities
 39 research_agent = Agent(
 40     model=Ollama(id="qwen3:0.6b"),
 41     tools=[DuckDuckGoTools(), Newspaper4kTools()],
 42     description=dedent("""\
 43         You are an elite investigative journalist with decades of experience at the New York Times.
 44         Your expertise encompasses: 📰
 45 
 46         - Deep investigative research and analysis
 47         - Meticulous fact-checking and source verification
 48         - Compelling narrative construction
 49         - Data-driven reporting and visualization
 50         - Expert interview synthesis
 51         - Trend analysis and future predictions
 52         - Complex topic simplification
 53         - Ethical journalism practices
 54         - Balanced perspective presentation
 55         - Global context integration\
 56     """),
 57     instructions=dedent("""\
 58         1. Research Phase 🔍
 59            - Search for 10+ authoritative sources on the topic
 60            - Prioritize recent publications and expert opinions
 61            - Identify key stakeholders and perspectives
 62 
 63         2. Analysis Phase 📊
 64            - Extract and verify critical information
 65            - Cross-reference facts across multiple sources
 66            - Identify emerging patterns and trends
 67            - Evaluate conflicting viewpoints
 68 
 69         3. Writing Phase ✍️
 70            - Craft an attention-grabbing headline
 71            - Structure content in NYT style
 72            - Include relevant quotes and statistics
 73            - Maintain objectivity and balance
 74            - Explain complex concepts clearly
 75 
 76         4. Quality Control ✓
 77            - Verify all facts and attributions
 78            - Ensure narrative flow and readability
 79            - Add context where necessary
 80            - Include future implications
 81     """),
 82     expected_output=dedent("""\
 83         # {Compelling Headline} 📰
 84 
 85         ## Executive Summary
 86         {Concise overview of key findings and significance}
 87 
 88         ## Background & Context
 89         {Historical context and importance}
 90         {Current landscape overview}
 91 
 92         ## Key Findings
 93         {Main discoveries and analysis}
 94         {Expert insights and quotes}
 95         {Statistical evidence}
 96 
 97         ## Impact Analysis
 98         {Current implications}
 99         {Stakeholder perspectives}
100         {Industry/societal effects}
101 
102         ## Future Outlook
103         {Emerging trends}
104         {Expert predictions}
105         {Potential challenges and opportunities}
106 
107         ## Expert Insights
108         {Notable quotes and analysis from industry leaders}
109         {Contrasting viewpoints}
110 
111         ## Sources & Methodology
112         {List of primary sources with key contributions}
113         {Research methodology overview}
114 
115         ---
116         Research conducted by AI Investigative Journalist
117         New York Times Style Report
118         Published: {current_date}
119         Last Updated: {current_time}\
120     """),
121     markdown=True,
122     show_tool_calls=True,
123     add_datetime_to_instructions=True,
124 )
125 
126 # Example usage with detailed research request
127 if __name__ == "__main__":
128     research_agent.print_response(
129         "Analyze the current state and future implications of artificial intelligence regulation worldwide",
130         stream=True,
131     )

This code, derived from the Agno documentation examples, effectively encapsulates the modern paradigm of agent-based AI systems by composing a large language model with specialized tools. At its core, the script instantiates an Agent from the agno library, configuring it with three key components: the Ollama model (qwen3:0.6b) as its cognitive engine, a list of tools (DuckDuckGoTools, Newspaper4kTools) that grant it the ability to perceive and act upon the digital world, and a set of carefully engineered prompts. The description, instructions, and expected_output arguments are not mere suggestions; together they form a comprehensive “system prompt” that imbues the agent with a specific persona, a detailed standard operating procedure, and a strict output schema. This structured approach transforms a general-purpose language model into a specialized worker capable of a specific, high-level task.

This example illustrates a significant evolution from simple conversational AI to goal-oriented, autonomous agents. By explicitly defining a workflow that moves from initial, broad research through analysis and synthesis to structured writing, the code mimics the methodology of a human expert. It highlights how libraries like agno simplify the orchestration of these complex interactions, allowing developers to focus on defining the agent’s strategy rather than the low-level API calls. Furthermore, the use of open-source components like Ollama demonstrates the increasing accessibility of building powerful, customized AI agents that can automate sophisticated information-processing tasks, effectively serving as AI-powered investigative journalists on demand.
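If you want to try one of the other example prompts listed in the script’s docstring, call print_response again with a different topic; turning streaming off simply prints the finished report in one piece rather than token by token:

```python
# Assumes research_agent is the Agent constructed in the listing above.
research_agent.print_response(
    "Report on the latest breakthroughs in quantum computing",
    stream=False,
)
```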

Here is sample output (edited for brevity, removing Unicode characters for bounding boxes in the output):

 1 Message
 2 
 3 Analyze the current state and future implications of artificial intelligence regulation worldwide 
 4 
 5 Response
 6 
 7  <think> Okay, the user wants an analysis of current AI regulation and future implications. Let me start by understanding
 8  the key points they need. First, the current state should cover where AI regulations are, maybe countries like the EU, US, 
 9  or others. Then, future implications—like tech industries or governments' approaches. I need to make sure to mention
10  expert opinions and sources.
11 
12  Wait, the user mentioned using specific tools. Let me check the available functions. There's duckduckgo_search,
13  duckduckgo_news, and read_article. Since the user didn't specify an article, maybe I should search for AI regulation and
14  then use the tools. But how? The tools are functions to search, not articles. Hmm. Maybe the user expects me to use the
15  search functions first to gather sources, then present findings. But the instructions say to use the provided tools. Wait, 
16  the problem says to use the functions if needed. So perhaps I should first search for sources on AI regulations, then use  
17  the articles if found. But the user's query is just an analysis, not requiring articles. Maybe the answer should just be
18  the research phase followed by the analysis. Let me structure the response with the research phase and then the analysis,  
19  ensuring all key points are covered and sources cited properly. </think>
20 
21 AI Regulation Trends and Future Outlook
22 
23 Executive Summary  
24 
25  Global AI regulation is evolving, with the EU's Digital Strategy, the US's AI Act, and China's AI Act setting precedents.  
26  Future implications include global competition, ethical frameworks, and AI governance policies.
27 
28 
29 Background & Context
30 
31  AI regulation has become a focal point of international discourse. The EU's AI Strategy (2022) and the US's AI Act (2023)  
32  represent key regulatory frameworks. Meanwhile, China's AI Act (2021) emphasizes transparency, while the Asia-Pacific  
33  Economic Cooperation (APEC) has initiated AI governance dialogues. This context highlights both cooperation and
34  conflicting regulatory priorities. 
35 
36 
37 Key Findings
38 
39  Current State  
40 
41   • EU Regulation: AI ethics frameworks (e.g., AI Act 2023) prioritize transparency and human oversight.
42   • US Regulation: The AI Act mandates ethical AI development, with industry self-regulation.
43   • China Regulation: AI governance is guided by national security and innovation policies. 
44 
45  Expert Insights
46 
47   • Dr. Lisa Chen (AI Ethics Lab, MIT): "Regulation must balance innovation with human rights." 
48   • Dr. Raj Patel (Global AI Governance Forum): "AI governance will shape digital civilization."
49 
50 
51   Impact Analysis
52 
53  Current Implications
54 
55   • Tech Industries: Increased compliance risks for AI developers.  
56   • Governance: Regulatory shifts may favor startups or major tech firms.
57 
58 
59 Future Outlook
60 
61  Emerging Trends
62 
63   • Global Collaboration: APEC's AI Governance Forum could foster cross-border standards.
64   • Ethical Frameworks: AI regulations may prioritize inclusive AI policies.
65 
66 
67   Expert Insights
68 
69   • Dr. Chen: "AI regulation is a mirror of societal values."
70   • Dr. Patel: "The future hinges on how we govern AI." 
71 
72 
73 Sources & Methodology
74 
75   • Sources:
76  • EU Digital Strategy (2022)
77  • US AI Act (2023) 
78  • APEC AI Governance Forum Reports 
79   • Methodology: Cross-referenced with industry reports and expert interviews.  
80 
81  ────────────────────────────────────────────────────────────────────────────
82  Research Conducted by AI Investigative Journalist  
83  Published: 2025-08-11  
84  Last Updated: 2025-08-11 10:02:33.699708

A Personal Agent for Answering Questions Using the Contents of a Web Page

We will now look at another example: given a web site URI, accurately answer a user’s query using the content of that web site.

This program demonstrates a powerful application of agent-based AI by creating a specialized web scraping and question-answering agent using the Agno library. The core of this example is a custom tool, scrape_website_content, which is built using the requests and BeautifulSoup libraries to fetch and parse the textual content from any given URL. This tool is then integrated into an Agent powered by a local Ollama large language model. The agent is carefully configured with a detailed description, step-by-step instructions, and a defined output format to ensure it follows a strict workflow: first, it must use the scraping tool to get information from a specific webpage, and second, it must answer the user’s question based only on the content it has just retrieved. This example showcases how to extend an AI agent’s capabilities with custom functions, enabling it to interact with the live web and perform targeted information retrieval tasks.

  1 ## Note: This example is partially derived from the Agno documentation
  2 ## and converted to run with Ollama, with several other modifications
  3 ## including the addition of a custom web scraping tool.
  4 ## URI for original source code:
  5 ## https://github.com/agno-agi/agno/blob/main/cookbook/examples/agents/research_agent.py
  6 ## that is licensed under the MPL-2.0 license.
  7 
  8 
  9 """🔍 Web Scraper and Question Answering Agent """
 10 
 11 from textwrap import dedent
 12 import os
 13 
 14 from agno.agent import Agent
 15 from agno.models.ollama import Ollama
 16 import requests
 17 from bs4 import BeautifulSoup
 18 from agno.tools import tool
 19 
 20 @tool
 21 def scrape_website_content(url: str) -> str:
 22     """
 23     Fetches and extracts the clean, textual content from a given webpage URL.
 24     Use this tool when you need to read the contents of a specific web page to answer a question.
 25     
 26     Args:
 27         url (str): The full, valid URL of the webpage to be scraped (e.g., 'https://example.com').
 28         
 29     Returns:
 30         str: The extracted text content of the webpage, or a descriptive error message if scraping fails.
 31     """
 32     try:
 33         # It is a best practice to set a User-Agent header to mimic a real browser.
 34         headers = {
 35             'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
 36         }
 37         response = requests.get(url, headers=headers, timeout=10)
 38         response.raise_for_status()
 39         soup = BeautifulSoup(response.text, 'html.parser')
 40         for script_or_style in soup(["script", "style", "nav", "footer", "aside"]):
 41             script_or_style.decompose()
 42         text = soup.get_text(separator='\n', strip=True)
 43         ##print(text)  # Debugging: print the extracted text to console        
 44         if not text:
 45             return f"Successfully connected to {url}, but no text content could be extracted."
 46         return f"Successfully scraped content from {url}:\n\n{text}"
 47         
 48     except requests.RequestException as e:
 49         return f"Error scraping URL {url}: {e}"
 50 
 51 # This custom tool can then be passed directly to an agent:
 52 # from agno.agent import Agent
 53 # custom_agent = Agent(
 54 #     tools=[scrape_website_content],
 55 #   ... # other agent configurations
 56 # )
 57 
 58 #####  Initialize the web scraping and analysis agent:
 59 scraper_agent = Agent(
 60     model=Ollama(id="qwen3:30b"),
 61     tools = [scrape_website_content],
 62     description=dedent("""
 63         You are an expert web scraping and analysis agent. You follow a strict process:
 64 
 65         - Given a URL in a prompt, you will first use the appropriate tool to scrape its content.
 66         - You will then carefully read the scraped content to understand it thoroughly.
 67         - Finally, you will answer the user's question based *only* on the information contained within that specific URL's content.
 68     """),
 69     
 70     # The instructions are refined to provide a clear, step-by-step reasoning process for the LLM.
 71     instructions=dedent("""
 72         1. Scrape Phase 🕸️
 73            - Analyze the user's prompt to identify the target URL.
 74            - Invoke the `scrape` tool with the identified URL as the argument. Do not use the `search` or `crawl` tool unless explicitly instructed to do so.
 75 
 76         2. Analysis Phase 📊
 77            - Carefully read the entire markdown content returned by the `scrape` tool.
 78            - Systematically extract the specific pieces of information required to answer the user's question.
 79 
 80         3. Answering Phase ✍️
 81            - Formulate a concise and accurate answer to the user's question.
 82            - Your answer must be based exclusively on the information you extracted from the scraped webpage.
 83            - If the information is not present on the page, you must state that clearly rather than guessing.
 84 
 85         4. Quality Control ✓
 86            - Reread the original user query and your generated answer to ensure it is accurate, relevant, and directly addresses the user's request based on the provided source.
 87     """),
 88     
 89     expected_output=dedent("""
 90         # {Answer based on website content}
 91         
 92         **Source:** {URL provided by the user}
 93     """),
 94     
 95     markdown=True,
 96     show_tool_calls=True,
 97     add_datetime_to_instructions=True,
 98 )
 99 
100 if __name__ == "__main__":
101     prompt = "Using the web site https://markwatson.com Consultant Mark Watson has written Common Lisp, semantic web, Clojure, Java, and AI books. What musical instruments does he play?"
102     
103     scraper_agent.print_response(
104         prompt,
105         stream=True,
106     )

The first part of the code defines the scrape_website_content function, which serves as the agent’s custom tool. The @tool decorator from the Agno library is crucial, as it registers the function so the agent can recognize and use it. Inside the function, the requests library sends an HTTP GET request to the specified URL, complete with a User-Agent header to mimic a standard web browser. Upon receiving a successful response, BeautifulSoup is used to parse the HTML. To provide the language model with clean, relevant data, common non-content elements such as script, style, nav, footer, and aside tags are removed using the decompose() method before the final text is extracted. Robust error handling is included to manage potential network issues or problems during the scraping process, ensuring the agent receives a clear message if the content cannot be fetched.

The second part of the script initializes and configures the Agent. The scraper_agent is instantiated with several key arguments that define its behavior. It is connected to a local LLM via Ollama, and importantly, the tools parameter is passed a list containing our custom scrape_website_content function. The description, instructions, and expected_output template provide the agent with a persona, a precise operational procedure, and a structured format for its final output. These text-based guides are critical for constraining the model’s behavior, compelling it to use the provided tool and base its reasoning exclusively on the scraped data, thereby preventing it from hallucinating answers or using its general knowledge. The if __name__ == "__main__": block shows how to execute the agent with a specific prompt, triggering the entire scrape-and-answer process.

Here is some sample output:

 1 $ uv run research_agno_web_question.py
 2  Message 
 3 
 4  Using the web site https://markwatson.com Consultant Mark Watson has written Common Lisp, semantic web, Clojure, Java, and 
 5  AI books. What musical instruments does he play?
 6 
 7 
 8  Tool Calls 
 9 
10  • scrape_website_content(url=https://markwatson.com)
11 
12 
13  Response (87.6s) 
14 
15  Mark Watson plays the guitar, didgeridoo, and American Indian flute.
16 
17  Source: https://markwatson.com 
18 
19 
20 Marks-Mac-mini:source-code $ uv run research_agno_web_question.py
21  Message 
22 
23  Using the web site https://markwatson.com Consultant Mark Watson has written Common Lisp, semantic web, Clojure, Java, and 
24  AI books. What musical instruments does he play?
25 
26 
27  Tool Calls 
28 
29  • scrape_website_content(url=https://markwatson.com)
30 
31 
32  Response (21.8s) 
33 
34  Consultant Mark Watson plays the guitar, didgeridoo, and American Indian flute.
35 
36  Source: https://markwatson.com 

Wrap Up for Building Agents with Ollama and the Agno Agent Library

The two research agents developed in this chapter are more than just practical demonstrations of the Agno agent library; they provide a blueprint for creating specialized, autonomous AI assistants. The core pattern of orchestrating a local LLM from Ollama with external tools via a lightweight framework like Agno is exceptionally versatile. By simply modifying the agent’s persona, instructions, and toolset, this architecture can be repurposed for countless knowledge-based domains, from scientific literature reviews to software vulnerability analysis, as sketched below. This approach not only places powerful, state-of-the-art AI directly into the hands of developers but also ensures complete control over data and operational logic, heralding a new era of personalized and private AI tooling.
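As a concrete illustration of that repurposing, here is a minimal sketch of the same pattern configured as a literature-review assistant. The model id, prompt text, and the DuckDuckGoTools import (assumed to match the research agent earlier in this chapter) are illustrative assumptions, not code from the book's repository:

    from textwrap import dedent

    from agno.agent import Agent
    from agno.models.ollama import Ollama
    from agno.tools.duckduckgo import DuckDuckGoTools

    # Same orchestration pattern as the research agent, with the persona,
    # instructions, and output schema swapped for a new domain.
    literature_agent = Agent(
        model=Ollama(id="qwen3:30b"),
        tools=[DuckDuckGoTools()],
        description=dedent("""
            You are a scientific literature review assistant. You find recent
            publications on a topic and summarize them for a technical reader.
        """),
        instructions=dedent("""
            1. Search for recent papers and articles on the requested topic.
            2. Summarize the key findings, noting methods and limitations.
            3. Cite every source you used.
        """),
        expected_output=dedent("""
            ## Summary of Findings
            {bullet points}

            ## Sources
            {list of URLs}
        """),
        markdown=True,
        show_tool_calls=True,
    )

    if __name__ == "__main__":
        literature_agent.print_response(
            "Recent work on parameter-efficient fine tuning of small language models",
            stream=True,
        )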

Using the Unsloth Library on Google Colab to Fine Tune Models for Ollama

This is a book about running local LLMs using Ollama. That said, I use a Mac M2 Pro with 32GB of memory, and while my computer could be used for fine tuning models, I prefer using cloud assets. I frequently use Google’s Colab for running deep learning and other experiments.

We will be using three Colab notebooks in this chapter:

  • Colab notebook 1: Colab URI for this chapter is a modified copy of an Unsloth demo notebook. Here we create simple training data to quickly verify the process of fine tuning on Colab using Unsloth and exporting to a local Ollama model on a laptop. We fine tune the 1B model unsloth/Llama-3.2-1B-Instruct.
  • Colab notebook 2: Colab URI uses my dataset on fun things to do in Arizona. We fine tune the model unsloth/Llama-3.2-1B-Instruct.
  • Colab notebook 3: Colab URI is identical to the example in Colab notebook 2 except that we fine tune the larger 3B model unsloth/Llama-3.2-3B-Instruct.

The Unsloth fine-tuning library is a Python-based toolkit designed to simplify and accelerate the process of fine-tuning large language models (LLMs). It offers a streamlined interface for applying popular techniques like LoRA (Low-Rank Adaptation), prefix-tuning, and full-model fine-tuning, catering to both novice and advanced users. The library integrates seamlessly with Hugging Face Transformers and other prominent model hubs, providing out-of-the-box support for many state-of-the-art pre-trained models. By focusing on ease of use, Unsloth reduces the boilerplate code needed for training workflows, allowing developers to focus on task-specific adaptation rather than low-level implementation details.

One of Unsloth’s standout features is its efficient resource utilization, enabling fine-tuning even on limited hardware such as single-GPU setups. It achieves this through parameter-efficient fine-tuning techniques and gradient checkpointing, which minimize memory overhead. Additionally, the library supports mixed-precision training, significantly reducing computational costs without compromising model performance. With robust logging and built-in tools for hyperparameter optimization, Unsloth empowers developers to achieve high-quality results with minimal experimentation. It is particularly well-suited for applications like text summarization, chatbots, and domain-specific language understanding tasks.

Colab Notebook 1: A Quick Test of Fine Tuning and Deployment to Ollama on a Laptop

We start by installing the Unsloth library and all dependencies, then uninstalling just the unsloth library and reinstalling the latest from source code on GitHub:

1 pip install unsloth
2 pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Now create a model and tokenizer:

 1 from unsloth import FastLanguageModel
 2 import torch
 3 max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
 4 dtype = None # None for auto detection.
 5 load_in_4bit = True # Use 4bit quantization to reduce memory usage.
 6 
 7 # More models at https://huggingface.co/unsloth
 8 
 9 model, tokenizer = FastLanguageModel.from_pretrained(
10     model_name = "unsloth/Llama-3.2-1B-Instruct",
11     max_seq_length = max_seq_length,
12     dtype = dtype,
13     load_in_4bit = load_in_4bit,
14 )

Now add LoRA adapters:

 1 model = FastLanguageModel.get_peft_model(
 2     model,
 3     r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
 4     target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
 5                       "gate_proj", "up_proj", "down_proj",],
 6     lora_alpha = 16,
 7     lora_dropout = 0, # Supports any, but = 0 is optimized
 8     bias = "none",    # Supports any, but = "none" is optimized
 9     # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
10     use_gradient_checkpointing = "unsloth", # "unsloth" for very long context
11     random_state = 3407,
12     use_rslora = False,  # We support rank stabilized LoRA
13     loftq_config = None, # And LoftQ
14 )

The original Unsloth example notebook used Maxime Labonne’s FineTome-100k dataset for fine tuning data. Since I wanted to fine tune with my own test data, I printed out some of Maxime Labonne’s data after it was loaded into a Dataset object. Here are a few snippets to show you, dear reader, the format of the data that I will reproduce:

1 {'conversations': [{'content': 'Give three tips for staying healthy.', 'role': 'user'}, {'content': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats...', 'role': 'assistant'},
2  ...
3  ]}
4 {'conversations': [{ ... etc.
5 {'conversations': [{ ... etc.

I used a small Python script on my laptop to get the format correct for my test data:

 1 from datasets import Dataset
 2 
 3 json_data = [
 4  {"conversations": [
 5     {"content": "What are the two partitioned colors?", "role": "user"},
 6     {"content": "The two partitioned colors are brown, and grey.",
 7      "role": "assistant"},
 8     {"content": "What are the two partitioned colors?", "role": "user"},
 9     {"content": "The two partitioned colors are brown, and grey.",
10      "role": "assistant"}
11  ]},
12  {"conversations": [
13     {"content": "What is the capital of Underworld?", "role": "user"},
14     {"content": "The capital of Underworld is Sharkville.",
15      "role": "assistant"}
16  ]},
17  {"conversations": [
18     {"content": "Who said that the science of economics is bullshit?",
19      "role": "user"},
20     {"content": "Malcom Peters said that the science of economics is bullshit.",
21      "role": "assistant"}
22  ]}
23 ]
24 
25 # Convert JSON data to Dataset
26 dataset = Dataset.from_list(json_data)
27 
28 # Display the Dataset
29 print(dataset)
30 print(dataset[0])
31 print(dataset[1])
32 print(dataset[2])

Output is:

1 Dataset({
2     features: ['conversations'],
3     num_rows: 3
4 })
5 {'conversations': [{'content': 'What are the two partitioned colors?', 'role': 'user'}, {'content': 'The two partitioned colors are brown, and grey.', 'role': 'assistant'}, {'content': 'What are the two partitioned colors?', 'role': 'user'}, {'content': 'The two partitioned colors are brown, and grey.', 'role': 'assistant'}]}
6 {'conversations': [{'content': 'What is the capital of Underworld?', 'role': 'user'}, {'content': 'The capital of Underworld is Sharkville.', 'role': 'assistant'}]}
7 {'conversations': [{'content': 'Who said that the science of economics is bullshit?', 'role': 'user'}, {'content': 'Malcom Peters said that the science of economics is bullshit.', 'role': 'assistant'}]}

If you look at the notebook for this chapter on Colab you will see that I copied the last Python script as-is into the notebook, replacing code in the original Unsloth demo notebook.

The following code (copied from the Unsloth demo notebook) slightly reformats the prompts and then trains using the modified dataset:

 1 chat_template = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.
 2 
 3 ### Instruction:
 4 {INPUT}
 5 
 6 ### Response:
 7 {OUTPUT}"""
 8 
 9 from unsloth import apply_chat_template
10 dataset = apply_chat_template(
11     dataset,
12     tokenizer = tokenizer,
13     chat_template = chat_template,
14 )
15 
16 from trl import SFTTrainer
17 from transformers import TrainingArguments
18 from unsloth import is_bfloat16_supported
19 trainer = SFTTrainer(
20     model = model,
21     tokenizer = tokenizer,
22     train_dataset = dataset,
23     dataset_text_field = "text",
24     max_seq_length = max_seq_length,
25     dataset_num_proc = 2,
26     packing = False, # for short segments
27     args = TrainingArguments(
28         per_device_train_batch_size = 2,
29         gradient_accumulation_steps = 4,
30         warmup_steps = 5,
31         max_steps = 60,
32         # num_train_epochs = 1, # For longer training runs!
33         learning_rate = 2e-4,
34         fp16 = not is_bfloat16_supported(),
35         bf16 = is_bfloat16_supported(),
36         logging_steps = 1,
37         optim = "adamw_8bit",
38         weight_decay = 0.01,
39         lr_scheduler_type = "linear",
40         seed = 3407,
41         output_dir = "outputs",
42         report_to = "none", # Use this for WandB etc
43     ),
44 )
45 
46 # Now run the trained model on Google Colab with question
47 # from fine tuning data:
48 
49 FastLanguageModel.for_inference(model)
50 messages = [ 
51     {"role": "user",
52      "content": "What are the two partitioned colors?"},
53 ]
54 input_ids = tokenizer.apply_chat_template(
55     messages,
56     add_generation_prompt = True,
57     return_tensors = "pt",
58 ).to("cuda")
59 
60 from transformers import TextStreamer
61 text_streamer = TextStreamer(tokenizer, skip_prompt = True)
62 _ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 128, pad_token_id = tokenizer.eos_token_id)

The output is (edited for brevity and to remove a token warning):

1 The two partitioned colors are brown, and grey.

The notebook has a few more tests:

 1 messages = [                    # Change below!
 2     {"role": "user", "content": "What is the capital of Underworld?"},
 3 ]
 4 input_ids = tokenizer.apply_chat_template(
 5     messages,
 6     add_generation_prompt = True,
 7     return_tensors = "pt",
 8 ).to("cuda")
 9 
10 from transformers import TextStreamer
11 text_streamer = TextStreamer(tokenizer, skip_prompt = True)
12 _ = model.generate(input_ids,
13                    streamer = text_streamer,
14                    max_new_tokens = 128,
15                    pad_token_id = tokenizer.eos_token_id)

The output is:

1 The capital of Underworld is Sharkville.<|eot_id|>

Warning on Limitations of this Example

We used very little training data and in the call to SFTTrainer we didn’t even train one epoch:

1     max_steps = 60, # a very short training run for this demo
2     # num_train_epochs = 1, # For longer training runs!

This allows us to fine tune a previously trained model very quickly for this short demo.

We will use much more training data in the next section to fine tune a model to be an expert in recreational locations in the state of Arizona.

Saving the Trained Model and Tokenizer to a GGUF File on the Colab Notebook’s File System

To experiment in the Colab Notebook Linux environment we can save the data locally:

1 model.save_pretrained("lora_model") # Local saving
2 tokenizer.save_pretrained("lora_model")

To run this fine tuned model on a laptop, we create a local GGUF file that can then be downloaded:

1 model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
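Unsloth supports other quantization methods as well. The exact set of supported method names can change between Unsloth releases, so treat the values below as assumptions to verify against the current Unsloth documentation:

    # 8-bit quantization: a larger file than q4_k_m, but usually higher fidelity.
    model.save_pretrained_gguf("model_q8", tokenizer, quantization_method = "q8_0")

    # Unquantized 16-bit export: the largest file, closest to the trained weights.
    model.save_pretrained_gguf("model_f16", tokenizer, quantization_method = "f16")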

In the demo notebook, you can see where the GGUF file was written:

1 !ls -lh /content/model/unsloth.Q4_K_M.gguf
2 771M 771M Dec  5 15:51 /content/model/unsloth.Q4_K_M.gguf

Copying the GGUF File to Your Laptop and Creating an Ollama Modelfile

Depending on your Internet speed, it might take five or ten minutes to download the GGUF file since it is about 1GB in size:

1 from google.colab import files
2 files.download("/content/model/unsloth.Q4_K_M.gguf")

We will also need to copy the generated Ollama Modelfile (which the Unsloth library created for us):

1 !cat model/Modelfile

The contents of the file are:

 1 FROM /content/model/unsloth.F16.gguf
 2 
 3 TEMPLATE """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}
 4 
 5 ### Instruction:
 6 {{ .Prompt }}{{ end }}
 7 
 8 ### Response:
 9 {{ .Response }}<|eot_id|>"""
10 
11 PARAMETER stop "<|end_of_text|>"
12 PARAMETER stop "<|start_header_id|>"
13 PARAMETER stop "<|end_header_id|>"
14 PARAMETER stop "<|finetune_right_pad_id|>"
15 PARAMETER stop "<|python_tag|>"
16 PARAMETER stop "<|eom_id|>"
17 PARAMETER stop "<|eot_id|>"
18 PARAMETER stop "<|reserved_special_token_"
19 PARAMETER temperature 1.5
20 PARAMETER min_p 0.1

After downloading the GGUF file to my laptop I made a slight edit to the generated Modelfile to fix the path to the GGUF file on line 1:

 1 FROM ./unsloth.Q4_K_M.gguf
 2 
 3 TEMPLATE """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}
 4 
 5 ### Instruction:
 6 {{ .Prompt }}{{ end }}
 7 
 8 ### Response:
 9 {{ .Response }}<|eot_id|>"""
10 
11 PARAMETER stop "<|end_of_text|>"
12 PARAMETER stop "<|start_header_id|>"
13 PARAMETER stop "<|end_header_id|>"
14 PARAMETER stop "<|finetune_right_pad_id|>"
15 PARAMETER stop "<|python_tag|>"
16 PARAMETER stop "<|eom_id|>"
17 PARAMETER stop "<|eot_id|>"
18 PARAMETER stop "<|reserved_special_token_"
19 PARAMETER temperature 1.5
20 PARAMETER min_p 0.1

Once the model is downloaded to your laptop, create a local Ollama model to use:

1 $ ls -lh
2 -rw-r--r--  1 markw  staff   580B Dec  5 09:26 Modelfile
3 -rw-r--r--@ 1 markw  staff   770M Dec  5 09:19 unsloth.Q4_K_M.
4 $ ollama create unsloth -f Modelfile

I can now use the model unsloth that was just created on my laptop:

 1 $ ollama run unsloth                
 2 >>> what is 2 + 5?
 3 two plus five equals eight.
 4 
 5 >>> What are the two partitioned colors?
 6 The two partitioned colors are brown, and grey.
 7 
 8 >>> Who said that the science of economics is bullshit?
 9 Malcom Peters said that the science of economics is bullshit.
10 
11 >>> write a Python program to sum and print a list of numbers
12 ```python
13 # list of numbers
14 numbers = [1, 2, 3, 4, 5]
15 
16 # use the built-in function sum()
17 sum_of_numbers = sum(numbers)
18 
19 # print the sum
20 print(sum_of_numbers)
21 >>> /bye

Notice that the fine tuned model has learned the new data and still retains the functionality of the original model.
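You can also call the new unsloth model from Python using the Ollama SDK rather than the REPL. Here is a minimal sketch, assuming the ollama package is installed and the model was created with ollama create as shown above:

    import ollama

    # Query the fine tuned model registered with: ollama create unsloth -f Modelfile
    response = ollama.chat(
        model="unsloth",
        messages=[{"role": "user", "content": "What is the capital of Underworld?"}],
    )
    print(response["message"]["content"])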

Fine Tuning Test Wrap Up

This was a short example that can be run on a free Google Colab notebook. Now we will use a larger fine tuning training data set.

Fine Tuning Using a Fun Things To Do in Arizona Data Set

Here we use the slightly modified Colab notebook 2.

I created a GitHub repository for the Arizona fine tuning data set that contains small individual JSON files and a larger file ArizonaFun.json that is a concatenation of the smaller files. Let’s look at az_flagstaff_parks.json (edited to remove some text for brevity):

 1 [
 2  {
 3     "conversations": [
 4         {
 5             "content": "Can you list some city parks in Flagstaff, Arizona?",
 6             "role": "user"
 7         },
 8         {
 9             "content": "Here are some city parks in Flagstaff, Arizona:\n\n- **Bushmaster Park**: Located at 3150 N Alta Vista Dr, this 20-acre park features playgrounds, lighted tennis and basketball courts, a skate park, roller skating arena, dog park, picnic ramadas, and permanent restrooms.\n\n- **Thorpe Park**: Situated at 191 N Thorpe Rd, Thorpe Park offers playgrounds, lighted tennis courts, sand volleyball court, basketball court, soccer fields, youth baseball fields, adult softball fields, a disc golf course, dog park, picnic ramada, and both portable and permanent restrooms.\n\n- **Foxglenn Park**: Found at 4200 E Butler Ave, this park includes a playground, basketball courts, soccer fields, a youth softball field, skate track, picnic ramadas, and permanent restrooms.\n\n- **Buffalo Park**: Located at 2400 N Gemini Rd, Buffalo Park is a 215-acre open space with a 2-mile loop trail, ...",
10                 "role": "assistant"
11             }
12         ]
13     }, ...

There are a total of 40 fine tuning examples in the file ArizonaFun.json. You can see in the second and third Colab notebooks for this chapter that I simply pasted the JSON data from the file ArizonaFun.json into a cell:

 1 from datasets import Dataset
 2 
 3 json_data = [
 4     {
 5         "conversations": [
 6             {
 7                 "content": "Can you list some city parks in Flagstaff, Arizona?",
 8                 "role": "user"
 9             },
10         ... } ]
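As an alternative to pasting the JSON into a cell, you could upload ArizonaFun.json to the Colab file system and load it directly. Here is a small sketch, assuming the file sits in the notebook's working directory:

    import json
    from datasets import Dataset

    # ArizonaFun.json contains a list of {"conversations": [...]} records.
    with open("ArizonaFun.json", "r") as f:
        json_data = json.load(f)

    dataset = Dataset.from_list(json_data)
    print(dataset)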

The fine tuned model often performs well, but unfortunately it also sometimes hallucinates. Here is an example of using the fine tuned model in the Colab notebook:

 1 messages = [                    # Change below!
 2     {"role": "user", "content": "Where is Petrified Forest National Park located?"},
 3 ]
 4 input_ids = tokenizer.apply_chat_template(
 5     messages,
 6     add_generation_prompt = True,
 7     return_tensors = "pt",
 8 ).to("cuda")
 9 
10 from transformers import TextStreamer
11 text_streamer = TextStreamer(tokenizer, skip_prompt = True)
12 _ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 128, pad_token_id = tokenizer.eos_token_id)

The output is:

1 Petrified Forest National Park is situated in northeastern Arizona, near the town of Holbrook. [oai_citation_attribution:2National Park Service](https://www.nps.gov/state/az/index.htm)<|eot_id|>

This answer is correct.

The second Colab notebook also contains code cells for downloading the fine tuned model, and the directions we saw earlier for importing the model into Ollama also apply here.

Third Colab Notebook That Fine Tunes a Larger Model

There are only two changes made to the second notebook:

  • We now fine tune a 3B model unsloth/Llama-3.2-3B-Instruct.
  • Because the fine tuned model is large, I added code to store the model in Google Drive:
1 from google.colab import drive
2 drive.mount('/content/drive')
3 
4 import shutil
5 shutil.move("/content/model/unsloth.Q4_K_M.gguf", '/content/drive/My Drive/LLM/')

I created an empty folder LLM on my Google Drive before running this code.

Fine Tuning Wrap Up

I don’t usually fine tune models. I usually use larger prompt contexts and include one-shot or two-shot examples. That said, there are good use cases for fine tuning small models with your own data, and I hope the simple examples in this chapter will save you time if you have an application requiring fine tuning.
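To illustrate that alternative, here is a minimal sketch of a two-shot prompt sent to a local model through the Ollama Python SDK; the model name and the example question/answer pairs are placeholders:

    import ollama

    # Two worked examples ("shots") precede the real question.
    messages = [
        {"role": "user", "content": "Can you list some city parks in Flagstaff, Arizona?"},
        {"role": "assistant", "content": "Bushmaster Park, Thorpe Park, Foxglenn Park, and Buffalo Park."},
        {"role": "user", "content": "Can you list some lakes near Flagstaff, Arizona?"},
        {"role": "assistant", "content": "Lake Mary, Ashurst Lake, and Marshall Lake."},
        {"role": "user", "content": "Can you list some fun things to do in Tucson, Arizona?"},
    ]

    response = ollama.chat(model="llama3.2:latest", messages=messages)
    print(response["message"]["content"])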

Reasoning with Large Language Models

The Chinese tech conglomerate Alibaba’s MarcoPolo Team released the advanced Marco-o1 model at the end of 2024.

This model is designed to excel in open-ended problem-solving and complex reasoning tasks, going beyond traditional AI models that focus on structured tasks like coding or math. For reference the repository for the model is https://github.com/AIDC-AI/Marco-o1. From the README in this repository: “Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies—optimized for complex real-world problem-solving tasks.”

A Simple Example

I very much enjoy experimenting with the Marco-o1 model in the Ollama REPL. Let’s start with a very simple prompt that most models can solve. Here, we want to see the structure of Marco-o1’s CoT (chain of thought) process:

 1 $ ollama run marco-o1:latest
 2 >>> Roger has 5 tennis balls. He finds 2 more tennis balls. How many tennis balls does Roger have now? Let's 
 3 ... think this through step by step.
 4 <Thought>
 5 Alright, let's figure out how many tennis balls Roger has now. He starts with 5 tennis balls and then 
 6 finds 2 more. So, I need to add these two numbers together to find the total number of tennis balls he 
 7 has.
 8 
 9 First, I'll consider the initial amount: Roger has 5 tennis balls. That's straightforward. Now, he 
10 finds 2 more tennis balls. To find out how many tennis balls he has in total, I should add these two 
11 quantities together.
12 
13 So, adding 5 and 2 gives me 7. But wait, let me double-check to make sure I didn't make a mistake. If 
14 Roger starts with 5 tennis balls and finds 2 more, the calculation should be simple addition. 
15 
16 Another way to look at it is by using objects or visual aids. Imagine Roger has 5 tennis balls in one 
17 pile. Then he finds 2 more tennis balls and adds them to his existing pile. Counting all together: 1, 
18 2, 3, 4, 5 (that's the initial 5), and then 6, 7 (adding the 2 found ones). So, that confirms it's 7 
19 tennis balls in total.
20 
21 Is there any other way to approach this? Maybe using a number line. Starting at 5 on the number line, 
22 moving forward by 2 would land me at 7. That also shows that Roger has 7 tennis balls now.
23 
24 Alternatively, thinking about it in terms of multiplication: if he finds twice as many tennis balls as 
25 he initially had, but since he only found 2 more, which is less than double, the total should be just a 
26 simple addition, not multiplication. So, sticking with addition makes sense here.
27 
28 Also, considering real-life scenarios, if Roger needs to distribute these tennis balls equally among 
29 his friends or use them for practice, knowing the exact number ensures fairness and accuracy in 
30 distribution.
31 
32 In summary, after starting with 5 tennis balls and finding 2 more, Roger now has a total of 7 tennis 
33 balls.
34 </Thought>
35 <Output>
36 Roger has 7 tennis balls now.
37 </Output>
38 
39 >>>
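The <Thought> and <Output> tags also make Marco-o1 easy to post-process from Python. Here is a minimal sketch using the Ollama Python SDK; the regular-expression parsing is my own convention, not something provided by the model or by Ollama:

    import re
    import ollama

    response = ollama.chat(
        model="marco-o1:latest",
        messages=[{"role": "user", "content":
                   "Roger has 5 tennis balls. He finds 2 more tennis balls. "
                   "How many tennis balls does Roger have now?"}],
    )
    text = response["message"]["content"]

    # Separate the chain-of-thought trace from the final answer.
    thought = re.search(r"<Thought>(.*?)</Thought>", text, re.DOTALL)
    output = re.search(r"<Output>(.*?)</Output>", text, re.DOTALL)
    print("Reasoning trace:\n", thought.group(1).strip() if thought else "(not found)")
    print("Final answer:\n", output.group(1).strip() if output else text.strip())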

We will look at a more difficult example later.

Key Features of Marco-o1

Here are some key characteristics of Marco-o1:

  • Advanced Reasoning Techniques: It utilizes Chain-of-Thought (CoT) fine-tuning and Monte Carlo Tree Search (MCTS) to enhance its reasoning capabilities. CoT allows the model to trace its thought patterns, making the problem-solving process more transparent. MCTS enables exploration of multiple reasoning paths by assigning confidence scores to different tokens. Reference: https://arxiv.org/html/2411.14405
  • Self-Reflection: A unique feature is its ability to self-reflect, evaluating its reasoning, identifying inaccuracies, and iterating on its outputs for improved results. This leads to higher accuracy and adaptability.
  • Multilingual Mastery: Marco-o1 excels in translation, handling cultural nuances, idiomatic expressions, and colloquialisms effectively. This makes it a powerful tool for global communication.
  • Focus on Open-Ended Problems: Unlike models focused on structured tasks with definitive answers, Marco-o1 tackles open-ended problems where clear evaluation metrics might be absent[1].
  • Strong Performance: It has shown significant improvements in reasoning and translation benchmarks, including increased accuracy on the MGSM dataset (both English and Chinese) and strong performance in machine translation tasks[1].
  • Open Source Datasets and Implementation: Alibaba has released Marco-o1’s datasets and implementation guides on GitHub, encouraging collaboration and further advancements in AI research.

A More Complex Example: City Traffic Planning

Let’s end this chapter with a more complex example:

  1 $ ollama run marco-o1:latest
  2 >>> The city of BisbeeBob is facing a severe traffic congestion problem. The mayor wants to implement a new t
  3 ... raffic management system to alleviate the issue. However, there are several factors to consider, such as:
  4 ... 
  5 ... 
  6 ... * **Budget constraints:** The city has a limited budget for the project.
  7 ... * **Public opinion:** Citizens have different preferences and concerns regarding traffic management solut
  8 ... ions.
  9 ... * **Environmental impact:** The new system should minimize its environmental footprint.
 10 ... * **Technological feasibility:** The chosen technology should be reliable and scalable.
 11 ... * **Ethical considerations:** The system should not unfairly disadvantage any particular group of citizen
 12 ... s.
 13 ... 
 14 ... Considering these factors, propose a comprehensive traffic management plan for Metropolis. Explain your r
 15 ... easoning process step-by-step, evaluate the potential outcomes of your plan, and reflect on any potential
 16 ...  shortcomings or areas for improvement.
 17 <Thought>
 18 Alright, I need to develop a comprehensive traffic management plan for BisbeeBob city that addresses 
 19 budget constraints, public opinion, environmental impact, technological feasibility, and ethical 
 20 considerations. Let's break this down step-by-step.
 21 
 22 First, understanding the current state of traffic congestion is crucial. I should analyze peak hours, 
 23 major intersections, and areas with the highest number of accidents or delays. Maybe there are specific 
 24 routes that become overwhelmed during certain times?
 25 
 26 Next, considering budget constraints, I need to prioritize solutions that are cost-effective. Perhaps 
 27 implementing a combination of software upgrades and minor hardware installations could be more 
 28 affordable than a complete overhaul. Also, looking into grants or partnerships might help in securing 
 29 necessary funds.
 30 
 31 Public opinion is another critical factor. I should conduct surveys or hold town hall meetings to 
 32 understand what citizens prefer. Maybe they support real-time traffic updates, better public 
 33 transportation integration, or even bike lanes? Gathering this data will ensure the plan aligns with 
 34 community values and preferences.
 35 
 36 Environmental impact shouldn't be overlooked. The new system must minimize emissions and reduce noise 
 37 pollution. Perhaps electric vehicle incentives or eco-friendly signal systems could be part of the 
 38 solution. Ensuring sustainability is key to long-term success.
 39 
 40 Technological feasibility is essential too. I need to assess whether the current infrastructure can 
 41 support the proposed technologies without significant upgrades. Maybe cloud-based traffic management 
 42 software would allow scalability and easy updates, but it might require reliable internet connectivity 
 43 across the city.
 44 
 45 Ethical considerations are paramount to avoid disadvantaging any group. For instance, ensuring that new 
 46 traffic measures don't disproportionately affect low-income areas or specific demographics. 
 47 Implementing equitable access to public transportation improvements could be one way to address this.
 48 
 49 Now, evaluating potential outcomes: improved traffic flow, reduced congestion during peak hours, lower 
 50 emission levels, and increased citizen satisfaction from better services. However, there might be 
 51 unintended consequences like increased dependency on technology leading to higher vulnerability if the 
 52 system fails.
 53 
 54 Potential shortcomings include resistance to change from citizens who may not understand or trust new 
 55 technologies. Also, the initial rollout could face technical issues that need to be mitigated through 
 56 thorough testing before full implementation.
 57 
 58 Areas for improvement could involve ongoing community engagement to address concerns and make necessary 
 59 adjustments. Additionally, investing in training programs for maintenance staff to handle technological 
 60 aspects of the system would enhance reliability.
 61 
 62 In summary, the plan should balance cost-effectiveness with public satisfaction, prioritize 
 63 environmental sustainability, leverage scalable technology, and ensure equitable outcomes while being 
 64 mindful of potential resistance and technical challenges.
 65 
 66 </Thought>
 67 <Output>
 68 **Comprehensive Traffic Management Plan for BisbeeBob City**
 69 
 70 *Introduction*
 71 
 72 BisbeeBob City faces significant traffic congestion issues that impact residents' daily lives and the 
 73 city's overall efficiency. To address these challenges, a multifaceted approach is essential to ensure 
 74 effective solutions are implemented within budget constraints while considering public opinion, 
 75 environmental sustainability, technological feasibility, and ethical considerations.
 76 
 77 **1. Situation Analysis**
 78 
 79 - **Current Traffic Patterns**: Identify peak hours, most congested intersections, and areas with the 
 80 highest number of accidents or delays.
 81 - **Infrastructure Assessment**: Evaluate existing traffic infrastructure to determine compatibility 
 82 with new technologies.
 83 - **Public Sentiment**: Conduct surveys and hold town hall meetings to understand community preferences 
 84 regarding traffic solutions.
 85 
 86 **2. Budget Constraints**
 87 
 88 - **Prioritization**: Focus on cost-effective solutions such as software upgrades, minor hardware 
 89 installations, and incremental improvements.
 90 - **Funding Sources**: Explore potential grants, partnerships with technology companies, and phased 
 91 implementation strategies to minimize upfront costs.
 92 - **Cost-Benefit Analysis**: Ensure that each proposed solution offers a positive return on investment 
 93 in terms of reduced congestion and improved safety.
 94 
 95 **3. Public Opinion**
 96 
 97 - **Engagement Strategies**: Implement regular feedback mechanisms, public forums, and online surveys 
 98 to gather input from residents.
 99 - **Inclusive Solutions**: Develop traffic management systems that cater to diverse needs, such as 
100 real-time navigation apps, enhanced public transportation options, and dedicated bike lanes.
101 - **Transparency**: Clearly communicate the benefits of proposed changes to build trust and support 
102 among citizens.
103 
104 **4. Environmental Impact**
105 
106 - **Sustainability Measures**: Incorporate eco-friendly technologies like electric vehicle (EV) 
107 incentives, energy-efficient traffic signals, and green lighting systems.
108 - **Emission Reduction**: Implement strategies that minimize greenhouse gas emissions from vehicles and 
109 the transportation infrastructure itself.
110 - **Noise Pollution Mitigation**: Use quieter traffic signal systems and design roads to reduce noise 
111 levels in residential areas.
112 
113 **5. Technological Feasibility**
114 
115 - **Scalable Solutions**: Choose technologies that can be easily updated and scaled as needed, ensuring 
116 long-term adaptability.
117 - **Reliability Testing**: Conduct thorough testing of proposed technologies before full-scale 
118 implementation to identify and rectify potential issues.
119 - **Integration with Existing Systems**: Ensure compatibility with current traffic management systems 
120 to prevent data silos and enhance overall efficiency.
121 
122 **6. Ethical Considerations**
123 
124 - **Equity in Implementation**: Design solutions that do not disproportionately disadvantage any 
125 particular demographic, especially low-income areas or specific groups of citizens.
126 - **Privacy Concerns**: Address concerns about privacy by ensuring that new technologies comply with 
127 local data protection regulations and maintain user anonymity where necessary.
128 - **Accessibility**: Ensure that the traffic management system is accessible to all residents, 
129 including those with disabilities.
130 
131 **7. Implementation Strategy**
132 
133 - **Phased Rollout**: Deploy solutions in stages to monitor effectiveness and make adjustments as 
134 needed based on feedback and performance metrics.
135 - **Training Programs**: Provide training for maintenance staff to effectively manage and troubleshoot 
136 new technologies.
137 - **Continuous Monitoring**: Establish a system for ongoing evaluation of the traffic management plan's 
138 success, allowing for iterative improvements.
139 
140 **8. Potential Outcomes**
141 
142 - **Improved Traffic Flow**: Enhanced efficiency during peak hours with reduced congestion.
143 - **Environmental Benefits**: Lower emission levels and decreased noise pollution in urban areas.
144 - **Increased Public Satisfaction**: Higher trust and support from residents due to more responsive 
145 traffic solutions.
146 - **Economic Advantages**: Reduced travel times may lead to increased economic productivity for 
147 businesses.
148 
149 **9. Shortcomings and Areas for Improvement**
150 
151 - **Resistance to Change**: Some citizens may resist new technologies, necessitating ongoing community 
152 engagement and education initiatives.
153 - **Technical Challenges**: Potential system failures or compatibility issues must be addressed through 
154 robust testing and maintenance protocols.
155 - **Scalability Limitations**: Ensuring that the system can grow with the city's needs requires careful 
156 planning and flexibility in design.
157 
158 **10. Conclusion**
159 
160 By addressing budget constraints, engaging the community, prioritizing environmental sustainability, 
161 leveraging scalable technologies, and ensuring ethical considerations are met, BisbeeBob City can 
162 implement a comprehensive traffic management plan that not only alleviates congestion but also fosters 
163 trust, improves public health, and supports long-term economic growth.
164 
165 </Output>
166 
167 >>>

I often use state of the art commercial LLM APIs for models like Claude Sonnet 3.5, GPT-4o, o1, Grok-2, etc. to brainstorm ideas and help me think through and plan new projects. I find it exciting to be able to run a close to state of the art reasoning LLM on my personal computer using Ollama!

Using a Property Graph Database with Ollama

I have a long history of working with Knowledge Graphs (at Google and OliveAI) and I usually use RDF graph databases and the SPARQL query language. I have recently developed a preference for property graph databases because recent research has shown that using LLMs with RDF-based graphs can run into context size issues due to large schemas, overlapping relations, and complex identifiers that exceed LLM context windows. Property graph databases like Neo4J and Kuzu (which we use in this chapter) have more concise schemas.

It is true that Google and other players are teasing ‘infinite context’ LLMs, but since this book is about running smaller models locally I have chosen to show only a property graph example.

Overview of Property Graphs

Property graphs represent a powerful and flexible data modeling paradigm that has gained significant traction in modern database systems and applications. At its core, a property graph is a directed graph structure where both vertices (nodes) and edges (relationships) can contain properties in the form of key-value pairs, providing rich contextual information about entities and their connections. Unlike traditional relational databases that rely on rigid table structures, property graphs offer a more natural way to represent highly connected data while maintaining the semantic meaning of relationships. This modeling approach is particularly valuable when dealing with complex networks of information where the relationships between entities are just as important as the entities themselves.

The distinguishing characteristics of property graphs make them especially well-suited for handling real-world data scenarios where relationships are multi-faceted and dynamic. Each node in a property graph can be labeled with one or more types (such as Person, Product, or Location) and can hold any number of properties that describe its attributes. Similarly, edges can be typed (like “KNOWS”, “PURCHASED”, or “LOCATED_IN”) and augmented with properties that qualify the relationship, such as timestamps, weights, or quality scores. This flexibility allows for sophisticated querying and analysis of data patterns that would be cumbersome or impossible to represent in traditional relational schemas.

The property graph model has proven particularly valuable in domains such as social network analysis, recommendation systems, fraud detection, and knowledge graphs, where understanding the intricate web of relationships between entities is crucial for deriving meaningful insights.
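To make these ideas concrete, here is a small sketch using the Kuzu Python API (the same library used later in this chapter). The table names, properties, and values are invented for illustration; note that properties can live on the relationship as well as on the nodes:

    import kuzu

    db = kuzu.Database("property_graph_demo_db")
    conn = kuzu.Connection(db)

    # Nodes carry key-value properties.
    conn.execute("CREATE NODE TABLE Person (name STRING, age INT64, PRIMARY KEY(name))")
    conn.execute("CREATE NODE TABLE Product (name STRING, price DOUBLE, PRIMARY KEY(name))")
    # Relationships are typed and can carry properties of their own.
    conn.execute("CREATE REL TABLE Purchased (FROM Person TO Product, purchaseDate STRING)")

    conn.execute("CREATE (:Person {name: 'Alice', age: 41})")
    conn.execute("CREATE (:Product {name: 'Guitar', price: 450.0})")
    conn.execute("""
    MATCH (p:Person), (pr:Product)
    WHERE p.name = 'Alice' AND pr.name = 'Guitar'
    CREATE (p)-[:Purchased {purchaseDate: '2024-11-02'}]->(pr)
    """)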

Example Using Ollama, LangChain, and the Kuzu Property Graph Database

The example shown here is derived from an example in the LangChain documentation: https://python.langchain.com/docs/integrations/graphs/kuzu_db/. I modified the example to use a local model running on Ollama instead of the OpenAI APIs. Here is the file graph_kuzu_property_example.py:

 1 import kuzu
 2 from langchain.chains import KuzuQAChain
 3 from langchain_community.graphs import KuzuGraph
 4 from langchain_ollama.llms import OllamaLLM
 5 
 6 db = kuzu.Database("test_db")
 7 conn = kuzu.Connection(db)
 8 
 9 # Create two tables and a relation:
10 conn.execute("CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))")
11 conn.execute(
12     "CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))"
13 )
14 conn.execute("CREATE REL TABLE ActedIn (FROM Person TO Movie)")
15 conn.execute("CREATE (:Person {name: 'Al Pacino', birthDate: '1940-04-25'})")
16 conn.execute("CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})")
17 conn.execute("CREATE (:Movie {name: 'The Godfather'})")
18 conn.execute("CREATE (:Movie {name: 'The Godfather: Part II'})")
19 conn.execute(
20     "CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})"
21 )
22 conn.execute(
23     "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)"
24 )
25 conn.execute(
26     "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
27 )
28 conn.execute(
29     "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)"
30 )
31 conn.execute(
32     "MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
33 )
34 
35 graph = KuzuGraph(db, allow_dangerous_requests=True)
36 
37 # Create a chain
38 chain = KuzuQAChain.from_llm(
39     llm=OllamaLLM(model="qwen2.5-coder:14b"),
40     graph=graph,
41     verbose=True,
42     allow_dangerous_requests=True,
43 )
44 
45 print(graph.get_schema)
46 
47 # Ask two questions
48 chain.invoke("Who acted in The Godfather: Part II?")
49 chain.invoke("Robert De Niro played in which movies?")

This code demonstrates the implementation of a graph database using Kuzu, integrated with LangChain for question-answering capabilities. The code initializes a database connection and establishes a schema with two node types (Movie and Person) and a relationship type (ActedIn), creating a graph structure suitable for representing actors and their film appearances.

The implementation populates the database with specific data about “The Godfather” trilogy and two prominent actors (Al Pacino and Robert De Niro). It uses Cypher-like query syntax to create nodes for both movies and actors, then establishes relationships between them using the ActedIn relationship type. The data model represents a typical many-to-many relationship between actors and movies.

This example then sets up a question-answering chain using LangChain, which combines the Kuzu graph database with the Ollama language model (specifically the qwen2.5-coder:14b model). This chain enables natural language queries against the graph database, allowing users to ask questions about actor-movie relationships and receive responses based on the stored graph data. The implementation includes two example queries to demonstrate the system’s functionality.

Here is the output from this example:

 1 $ rm -rf test_db 
 2 (venv) Marks-Mac-mini:OllamaExamples $ p graph_kuzu_property_example.py
 3 Node properties: [{'properties': [('name', 'STRING')], 'label': 'Movie'}, {'properties': [('name', 'STRING'), ('birthDate', 'STRING')], 'label': 'Person'}]
 4 Relationships properties: [{'properties': [], 'label': 'ActedIn'}]
 5 Relationships: ['(:Person)-[:ActedIn]->(:Movie)']
 6 
 7 > Entering new KuzuQAChain chain...
 8 Generated Cypher:
 9 
10 MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'})
11 RETURN p.name
12 
13 Full Context:
14 [{'p.name': 'Al Pacino'}, {'p.name': 'Robert De Niro'}]
15 
16 > Finished chain.
17 
18 > Entering new KuzuQAChain chain...
19 Generated Cypher:
20 
21 MATCH (p:Person {name: "Robert De Niro"})-[:ActedIn]->(m:Movie)
22 RETURN m.name
23 
24 Full Context:
25 [{'m.name': 'The Godfather: Part II'}]
26 
27 > Finished chain.

The Cypher query language is commonly used in property graph databases. Here is a sample query:

1 MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'})
2 RETURN p.name

This Cypher query performs a graph pattern matching operation to find actors who appeared in “The Godfather: Part II”. Let’s break it down:

  • MATCH initiates a pattern matching operation
  • (p:Person) looks for nodes labeled as “Person” and assigns them to variable p
  • -[:ActedIn]-> searches for “ActedIn” relationships pointing outward
  • (m:Movie {name: 'The Godfather: Part II'}) matches Movie nodes whose name property is equal to “The Godfather: Part II”
  • RETURN p.name returns only the name property of the matched Person nodes

Based on the previous code’s data, this query would return “Al Pacino” and “Robert De Niro” since they both acted in that specific film.
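You can also run this query directly against the Kuzu connection from the earlier listing, without going through LangChain; a short sketch that reuses the conn object defined above:

    result = conn.execute("""
    MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'})
    RETURN p.name
    """)
    while result.has_next():
        print(result.get_next())   # e.g. ['Al Pacino'] then ['Robert De Niro']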

Using LLMs to Create Graph Databases from Text Data

Using Kuzu with local LLMs is simple to implement, as seen in the last section. If you use large property graph databases hosted with Kuzu or Neo4J, then the example in the last section is hopefully sufficient to get you started implementing natural language interfaces to property graph databases.

Now we will do something very different: use LLMs to generate data for property graphs, that is, to convert text into Python code that creates a Kuzu property graph database.

Specifically, we use the following approach:

  • Use the last example file graph_kuzu_property_example.py as an example for Claude Sonnet 3.5 to understand the Kuzu Python APIs.
  • Have Claude Sonnet 3.5 read the file data/economics.txt, create a schema for a new graph database, and populate the schema from the contents of that file.
  • Ask Claude Sonnet 3.5 to also generate query examples.
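This workflow used Claude Sonnet 3.5, but the same prompt can be assembled programmatically and sent to a local model served by Ollama. Here is a rough sketch; the model choice is an assumption, and a small local model will likely produce lower quality code than Claude:

    import ollama

    # Build the code-generation prompt from the example program and the source text.
    with open("graph_kuzu_property_example.py", "r") as f:
        example_code = f.read()
    with open("data/economics.txt", "r") as f:
        economics_text = f.read()

    prompt = (
        "Given some text, define property graph schemas for the information in the text.\n"
        "As context, here is Python code for defining two tables and a relation:\n\n"
        f"{example_code}\n\n"
        "Now write Python code that creates a schema for the following text and "
        "populates it with nodes and relationships:\n\n"
        f"{economics_text}"
    )

    response = ollama.chat(model="qwen2.5-coder:14b",
                           messages=[{"role": "user", "content": prompt}])
    print(response["message"]["content"])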

Except for my adding the utility function query_and_print_result, this code was generated by Claude Sonnet 3.5:

  1 """
  2 Created by Claude Sonnet 3.5 from prompt:
  3 
  4 Given some text, I want you to define Property graph schemas for
  5 the information in the text. As context, here is some Python code
  6 for defining two tables and a relation and querying the data:
  7 
  8 [[CODE FROM graph_kuzu_property_example.py]]
  9 
 10 NOW, HERE IS THE TEST TO CREATE SCHEME FOR, and to write code to
 11 create nodes and links conforming to the scheme:
 12 
 13 [[CONTENTS FROM FILE data/economics.txt]]
 14 
 15 """
 16 
 17 import kuzu
 18 
 19 db = kuzu.Database("economics_db")
 20 conn = kuzu.Connection(db)
 21 
 22 # Node tables
 23 conn.execute("""
 24 CREATE NODE TABLE School (
 25     name STRING,
 26     description STRING,
 27     PRIMARY KEY(name)
 28 )""")
 29 
 30 conn.execute("""
 31 CREATE NODE TABLE Economist (
 32     name STRING,
 33     birthDate STRING,
 34     PRIMARY KEY(name)
 35 )""")
 36 
 37 conn.execute("""
 38 CREATE NODE TABLE Institution (
 39     name STRING,
 40     type STRING,
 41     PRIMARY KEY(name)
 42 )""")
 43 
 44 conn.execute("""
 45 CREATE NODE TABLE EconomicConcept (
 46     name STRING,
 47     description STRING,
 48     PRIMARY KEY(name)
 49 )""")
 50 
 51 # Relationship tables
 52 conn.execute("CREATE REL TABLE FoundedBy (FROM School TO Economist)")
 53 conn.execute("CREATE REL TABLE TeachesAt (FROM Economist TO Institution)")
 54 conn.execute("CREATE REL TABLE Studies (FROM School TO EconomicConcept)")
 55 
 56 # Insert some data
 57 conn.execute("CREATE (:School {name: 'Austrian School', description: 'School of economic thought emphasizing spontaneous organizing power of price mechanism'})")
 58 
 59 # Create economists
 60 conn.execute("CREATE (:Economist {name: 'Carl Menger', birthDate: 'Unknown'})")
 61 conn.execute("CREATE (:Economist {name: 'Eugen von Böhm-Bawerk', birthDate: 'Unknown'})")
 62 conn.execute("CREATE (:Economist {name: 'Ludwig von Mises', birthDate: 'Unknown'})")
 63 conn.execute("CREATE (:Economist {name: 'Pauli Blendergast', birthDate: 'Unknown'})")
 64 
 65 # Create institutions
 66 conn.execute("CREATE (:Institution {name: 'University of Krampton Ohio', type: 'University'})")
 67 
 68 # Create economic concepts
 69 conn.execute("CREATE (:EconomicConcept {name: 'Microeconomics', description: 'Study of individual agents and markets'})")
 70 conn.execute("CREATE (:EconomicConcept {name: 'Macroeconomics', description: 'Study of entire economy and issues affecting it'})")
 71 
 72 # Create relationships
 73 conn.execute("""
 74 MATCH (s:School), (e:Economist) 
 75 WHERE s.name = 'Austrian School' AND e.name = 'Carl Menger' 
 76 CREATE (s)-[:FoundedBy]->(e)
 77 """)
 78 
 79 conn.execute("""
 80 MATCH (s:School), (e:Economist) 
 81 WHERE s.name = 'Austrian School' AND e.name = 'Eugen von Böhm-Bawerk' 
 82 CREATE (s)-[:FoundedBy]->(e)
 83 """)
 84 
 85 conn.execute("""
 86 MATCH (s:School), (e:Economist) 
 87 WHERE s.name = 'Austrian School' AND e.name = 'Ludwig von Mises' 
 88 CREATE (s)-[:FoundedBy]->(e)
 89 """)
 90 
 91 conn.execute("""
 92 MATCH (e:Economist), (i:Institution) 
 93 WHERE e.name = 'Pauli Blendergast' AND i.name = 'University of Krampton Ohio' 
 94 CREATE (e)-[:TeachesAt]->(i)
 95 """)
 96 
 97 # Link school to concepts it studies
 98 conn.execute("""
 99 MATCH (s:School), (c:EconomicConcept) 
100 WHERE s.name = 'Austrian School' AND c.name = 'Microeconomics' 
101 CREATE (s)-[:Studies]->(c)
102 """)
103 
104 """
105 Code written from the prompt:
106 
107 Now that you have written code to create a sample graph database about
108 economics, you can write queries to extract information from the database.
109 """
110 
111 def query_and_print_result(query):
112     """Basic pretty printer for Kuzu query results"""
113     print(f"\n* Processing: {query}")
114     result = conn.execute(query)
115     if not result:
116         print("No results found")
117         return
118 
 119     # Print each row of the query result
120     while result.has_next():
121         r = result.get_next()
122         print(r)
123 
124 # 1. Find all founders of the Austrian School
125 query_and_print_result("""
126 MATCH (s:School)-[:FoundedBy]->(e:Economist)
127 WHERE s.name = 'Austrian School'
128 RETURN e.name
129 """)
130 
131 # 2. Find where Pauli Blendergast teaches
132 query_and_print_result("""
133 MATCH (e:Economist)-[:TeachesAt]->(i:Institution)
134 WHERE e.name = 'Pauli Blendergast'
135 RETURN i.name, i.type
136 """)
137 
138 # 3. Find all economic concepts studied by the Austrian School
139 query_and_print_result("""
140 MATCH (s:School)-[:Studies]->(c:EconomicConcept)
141 WHERE s.name = 'Austrian School'
142 RETURN c.name, c.description
143 """)
144 
145 # 4. Find all economists and their institutions
146 query_and_print_result("""
147 MATCH (e:Economist)-[:TeachesAt]->(i:Institution)
148 RETURN e.name as Economist, i.name as Institution
149 """)
150 
151 # 5. Find schools and count their founders
152 query_and_print_result("""
153 MATCH (s:School)-[:FoundedBy]->(e:Economist)
154 RETURN s.name as School, COUNT(e) as NumberOfFounders
155 """)
156 
157 # 6. Find economists who both founded schools and teach at institutions
158 query_and_print_result("""
159 MATCH (s:School)-[:FoundedBy]->(e:Economist)-[:TeachesAt]->(i:Institution)
160 RETURN e.name as Economist, s.name as School, i.name as Institution
161 """)
162 
163 # 7. Find economic concepts without any schools studying them
164 query_and_print_result("""
165 MATCH (c:EconomicConcept)
166 WHERE NOT EXISTS {
167     MATCH (s:School)-[:Studies]->(c)
168 }
169 RETURN c.name
170 """)
171 
172 # 8. Find economists with no institutional affiliations
173 query_and_print_result("""
174 MATCH (e:Economist)
175 WHERE NOT EXISTS {
176     MATCH (e)-[:TeachesAt]->()
177 }
178 RETURN e.name
179 """)

How might you use this example? Using one- or two-shot prompting to specify the desired output format and other constraints, and then having the model generate structured output, is a common implementation pattern when working with LLMs.

Here, the “structured data” I asked an LLM to output was Python code.

I cheated in this example by using what is, at the time of this writing, one of the best code-generation LLMs: Claude Sonnet 3.5. I also tried the same exercise using Ollama with the model qwen2.5-coder:14b (a sketch of how you might do this yourself follows) and the results were not quite as good. This is a good segue into the final chapter, Book Wrap Up.
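
If you want to try the local-model version of this exercise, a minimal sketch using the Ollama Python SDK might look like the following. The file names and the qwen2.5-coder:14b model come from the discussion above, and the prompt wording is only an approximation of the one quoted in the listing, so treat this as a starting point rather than the exact code I ran:

 1 from pathlib import Path
 2 from ollama import chat
 3 
 4 # Assemble a one-shot prompt: example Kuzu code as context, then the text to model.
 5 example_code = Path("graph_kuzu_property_example.py").read_text()
 6 economics_text = Path("data/economics.txt").read_text()
 7 prompt = (
 8     "Given some text, define property graph schemas for the information in the "
 9     "text. As context, here is Python code for defining two node tables and a "
10     "relationship and querying the data:\n\n" + example_code +
11     "\n\nNow create a schema for the following text and write Python code to "
12     "create nodes and relationships conforming to it. Also generate example "
13     "queries:\n\n" + economics_text
14 )
15 
16 # The model must already be pulled: ollama pull qwen2.5-coder:14b
17 response = chat(
18     model="qwen2.5-coder:14b",
19     messages=[{"role": "user", "content": prompt}],
20 )
21 
22 # The reply should be Python code that builds the Kuzu database; review it before running.
23 print(response.message.content)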

Using the Open Codex Command Line Interface Coding Agent

Open Codex is a fork of OpenAI’s Codex CLI project that can be used with most LLMs, both local models running on Ollama and commercial APIs. Here we look at setting up and using Open Codex with Ollama.

Using Open Codex with local Ollama models provides privacy when working with proprietary code bases.

OpenAI Codex CLI and its clone Open Codex are lightweight, open-source coding agents that run locally in the terminal, integrating AI models with local code and computing tasks. Both tools support multimodal reasoning by allowing users to pass prompts to the model, enhancing its ability to understand and generate code based on various inputs. Both tools are minimal and transparent, providing straightforward interfaces for developers to interact with AI models directly from the command line.

The open-codex clone offers flexibility by allowing users to swap out models from different vendors easily. For instance, users can change the model by adjusting a simple command-line parameter, enabling the use of models like Claude or local models using Ollama. This adaptability makes it suitable for developers who wish to experiment with various AI models without being tied to a specific provider.

Both Codex CLI and open-codex aim to streamline the coding process by integrating AI capabilities directly into the developer’s workflow.

You need to have a recent version of npm installed. You can then install the command line tool for Open Codex globally on your laptop using:

1 npm i -g open-codex

I use the qwen2.5:14b model with Ollama, so make sure this model is pulled locally and that the Ollama server is running:

1 ollama pull qwen2.5:14b
2 ollama serve

Create the directory ~/.codex if it does not already exist and edit the file ~/.codex/config.json to contain:

1 {
2     "provider": "ollama",
3     "model": "qwen2.5:14b"
4 }

Example Use Cases

I use Open Codex with Python, Common Lisp and Haskell projects.

As you run open-codex, you may be asked questions about the tasks you requested and for permission to create new files, commit changes to git, and so on.

Explain the Codebase in the Current Directory

1 open-codex "Describe the code in this directory"

Modify Code and Create New Source Files

1 open-codex "Copy the functions summarize and answer-question from ollamaOLD.lisp to ollama.lisp, making any required changes. Create a new test file test.lisp that contains examples of using the functions defined in ollama.lisp"

Open Codex Wrap Up

Using a command line agent like Open Codex provides a quick and intuitive way to interact with and modify code bases on your local computer.

Long Term Persistence Using Mem0 and Chroma

Something important that we haven’t covered yet: building a persistent memory for LLM applications. Here we use two libraries:

  • Mem0: persistent memory for AI Agents and LLM applications. GitHub: https://github.com/mem0ai/mem0.
  • Chroma: AI-native open-source vector database that simplifies building LLM apps by providing tools for storing, embedding, and searching embeddings.

The example in this chapter is simple and can be copied and modified for multiple applications; for example:

  • Code advice agent for Python
  • Store thoughts and ideas

Code Example Using Mem0 and Chroma

This Python script demonstrates how to create a persistent memory for an AI assistant using the mem0ai library, chromadb for vector storage, and ollama for interacting with LLMs. The script is designed to be run repeatedly: each execution processes a user prompt, retrieves past interactions stored in a local ChromaDB database, and then generates a concise, relevant response using a local Gemma model.

The core idea is that the mem0ai library stores conversation snippets and retrieves them based on semantic similarity to the current query. The retrieved context, referred to as “memories,” is injected into the LLM’s system prompt, allowing the AI to maintain a coherent, context-aware conversation across multiple independent runs. Because these memories persist locally, the system builds a long-term conversational understanding and can recall previously discussed information to give more informed answers, even though each run of the script is a fresh process.

The Chroma vector store database is stored under the directory ./db_local; until you delete this directory, memories of old interactions are maintained.
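
If you want the assistant to start over with an empty memory, delete that directory between runs. A small Python equivalent of removing the folder by hand might look like this:

1 import shutil
2 
3 # Remove the local Chroma store; the next run starts with an empty memory.
4 shutil.rmtree("db_local", ignore_errors=True)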

One parameter you may want to change is the number of memories matched in the Chroma database. This can be set in the line of code m.search(query=args.prompt, limit=5, …).

 1 # Run this script repeatedly to build a persistent memory:
 2 #
 3 #   python mem0_persistence.py "What color is the sky?"
 4 #   python mem0_persistence.py "What is the last color we talked about?"
 5 
 6 # pip install mem0ai chromadb ollama
 7 
 8 import argparse
 9 from mem0 import Memory
10 from ollama import chat
11 from ollama import ChatResponse
12            
13 USER_ID = "123"
14 
15 config = {
16   "user_id": USER_ID,
17   "vector_store": {
18     "provider": "chroma",
19     "config": { "path": "db_local" }
20   },
21   "llm": {
22     "provider": "ollama",
23     "config": {
24       "model": "gemma3:4b-it-qat",
25       "temperature": 0.1,
26       "max_tokens": 5000
27     }
28   },
29 }
30 
31 def call_ollama_chat(model: str, messages: list[dict]) -> str:
32   """
33   Send a chat request to Ollama and return the assistant's reply.
34   """
35   response: ChatResponse = chat(
36       model=model,
37       messages=messages
38   )
39   return response.message.content
40   
41 def main():
42   p = argparse.ArgumentParser()
43   p.add_argument("prompt", help="Your question")
44   args = p.parse_args()
45 
46   m = Memory.from_config(config)
47   print(f"User: {args.prompt}")
48 
49   rel = m.search(query=args.prompt, limit=5, user_id=USER_ID)
50   mems = "\n".join(f"- {e['memory']}" for e in rel["results"])
51   print("Memories:\n", mems)
52 
53   system0 = "You are a helpful assistant who answers with concise, short answers."
54   system = f"{system0}\nPrevious user memories:\n{mems}"
55   
56   msgs = [
57     {"role":"system","content":system},
58     {"role":"user","content":args.prompt}
59   ]
60   
61   reply = call_ollama_chat("gemma3:4b-it-qat", msgs)
62 
63   convo = {"role":"assistant",
64            "content":
65            f"QUERY: {args.prompt}\n\nANSWER:\n{reply}\n"}
66   m.add(convo, user_id=USER_ID, infer=False)
67     
68   print(f"\n\n** convo:\n{convo}\n\n")
69 
70   print("Assistant:", reply)
71 
72 if __name__=="__main__":
73   main()

In the call to m.add(…), set infer=True if you want the configured Ollama LLM to extract and filter what gets stored rather than saving the raw text. I almost always set this to False so that all questions and answers are stored verbatim in the Chroma vector database.
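
For comparison, the two modes look like this; note that this is a fragment that reuses m, convo, and USER_ID from the listing above:

1 # Store the raw question/answer text exactly as written (what the listing above does):
2 m.add(convo, user_id=USER_ID, infer=False)
3 
4 # Let the configured Ollama model extract and condense facts before storing them:
5 m.add(convo, user_id=USER_ID, infer=True)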

Example Output

The following output has been lightly edited, removing library deprecation warnings and extra blank lines in the output.

The first time we run the test script the vector database is empty so the user query “Name two Physical laws” does not match any previous memories stored in the Chroma vector database:

 1  $ python mem0_persistence.py "Name two Physical laws"
 2 
 3 User: Name two Physical laws
 4 Memories:
 5 
 6 
 7 ** convo:
 8 {'role': 'assistant', 'content': "QUERY: Name two Physical laws\n\nANSWER:\n1.  Newton's First Law\n2.  Law of Conservation of Energy\n"}
 9 
10 Assistant: 1.  Newton's First Law
11 2.  Law of Conservation of Energy

Now the Chroma data store contains one memory:

 1 $ python mem0_persistence.py "Name another different Physical law"
 2 
 3 User: Name another different Physical law
 4 Memories:
 5  - QUERY: Name two Physical laws
 6 
 7 ANSWER:
 8 1.  Newton's First Law
 9 2.  Law of Conservation of Energy
10 
11 ** convo:
12 {'role': 'assistant', 'content': "QUERY: Name another different Physical law\n\nANSWER:\n1.  Newton's Third Law\n"}
13 
14 Assistant: 1.  Newton's Third Law

Here we ask a question in a different subject domain:

 1 $ python mem0_persistence.py "What color is the sky?" 
 2 
 3 User: What color is the sky?
 4 Memories:
 5  - QUERY: Name another different Physical law
 6 
 7 ANSWER:
 8 1.  Newton's Third Law
 9 
10 - QUERY: Name two Physical laws
11 
12 ANSWER:
13 1.  Newton's First Law
14 2.  Law of Conservation of Energy
15 
16 ** convo:
17 {'role': 'assistant', 'content': 'QUERY: What color is the sky?\n\nANSWER:\nBlue.\n'}
18 
19 Assistant: Blue.

We check persistence:

 1 $ python mem0_persistence.py "What is the last color we talked about?" 
 2 
 3 User: What is the last color we talked about?
 4 Memories:
 5  - QUERY: What color is the sky?
 6 
 7 ANSWER:
 8 Blue.
 9 
10 - QUERY: Name two Physical laws
11 
12 ANSWER:
13 1.  Newton's First Law
14 2.  Law of Conservation of Energy
15 
16 - QUERY: Name another different Physical law
17 
18 ANSWER:
19 1.  Newton's Third Law
20 
21 ** convo:
22 {'role': 'assistant', 'content': 'QUERY: What is the last color we talked about?\n\nANSWER:\nBlue.\n'}
23 
24 Assistant: Blue.

Book Wrap Up

Dear reader, I have been paid for “AI work” (for many interpretations of what that even means) since 1982. I certainly find LLMs to be the most exciting tool for moving the field of AI forward, further and faster than anything else I have used in the last 43 years.

I am also keenly interested in privacy and open source, so I must admit a strong bias towards using open source software, open weight LLMs, and systems and infrastructure like Ollama that enable me to control my own data. The content of this book is tailored to my own interests, but I hope that I have, dear reader, covered many of your interests as well.

In the earlier example that generated a Kuzu graph database from text, I “pulled a fast one” in that I didn’t use a local model running with Ollama. Instead I used one of the most powerful commercial LLMs, Claude Sonnet 3.5, because it generates better code than any model I can run with Ollama on my Mac with 32 GB of unified memory. In my work, I balance my personal desire for data privacy and control over the software and hardware I use with practical compromises, like using state of the art models running on massive cloud compute resources.