LM Studio In Action: Building Safe, Private AI with LLMs, Function Calling and Agents with MCP
Mark Watson

Preface

LM Studio allows you to run LLMs locally on your own computer. The LM Studio app is not open source but it is free to use for personal use and/or internal business purposes. I suggest, dear reader, that you read their terms of service. You can also bookmark LM Studio’s online documentation.

I have been a paid AI Practitioner and Researcher since 1982. Since 2013 I have worked on AI-related projects at Google, Capital One, and five startups. You can read more about me at https://markwatson.com.

Both the Python example programs for this book and the manuscript files are contained in this GitHub repository: https://github.com/mark-watson/LM_Studio_BOOK.

Setup LM Studio

This book serves as your comprehensive guide to LM Studio, a powerful desktop application designed for developing and experimenting with Large Language Models (LLMs) locally on your computer. Building on your foundational knowledge of running local models, perhaps from reading my book “Ollama in Action: Building Safe, Private AI with LLMs, Function Calling and Agents” (read free online at https://leanpub.com/ollama/read), which covers the alternative local LLM tool Ollama, this book delves specifically into how LM Studio empowers you to leverage your computer’s CPU and, optionally, its GPU to run openly available LLMs such as Llama 3.3, Phi-4, and Gemma 3. LM Studio provides a familiar chat interface and robust search and download functionality via Hugging Face, making it incredibly intuitive to get started. It supports running LLMs using llama.cpp on Mac, Windows, and Linux, and additionally supports Apple’s MLX on Apple Silicon Macs, ensuring broad compatibility.

To begin our journey with LM Studio, the process is straightforward: first, install the latest version of the application for your operating system from https://lmstudio.ai/download. Once installed, you will download your preferred LLM directly within LM Studio from the Discover tab, choosing from curated options or searching for specific models. I will often use the model google/gemma-3n-e4b in this book because it is a small yet highly effective model.

After you have installed the app (choosing the “developer option”) and downloaded at least one model, the next step is to load the model into your computer’s memory via the Chat tab, a process that allocates the necessary memory for the model’s weights and parameters. LM Studio also enhances your interaction by allowing you to chat with documents entirely offline, a feature known as “RAG” (Retrieval Augmented Generation), enabling completely private and local document interaction.

For streamlined workflows, LM Studio introduces Presets, which allow you to bundle system prompts and other inference parameters like Temperature, Top P, or Max Tokens into reusable configurations, and you can even set Per-model Defaults for load settings such as GPU offload or context size.

A significant focus of this book will be on leveraging LM Studio as a local API endpoint for your applications and scripts. LM Studio provides a REST API that allows you to interact with your local models programmatically. You’ll learn how to utilize both the OpenAI Compatibility API and the newer LM Studio REST API (beta), enabling seamless integration with existing tools and new development. The LM Studio product offers dedicated client libraries, including lmstudio-python and lmstudio-js. This book will particularly emphasize Python client code examples, guiding you through building custom applications that harness the power of your locally running LLMs.

Beyond core functionality, LM Studio offers advanced capabilities that we will explore. These include features like Structured Output, Tools and Function Calling, and Speculative Decoding, which can significantly enhance your LLM interactions.

Additionally, with the introduction of Model Context Protocol (MCP) Host capabilities in LM Studio 0.3.17, you can connect MCP servers to your app, enabling advanced functions like model and dataset search. It is crucial to be aware that some MCP servers can run arbitrary code, access local files, and use your network connection, so caution is advised when installing MCPs from untrusted sources. Throughout your journey, remember that the LM Studio community on Discord is a valuable resource for support, sharing knowledge, and discussing everything from models to hardware.

By the end of this book, you will not only be proficient in running and managing various LLMs locally within LM Studio’s intuitive environment but will also be adept at integrating these powerful models into your own applications via its versatile API.

Advantages of Running Local LLMs

A main theme of this book is the advantages of running models privately on either your personal computer or a computer at work. While many commercial LLM API vendors like OpenAI, Google, and Anthropic may offer options to not reuse your prompt data and the output generated from your prompts to train their systems, there is no better privacy and security than running open weight models on your own hardware. There have been many tech news articles warning that commercial LLM API vendors often store your data even after you ask for it to be deleted.

After a short tutorial on running the LM Studio application interactively, this book is largely about running Large Language Models (LLMs) on your own hardware using LM Studio.

To be clear, dear reader, although I have a strong preference for running smaller LLMs on my own hardware, I also frequently use commercial LLM API vendors like Anthropic, OpenAI, ABACUS.AI, GROQ, and Google to take advantage of features like advanced models and scalability using cloud-based hardware.

About the Author

I am an AI practitioner and consultant specializing in large language models, LangChain/Llama-Index integrations, deep learning, and the semantic web. I have authored over 20 books on topics including artificial intelligence, Python, Common Lisp, deep learning, Haskell, Clojure, Java, Ruby, the Hy language, and the semantic web. I have 55 U.S. patents. Please check out my home page and social media: my personal web site https://markwatson.com, X/Twitter, my Blog on Blogspot, and my Blog on Substack.

Requests from the Author

This book will always be available to read free online at https://leanpub.com/LMstudio/read.

That said, I appreciate it when readers purchase my books because the income enables me to spend more time writing.

Hire the Author as a Consultant

I am available for short consulting projects. Please see https://markwatson.com.

Why Should We Care About Privacy?

Running local models can enhance privacy when dealing with sensitive data. Let’s delve into why privacy is crucial and how LM Studio contributes to improved security.

Why is privacy important?

Privacy is paramount for several reasons:

  • Protection from Data Breaches: When data is processed by third-party services, it becomes vulnerable to potential data breaches. Storing and processing data locally minimizes this risk significantly. This is especially critical for sensitive information like personal details, financial records, or proprietary business data.
  • Compliance with Regulations: Many industries are subject to stringent data privacy regulations, such as GDPR, HIPAA, and CCPA. Running models locally can help organizations maintain compliance by ensuring data remains under their control.
  • Maintaining Confidentiality: For certain applications, like handling legal documents or medical records, maintaining confidentiality is of utmost importance. Local processing ensures that sensitive data isn’t exposed to external parties.
  • Data Ownership and Control: Individuals and organizations have a right to control their own data. Local models empower users to maintain ownership and make informed decisions about how their data is used and shared.
  • Preventing Misuse: By keeping data local, you reduce the risk of it being misused by third parties for unintended purposes, such as targeted advertising, profiling, or even malicious activities.

Introduction to Using the LM Studio Application

Figure 1. An example of the LM Studio user interface, showing the chat tab.

This chapter introduces you to LM Studio, a powerful desktop application designed specifically for developing and experimenting with Large Language Models (LLMs) directly on your local computer. Building upon your understanding of running local models, perhaps from prior experience with tools like Ollama, LM Studio offers a streamlined environment to interact with openly available LLMs. It allows you to leverage your computer’s CPU and, optionally, its GPU to run models such as Llama 3.1, Phi-3, and Gemma 2. LM Studio is characterized by its familiar chat interface and robust search and download functionality via Hugging Face, making it intuitive for new users to get started. The application supports running LLMs using llama.cpp on Mac, Windows, and Linux, and additionally supports Apple’s MLX on Apple Silicon Macs, ensuring wide compatibility across different systems.

To begin your journey with LM Studio, the initial steps are straightforward. First, you need to install the latest version of LM Studio for your specific operating system by downloading an installer from the Downloads page. LM Studio is available for macOS, Windows, and Linux, and generally supports Apple Silicon Macs, x64/ARM64 Windows PCs, and x64 Linux PCs.

Once installed, the process for running an LLM like Llama, Phi, or DeepSeek R1 on your computer involves three key steps:

  • Download an LLM: Head to the Discover tab within LM Studio to download model weights. You can choose from curated options or search for specific models.
  • Load a model to memory: Navigate to the Chat tab, open the model loader (quickly accessible via cmd + L on macOS or ctrl + L on Windows/Linux), and select a downloaded model. Loading a model typically means allocating memory to accommodate the model’s weights and other parameters in your computer’s RAM.
  • Chat! Once the model is loaded, you can initiate a back-and-forth conversation with the model in the Chat tab.

LM Studio offers several features that enhance your local LLM experience. A notable capability is the ability to chat with documents entirely offline on your computer, a feature known as “RAG” (Retrieval Augmented Generation), which allows for completely private and local document interaction. For managing various configurations and use cases, LM Studio introduces Presets. These allow you to bundle a system prompt and other inference parameters, such as Temperature, Top P, or Max Tokens, into a single, reusable configuration. You can save these settings as named presets to easily switch between different use cases like reasoning or creative writing. Presets can be imported from files or URLs, and you can even publish your own to share via the LM Studio Hub. Additionally, you can set Per-model Defaults for load settings like GPU offload or context size for individual models via the My Models tab, which will be applied whenever that model is loaded.

Beyond basic chat and configuration, LM Studio incorporates advanced functionalities to further empower your interactions with LLMs. These include support for Structured Output, Tools and Function Calling, and Speculative Decoding, all of which can significantly enhance the sophistication of your LLM applications. Furthermore, starting with LM Studio 0.3.17, the application functions as a Model Context Protocol (MCP) Host, allowing you to connect MCP servers. These servers can provide advanced functions, such as model and dataset search, as exemplified by the Hugging Face MCP Server. However, it is crucial to exercise caution when installing MCPs from untrusted sources, as some MCP servers have the potential to run arbitrary code, access your local files, and use your network connection.

Using the LM Studio Command Line Interface (CLI)

While the LM Studio UI application is convenient for chatting, using LM Studio as a RAG system, and so on, the command line interface (CLI) is also worth learning because it is often a much faster way to get work done.

You can refer to the official documentation https://lmstudio.ai/docs/cli. Here we will look at a few examples:

lms ls

 1 $ lms ls
 2 
 3 You have 6 models, taking up 42.23 GB of disk space.
 4 
 5 LLMs (Large Language Models)        PARAMS ARCHITECTURE  SIZE
 6 qwen3moe                                                 13.29 GB
 7 qwen3-30b-a3b-instruct-2507-mlx            qwen3_moe     17.19 GB
 8 qwen/qwen3-30b-a3b-2507                    qwen3_moe     17.19 GB
 9 google/gemma-3n-e4b                        gemma3n        5.86 GB  ✓ LOADED
10 liquid/lfm2-1.2b                           lfm2           1.25 GB
11 
12 Embedding Models                   PARAMS      ARCHITECTURE          SIZE
13 text-embedding-nomic-embed-text-v1.5           Nomic BERT        84.11 MB

lms load <model_key>

A model key is the first item displayed on an output line when you run lms ls.

1 $ lms load google/gemma-3n-e4b 
2 
3 Loading model "google/gemma-3n-e4b"...
4 Model loaded successfully in 13.59s. (5.86 GB)
5 To use the model in the API/SDK, use the identifier "google/gemma-3n-e4b:2".
6 To set a custom identifier, use the --identifier <identifier> option.

lms unload

lms unload takes an optional <model_key>. If you don’t specify a model key then you will be shown a list of loaded models and you can interactively unload models:

1 $ lms unload
2 
3 ! Use the arrow keys to navigate, type to filter, and press enter to select.
4 ! To unload all models, use the --all flag.
5 
6 ? Select a model to unload | Type to filter...
7    qwen3-30b-a3b-instruct-2507-mlx
8 ❯  google/gemma-3n-e4b  

lms get

lms get supports searching for models on Hugging Face by name and interactively downloading them. Here is an example:

 1 $ lms get llama-3.2 --mlx --gguf --limit 6
 2 Searching for models with the term llama-3.2
 3 No exact match found. Please choose a model from the list below.
 4 
 5 ! Use the arrow keys to navigate, and press enter to select.
 6 
 7 ? Select a model to download (Use arrow keys)
 8 ❯ [Staff Pick] Hermes 3 Llama 3.2 3B 
 9   [Staff Pick] Llama 3.2 1B Instruct 4bit 
10   [Staff Pick] Llama 3.2 3B Instruct 4bit 
11   [Staff Pick] Llama 3.2 1B 
12   [Staff Pick] Llama 3.2 3B 
13   DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF 

Server Status and Control

 1 $ lms server status
 2 The server is running on port 1234.
 3 Marks-Mac-mini:api_introduction $ lms server stop
 4 Stopped the server on port 1234.
 5 Marks-Mac-mini:api_introduction $ lms server start
 6 Starting server...
 7 Success! Server is now running on port 1234
 8 Marks-Mac-mini:api_introduction $ lms ps
 9 
10    LOADED MODELS   
11 
12 Identifier: google/gemma-3n-e4b
13   • Type:  LLM 
14   • Path: google/gemma-3n-e4b
15   • Size: 5.86 GB
16   • Architecture: gemma3n

Introduction to LM Studio’s Local Inference API

Dear reader, we will start with two simple examples: one using the OpenAI API compatibility features of LM Studio and the other using the Python lmstudio package. First, make sure you click the green icon in the left menu area:

Figure 2. Enable the API in developer mode using the slider in the upper left corner of the app. You should see “Status: Running”.

When you installed LM Studio you were asked if you wanted “developer mode” enabled. That prompt during installation can be a bit misleading. You haven’t been locked out of any features.

“Developer Mode” in LM Studio is simply a UI setting that you can toggle at any time. It’s not a permanent choice made during installation.

Here’s how to enable it:

Look at the very bottom of the LM Studio window. You will see three buttons: User, Power User, and Developer.

Just click on Developer to switch to that mode. This will expose all the advanced configuration options and developer-focused features throughout the application, including more detailed settings in the Local Server tab.

When using the LM Studio inference APIs from Python scripts, you can’t set the model or even load a model. Instead, you must use the LM Studio application UI to choose and manually load a model. For Python applications I work on that require switching between different models, I don’t use LM Studio; instead I use Ollama (read my Ollama book online).

For the examples in this chapter I manually selected and loaded the small but very capable model google/gemma-3n-e4b.
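
If a script needs to check what the server currently reports before sending prompts, it can query the OpenAI-compatible /v1/models endpoint using the same client library introduced in the next section. The following is a minimal sketch (not from the book’s repository), assuming the server is running on the default port 1234; depending on your LM Studio version and settings, the listing may include downloaded models as well as the currently loaded one.

from openai import OpenAI

# Minimal sketch: list the model identifiers reported by the local server.
# Assumes LM Studio's server is running on the default port 1234.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

for model in client.models.list().data:
    print(model.id)  # e.g., "google/gemma-3n-e4b"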

Using the Python OpenAI Compatibility APIs

You can find the Python script examples for this book in the GitHub repository https://github.com/mark-watson/LM_Studio_BOOK in the src directory. The example we now use is in the file src/api_introduction/openai_cmpatibility.py:

 1 from openai import OpenAI
 2 
 3 # --- Configuration ---
 4 # Point the client to your local LM Studio server
 5 # The default base_url is "http://localhost:1234/v1"
 6 # You can leave the api_key as a placeholder; it's not required for local servers.
 7 client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
 8 
 9 # --- Main Execution ---
10 def get_local_llm_response():
11     """
12     Sends a request to the local LLM and prints the response.
13     """
14     # Before running this, make sure you have:
15     # 1. Downloaded and installed LM Studio.
16     # 2. Downloaded a model from the LM Studio hub.
17     # 3. In the "Local Server" tab (the '<->' icon), selected your model
18     #    at the top and clicked "Start Server".
19 
20     # The "model" parameter should be a placeholder, as the model is
21     # selected and loaded in the LM Studio UI. The server will use
22     # whichever model is currently loaded.
23     try:
24         completion = client.chat.completions.create(
25             model="local-model",  # This field is ignored by LM Studio
26                                   #but is required by the API.
27             messages=[
28                 {"role": "system", "content": "You are a helpful AI assistant."},
29                 {"role": "user", "content": "What is the capital of France?"}
30             ],
31             temperature=0.7,
32         )
33 
34         # Extracting and printing the response content
35         response_message = completion.choices[0].message.content
36         print("\nResponse from local model:")
37         print(response_message)
38 
39     except Exception as e:
40         print(f"\nAn error occurred:")
41         print(f"It's likely the LM Studio server is not running or the model is not loaded.")
42         print(f"Please ensure the server is active and a model is selected.")
43         print(f"Error details: {e}")
44 
45 
46 if __name__ == "__main__":
47     print("--- Local LLM Interaction via OpenAI Compatibility ---")
48     get_local_llm_response()

This Python script uses the official openai library to connect to a local AI model running in LM Studio, not OpenAI’s servers.

It sends the question “What is the capital of France?” to your local model and prints its response to the console. The key is the base_url="http://localhost:1234/v1" line, which redirects the API request to the LM Studio server.
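
The same client can also stream tokens as they are generated, which makes interactive applications feel much more responsive. Here is a minimal sketch (not part of the book’s repository) that assumes the same local server and an already loaded model:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Request a streamed response; tokens arrive as incremental "delta" chunks.
stream = client.chat.completions.create(
    model="local-model",  # ignored by LM Studio; the loaded model is used
    messages=[{"role": "user", "content": "Write one sentence about Paris."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk may carry no content
        print(delta, end="", flush=True)
print()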

In the next chapter we will cover “tool use,” also referred to as “function calling” (i.e., we write Python functions and configure API calls to inform a model of the names and required arguments of our tools/functions).

If you don’t enable LM Studio’s inference server, you will see an error like:

1 $ uv run openai_cmpatibility.py
2 --- Local LLM Interaction via OpenAI Compatibility ---
3 
4 An error occurred:
5 It's likely the LM Studio server is not running or the model is not loaded.
6 Please ensure the server is active and a model is selected.
7 Error details: Connection error.

Note that I use uv as a Python package manager and to run scripts. The examples also have standard Python requirements.txt files that you can alternatively use with pip and python3.

When you have the server inference running on LM Studio you should see output like this:

1 $ uv run openai_cmpatibility.py
2 --- Local LLM Interaction via OpenAI Compatibility ---
3 
4 Response from local model:
5 The capital of France is **Paris**. 
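
LM Studio’s Structured Output feature, mentioned earlier, can also be requested through this same OpenAI-compatible endpoint by passing a JSON Schema in the response_format parameter. The sketch below is not from the book’s repository, and the schema is made up for illustration; structured output support can vary by model and LM Studio version, so check the official documentation if the request is rejected.

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# A made-up schema for illustration: ask for a single city fact as JSON.
schema = {
    "name": "city_fact",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "country": {"type": "string"},
            "fact": {"type": "string"},
        },
        "required": ["city", "country", "fact"],
    },
}

completion = client.chat.completions.create(
    model="local-model",  # ignored by LM Studio
    messages=[{"role": "user", "content": "Tell me one interesting fact about Paris."}],
    response_format={"type": "json_schema", "json_schema": schema},
)

# The message content should now be valid JSON that matches the schema.
print(json.loads(completion.choices[0].message.content))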

Using the Python lmstudio Package

Here is a simple example that assumes the server is running and a model is loaded:

1 import lmstudio as lms
2 model = lms.llm()
3 print(model.respond("Sally is 77, Bill is 32, and Alex is 44 years old. Pairwise, what are their age differences? Print results in JSON format. Be concise and only provide a correct answer, no need to think about different correct answers."))

Here is sample output:

 1 $ uv run lmstudio_simple.py
 2 <think>
 3 To determine the age differences between Sally, Bill, and Alex, I will list their ages first. Sally is 77 years old, Bill is 32 years old, and Alex is 44 years old.
 4 
 5 Next, I'll calculate the age difference between each pair:
 6 
 7 1. **Sally and Bill**: Subtract Bill's age from Sally's age.
 8    - 77 (Sally) - 32 (Bill) = 45 years.
 9 
10 2. **Sally and Alex**: Subtract Alex's age from Sally's age.
11    - 77 (Sally) - 44 (Alex) = 33 years.
12 
13 3. **Bill and Alex**: Subtract Bill's age from Alex's age.
14    - 44 (Alex) - 32 (Bill) = 12 years.
15 
16 Finally, I'll format the results in JSON as specified: a key for each pair of names with their respective age difference.
17 </think>
18 
19 ```json
20 {
21   "Sally and Bill": 45,
22   "Sally and Alex": 33,
23   "Bill and Alex": 12
24 }
25 ```

Here is a more complex example that demonstrates how to pass multiple messages and a custom system prompt:

 1 import lmstudio as lms
 2 
 3 # --- Main Execution ---
 4 def get_llm_response_with_sdk(prompt):
 5     """
 6     Loads a model and gets a response using the lmstudio-python SDK.
 7     """
 8     # Before running this, make sure you have:
 9     # 1. Downloaded and installed LM Studio.
10     # 2. Started the LM Studio application. The SDK communicates directly
11     #    with the running application; you don't need to manually start the server.
12 
13     try:
14         # Load a model by its repository ID from the Hugging Face Hub.
15         # The SDK will communicate with LM Studio to use the model.
16         # If the model isn't downloaded, LM Studio might handle that,
17         # but it's best to have it downloaded first.
18         #
19         # Replace this with the identifier of a model you have downloaded.
20         # e.g., "gemma-2-9b-it-gguf"
21         print("Loading model...")
22         model = lms.llm("google/gemma-3n-e4b")
23 
24         # Send a prompt to the loaded model.
25         print("Sending prompt to the model...")
26         response = model.respond(
27           {"messages":
28             [
29                 {"role": "system", "content": "You are a helpful AI assistant."},
30                 {"role": "user", "content": prompt},
31             ]
32           }
33         )
34 
35         # The 'response' object contains the full API response.
36         # The text content is in response.text
37         return response
38 
39     except Exception as e:
40         print(f"\nAn error occurred:")
41         print("Please ensure the LM Studio application is running and the model identifier is correct.")
42         print(f"Error details: {e}")
43 
44 
45 if __name__ == "__main__":
46     print("--- Local LLM Interaction via lmstudio-python SDK ---")
47     print("\n--- Model Response ---")
48     print(get_llm_response_with_sdk("Explain the significance of the Rosetta Stone in one paragraph."))

Sample output may look like this:

1 $ uv run lmstudio_library_example.py
2 --- Local LLM Interaction via lmstudio-python SDK ---
3 
4 --- Model Response ---
5 Loading model...
6 Sending prompt to the model...
7 The Rosetta Stone is a fragment of a larger stele inscribed with the same text in three scripts: hieroglyphic, demotic, and ancient Greek. Its discovery in 1799 was a pivotal moment in Egyptology because it provided the key to deciphering hieroglyphics, a writing system that had been lost for centuries. By comparing the known Greek text to the unknown Egyptian scripts, scholars like Jean-François Champollion were able to unlock the meaning of hieroglyphics, opening up a vast treasure trove of information about ancient Egyptian history, culture, and religion.  Essentially, the Rosetta Stone provided the crucial bridge for understanding a civilization's written language and allowed us to finally "read" ancient Egypt.

Tool Use / Function Calling

The key to effective tool use with local models that aren’t specifically fine-tuned for tool use or function calling is to guide them with a very structured prompt that describes the available functions and their arguments. Dear reader, we work around the lack of built-in tool support by creating a “template” that describes the tools we write ourselves, including the name and arguments of each tool.

Note that many models and inferencing platforms directly support tool use for some combinations of models and client API libraries.

The advantage of “building it ourselves” is the flexibility of being able to use most models, libraries and inferencing platforms.

Before we look at more complex tools we will first look at a simple, common example: a tool that calls a stubbed-out weather API.

An Initial Example: a Tool That is a Simple Python Function

The first example in this chapter can be found in the file LM_Studio_BOOK/src/tool_use/weather_tool.py and uses these steps:

  • Define the tools: First, I need to define the functions that the AI can “call.” These are standard Python functions. For this example, I’ll create a simple get_weather function.
  • Create the prompt template: This is the most critical step for local models that may not directly support tool use. We need to design a prompt that clearly lists the available tools with their descriptions and parameters, and provides a format for the model to use when it wants to call a tool. The prompt should instruct the model to output a specific, parseable format, like JSON.
  • Set up a client for using the LM Studio service APIs: In this example we use the OpenAI Python library, configured to point to the local LM Studio server. This allows for a familiar and standardized way to interact with the local model.
  • Process user input: The user’s query is inserted into the prompt template.
  • Send a prompt to a model and get a response: In this example a complete prompt is sent to the Gemma model running in LM Studio.
  • Parse the JSON response and execute a local Python function in the client script: Check the model’s response. If it contains the special JSON format for a tool call, then we parse it, execute the corresponding Python function with the provided arguments, and then feed the result back to the model for a final, natural language response. If the initial response from the model doesn’t contain a tool call, then we just use the response.

  1 from openai import OpenAI
  2 import json
  3 import re
  4 
  5 def get_weather(city: str, unit: str = "celsius"):
  6     """
  7     Get the current weather for a given city.
  8     
  9     Args:
 10         city (str): The name of the city.
 11         unit (str): The temperature unit, 'celsius' or 'fahrenheit'.
 12     """
 13     # In a real application, you would call a weather API here.
 14     # For this example, we'll just return some mock data.
 15     if "chicago" in city.lower():
 16         return json.dumps({"city": "Chicago", "temperature": "12", "unit": unit})
 17     elif "tokyo" in city.lower():
 18         return json.dumps({"city": "Tokyo", "temperature": "25", "unit": unit})
 19     else:
 20         return json.dumps({"error": "City not found"})
 21 
 22 
 23 # Point to the local server
 24 client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
 25 
 26 # A dictionary to map tool names to actual functions
 27 available_tools = {
 28     "get_weather": get_weather,
 29 }
 30 
 31 def run_conversation(user_prompt: str):
 32     # System prompt that defines the rules and tools for the model
 33     system_prompt = """
 34     You are a helpful assistant with access to the following tools.
 35     To use a tool, you must respond with a JSON object with two keys: "tool_name" and "parameters".
 36     
 37     Here are the available tools:
 38     {
 39         "tool_name": "get_weather",
 40         "description": "Get the current weather for a given city.",
 41         "parameters": [
 42             {"name": "city", "type": "string", "description": "The city name."},
 43             {"name": "unit", "type": "string", "description": "The unit for temperature, either 'celsius' or 'fahrenheit'."}
 44         ]
 45     }
 46     
 47     If you decide to use a tool, your response MUST be only the JSON object.
 48     If you don't need a tool, answer the user's question directly.
 49     """
 50     
 51     messages = [
 52         {"role": "system", "content": system_prompt},
 53         {"role": "user", "content": user_prompt}
 54     ]
 55 
 56     print("--- User Question ---")
 57     print(user_prompt)
 58 
 59     completion = client.chat.completions.create(
 60         model="local-model", # This will be ignored by LM Studio
 61         messages=messages,
 62         temperature=0.1, # Lower temperature for more predictable, structured output
 63     )
 64 
 65     response_message = completion.choices[0].message.content
 66 
 67     # More robustly find and extract the JSON from the model's response
 68     json_str = None
 69     tool_call = {}
 70     
 71     # Use regex to find JSON within ```json ... ``` or ``` ... ```
 72     match = re.search(r'```(?:json)?\s*(\{.*?\})\s*```', response_message, re.DOTALL)
 73     if match:
 74         json_str = match.group(1)
 75     else:
 76         # If no markdown block, maybe the whole message is the JSON
 77         if response_message.startswith('{'):
 78             json_str = response_message
 79 
 80     if json_str:
 81         try:
 82             tool_call = json.loads(json_str)
 83         except json.JSONDecodeError as e:
 84             print(e)
 85              
 86     # Check if the model wants to call a tool
 87     try:
 88         tool_name = tool_call.get("tool_name")
 89         
 90         if tool_name in available_tools:
 91             print("\n--- Tool Call Detected ---")
 92             print(f"Tool: {tool_name}")
 93             print(f"Parameters: {tool_call.get('parameters')}")
 94             
 95             # Execute the function
 96             function_to_call = available_tools[tool_name]
 97             tool_params = tool_call.get("parameters", {})
 98             function_response = function_to_call(**tool_params)
 99             
100             print("\n--- Tool Response ---")
101             print(function_response)
102             
103             # (Optional) Send the result back to the model for a final summary
104             messages.append({"role": "assistant", "content": response_message})
105             messages.append({"role": "tool", "content": function_response})
106             
107             print("\n--- Final Response from Model ---")
108             final_completion = client.chat.completions.create(
109                 model="local-model",
110                 messages=messages,
111                 temperature=0.7,
112             )
113             print(final_completion.choices[0].message.content)
114             
115         else:
116             # The JSON doesn't match our tool schema
117             print("\n--- Assistant Response (No Tool) ---")
118             print(response_message)
119 
120     except json.JSONDecodeError:
121         # The response was not JSON, so it's a direct answer
122         print("\n--- Assistant Response (No Tool) ---")
123         print(response_message)
124 
125 
126 # --- Run Examples ---
127 run_conversation("What's the weather like in Tokyo in celsius?")
128 print("\n" + "="*50 + "\n")
129 run_conversation("What is the capital of France?")

The output using LM Studio with the model google/gemma-3n-e4b looks like:

 1 $ uv run weather_tool.py
 2 --- User Question ---
 3 What's the weather like in Tokyo in celsius?
 4 
 5 --- Tool Call Detected ---
 6 Tool: get_weather
 7 Parameters: {'city': 'Tokyo', 'unit': 'celsius'}
 8 
 9 --- Tool Response ---
10 {"city": "Tokyo", "temperature": "25", "unit": "celsius"}
11 
12 --- Final Response from Model ---
13 The weather in Tokyo is 25 degrees Celsius.
14 
15 ==================================================
16 
17 --- User Question ---
18 What is the capital of France?
19 
20 --- Assistant Response (No Tool) ---
21 Paris is the capital of France.

Here we tested two prompts: the first uses a tool and the second does not. We started with a simple example so you understand the low-level process of supporting tool use/function calling. In the next section we will generalize this example into two parts: a separate library and examples that use this library.

Creating a General Purpose Tools/Function Calling Library

This Python code in the file tool_use/function_calling_library.py provides a lightweight and flexible framework for integrating external functions as “tools” with a large language model (LLM). (Note: we will use tools hosted in the LM Studio application in the next chapter.) Our library defines two primary classes: ToolManager, which handles the registration and schema generation for available tools, and ConversationHandler, which orchestrates the multi-step interaction between the user, the LLM, and the tools. This approach allows the LLM to decide when to call a function, execute it within the Python environment, and then use the result to formulate a more informed and human readable response.

  1 import json
  2 import re
  3 import inspect
  4 from openai import OpenAI
  5 
  6 class ToolManager:
  7     """
  8     Manages the registration and formatting of tools for the LLM.
  9     """
 10     def __init__(self):
 11         """Initializes the ToolManager with empty dictionaries for tools."""
 12         self.tools_schema = {}
 13         self.available_tools = {}
 14 
 15     def register_tool(self, func):
 16         """
 17         Registers a function as a tool, extracting its schema from the
 18         docstring and signature.
 19         
 20         Args:
 21             func (function): The function to be registered as a tool.
 22         """
 23         tool_name = func.__name__
 24         self.available_tools[tool_name] = func
 25 
 26         # Extract description from docstring
 27         description = "No description found."
 28         docstring = inspect.getdoc(func)
 29         if docstring:
 30             description = docstring.strip().split('\n\n')[0]
 31 
 32         # Extract parameters from function signature
 33         sig = inspect.signature(func)
 34         parameters = []
 35         for name, param in sig.parameters.items():
 36             param_type = "string" # Default type
 37             if param.annotation is not inspect.Parameter.empty:
 38                 # A simple way to map Python types to JSON schema types
 39                 if param.annotation == int:
 40                     param_type = "integer"
 41                 elif param.annotation == float:
 42                     param_type = "number"
 43                 elif param.annotation == bool:
 44                     param_type = "boolean"
 45             
 46             # Simple docstring parsing for parameter descriptions (assumes "Args:" section)
 47             param_description = ""
 48             if docstring:
 49                 arg_section = re.search(r'Args:(.*)', docstring, re.DOTALL)
 50                 if arg_section:
 51                     param_line = re.search(rf'^\s*{name}\s*\(.*?\):\s*(.*)',
 52                                            arg_section.group(1), re.MULTILINE)
 53                     if param_line:
 54                         param_description = param_line.group(1).strip()
 55 
 56             parameters.append({
 57                 "name": name,
 58                 "type": param_type,
 59                 "description": param_description
 60             })
 61 
 62         self.tools_schema[tool_name] = {
 63             "tool_name": tool_name,
 64             "description": description,
 65             "parameters": parameters
 66         }
 67 
 68     def get_tools_for_prompt(self):
 69         """
 70         Formats the registered tools' schemas into a JSON string for the system prompt.
 71 
 72         Returns:
 73             str: A JSON string representing the list of available tools.
 74         """
 75         if not self.tools_schema:
 76             return "No tools available."
 77         return json.dumps(list(self.tools_schema.values()), indent=4)
 78 
 79 class ConversationHandler:
 80     """
 81     Handles the conversation flow, including making API calls and executing tools.
 82     """
 83     def __init__(self, client: OpenAI, tool_manager: ToolManager,
 84                  model: str = "local-model", temperature: float = 0.1):
 85         """
 86         Initializes the ConversationHandler.
 87 
 88         Args:
 89             client (OpenAI): The OpenAI client instance.
 90             tool_manager (ToolManager): The ToolManager instance with registered tools.
 91             model (str): The model name to use (ignored by LM Studio).
 92             temperature (float): The sampling temperature for the model.
 93         """
 94         self.client = client
 95         self.tool_manager = tool_manager
 96         self.model = model
 97         self.temperature = temperature
 98 
 99     def _create_system_prompt(self):
100         """Creates the system prompt with tool definitions."""
101         return f"""
102 You are a helpful assistant with access to the following tools.
103 To use a tool, you must respond with a JSON object with two keys: "tool_name" and "parameters".
104 
105 Here are the available tools:
106 {self.tool_manager.get_tools_for_prompt()}
107 
108 If you decide to use a tool, your response MUST be only the JSON object.
109 If you don't need a tool, answer the user's question directly.
110 """
111 
112     def run(self, user_prompt: str, verbose: bool = True):
113         """
114         Runs the full conversation loop for a single user prompt.
115 
116         Args:
117             user_prompt (str): The user's question or command.
118             verbose (bool): If True, prints detailed steps of the conversation.
119         """
120         system_prompt = self._create_system_prompt()
121         messages = [
122             {"role": "system", "content": system_prompt},
123             {"role": "user", "content": user_prompt}
124         ]
125 
126         if verbose:
127             print("--- User Question ---")
128             print(user_prompt)
129 
130         # --- First API Call: Check for tool use ---
131         completion = self.client.chat.completions.create(
132             model=self.model,
133             messages=messages,
134             temperature=self.temperature,
135         )
136         response_message = completion.choices[0].message.content
137 
138         # --- Parse for Tool Call ---
139         tool_call = self._parse_tool_call(response_message)
140 
141         if tool_call and tool_call.get("tool_name") in self.tool_manager.available_tools:
142             tool_name = tool_call["tool_name"]
143             tool_params = tool_call.get("parameters", {})
144             
145             if verbose:
146                 print("\n--- Tool Call Detected ---")
147                 print(f"Tool: {tool_name}")
148                 print(f"Parameters: {tool_params}")
149 
150             # --- Execute the Tool ---
151             function_to_call = self.tool_manager.available_tools[tool_name]
152             try:
153                 function_response = function_to_call(**tool_params)
154             except Exception as e:
155                 function_response = f"Error executing tool: {e}"
156 
157             if verbose:
158                 print("\n--- Tool Response ---")
159                 print(function_response)
160 
161             # --- Second API Call: Summarize the result ---
162             messages.append({"role": "assistant", "content": json.dumps(tool_call, indent=4)})
163             messages.append({"role": "tool", "content": str(function_response)})
164             
165             messages.append({
166                 "role": "user", 
167                 "content": "Based on the result from the tool, please formulate a final answer to the original user question."
168             })
169 
170             if verbose:
171                 print("\n--- Final Response from Model ---")
172             
173             final_completion = self.client.chat.completions.create(
174                 model=self.model,
175                 messages=messages,
176                 temperature=0.7, # Higher temp for more natural language
177             )
178             final_response = final_completion.choices[0].message.content
179             print(final_response)
180         else:
181             # --- No Tool Call Detected ---
182             if verbose:
183                 print("\n--- Assistant Response (No Tool) ---")
184             print(response_message)
185 
186     def _parse_tool_call(self, response_message: str) -> dict | None:
187         """
188         Parses the model's response to find and decode a JSON tool call.
189 
190         Args:
191             response_message (str): The raw response content from the model.
192 
193         Returns:
194             dict or None: A dictionary representing the tool call, or None if not found.
195         """
196         json_str = None
197         # Use regex to find JSON within ```json ... ``` or ``` ... ```
198         match = re.search(r'```(?:json)?\s*(\{.*?\})\s*```', response_message, re.DOTALL)
199         if match:
200             json_str = match.group(1)
201         # If no markdown block, check if the whole message is a JSON object
202         elif response_message.strip().startswith('{'):
203             json_str = response_message
204 
205         if json_str:
206             try:
207                 # Clean up the JSON string before parsing
208                 cleaned_json_str = json_str.strip()
209                 return json.loads(cleaned_json_str)
210             except json.JSONDecodeError as e:
211                 print(f"JSON Decode Error: {e} in string '{cleaned_json_str}'")
212                 return None
213         return None

The first class, ToolManager, serves as a registry for the functions you want to expose to the LLM. Its core method, register_tool, uses Python’s inspect module to dynamically analyze a function’s signature and docstring. It extracts the function’s name, its parameters (including their type hints), and their descriptions from the “Args” section of the docstring. This information is then compiled into a JSON schema that describes the tool in a machine-readable format. This automated process is powerful because it allows a developer to make a standard Python function available to the LLM simply by adding it to the manager, without manually writing complex JSON schemas.
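
To see exactly what register_tool produces, you can register a small function and print the generated schema. This short example only depends on the library file shown above and reuses the get_weather signature from earlier in the chapter:

import json
from function_calling_library import ToolManager

def get_weather(city: str, unit: str = "celsius"):
    """
    Get the current weather for a given city.

    Args:
        city (str): The name of the city.
        unit (str): The temperature unit, 'celsius' or 'fahrenheit'.
    """
    # Mock implementation, as in the earlier examples.
    return json.dumps({"city": city, "temperature": "12", "unit": unit})

tool_manager = ToolManager()
tool_manager.register_tool(get_weather)

# Prints the JSON schema that will be embedded in the system prompt.
print(tool_manager.get_tools_for_prompt())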

The second class, ConversationHandler, is the engine that drives the interaction. When its run method is called, it first constructs a detailed system prompt. This special prompt instructs the LLM on how to behave and includes the JSON schemas for all registered tools, informing the model of its capabilities. The user’s question is then sent to the LLM. The model’s first task is to decide whether to answer directly or to use one of the provided tools. If it determines a tool is necessary, it is instructed to respond only with a JSON object specifying the tool_name and the parameters needed to run it.

The process concludes with a crucial two-step execution logic. If the ConversationHandler receives a valid JSON tool call from the LLM, it executes the corresponding Python function with the provided parameters. The return value from that function is then packaged into a new message with the role “tool” and sent back to the LLM in a second API call. This second call prompts the model to synthesize the tool’s output into a final, natural-language answer for the user. If the model’s initial response was not a tool call, the system assumes no tool was needed and simply presents that response directly to the user. This conditional, multi-step approach enables the LLM to leverage external code to answer questions it otherwise couldn’t.

First Example Using the Function Calling Library: Generate and Execute Python Code to Answer User Questions

This Python script in the file tool_use/test1.py demonstrates a practical implementation of the function_calling_library by creating a specific tool designed to solve mathematical problems. It defines a function, solve_math_problem, that executes arbitrary Python code in a separate, isolated process; this is inherently insecure, so you must trust the Python code that the model writes to help answer a user prompt or query. The main part of the script then initializes the ToolManager and ConversationHandler from the library developed in the previous section, registers the new math tool, and runs two example conversations: one that requires complex calculation, thereby triggering the tool, and another that is a general knowledge question, which the LLM answers directly.

 1 import os
 2 import subprocess
 3 import json
 4 from openai import OpenAI
 5 from function_calling_library import ToolManager, ConversationHandler
 6 
 7 # --- Define the Custom Tool ---
 8 
 9 def solve_math_problem(python_code: str):
10     """
11     Executes a given string of Python code to solve a math problem and returns the output.
12     The code should be a complete, runnable script that prints the final result to standard output.
13 
14     Args:
15         python_code (str): A string containing the Python code to execute.
16     """
17     temp_filename = "temp.py"
18     
19     # Ensure any previous file is removed
20     if os.path.exists(temp_filename):
21         os.remove(temp_filename)
22 
23     try:
24         # Write the code to a temporary file
25         with open(temp_filename, "w") as f:
26             f.write(python_code)
27         
28         # Execute the python script as a separate process
29         result = subprocess.run(
30             ['python', temp_filename], 
31             capture_output=True, 
32             text=True, 
33             check=True, # This will raise CalledProcessError if the script fails
34             timeout=10 # Add a timeout for safety
35         )
36         
37         # The output from the script's print() statements
38         return result.stdout.strip()
39 
40     except subprocess.CalledProcessError as e:
41         # If the script has a runtime error, return the error message
42         error_message = f"Error executing the Python code:\nSTDOUT:\n{e.stdout}\nSTDERR:\n{e.stderr}"
43         return error_message
44     except Exception as e:
45         return f"An unexpected error occurred: {e}"
46     finally:
47         # Clean up the temporary file
48         if os.path.exists(temp_filename):
49             os.remove(temp_filename)
50 
51 
52 # --- Main Execution Logic ---
53 
54 if __name__ == "__main__":
55     # Point to the local LM Studio server
56     client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
57     
58     # 1. Initialize the ToolManager
59     tool_manager = ToolManager()
60     
61     # 2. Register the custom tool
62     tool_manager.register_tool(solve_math_problem)
63     
64     # 3. Initialize the ConversationHandler with the client and tools
65     handler = ConversationHandler(client, tool_manager)
66     
67     # 4. Define a user prompt that requires the tool
68     user_prompt = "Can you please calculate the area of a circle with a radius of 7.5 and also find the 20th number in the Fibonacci sequence? Please provide the Python code to do this."
69     
70     # 5. Run the conversation
71     handler.run(user_prompt)
72 
73     print("\n" + "="*50 + "\n")
74 
75     # Example of a question that should NOT use the tool
76     non_tool_prompt = "What is the most popular programming language?"
77     handler.run(non_tool_prompt)

The core of this script is the solve_math_problem function, which serves as the custom tool. It’s designed to somewhat safely execute a string of Python code passed to it by the LLM. To avoid security risks associated with eval() or exec(), it writes the code to a temporary file (temp.py). It then uses Python’s subprocess module to run this file as an entirely separate process. This sandboxes the execution, and the capture_output=True argument ensures that any output printed by the script (e.g., the result of a calculation) is captured. The function includes robust error handling, returning any standard error from the script if it fails, and a finally block to guarantee the temporary file is deleted, maintaining a clean state.

Please note that code generated by the LLM is not fully sandboxed and this approach is tailored to personal development environments. For production, consider running the generated code in a container that limits network and file access, as appropriate.
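
As one possible hardening step along those lines, the sketch below (not part of the book’s code) runs the generated script in a short-lived Docker container with networking disabled and the script mounted read-only. The image name and flags are illustrative; adapt them to your environment, and remember that a container is a mitigation, not a perfect sandbox.

import os
import subprocess

def run_code_in_container(python_code: str, timeout: int = 30) -> str:
    """Execute model-generated Python code in a disposable Docker container
    with no network access. Sketch only: assumes Docker is installed and the
    python:3.12-slim image is already available locally."""
    temp_filename = os.path.abspath("temp.py")
    with open(temp_filename, "w") as f:
        f.write(python_code)
    try:
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",                       # no network access
                "-v", f"{temp_filename}:/app/temp.py:ro",  # read-only mount
                "python:3.12-slim",
                "python", "/app/temp.py",
            ],
            capture_output=True, text=True, check=True, timeout=timeout,
        )
        return result.stdout.strip()
    except subprocess.CalledProcessError as e:
        return f"Error executing the Python code:\nSTDOUT:\n{e.stdout}\nSTDERR:\n{e.stderr}"
    finally:
        os.remove(temp_filename)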

The main execution block, guarded by if __name__ == "__main__", orchestrates the entire demonstration. It begins by configuring the OpenAI client to connect to a local server, such as LM Studio. It then instantiates the ToolManager and registers the solve_math_problem function as an available tool. With the tool ready, it creates a ConversationHandler to manage the flow. The script then showcases the system’s decision-making ability by running two different prompts. The first asks for two distinct mathematical calculations, a task that perfectly matches the solve_math_problem tool’s purpose. The second prompt is a general knowledge question that requires no calculation, demonstrating the LLM’s ability to differentiate between tasks and answer directly when a tool is not needed.

Sample output:

 1 $ uv run test1.py              
 2 --- User Question ---
 3 Can you please calculate the area of a circle with a radius of 7.5 and also find the 20th number in the Fibonacci sequence? Please provide the Python code to do this.
 4 
 5 --- Tool Call Detected ---
 6 Tool: solve_math_problem
 7 Parameters: {'python_code': 'import math\nradius = 7.5\narea = math.pi * radius**2\nprint(area)\n\ndef fibonacci(n):\n  if n <= 0:\n    return 0\n  elif n == 1:\n    return 1\n  else:\n    a, b = 0, 1\n    for _ in range(2, n + 1):\n      a, b = b, a + b\n    return b\n\nprint(fibonacci(20))'}
 8 
 9 --- Tool Response ---
10 176.71458676442586
11 6765
12 
13 --- Final Response from Model ---
14 The area of a circle with a radius of 7.5 is approximately 176.7146, and the 20th number in the Fibonacci sequence is 6765.
15 
16 
17 ==================================================
18 
19 --- User Question ---
20 What is the most popular programming language?
21 
22 --- Assistant Response (No Tool) ---
23 Python is generally considered the most popular programming language.

Having a model generate Python code to solve problems is a powerful technique so please, dear reader, take some time experimenting with this last example and adapting it to your own use cases.

Second Example Using the Function Calling Library: Stub of a Weather API

This is our original example, modified to use the library developed earlier in this chapter.

This script in the file tool_use/test2.py provides another clear example of how the function_calling_library can be used to extend an LLM’s capabilities, this time by simulating an external data fetch from a weather API. It defines a simple get_weather function that returns mock data for specific cities. The main execution logic then sets up the ConversationHandler, registers this new tool, and processes two distinct user prompts to demonstrate the LLM’s ability to intelligently decide when to call the function and when to rely on its own knowledge base.

 1 import json
 2 from openai import OpenAI
 3 from function_calling_library import ToolManager, ConversationHandler
 4 
 5 # --- Define the Custom Tool ---
 6 
 7 def get_weather(city: str, unit: str = "celsius"):
 8     """
 9     Get the current weather for a given city.
10     
11     Args:
12         city (str): The name of the city.
13         unit (str): The temperature unit, 'celsius' or 'fahrenheit'.
14     """
15     # In a real application, you would call a weather API here.
16     # For this example, we'll just return some mock data.
17     if "chicago" in city.lower():
18         return json.dumps({"city": "Chicago", "temperature": "12", "unit": unit})
19     elif "tokyo" in city.lower():
20         return json.dumps({"city": "Tokyo", "temperature": "25", "unit": unit})
21     else:
22         return json.dumps({"error": "City not found"})
23 
24 # --- Main Execution Logic ---
25 
26 if __name__ == "__main__":
27     # Point to the local LM Studio server
28     client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
29     
30     # 1. Initialize the ToolManager
31     tool_manager = ToolManager()
32     
33     # 2. Register the custom tool
34     tool_manager.register_tool(get_weather)
35     
36     # 3. Initialize the ConversationHandler with the client and tools
37     handler = ConversationHandler(client, tool_manager)
38     
39     # 4. Define a user prompt that requires the tool
40     user_prompt = "What's the weather like in Tokyo in celsius?"
41     
42     # 5. Run the conversation
43     handler.run(user_prompt)
44 
45     print("\n" + "="*50 + "\n")
46 
47     # Example of a question not using tool calling:
48     another_prompt = "What do Chicago and Tokyo have in common? Provide a fun answer."
49     handler.run(another_prompt)

The custom tool in this example is the get_weather function. It is defined with clear parameters, city and temperature unit, and includes type hints and a docstring that the ToolManager will use to automatically generate its schema. Instead of making a live call to a weather service, this function contains simple conditional logic to return hardcoded JSON strings for “Chicago” and “Tokyo,” or an error if another city is requested. This mock implementation is a common and effective development practice, as it allows you to build and test the entire function-calling logic without depending on external network requests or API keys. The function’s return value is a JSON string, which is a standard data interchange format easily understood by both the Python environment and the LLM.

The main execution block follows the same clear, step-by-step pattern as the previous example. It configures the client, initializes the ToolManager, and registers the get_weather function. After setting up the ConversationHandler, it runs two test cases that highlight the system’s contextual awareness. The first prompt, “What’s the weather like in Tokyo in celsius?”, directly maps to the functionality of the get_weather tool, and the LLM correctly identifies this and generates the appropriate JSON tool call. The second prompt, which asks for commonalities between the two cities, is a conceptual question outside the tool’s scope. In this case, the LLM correctly bypasses the tool-calling mechanism and provides a direct, creative answer from its own training data, demonstrating the robustness of the overall approach.

Here is sample output from the test2.py script:

 1 $ uv run test2.py
 2 --- User Question ---
 3 What's the weather like in Tokyo in celsius?
 4 
 5 --- Tool Call Detected ---
 6 Tool: get_weather
 7 Parameters: {'city': 'Tokyo', 'unit': 'celsius'}
 8 
 9 --- Tool Response ---
10 {"city": "Tokyo", "temperature": "25", "unit": "celsius"}
11 
12 --- Final Response from Model ---
13 The weather in Tokyo is 25 degrees Celsius.
14 
15 
16 ==================================================
17 
18 --- User Question ---
19 What do Chicago and Tokyo have in common? Provide a fun answer.
20 
21 --- Assistant Response (No Tool) ---
22 Chicago and Tokyo both have amazing food scenes! Chicago is famous for deep-dish pizza, while Tokyo is known for its incredible sushi and ramen. Both cities are culinary adventures! 🍕🍜

Dear reader, you probably use many APIs in developing applications. Choose one or two APIs that you are familiar with and modify this last example to call real APIs, making sure that you include type hints and a descriptive docstring in each Python tool function you write. Then you can experiment with “chatting” with live application data from APIs that you use in your own work or research.
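
As a starting point for that exercise, here is a minimal sketch of a tool function that calls a live service. It assumes the third-party requests library is installed (uv add requests) and uses the public wttr.in endpoint, which returns weather data as JSON without requiring an API key; adapt the function name, parameters, and response parsing to whichever API you choose.

 1 import json
 2 import requests  # assumed installed: uv add requests
 3
 4 def get_live_weather(city: str) -> str:
 5     """
 6     Gets the current weather for a city from the public wttr.in service.
 7
 8     Args:
 9         city (str): The name of the city, e.g. 'Tokyo'.
10     """
11     try:
12         # format=j1 asks wttr.in for a JSON response
13         response = requests.get(f"https://wttr.in/{city}?format=j1", timeout=10)
14         response.raise_for_status()
15         current = response.json()["current_condition"][0]
16         return json.dumps({"city": city, "temperature_C": current["temp_C"]})
17     except Exception as e:
18         return json.dumps({"error": str(e)})

Because this function keeps the same shape as the mock get_weather function (type hints plus a docstring), it can be registered with the ToolManager without any other changes to the example.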

A Technical Introduction to Model Context Protocol and Experiments with LM Studio

Here I will guide you, dear reader, through the process of using the Model Context Protocol (MCP). This is a long chapter. We will start with some background material and then work through a few examples that you can easily modify for your own applications.

An Introduction to the Model Context Protocol

The rapid evolution of Large Language Models (LLMs) has shifted the focus of AI development from model creation to model application. The primary challenge in this new era is no longer just generating coherent text, images, and videos, but also enabling LLMs to perceive, reason about, and act upon the world through external data and software tools. This chapter provides an architectural guide to MCP, an open standard designed to solve this integration challenge. We develop a complete strategy for leveraging MCP within a local, privacy-centric environment using LM Studio, culminating in an example Python implementation of custom MCP-compatible services that interact with models running on LM Studio.

The Post-Integration Era: Why MCP is Necessary

The process of connecting LLMs to external systems has been a significant bottleneck for innovation. Each new application, data source, or API required a bespoke, one-off integration. For example, a developer wanting an LLM to access a customer’s Salesforce data, query a local database, and send a notification via Slack would need to write three separate, brittle, and non-interoperable pieces of code. This approach created deep “information silos,” where the LLM’s potential was hamstrung by the immense engineering effort required to grant it context. This ad-hoc integration paradigm was fundamentally unscalable from a software development viewpoint, hindering the development of complex, multi-tool AI agents.

In response to this systemic problem, Anthropic introduced the Model Context Protocol in November 2024. MCP is an open-source, open-standard framework designed to be the universal interface for AI systems. The protocol was almost immediately adopted by other major industry players, including OpenAI, Google DeepMind, and Microsoft, signaling a collective recognition of the need for a unified standard.

The most effective way to understand MCP’s purpose is through the “USB cables for AI” analogy. Before USB, every device had a proprietary connector, requiring a different cable for charging, data transfer, and video output. USB replaced this chaos with a single, standardized port that could handle all these functions. Similarly, MCP replaces the chaos of bespoke AI integrations with a single, standardized protocol. It provides a universal way for an AI model to connect to any compliant data source or tool, whether it’s reading files from a local computer, searching a knowledge base, or executing actions in a project management tool.

The vision of MCP is to enable the creation of sophisticated, agentic workflows that can be maintained inexpensively as individual tool and service interfaces occasionally change internally. By providing a common interface language for tools, MCP allows an AI to compose multiple capabilities in a coordinated fashion. For example, an agent could use one tool to look up a document, a second tool to query a CRM system for related customer data, and a third tool to draft and send a message via a messaging API. This ability to chain together actions across distributed resources is the foundation of advanced, context-aware AI reasoning and automation. The rapid, cross-industry adoption of MCP was not merely the embrace of a new feature, but a strategic acknowledgment that the entire AI ecosystem would benefit more from a shared protocol layer than from maintaining proprietary integration moats. The N-to-M integration problem (connecting N applications to M tools) was a drag on the entire industry. By solving it, MCP unlocked a new frontier of possibility, shifting developer focus from building brittle pipes to orchestrating intelligent workflows.

Architectural Deep Dive: Hosts, Clients, and Servers

The MCP architecture is built upon a clear separation of concerns, defined by three primary actors: Hosts, Clients, and Servers. Understanding the distinct role of each is critical to designing and implementing MCP-compliant systems:

  • MCP Host: The Host is the primary LLM application that the end-user interacts with. It is responsible for managing the overall session, initiating connections, and, most importantly, enforcing user security. The Host discovers the tools available from connected servers and presents them to the LLM. When the LLM decides to use a tool, the Host intercepts the request and presents it to the user for explicit approval. In the context of this discussion, LM Studio serves as the MCP Host. Other examples include the Claude Desktop application or IDEs with MCP plugins.
  • MCP Server: A Server is a program that exposes external capabilities, data and tools to an MCP Host. This is the component that bridges the gap between the abstract world of the LLM and the concrete functionality of an external system. A server could be a lightweight script providing access to the local file system (the focus of our implementation), or it could be a robust enterprise service providing access to a platform like GitHub, Stripe, or Google Drive. Developers can use pre-built servers or create their own to connect proprietary systems to the MCP ecosystem.
  • MCP Client: The Client is a connector component that resides within the Host. This is a subtle but important architectural detail. The Host application spawns a separate Client for each MCP Server it connects to. Each Client is responsible for maintaining a single, stateful, and isolated connection with its corresponding Server. It handles protocol negotiations, routes messages, and manages the security boundaries between different servers. This one-to-one relationship between a Client and a Server ensures that connections are modular and a failure in one server does not impact others.

The conceptual design of MCP draws significant inspiration from the Language Server Protocol (LSP), a standard pioneered by Microsoft for use in development tools like Visual Studio Code, Emacs, and other editors and IDEs. Before LSP, adding support for a new programming language to an IDE required writing a complex, IDE-specific extension. LSP standardized the communication between IDEs (the client) and language-specific servers. A developer could now write a single “language server” for Python, and it would provide features like code completion, linting, and syntax highlighting to any LSP-compliant editor. In the same way, MCP standardizes the communication between AI applications (the Host) and context providers (the Server). A developer can write a single MCP server for their API, and it can provide tools and resources to any MCP-compliant application, be it Claude, LM Studio, or a custom-built agent.

The MCP Specification: Communication and Primitives

The MCP specification defines the rules of interoperability between Hosts and Servers, ensuring robust communication across the ecosystem. It is built on established web standards and defines a clear set of capabilities.

At its core, all communication within MCP is conducted using JSON-RPC 2.0 messages. This is a lightweight, stateless, and text-based remote procedure call protocol. Its simplicity and wide support across programming languages make it an ideal choice for a universal standard. The protocol defines three message types: Requests (which expect a response), Responses (which contain the result or an error), and Notifications (one-way messages that don’t require a response).
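
To make this concrete, here is a minimal sketch of the three message types as they appear on the wire. The method names (tools/list, notifications/initialized) follow MCP conventions, but the exact parameter and result shapes vary with the protocol version, so treat these lines as illustrative rather than normative.

1 // Request (expects a response)
2 {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
3
4 // Response (carries the result for request id 1)
5 {"jsonrpc": "2.0", "id": 1, "result": {"tools": [{"name": "add_numbers"}]}}
6
7 // Notification (no id, no response expected)
8 {"jsonrpc": "2.0", "method": "notifications/initialized"}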

An MCP Server can offer three fundamental types of capabilities, known as primitives, to a Host:

  • Tools: These are executable functions that allow an LLM to perform actions and produce side effects in the external world. A tool could be anything from write_file to send_email or create_calendar_event. Tools are the primary mechanism for building agentic systems that can act on the user’s behalf. This primitive is the main focus of the implementation in this chapter.
  • Resources: These are read-only blocks of contextual data that can be provided to the LLM to inform its reasoning. A resource could be the content of a document, a record from a database, or the result of an API call. Unlike tools, resources do not have side effects; they are purely for information retrieval.
  • Prompts: These are pre-defined, templated messages or workflows that a server can expose to the user. They act as shortcuts for common tasks, allowing a user to easily invoke a complex chain of actions through a simple command.

To manage compatibility as the protocol evolves, MCP uses a simple, date-based versioning scheme in the format YYYY-MM-DD. When a Host and Server first connect, they negotiate a common protocol version to use for the session. This ensures that both parties understand the same set of rules and capabilities, allowing for graceful degradation or connection termination if a compatible version cannot be found.

Security and Trust: The User-in-the-Loop Paradigm

The power of MCP in enabling an AI to access files and execute arbitrary code necessitates a security model that is both robust and transparent. The protocol’s design is founded on the principle of explicit user consent and control.

The MCP specification deliberately places the responsibility for enforcing security on the MCP Host application, not on the protocol itself. The protocol documentation uses formal language to state that Hosts must obtain explicit user consent before invoking any tool and should provide clear user interfaces for reviewing and authorizing all data access and operations. This is a pragmatic design choice. Rather than attempting to build a complex, universal sandboxing mechanism into the protocol, a task of immense difficulty, the specification keeps the protocol itself simple and pushes the responsibility for security to the application layer, which is better equipped to handle user interaction.

This places a significant burden on developers of Host applications like LM Studio. They are required to implement effective safety features, such as the tool call confirmation dialog, which intercepts every action the LLM wants to take and presents it to the user for approval. This “human-in-the-loop” approach makes the user the final arbiter of any action.

Furthermore, the protocol treats tool descriptions themselves as potentially untrusted. A malicious server could misrepresent what a tool does. Therefore, the user must understand the implications of a tool call before authorizing it. This security model is powerful because it is flexible, but its effectiveness is entirely dependent on the vigilance of the end-user and the clarity of the Host’s UI.

The following table compares MCP to previous tool-use paradigms, illustrating its advantages in standardization, discoverability, and composability.

Table 1: Comparison of Tool-Use Paradigms

Paradigm | Standards | Discovery | Security | Developer Overhead | Composability
Manual API Integration | None | Manual (API Docs) | Application-Specific | High (per integration) | Difficult
Proprietary Function Calling | Vendor-Specific | API-Based | Platform-Enforced | Medium (per platform) | Limited (within vendor ecosystem)
Model Context Protocol (MCP) | Open Standard | Protocol-Based (tools/list) | Host/User-Enforced | Low (per tool server) | High (cross-platform)

The Local AI Ecosystem: LM Studio as an MCP Host

To move from the theory of MCP to a practical implementation, a suitable Host environment is required. LM Studio, a popular desktop application for running LLMs locally and the topic of this book, has emerged as a key player in the local AI ecosystem and now functions as a full-featured MCP Host. This section details the relevant capabilities of LM Studio and the specific mechanisms for configuring and using it with MCP servers.

Overview of LM Studio for Local LLM Inference

LM Studio is designed to make local LLM inference accessible to a broad audience. It is free for internal business use and runs on consumer-grade hardware across macOS, Windows, and Linux. By leveraging highly optimized inference backends like llama.cpp for cross-platform support and Apple’s MLX for Apple Silicon, LM Studio allows users to run a wide variety of open-source models from Hugging Face directly on their own machines, ensuring data privacy and offline operation.

For developers, LM Studio provides two critical functionalities that form the foundation of our local agentic system:

  • OpenAI-Compatible API Server: LM Studio includes a built-in local server that mimics the OpenAI REST API. This server, typically running at http://localhost:1234/v1, accepts requests on standard OpenAI endpoints like /v1/chat/completions and /v1/embeddings. This is a powerful feature because it allows developers to use the vast ecosystem of existing OpenAI client libraries (for Python, TypeScript, etc.) with local models, often by simply changing the base_url parameter in their client configuration. Our testing client will use this API to interact with the LLM running in LM Studio.
  • MCP Host Capabilities: Beginning with version 0.3.17, LM Studio officially supports acting as an MCP Host. This important update allows the application to connect to, manage, and utilize both local and remote MCP servers. This capability bridges the gap between raw local inference and true agentic functionality, enabling local models to interact with the outside world through the standardized MCP interface. The rapid implementation of this feature was likely driven by strong community demand for standardized tool-use capabilities in local environments.

This combination of a user-friendly interface, a powerful local inference engine, an OpenAI-compatible API, and full MCP Host support makes LM Studio an ideal platform for developing and experimenting with private, local-first AI agents. It democratizes access to technologies that were previously the exclusive domain of large, cloud-based providers, allowing any developer to build sophisticated agents on their own hardware.

Enabling the Bridge: The mcp.json Manifest

The configuration of all MCP servers within LM Studio is centralized in a single JSON file named mcp.json. This file acts as a manifest, telling the LM Studio Host which servers to connect to and how to run them.

The file is located in the LM Studio application data directory:

1 macOS & Linux: ~/.lmstudio/mcp.json
2 Windows: %USERPROFILE%/.lmstudio/mcp.json

While it can be edited directly, the recommended method is to use the in-app editor, which can be seen in a later figure showing a screenshot of LM Studio. This approach avoids potential file permission issues and ensures the application reloads the configuration correctly.

The structure of mcp.json follows a notation originally established by the Cursor code editor, another MCP-enabled application. The root of the JSON object contains a single key, mcpServers, which holds an object where each key is a unique identifier for a server and the value is an object defining that server’s configuration.

LM Studio supports two types of server configurations:

  • Local Servers: These are servers that LM Studio will launch and manage as local child processes. This is the method used later for running our example Python services. The configuration requires the command and args keys to specify the executable and its arguments. An optional env key can be used to set environment variables for the server process.
  • Remote Servers: These are pre-existing servers running on a network, which LM Studio will connect to as a client. This configuration uses the url key to specify the server’s HTTP(S) endpoint. An optional headers key can be used to provide HTTP headers, which is commonly used for passing authorization tokens. A combined example covering both configuration types appears after this list.
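
Here is a minimal sketch of an mcp.json that combines both configuration types. The local entry assumes a hypothetical server script at /path/to/server.py, and the remote entry points at the Hugging Face MCP endpoint (which also appears in the table below) with a placeholder token; substitute your own server names, paths, and credentials.

 1 {
 2   "mcpServers": {
 3     "my-local-tools": {
 4       "command": "python",
 5       "args": ["/path/to/server.py"],
 6       "env": {"LOG_LEVEL": "info"}
 7     },
 8     "huggingface": {
 9       "url": "https://huggingface.co/mcp",
10       "headers": {"Authorization": "Bearer <your token>"}
11     }
12   }
13 }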

The reliance on a plain JSON file for configuration is characteristic of a rapidly evolving open-source tool. While a full graphical user interface for managing servers would be more user-friendly, a configuration file is significantly faster to implement and provides full power to technical users. This often results in a scenario where community-generated resources, such as blog posts and forum guides, become the de facto documentation for new features, supplementing the official guides. The following table consolidates this community knowledge into a clear reference:

Table 2: LM Studio mcp.json Configuration Parameters

Key | Type | Scope | Description | Example
command | String | Local Server | The executable program used to start the server process. | "python"
args | Array of Strings | Local Server | A list of command-line arguments to pass to the executable. | ["server.py", "--port", "8000"]
env | Object | Local Server | Key-value pairs of environment variables to set for the server process. | {"API_KEY": "secret"}
url | String | Remote Server | The full HTTP or HTTPS endpoint of a running remote MCP server. | "https://huggingface.co/mcp"
headers | Object | Remote Server | Key-value pairs of HTTP headers to include in every request to the server. | {"Authorization": "Bearer <token>"}

Practical Host-Side Security: Intercepting and Approving Tool Calls

In alignment with the MCP security philosophy, LM Studio implements a critical safety feature: the tool call confirmation dialog. This mechanism acts as the practical enforcement of the “user consent” principle.

When an LLM running within LM Studio determines that it needs to use an external tool, it generates a structured request. Before this request is sent to the corresponding MCP server, the LM Studio Host intercepts it and pauses the execution. It then presents a modal dialog to the user, clearly stating which tool the model wants to invoke and displaying the exact arguments it intends to use.

This dialog empowers the user with complete agency over the process. They can:

  • Inspect: Carefully review the tool name and its parameters to ensure the action is intended and safe.
  • Edit: Modify the arguments before execution if necessary.
  • Approve: Allow the tool call to proceed for this one instance.
  • Deny: Block the tool call entirely.
  • Whitelist: Choose to “always allow” a specific, trusted tool. This adds the tool to a whitelist (managed in App Settings > Tools & Integrations), and LM Studio will no longer prompt for confirmation for that particular tool, streamlining workflows for trusted operations.

This user-in-the-loop system is the cornerstone of MCP’s security in a local environment. It acknowledges that local models can be made to call powerful, and potentially dangerous, tools. By making the user the final checkpoint, it mitigates the risk of unintended or malicious actions. Additionally, LM Studio enhances stability by running each configured MCP server in a separate, isolated process, ensuring that a crash or error in one tool server does not affect the main application or other connected servers.

Strategic Integration: A Blueprint for Local MCP-Powered Agents

With a solid understanding of MCP principles and the LM Studio Host environment, the next step is to formulate a clear strategy for building a functional agent. This section outlines a concrete project plan, defining the agent’s capabilities, the end-to-end workflow, model selection criteria, and an approach to error handling.

Defining the Goal: Implementing Three Example Agent Resources

The project goal is to design and build a Local File System Agent. This is an ideal first project as it is both useful and intuitive. In the next Python example we limit file system operations to fetching the names of files in a local directory but you are free to extend this example to match your specific use cases. It directly demonstrates the value of MCP for local AI by enabling an LLM to interact with the user’s own files, a common and highly useful task.

The agent will be equipped with three fundamental capabilities, which will be exposed as MCP tools:

  • List Directory Contents: The ability to receive a directory path and return a list of the files and subdirectories within it.
  • Perform exact arithmetic: This circumvents the problems LLMs sometimes have performing precise math operations.
  • Get the local time of day: This shows how to make system calls in Python.

These three tools provide solid examples for implementing external resources for a wide range of tasks, from summarizing documents to generating code and saving it locally.

The End-to-End Workflow: From Prompt to Action

The complete operational flow of our agent involves a multi-step dance between the user, the LLM, the MCP Host (LM Studio), and our custom MCP Server. The following sequence illustrates this process for a typical user request:

Step 1: User Prompt: The user initiates the workflow by typing a natural language command into the LM Studio chat interface. For example: “Read the file named project_brief.md located in my Downloads folder and give me a summary.”

Step 2: LLM Inference and Tool Selection: The local LLM, loaded in LM Studio, processes this prompt. Because it is a model trained for tool use, it recognizes the user’s intent to read a file. It consults the list of available tools provided in its context by the Host and determines that the read_file tool is the most appropriate. It then formulates a structured tool call, specifying the tool’s name and the argument it has extracted from the prompt (e.g., {"path": "~/Downloads/project_brief.md"}).

Step 3: Host Interception and User Confirmation: LM Studio, acting as the vigilant Host, intercepts this tool call before it is executed. It presents the user with the confirmation dialog, displaying a message like: “The model wants to run read_file with arguments: {"path": "~/Downloads/project_brief.md"}. Allow?”.

Step 4: User Approval: The user verifies that the request is legitimate and safe (the model is asking to read the correct file) and clicks the “Allow” button.

Step 5: Client-Server Communication: Upon approval, LM Studio’s internal MCP Client for our server sends a formal JSON-RPC request over its stdio connection to our running Python MCP Server process.

Step 6: Server-Side Execution: Our Python server receives the tools/call request. It parses the method name (read_file) and the parameters, and invokes the corresponding Python function. The function executes the file system operation, opening and reading the contents of ~/Downloads/project_brief.md.

Step 7: Response and Contextualization: The server packages the file’s content into a successful JSON-RPC response and sends it back to the LM Studio Host. The Host receives the response and injects the file content back into the LLM’s context for its next turn.

Step 8: Final LLM Response: The LLM now has the full text of the document in its context. It proceeds to perform the final part of the user’s request, summarization, and generates the final, helpful response in the chat window: “The project brief outlines a plan to develop a new mobile application…”

This entire loop, from prompt to action to final answer, happens seamlessly from the user’s perspective, but relies on the clear, standardized communication defined by MCP.

Model Selection Criteria for Effective Tool Use

A critical component of this strategy is selecting an appropriate LLM. Not all models are created equal when it comes to tool use. The ability to correctly interpret a user’s request, select the right tool from a list, and generate a syntactically correct, structured JSON call is a specialized skill that must be explicitly trained into a model.

Using a generic base model or an older chat model not fine-tuned for this capability will almost certainly fail. Such models lack the ability to follow the complex instructions required for tool use and will likely respond with plain text instead of a tool call.

For this project, it is essential to select a model known for its strong instruction-following and function-calling capabilities. Excellent candidates available through LM Studio include modern, open-source models such as:

  • Qwen3 variants
  • Gemma 3 variants
  • Llama 3 variants
  • Mixtral variants

These models have been specifically trained on datasets that include examples of tool use, making them proficient at recognizing when a tool is needed and how to call it correctly.

The success of any MCP-powered agent is fundamentally dependent on the quality of the “semantic contract” established between the LLM and the tools it can use. This contract is not written in code but in natural language and structured data: the tool’s name, its parameters, and, most importantly, its description. The LLM has no innate understanding of a Python function; it only sees the text-based manifest provided by the MCP server. Its ability to make an intelligent choice hinges on how clearly and accurately this manifest describes the tool’s purpose and usage. A vague description like:

1 def tool1(arg1):

is useless. A clear, descriptive one like:

1 def read_file_content(path: str) -> str:
2     "Reads the entire text content of a file at the given local path."
3     ...

provides a strong signal that the model can understand and act upon. In agentic development, therefore, writing high-quality docstrings and schemas transitions from being a documentation “best practice” to a core functional requirement for the system to work at all.

Error Handling and Recovery Strategy

A robust agent must be able to handle failure gracefully. The MCP specification provides a two-tiered error handling mechanism, and our strategy must leverage it correctly:

  • Protocol-Level Errors: These errors relate to the MCP communication itself. Examples include the LLM trying to call a tool that doesn’t exist (tool_not_found) or providing arguments in the wrong format (invalid_parameters). These failures indicate a problem with the system’s mechanics. Our server should respond with a standard JSON-RPC error object. These errors are typically logged by the Host and are not intended for the LLM to reason about.
  • Tool Execution Errors: These errors occur when the tool itself runs but fails for a logical reason. For example, the read_file tool might be called with a path to a file that does not exist, or the write_file tool might be denied permission by the operating system. These are not protocol failures; they are runtime failures. In these cases, the server should return a successful JSON-RPC response. However, the content of the response should be an error object that includes a descriptive message for the LLM (e.g., {"isError": true, "message": "File not found at the specified path."}).

This distinction is crucial. By reporting execution errors back to the LLM as content, we empower it to be more intelligent. The LLM can see the error message, understand what went wrong, and decide on a next step. It might inform the user (“I couldn’t find that file. Could you please verify the path?”), or it might even try to recover by using a different tool (e.g., using list_directory to see the available files). This approach makes the agent more resilient and the user experience more interactive.
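
As a concrete illustration of the second tier, here is a minimal sketch of a FastMCP tool that reports an execution failure back to the model as ordinary content rather than as a protocol error. The read_file_content name and the shape of the returned error dictionary are illustrative choices for this sketch, not part of the MCP specification.

 1 import os
 2 from fastmcp import FastMCP
 3
 4 app = FastMCP("error-demo")
 5
 6 @app.tool()
 7 def read_file_content(path: str) -> dict:
 8     """Reads the entire text content of a file at the given local path."""
 9     expanded = os.path.expanduser(path)
10     if not os.path.exists(expanded):
11         # Execution error: returned as content so the LLM can reason about it
12         return {"isError": True, "message": f"File not found at {path}"}
13     with open(expanded, "r") as f:
14         return {"isError": False, "content": f.read()}

With this pattern, a missing file produces a normal JSON-RPC response whose payload describes the problem, and the model can decide whether to ask the user for a corrected path or to try another tool first.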

Architectural Design for a Python-Based MCP Server

This section provides a detailed architectural design for the custom MCP server. It outlines the project setup, code structure, and the key design patterns that leverage the official MCP Python SDK to create a robust and maintainable server with minimal boilerplate.

Setting up the Development Environment

A clean and consistent development environment is the first step toward a successful implementation. The recommended tool for managing Python packages for this project is uv, a high-performance package manager and resolver written in Rust. The official MCP Python SDK documentation recommends its use for its speed and modern approach to dependency management.

The environment setup process is as follows:

Install uv: If not already installed, uv can be installed with a single command:

  • On macOS or Linux: curl -LsSf https://astral.sh/uv/install.sh | sh
  • On Windows (PowerShell): powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Initialize the Project: Create a new directory for the project and initialize it with uv.

1 mkdir math-and-time-and-files
2 cd math-and-time-and-files
3 uv init
4 uv venv
5 source .venv/bin/activate  # On macOS/Linux
6 # .venv\Scripts\activate    # On Windows
7 uv add fastmcp "mcp[cli]" openai

This setup provides a self-contained environment with all the necessary tools, ready for development. The final command installs the libraries you are likely to need for these examples.

Designing the Server: A Modular and Declarative Approach

The design of the server will prioritize clarity, simplicity, and adherence to modern Python best practices. For a project of this scope, a single Python file, server.py, is sufficient and keeps the entire implementation in one place for easy review.

The central component of our architecture is the fastmcp.FastMCP class from the Python fastmcp package. This high-level class is an abstraction layer that handles the vast majority of the protocol’s complexity automatically. It manages the underlying JSON-RPC message parsing, the connection lifecycle (initialization, shutdown), and the dynamic registration of tools. By using FastMCP, the design can focus on implementing the business logic of the tools rather than the intricacies of the protocol.

The primary design pattern for defining tools will be declarative programming using decorators. The SDK provides the @tool() decorator, which can be applied to a standard Python function to expose it as an MCP tool. This approach is exceptionally clean and “Pythonic.” It allows the tool’s implementation and its MCP definition to coexist in a single, readable block of code, as opposed to requiring separate registration logic or configuration files.

The design of the fastmcp Python SDK, and the FastMCP class in particular, is a prime example of excellent developer experience. The raw MCP specification would require a developer to manually construct complex JSON-RPC messages and write JSON Schemas to define their tools. This process is both tedious and highly prone to error. The SDK’s designers abstracted this friction away by creating the FastMCP wrapper and the @tool decorator. This design cleverly leverages existing Python language features (functions, docstrings, and type hints) that developers already use as part of writing high-quality, maintainable code. The SDK performs the complex translation from this familiar Pythonic representation to the formal protocol-level representation automatically. This significantly lowers the cognitive overhead and barrier to entry, making MCP server development accessible to a much wider audience of Python developers.

Defining the Tool Contract: Signatures, Type Hints, and Docstrings

The FastMCP class works through introspection. When it encounters a function decorated with @app.tool() (using the app instance defined in the server below), it automatically inspects the function’s signature to generate the formal MCP tool manifest that will be sent to the Host. This automated process relies on three key elements of the Python function:

  • Function Name: The name of the Python function (e.g., def list_directory(…)) is used directly as the unique name of the tool in the MCP manifest.
  • Docstring: The function’s docstring (the string literal within """...""") is used as the description of the tool. As discussed previously, this is the most critical element for the LLM’s ability to understand what the tool does and when to use it.
  • Type Hints: The Python type hints for the function’s parameters and return value (e.g., path: str -> list[str]) are used to automatically generate the inputSchema and output schema for the tool. This provides a structured contract that the LLM must adhere to when generating a tool call, ensuring that the server receives data in the expected format.

The example file LM_Studio_BOOK/src/mcp_examples/server.py contains the definitions of three tools. The following script, server.py, implements the MCP server with the three tools described above: exact arithmetic, the local time of day, and directory listing. The code is commented to explain not only the “what” but also the “why” behind the implementation choices, particularly concerning error handling and security.

 1 from fastmcp import FastMCP
 2 import os
 3 
 4 # Initialize the MCP server
 5 app = FastMCP("my-mcp-server")
 6 
 7 @app.tool()
 8 def add_numbers(a: int, b: int) -> int:
 9     """Adds two numbers together."""
10     return a + b
11 
12 @app.tool()
13 def get_current_time_for_mark() -> str:
14     """Returns the current time."""
15     import datetime
16     return datetime.datetime.now().strftime("%H:%M:%S")
17 
18 @app.tool()
19 def list_directory(path: str = ".") -> list[str]:
20     """
21     Lists all files and subdirectories within a given local directory path.
22     The path should be an absolute path or relative to the user's home directory.
23     """
24     try:
25         # Expand a leading '~' to the user's home directory before listing
26         expanded_path = os.path.expanduser(path)
27         return os.listdir(expanded_path)
28     except FileNotFoundError:
29         return []  # Return an empty list if the path is not found
30     
31 # To run the server (e.g., via stdio for local development)
32 if __name__ == "__main__":
33     app.run()

Later we will see how to activate these three tools in LM Studio.

Data Flow and State Management

In general, the tools will be designed to be stateless. Each tool call is an atomic, self-contained operation. The get_current_time_for_mark function, for example, simply calls a Python standard library function (datetime) to get the local time. It does not rely on any previous state stored on the server. This is a robust design principle for MCP servers, as it makes them simple to reason about, test, and scale. The state of the conversation is managed by the LLM and the Host within the chat history, not by the tool server.

The data flow for a tool call is a straightforward JSON-RPC exchange. The Host sends a request object containing the method (“tools/call”) and params (the tool name and arguments). The server processes this request and sends back a response object containing either the result of the successful operation or an error object if a protocol-level failure occurred.
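
For example, a call to the add_numbers tool defined earlier might look roughly like the following exchange. The field names follow the tools/call convention, but the precise result structure can differ between protocol versions, so this is a sketch rather than a specification.

1 // Host -> Server
2 {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
3  "params": {"name": "add_numbers", "arguments": {"a": 2, "b": 3}}}
4
5 // Server -> Host
6 {"jsonrpc": "2.0", "id": 7,
7  "result": {"content": [{"type": "text", "text": "5"}]}}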

The following table provides a quick reference to the key SDK components used in this design.

Component | Type | Purpose | Example Usage
FastMCP | Class | High-level server implementation that simplifies tool registration and lifecycle management. | app = FastMCP()
@app.tool() | Decorator | Registers a Python function as an MCP tool, automatically generating its manifest from its signature and docstring. | @app.tool() def my_function(): ...
app.run() | Method | Starts the MCP server, listening for connections via the standard input/output (stdio) transport. | if __name__ == "__main__": app.run()
str, int, list, dict, bool, None | Types | Used with type hints to define the inputSchema for tool arguments. All types must be JSON-serializable. | def get_user(id: int) -> dict:

LM Studio Integration and Execution Guide

Follow these steps to run the complete system:

  • Start LM Studio: Launch the LM Studio application. In the “Discover” tab (magnifying glass icon), download a tool-capable model like lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF. Once downloaded, go to the “Chat” tab (speech bubble icon) and select the model from the dropdown at the top to load it into memory.
  • Start the API Server: Navigate to the “Local Server” tab (two arrows icon) in LM Studio and click the “Start Server” button. This will activate the OpenAI-compatible API at http://localhost:1234/v1.
  • Configure mcp.json:
Figure 3. Configure mcp.json for LM Studio

Paste the following JSON configuration into the editor. Replace /Users/markw/GITHUB/LM_Studio_BOOK/src/mcp_examples/ with the absolute path to the math-and-time-and-files directory you created.

 1 {
 2   "mcpServers": {
 3     "math-and-time-and-files": {
 4       "command": "/Users/markw/GITHUB/LM_Studio_BOOK/src/mcp_examples/.venv/bin/python",
 5       "args": [
 6         "/Users/markw/GITHUB/LM_Studio_BOOK/src/mcp_examples/server.py"
 7       ]
 8     }
 9   }
10 }

Note that we will be running the tools in the file server.py that we saw earlier.

Save the file (Ctrl+S or Cmd+S). LM Studio will automatically detect the change and launch your server.py script in a background process. You should see the log messages from your server appear in the “Program” tab’s log viewer.

Run the New Tools

Open a new chat tab in LM Studio. Enter a prompt like “what time is it?”

Please remember that you must enable the tools you want to use in each new chat session, using the “electrical plug” icon seen in the last figure. By default, tools are configured to require human approval before each use; you will often want to switch a tool to automatic approval once its Python code is complete and trusted.

You will, by default, see a popup dialog asking you to approve the tool call. Review the request (“The model wants to run get_current_time_for_mark …”) and click “Allow”.

Observe the result: after you approve the tool call, the model running in LM Studio uses the tool’s output to generate a human-readable (not JSON) response.

Advanced Considerations and Future Trajectories

Having successfully implemented tools, it is valuable to consider the broader context and future potential of this technology. MCP is more than just a protocol for single-tool use; it is an architectural primitive for building complex, multi-faceted AI systems.

Composing Multiple MCP Servers for Complex Workflows

The true power of the Model Context Protocol is realized through composability. An MCP Host like LM Studio is not limited to a single server; it can connect to and manage multiple MCP servers simultaneously. This enables the creation of agents that can orchestrate actions across disparate systems and data sources.

Consider a more advanced workflow that combines, for example, a hypothetical custom LocalFileSystemServer with a pre-built, community-provided MCP server for GitHub. An agent could execute the following sequence:

  • Read a local bug report: The agent uses the read_file tool from the LocalFileSystemServer to ingest the details of a bug from a local text file.
  • Search for related issues: The agent then uses a search_issues tool provided by the GitHub MCP server, using keywords from the bug report to find similar, existing issues in a specific repository.
  • Analyze and summarize: The agent processes the search results and the original bug report.
  • Write a summary file: Finally, the agent uses the write_file tool from this hypothetical server to save a new file containing a summary of its findings and a list of potentially related GitHub issues.

This entire workflow is orchestrated by the LLM, which seamlessly switches between tools provided by two different, independent servers. This level of interoperability, made simple by the standardized protocol, is what enables the development of truly powerful and versatile AI assistants.

The Emerging MCP Ecosystem: Registries and Pre-built Servers

The standardization provided by MCP is fostering a vibrant ecosystem of tools and integrations. A growing number of pre-built MCP servers are available for popular enterprise and developer tools, including Google Drive, Slack, GitHub, Postgres, and Stripe. This allows developers to quickly add powerful capabilities to their agents without having to write the integration code themselves.

To facilitate the discovery and management of these servers, the community is developing MCP Registries. A registry acts as a centralized or distributed repository, similar to a package manager, where developers can publish, find, and share MCP servers. The official MCP organization on GitHub hosts a community-driven registry service, and Microsoft has announced plans to contribute a registry for discovering servers within its ecosystem.

This trend points toward the creation of a true “app store” or “plugin marketplace” for AI tools. A developer building an agent will be able to browse a registry, select the tools they need (e.g., a calendar tool, a weather tool, a stock trading tool), and easily add them to their Host application. This will dramatically accelerate the development of feature-rich agents and create a new “Tool Economy,” where companies and individual developers can create and even monetize specialized MCP servers for niche applications.

The Trajectory of Local-First AI Agents

The convergence of powerful open-source LLMs, accessible local inference engines like LM Studio, and a standardized tool protocol like MCP marks an inflection point for AI. It signals the rise of the local-first AI agent: a new class of application that is private, personalized, and deeply integrated into a user’s personal computing environment.

The future of this technology extends beyond simple chat interfaces. We can anticipate the emergence of “ambient assistants” embedded directly into operating systems, IDEs, and other desktop applications. These assistants will use MCP as the common language to securely access and reason about a user’s personal context like their local files, emails, calendar appointments, and contacts without sending sensitive data to the cloud. They will be able to perform complex, multi-step tasks on the user’s behalf, seamlessly blending the reasoning power of LLMs with the practical utility of desktop and web applications. MCP provides the critical, standardized plumbing that makes this future possible.

The importance of local inference tools like LM Studio and Ollama lies in enabling developers to build “privacy first” systems that leak no personal or proprietary data, or at least far less of it, to third-party providers.

Concluding Analysis and Recommendations

Here we have detailed the architecture, strategy, and implementation of tools written in Python, and shown how to build a local AI agent using the Model Context Protocol. The key conclusions are:

  • MCP successfully standardizes tool use, solving a fundamental integration problem and creating a foundation for a new, interoperable ecosystem of AI capabilities. Its design, inspired by the Language Server Protocol, effectively decouples tool providers from AI application providers.
  • LM Studio democratizes agentic AI development. By acting as a robust MCP Host, it empowers any developer to build and experiment with sophisticated, tool-using agents on their own hardware, using private data and open-source models.
  • The MCP and fastMCP Python SDKs both offer an exemplary developer experience. The FastMCP class and its declarative, decorator-based approach abstract away the protocol’s complexity, allowing developers to create powerful custom tools with minimal boilerplate and in a familiar, “Pythonic” style.

For developers embarking on building with MCP, the following recommendations are advised:

  • Start with Read-Only Tools: Begin by implementing tools that only read data (like list_directory and read_file). This allows for safe experimentation with the protocol and workflow without the risk of unintended side effects.
  • Prioritize Descriptive Tool Manifests: The most critical element for a tool’s success is its description. Write clear, unambiguous docstrings that accurately describe what the tool does, its parameters, and what it returns. This “semantic contract” is what the LLM relies on to make intelligent decisions.
  • Exercise Extreme Caution with Write-Enabled Tools: Tools that modify the file system or have other side effects are incredibly powerful but also dangerous. The security of the system relies on the user’s vigilance. Always assume a tool could be called with malicious intent and rely on the Host’s confirmation dialog as the final, essential safeguard.

The Model Context Protocol is more than just a technical specification; it is a key piece of infrastructure for the next generation of computing. By providing a secure and standardized bridge between the reasoning capabilities of Large Language Models and the vast world of external data and tools, MCP is paving the way for a future of more helpful, capable, and personalized AI assistants.

MCP Wrap Up

Dear reader, while we have looked at the architecture and rationale behind MCP, we have barely skimmed the surface of possible applications. I am a computer scientist with a preference for manually designing and writing code. My personal approach to using MCP with tools to build agentic systems is to put most of the complexity in the Python tool implementations and to rely on LLMs primarily as a human-friendly interface between user interactions and backend systems and data stores. In fairness, I hold a “minority opinion” compared to most people working in our industry.

I enjoy writing my own MCP servers and perhaps, dear reader, you do also. That said, a web search for “open source MCP server examples” shows a rich and developing ecosystem of open source projects that can easily be used with LM Studio and other MCP-compliant platforms.

Here are a few maintained lists of MCP servers: