Ollama
Ollama is a powerful and user-friendly tool designed to simplify the process of running large language models (LLMs) locally on personal hardware. In a landscape often dominated by cloud-based APIs, Ollama democratizes access to advanced AI by providing a simple command-line interface that bundles model weights, configurations, and a tailored execution environment into a single, easy-to-install package. It allows developers, researchers, and enthusiasts to download and interact with a wide range of popular open-source models, such as Llama 3, Mistral, and Phi-3, with just a single command. Beyond its interactive chat functionality, Ollama also exposes a local REST API, enabling the seamless integration of these locally-run models into custom applications without the latency, cost, or privacy concerns associated with remote services. This focus on accessibility and local deployment makes it an indispensable tool for offline development, rapid prototyping, and leveraging the power of modern LLMs while maintaining full control over data and infrastructure.
Example Code
This next program in file gerbil_scheme_book/source_code/ollama/ollama.ss provides a practical demonstration of network programming and data handling in Gerbil Scheme by creating a simple client for the Ollama API. Ollama is a fantastic tool that allows you to run powerful large language models, like Llama 3, Mistral, and Gemma, directly on your own machine. Our ollama function will encapsulate the entire process of communicating with a locally running Ollama instance. It will take a text prompt as input, construct the necessary JSON payload specifying the model and prompt, send it to the Ollama server’s /api/generate endpoint via an HTTP POST request, and then carefully parse the server’s JSON response. The goal is to extract and return only the generated text, while also including basic error handling to gracefully manage any non-successful API responses, making for a robust and reusable utility.
(import :std/net/request :std/text/json)
(export ollama)

(def (ollama prompt
             model: (model "gemma3:latest")) ;; also try "gpt-oss:20b" or "qwen3:0.6b"
  (let* ((endpoint "http://localhost:11434/api/generate")
         (headers '(("Content-Type" . "application/json")))
         (body-data
          (list->hash-table
           `(("model" . ,model) ("prompt" . ,prompt) ("stream" . #f))))
         (body-string (json-object->string body-data)))
    (let ((response (http-post endpoint headers: headers data: body-string)))
      (if (= (request-status response) 200)
          (let ((response-json (request-json response)))
            ;; (displayln (hash-keys response-json))
            (hash-ref response-json 'response))
          (error "Ollama API request failed"
                 status: (request-status response)
                 body: (request-text response))))))

;; (ollama "why is the sky blue? Be very concise.")
The ollama function begins by using a let* block to define the necessary components for the API request: the server endpoint, the required HTTP headers, and the request body-data. The body is first constructed as a Gerbil hash-table, which is the natural way to represent a JSON object, and then serialized into a JSON string using json-object->string. Note that the “stream” parameter is explicitly set to #f to ensure we receive the complete response at once rather than as a series of events. The core of the function is the http-post call, which performs the actual network request.
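To make the wire format explicit, here is the same payload construction sketched in Python (a hypothetical illustration for readers comparing against the Gerbil code, not part of the program):

```python
import json

# Sketch of the JSON body the Gerbil code builds with list->hash-table
# and json-object->string.  "stream": False asks Ollama for one complete
# JSON reply instead of a stream of partial events.
payload = {
    "model": "gemma3:latest",
    "prompt": "why is the sky blue? Be very concise.",
    "stream": False,
}
body = json.dumps(payload)
print(body)
```

Modulo key ordering, this is the same JSON string that `json-object->string` produces from the Gerbil hash table.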
After the request is made, the code immediately checks the status of the response. A status code of 200 indicates success, prompting the code to parse the JSON body using request-json and extract the generated text from the 'response field of the resulting hash-table. If the request fails for any reason, a descriptive error is raised, including the HTTP status and response body, which is crucial for debugging. The function's design, with its optional model: keyword argument, makes it trivial to switch between different models you have downloaded through Ollama, providing a flexible interface for interacting with local large language models.
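The whole request/response round trip can be exercised without a running Ollama instance by standing in a stub HTTP server. The following self-contained Python sketch mirrors the Gerbil client's logic (the StubOllama class and generate function are hypothetical names for this illustration; the real Ollama server listens on port 11434):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny stand-in for the Ollama server, echoing the protocol shape the
# client expects: a JSON object with a "response" key on success.
class StubOllama(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        reply = json.dumps({"model": request["model"],
                            "response": "stub reply",
                            "done": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), StubOllama)  # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def generate(prompt, model="gemma3:latest"):
    # Mirrors the Gerbil ollama function: POST a JSON body, check the
    # status, return the "response" field on 200, raise otherwise.
    body = json.dumps({"model": model,
                       "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/api/generate", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 200:
            raise RuntimeError(f"Ollama API request failed: {resp.status}")
        return json.loads(resp.read())["response"]

result = generate("why is the sky blue?")
print(result)  # stub reply
server.shutdown()
```

Pointing the same generate function at http://localhost:11434 (with Ollama running) would return real model output instead of the canned reply.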
Install Ollama and Pull a Model to Experiment With
Linux Installation
Open your terminal and run the following command to download and execute the installation script:
curl -fsSL https://ollama.com/install.sh | sh
macOS Installation
- Download the Ollama application from the official website: [https://ollama.com/download](https://ollama.com/download).
- Unzip the downloaded file.
- Move the Ollama.app file to your /Applications folder.
- Run the application. An Ollama icon will appear in the menu bar.
This will also install the ollama command line program.
Pulling the Model
After installing Ollama on either Linux or macOS, open your terminal and run the following command to download the gemma3:latest model:
ollama pull gemma3:latest
After this is complete, you can run the local API service using:
$ ollama serve
time=2025-08-26T16:05:50.161-07:00 level=INFO source=routes.go:1318 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/markw/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 ...
Example Output
You need Ollama installed on your system, and you should first pull the model you want to experiment with.
$ gxi -L ollama.ss -
> (ollama "why is the sky blue? Be very concise.")
"The sky is blue due to a phenomenon called **Rayleigh scattering**. Shorter wavelengths of light (like blue) are scattered more by the Earth's atmosphere, making the sky appear blue to our eyes."

> (ollama "write a bash script to rename all files with extension **.JPG** to **.jpg**. Just output the bash script and nothing else.")
"```bash\n#!/bin/bash\n\nfind . -name \"*.JPG\" -print0 | while IFS= read -r -d $'\\0' file; do\n  new_name=$(echo \"$file\" | sed 's/\\.JPG/.jpg/')\n  mv \"$file\" \"$new_name\"\ndone\n```\n"

> (displayln (ollama "write a bash script to rename all files with extension **.JPG** to **.jpg**. Just output the bash script and nothing else."))
```bash
#!/bin/bash

find . -name "*.JPG" -print0 | while IFS= read -r -d $'\0' file; do
  new_name=$(echo "$file" | sed 's/\.JPG/\.jpg/')
  mv "$file" "$new_name"
done
```
>
A few comments: In the second example I added “Just output the bash script and nothing else.” to the end of the prompt. Without this, the model generates around 100 lines of design notes, instructions on how to make the bash script executable, and so on. I didn’t want that, just the bash script.
In the third example, I used the same prompt but wrapped the call in displayln to print the result in a more readable format.