Using Ollama to Run Local LLMs
Ollama is a program and framework written in Go that lets you download models, run them from the command line, and call them through a REST-style interface. Download the Ollama executable for your operating system from https://ollama.com.
As with our use of a third-party library for accessing the Anthropic Claude models, here we will not write a wrapper library ourselves. The example code for this chapter is in the test code for the Swift project in the GitHub repository https://github.com/mark-watson/Ollama_swift_examples.
We use the library in the GitHub repository https://github.com/mattt/ollama-swift.
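If you want to reproduce the project setup yourself, the dependency can be declared in a SwiftPM manifest along the following lines. This is only a sketch: the version requirement and target name are illustrative, and the product name Ollama is assumed to match the module imported in the test code below.

// Package.swift (excerpt) -- version and target name are illustrative
dependencies: [
    .package(url: "https://github.com/mattt/ollama-swift.git", from: "1.0.0")
],
targets: [
    .testTarget(
        name: "Ollama_swift_examplesTests",
        dependencies: [.product(name: "Ollama", package: "ollama-swift")]
    )
]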
Running the Ollama Service
Assuming you have Ollama installed, download the following model, which requires about two gigabytes of disk space:
ollama pull llama3.2:latest
Once downloaded, the model is cached on your laptop for future use.
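Before turning to the Swift wrapper library, you can check that the local Ollama service is responding by calling its REST API directly. The following minimal sketch uses URLSession against the default http://localhost:11434/api/generate endpoint documented by Ollama; the helper name generateOnce is just for illustration.

import Foundation

// Sketch: call the local Ollama REST API directly, without a wrapper library.
// Assumes the default /api/generate endpoint and the llama3.2:latest model
// pulled above.
struct GenerateRequest: Codable {
    let model: String
    let prompt: String
    let stream: Bool
}

struct GenerateResponse: Codable {
    let response: String
}

func generateOnce(prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        GenerateRequest(model: "llama3.2:latest", prompt: prompt, stream: false))
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(GenerateResponse.self, from: data).response
}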
Here is the test/example code we will run:
import XCTest
import Ollama

final class Ollama_swift_examplesTests: XCTestCase {
    let text1 = "If Mary is 42, Bill is 27, and Sam is 51, what are their pairwise age differences."
    let client = Ollama.Client.default // http://localhost:11434 endpoint

    func testExample() async throws {
        let response = try await client.chat(
            model: "llama3.2:latest",
            messages: [
                .system("You are a helpful assistant who completes text and also answers questions. You are always concise."),
                .user(text1),
                .user("what if Sam is 52?")
            ])
        print(response.message.content)
    }
}
The output looks like:
Pairwise age differences:
- Mary - Bill: |42 - 27| = 15
- Mary - Sam: |42 - 51| = 9
- Bill - Sam: |27 - 51| = 24
If Sam is 52:
- Mary - Bill: |42 - 27| = 15
- Mary - Sam: |42 - 52| = 10
- Bill - Sam: |27 - 52| = 25
The ollama-swift library also supports plain text generation (a sketch of that call appears at the end of this section). You can also do single-shot text generation with the chat code from the previous example by passing only one user message, for example:
final class Ollama_swift_examplesTests: XCTestCase {
    let text1 = "What is the capital of Germany?"
    let client = Ollama.Client.default

    func testExample() async throws {
        let response = try await client.chat(
            model: "llama3.2:latest",
            messages: [
                .system("You are a helpful assistant who completes text and also answers questions. You are always concise."),
                .user(text1)
            ])
        print(response.message.content)
    }
}
The output looks like:
The capital of Germany is Berlin.
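For plain text generation without chat messages, the library exposes a generate call. The following sketch follows the usage shown in the ollama-swift documentation; the generate(model:prompt:) method name and the .response field are assumptions taken from that library's README, so check the repository if the API has changed.

func testGenerate() async throws {
    // Plain text completion (no chat messages); assumes the generate(model:prompt:)
    // method and the .response field exposed by the ollama-swift library.
    let response = try await client.generate(
        model: "llama3.2:latest",
        prompt: "What is the capital of Germany?")
    print(response.response)
}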
Ollama Wrap Up
This is a short chapter but an important one. I do over half my work with LLMs running locally on my laptop using Ollama, with the rest of my work using OpenAI, Anthropic, and Groq commercial APIs.