Using Ollama to Run Local LLMs

Ollama is a program and framework written in Go that lets you download models, run them from the command line, and call them through a REST-style interface. Download the Ollama executable for your operating system from https://ollama.com.

As with our use of a third-party library for accessing the Anthropic Claude models, we will not write our own wrapper library here. The example code for this chapter is in the directory source-code/Ollama_swift_examples.

We use the library in the GitHub repository https://github.com/mattt/ollama-swift.

Running the Ollama Service

Assuming you have Ollama installed, download the following model:

ollama pull qwen3:1.7b

Once downloaded, the model is cached on your laptop for future use.
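
If you installed the Ollama application, the local service normally starts automatically and listens on http://localhost:11434 by default; if you installed only the command line tool, start it with ollama serve. Assuming the default port, you can confirm that the REST interface is reachable by listing your downloaded models:

curl http://localhost:11434/api/tags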

The OllamaService Actor Library

The library wraps the raw Ollama.Client in a Swift actor to provide safe concurrent access. It supports basic chat, streaming chat, and optional tool calling:

import Ollama
import Foundation
import JavaScriptCore

/// A service to interact with Ollama models using modern Swift concurrency.
public actor OllamaService {
    private let client: Ollama.Client
    private let model: Model.ID

    @MainActor
    public init(model: Model.ID = "qwen3:1.7b",
                client: Ollama.Client? = nil) {
        self.model = model
        self.client = client ?? .default
    }

    /// Performs a chat request with optional tools.
    public func chat(
        messages: [Ollama.Chat.Message],
        tools: [any Ollama.ToolProtocol] = []
    ) async throws -> Ollama.Client.ChatResponse {
        return try await client.chat(
            model: model, messages: messages, tools: tools
        )
    }

    /// Performs a streaming chat request with optional tools.
    public func chatStream(
        messages: [Ollama.Chat.Message],
        tools: [any Ollama.ToolProtocol] = []
    ) -> AsyncThrowingStream<Ollama.Client.ChatResponse,
                             any Error> {
        AsyncThrowingStream { continuation in
            Task {
                do {
                    for try await chunk in try await
                      client.chatStream(
                        model: model,
                        messages: messages,
                        tools: tools) {
                        continuation.yield(chunk)
                    }
                    continuation.finish()
                } catch {
                    continuation.finish(throwing: error)
                }
            }
        }
    }
}
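
Here is a minimal sketch of using the actor outside of the test suite, assuming the example package's module name Ollama_swift_examples and a small executable target; adjust the names for your own project:

import Foundation
import Ollama
import Ollama_swift_examples

@main
struct ChatDemo {
    static func main() async throws {
        // The initializer is @MainActor-isolated, so we await it
        // from this nonisolated async context.
        let service = await OllamaService(model: "qwen3:1.7b")

        let messages: [Ollama.Chat.Message] = [
            .system("You are a helpful assistant."),
            .user("In one sentence, what is a Swift actor?")
        ]

        let response = try await service.chat(messages: messages)
        print(response.message.content)
    }
}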

Tool Definitions

One of the most powerful features of modern LLMs is tool calling — the model can decide to invoke external functions to answer a question. The ollama-swift library provides a Tool type that defines the function schema and implementation. Here we define two tools: a weather lookup stub and a math expression evaluator.

The weather tool uses Codable structs for its input and output, and returns a hardcoded stub response. In a real application you would call a weather API:

public struct WeatherInput: Codable {
    public let location: String
}

public struct WeatherOutput: Codable {
    public let temperature: Double
    public let unit: String
    public let description: String
}

public let weatherTool =
    Ollama.Tool<WeatherInput, WeatherOutput>(
    name: "get_weather",
    description: "Get the current weather for a location",
    parameters: [
        "location": [
            "type": "string",
            "description":
                "The city and state, e.g. San Francisco, CA"
        ]
    ],
    required: ["location"]
) { input in
    // Stub implementation
    return WeatherOutput(
        temperature: 72,
        unit: "Fahrenheit",
        description: "Sunny")
}

The evaluator tool uses JavaScriptCore to safely evaluate math expressions. We use JSContext rather than NSExpression because NSExpression has known security risks when used with untrusted input:

public struct EvaluatorInput: Codable {
    public let expression: String
}

public struct EvaluatorOutput: Codable {
    public let result: String
}

public let evaluatorTool =
    Ollama.Tool<EvaluatorInput, EvaluatorOutput>(
    name: "evaluate_expression",
    description: "Evaluate a mathematical expression " +
                 "(arithmetic and basic algebra)",
    parameters: [
        "expression": [
            "type": "string",
            "description":
                "The expression to evaluate, e.g. '2 + 2'"
        ]
    ],
    required: ["expression"]
) { input in
    let context = JSContext()!
    if let value = context.evaluateScript(input.expression),
       !value.isUndefined, !value.isNull {
        return EvaluatorOutput(result: value.toString())
    } else {
        return EvaluatorOutput(
            result: "Error: Could not evaluate expression")
    }
}

We also add an extension on Ollama.Tool that lets us call a tool directly with the arguments dictionary returned by the model. It round-trips the arguments through JSON: the [String: Ollama.Value] dictionary is encoded to JSON data and then decoded into the tool's Codable Input type:

extension Ollama.Tool {
    public func callAsFunction(
        _ arguments: [String: Ollama.Value]
    ) async throws -> Output {
        let data = try JSONEncoder().encode(
            Ollama.Value.object(arguments))
        let input = try JSONDecoder().decode(
            Input.self, from: data)
        return try await self(input)
    }
}

Example Tests

The tests use Swift’s modern Testing framework rather than XCTest. Here is the test/example code we will run:

import Testing
import Ollama
import Foundation
@testable import Ollama_swift_examples

@Suite("Ollama Service Tests")
@MainActor
struct OllamaServiceTests {
    let service = OllamaService(model: "qwen3:1.7b")

    @Test("Basic Chat Functionality")
    func testBasicChat() async throws {
        let messages: [Ollama.Chat.Message] = [
            .system("You are a helpful assistant."),
            .user("What is the capital of Germany?")
        ]
        
        let response = try await service.chat(
            messages: messages)
        #expect(!response.message.content.isEmpty)
        print("Response: \(response.message.content)")
    }

    @Test("Weather Tool Functionality")
    func testWeatherTool() async throws {
        var messages: [Ollama.Chat.Message] = [
            .system("You are a helpful assistant that " +
                    "can check the weather."),
            .user("What is the weather in San Francisco?")
        ]
        
        let response = try await service.chat(
            messages: messages, tools: [weatherTool])
        
        if let toolCalls = response.message.toolCalls {
            for toolCall in toolCalls {
                #expect(
                    toolCall.function.name == "get_weather")
                let result = try await weatherTool(
                    toolCall.function.arguments)
                let resultString = String(
                    data: try JSONEncoder().encode(result),
                    encoding: .utf8)!
                messages.append(response.message)
                messages.append(.tool(resultString))
                
                let finalResponse = try await service.chat(
                    messages: messages)
                #expect(
                    !finalResponse.message.content.isEmpty)
                print("Weather Final Response: " +
                      "\(finalResponse.message.content)")
            }
        } else {
            print("Model did not call the weather tool.")
        }
    }

    @Test("Evaluator Tool Functionality")
    func testEvaluatorTool() async throws {
        var messages: [Ollama.Chat.Message] = [
            .system("You are a helpful assistant that " +
                    "can evaluate math expressions."),
            .user("What is (15 * 3) + 7?")
        ]
        
        let response = try await service.chat(
            messages: messages, tools: [evaluatorTool])
        
        if let toolCalls = response.message.toolCalls {
            for toolCall in toolCalls {
                #expect(toolCall.function.name ==
                        "evaluate_expression")
                let result = try await evaluatorTool(
                    toolCall.function.arguments)
                let resultString = String(
                    data: try JSONEncoder().encode(result),
                    encoding: .utf8)!
                messages.append(response.message)
                messages.append(.tool(resultString))
                
                let finalResponse = try await service.chat(
                    messages: messages)
                #expect(
                    finalResponse.message.content
                        .contains("52"))
                print("Evaluator Final Response: " +
                      "\(finalResponse.message.content)")
            }
        } else {
            print("Model did not call the evaluator tool.")
        }
    }

    @Test("Streaming Chat Functionality")
    func testStreamingChat() async throws {
        let messages: [Ollama.Chat.Message] = [
            .user("Tell me a very short joke.")
        ]
        
        var fullResponse = ""
        for try await chunk in await service.chatStream(
            messages: messages) {
            fullResponse += chunk.message.content
        }
        
        #expect(!fullResponse.isEmpty)
        print("Streaming Response: \(fullResponse)")
    }
}

The output for the basic chat test looks like:

Response: The capital of Germany is Berlin.
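
To run all of these examples yourself, make sure the Ollama service is running and the qwen3:1.7b model has been pulled, then run the package tests from the chapter's example directory (this assumes the examples build as a standard Swift Package Manager project):

cd source-code/Ollama_swift_examples
swift test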

Ollama Wrap Up

I do over half of my LLM work locally on my laptop using Ollama; the rest uses the commercial OpenAI, Anthropic, and Google APIs. Running models locally is appealing for privacy, offline use, and avoiding per-token costs. The ollama-swift library also supports structured outputs (since version 1.2.0), which let you constrain the model’s response to a specific JSON schema, a very useful feature for building reliable tool-calling pipelines.