Command Line Tool to Use Local Ollama LLM Server

Ollama is a multi-platform tool for running Large Language Models (LLMs) on your laptop. Ollama is available for macOS, Linux, and Windows.

My personal preference is to run local models on my own computer and to use commercial LLM APIs from OpenAI, Google, Anthropic, Mistral, etc. only when local models are insufficient for a specific task.

You can download Ollama from https://ollama.ai.

The directory haskell_book/source-code/ollama_commandline contains the code for this example.

Before we look at the code, let’s look at how we use this command line tool:

cabal run ollama-client "how much is 4 + 11 + 13?"
cabal run ollama-client "write Python script to print 11th and 12th prime numbers"
cabal run ollama-client "Write a Haskell hello world program"

I like to build the executable for this example using cabal build and then copy the executable file to my personal bin utility directory using these commands:

$ find . -name ollama-client
$ cp ./dist-newstyle/build/aarch64-osx/ghc-9.4.8/ollama-client-0.1.0.0/x/ollama-client/build/ollama-client/ollama-client ~/bin

The find command is used to find the location of the executable generated by cabal build.

The example in this chapter is very similar to the example in the last chapter that uses Google’s Gemini model APIs. Google currently offers its APIs nearly free of charge, but I still prefer running open-weight models on my own laptop for privacy and security.

This example program is listed below. I will describe the code after the listing.

-- Simple command-line client for Ollama's local REST API
-- Usage: ollama-client "<prompt>" [model]   (default model: llama3.2:latest)
-- LANGUAGE pragmas enable features used below
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedRecordDot #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-} -- needed to derive ToJSON/FromJSON in deriving clauses

-- Core utilities
import Control.Monad (when)
import System.Environment (getArgs)

-- JSON support
import qualified Data.Aeson as Aeson
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)

-- HTTP client
import Network.HTTP.Client
  ( newManager
  , httpLbs
  , parseRequest
  , Request(..)
  , RequestBody(..)
  , responseBody
  , responseStatus
  , defaultManagerSettings
  , Manager
  )
import Network.HTTP.Types.Status (statusIsSuccessful)

-- Types that mirror Ollama's request and response JSON
data OllamaRequest = OllamaRequest
  { model :: String        -- name/tag of the model to use
  , prompt :: String       -- user input sent to the model
  , stream :: Bool         -- stream tokens or return a single final string
  } deriving (Show, Generic, ToJSON)

data OllamaResponse = OllamaResponse
  { model :: String
  , created_at :: String
  , response :: String     -- the generated text from the model
  , done :: Bool
  , done_reason :: Maybe String -- may be missing; use Maybe
  } deriving (Show, Generic, FromJSON)

-- Call Ollama's local API and decode the JSON response
callOllama :: Manager -> String -> String -> IO (Either String OllamaResponse)
callOllama manager modelName userPrompt = do
  -- Build the POST request to /api/generate
  initialRequest <- parseRequest "http://localhost:11434/api/generate"

  let ollamaRequestBody = OllamaRequest
        { model = modelName
        , prompt = userPrompt
        , stream = False     -- single complete response
        }

  let request = initialRequest
        { requestHeaders = [("Content-Type", "application/json")]
        , method = "POST"
        , requestBody = RequestBodyLBS $ Aeson.encode ollamaRequestBody -- encode as JSON
        }

  -- Send the request and get the HTTP response
  httpResponse <- httpLbs request manager

  let status = responseStatus httpResponse
      body = responseBody httpResponse

  if statusIsSuccessful status
    then do
      -- Try to decode the JSON body into our Haskell type
      let maybeOllamaResponse = Aeson.decode body :: Maybe OllamaResponse
      case maybeOllamaResponse of
        Just ollamaResponse -> return $ Right ollamaResponse
        Nothing -> return $ Left $ "Error: Failed to parse JSON response. Body: " ++ show body
    else
      -- Non-2xx HTTP status
      return $ Left $ "Error: HTTP request failed with status " ++ show status ++ ". Body: " ++ show body

main :: IO ()
main = do
  -- Read command-line args: prompt and optional model name
  args <- getArgs
  case args of
    [] -> putStrLn "Usage: <program_name> <prompt> [model_name]"
    (promptArg:modelArgs) -> do
      -- Choose model: use user-provided or default
      let modelName = case modelArgs of
                        (m:_) -> m
                        []    -> "llama3.2:latest"

      -- Create an HTTP connection manager
      manager <- newManager defaultManagerSettings

      putStrLn $ "Sending prompt '" ++ promptArg ++ "' to model '" ++ modelName ++ "'..."

      -- Make the API call
      result <- callOllama manager modelName promptArg

      -- Handle success or error
      case result of
        Right ollamaResponse -> do
          putStrLn "\n--- Response ---"
          putStrLn ollamaResponse.response
          -- Print reason if present
          when (ollamaResponse.done_reason /= Nothing) $
              putStrLn $ "\nDone reason: " ++ show ollamaResponse.done_reason
        Left err -> do
          putStrLn $ "API Error: " ++ err

This Haskell code defines a program that interacts with the local HTTP API provided by Ollama: it sends a prompt to a language model and retrieves the model’s response. The program uses several Haskell features and libraries, including OverloadedRecordDot for concise record field access and DuplicateRecordFields to allow multiple data types to share field names. The program is structured around two main data types, OllamaRequest and OllamaResponse, which represent the request and response payloads for the API. Both types derive Generic, enabling automatic JSON serialization and deserialization with the Aeson library. The OllamaRequest type includes fields for specifying the model, the prompt, and whether the response should be streamed, while the OllamaResponse type captures the model’s response text, creation time, and additional metadata.
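To see the Generic-based serialization in isolation, here is a minimal sketch of how a request type like OllamaRequest, deriving only Generic plus Aeson's ToJSON and FromJSON, is encoded to the JSON payload sent to the server. The prompt text and the round-trip check are illustrative, not part of the program above:

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}

import Data.Aeson (ToJSON, FromJSON, encode, decode)
import qualified Data.ByteString.Lazy.Char8 as LBS
import GHC.Generics (Generic)

-- Same shape as the OllamaRequest type in the listing
data OllamaRequest = OllamaRequest
  { model  :: String
  , prompt :: String
  , stream :: Bool
  } deriving (Show, Generic, ToJSON, FromJSON)

main :: IO ()
main = do
  let req = OllamaRequest "llama3.2:latest" "What is 2 + 2?" False
      json = encode req      -- Generic deriving gives us JSON encoding for free
  LBS.putStrLn json
  -- Round-trip: decoding the encoded value recovers the original prompt
  case decode json :: Maybe OllamaRequest of
    Just r  -> putStrLn (prompt r)
    Nothing -> putStrLn "round-trip failed"
```

The derived instance uses the record field names as JSON keys, which is why the Haskell field names in the listing exactly match the keys Ollama expects.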

The function main begins by reading command-line arguments using getArgs. If no arguments are provided, the program prints a usage message and exits. Otherwise, it takes the first argument as the prompt and an optional second argument as the model name, defaulting to llama3.2:latest. It then sets up an HTTP client using newManager with default settings and constructs an HTTP POST request to the local API endpoint (http://localhost:11434/api/generate) using parseRequest. An OllamaRequest value is built with the chosen model name, the user-provided prompt, and stream set to False. This request is serialized to JSON using Aeson.encode and included in the HTTP request body, with the appropriate Content-Type header.

The program sends the HTTP request using httpLbs, which performs the request and returns the response. It then checks the HTTP status of the response using responseStatus and statusIsSuccessful. If the status indicates success (a 2xx code), the program attempts to decode the response body into an OllamaResponse value using Aeson.decode. If decoding succeeds, it extracts and prints the response field, using the OverloadedRecordDot extension for concise field access. If decoding fails, the program prints an error message indicating that the response could not be parsed.

If the HTTP status does not indicate success, the program prints an error message that includes the status and the response body. This error handling ensures that the user is informed of any issue with the API request or response. Overall, the program demonstrates how to use Haskell’s type system and libraries to build a robust, type-safe client for a JSON-based HTTP API. It also highlights modern Haskell features, such as record dot syntax and generic programming, that simplify the code and improve readability.

In general, when you are writing code to access an LLM through a REST interface, look for examples that use the curl command, examine the returned JSON payload, and then write code to decode the JSON and extract the information you need. I started writing this example by finding the following curl example in the Ollama documentation and running it:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is 3 + 5 + 11?",
  "stream": false
}'
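Following that workflow, the next step is to decode the JSON payload that curl returns. Here is a minimal sketch that parses a hypothetical response body (the field values below are made up for illustration; an actual response also contains timing and token-count fields, which Aeson simply ignores because our type does not declare them):

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, decode)
import qualified Data.ByteString.Lazy.Char8 as LBS
import GHC.Generics (Generic)

-- Same shape as the OllamaResponse type in the listing
data OllamaResponse = OllamaResponse
  { model       :: String
  , created_at  :: String
  , response    :: String
  , done        :: Bool
  , done_reason :: Maybe String
  } deriving (Show, Generic, FromJSON)

-- A hypothetical response body, similar in shape to what the
-- curl command above returns (all field values are invented)
sampleJson :: LBS.ByteString
sampleJson = "{\"model\":\"llama3.2\",\"created_at\":\"2024-01-01T00:00:00Z\",\
             \\"response\":\"19\",\"done\":true,\"done_reason\":\"stop\"}"

main :: IO ()
main =
  case decode sampleJson :: Maybe OllamaResponse of
    Just r  -> putStrLn (response r)  -- prints the generated text
    Nothing -> putStrLn "parse failed"
```

Once this decoding step works against a saved curl response, wiring it to a live HTTP request, as the full program does, is straightforward.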