Command Line Tool to Use Local Ollama LLM Server

Ollama is a multi-platform tool for running Large Language Models (LLMs) on your own computer; it is available for macOS, Linux, and Windows.

My personal preference is to run local models on my own computer and fall back to commercial LLM APIs from OpenAI, Google, Anthropic, Mistral, etc., only when local models are insufficient for a specific task.

You can download Ollama from https://ollama.ai.

The directory haskell_tutorial_cookbook_examples/ollama_commandline contains the code for this example.

Before we look at the code, let’s look at how we use this command line tool:

cabal run ollama-client "how much is 4 + 11 + 13?"
cabal run ollama-client "write Python script to print 11th and 12th prime numbers"
cabal run ollama-client "Write a Haskell hello world program"

I like to build the executable for this example using cabal build and then copy the executable file to my personal bin utility directory using these commands:

$ find . -name ollama-client
$ cp ./dist-newstyle/build/aarch64-osx/ghc-9.4.8/ollama-client-0.1.0.0/x/ollama-client/build/ollama-client/ollama-client ~/bin

The find command locates the executable generated by cabal build; the exact path varies with your platform and GHC version.

The example in this chapter is very similar to the example in the last chapter that uses Google’s Gemini model APIs. Currently Google makes using their APIs almost free, but I still like running open-weight models on my own laptop for privacy and security.

This example program is listed below. I will describe the code after the listing.

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedRecordDot #-}

import Control.Monad.IO.Class (liftIO)
import System.Environment (getArgs)
import qualified Data.Aeson as Aeson
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics
import Network.HTTP.Client (newManager, httpLbs, parseRequest, Request(..),
                            RequestBody(..), responseBody, responseStatus,
                            defaultManagerSettings)
import Network.HTTP.Types.Status (statusCode)

data OllamaRequest = OllamaRequest
  { model :: String
  , prompt :: String
  , stream :: Bool
  } deriving (Show, Generic, ToJSON)

data OllamaResponse = OllamaResponse
  { model :: String
  , created_at :: String
  , response :: String  -- This matches the actual field name in the JSON
  , done :: Bool
  , done_reason :: String
  } deriving (Show, Generic, FromJSON)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [] -> putStrLn "Error: Please provide a prompt as a command-line argument."
    (arg:_) -> do
      manager <- newManager defaultManagerSettings

      initialRequest <- parseRequest "http://localhost:11434/api/generate"

      let ollamaRequestBody = OllamaRequest
            { model = "llama3.2:latest"  -- You can change this to another model
            , prompt = arg
            , stream = False
            }

      let request = initialRequest
            { requestHeaders = [("Content-Type", "application/json")]
            , method = "POST"
            , requestBody = RequestBodyLBS $ Aeson.encode ollamaRequestBody
            }

      httpResponse <- httpLbs request manager
--    liftIO $ putStrLn $ "httpResponse:" ++ show httpResponse -- debug
      
      let responseStatus' = responseStatus httpResponse

      if statusCode responseStatus' == 200
        then do
          let maybeOllamaResponse =
                Aeson.decode (responseBody httpResponse) :: Maybe OllamaResponse
          case maybeOllamaResponse of
            Just ollamaResponse -> do
              liftIO $ putStrLn $ "Response:\n\n" ++ ollamaResponse.response
            Nothing -> do
              liftIO $ putStrLn "Error: Failed to parse response"
        else do
          putStrLn $ "Error: " ++ show responseStatus'

This Haskell code defines a program that interacts with the local HTTP API provided by Ollama to send a prompt to a language model and retrieve its response. The program uses several Haskell extensions, including OverloadedRecordDot for concise record field access and DuplicateRecordFields to allow multiple data types to share field names. The program is structured around two main data types, OllamaRequest and OllamaResponse, which represent the request and response payloads for the API. Both types derive the Generic type class, enabling automatic JSON serialization and deserialization with the Aeson library. The OllamaRequest type includes fields for specifying the model, the prompt, and whether the response should be streamed, while the OllamaResponse type captures the model’s response text, creation time, and additional metadata.
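To see these two extensions in isolation, here is a minimal standalone sketch (the Req and Resp types are hypothetical toy types, not part of the example program) showing how DuplicateRecordFields lets two records share a field name and how OverloadedRecordDot disambiguates access by the record's type:

```haskell
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedRecordDot #-}

-- Toy types that share the field name `model`, just as
-- OllamaRequest and OllamaResponse do in the example program.
data Req  = Req  { model :: String, prompt   :: String }
data Resp = Resp { model :: String, response :: String }

main :: IO ()
main = do
  let req  = Req  { model = "llama3.2", prompt   = "hi" }
      resp = Resp { model = "llama3.2", response = "hello" }
  -- Record dot syntax resolves the shared name by each record's type:
  putStrLn (req.model ++ " / " ++ resp.response)
```

Without DuplicateRecordFields, defining both types in one module would be rejected because the generated `model` selector functions would clash.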

The function main begins by reading command-line arguments using getArgs. If no arguments are provided, the program outputs an error message and terminates. Otherwise, it extracts the first argument as the prompt and proceeds to set up an HTTP client using newManager with default settings. The program constructs an HTTP POST request to the local API endpoint (http://localhost:11434/api/generate) using parseRequest. It then creates an instance of OllamaRequest with a hardcoded model name (llama3.2:latest), the user-provided prompt, and stream set to False. This request is serialized to JSON using Aeson.encode and included in the HTTP request body, with the appropriate Content-Type header.
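You can check what JSON body the program sends without starting a server. The sketch below (assuming the aeson and bytestring packages are available, as they are for the example program) re-declares the OllamaRequest type and prints its encoding, which is exactly what httpLbs would POST to the /api/generate endpoint:

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import qualified Data.Aeson as Aeson
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Aeson (ToJSON)
import GHC.Generics (Generic)

-- Same shape as the OllamaRequest type in the example program.
data OllamaRequest = OllamaRequest
  { model  :: String
  , prompt :: String
  , stream :: Bool
  } deriving (Show, Generic, ToJSON)

main :: IO ()
main =
  -- Print the JSON request body that would be sent to /api/generate.
  BL.putStrLn (Aeson.encode
    (OllamaRequest "llama3.2:latest" "What is 3 + 5 + 11?" False))
```

The generically derived ToJSON instance uses the record field names as JSON keys, which is why the Haskell fields are named model, prompt, and stream to match the API.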

The program sends the HTTP request using httpLbs, which performs the request and returns the response. It then checks the HTTP status code of the response using responseStatus and statusCode. If the status code is 200 (indicating success), the program attempts to decode the response body into an OllamaResponse object using Aeson.decode. If decoding is successful, it extracts and prints the response field from the OllamaResponse object using the OverloadedRecordDot extension for concise field access. If decoding fails, the program outputs an error message indicating that the response could not be parsed.
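The decoding step can also be exercised offline against a hand-written stand-in for a /api/generate reply body (again assuming the aeson package). Note that Aeson's generically derived FromJSON instance simply ignores JSON fields with no matching record field, so a trimmed-down response type works:

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Aeson as Aeson
import Data.Aeson (FromJSON)
import GHC.Generics (Generic)

-- A trimmed-down response type: created_at and done_reason are
-- omitted, and Aeson ignores those fields when decoding.
data Reply = Reply
  { response :: String
  , done     :: Bool
  } deriving (Show, Generic, FromJSON)

main :: IO ()
main = do
  -- Hand-written stand-in for a reply body, not actual model output.
  let raw = "{\"model\":\"llama3.2\",\"response\":\"19\",\
            \\"done\":true,\"done_reason\":\"stop\"}"
  case Aeson.decode raw :: Maybe Reply of
    Just r  -> putStrLn (response r)
    Nothing -> putStrLn "Error: Failed to parse response"
```

Aeson.decode returns Maybe, so a malformed or unexpected payload yields Nothing rather than a runtime crash, which is the same failure path the example program handles.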

If the HTTP status code is not 200, the program outputs an error message with the status code. This error handling ensures that the user is informed of any issues with the API request or response. Overall, the program demonstrates how to use Haskell’s type system and libraries to build a robust and type-safe client for interacting with a JSON-based HTTP API. It also highlights the use of modern Haskell features, such as record dot syntax and generic programming, to simplify code and improve readability.

In general, when you are writing code to access an LLM through a REST interface, look for examples that use the curl command, examine the returned JSON payload, and then write code to decode the JSON and extract the information you need. I started writing this example by finding the following curl example in the Ollama documentation and running it:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "What is 3 + 5 + 11?",
  "stream": false
}'