Command Line Utility To Use the Google Gemini APIs

This example is similar to the one in the last chapter, but instead of a web application we build a command line application that uses the Google Gemini LLM APIs.

The directory haskell_book/source-code/gemini_commandline contains the code for this example.

Before we look at the code, let’s run the example (the program reads your API key from the GOOGLE_API_KEY environment variable):

$ gemini "what is the square of pi?"
Response:

The square of pi (π) is π multiplied by itself: π².  Since π is approximately 3.14159, π² is approximately 9.8696.

The gemini executable is on my path because I copied it to my personal bin directory. The exact path under dist-newstyle depends on your platform and GHC version, which is why I run find first:

$ cabal build
$ find . -name gemini
  ... output not shown
$ cp ./dist-newstyle/build/aarch64-osx/ghc-9.4.8/gemini-0.1.0.0/x/gemini/build/gemini/gemini ~/bin

If you don’t want to install this example permanently, just run it with cabal:

$ cabal run gemini "what is 11 + 23?"
Response:

11 + 23 = 34

Here is a listing of the source file Main.hs; an explanation follows the code. The program sends a prompt to Google’s Gemini model through its API and prints the generated response. When run with no arguments it instead runs a small demo that extracts place, person, and company names from sample text.

-- Minimal Gemini command-line demo; calls the API and extracts simple entities
-- Run with no args for a demo, or pass a prompt; set `GOOGLE_API_KEY` in your environment
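-- DeriveAnyClass lets ToJSON/FromJSON appear in deriving clauses
-- (needed unless it is already enabled in the .cabal file)
{-# LANGUAGE DeriveAnyClass      #-}
{-# LANGUAGE DeriveGeneric       #-}
{-# LANGUAGE OverloadedStrings   #-}
{-# LANGUAGE ScopedTypeVariables #-}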

import System.Environment (getArgs, getEnv)
import qualified Data.Aeson as Aeson
import Data.Aeson (FromJSON, ToJSON, eitherDecode)
import GHC.Generics (Generic)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Network.HTTP.Client (newManager, httpLbs, parseRequest, Manager, Request(..), RequestBody(..), Response(..), responseStatus)
import Network.HTTP.Types.Status (statusCode)
-- Explicit import list; `null` is used qualified as `Data.Text.null` to avoid Prelude's `null`
import Data.Text (Text, pack, unpack, splitOn, strip, null)
import Data.Text.Encoding (encodeUtf8)
import Control.Exception (SomeException, handle)

-- --- Request Data Types ---

data RequestPart = RequestPart
  { reqText :: Text  -- Using reqText to avoid name clash with Response Part's text
  } deriving (Show, Generic)

instance ToJSON RequestPart where
  toJSON (RequestPart t) = Aeson.object ["text" Aeson..= t]

data RequestContent = RequestContent
  { reqParts :: [RequestPart] -- Using reqParts to avoid name clash
  } deriving (Show, Generic)

instance ToJSON RequestContent where
  toJSON (RequestContent p) = Aeson.object ["parts" Aeson..= p]

data GenerationConfig = GenerationConfig
  { temperature     :: Double
  , maxOutputTokens :: Int
  -- Add other config fields as needed (e.g., topP, topK)
  } deriving (Show, Generic, ToJSON)

data GeminiApiRequest = GeminiApiRequest
  { contents         :: [RequestContent]
  , generationConfig :: GenerationConfig
  } deriving (Show, Generic, ToJSON)


-- --- Response Data Types ---

data ResponsePart = ResponsePart
  { text :: String
  } deriving (Show, Generic, FromJSON)

data ResponseContent = ResponseContent
  { parts :: [ResponsePart]
  } deriving (Show, Generic, FromJSON)

data Candidate = Candidate
  { content :: ResponseContent
  } deriving (Show, Generic, FromJSON)

-- Assuming promptFeedback might be present at the top level of the response
-- alongside candidates, adjust if it's nested differently.
data SafetyRating = SafetyRating
  { category    :: String
  , probability :: String
  } deriving (Show, Generic, FromJSON)

data PromptFeedback = PromptFeedback
  { blockReason   :: Maybe String
  , safetyRatings :: Maybe [SafetyRating]
  } deriving (Show, Generic, FromJSON)

data GeminiApiResponse = GeminiApiResponse
  { candidates     :: [Candidate]
  , promptFeedback :: Maybe PromptFeedback -- Optional; carries the block reason when a prompt is rejected
  } deriving (Show, Generic, FromJSON)

-- --- Completion Function ---

-- | Sends a prompt to the Gemini API and returns the completion text or an error.
completion :: String             -- ^ Google API Key
           -> Manager            -- ^ HTTP Manager
           -> String             -- ^ The user's prompt text
           -> IO (Either String String) -- ^ Left error message or Right completion text
completion apiKey manager promptText = do
  -- Build the JSON payload expected by the Gemini API
  initialRequest <- parseRequest "https://generativelanguage.googleapis.com/v1/models/gemini-2.5-flash:generateContent"
  let reqContent = RequestContent { reqParts = [RequestPart { reqText = pack promptText }] }
  let genConfig = GenerationConfig { temperature = 0.1, maxOutputTokens = 800 }
  let apiRequest = GeminiApiRequest { contents = [reqContent], generationConfig = genConfig }

  -- Prepare the HTTP POST request with headers and body
  let request = initialRequest
        { requestHeaders =
            [ ("Content-Type", "application/json")
            , ("x-goog-api-key", encodeUtf8 $ pack apiKey)
            ]
        , method = "POST"
        , requestBody = RequestBodyLBS $ Aeson.encode apiRequest
        }

  -- Send the request using the shared HTTP manager
  response <- httpLbs request manager
  let status = responseStatus response
      body = responseBody response

  -- Check HTTP status and decode JSON into our Haskell types
  if statusCode status == 200
    then do
      case eitherDecode body :: Either String GeminiApiResponse of
        Left err -> return $ Left ("Error decoding JSON response: " ++ err)
        Right geminiResponse ->
          -- Extract first candidate and its first text part
          case candidates geminiResponse of
            (candidate:_) ->
              case parts (content candidate) of
                (part:_) -> return $ Right (text part)
                [] -> return $ Left "Error: Received candidate with no parts."
            [] ->
              -- No candidates: check if the prompt was blocked and report why
              case promptFeedback geminiResponse of
                Just pf -> case blockReason pf of
                             Just reason -> return $ Left ("API Error: Blocked - " ++ reason)
                             Nothing -> return $ Left "Error: No candidates found and no block reason provided."
                Nothing -> return $ Left "Error: No candidates found in response."
    else do
      let err = "Error: API request failed with status " ++ show (statusCode status) ++ "\nBody: " ++ show body
      return $ Left err

-- | A generic function to extract entities from text using a specific prompt pattern.
extractEntities :: String             -- ^ Type of entity to extract (e.g., "place names")
                -> String             -- ^ Example for the prompt (e.g., "London,Paris,Tokyo")
                -> String             -- ^ Google API Key
                -> Manager            -- ^ HTTP Manager
                -> String             -- ^ The input text to analyze
                -> IO (Either String [String]) -- ^ Left error or Right list of entities
extractEntities entityType example apiKey manager inputText = do
    -- Build a simple, strict prompt and call the completion helper
    let prompt = "Extract only the " ++ entityType ++ " strictly separated by commas from the following text. Do not include any explanation or introduction. Example: " ++ example ++ "\n\nText:\"" ++ inputText ++ "\""
    apiResult <- completion apiKey manager prompt

    -- Parse a comma-separated list into `[String]`, trimming empty items
    return $ case apiResult of
        Left err -> Left ("API call failed in extractEntities for " ++ entityType ++ ": " ++ err)
        Right responseText ->
            let rawParts = splitOn (pack ",") (pack responseText)
                strippedParts = map strip rawParts
                nonEmptyParts = filter (not . Data.Text.null) strippedParts
                entities = map unpack nonEmptyParts
            in Right entities

-- | Extracts potential place names from text using the Gemini API.
findPlaces :: String             -- ^ Google API Key
           -> Manager            -- ^ HTTP Manager
           -> String             -- ^ The input text to analyze
           -> IO (Either String [String]) -- ^ Left error or Right list of places
findPlaces = extractEntities "place names" "London,Paris,Tokyo"

-- | Extracts potential person names from text using the Gemini API.
findPeople :: String             -- ^ Google API Key
           -> Manager            -- ^ HTTP Manager
           -> String             -- ^ The input text to analyze
           -> IO (Either String [String]) -- ^ Left error or Right list of people
findPeople = extractEntities "person names" "Alice,Bob,Charlie"

-- | Extracts potential company names from text using the Gemini API.
findCompanyNames :: String             -- ^ Google API Key
                 -> Manager            -- ^ HTTP Manager
                 -> String             -- ^ The input text to analyze
                 -> IO (Either String [String]) -- ^ Left error or Right list of companies
findCompanyNames = extractEntities "company names" "Google,Apple,Microsoft"

-- --- Main Function ---

main :: IO ()
main = do
  -- CLI entry: with no args runs demo; with a prompt does direct completion
  args <- getArgs
  case args of
    -- If no args, run demo extraction on sample text
    [] -> do
        putStrLn "No prompt provided. Running demo extraction on sample text:"
        let sampleText = "Dr. Evelyn Reed from Acme Corporation went to London last week with her colleague Bob Smith. They visited the Tower Bridge and met someone near Paris, Texas."
        putStrLn $ "Sample Text: \"" ++ sampleText ++ "\"\n"

        -- Read API key from env; create a single `Manager` for all requests
        apiKeyResult <- lookupEnv "GOOGLE_API_KEY"
        case apiKeyResult of
            Nothing -> putStrLn "Error: GOOGLE_API_KEY environment variable not set."
            Just apiKey -> do
                manager <- newManager tlsManagerSettings

                -- Find Places
                putStrLn "Attempting to find places..."
                placesResult <- findPlaces apiKey manager sampleText
                case placesResult of
                    Left err -> putStrLn $ "Error finding places: " ++ err
                    Right places -> putStrLn $ "Found Places: " ++ show places

                putStrLn "\nAttempting to find people..."
                -- Find People
                peopleResult <- findPeople apiKey manager sampleText
                case peopleResult of
                    Left err -> putStrLn $ "Error finding people: " ++ err
                    Right people -> putStrLn $ "Found People: " ++ show people

                putStrLn "\nAttempting to find company names..."
                -- Find Companies
                companiesResult <- findCompanyNames apiKey manager sampleText
                case companiesResult of
                    Left err -> putStrLn $ "Error finding companies: " ++ err
                    Right companies -> putStrLn $ "Found Companies: " ++ show companies

    -- If args are provided, send the first one to the model as the prompt
    (promptArg:_) -> do
      putStrLn "Prompt provided. Running direct completion:"
      apiKeyResult <- lookupEnv "GOOGLE_API_KEY"
      case apiKeyResult of
        Nothing -> putStrLn "Error: GOOGLE_API_KEY environment variable not set."
        Just apiKey -> do
          manager <- newManager tlsManagerSettings
          -- Single completion call for the provided prompt
          result <- completion apiKey manager promptArg

          -- Print either an error message or the model's response
          case result of
            Left errMsg -> putStrLn $ "API Call Failed:\n" ++ errMsg
            Right completionText -> putStrLn $ "Response:\n\n" ++ completionText


-- Helper: safely read an environment variable (returns `Nothing` if missing)
lookupEnv :: String -> IO (Maybe String)
lookupEnv name = handle (\(_ :: SomeException) -> return Nothing) $ Just <$> getEnv name
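
A note on the lookupEnv helper that closes the listing: the System.Environment module in base already exports a function with the same name and type, so the hand-written wrapper around getEnv could probably be replaced by importing it. One difference to keep in mind is that the listing's version turns any exception into Nothing, while the base version returns Nothing only when the variable is unset. The sketch below assumes that substitution; requireVar is a hypothetical helper name introduced here for illustration and is not part of Main.hs.

```haskell
import System.Environment (lookupEnv)

-- | Hypothetical helper (not in Main.hs): turn the result of an environment
-- lookup into either a readable error message or the variable's value.
requireVar :: String -> Maybe String -> Either String String
requireVar name Nothing  = Left ("Error: " ++ name ++ " environment variable not set.")
requireVar _    (Just v) = Right v

main :: IO ()
main = do
  -- lookupEnv from base returns Nothing when the variable is unset.
  key <- lookupEnv "GOOGLE_API_KEY"
  case requireVar "GOOGLE_API_KEY" key of
    Left err -> putStrLn err
    Right _  -> putStrLn "GOOGLE_API_KEY is set."
```

Keeping the message construction in a pure function means both branches of main in the listing could share it instead of repeating the same error string.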

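The first section of the code defines the data types that mirror the JSON structures used by the Gemini API. The request types (RequestPart, RequestContent, GenerationConfig, and GeminiApiRequest) are serialized to JSON; RequestPart and RequestContent have hand-written ToJSON instances so that the reqText and reqParts fields are emitted under the JSON keys text and parts. The response types (ResponsePart, ResponseContent, Candidate, SafetyRating, PromptFeedback, and GeminiApiResponse) get their FromJSON instances through Haskell’s deriving mechanism. Decoding into these records means that a response with an unexpected shape is reported as a decoding error instead of causing a failure later in the program.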
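
To see how the response records line up with the JSON, the following sketch decodes a small response by hand. The JSON literal is a hand-written sample in the shape the listing's types expect, not captured API output, and firstText is a hypothetical helper name. It assumes the aeson and bytestring packages are available and enables DeriveAnyClass so that FromJSON can appear in the deriving clauses.

```haskell
{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, eitherDecode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Trimmed copies of the response types from Main.hs (promptFeedback omitted).
data ResponsePart = ResponsePart { text :: String }
  deriving (Show, Generic, FromJSON)

data ResponseContent = ResponseContent { parts :: [ResponsePart] }
  deriving (Show, Generic, FromJSON)

data Candidate = Candidate { content :: ResponseContent }
  deriving (Show, Generic, FromJSON)

data GeminiApiResponse = GeminiApiResponse { candidates :: [Candidate] }
  deriving (Show, Generic, FromJSON)

-- Hand-written sample in the shape the types above expect (not real API output).
sampleJson :: BL.ByteString
sampleJson = "{\"candidates\":[{\"content\":{\"parts\":[{\"text\":\"11 + 23 = 34\"}]}}]}"

-- | Hypothetical helper: decode a response body and return the text of the
-- first part of the first candidate, or an error message.
firstText :: BL.ByteString -> Either String String
firstText body = do
  response <- eitherDecode body
  case candidates response of
    (Candidate (ResponseContent (ResponsePart t : _)) : _) -> Right t
    _ -> Left "no text part in response"

main :: IO ()
main = print (firstText sampleJson)
```

Running this prints Right "11 + 23 = 34". The record field names (candidates, content, parts, text) double as the JSON keys, which is why the response types in the listing need no hand-written instances.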

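The completion function implements the program’s core logic. It constructs an HTTP POST request to the Gemini API with the API key in the x-goog-api-key header and a JSON body containing the prompt and the generation settings (a temperature of 0.1 and a maximum of 800 output tokens), sends it, and handles the response. The request is an ordinary blocking IO action sequenced in a do block. Every failure case (a non-200 status, JSON that cannot be decoded, a response with no candidates or no parts, or a blocked prompt) is returned as a Left value with an error message; on success the function returns the text of the first part of the first candidate.

The main function reads the command line arguments and the GOOGLE_API_KEY environment variable. With no arguments it runs the demo, calling findPlaces, findPeople, and findCompanyNames on sample text; each of these is extractEntities partially applied to an entity type and an example list. Otherwise it sends the first argument as the prompt and prints the response, which is why a multi-word prompt must be quoted on the command line.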
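
The demo mode in the listing relies on extractEntities turning the model's comma-separated reply into a list of strings. That post-processing step is pure, so it can be tried without an API key. The sketch below isolates it; parseEntities is a hypothetical name for logic that the listing writes inline, and the input strings are made-up stand-ins for model replies.

```haskell
import qualified Data.Text as T

-- | Hypothetical name for the inline parsing in extractEntities:
-- split on commas, trim surrounding whitespace, and drop empty items.
parseEntities :: String -> [String]
parseEntities reply =
  map T.unpack
    . filter (not . T.null)
    . map T.strip
    . T.splitOn (T.pack ",")
    $ T.pack reply

main :: IO ()
main = do
  -- Made-up replies standing in for model output.
  print (parseEntities "London, Paris , ,Tokyo")  -- prints ["London","Paris","Tokyo"]
  print (parseEntities "")                        -- prints []
```

One limitation of this format is that a name containing a comma is split in two: if the model returned Paris, Texas verbatim from the sample text, the result would contain Paris and Texas as separate items.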