Image Processing With Local Ollama Models

Here we use a very small vision-capable model, qwen3.5:0.8b, running locally under Ollama, to answer natural language questions about the contents of image files. The same code can easily be customized for specific data processing pipelines.

The code for this example can be found in the directory loving-common-lisp/src/ollama_images.

Design Notes for the Example Code

Briefly, dear reader, here are my design notes for this project:

  • Network Delegation: Rather than importing a heavy Lisp HTTP client (like Dexador or Drakma), the library delegates network I/O to the curl command-line tool via uiop:run-program. This keeps the Lisp dependency footprint small, at the cost of requiring curl to be installed and on the PATH.
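  • Security & Injection Prevention: Because curl's arguments are passed to uiop:run-program as a list, and the JSON payload is piped from an in-memory string stream (make-string-input-stream) to curl's stdin (--data-binary @-), the system shell is never invoked. This eliminates shell command injection risks, means the application code never writes the payload to a temporary file, and keeps large base64 image data off the command line, where it could exceed the operating system's argument-length limit.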
  • Asymmetric JSON Handling: The library relies on the external cl-json library for decoding responses but uses a lightweight, custom recursive function (lisp-to-json-string) for encoding. This gives strict control over how Lisp keywords, symbols, and lists map to the JSON schema required by the Ollama API (for example, :false is emitted as the JSON boolean false for the stream field) without needing complex CLOS (Common Lisp Object System) serialization.
  • Functional Wrappers: The public API (image-to-text, describe-image-simple) hides the encoding and transport complexity, presenting a clean, functional interface. The dynamic variables *model-name* and *ollama-host* supply sensible defaults that can be overridden by rebinding them or by passing the :model and :host keyword arguments.
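The claim in the security note can be checked directly at the REPL. The following is a minimal sketch, assuming SBCL (which bundles ASDF and UIOP) on a Unix-like system with printf and cat on the PATH; demo-no-shell and demo-stdin-pipe are hypothetical helper names, not part of the library. Because the command is a list, an argument full of shell metacharacters reaches printf as literal text, and a string stream supplies the child's stdin just as the library does for curl.

```lisp
;;; Hypothetical demo: list-form UIOP:RUN-PROGRAM never invokes a shell.
;;; Assumes SBCL (which bundles ASDF and UIOP) and Unix printf/cat on the PATH.
(require "asdf")                        ; makes UIOP available

(defun demo-no-shell (untrusted)
  "Hand UNTRUSTED to printf as a single argv entry.  No shell parses it,
   so $( ), ; and quotes come back verbatim instead of being executed."
  (uiop:run-program (list "printf" "%s" untrusted)
                    :output :string))

(defun demo-stdin-pipe (payload)
  "Feed PAYLOAD to cat's stdin from an in-memory string stream, the same
   technique describe-image.lisp uses to hand its JSON body to curl."
  (uiop:run-program (list "cat")
                    :input (make-string-input-stream payload)
                    :output :string))
```

Calling (demo-no-shell "$(whoami); echo pwned") returns that same string rather than your user name, and (demo-stdin-pipe "{\"stream\": false}") returns the JSON text unchanged.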

Code to Process Images

This Common Lisp module acts as a lightweight client for Ollama's chat API (the /api/chat endpoint) when used with vision models: it base64-encodes local images and serializes them, together with the prompt, into the JSON request body. To minimize dependencies and avoid shell injection risks, it pipes this body to a spawned curl process via standard input and then parses the returned JSON to extract the model's text description.
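To make the final parsing step concrete, here is a minimal sketch of what cl-json's decoder produces from a non-streaming /api/chat reply. It assumes Quicklisp is installed, as the module itself does. The reply below is abbreviated (a real one also carries timestamps and timing statistics), its content text is invented for illustration, and extract-reply-text is a hypothetical name rather than part of the module. With cl-json's default settings, JSON objects decode to association lists keyed by Lisp keywords, which is why the module can look up :message and then :content with assoc.

```lisp
;;; Hypothetical sketch: how cl-json decodes an abbreviated Ollama chat reply.
;;; Assumes Quicklisp is available to load cl-json.
(ql:quickload :cl-json :silent t)

(defparameter *sample-reply*
  ;; Abbreviated, illustrative reply; real replies include more fields.
  "{\"model\": \"qwen3.5:0.8b\",
    \"message\": {\"role\": \"assistant\", \"content\": \"A cat asleep on a sofa.\"},
    \"done\": true}")

(defun extract-reply-text (json-string)
  "Decode JSON-STRING and return message.content, signaling any API error."
  (let* ((parsed  (cl-json:decode-json-from-string json-string)) ; alist
         (err     (cdr (assoc :error parsed)))
         (message (cdr (assoc :message parsed))))                ; nested alist
    (when err
      (error "Ollama API error: ~a" err))
    (cdr (assoc :content message))))
```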

    ;;;; describe-image.lisp - Send images to Ollama vision models for description
    ;;;;
    ;;;; Usage (from SBCL REPL):
    ;;;;   (load "describe-image.lisp")
    ;;;;   (describe-image:image-to-text "ticket.png" "Print out the plain text in this image")
    ;;;;   (describe-image:image-to-text '("a.png" "b.png") "Compare these two images.")
    ;;;;   (describe-image:describe-image-simple "photo.jpg")
    ;;;;
    ;;;; Environment (read once, when this file is loaded):
    ;;;;   OLLAMA_MODEL - optional model override (default: qwen3.5:0.8b)
    ;;;;   OLLAMA_HOST  - optional override of the full chat endpoint URL
    ;;;;                  (default: http://localhost:11434/api/chat)
    ;;;;
    ;;;; Requires Quicklisp and the curl executable on the PATH.

    (ql:quickload '(:cl-base64 :cl-json :uiop) :silent t)

    (defpackage #:describe-image
      (:use #:cl)
      (:export #:image-to-text
               #:describe-image-simple
               #:*model-name*
               #:*ollama-host*))

    (in-package #:describe-image)

    ;; UIOP:GETENVP returns NIL for unset or empty variables, so the defaults apply.
    (defvar *model-name* (or (uiop:getenvp "OLLAMA_MODEL") "qwen3.5:0.8b")
      "Default vision-capable model for image queries.")

    (defvar *ollama-host* (or (uiop:getenvp "OLLAMA_HOST")
                              "http://localhost:11434/api/chat")
      "Ollama API endpoint for chat completions (a full URL, not just host:port).")

    (defun encode-image (image-path)
      "Read IMAGE-PATH file and return its contents as a base64-encoded string.
       Signals an error immediately if the file does not exist."
      (unless (probe-file image-path)
        (error "Image file not found: ~a" image-path))
      (with-open-file (in image-path :element-type '(unsigned-byte 8))
        (let ((bytes (make-array (file-length in) :element-type '(unsigned-byte 8))))
          (read-sequence bytes in)
          (cl-base64:usb8-array-to-base64-string bytes))))

    (defun lisp-to-json-string (data)
      "Convert a Lisp nested association list to a JSON string.
       Keyword keys become JSON object keys; :true/:false become JSON booleans;
       lists of cons cells become JSON objects; plain lists become JSON arrays."
      (with-output-to-string (s)
        (labels ((alist-object-p (lst)
                   "Return T if LST is a non-empty alist whose every car is a keyword."
                   (and (consp lst)
                        (every #'(lambda (elem)
                                   (and (consp elem)
                                        (keywordp (car elem))))
                               lst)))
                 (write-json-value (value)
                   (cond
                     ((null value)      (write-string "null"  s))
                     ((eq value :true)  (write-string "true"  s))
                     ((eq value :false) (write-string "false" s))
                     ((stringp value)
                      (write-char #\" s)
                      (loop for char across value
                            do (case char
                                 (#\"       (write-string "\\\"" s))
                                 (#\\       (write-string "\\\\" s))
                                 (#\Newline (write-string "\\n"  s))
                                 (#\Return  (write-string "\\r"  s))
                                 (#\Tab     (write-string "\\t"  s))
                                 ;; JSON forbids raw control characters: emit \uXXXX
                                 (t (if (< (char-code char) 32)
                                        (format s "\\u~4,'0x" (char-code char))
                                        (write-char char s)))))
                      (write-char #\" s))
                     ((integerp value)
                      (format s "~d" value))
                     ((realp value)
                      ;; ~F avoids Lisp-only syntax such as 1.5d0 or 1/2
                      (format s "~f" (coerce value 'double-float)))
                     ((symbolp value)
                      (write-json-value (string-downcase (symbol-name value))))
                     ((listp value)
                      (if (alist-object-p value)
                          ;; Encode as JSON object
                          (progn
                            (write-char #\{ s)
                            (loop for pair in value
                                  for i from 0
                                  when (> i 0) do (write-string ", " s)
                                  do (let ((key (car pair))
                                           (val (cdr pair)))
                                       (write-json-value (if (keywordp key)
                                                             (string-downcase (symbol-name key))
                                                             key))
                                       (write-char #\: s)
                                       (write-json-value val)))
                            (write-char #\} s))
                          ;; Encode as JSON array
                          (progn
                            (write-char #\[ s)
                            (loop for elem in value
                                  for i from 0
                                  when (> i 0) do (write-string ", " s)
                                  do (write-json-value elem))
                            (write-char #\] s))))
                     (t (error "Unsupported JSON type: ~a" (type-of value))))))
          (write-json-value data))))

    (defun parse-ollama-response (json-string)
      "Parse the Ollama JSON response string.
       Signals a descriptive error if the response is empty or contains an API error field.
       Returns the message content string on success."
      (when (or (null json-string) (string= json-string ""))
        (error "Empty response from Ollama; is the server running at ~a?" *ollama-host*))
      (let ((parsed (cl-json:decode-json-from-string json-string)))
        ;; Propagate server-side error messages rather than swallowing them
        (let ((err (cdr (assoc :error parsed))))
          (when err
            (error "Ollama API error: ~a" err)))
        (let ((message (cdr (assoc :message parsed))))
          (when message
            (cdr (assoc :content message))))))

    (defun call-ollama-vision (image-base64-list prompt &key (model *model-name*) (host *ollama-host*))
      "Call the Ollama vision API with a list of base64-encoded images and PROMPT.
       The JSON body is piped to curl's stdin from a string stream and the
       command is passed as a list, so no shell is involved.
       curl stderr is captured and reported on failure.
       Returns the response content string."
      (let* ((message (list (cons :|role|   "user")
                            (cons :|content| prompt)
                            (cons :|images|  image-base64-list)))
             (data    (list (cons :|model|    model)
                            (cons :|stream|   :false)
                            (cons :|messages| (list message))))
             (json-data (lisp-to-json-string data)))
        (multiple-value-bind (response-string stderr-string exit-code)
            ;; Pass args as a list: uiop bypasses the shell, preventing injection.
            ;; -s hides the progress meter; -S still writes error messages to stderr.
            (uiop:run-program (list "curl" "-sS" "-X" "POST" host
                                    "-H" "Content-Type: application/json"
                                    "--data-binary" "@-")
                              :input        (make-string-input-stream json-data)
                              :output       '(:string :stripped t)
                              :error-output '(:string :stripped t)
                              :ignore-error-status t)
          (unless (zerop exit-code)
            (error "curl failed (exit code ~a): ~a" exit-code stderr-string))
          (parse-ollama-response response-string))))

    (defun image-to-text (image-paths prompt &key (model nil) (host nil))
      "Send one or more images to an Ollama vision model with PROMPT and return text.

       IMAGE-PATHS may be a single path string or a list of path strings.
       Optional keyword arguments:
         :model - override *model-name*
         :host  - override *ollama-host*

       Examples:
         (describe-image:image-to-text \"test.jpg\" \"What is in this image?\")
         (describe-image:image-to-text \"ticket.png\" \"Print out the text\" :model \"llava\")
         (describe-image:image-to-text '(\"before.png\" \"after.png\") \"Compare these images.\")"
      (let ((paths (if (listp image-paths) image-paths (list image-paths))))
        ;; Validate all paths upfront before encoding any of them
        (dolist (p paths)
          (unless (probe-file p)
            (error "Image file not found: ~a" p)))
        (let ((encoded-list (mapcar #'encode-image paths))
              (use-model    (or model *model-name*))
              (use-host     (or host  *ollama-host*)))
          (call-ollama-vision encoded-list prompt :model use-model :host use-host))))

    (defun describe-image-simple (image-path)
      "Convenience wrapper: describe a single image using the default model and prompt.
       Equivalent to: (image-to-text IMAGE-PATH \"What is in this image?\")"
      (image-to-text image-path "What is in this image?"))

This lightweight module serves as an excellent foundation for embedding local, privacy-preserving vision capabilities into larger architectures. You could use it to automate the annotation and curation of proprietary training datasets by running bulk inference on unlabelled image directories, integrate it into a CI/CD pipeline for intelligent visual regression testing to compare “before and after” graphical states, or deploy it as the ingestion layer for an expert system that extracts structured semantic data from scanned technical diagrams and legacy documents without relying on external cloud APIs.
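The first of these ideas, bulk inference over an unlabelled image directory, can be sketched on top of image-to-text. This is a minimal sketch, not part of the module: annotate-directory and default-describer are hypothetical names, the default describer assumes describe-image.lisp has been loaded and an Ollama server is running, and the describer parameter exists so the loop can be exercised with a stub. Errors from individual files are caught so that one bad image does not abort the whole batch.

```lisp
;;; Hypothetical sketch: bulk-describe the images in one directory.
;;; Assumes describe-image.lisp is loaded (and Ollama is running)
;;; whenever the default describer is used.
(require "asdf")                        ; makes UIOP available

(defun default-describer (path prompt)
  "Call describe-image:image-to-text, looked up at run time so this code
   can be read before describe-image.lisp has been loaded."
  (uiop:symbol-call :describe-image :image-to-text (namestring path) prompt))

(defun annotate-directory (directory &key (prompt "Describe this image in one sentence.")
                                          (pattern "*.png")
                                          (describer #'default-describer))
  "Return an alist of (namestring . description) for the files in DIRECTORY
   matching PATTERN, sorted by name.  A file whose description signals an
   error is reported on *ERROR-OUTPUT* and recorded with NIL."
  (let* ((dir   (uiop:ensure-directory-pathname directory))
         (files (sort (directory (merge-pathnames pattern dir))
                      #'string< :key #'namestring)))
    (mapcar (lambda (path)
              (cons (namestring path)
                    (handler-case (funcall describer path prompt)
                      (error (e)
                        (format *error-output* "~&Skipping ~a: ~a~%" path e)
                        nil))))
            files)))
```

Because image-to-text falls back to the dynamic variable *model-name*, a whole batch can be pointed at another model with, for example, (let ((describe-image:*model-name* "llava")) (annotate-directory "images/")), assuming that model has been pulled locally.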

Example Program Output

Here is a sample run:

    $ sbcl
    * (load "describe-image.lisp")
    * (describe-image:image-to-text "ticket.png" "Print out the text in this image")
    "Fanfares and Fireworks
    Flagstaff Symphony Orchestra
    Ardrey Memorial Auditorium
    Friday, September 26, 2025
    7:30 PM (AZ)

    Level Section Row Seat
    Main Main Level M 31

    WJJNBY.1.2406.1498
    Friday, September 26, 2025 @ 7:30 PM

    Price Service Fee Ticket Type
    $53.00 $0.00 Early Bird Tickets
    New Subscriber C3

    The unique barcodes on this ticket allow only one entry to the event. If multiple copies of an ETTicket are made, the first copy of the ETTicket to arrive at the event will gain entry after scanning and validation. Other copies of this ticket will be denied entry."
    *