AutoContext: Prepare Effective Prompts with Context for LLM Queries
Dear reader, given a large corpus of text documents and a user query, how do we create a one-shot prompt that combines the query with a relatively small, tailored context drawn from that corpus?
We start by processing our large text corpus into small "chunks" of two or three sentences. We then combine BM25 (lexical search) and vector similarity (semantic search) into a hybrid search that, given a user's question, identifies a small number of relevant chunks. These chunks are used to construct a one-shot prompt containing this small, tailored context along with the user's original question.
The example in this chapter uses the magicl library and has been tested with SBCL.
The purpose of this example is to allow the use of small models with limited context windows while still taking advantage of huge text datasets.
Implementing the BM25 Algorithm
The following Common Lisp code in bm25.lisp provides a complete self-contained implementation of the Okapi BM25 ranking function, a powerful algorithm widely used in information retrieval and search engine technology. Unlike simpler term frequency-inverse document frequency (TF-IDF) models, BM25 is a probabilistic model that scores the relevance of documents to a given search query by considering not only the frequency of query terms within a document but also the document's length relative to the average length in the entire corpus. This allows it to penalize overly long documents and account for term saturation, where the relevance score doesn't increase proportionally after a term appears a certain number of times. This implementation encapsulates the logic within a bm25-index class, which pre-calculates and stores the necessary statistics about the text corpus, such as document frequencies, document lengths, and the average document length, to enable efficient scoring and retrieval of the most relevant text chunks for a user's query. In our application we use this code to process individual small text chunks and not entire documents. I used Google Gemini and gemini-cli to write and debug this code:
;; bm25.lisp NOTE: Generated by Gemini 2.5 Pro in Research Mode
;; and corrected by gemini-cli.
;;
;; A self-contained implementation of the Okapi BM25 ranking function.

(defpackage #:bm25
  (:use #:cl)
  (:export #:bm25-index
           #:make-bm25-index
           #:get-top-n))

(in-package #:bm25)

(defclass bm25-index ()
  ((doc-freqs :initarg :doc-freqs :reader doc-freqs)
   (doc-lengths :initarg :doc-lengths :reader doc-lengths)
   (avg-doc-length :initarg :avg-doc-length :reader avg-doc-length)
   (corpus-size :initarg :corpus-size :reader corpus-size)
   (corpus :initarg :corpus :reader corpus)
   (k1 :initarg :k1 :reader k1)
   (b :initarg :b :reader b)))

(defun make-bm25-index (tokenized-corpus &key (k1 1.5) (b 0.75))
  "Creates and initializes a BM25 index from a corpus of tokenized documents."
  (let* ((corpus-size (length tokenized-corpus))
         (doc-lengths (mapcar #'length tokenized-corpus))
         (avg-doc-length (/ (reduce #'+ doc-lengths) corpus-size))
         (doc-freqs (make-hash-table :test 'equal)))
    ;; Calculate document frequencies for each term
    (dolist (doc tokenized-corpus)
      (dolist (term (remove-duplicates doc :test #'string=))
        (incf (gethash term doc-freqs 0))))
    (make-instance 'bm25-index
                   :doc-freqs doc-freqs
                   :doc-lengths doc-lengths
                   :avg-doc-length avg-doc-length
                   :corpus-size corpus-size
                   :corpus tokenized-corpus
                   :k1 k1
                   :b b)))

(defmethod get-top-n ((index bm25-index) query-tokens n)
  "Returns the top N documents from the corpus for a given query."
  (let ((scores (loop for doc in (corpus index)
                      for i from 0
                      collect (cons (score-doc index query-tokens i) i))))
    ;; Sort by score descending
    (let* ((sorted-scores (sort scores #'> :key #'car))
           (top-scores (subseq sorted-scores 0 (min n (length sorted-scores)))))
      ;; Return the tokenized documents for the top-scoring indices
      ;; (the stored corpus is the tokenized corpus).
      (mapcar (lambda (score-pair)
                (nth (cdr score-pair) (corpus index)))
              top-scores))))

;;; Internal methods

(defmethod inverse-document-frequency ((index bm25-index) term)
  "Calculates the IDF for a given term."
  (let* ((doc-freq (gethash term (doc-freqs index) 0))
         (corpus-size (corpus-size index)))
    (log (/ (+ (- corpus-size doc-freq) 0.5) (+ doc-freq 0.5)) 10)))

(defmethod score-doc ((index bm25-index) query-tokens doc-index)
  "Calculates the BM25 score for a single document."
  (let* ((k1 (k1 index))
         (b (b index))
         (doc-length (nth doc-index (doc-lengths index)))
         (doc (nth doc-index (corpus index)))
         (avg-dl (avg-doc-length index))
         (score 0.0))
    (dolist (term query-tokens)
      (let* ((term-freq (count term doc :test #'string=))
             (idf (inverse-document-frequency index term)))
        (incf score (* idf (/ (* term-freq (+ k1 1))
                              (+ term-freq
                                 (* k1
                                    (+ (- 1 b)
                                       (* b (/ doc-length avg-dl))))))))))
    score))
The constructor function, make-bm25-index, handles the heavy lifting of calculating the document frequency for every term and determining the length of each document in the already-tokenized corpus. This upfront processing makes subsequent searches very efficient. The primary public-facing function, get-top-n, takes an index, a list of query tokens, and the desired number of results n. It scores every document against the query, sorts them in descending order of relevance, and returns the top n matching documents, providing a simple and effective interface for search.
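As a quick illustration of this interface, here is a minimal REPL sketch using a hypothetical three-document toy corpus (not part of the chapter's code); the corpus must already be tokenized into lists of strings:

;; Minimal usage sketch of the bm25 package with a hypothetical toy corpus.
(let* ((corpus '(("the" "cat" "sat" "on" "the" "mat")
                 ("dogs" "chase" "cats")
                 ("economics" "is" "the" "study" "of" "scarcity")))
       (index (bm25:make-bm25-index corpus)) ; defaults: k1 = 1.5, b = 0.75
       (query '("economics" "study")))
  ;; Return the single highest-scoring tokenized document.
  (bm25:get-top-n index query 1))
;; => (("economics" "is" "the" "study" "of" "scarcity"))

Note that get-top-n returns tokenized documents; in main.lisp below we join the tokens back into strings before building the prompt.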
The core of the relevance calculation is found in the score-doc method, which implements the BM25 formula. For each term in the user's query, it calculates a score based on three main components: the inverse document frequency (IDF), the term's frequency within the specific document, and the length of that document. The helper method inverse-document-frequency computes the IDF, which gives higher weight to terms that are rare across the corpus. The main score-doc method combines this with a term frequency component that is tunable via the k1 and b parameters: k1 controls term frequency saturation, while b controls the degree to which document length normalizes the score, making this a flexible and powerful tool for document ranking.
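In standard notation, the quantity computed by score-doc and inverse-document-frequency above for a query q and document D is:

$$\mathrm{score}(D,q) = \sum_{t \in q} \mathrm{IDF}(t)\,\frac{f(t,D)\,(k_1+1)}{f(t,D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \log_{10}\frac{N - n(t) + 0.5}{n(t) + 0.5}$$

where f(t,D) is the frequency of term t in D, |D| is the document length, avgdl is the average document length, N is the corpus size, and n(t) is the number of documents containing t.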
Implementing Vectorization of Text and Semantic Similarity
We will take advantage of Python's library support for deep learning and use a Python command line tool, generate_embeddings.py, that calculates vector embeddings for input text using a deep learning transformer model and returns the results in JSON format. Here is an example of calling this script manually:
echo "some text" | uv run generate_embeddings.py
This script will be called from our Common Lisp code.
For completeness, here is the listing of the Python script:
# generate_embeddings.py
#
# This script reads text lines from stdin, generates embeddings using
# sentence-transformers, and prints the result as a JSON array to stdout.
# This allows the Common Lisp program to easily get embeddings without
# needing a complex foreign function interface.
#
# Usage (from Lisp):
#   echo "some text" | python3 generate_embeddings.py

import sys
import json
from sentence_transformers import SentenceTransformer

def main():
    # Use a pre-trained model, which will be downloaded on first run.
    model_name = 'all-MiniLM-L6-v2'
    try:
        model = SentenceTransformer(model_name)
    except Exception as e:
        print(f"Error loading SentenceTransformer model: {e}", file=sys.stderr)
        sys.exit(1)

    # Read all lines from standard input.
    lines = [line.strip() for line in sys.stdin if line.strip()]

    if not lines:
        print("[]")  # Output empty JSON array if no input
        return

    # Generate embeddings and normalize them for cosine similarity.
    embeddings = model.encode(
        lines,
        normalize_embeddings=True,
        show_progress_bar=False  # Keep stdout clean for the Lisp process
    )

    # Convert the numpy array to a list of lists for JSON serialization.
    embeddings_list = embeddings.tolist()

    # Print the JSON output to stdout.
    json.dump(embeddings_list, sys.stdout)

if __name__ == '__main__':
    main()
The following function runs the Python script and is found in the file main.lisp that we discuss in the next section.
(defun generate-embeddings (text-list)
  "Calls the Python script to generate embeddings for a list of strings."
  (let* ((input-string (format nil "~{~a~%~}" text-list))
         (command "uv run generate_embeddings.py")
         (json-output (uiop:run-program command
                                        :input (make-string-input-stream input-string)
                                        :output :string)))
    (let* ((parsed (yason:parse json-output))
           (num-embeddings (length parsed))
           (embedding-dim (if (> num-embeddings 0) (length (first parsed)) 0))
           (flat-data (apply #'append parsed)))
      (magicl:from-list flat-data
                        (list num-embeddings embedding-dim)
                        :type 'double-float))))
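As a quick sanity check, the following REPL sketch (assuming the all-MiniLM-L6-v2 model used in the Python script, which produces 384-dimensional sentence embeddings) shows the shape of the matrix this function returns:

;; Minimal sketch of calling generate-embeddings from the REPL.
;; Assumes generate_embeddings.py is reachable via "uv run" in the
;; current directory and uses all-MiniLM-L6-v2 (384-dimensional vectors).
(let ((embeddings (generate-embeddings
                   '("the cat sat on the mat"
                     "economics is the study of scarcity"))))
  (list (magicl:nrows embeddings)    ; => 2   (one row per input string)
        (magicl:ncols embeddings)))  ; => 384 (embedding dimension)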
Implementation of Main Program
This listing shows main.lisp, the core of our hybrid Retrieval-Augmented Generation (RAG) system written in Common Lisp, which also constructs a one-shot prompt for later input to an LLM.
The auto-context class manages the entire lifecycle of loading, processing, and querying a local document collection. On instantiation, it scans a directory for text files, breaks them down into smaller, semantically coherent chunks, and then builds two parallel retrieval indices. The first is a classic BM25 sparse index for efficient keyword-based search, and the second is a dense index of vector embeddings for capturing semantic similarity. A key feature of this implementation is its pragmatic approach to interoperability; it generates embeddings by calling an external Python script that leverages the sentence-transformers library, communicating via standard I/O and JSON. The primary entry point for users is the get-prompt method, which takes a query, performs searches against both indices, merges the results into a unified context, and formats a complete prompt ready to be processed by a Large Language Model.
;; main.lisp
;; The core AutoContext class and application logic.

(defpackage #:autocontext
  (:use #:cl #:bm25)
  (:export #:auto-context
           #:get-prompt
           #:run-example))

(in-package #:autocontext)

(defclass auto-context ()
  ((chunks :reader chunks :initarg :chunks
           :documentation "A list of original text chunks.")
   (bm25-index :reader bm25-index :initarg :bm25-index
               :documentation "The BM25 sparse index.")
   (chunk-embeddings :reader chunk-embeddings :initarg :chunk-embeddings
                     :documentation "A magicl matrix of dense embeddings.")))

;;; Initialization

(defmethod initialize-instance :after ((ac auto-context) &key directory-path)
  "Constructor for the auto-context class. Loads data and builds indices."
  (format t "~&Initializing AutoContext from directory: ~a" directory-path)
  (let ((chunks (load-and-chunk-documents directory-path)))
    (when chunks
      (format t "~&Building sparse and dense retrievers...")
      (let* ((tokenized-chunks (mapcar #'tokenize chunks))
             (bm25 (make-bm25-index tokenized-chunks))
             (embeddings (generate-embeddings chunks)))
        ;; Using shared-initialize to set the slots after creation
        (shared-initialize ac t :chunks chunks
                                :bm25-index bm25
                                :chunk-embeddings embeddings))
      (format t "~&Initialization complete. AutoContext is ready."))))

(defun tokenize (text)
  "Simple whitespace tokenizer."
  (split-sequence:split-sequence #\Space text :remove-empty-subseqs t))

(defun list-txt-files-in-directory (dir-path)
  "Return a list of pathnames for *.txt files in the directory given by DIR-PATH (a string or pathname)."
  (let* ((base (uiop:parse-native-namestring dir-path))
         ;; ensure base is a directory pathname
         (dir (uiop:ensure-directory-pathname base))
         ;; create a wildcard pathname for *.txt under that directory
         (pattern (uiop:merge-pathnames*
                   (make-pathname :name :wild :type "txt")
                   dir)))
    (directory pattern)))

(defun split-into-sentences (text)
  "Splits text into a list of sentences based on punctuation. This is a heuristic approach and may not be perfect: it only splits on '.', '?', or '!' when followed by whitespace or a quote character."
  (let ((sentences '())
        (start 0))
    (loop for i from 0 below (length text)
          do (when (and (member (char text i) '(#\. #\? #\!))
                        (or (= (1+ i) (length text))
                            (member (char text (1+ i)) '(#\Space #\Newline #\"))))
               (push
                (string-trim '(#\Space #\Newline) (subseq text start (1+ i)))
                sentences)
               (setf start (1+ i))))
    (let ((last-part (string-trim '(#\Space #\Newline) (subseq text start))))
      (when (plusp (length last-part))
        (push last-part sentences)))
    (remove-if (lambda (s) (zerop (length s))) (nreverse sentences))))

(defun chunk-text (text &key (chunk-size 3))
  "Splits text into sentences and then groups them into chunks of chunk-size sentences."
  (let ((sentences (split-into-sentences text)))
    (loop for i from 0 below (length sentences) by chunk-size
          collect (let* ((end (min (+ i chunk-size) (length sentences)))
                         (sentence-group (subseq sentences i end)))
                    (string-trim
                     '(#\Space #\Newline)
                     (format nil "~{~a~^ ~}" sentence-group))))))

(defun load-and-chunk-documents (directory-path)
  "Loads .txt files and splits them into chunks of a few sentences."
  ;;(format t "&load-and-chunk-documents: directory-path=~A~%" directory-path)
  (let* ((chunks (remove-if (lambda (s) (string= s ""))
                            (loop for file in
                                      (list-txt-files-in-directory directory-path)
                                  nconcing (chunk-text (uiop:read-file-string file))))))
    (format t "~&Loaded ~d text chunks." (length chunks))
    ;;(format t "&load-and-chunk-documents: chunks=~A~%~%" chunks)
    chunks))

;;; Embedding Generation (Interface to Python)

(defun generate-embeddings (text-list)
  "Calls the Python script to generate embeddings for a list of strings."
  (let* ((input-string (format nil "~{~a~%~}" text-list))
         (command "uv run generate_embeddings.py")
         (json-output (uiop:run-program command
                                        :input (make-string-input-stream input-string)
                                        :output :string)))
    (let* ((parsed (yason:parse json-output))
           (num-embeddings (length parsed))
           (embedding-dim (if (> num-embeddings 0) (length (first parsed)) 0))
           (flat-data (apply #'append parsed)))
      (magicl:from-list flat-data
                        (list num-embeddings embedding-dim)
                        :type 'double-float))))

(defun cosine-similarity (vec1 vec2)
  "Calculates cosine similarity between two magicl vectors."
  (/ (magicl:dot vec1 vec2)
     (* (magicl:norm vec1) (magicl:norm vec2))))

(defun get-row-vector (matrix row-index)
  "Extracts a row from a matrix and returns it as a magicl vector."
  (let* ((num-cols (magicl:ncols matrix))
         (row-elements (loop for col-index from 0 below num-cols
                             collect (magicl:tref matrix row-index col-index))))
    (magicl:from-list row-elements (list num-cols)
                      :type (magicl:element-type matrix))))

(defmethod get-prompt ((ac auto-context) query &key (num-results 5))
  "Retrieves context and formats it into a prompt for an LLM."
  (format t "~&--- Retrieving context for query: '~a' ---" query)

  ;; 1. Sparse Search (BM25)
  (let* ((query-tokens (tokenize query))
         (bm25-docs (bm25:get-top-n (bm25-index ac) query-tokens num-results))
         (bm25-results
           (mapcar (lambda (tokens)
                     (format nil "~{~a~^ ~}" tokens))
                   bm25-docs)))
    (format t "~&BM25 found ~d keyword-based results." (length bm25-results))
    ;;(format t "~%~%bm25-results:~%~A~%~%" bm25-results)

    ;; 2. Dense Search (Vector Similarity)
    (let* ((query-embedding-matrix (generate-embeddings (list query)))
           (query-vector (get-row-vector query-embedding-matrix 0))
           (all-embeddings (chunk-embeddings ac))
           (similarities
             (loop for i from 0 below (magicl:nrows all-embeddings)
                   collect (cons (cosine-similarity
                                  query-vector
                                  (get-row-vector all-embeddings i))
                                 i)))
           (sorted-sim (sort similarities #'> :key #'car))
           (top-indices
             (mapcar #'cdr
                     (subseq sorted-sim 0
                             (min num-results (length sorted-sim)))))
           (vector-results
             (mapcar (lambda (i) (nth i (chunks ac))) top-indices)))
      (format t "~&Vector search found ~d semantic-based results."
              (length vector-results))
      ;;(format t "~%~%vector-results:~%~A~%~%" vector-results)

      ;; 3. Combine and deduplicate
      (let* ((combined (append bm25-results vector-results))
             (unique-results
               (remove-duplicates combined :test #'string= :from-end t)))
        (format t "~&Combined and deduplicated, we have ~d context chunks."
                (length unique-results))

        ;; 4. Format the final prompt
        (format nil "Based on the following context, please answer the question:~%~A~2%--- CONTEXT ---~%~{~a~^~%---~%~}--- END CONTEXT ---~2%Question: ~a~%Answer:"
                query unique-results query)))))

;;; Example Usage

(defun test2 ()
  "A simple top-level function to demonstrate the system."
  (let* ((ac (make-instance 'auto-context :directory-path "../data"))
         (query "who says that economics is bullshit?")
         (prompt (get-prompt ac query :num-results 2)))
    (format t "~&~%--- Generated Prompt for LLM ---~%~a" prompt)))
This code demonstrates a powerful hybrid search technique. By leveraging both BM25 and vector similarity, the retrieval process becomes more robust. BM25 is adept at finding chunks containing specific keywords or jargon present in the query, while the dense vector search excels at uncovering conceptually related content even when the exact wording differs. This dual approach mitigates the weaknesses of each individual method, leading to a more comprehensive and relevant context for an LLM. The auto-context class neatly encapsulates this complexity, holding the text chunks, the sparse index, and the dense embedding matrix as a single cohesive unit.
From an implementation perspective, the most notable detail is the seamless integration with Python for embedding generation. The generate-embeddings function wisely avoids the complexity of using a foreign function interface or trying to reimplement a transformer model in Lisp. Instead, it uses the simple and reliable method of running a command-line script and parsing its JSON output, a practical pattern for leveraging the strengths of different language ecosystems. Internally, the use of the magicl library to represent the embeddings as a matrix is also significant, as it provides an efficient, specialized data structure for the numerical calculations required for computing cosine similarity between the query and document vectors.
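Because the Python script normalizes the embeddings to unit length, the cosine similarity of two stored vectors reduces to their dot product; the explicit normalization in cosine-similarity simply makes the function safe for arbitrary vectors. Here is a small sketch with hand-made magicl vectors to show the calling convention:

;; Small sketch of cosine-similarity on hand-made magicl vectors.
(let ((v1 (magicl:from-list '(1.0d0 0.0d0 0.0d0) '(3) :type 'double-float))
      (v2 (magicl:from-list '(1.0d0 1.0d0 0.0d0) '(3) :type 'double-float)))
  (cosine-similarity v1 v2))
;; => approximately 0.7071 (the cosine of a 45 degree angle)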
Example Generated Prompt with Context
Here we run the test2 function defined at the bottom of the last code listing (some output removed for brevity):
$ sbcl
This is SBCL 2.5.3, an implementation of ANSI Common Lisp.
* (ql:quickload :autocontext)
To load "autocontext":
  Load 1 ASDF system:
    autocontext
; Loading "autocontext"
..................................................
[package bm25]....................................
[package autocontext]..

* (autocontext::test2)
Initializing AutoContext from directory: ../data
Loaded 22 text chunks.
Building sparse and dense retrievers...
Initialization complete. AutoContext is ready.
--- Retrieving context for query: 'who says that economics is bullshit?' ---
BM25 found 2 keyword-based results.
Vector search found 2 semantic-based results.
Combined and deduplicated, we have 4 context chunks.

--- Generated Prompt for LLM ---
Based on the following context, please answer the question:
who says that economics is bullshit?

--- CONTEXT ---

There exists an economic problem, subject to study by economic science, when a decision (choice) is made by one or more resource-controlling players to attain the best possible outcome under bounded rational conditions. In other words, resource-controlling agents maximize value subject to the constraints imposed by the information the agents have, their cognitive limitations, and the finite amount of time they have to make and execute a decision. Economic science centers on the activities of the economic agents that comprise society.[1] They are the focus of economic analysis.[2]
---
An interesting Economist is Pauli Blendergast who teaches at the University of Krampton Ohio and is famouse for saying economics is bullshit.
---
; Looking better can help you feel better. Schedule a realistic day. ; Avoid the tendency to schedule back-to-back appointments; allow time between appointments for a breathing spell.
---
which requires that you sit at a desk all day. ; If you hate to talk
politics, don't associate with people who love to talk politics, etc. Learn to live one day at a time.
--- END CONTEXT ---

Question: who says that economics is bullshit?
Answer:
*
Normally you would combine the code developed in this chapter into an application using a small LLM running on your laptop with Ollama or LM Studio, or a huge LLM like Gemini 2.5 Pro or GPT-5 via a commercial API.
Here, I feed the generated prompt into a tiny 3B model running on Ollama for test purposes (most output not shown):
$ ollama run gemma3:latest
>>> Based on the following context, please answer the question:
... who says that economics is bullshit?
...
... --- CONTEXT ---
...
... which requires that you sit at a desk all day. ; If you hate to talk
... politics, don't associate with people who love to talk politics, etc. Learn to live one day at a time.
... ---
... An interesting Economist is Pauli Blendergast who teaches at the University of Krampton Ohio and is famouse fo
... r saying economics is bullshit.
... ---
... ; Looking better can help you feel better. Schedule a realistic day. ; Avoid the tendency to schedule back-to-
... back appointments; allow time between appointments for a breathing spell.
... ---
... END CONTEXT ---
...
... Question: who says that economics is bullshit?
... Answer:
Pauli Blendergast

>>>
Wrap Up For Generating Prompts with Contexts
Large context models like Gemini 2.5 Pro have a context window of a million tokens, so in principle many large documents can be used as-is as context in a prompt.
My motivation for writing this example code is my desire to mostly write applications using smaller LLMs running locally with Ollama or LM Studio. Often these small models only support context sizes of 16K to 64K tokens, and they run slowly on my laptop when processing very long one-shot prompts.
I find that the example we developed here allows me to use small models and still take advantage of huge text datasets for context.