Agentic RAG Using the Gemini LLM APIs
This chapter implements an Agentic Retrieval-Augmented Generation (RAG) system in Common Lisp, inspired by Google’s June 2026 research blog post Unlocking Dependable Responses with Agentic RAG.
In the previous chapter on document question answering, we built a “vanilla” RAG system: embed documents, embed the query, find similar chunks, and pass them to an LLM for answer generation. That approach works well for simple factual questions, but falls short on complex queries that require information from multiple sources or where the first retrieval pass misses critical details.
Agentic RAG addresses this limitation by introducing multiple specialized agents that plan, rewrite queries, assess context sufficiency, and iteratively search until enough information is gathered to produce a reliable answer. The key insight from the Google research is the Sufficient Context Agent — a quality-control step that evaluates whether the retrieved passages actually contain enough information to answer the question, and if not, generates specific feedback about what’s missing so the system can refine its search.
The source code for this example is in the directory src/RAG of the book’s GitHub repository. It uses the Gemini text-embedding-004 model for embeddings (free tier) and gemini-2.0-flash for all agent LLM calls (very inexpensive).
Overview of the Agentic RAG Architecture
The system implements a multi-agent pipeline with five phases:
- Query Rewriting — A Gemini-powered agent decomposes complex questions into 1–3 focused sub-queries for retrieval.
- Search Fanout — Each sub-query is embedded and searched across multiple document corpora. Results are deduplicated.
- Sufficient Context Assessment — A specialized agent evaluates whether the retrieved passages contain enough information to fully answer the original question.
- Iterative Refinement — If context is insufficient, the system generates refined search queries based on feedback about what’s missing, then searches again. This loop repeats up to a configurable limit.
- Synthesis — Once context is sufficient (or the iteration limit is reached), a synthesis agent generates a grounded answer citing source documents.
This differs fundamentally from vanilla RAG. In a vanilla system, if the first retrieval doesn’t find the right passages, the LLM is left to produce a partial answer or to hallucinate the missing details. In agentic RAG, the system recognizes the gap and actively searches for the missing information.
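The five phases can be summarized as a small control-flow skeleton. The sketch below is a hypothetical run-pipeline function, not the chapter’s agentic-rag. Every agent is passed in as an ordinary function, so the loop can be exercised with stubs and no API calls. The function name and its keyword arguments are assumptions made for illustration.

```lisp
;;; Hypothetical skeleton of the five-phase agentic RAG loop.
;;; Each keyword argument is a stand-in function for one agent.
(defun run-pipeline (question &key rewrite search assess refine synthesize
                                   (max-iterations 3))
  "Run rewrite -> search -> (assess -> refine -> search)* -> synthesize.
REWRITE and REFINE return lists of queries, SEARCH maps a list of queries
to a list of passages, and ASSESS returns (values SUFFICIENT-P FEEDBACK).
Returns two values: the synthesized answer and the number of iterations."
  (let ((passages (funcall search (funcall rewrite question))) ; phases 1-2
        (iteration 0))
    (loop
      (incf iteration)
      (multiple-value-bind (sufficient-p feedback)
          (funcall assess question passages)                     ; phase 3
        (when (or sufficient-p (>= iteration max-iterations))
          (return (values (funcall synthesize question passages) ; phase 5
                          iteration)))
        ;; Phase 4: refine the queries from the feedback, search again,
        ;; and accumulate any passages not already held.
        (setf passages
              (union passages
                     (funcall search (funcall refine question feedback))
                     :test #'equal))))))
```

With stub agents, an assess function that reports sufficiency once two passages are held makes the loop stop after the second assessment. An assess function that never reports sufficiency stops at max-iterations, and the answer is synthesized from whatever was gathered.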
Project Structure
The project is organized as an ASDF system with five source files and a data directory:
| File | Description |
|---|---|
| package.lisp | Package definition and exported symbols |
| embeddings.lisp | Gemini text-embedding-004 integration and cosine similarity |
| vector-store.lisp | In-memory vector store with document chunking |
| agents.lisp | Multi-agent pipeline (rewriter, search, sufficiency, synthesis) |
| rag.lisp | Top-level API, interactive demo, and test code |
| data/ | Sample text documents for the demo |
The ASDF system definition in rag.asd is straightforward:
```lisp
(asdf:defsystem #:rag
  :description "Agentic RAG (Retrieval-Augmented Generation) using Gemini"
  :author "Mark Watson"
  :license "Apache 2"
  :version "1.0.0"
  :serial t
  :depends-on (#:llm #:cl-json #:uiop)
  :components ((:file "package")
               (:file "embeddings")
               (:file "vector-store")
               (:file "agents")
               (:file "rag")))
```
The package exports the main entry points:
```lisp
(defpackage #:rag
  (:use #:cl)
  (:export #:make-corpus
           #:add-document
           #:query
           #:agentic-rag
           #:interactive-demo
           #:test))
```
Computing Embeddings With the Gemini API
The file embeddings.lisp provides the foundation for semantic search. We use Google’s text-embedding-004 model, which produces 768-dimensional vectors and is available on the free tier.
The function get-embedding calls the Gemini embedding API via curl, sending a text string and receiving back a vector of floating-point numbers. We also define cosine-similarity to compare two embedding vectors — this is how we determine which document chunks are most relevant to a query.
```lisp
(in-package #:rag)

(defvar *embedding-model* "text-embedding-004")
(defvar *embedding-api-url*
  "https://generativelanguage.googleapis.com/v1beta/models/")

(defun get-google-api-key ()
  (or (uiop:getenv "GOOGLE_API_KEY")
      (error "GOOGLE_API_KEY environment variable is not set")))

(defun get-embedding (text)
  "Compute an embedding vector for TEXT using Gemini text-embedding-004.
Returns a list of floats."
  (let* ((api-url (concatenate 'string
                               *embedding-api-url*
                               *embedding-model*
                               ":embedContent"
                               "?key=" (get-google-api-key)))
         (payload (make-hash-table :test 'equal)))
    ;; Build the request payload:
    ;; {"model": "models/...", "content": {"parts": [{"text": TEXT}]}}
    (let ((content-ht (make-hash-table :test 'equal))
          (part-ht (make-hash-table :test 'equal)))
      (setf (gethash "text" part-ht) text)
      (setf (gethash "parts" content-ht) (list part-ht))
      (setf (gethash "content" payload) content-ht)
      (setf (gethash "model" payload)
            (concatenate 'string "models/" *embedding-model*)))
    (let* ((payload-json (cl-json:encode-json-to-string payload))
           (curl-command
             (list "curl" "-s" "-X" "POST"
                   "-H" "Content-Type: application/json"
                   "-d" payload-json
                   api-url))
           (response-string
             (uiop:run-program curl-command
                               :output :string
                               :error-output :string
                               :ignore-error-status t))
           (decoded (cl-json:decode-json-from-string response-string))
           (embedding-obj (cdr (assoc :EMBEDDING decoded)))
           (embedding-values (cdr (assoc :VALUES embedding-obj))))
      ;; An error response has no "embedding" field. Fail loudly instead of
      ;; returning NIL, which would silently score as similarity 0.0.
      (unless embedding-values
        (error "Embedding request failed: ~A" response-string))
      (format t "~%DEBUG get-embedding: got ~A-dimensional vector for ~S~%"
              (length embedding-values)
              (subseq text 0 (min 60 (length text))))
      embedding-values)))

(defun dot-product (vec-a vec-b)
  "Compute the dot product of two numeric lists."
  (loop for a in vec-a
        for b in vec-b
        sum (* a b)))

(defun vector-magnitude (vec)
  "Compute the magnitude (L2 norm) of a numeric list."
  (sqrt (loop for x in vec sum (* x x))))

(defun cosine-similarity (vec-a vec-b)
  "Compute cosine similarity between two embedding vectors.
Returns a value between -1 and 1."
  (let ((mag-a (vector-magnitude vec-a))
        (mag-b (vector-magnitude vec-b)))
    (if (or (zerop mag-a) (zerop mag-b))
        0.0
        (/ (dot-product vec-a vec-b) (* mag-a mag-b)))))
```
The embedding API returns a JSON response containing a list of floating-point values. Cosine similarity measures how closely two vectors point in the same direction in the high-dimensional embedding space, regardless of their magnitudes. A value near 1.0 means the texts are very close in meaning, a value near 0.0 means they are largely unrelated, and negative values indicate vectors pointing in opposing directions.
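In symbols, for two n-component embedding vectors a and b (n = 768 for text-embedding-004), cosine-similarity computes the expression below. The two small worked cases use made-up vectors purely for illustration: doubling a vector leaves the similarity at 1, and perpendicular vectors score 0.

```latex
\cos(\mathbf{a},\mathbf{b})
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}

\mathbf{a} = (1,2,3),\; \mathbf{b} = 2\mathbf{a} = (2,4,6):\quad
\cos(\mathbf{a},\mathbf{b}) = \frac{1\cdot 2 + 2\cdot 4 + 3\cdot 6}{\sqrt{14}\,\sqrt{56}}
  = \frac{28}{\sqrt{784}} = \frac{28}{28} = 1

\mathbf{u} = (1,0),\; \mathbf{v} = (0,1):\quad
\cos(\mathbf{u},\mathbf{v}) = \frac{1\cdot 0 + 0\cdot 1}{1 \cdot 1} = 0
```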
In-Memory Vector Store
The file vector-store.lisp implements a simple in-memory document store. Production systems would use a dedicated vector database like Pinecone or Chroma, but for a book example, an in-memory list with brute-force cosine similarity is clearer and requires zero setup.
We define two structs: document-chunk holds a piece of text with its source filename and embedding vector, and corpus is a named collection of chunks:
```lisp
(defstruct document-chunk
  "A chunk of text with its source file and embedding vector."
  text
  source
  embedding)

(defstruct corpus
  "A named collection of document chunks for retrieval."
  name
  description
  (chunks nil))
```
The function split-into-chunks breaks a long text into overlapping pieces of approximately 500 characters each, trying to break at sentence boundaries (periods or newlines) rather than cutting words in half:
```lisp
(defvar *default-chunk-size* 500)
(defvar *chunk-overlap* 50)

(defun split-into-chunks (text &key (chunk-size *default-chunk-size*)
                                    (overlap *chunk-overlap*))
  "Split TEXT into overlapping chunks of approximately CHUNK-SIZE characters.
Tries to break at sentence boundaries when possible."
  (let ((chunks nil)
        (len (length text))
        (start 0))
    (loop while (< start len)
          do (let* ((end (min (+ start chunk-size) len))
                    ;; Look for a period (or, failing that, a newline) in
                    ;; the last 80 characters of the window.
                    (break-pos
                      (if (>= end len)
                          end
                          (or (position #\. text :start (max start (- end 80))
                                                 :end end :from-end t)
                              (position #\Newline text :start (max start (- end 80))
                                                       :end end :from-end t)
                              end)))
                    (actual-end (if (< break-pos end)
                                    (1+ break-pos)
                                    end))
                    (chunk (string-trim '(#\Space #\Newline #\Tab)
                                        (subseq text start actual-end))))
               (when (> (length chunk) 0)
                 (push chunk chunks))
               (if (>= actual-end len)
                   (setf start len)
                   ;; Step back by OVERLAP characters, but always move forward
                   ;; at least one character so that small CHUNK-SIZE values
                   ;; cannot cause an infinite loop.
                   (setf start (max (1+ start) (- actual-end overlap))))))
    (nreverse chunks)))
```
The overlap between chunks (50 characters by default) reduces the risk of losing information at chunk boundaries: each new chunk begins 50 characters before the previous one ended, so text near a boundary appears in both chunks.
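The overlap arithmetic is easiest to see with the sentence-boundary search removed. The sketch below uses a simplified fixed-window chunker; fixed-window-chunks is a hypothetical helper, not part of the book’s code. Each window starts CHUNK-SIZE minus OVERLAP characters after the previous one.

```lisp
;;; Simplified, hypothetical chunker: fixed windows with no
;;; sentence-boundary search, to isolate the overlap arithmetic.
(defun fixed-window-chunks (text chunk-size overlap)
  "Split TEXT into windows of CHUNK-SIZE characters (the last may be
shorter). Each window starts OVERLAP characters before the previous
one ended, so successive starts are CHUNK-SIZE - OVERLAP apart."
  (assert (and (<= 0 overlap) (< overlap chunk-size)) ()
          "OVERLAP must satisfy 0 <= OVERLAP < CHUNK-SIZE")
  (let ((len (length text))
        (step (- chunk-size overlap)))
    (when (zerop len)
      (return-from fixed-window-chunks nil))
    (loop for start from 0 by step
          for end = (min len (+ start chunk-size))
          collect (subseq text start end)
          until (>= end len))))

;; (fixed-window-chunks "abcdefghijklmnopqrst" 8 3)
;; => ("abcdefgh" "fghijklm" "klmnopqr" "pqrst")
```

With a chunk size of 8 and an overlap of 3, each pair of neighboring chunks shares exactly three characters (fgh, klm, pqr). In split-into-chunks the window end first moves back to the nearest period or newline, and the next chunk then starts 50 characters before that break.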
The function add-document reads a file, chunks it, computes embeddings, and stores everything in a corpus. The function search-corpora finds the top-K most similar chunks for a given query embedding across several corpora; it relies on a per-corpus helper, search-corpus, whose listing is not shown here:
```lisp
(defun add-document (corpus filepath &key (chunk-size *default-chunk-size*))
  "Read a text file, split it into chunks, compute embeddings,
and add the chunks to CORPUS. Returns the number of chunks added."
  (format t "~%DEBUG add-document: loading ~A~%" filepath)
  (let* ((text (read-file-contents filepath))
         (chunks (split-into-chunks text :chunk-size chunk-size))
         (source (file-namestring filepath))
         (count 0))
    (format t "DEBUG add-document: split into ~A chunks~%" (length chunks))
    (dolist (chunk-text chunks)
      (let ((embedding (get-embedding chunk-text)))
        (push (make-document-chunk :text chunk-text
                                   :source source
                                   :embedding embedding)
              (corpus-chunks corpus))
        (incf count)))
    (format t "DEBUG add-document: added ~A chunks from ~A~%" count source)
    count))

(defun search-corpora (corpora query-embedding &key (top-k 3))
  "Search multiple CORPORA for the TOP-K most similar chunks overall.
Returns a list of (score . document-chunk) pairs."
  (let ((all-results
          (loop for corpus in corpora
                append (search-corpus corpus query-embedding
                                      :top-k top-k))))
    (subseq (sort all-results #'> :key #'car)
            0 (min top-k (length all-results)))))
```
An important feature for agentic RAG is that search-corpora accepts a list of corpora, enabling cross-corpus retrieval. The Google research article emphasizes this capability: real-world knowledge is often spread across separate databases managed by different teams. Our system searches every corpus in a single call (one corpus after another) and returns the best results regardless of source.
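The listing for search-corpora calls search-corpus without showing it. One plausible brute-force implementation, written here as an assumption rather than the book’s actual code, scores every chunk with cosine similarity and keeps the best TOP-K. Minimal copies of the two structs and of cosine-similarity are included so the sketch runs on its own.

```lisp
;;; Minimal stand-ins for the chapter's structs (docstrings omitted).
(defstruct document-chunk text source embedding)
(defstruct corpus name description (chunks nil))

(defun cosine-similarity (vec-a vec-b)
  "Cosine similarity of two numeric lists; 0.0 if either has zero magnitude."
  (let ((dot (loop for a in vec-a for b in vec-b sum (* a b)))
        (mag-a (sqrt (loop for x in vec-a sum (* x x))))
        (mag-b (sqrt (loop for x in vec-b sum (* x x)))))
    (if (or (zerop mag-a) (zerop mag-b))
        0.0
        (/ dot (* mag-a mag-b)))))

;;; Hypothetical brute-force search-corpus: score every chunk, keep TOP-K.
(defun search-corpus (corpus query-embedding &key (top-k 3))
  "Return a fresh list of up to TOP-K (score . document-chunk) pairs from
CORPUS, ordered by descending cosine similarity to QUERY-EMBEDDING."
  (let ((scored (mapcar (lambda (chunk)
                          (cons (cosine-similarity
                                 query-embedding
                                 (document-chunk-embedding chunk))
                                chunk))
                        (corpus-chunks corpus))))
    (setf scored (sort scored #'> :key #'car))
    (subseq scored 0 (min top-k (length scored)))))
```

The result is a freshly consed list, so a caller such as search-corpora can safely sort the combined results with the destructive sort.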
The Multi-Agent Pipeline
The file agents.lisp is the heart of the system. Each “agent” is an ordinary Lisp function: the query rewriter, the Sufficient Context Agent, and the synthesis agent each call Gemini with a specialized prompt, while search fanout relies only on embeddings. This is a practical and effective pattern: we don’t need an external agent framework to implement agent behaviors, just well-crafted prompts and structured response parsing.
We use gemini-2.0-flash for all agent calls. This model is very inexpensive while being capable enough for query rewriting, sufficiency assessment, and synthesis. The function rag-generate wraps the Gemini Interactions API:
```lisp
(defvar *rag-model* "gemini-2.0-flash")

(defvar *interactions-api-url*
  "https://generativelanguage.googleapis.com/v1beta/interactions")

(defun rag-generate (prompt &key (model-id *rag-model*))
  "Call the Gemini Interactions API via curl.
Returns the generated text string."
  (let ((payload (make-hash-table :test 'equal)))
    (setf (gethash "model" payload) model-id
          (gethash "input" payload) prompt)
    (let* ((payload-json (cl-json:encode-json-to-string payload))
           (api-key (get-google-api-key))
           (curl-command
             (list "curl" "-s" "-X" "POST"
                   "-H" "Content-Type: application/json"
                   "-H" (concatenate 'string "x-goog-api-key: " api-key)
                   "-H" "Api-Revision: 2026-05-20"
                   "-d" payload-json
                   *interactions-api-url*))
           (response-string
             (uiop:run-program curl-command
                               :output :string
                               :error-output :string
                               :ignore-error-status t))
           (decoded (cl-json:decode-json-from-string response-string))
           (steps (cdr (assoc :STEPS decoded))))
      ;; Extract text from the last model_output step. Signal an error that
      ;; shows the raw response if there is none (for example, an API error).
      (or (loop for step in (reverse steps)
                when (string-equal (cdr (assoc :TYPE step)) "model_output")
                  return (let* ((content (cdr (assoc :CONTENT step)))
                                (first-content (first content)))
                           (cdr (assoc :TEXT first-content))))
          (error "No model_output text in API response: ~A" response-string)))))
```
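The response-handling half of rag-generate can be checked without network access. The sketch below pulls it into a hypothetical helper, extract-last-model-text, and runs it on a hand-built alist. The mock mirrors only the fields rag-generate reads (steps, type, content, text) in the shape cl-json’s decoder would produce. It is an assumption based on the listing above, not a complete description of the Interactions API response.

```lisp
;;; Hypothetical helper mirroring the extraction logic in rag-generate.
(defun extract-last-model-text (decoded)
  "Return the text of the first content part of the last step whose :TYPE
is \"model_output\" in DECODED, or NIL if there is no such step."
  (loop for step in (reverse (cdr (assoc :steps decoded)))
        when (string-equal (cdr (assoc :type step)) "model_output")
          return (cdr (assoc :text (first (cdr (assoc :content step)))))))

;;; Assumed shape of a decoded response: (:steps . list-of-step-alists),
;;; where each step's :content is a list of part alists.
(defparameter *mock-decoded-response*
  '((:id . "interaction-1")
    (:steps ((:type . "user_input")
             (:content ((:text . "What is RAG?"))))
            ((:type . "model_output")
             (:content ((:text . "An earlier draft."))))
            ((:type . "model_output")
             (:content ((:text . "The final answer.")))))))

;; (extract-last-model-text *mock-decoded-response*) => "The final answer."
```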
Agent 1: The Query Rewriter
The Query Rewriter takes a complex user question and decomposes it into 1–3 focused sub-queries. For example, the question “How does the carbon footprint of manufacturing EV batteries compare to the emissions saved by charging EVs from renewable energy?” would be split into sub-queries like:
- “carbon footprint of EV battery manufacturing”
- “emissions saved by charging electric vehicles from renewable energy”
This decomposition improves retrieval because each sub-query targets a specific fact that might appear in a different document or section.
```lisp
(defun rewrite-queries (user-query)
  "Decompose USER-QUERY into 1-3 focused sub-queries for retrieval.
Returns a list of query strings."
  (format t "~%DEBUG rewrite-queries: decomposing query...~%")
  (let* ((prompt
           (format nil
                   "You are a search query rewriter for a RAG system. ~
                    Your job is to break a complex user question into ~
                    1-3 simple, focused search queries that will help ~
                    retrieve relevant information from a document collection.~%~
                    ~%Rules:~
                    ~%- Output ONLY the queries, one per line~
                    ~%- No numbering, bullets, or extra text~
                    ~%- Each query should target a specific fact or concept~
                    ~%- Keep queries concise (under 15 words each)~
                    ~%~%User question: ~A" user-query))
         (response (rag-generate prompt))
         (queries (remove-if (lambda (s) (zerop (length s)))
                             (mapcar (lambda (line)
                                       ;; Strip stray bullets or numbering from
                                       ;; the front only, so a trailing digit
                                       ;; (e.g. a year like 2023) survives.
                                       (string-trim
                                        '(#\Space #\Tab #\Return)
                                        (string-left-trim
                                         '(#\Space #\Tab #\- #\* #\1 #\2 #\3 #\.)
                                         line)))
                                     (uiop:split-string response
                                                        :separator '(#\Newline))))))
    (format t "DEBUG rewrite-queries: generated ~A sub-queries:~%~{ - ~A~%~}"
            (length queries) queries)
    (if queries
        queries
        (list user-query))))
```
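Because the parsing step in rewrite-queries is plain string processing, it can be tested without calling Gemini. The sketch below isolates it in a hypothetical helper, parse-sub-queries, and uses a small split-lines function (also hypothetical) written in standard Common Lisp in place of uiop:split-string. It strips bullets and numbering only from the start of each line, so a trailing year such as 2023 survives.

```lisp
(defun split-lines (string)
  "Split STRING on #\\Newline into a list of substrings."
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Newline string :start start)
        collect (subseq string start pos)
        while pos))

;;; Hypothetical helper isolating the parsing step of rewrite-queries.
(defun parse-sub-queries (response user-query)
  "Turn the rewriter's RESPONSE into a list of query strings. Bullets and
numbering are stripped from the left only; whitespace from both ends.
Falls back to (list USER-QUERY) if no non-empty line remains."
  (let ((queries
          (remove-if (lambda (s) (zerop (length s)))
                     (mapcar (lambda (line)
                               (string-trim '(#\Space #\Tab #\Return)
                                            (string-left-trim
                                             '(#\Space #\Tab #\- #\* #\1 #\2 #\3 #\.)
                                             line)))
                             (split-lines response)))))
    (or queries (list user-query))))
```

A query that genuinely begins with 1, 2, or 3 would still lose that digit; a stricter parser could match a complete list marker such as "2. " instead of trimming individual characters.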
Agent 2: Search Fanout
The Search Fanout agent executes the sub-queries against all corpora. For each sub-query, it computes an embedding and searches for the most similar document chunks. Results are deduplicated to avoid showing the same passage twice when multiple sub-queries match the same text.
```lisp
(defun search-fanout (corpora sub-queries &key (top-k 3))
  "Execute embedding search across CORPORA for each sub-query.
Returns a deduplicated list of (score . document-chunk) pairs."
  (format t "~%DEBUG search-fanout: searching ~A corpora with ~A queries~%"
          (length corpora) (length sub-queries))
  (let ((all-results nil)
        (seen-texts (make-hash-table :test 'equal)))
    (dolist (query sub-queries)
      (format t "DEBUG search-fanout: embedding query: ~S~%" query)
      (let* ((query-embedding (get-embedding query))
             (results (search-corpora corpora query-embedding :top-k top-k)))
        (dolist (result results)
          (let ((text (document-chunk-text (cdr result))))
            (unless (gethash text seen-texts)
              (setf (gethash text seen-texts) t)
              (push result all-results))))))
    (let ((sorted (sort all-results #'> :key #'car)))
      (format t "DEBUG search-fanout: found ~A unique chunks~%" (length sorted))
      sorted)))
```
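One design detail is worth noting: when two sub-queries retrieve the same chunk, search-fanout keeps the score from whichever sub-query found it first. The hypothetical variant below, merge-results, keeps the highest score instead. To stay self-contained it uses (score . text) pairs with plain strings in place of document-chunk structs.

```lisp
;;; Hypothetical variant of search-fanout's deduplication that keeps the
;;; best score per text. Pairs are (score . text) with plain strings.
(defun merge-results (result-lists)
  "Merge RESULT-LISTS, each a list of (score . text) pairs, so that every
text appears once with its highest score. Sorted by descending score."
  (let ((best (make-hash-table :test 'equal)))
    (dolist (results result-lists)
      (dolist (pair results)
        (let ((old (gethash (cdr pair) best)))
          (when (or (null old) (> (car pair) old))
            (setf (gethash (cdr pair) best) (car pair))))))
    (let ((merged nil))
      (maphash (lambda (text score) (push (cons score text) merged)) best)
      (sort merged #'> :key #'car))))
```

Keeping the maximum means a chunk that one sub-query matched weakly and another matched strongly is ranked by its strong match.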
Agent 3: The Sufficient Context Agent
This is the key innovation from the Google research. After retrieval, the Sufficient Context Agent evaluates whether the passages actually contain enough information. It asks Gemini to produce a structured verdict: SUFFICIENT or INSUFFICIENT, with a reason and a description of what’s missing.
The structured output format (VERDICT/REASON/MISSING) makes it straightforward to parse the LLM’s response programmatically:
```lisp
(defun assess-sufficiency (user-query retrieved-chunks)
  "Evaluate whether RETRIEVED-CHUNKS provide sufficient context
to answer USER-QUERY. Returns two values:
  1. SUFFICIENT-P: T if context is sufficient, NIL otherwise
  2. FEEDBACK: String describing what information is missing."
  (format t "~%DEBUG assess-sufficiency: evaluating ~A chunks~%"
          (length retrieved-chunks))
  (let* ((context (format-retrieved-chunks retrieved-chunks))
         (prompt
           (format nil
                   "You are a Sufficient Context Agent in an agentic RAG system. ~
                    Your role is to evaluate whether the retrieved passages ~
                    contain enough information to fully answer the user's question.~%~
                    ~%User Question: ~A~%~
                    ~%Retrieved Passages:~A~%~
                    ~%Evaluate carefully:~
                    ~%1. Does the context contain ALL the specific facts needed?~
                    ~%2. Are there any parts of the question left unanswered?~
                    ~%3. Is any critical information missing?~
                    ~%~%Respond in EXACTLY this format:~
                    ~%VERDICT: SUFFICIENT or INSUFFICIENT~
                    ~%REASON: (one sentence explaining your assessment)~
                    ~%MISSING: (if insufficient, describe what specific ~
                    information to search for next; if sufficient, write NONE)"
                   user-query context))
         (response (rag-generate prompt)))
    (format t "DEBUG assess-sufficiency response:~%~A~%" response)
    (let* ((lines (uiop:split-string response :separator '(#\Newline)))
           (verdict-line (find-if (lambda (line)
                                    (search "VERDICT:" line :test #'char-equal))
                                  lines))
           ;; "INSUFFICIENT" contains "SUFFICIENT", so rule it out explicitly.
           (sufficient-p (and verdict-line
                              (search "SUFFICIENT" verdict-line :test #'char-equal)
                              (not (search "INSUFFICIENT" verdict-line
                                           :test #'char-equal))))
           (missing-line (find-if (lambda (line)
                                    (search "MISSING:" line :test #'char-equal))
                                  lines))
           (feedback (if missing-line
                         ;; Skip past "MISSING:" (8 characters) and drop any
                         ;; whitespace or Markdown bold markers around the text.
                         (string-trim '(#\Space #\Tab #\Return #\*)
                                      (subseq missing-line
                                              (+ (search "MISSING:" missing-line
                                                         :test #'char-equal)
                                                 8)))
                         "No specific feedback available")))
      (format t "DEBUG assess-sufficiency: verdict=~A~%"
              (if sufficient-p "SUFFICIENT" "INSUFFICIENT"))
      (values sufficient-p feedback))))
```
The two return values — sufficient-p (a boolean) and feedback (a string describing what’s missing) — drive the orchestrator’s decision to either synthesize an answer or refine the search.
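The parsing logic can be exercised offline by moving it into a function of the raw response text. The sketch below uses the hypothetical names parse-verdict, find-line-containing, and split-lines. It applies the same VERDICT/MISSING line search as assess-sufficiency and also tolerates Markdown bold markers (**VERDICT:**), a formatting deviation models sometimes introduce.

```lisp
(defun split-lines (string)
  "Split STRING on #\\Newline into a list of substrings."
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Newline string :start start)
        collect (subseq string start pos)
        while pos))

(defun find-line-containing (marker lines)
  "Return the first element of LINES containing MARKER, ignoring case."
  (find-if (lambda (line) (search marker line :test #'char-equal)) lines))

(defun parse-verdict (response)
  "Parse a VERDICT/REASON/MISSING reply. Returns two values: T if the
verdict line says SUFFICIENT (and not INSUFFICIENT), else NIL; and the
text after MISSING:, or a default message if there is no MISSING line."
  (let* ((lines (split-lines response))
         (verdict-line (find-line-containing "VERDICT:" lines))
         (missing-line (find-line-containing "MISSING:" lines)))
    (values (and verdict-line
                 ;; INSUFFICIENT contains SUFFICIENT, so test it first.
                 (not (search "INSUFFICIENT" verdict-line :test #'char-equal))
                 (search "SUFFICIENT" verdict-line :test #'char-equal)
                 t)
            (if missing-line
                (string-trim '(#\Space #\Tab #\Return #\*)
                             (subseq missing-line
                                     (+ (search "MISSING:" missing-line
                                                :test #'char-equal)
                                        (length "MISSING:"))))
                "No specific feedback available"))))
```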
Agent 4: The Synthesis Agent
When the context is deemed sufficient, the Synthesis Agent generates the final answer. It is instructed to use only the retrieved passages and to cite source filenames:
```lisp
(defun synthesize-answer (user-query retrieved-chunks)
  "Generate a grounded answer to USER-QUERY using RETRIEVED-CHUNKS.
The answer cites source documents."
  (format t "~%DEBUG synthesize-answer: generating answer from ~A chunks~%"
          (length retrieved-chunks))
  (let* ((context (format-retrieved-chunks retrieved-chunks))
         (prompt
           (format nil
                   "You are a Synthesis Agent in a RAG system. Generate a ~
                    clear, accurate answer to the user's question using ONLY ~
                    the information in the retrieved passages below. ~
                    ~%~%Rules:~
                    ~%- Base your answer strictly on the retrieved passages~
                    ~%- Cite sources by mentioning the source filename~
                    ~%- If the passages don't fully answer the question, ~
                    say what you can answer and note what's missing~
                    ~%- Be concise but thorough~
                    ~%~%User Question: ~A~
                    ~%~%Retrieved Passages:~A"
                   user-query context))
         (response (rag-generate prompt)))
    (format t "DEBUG synthesize-answer: generated response (~A chars)~%"
            (length response))
    response))
```
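Both assess-sufficiency and synthesize-answer call format-retrieved-chunks, whose listing is not shown in this chapter. The sketch below is a guess at a reasonable implementation, not the book’s code: it renders each (score . document-chunk) pair as a numbered passage labeled with its source filename, which is what the synthesis prompt needs in order to cite sources. The output starts with a blank line because the prompts insert it directly after "Retrieved Passages:".

```lisp
;;; Minimal stand-in for the chapter's struct (docstring omitted).
(defstruct document-chunk text source embedding)

;;; Hypothetical sketch; the chapter's own version is not listed.
(defun format-retrieved-chunks (scored-chunks)
  "Render a list of (score . document-chunk) pairs as numbered passages,
each headed by its source filename and similarity score."
  (with-output-to-string (out)
    (loop for (score . chunk) in scored-chunks
          for i from 1
          do (format out "~%~%[Passage ~D | source: ~A | score: ~,3F]~%~A"
                     i
                     (document-chunk-source chunk)
                     score
                     (document-chunk-text chunk)))))
```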
The Orchestrator
The agentic-rag function ties everything together. It runs the full pipeline, iterating when the Sufficient Context Agent determines the retrieved passages are incomplete:
```lisp
(defun agentic-rag (corpora user-query &key (max-iterations 3) (top-k 3))
  "Run the full agentic RAG pipeline:
  1. Rewrite the user query into sub-queries
  2. Search corpora for relevant chunks
  3. Check if context is sufficient (loop if not)
  4. Synthesize a grounded answer

CORPORA is a list of corpus structs.
Returns the synthesized answer string."
  (format t "~%~%========================================~%")
  (format t " AGENTIC RAG PIPELINE~%")
  (format t " Query: ~A~%" user-query)
  (format t "========================================~%")

  ;; Phase 1: Rewrite queries
  (let* ((sub-queries (rewrite-queries user-query))
         ;; Phase 2: Initial search
         (all-chunks (search-fanout corpora sub-queries :top-k top-k))
         (iteration 0))

    ;; Phase 3: Iterative sufficiency check
    (loop
      (incf iteration)
      (format t "~%--- Iteration ~A/~A ---~%" iteration max-iterations)

      (when (null all-chunks)
        (format t "DEBUG agentic-rag: no chunks found~%")
        (return-from agentic-rag
          "I could not find any relevant information in the available documents."))

      (multiple-value-bind (sufficient-p feedback)
          (assess-sufficiency user-query all-chunks)

        (when sufficient-p
          (format t "~%DEBUG agentic-rag: context is SUFFICIENT at iteration ~A~%"
                  iteration)
          (return-from agentic-rag
            (synthesize-answer user-query all-chunks)))

        (when (>= iteration max-iterations)
          (format t "~%DEBUG agentic-rag: max iterations reached~%")
          (return-from agentic-rag
            (synthesize-answer user-query all-chunks)))

        ;; Phase 4: Refine and search again
        (format t "~%DEBUG agentic-rag: context INSUFFICIENT, refining...~%")
        (format t "DEBUG agentic-rag: feedback: ~A~%" feedback)
        (let* ((refined-queries (refine-queries user-query feedback))
               (new-chunks (search-fanout corpora refined-queries :top-k top-k)))
          ;; Accumulate new chunks (deduplicate by text)
          (let ((seen (make-hash-table :test 'equal)))
            (dolist (scored-chunk all-chunks)
              (setf (gethash (document-chunk-text (cdr scored-chunk)) seen) t))
            (dolist (scored-chunk new-chunks)
              (let ((text (document-chunk-text (cdr scored-chunk))))
                (unless (gethash text seen)
                  (setf (gethash text seen) t)
                  (push scored-chunk all-chunks)))))
          (setf all-chunks (sort all-chunks #'> :key #'car)))))))
```
Notice how each iteration accumulates new chunks with the existing ones, deduplicating by text content. The accumulated context grows richer with each iteration, increasing the likelihood that the Sufficient Context Agent will be satisfied.
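The orchestrator also calls refine-queries, which is not listed in this chapter. The sketch below is a hypothetical version that turns the Sufficient Context Agent’s feedback into new search queries. The prompt wording is an assumption. The :generate keyword is also an addition: it defaults to the symbol rag-generate (resolved only when called), so the two-argument call in agentic-rag still works, while a test can pass a stub instead of making a live API call.

```lisp
(defun split-lines (string)
  "Split STRING on #\\Newline into a list of substrings."
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Newline string :start start)
        collect (subseq string start pos)
        while pos))

;;; Hypothetical sketch of refine-queries; the chapter calls it but does not
;;; list it. GENERATE maps a prompt string to a response string.
(defun refine-queries (user-query feedback &key (generate 'rag-generate))
  "Ask GENERATE for 1-3 search queries targeting the information described
by FEEDBACK. Falls back to (list FEEDBACK) if no usable line comes back."
  (let* ((prompt
           (format nil
                   "The user's question could not be fully answered from the ~
                    passages retrieved so far.~%~
                    ~%Question: ~A~
                    ~%Missing information: ~A~
                    ~%~%Write 1-3 short search queries, one per line, with no ~
                    numbering or bullets, that would retrieve the missing information."
                   user-query feedback))
         (queries
           (remove-if (lambda (s) (zerop (length s)))
                      (mapcar (lambda (line)
                                (string-trim '(#\Space #\Tab #\Return #\- #\*) line))
                              (split-lines (funcall generate prompt))))))
    (or queries (list feedback))))
```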
Top-Level API and Demo
The file rag.lisp provides convenience functions and a built-in demo. The test function creates three separate corpora — renewable energy, electric vehicles, and climate science — and runs three progressively harder queries:
```lisp
(defun test ()
  "Run a demo of the Agentic RAG system with sample documents."
  (format t "~%~%============================================~%")
  (format t " Agentic RAG Demo — Loading Documents~%")
  (format t "============================================~%")

  (let ((energy-corpus (make-corpus :name "renewable-energy"
                                    :description "Renewable energy sources and technologies"))
        (ev-corpus (make-corpus :name "electric-vehicles"
                                :description "Electric vehicle technology and infrastructure"))
        (climate-corpus (make-corpus :name "climate-science"
                                     :description "Climate science and carbon emissions")))

    (add-document energy-corpus (data-path "renewable-energy.txt"))
    (add-document ev-corpus (data-path "electric-vehicles.txt"))
    (add-document climate-corpus (data-path "climate-science.txt"))

    (let ((all-corpora (list energy-corpus ev-corpus climate-corpus)))

      ;; Query 1: Single-corpus question
      (format t "~%~%===== TEST QUERY 1 (single topic) =====~%")
      (let ((answer (query all-corpora
                           "What is the current cost of lithium-ion battery storage per kilowatt-hour?")))
        (format t "~%~%ANSWER 1:~%~A~%~%" answer))

      ;; Query 2: Multi-hop question requiring cross-corpus retrieval
      (format t "~%~%===== TEST QUERY 2 (multi-hop, cross-corpus) =====~%")
      (let ((answer (query all-corpora
                           "How does the carbon footprint of manufacturing EV batteries compare to the emissions saved by charging EVs from renewable energy sources?")))
        (format t "~%~%ANSWER 2:~%~A~%~%" answer))

      ;; Query 3: Complex question that may need iterative retrieval
      (format t "~%~%===== TEST QUERY 3 (complex, iterative) =====~%")
      (let ((answer (query all-corpora
                           "What role could solid-state batteries and pumped-storage hydroelectricity play together in solving the intermittency problem of wind and solar energy?")))
        (format t "~%~%ANSWER 3:~%~A~%~%" answer))

      all-corpora)))
```
The test queries are designed to demonstrate different capabilities:
- Query 1 is a simple factual lookup — the answer exists in a single document chunk.
- Query 2 requires combining information from the electric vehicles corpus (battery manufacturing emissions) with the climate science corpus (emissions data), demonstrating cross-corpus retrieval.
- Query 3 combines concepts from several corpora: solid-state batteries (electric vehicles), pumped-storage hydro (renewable energy), and the intermittency of wind and solar power, so it may require iterative refinement to gather all the pieces.
Running the Example
Load the system and run the demo:
```
$ sbcl
* (load "project.lisp")

--- rag project loaded ---

* (rag:test)

============================================
Agentic RAG Demo — Loading Documents
============================================

DEBUG add-document: loading .../data/renewable-energy.txt
DEBUG add-document: split into 7 chunks
DEBUG add-document: loading .../data/electric-vehicles.txt
DEBUG add-document: split into 8 chunks
DEBUG add-document: loading .../data/climate-science.txt
DEBUG add-document: split into 7 chunks

Loaded 22 total chunks across 3 corpora.


===== TEST QUERY 1 (single topic) =====

========================================
AGENTIC RAG PIPELINE
Query: What is the current cost of lithium-ion battery storage
per kilowatt-hour?
========================================

DEBUG rewrite-queries: generated 1 sub-queries:
- lithium-ion battery storage cost per kilowatt-hour

--- Iteration 1/3 ---

DEBUG assess-sufficiency: verdict=SUFFICIENT

ANSWER 1:
The cost of lithium-ion battery storage has fallen by approximately
90% since 2010, from over $1,100 per kilowatt-hour to under $140
per kilowatt-hour (source: renewable-energy.txt).
```
Because test returns the list of corpora it built, you can capture that list (this reruns the demo) and reuse it for interactive queries:
```
* (defvar *corpora* (rag:test))
;; ... test output ...

* (rag:interactive-demo *corpora*)

============================
Agentic RAG Interactive Demo
============================

Loaded 3 corpora with 22 total chunks.
Type your question (or 'quit' to exit):

RAG> What is the Paris Agreement temperature target?

===== ANSWER =====
The Paris Agreement aims to limit warming to 1.5°C above
pre-industrial levels (source: climate-science.txt).
==================

RAG> quit
```
Wrap Up for Agentic RAG
The key takeaway from this chapter is that agentic RAG can substantially improve answer quality compared to vanilla RAG, especially for complex queries that require information from multiple sources. The Sufficient Context Agent is the critical innovation: by explicitly checking whether enough information has been retrieved before generating an answer, the system reduces two common failure modes, hallucination and incomplete responses.
The implementation is deliberately simple: each “agent” is just a function with a well-crafted prompt. You don’t need an elaborate agent framework to get the benefits of multi-agent architectures. What matters is the pattern: decompose, search, assess, refine, synthesize.
For production use, consider these enhancements:
- Persistent vector store: Replace the in-memory lists with a dedicated vector database (Chroma, Qdrant, or Pinecone) for larger document collections.
- Document loaders: Add support for PDF, HTML, and other formats beyond plain text.
- Caching: Cache embeddings to avoid recomputing them on each load.
- Parallel search: Use threads to search multiple corpora simultaneously.
The Google research reports that their production agentic RAG system achieves up to 34% higher accuracy than vanilla RAG on factuality benchmarks, with cross-corpus retrieval nearly matching single-corpus accuracy. Our Common Lisp implementation demonstrates the same architecture on a smaller scale.
Optional Practice Problems
- Custom Chunk Size and Overlap Strategy: The logic inside split-into-chunks in vector-store.lisp uses the special variables *default-chunk-size* (500 characters) and *chunk-overlap* (50 characters). Modify add-document in vector-store.lisp and agentic-rag in agents.lisp to support dynamic configuration of these parameters. Write a helper function that measures the sensitivity of retrieval relevance scores to different chunk configurations.
- Deduplication with Embedding Similarity (Soft Deduplication): In search-fanout (in agents.lisp), the retrieved chunks are deduplicated strictly by exact string matching. In large datasets, different documents might contain near-identical chunks or rephrased content. Implement a “soft” deduplication mechanism in search-fanout that uses the cosine-similarity function from embeddings.lisp to discard any retrieved chunk whose similarity to an already selected chunk is greater than 0.9.
- Multi-Turn Chat Interface Integration: The current interactive-demo loop in rag.lisp and the orchestrator agentic-rag in agents.lisp are stateless: each query is processed independently. Extend the pipeline to support a conversation history (a list of past question-answer turns). Pass the history to the Query Rewriter so it can resolve pronouns and context (e.g., rewriting “How does it compare to hydro?” following “What is the cost of battery storage?”).
- Self-Correction and Web Search Fallback on Refinement: In agentic-rag (in agents.lisp), if the context remains insufficient after refining queries, the system continues to search the same corpora. Write a fallback mechanism that, when the Sufficient Context Agent reports INSUFFICIENT for the second time, switches to an external API (like a local DuckDuckGo lookup or Ollama Cloud web search helper) to gather external context, appending the results to the RAG vector store dynamically.
- JSON Schema Enforcement for Agent Verdicts: The Sufficient Context Agent in agents.lisp relies on string matching (searching for "VERDICT:" and "MISSING:") to parse Gemini’s response. This is fragile if the model outputs code blocks, explanations, or formatting deviations. Modify assess-sufficiency to request structured output using a JSON Schema (by specifying schema parameters in the request payload to the Gemini Interactions API). Parse the returned structured JSON reliably using cl-json.
- Parallelized Search Fanout: The search-fanout function in agents.lisp runs queries sequentially. If there are 3 sub-queries and multiple corpora, fetching embeddings and searching them one by one introduces latency. Use a threading library such as bordeaux-threads to parallelize the calls to get-embedding and search-corpora for each sub-query, gathering and deduplicating results once all threads terminate.