An AI Command-Line Tool with Search Grounding and Persistent Cache

In this chapter we build an interactive command-line tool that combines Google’s Gemini API with optional search grounding and a persistent cache. The result is a practical daily-driver REPL: you can ask Gemini questions, ground answers in live web search results, and selectively cache useful responses so they become context for future queries. The project uses only the @google/genai SDK and Node.js built-in modules; no native dependencies are required.

The examples for this chapter are in the directory source-code/ai-command-line-tool.

How It Works

The AI REPL implements a simple but effective workflow:

Ask a question: Type a natural language query and Gemini responds using its training data plus any relevant cached context.
Ask with search: Prefix your query with ! to enable Google Search grounding, useful for current events or factual lookups.
Cache useful answers: Type > to save the last answer to a persistent JSON cache file. When you ask a new question, the tool extracts keywords from your query and retrieves only cached entries that share keyword overlap, so only relevant context is included.
Manage the cache: Type ! alone to clear cache entries older than one week.

This cache-as-context pattern is a lightweight alternative to retrieval-augmented generation (RAG). Instead of embedding documents into a vector store, you manually curate a set of useful facts. At query time, bag-of-words matching retrieves only the cached entries relevant to your current question, keeping context focused and avoiding noise.

Prerequisites

You need Node.js (v20+) with npm and a GOOGLE_API_KEY environment variable set for the Gemini API. Get a free API key from Google AI Studio.

Project Structure

The project consists of three TypeScript files with no external dependencies beyond @google/genai and the Node.js standard library:

    ai-command-line-tool/
    ├── package.json
    ├── ai_repl.ts          // Main REPL application
    ├── cache_engine.ts     // JSON-file persistent cache
    ├── keywords.ts         // Keyword extraction with stop-word filtering
    └── README.md

Keyword Extraction

Before we can do relevance-based cache lookups, we need a way to extract meaningful keywords from the user’s query. The extractKeywords function splits text into words, strips punctuation and stop words, and returns an array of content-bearing terms:

    // keywords.ts: keyword extraction with stop-word filtering

    // Common English function words that carry little meaning for matching.
    const STOP = new Set([
      "a","an","the","is","are","was","were","be","been","being",
      "have","has","had","do","does","did","will","would","shall","should",
      "may","might","must","can","could","am","it","its",
      "in","on","at","to","for","of","with","by","from","as",
      "and","or","but","not","no","nor","so","yet",
      "this","that","these","those","what","which","who","whom",
      "i","me","my","we","our","you","your","he","she","they","them",
      "how","when","where","why","if","then","than","about",
    ]);

    // Lowercase, split on whitespace, trim punctuation from both ends of each
    // word, then drop words shorter than three characters and stop words.
    export const extractKeywords = (text: string): string[] =>
      text.toLowerCase().split(/\s+/)
        .map(w => w.replace(/^[?!.,;:'"()]+|[?!.,;:'"()]+$/g, ""))
        .filter(w => w.length > 2 && !STOP.has(w));

For example, the query "What sci-fi movies are playing today in Flagstaff AZ?" produces the keyword array ["sci-fi", "movies", "playing", "today", "flagstaff"]. Surrounding punctuation is trimmed, and words shorter than three characters (such as "AZ") and common stop words are filtered out.

The Set data structure gives O(1) lookup time for stop words, a small detail that matters when you call this function on every query.
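To see how the filter behaves on a few inputs, the sketch below repeats the same pipeline with an abbreviated stop-word set. The names STOP_DEMO and extractKeywordsDemo are hypothetical and exist only to keep the example self-contained; for these inputs the full list in keywords.ts gives the same results.

```typescript
// Abbreviated stop-word set, for illustration only; the full list is in keywords.ts.
const STOP_DEMO = new Set(["what", "are", "in", "the", "is", "a", "of"]);

// Same pipeline as extractKeywords: lowercase, split on whitespace,
// trim punctuation from both ends, drop short words and stop words.
const extractKeywordsDemo = (text: string): string[] =>
  text.toLowerCase().split(/\s+/)
    .map(w => w.replace(/^[?!.,;:'"()]+|[?!.,;:'"()]+$/g, ""))
    .filter(w => w.length > 2 && !STOP_DEMO.has(w));

// Interior punctuation survives, so "sci-fi" stays one keyword;
// "AZ?" is trimmed to "az" and then dropped for being too short.
console.log(extractKeywordsDemo("What sci-fi movies are playing today in Flagstaff AZ?"));
// → [ 'sci-fi', 'movies', 'playing', 'today', 'flagstaff' ]

// Repeated words are not deduplicated.
console.log(extractKeywordsDemo("Movies, movies, MOVIES!"));
// → [ 'movies', 'movies', 'movies' ]

// A query made only of stop words and short words yields an empty array.
console.log(extractKeywordsDemo("What is it?"));
// → []
```

An empty result matters later: when no keywords survive, the context builder skips the cache lookup entirely.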

The Cache Engine

The cache engine stores text entries as a JSON file in the user’s home directory. Each entry carries a timestamp so old entries can be expired:

    // cache_engine.ts: JSON-file-backed persistent cache with keyword lookup

    import { readFileSync, writeFileSync, existsSync } from "node:fs";

    const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

    interface CacheEntry { content: string; createdAt: number }

    export class CacheEngine {
      private entries: CacheEntry[];
      // Load existing entries from disk, or start with an empty cache.
      constructor(private filePath: string) {
        this.entries = existsSync(filePath) ? JSON.parse(readFileSync(filePath, "utf-8")) : [];
      }
      // Rewrite the whole file as pretty-printed JSON.
      private save() { writeFileSync(this.filePath, JSON.stringify(this.entries, null, 2)); }

      add(content: string) { this.entries.push({ content, createdAt: Date.now() }); this.save(); }

      // Return up to `limit` entries containing at least one keyword, newest first.
      lookup(keywords: string[], limit = 10): string[] {
        const lk = keywords.map(k => k.toLowerCase());
        return this.entries
          .filter(e => lk.some(k => e.content.toLowerCase().includes(k)))
          .sort((a, b) => b.createdAt - a.createdAt)
          .slice(0, limit).map(e => e.content);
      }

      count() { return this.entries.length; }

      // Drop entries older than one week and return how many were removed.
      clearOlderThanOneWeek(): number {
        const before = this.entries.length;
        this.entries = this.entries.filter(e => e.createdAt >= Date.now() - ONE_WEEK_MS);
        this.save();
        return before - this.entries.length;
      }

      close() { this.save(); }
    }

The lookup method implements bag-of-words matching: a cached entry is included if its text contains any of the query keywords. This OR-matching approach ensures that if you cached a movie-related answer last week and now ask about movies again, that context surfaces. But if you ask about something unrelated, say, a recipe, the movie answer stays out of the prompt.
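The matching rule can be tried in isolation. The following is a minimal in-memory sketch of the same filter, sort, and slice steps; lookupDemo, DemoEntry, and the sample entries are hypothetical stand-ins and not part of the project code.

```typescript
interface DemoEntry { content: string; createdAt: number }

// In-memory stand-in for CacheEngine.lookup: keep entries whose text contains
// at least one keyword as a substring, newest first.
function lookupDemo(entries: DemoEntry[], keywords: string[], limit = 10): string[] {
  const lk = keywords.map(k => k.toLowerCase());
  return entries
    .filter(e => lk.some(k => e.content.toLowerCase().includes(k)))
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit)
    .map(e => e.content);
}

const demoEntries: DemoEntry[] = [
  { content: "Project Hail Mary is playing at Harkins Flagstaff 16.", createdAt: 1 },
  { content: "Sourdough starter needs feeding every 12 hours.", createdAt: 2 },
];

// One shared keyword is enough (OR-matching).
console.log(lookupDemo(demoEntries, ["flagstaff", "weather"]));
// → [ 'Project Hail Mary is playing at Harkins Flagstaff 16.' ]

// Matching is by substring, so "movies" misses an entry that never uses that word...
console.log(lookupDemo(demoEntries, ["movies"]));
// → []

// ...while a short keyword can match inside a longer word ("art" is inside "starter").
console.log(lookupDemo(demoEntries, ["art"]));
// → [ 'Sourdough starter needs feeding every 12 hours.' ]
```

Because the test is a plain substring check, it is cheap and forgiving, at the cost of occasional false positives on short keywords and misses on synonyms or inflected forms.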

Using a JSON file rather than SQLite keeps the project dependency-free and makes the cache trivially inspectable: you can open ~/.ai-repl-cache.json in any text editor to review or edit your cached entries.
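The on-disk shape and the one-week expiry rule can be illustrated without touching the file system. This sketch assumes a hypothetical helper, dropExpired, that applies the same filter as clearOlderThanOneWeek but takes the current time as a parameter; the sample entries and the date are made up.

```typescript
interface DemoEntry { content: string; createdAt: number }

const DAY_MS = 24 * 60 * 60 * 1000;
const WEEK_MS = 7 * DAY_MS;

// Same filter as clearOlderThanOneWeek, with the clock passed in for easy testing.
function dropExpired(entries: DemoEntry[], nowMs: number): DemoEntry[] {
  return entries.filter(e => e.createdAt >= nowMs - WEEK_MS);
}

const demoNow = Date.UTC(2025, 0, 15); // hypothetical "current time"
const sample: DemoEntry[] = [
  { content: "eight days old", createdAt: demoNow - 8 * DAY_MS },
  { content: "exactly one week old", createdAt: demoNow - WEEK_MS },
  { content: "one day old", createdAt: demoNow - DAY_MS },
];

const kept = dropExpired(sample, demoNow);
console.log(kept.map(e => e.content));
// → [ 'exactly one week old', 'one day old' ]

// What save() would write: a pretty-printed JSON array of { content, createdAt } objects.
const onDisk = JSON.stringify(kept, null, 2);
console.log(onDisk);

// Parsing the text again yields the same entries, which is all the constructor does on startup.
const reloaded: DemoEntry[] = JSON.parse(onDisk);
console.log(reloaded.length); // → 2
```

Note that the comparison uses >=, so an entry that is exactly one week old survives a cleanup.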

The Main REPL Application

Imports and Configuration

The application imports from Node.js standard library modules, the @google/genai SDK, and the two local modules:

    import * as readline from "node:readline/promises";
    import { stdin as input, stdout as output } from "node:process";
    import { join } from "node:path";
    import { homedir } from "node:os";
    import { GoogleGenAI } from "@google/genai";
    import { CacheEngine } from "./cache_engine.js";
    import { extractKeywords } from "./keywords.js";

    const MODEL = "gemini-2.5-flash";
    // The cache is a plain JSON file in the user's home directory.
    const CACHE_DB_PATH = join(homedir(), ".ai-repl-cache.json");

The model is set to gemini-2.5-flash for fast, capable responses suitable for interactive use. The cache file lives in the user’s home directory so it persists across sessions and working directories.

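Note the .js extension in the import paths. Node.js ES module resolution requires explicit file extensions in relative imports, and TypeScript expects the extension of the compiled output (.js) rather than the source (.ts). The tsx runner resolves .js imports to their .ts source files automatically.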

API Key Validation

Before doing anything else, we verify the API key is present:

    const apiKey = process.env.GOOGLE_API_KEY;
    if (!apiKey) { console.error("Error: Set GOOGLE_API_KEY"); process.exit(1); }

    const ai = new GoogleGenAI({ apiKey });

This early check prevents a confusing error deep inside the API call. Small touches like this make command-line tools pleasant to use.

Cache Context Builder

The buildContext function uses keyword extraction to retrieve only relevant cached entries and format them as a context preamble:

    const cache = new CacheEngine(CACHE_DB_PATH);
    let lastAnswer: string | null = null;

    function buildContext(query: string): string {
      const kw = extractKeywords(query);
      if (!kw.length) return "";
      const items = cache.lookup(kw, 10);
      if (!items.length) return "";
      return "Use the following context from previous conversations when answering:\n\n" +
        items.map(i => `- ${i}`).join("\n") + "\n\n---\n\n";
    }

When relevant cached items are found, the function produces a context preamble like:

    Use the following context from previous conversations when answering:

    - Project Hail Mary is playing at Harkins Flagstaff 16.

    ---

This preamble is prepended to the prompt so Gemini can reference previously cached facts without the user repeating them.
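The prepending step is easy to check by hand. The sketch below uses a hypothetical pure variant, formatContext, that receives the cached items directly instead of looking them up; the formatting string is copied from buildContext.

```typescript
// Hypothetical pure variant of buildContext: the cached items are passed in
// rather than fetched from the cache, which makes the output easy to inspect.
function formatContext(items: string[]): string {
  if (!items.length) return "";
  return "Use the following context from previous conversations when answering:\n\n" +
    items.map(i => `- ${i}`).join("\n") + "\n\n---\n\n";
}

const question = "what sci-fi movies are playing today in Flagstaff AZ?";

// With one cached item, the preamble comes first and the question follows the separator.
const fullPrompt = formatContext(["Project Hail Mary is playing at Harkins Flagstaff 16."]) + question;
console.log(fullPrompt);

// With no matching items, the prompt is just the question.
console.log(formatContext([]) + question === question); // → true
```

Returning an empty string for the no-match case means the calling code can always concatenate, with no special handling.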

Query Dispatch

The askGemini function handles both plain and search-grounded queries:

    async function askGemini(prompt: string, search: boolean): Promise<string> {
      try {
        const config: Record<string, unknown> = {};
        if (search) config.tools = [{ googleSearch: {} }];
        const r = await ai.models.generateContent({ model: MODEL, contents: buildContext(prompt) + prompt, config });
        return r.text ?? "[No response from Gemini]";
      } catch (e) { return `[Error: ${e instanceof Error ? e.message : e}]`; }
    }

The try/catch wrapping is important for a daily-use tool: network errors, rate limits, and API issues should produce a readable message rather than crashing the REPL.

The tools: [{ googleSearch: {} }] config enables Google Search grounding through the Gemini API. When active, Gemini searches the web for current information before generating its response, making it useful for questions about current events or facts that may have changed since the model’s training cutoff.
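The only difference between the two modes is the config object. As a rough sketch, the request can be built by a separate helper so both variants can be compared without a network call; buildRequest and the GenerateRequest interface are hypothetical and simply mirror the argument that askGemini passes to ai.models.generateContent.

```typescript
// Hypothetical shape of the argument passed to ai.models.generateContent in askGemini.
interface GenerateRequest {
  model: string;
  contents: string;
  config: { tools?: Array<{ googleSearch: object }> };
}

// Build the request without sending it.
function buildRequest(model: string, contents: string, search: boolean): GenerateRequest {
  return {
    model,
    contents,
    // The search tool is attached only for !-prefixed queries.
    config: search ? { tools: [{ googleSearch: {} }] } : {},
  };
}

console.log(JSON.stringify(buildRequest("gemini-2.5-flash", "Who won?", true)));
// → {"model":"gemini-2.5-flash","contents":"Who won?","config":{"tools":[{"googleSearch":{}}]}}

console.log(JSON.stringify(buildRequest("gemini-2.5-flash", "Who won?", false)));
// → {"model":"gemini-2.5-flash","contents":"Who won?","config":{}}
```

Keeping request construction separate from the API call is one way to unit-test the dispatch logic without an API key.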

The REPL Loop

The heart of the application is replLoop, which uses Node.js readline/promises for interactive input:

    // Print an answer and remember it so ">" can cache it later.
    function showAnswer(text: string) { console.log("\n" + text + "\n"); lastAnswer = text; }

    async function replLoop() {
      const rl = readline.createInterface({ input, output });
      console.log("\n  Gemini AI REPL  (type 'h' for help)\n");

      try {
        while (true) {
          let line: string;
          // Treat a rejected question (for example, closed input) as a request to exit.
          try { line = await rl.question("gemini> "); } catch { console.log("\nGoodbye."); break; }
          const t = line.trim();
          if (!t) continue;
          if (["q", "quit", "exit"].includes(t.toLowerCase())) { console.log("Goodbye."); break; }

          if (["h", "help"].includes(t.toLowerCase())) {
            console.log(`\n  <text>  Ask Gemini    !<text> Ask + Search\n  >      Cache answer  !      Clear old cache\n  h      Help          q      Quit\n  Model: ${MODEL}  Cache: ${CACHE_DB_PATH} (${cache.count()} items)\n`);
            continue;
          }

          // ">" saves the most recent answer to the cache.
          if (t === ">") {
            if (lastAnswer) { cache.add(lastAnswer); console.log(`  [Cached. ${cache.count()} items]`); }
            else console.log("  [No answer yet]");
            continue;
          }

          // A bare "!" clears old entries; test it before the "!<text>" case.
          if (t === "!") {
            const removed = cache.clearOlderThanOneWeek();
            console.log(`  [Cleared ${removed} old entries. ${cache.count()} remain]`);
            continue;
          }

          if (t.startsWith("!")) {
            console.log("  [Searching...]");
            showAnswer(await askGemini(t.slice(1).trim(), true));
          } else {
            console.log("  [Thinking...]");
            showAnswer(await askGemini(t, false));
          }
        }
      } finally { rl.close(); cache.close(); console.log("  [Cache closed]"); }
    }

    replLoop();

The command dispatch is worth studying. The ! character serves double duty: alone it clears old cache entries, but followed by text it triggers a search-grounded query. Testing for a bare ! (t === "!") before testing t.startsWith("!") ensures the two cases are handled separately.
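The ordering of those checks can be made explicit by separating parsing from execution. The following is one possible sketch, not the chapter's code: parseCommand and the Command type are hypothetical names, and the function only classifies a line without performing any action.

```typescript
type Command =
  | { kind: "empty" }
  | { kind: "quit" }
  | { kind: "help" }
  | { kind: "cache" }
  | { kind: "clear" }
  | { kind: "ask"; text: string; search: boolean };

// Classify one line of input, applying the same checks in the same order as replLoop.
function parseCommand(line: string): Command {
  const t = line.trim();
  if (!t) return { kind: "empty" };
  const lower = t.toLowerCase();
  if (["q", "quit", "exit"].includes(lower)) return { kind: "quit" };
  if (["h", "help"].includes(lower)) return { kind: "help" };
  if (t === ">") return { kind: "cache" };
  if (t === "!") return { kind: "clear" }; // bare "!" must be tested first
  if (t.startsWith("!")) return { kind: "ask", text: t.slice(1).trim(), search: true };
  return { kind: "ask", text: t, search: false };
}

console.log(parseCommand("  !  "));
// → { kind: 'clear' }
console.log(parseCommand("! latest Node.js release"));
// → { kind: 'ask', text: 'latest Node.js release', search: true }
console.log(parseCommand("Quit"));
// → { kind: 'quit' }
```

A pure parser like this can be covered by unit tests, while the loop itself only has to switch on the returned kind.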

The try/finally block mirrors the Common Lisp unwind-protect pattern: the cache file is flushed to disk even if the user exits with Ctrl-D or an unhandled error occurs.
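A small stand-alone illustration of that guarantee follows. The function runSession and the steps array are hypothetical; they stand in for the REPL loop and its cleanup calls.

```typescript
const steps: string[] = [];

// Stand-in for replLoop: the finally block plays the role of rl.close() and cache.close().
function runSession(fail: boolean): void {
  try {
    steps.push("loop running");
    if (fail) throw new Error("unexpected failure");
    steps.push("loop exited normally");
  } finally {
    // Runs on both the normal path and the error path.
    steps.push("cleanup");
  }
}

runSession(false);
try { runSession(true); } catch { /* the error still propagates after cleanup */ }

console.log(steps);
// → [ 'loop running', 'loop exited normally', 'cleanup', 'loop running', 'cleanup' ]
```

One caveat: finally only runs while the JavaScript code is unwinding normally. It does not run if the process is killed outright or if process.exit is called inside the try block.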

Using readline/promises with await gives us a clean loop structure without the callback nesting that the traditional readline API would require. The try/catch around rl.question handles EOF (Ctrl-D) gracefully.

Running the Tool

Install dependencies and start the REPL:

    cd source-code/ai-command-line-tool
    npm install
    export GOOGLE_API_KEY=your-key-here
    npx tsx ai_repl.ts

Example Session

The following session demonstrates the search-then-cache workflow. First we ask a question with Google Search grounding (prefix !), then cache the answer, then ask the same question without search. Gemini can now answer from the cached context:

    $ npx tsx ai_repl.ts

      Gemini AI REPL  (type 'h' for help)

    gemini> h

      Gemini AI REPL
      ─────────────────────────────────────────
      <text>         Ask Gemini a question
      !<text>        Ask with Google Search grounding
      >              Add last answer to cache
      !              Clear cache entries older than 1 week
      h / help       Show this help
      q / quit       Exit
      Ctrl-D         Exit
      ─────────────────────────────────────────
      Model: gemini-2.5-flash
      Cache: /Users/you/.ai-repl-cache.json (0 items)

    gemini> !what sci-fi movies are playing today in Flagstaff AZ?
      [Searching...]

    For today, the following science fiction movie is playing in Flagstaff, AZ:

    *   **Project Hail Mary** (PG-13) is showing at the **Harkins Flagstaff 16**.

    Please check the Harkins Theatres website or your preferred ticketing
    platform to confirm specific showtimes.

    gemini> >
      [Cached. 1 items total]
    gemini> what sci-fi movies are playing today in Flagstaff AZ?
      [Thinking...]

    For today, the science fiction movie **Project Hail Mary** (PG-13) is
    playing at the **Harkins Flagstaff 16**.

    Please check the Harkins Theatres website to confirm specific showtimes.

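Notice that the second query (without the ! prefix) produces the same accurate, current answer, even though it did not use Google Search. The query keywords "playing", "today", and "flagstaff" all appear in the cached answer, so it was automatically included as context for Gemini. The keywords "sci-fi" and "movies" do not match, because the cached text says "science fiction movie"; one matching keyword is enough.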

REPL Command Reference

Input	Action
`<text>`	Ask Gemini a question
`!<text>`	Ask with Google Search grounding
`>`	Add last answer to persistent cache
`!`	Clear cache entries older than 1 week
`h` / `help`	Show help
`q` / `quit`	Exit
`Ctrl-D`	Exit

Key Takeaways

Cache as context with relevance filtering: Selectively caching LLM responses and using bag-of-words keyword matching to retrieve only relevant entries keeps prompts focused. This is a lightweight alternative to vector-based RAG that works well for a personal tool.
Search grounding: The tools: [{ googleSearch: {} }] config leverages Google Search through the Gemini API, making the tool useful for current events and factual queries about things that postdate the model’s training cutoff.
Node.js readline/promises: The promise-based readline API provides line editing, history, and Ctrl-D handling out of the box, making the REPL feel like a native shell tool without any third-party dependencies.
try/finally for cleanup: Wrapping the REPL loop ensures the cache file is flushed to disk, even on unexpected exits. This is the TypeScript equivalent of Common Lisp’s unwind-protect.
Composing modules: This tool demonstrates how small, focused TypeScript modules (REPL front end with the Gemini client, cache engine, keyword extractor) compose cleanly into a practical application. The cache engine and keyword extractor are independently testable and reusable.
