Long-Term Persistence Using Mem0 and Chroma
An important topic we haven't covered yet is building persistent memory for LLM applications. Here we use two libraries:
- Mem0: persistent memory for AI Agents and LLM applications. GitHub: https://github.com/mem0ai/mem0.
- Chroma: AI-native open-source vector database that simplifies building LLM apps by providing tools for embedding, storing, and searching documents.
The example in this chapter is simple and can be copied and modified for many applications, for example:
- A code-advice agent for Python
- A personal store for thoughts and ideas
Code Example Using Mem0 and Chroma
This Python script demonstrates how to give an AI assistant persistent memory using the mem0ai library, chromadb for vector storage, and ollama for interacting with a local LLM. The script is designed to be run repeatedly: each execution processes one user prompt, retrieves past interactions stored in a local ChromaDB database, and generates a concise, relevant response using a local Gemma model.
The core idea is that mem0ai stores conversation snippets and retrieves them by semantic similarity to the current query. The retrieved context, referred to as "memories," is injected into the LLM's system prompt, allowing the assistant to maintain a coherent, context-aware conversation across multiple independent runs. Because these memories persist locally, the system builds a long-term conversational understanding over time: it can recall previously discussed information and use it to give more informed answers, even though each execution of the script is a fresh process.
The Chroma vector database is stored under the directory ./db_local; memories of old interactions are maintained until you delete this directory.
One parameter you may want to change is the maximum number of memories matched in the Chroma database; it is set in the line of code m.search(query=args.prompt, limit=5, …).
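For example, to retrieve up to ten matching memories instead of five (a minimal variation; the variables are the same as in the listing below):

# Retrieve up to ten of the most semantically similar memories for this user:
rel = m.search(query=args.prompt, limit=10, user_id=USER_ID)

The complete script follows: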
# Run this script repeatedly to build a persistent memory:
#
# uv run mem0_persistence.py "What color is the sky?"
# uv run mem0_persistence.py "What is the last color we talked about?"

import argparse

from mem0 import Memory
from ollama import ChatResponse, chat

USER_ID = "123"

config = {
    "user_id": USER_ID,
    "vector_store": {
        "provider": "chroma",
        "config": {"path": "db_local"}
    },
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "gemma3:4b-it-qat",
            "temperature": 0.1,
            "max_tokens": 5000
        }
    },
}

def call_ollama_chat(model: str, messages: list[dict]) -> str:
    """
    Send a chat request to Ollama and return the assistant's reply.
    """
    response: ChatResponse = chat(
        model=model,
        messages=messages
    )
    return response.message.content

def main():
    p = argparse.ArgumentParser()
    p.add_argument("prompt", help="Your question")
    args = p.parse_args()

    m = Memory.from_config(config)
    print(f"User: {args.prompt}")

    # Retrieve up to five stored memories that are semantically
    # similar to the current prompt.
    rel = m.search(query=args.prompt, limit=5, user_id=USER_ID)
    mems = "\n".join(f"- {e['memory']}" for e in rel["results"])
    print("Memories:\n", mems)

    # Inject the retrieved memories into the system prompt so the
    # model can use context from previous runs.
    system0 = "You are a helpful assistant who answers with concise, short answers."
    system = f"{system0}\nPrevious user memories:\n{mems}"

    msgs = [
        {"role": "system", "content": system},
        {"role": "user", "content": args.prompt}
    ]

    reply = call_ollama_chat("gemma3:4b-it-qat", msgs)

    # Store this question/answer pair as a new memory for future runs.
    convo = {"role": "assistant",
             "content":
                 f"QUERY: {args.prompt}\n\nANSWER:\n{reply}\n"}
    m.add(convo, user_id=USER_ID, infer=False)

    print(f"\n\n** convo:\n{convo}\n\n")

    print("Assistant:", reply)

if __name__ == "__main__":
    main()
In the line m.add(…), set infer=True if you want the configured Ollama LLM to extract and filter facts from each conversation before storing them; with infer=False the raw text is stored unchanged. I almost always set this to False so that all questions and answers are stored verbatim in the Chroma vector database.
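As an aside, you can also inspect or clear the stored memories programmatically. The following is a minimal sketch using mem0's get_all and delete_all methods; note that, depending on your mem0 version, get_all may return a plain list rather than a dict with a "results" key:

from mem0 import Memory

# Reuse the same configuration as the main script so we open the same store.
config = {
    "vector_store": {"provider": "chroma", "config": {"path": "db_local"}},
    "llm": {
        "provider": "ollama",
        "config": {"model": "gemma3:4b-it-qat", "temperature": 0.1}
    },
}
m = Memory.from_config(config)

# Print every memory stored for user "123":
stored = m.get_all(user_id="123")
for entry in (stored["results"] if isinstance(stored, dict) else stored):
    print(entry["memory"])

# Remove all of this user's memories without deleting the db_local directory:
m.delete_all(user_id="123")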
Example Output
The following output has been lightly edited to remove library deprecation warnings and extra blank lines. As with most examples in this book, we use uv to manage dependencies and run the Python code.
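If you prefer not to maintain a pyproject.toml for this script, uv also supports PEP 723 inline script metadata; a minimal sketch, assuming the PyPI package names mem0ai, chromadb, and ollama, placed at the top of mem0_persistence.py:

# /// script
# requires-python = ">=3.10"
# dependencies = ["mem0ai", "chromadb", "ollama"]
# ///

With this header, uv run mem0_persistence.py automatically creates an isolated environment containing those dependencies.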
The first time we run the test script, the vector database is empty, so the user query "Name two Physical laws" matches no previous memories stored in the Chroma vector database:
$ uv run mem0_persistence.py "Name two Physical laws"

User: Name two Physical laws
Memories:


** convo:
{'role': 'assistant', 'content': "QUERY: Name two Physical laws\n\nANSWER:\n1. Newton's First Law\n2. Law of Conservation of Energy\n"}

Assistant: 1. Newton's First Law
2. Law of Conservation of Energy
Now the Chroma data store contains one memory:
$ uv run mem0_persistence.py "Name another different Physical law"

User: Name another different Physical law
Memories:
- QUERY: Name two Physical laws

ANSWER:
1. Newton's First Law
2. Law of Conservation of Energy

** convo:
{'role': 'assistant', 'content': "QUERY: Name another different Physical law\n\nANSWER:\n1. Newton's Third Law\n"}

Assistant: 1. Newton's Third Law
Here we ask a question in a different subject domain:
$ uv run mem0_persistence.py "What color is the sky?"

User: What color is the sky?
Memories:
- QUERY: Name another different Physical law

ANSWER:
1. Newton's Third Law

- QUERY: Name two Physical laws

ANSWER:
1. Newton's First Law
2. Law of Conservation of Energy

** convo:
{'role': 'assistant', 'content': 'QUERY: What color is the sky?\n\nANSWER:\nBlue.\n'}

Assistant: Blue.
Finally, we check that memory persists across separate runs:
$ uv run mem0_persistence.py "What is the last color we talked about?"

User: What is the last color we talked about?
Memories:
- QUERY: What color is the sky?

ANSWER:
Blue.

- QUERY: Name two Physical laws

ANSWER:
1. Newton's First Law
2. Law of Conservation of Energy

- QUERY: Name another different Physical law

ANSWER:
1. Newton's Third Law

** convo:
{'role': 'assistant', 'content': 'QUERY: What is the last color we talked about?\n\nANSWER:\nBlue.\n'}

Assistant: Blue.