Using a Property Graph Database with Ollama

I have a long history of working with Knowledge Graphs (at Google and OliveAI), and I have usually used RDF graph databases and the SPARQL query language. I have recently developed a preference for property graph databases because recent research has shown that using LLMs with RDF-based graphs runs into context size issues: large schemas, overlapping relations, and complex identifiers can exceed LLM context windows. Property graph databases like Neo4j and Kuzu (which we use in this chapter) have more concise schemas.

It is true that Google and other players are teasing ‘infinite context’ LLMs, but since this book is about running smaller models locally, I have chosen to show only a property graph example.

Overview of Property Graphs

Property graphs represent a powerful and flexible data modeling paradigm that has gained significant traction in modern database systems and applications. At its core, a property graph is a directed graph structure where both vertices (nodes) and edges (relationships) can contain properties in the form of key-value pairs, providing rich contextual information about entities and their connections. Unlike traditional relational databases that rely on rigid table structures, property graphs offer a more natural way to represent highly connected data while maintaining the semantic meaning of relationships. This modeling approach is particularly valuable when dealing with complex networks of information where the relationships between entities are just as important as the entities themselves.

The distinguishing characteristics of property graphs make them especially well-suited for handling real-world data scenarios where relationships are multi-faceted and dynamic. Each node in a property graph can be labeled with one or more types (such as Person, Product, or Location) and can hold any number of properties that describe its attributes. Similarly, edges can be typed (like “KNOWS”, “PURCHASED”, or “LOCATED_IN”) and augmented with properties that qualify the relationship, such as timestamps, weights, or quality scores. This flexibility allows for sophisticated querying and analysis of data patterns that would be cumbersome or impossible to represent in traditional relational schemas.

The property graph model has proven particularly valuable in domains such as social network analysis, recommendation systems, fraud detection, and knowledge graphs, where understanding the intricate web of relationships between entities is crucial for deriving meaningful insights.
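To make the model concrete, here is a minimal in-memory sketch of a property graph in plain Python. It is only an illustration of the ideas above (labeled nodes, typed directed edges, key-value properties on both); the class names, the sample data, and the neighbors helper are my own assumptions and have nothing to do with how Kuzu or Neo4j store data internally.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A vertex with one or more labels and arbitrary key-value properties."""
    node_id: str
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship that can carry its own properties."""
    source: str    # node_id of the start node
    target: str    # node_id of the end node
    rel_type: str
    properties: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: two people and a product.
nodes = {
    "alice": Node("alice", {"Person"}, {"name": "Alice"}),
    "bob": Node("bob", {"Person"}, {"name": "Bob"}),
    "laptop": Node("laptop", {"Product"}, {"name": "Laptop", "price": 999}),
}

edges = [
    Edge("alice", "bob", "KNOWS", {"since": 2019}),
    Edge("alice", "laptop", "PURCHASED", {"timestamp": "2024-03-01"}),
]


def neighbors(node_id: str, rel_type: str) -> list[Node]:
    """Follow the outgoing edges of one type from a node."""
    return [
        nodes[e.target]
        for e in edges
        if e.source == node_id and e.rel_type == rel_type
    ]


purchased = [n.properties["name"] for n in neighbors("alice", "PURCHASED")]
print(purchased)  # ['Laptop']
```

Note that the edge itself holds data (the since and timestamp properties), which is what separates a property graph from a bare adjacency list.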

Example Using Ollama, LangChain, and the Kuzu Property Graph Database

The example shown here is derived from an example in the LangChain documentation: https://python.langchain.com/docs/integrations/graphs/kuzu_db/. I modified the example to use a local model running on Ollama instead of the OpenAI APIs. Here is the file graph_kuzu_property_example.py:

import kuzu
from langchain.chains import KuzuQAChain
from langchain_community.graphs import KuzuGraph
from langchain_ollama.llms import OllamaLLM

db = kuzu.Database("test_db")
conn = kuzu.Connection(db)

# Create two tables and a relation:
conn.execute("CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))")
conn.execute(
    "CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))"
)
conn.execute("CREATE REL TABLE ActedIn (FROM Person TO Movie)")
conn.execute("CREATE (:Person {name: 'Al Pacino', birthDate: '1940-04-25'})")
conn.execute("CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})")
conn.execute("CREATE (:Movie {name: 'The Godfather'})")
conn.execute("CREATE (:Movie {name: 'The Godfather: Part II'})")
conn.execute(
    "CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)"
)
conn.execute(
    "MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)"
)

graph = KuzuGraph(db, allow_dangerous_requests=True)

# Create a chain
chain = KuzuQAChain.from_llm(
    llm=OllamaLLM(model="qwen2.5-coder:14b"),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)

print(graph.get_schema)

# Ask two questions
chain.invoke("Who acted in The Godfather: Part II?")
chain.invoke("Robert De Niro played in which movies?")

This code demonstrates the implementation of a graph database using Kuzu, integrated with LangChain for question-answering capabilities. The code initializes a database connection and establishes a schema with two node types (Movie and Person) and a relationship type (ActedIn), creating a graph structure suitable for representing actors and their film appearances.

The implementation populates the database with specific data about “The Godfather” trilogy and two prominent actors (Al Pacino and Robert De Niro). It uses Cypher-like query syntax to create nodes for both movies and actors, then establishes relationships between them using the ActedIn relationship type. The data model represents a typical many-to-many relationship between actors and movies.

This example then sets up a question-answering chain using LangChain, which combines the Kuzu graph database with the Ollama language model (specifically the qwen2.5-coder:14b model). This chain enables natural language queries against the graph database, allowing users to ask questions about actor-movie relationships and receive responses based on the stored graph data. The implementation includes two example queries to demonstrate the system’s functionality.
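The verbose output of the script suggests that the chain works in two stages: the LLM first translates the question into a Cypher query using the graph schema, and the query results are then handed back to the LLM to phrase an answer. The following is a rough sketch of that flow, not LangChain's actual implementation. The prompt wording, the answer_question function, and the two stand-in functions are all hypothetical; the stand-ins replace Ollama and Kuzu so the sketch runs without either installed.

```python
from typing import Callable

# Hypothetical prompt template; the prompts LangChain really uses differ.
CYPHER_PROMPT = (
    "You translate questions into Cypher queries.\n"
    "Graph schema:\n{schema}\n"
    "Question: {question}\n"
    "Cypher query:"
)


def answer_question(
    question: str,
    schema: str,
    llm: Callable[[str], str],
    run_query: Callable[[str], list[dict]],
) -> dict:
    """Stage 1: text to Cypher. Stage 2: run the query and phrase an answer."""
    cypher = llm(CYPHER_PROMPT.format(schema=schema, question=question)).strip()
    rows = run_query(cypher)
    answer = llm(f"Question: {question}\nQuery results: {rows}\nAnswer briefly:")
    return {"cypher": cypher, "context": rows, "answer": answer.strip()}


# Stand-ins (assumptions) that return canned values instead of calling
# a real model or a real database.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("You translate"):
        return (
            "MATCH (p:Person)-[:ActedIn]->"
            "(m:Movie {name: 'The Godfather: Part II'}) RETURN p.name"
        )
    return "Al Pacino and Robert De Niro acted in The Godfather: Part II."


def fake_run_query(cypher: str) -> list[dict]:
    return [{"p.name": "Al Pacino"}, {"p.name": "Robert De Niro"}]


result = answer_question(
    "Who acted in The Godfather: Part II?",
    "(:Person)-[:ActedIn]->(:Movie)",
    fake_llm,
    fake_run_query,
)
print(result["cypher"])
print(result["answer"])
```

Passing the model and the database in as plain callables keeps the two stages easy to test in isolation; in a real setup you would supply functions that call your local model and your graph database instead of the stand-ins.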

Here is the output from running graph_kuzu_property_example.py:

$ rm -rf test_db
$ uv run graph_kuzu_property_example.py

Node properties: [{'properties': [('name', 'STRING')], 'label': 'Movie'}, {'properties': [('name', 'STRING'), ('birthDate', 'STRING')], 'label': 'Person'}]
Relationships properties: [{'properties': [], 'label': 'ActedIn'}]
Relationships: ['(:Person)-[:ActedIn]->(:Movie)']

> Entering new KuzuQAChain chain...
Generated Cypher:

MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'})
RETURN p.name

Full Context:
[{'p.name': 'Al Pacino'}, {'p.name': 'Robert De Niro'}]

> Finished chain.

> Entering new KuzuQAChain chain...
Generated Cypher:

MATCH (p:Person {name: "Robert De Niro"})-[:ActedIn]->(m:Movie)
RETURN m.name

Full Context:
[{'m.name': 'The Godfather: Part II'}]

> Finished chain.

The Cypher query language is commonly used in property graph databases. Here is a sample query:

MATCH (p:Person)-[:ActedIn]->(m:Movie {name: 'The Godfather: Part II'})
RETURN p.name

This Cypher query performs a graph pattern matching operation to find actors who appeared in “The Godfather: Part II”. Let’s break it down:

  • MATCH initiates a pattern matching operation
  • (p:Person) looks for nodes labeled as “Person” and assigns them to variable p
  • -[:ActedIn]-> follows “ActedIn” relationships pointing from the Person node to the Movie node
  • (m:Movie {name: 'The Godfather: Part II'}) matches only Movie nodes whose name property equals “The Godfather: Part II”
  • RETURN p.name returns only the name property of the matched Person nodes

Based on the previous code’s data, this query would return “Al Pacino” and “Robert De Niro” since they both acted in that specific film.
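If Cypher is new to you, it may help to see the same pattern match written as ordinary Python over the data loaded by the script. This is only a sketch of the query's semantics under my own simplified representation (a list of names and a list of edge tuples); a graph database evaluates the pattern very differently and far more efficiently.

```python
# The same people and ActedIn edges the script loads into Kuzu,
# held here as plain Python structures.
people = ["Al Pacino", "Robert De Niro"]
acted_in = [
    ("Al Pacino", "The Godfather"),
    ("Al Pacino", "The Godfather: Part II"),
    ("Al Pacino", "The Godfather Coda: The Death of Michael Corleone"),
    ("Robert De Niro", "The Godfather: Part II"),
]


def actors_in(movie_name: str) -> list[str]:
    """Mimic MATCH (p:Person)-[:ActedIn]->(m:Movie {name: ...}) RETURN p.name."""
    return [
        person
        for person in people                    # (p:Person)
        for source, target in acted_in          # -[:ActedIn]->
        if source == person and target == movie_name  # (m:Movie {name: ...})
    ]


print(actors_in("The Godfather: Part II"))  # ['Al Pacino', 'Robert De Niro']
```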

Using LLMs to Create Graph Databases from Text Data

Using Kuzu with local LLMs is simple to implement, as seen in the last section. If you use large property graph databases hosted with Kuzu or Neo4j, then the example in the last section is hopefully sufficient to get you started implementing natural language interfaces to property graph databases.

Now we will do something very different: use LLMs to generate data for property graphs, that is, to convert text to Python code that creates a Kuzu property graph database.

Specifically, we use the following approach:

  • Use the last example file graph_kuzu_property_example.py as an example for Claude Sonnet 3.5 to understand the Kuzu Python APIs.
  • Have Claude Sonnet 3.5 read the file data/economics.txt, create a schema for a new graph database, and populate the schema from the contents of that file.
  • Ask Claude Sonnet 3.5 to also generate query examples.
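I pasted the prompt into Claude by hand, but the first two steps can also be scripted. Here is a minimal sketch of how the prompt might be assembled. The template is a lightly corrected version of the prompt recorded at the top of the generated script, and the function names build_prompt and build_prompt_from_files are hypothetical helpers, not part of any library.

```python
from pathlib import Path

# Lightly corrected version of the prompt recorded in the generated script.
PROMPT_TEMPLATE = """Given some text, I want you to define Property graph schemas for
the information in the text. As context, here is some Python code
for defining two tables and a relation and querying the data:

{example_code}

NOW, HERE IS THE TEXT TO CREATE A SCHEMA FOR, and to write code to
create nodes and links conforming to the schema:

{text}
"""


def build_prompt(example_code: str, text: str) -> str:
    """Combine the one-shot code example and the source text into a prompt."""
    return PROMPT_TEMPLATE.format(example_code=example_code, text=text)


def build_prompt_from_files(example_path: str, text_path: str) -> str:
    """Read both inputs from disk, for example
    graph_kuzu_property_example.py and data/economics.txt."""
    return build_prompt(
        Path(example_path).read_text(encoding="utf-8"),
        Path(text_path).read_text(encoding="utf-8"),
    )


# Small in-memory demonstration that does not need the files to exist.
demo = build_prompt("import kuzu\n# ... example code ...", "The Austrian School ...")
print(demo)
```

The resulting string can be sent to whichever model you want to try, hosted or local.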

Except for my adding the utility function query_and_print_result, this code was generated by Claude Sonnet 3.5:

"""
Created by Claude Sonnet 3.5 from prompt:

Given some text, I want you to define Property graph schemas for
the information in the text. As context, here is some Python code
for defining two tables and a relation and querying the data:

[[CODE FROM graph_kuzu_property_example.py]]

NOW, HERE IS THE TEST TO CREATE SCHEME FOR, and to write code to
create nodes and links conforming to the scheme:

[[CONTENTS FROM FILE data/economics.txt]]

"""

import kuzu

db = kuzu.Database("economics_db")
conn = kuzu.Connection(db)

# Node tables
conn.execute("""
CREATE NODE TABLE School (
    name STRING,
    description STRING,
    PRIMARY KEY(name)
)""")

conn.execute("""
CREATE NODE TABLE Economist (
    name STRING,
    birthDate STRING,
    PRIMARY KEY(name)
)""")

conn.execute("""
CREATE NODE TABLE Institution (
    name STRING,
    type STRING,
    PRIMARY KEY(name)
)""")

conn.execute("""
CREATE NODE TABLE EconomicConcept (
    name STRING,
    description STRING,
    PRIMARY KEY(name)
)""")

# Relationship tables
conn.execute("CREATE REL TABLE FoundedBy (FROM School TO Economist)")
conn.execute("CREATE REL TABLE TeachesAt (FROM Economist TO Institution)")
conn.execute("CREATE REL TABLE Studies (FROM School TO EconomicConcept)")

# Insert some data
conn.execute("CREATE (:School {name: 'Austrian School', description: 'School of economic thought emphasizing spontaneous organizing power of price mechanism'})")

# Create economists
conn.execute("CREATE (:Economist {name: 'Carl Menger', birthDate: 'Unknown'})")
conn.execute("CREATE (:Economist {name: 'Eugen von Böhm-Bawerk', birthDate: 'Unknown'})")
conn.execute("CREATE (:Economist {name: 'Ludwig von Mises', birthDate: 'Unknown'})")
conn.execute("CREATE (:Economist {name: 'Pauli Blendergast', birthDate: 'Unknown'})")

# Create institutions
conn.execute("CREATE (:Institution {name: 'University of Krampton Ohio', type: 'University'})")

# Create economic concepts
conn.execute("CREATE (:EconomicConcept {name: 'Microeconomics', description: 'Study of individual agents and markets'})")
conn.execute("CREATE (:EconomicConcept {name: 'Macroeconomics', description: 'Study of entire economy and issues affecting it'})")

# Create relationships
conn.execute("""
MATCH (s:School), (e:Economist)
WHERE s.name = 'Austrian School' AND e.name = 'Carl Menger'
CREATE (s)-[:FoundedBy]->(e)
""")

conn.execute("""
MATCH (s:School), (e:Economist)
WHERE s.name = 'Austrian School' AND e.name = 'Eugen von Böhm-Bawerk'
CREATE (s)-[:FoundedBy]->(e)
""")

conn.execute("""
MATCH (s:School), (e:Economist)
WHERE s.name = 'Austrian School' AND e.name = 'Ludwig von Mises'
CREATE (s)-[:FoundedBy]->(e)
""")

conn.execute("""
MATCH (e:Economist), (i:Institution)
WHERE e.name = 'Pauli Blendergast' AND i.name = 'University of Krampton Ohio'
CREATE (e)-[:TeachesAt]->(i)
""")

# Link school to concepts it studies
conn.execute("""
MATCH (s:School), (c:EconomicConcept)
WHERE s.name = 'Austrian School' AND c.name = 'Microeconomics'
CREATE (s)-[:Studies]->(c)
""")

"""
Code written from the prompt:

Now that you have written code to create a sample graph database about
economics, you can write queries to extract information from the database.
"""

def query_and_print_result(query):
    """Basic pretty printer for Kuzu query results"""
    print(f"\n* Processing: {query}")
    result = conn.execute(query)
    if not result:
        print("No results found")
        return

    # Print each result row in turn
    while result.has_next():
        r = result.get_next()
        print(r)

# 1. Find all founders of the Austrian School
query_and_print_result("""
MATCH (s:School)-[:FoundedBy]->(e:Economist)
WHERE s.name = 'Austrian School'
RETURN e.name
""")

# 2. Find where Pauli Blendergast teaches
query_and_print_result("""
MATCH (e:Economist)-[:TeachesAt]->(i:Institution)
WHERE e.name = 'Pauli Blendergast'
RETURN i.name, i.type
""")

# 3. Find all economic concepts studied by the Austrian School
query_and_print_result("""
MATCH (s:School)-[:Studies]->(c:EconomicConcept)
WHERE s.name = 'Austrian School'
RETURN c.name, c.description
""")

# 4. Find all economists and their institutions
query_and_print_result("""
MATCH (e:Economist)-[:TeachesAt]->(i:Institution)
RETURN e.name as Economist, i.name as Institution
""")

# 5. Find schools and count their founders
query_and_print_result("""
MATCH (s:School)-[:FoundedBy]->(e:Economist)
RETURN s.name as School, COUNT(e) as NumberOfFounders
""")

# 6. Find economists who both founded schools and teach at institutions
query_and_print_result("""
MATCH (s:School)-[:FoundedBy]->(e:Economist)-[:TeachesAt]->(i:Institution)
RETURN e.name as Economist, s.name as School, i.name as Institution
""")

# 7. Find economic concepts without any schools studying them
query_and_print_result("""
MATCH (c:EconomicConcept)
WHERE NOT EXISTS {
    MATCH (s:School)-[:Studies]->(c)
}
RETURN c.name
""")

# 8. Find economists with no institutional affiliations
query_and_print_result("""
MATCH (e:Economist)
WHERE NOT EXISTS {
    MATCH (e)-[:TeachesAt]->()
}
RETURN e.name
""")

How might you use this example? A common implementation pattern for using LLMs is to use one-shot or two-shot prompting to specify the data format and other information, and then have the model generate structured data or Python code.

Here, the “structured data” I asked an LLM to output was Python code.
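When the structured output is code, it is worth checking it before running it. The sketch below is one possible approach, not something I used for this chapter: it pulls the first fenced code block out of a model reply and uses Python's standard ast module to confirm that the code at least parses. The function names and the sample reply are hypothetical, and a syntax check says nothing about whether the generated code is correct or safe to execute.

```python
import ast
import re

FENCE = "`" * 3  # Markdown code fence marker, built indirectly for readability


def extract_python(response: str) -> str:
    """Return the first fenced code block in an LLM reply, or the whole reply."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, response, re.DOTALL)
    return (match.group(1) if match else response).strip()


def is_valid_python(code: str) -> bool:
    """True if the text parses as Python; nothing is executed."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


# Hypothetical model reply with chatter around the code block.
reply = (
    "Here is the code:\n"
    f"{FENCE}python\n"
    "import kuzu\n"
    "db = kuzu.Database('economics_db')\n"
    f"{FENCE}\n"
    "Let me know if it works."
)

code = extract_python(reply)
print(code)
print(is_valid_python(code))
```

Smaller local models are more likely to wrap code in extra commentary or to emit code that does not parse, so a cheap check like this can decide whether to retry the prompt.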

I cheated in this example by using what is currently the best code generation LLM: Claude Sonnet 3.5. I also tried this same exercise using Ollama with the model qwen2.5-coder:14b, and the results were not quite as good. This is a great segue into the final chapter, Book Wrap Up.