About This Book
- What You Will Learn
About the Cover
About the Author
Introduction: Why Graph Modeling in DuckDB?
- The Problem: Graph Databases Require Separate Infrastructure
- DuckDB as an Analytical Swiss Army Knife
- The Thesis: Sophisticated Graph Structures Live in DuckDB
- The SQL-First Approach: Start with What You Know
- The Progression: Building Graph Sophistication Layer by Layer
- Graph Extensions: Bringing Cypher and PGQL into DuckDB
- The Hybrid Approach: Vectors + Graphs in One Engine
- Connection to the LadybugDB Book
- Connection to the Edge AI Book
- What This Book Covers: Chapter by Chapter
- Who This Book Is For
- Prerequisites
- How to Use This Book
- A Note on the Journey Ahead
Getting Started with DuckDB
- What Is DuckDB?
- Installation
- In-Memory vs Persistent Databases
- DuckDB CLI Shell Basics
- Python API Basics
- The Extension System
- Why DuckDB Is Uniquely Suited for Graph Experiments
- Project Structure Recommendations
- Next Steps
Why Embedded Databases Matter for Private AI Agents
- Privacy: Data Never Leaves the Device
- Latency: Zero Network Round-Trips
- Portability: The Entire Knowledge Graph Is a Single File
- Simplicity: No Docker, No Server Configuration
- Cost: Zero Infrastructure Cost for Inference-Time Retrieval
- The Embedded Database Landscape
- Why DuckDB Specifically
- Embedded Databases as the Memory Layer for Autonomous Agents
- The Privacy Argument for Enterprise
- A Real-World Scenario: The Private AI Agent
- Bridge to the Rest of This Book
The Property Graph Model
- Introduction to Graphs
- Directed vs Undirected Graphs
- The Labeled Property Graph Model
- LPG vs RDF: When to Use Which
- Mapping Relational Models to Graphs
- Adjacency List and Edge Table Patterns
- Building a Simple Social Graph in DuckDB
- Basic Graph Queries
- Limitations of the Pure SQL Approach
- Why We Need Something Better
Graph Modeling in Pure SQL
- Deep Dive into the SQL-Only Implementation
- Schema Design Patterns
- Recursive CTEs for Path Traversal
- PageRank in Pure SQL
- Community Detection Approximation
- Building a Knowledge Graph
- Indexing Strategies for Graph Queries
- Performance Considerations
- DuckDB-Specific Optimizations
- Complete Working Example: Software Project Knowledge Graph
From SQL to Cypher: DuckDB Graph Extensions
- The Leap from SQL to Graph Query Languages
- Introduction to Cypher
- Introduction to PGQL
- DuckDB Graph Extensions Ecosystem
- Translating Pure SQL Examples to Graph Extensions
- Variable-Length Paths
- Shortest Path Queries
- The SQL/PGQ Standard and DuckDB’s Alignment
- When to Use Pure SQL vs Graph Extensions
- Side-by-Side Comparisons
- Limitations and Edge Cases
- Conclusion and Next Steps
Graph Algorithms in DuckDB: DuckPGQ and Onager
- Two Extensions, Two Philosophies
- Installation
- Setting Up a Test Graph
- DuckPGQ: Property Graph Declaration
- DuckPGQ Graph Algorithms
- Onager: The Analytics Powerhouse
- Applying Algorithms to Our Metagraph
- Applying Algorithms to Hypergraphs
- DuckPGQ + Onager: Combined Workflows
- Algorithm Selection Guide
- Performance Considerations
- Pure SQL Fallback
- Summary
Typed Graphs and Ontologies
- Your Schema IS Your Ontology
- A Note on Schema Evolution
- Designing Typed Node Tables
- Designing Typed Edge Tables
- Beyond JSON: DuckDB’s Native Type System for Graph Properties
- DuckPGQ: Property Graphs as Ontology
- The Wiring Matrix: Allowed Connections
- Enforcing Ontological Constraints
- A Complete Knowledge Graph Ontology
- Conclusion
Subgraphs and Graph Partitioning
- Why Subgraphs Matter
- Named Subgraphs with Views
- Property Filtering Subgraphs
- Combining Subgraph Filters
- Multiple Property Graphs Over Shared Tables
- Connected Components in SQL
- Practical Example: Topic Subgraph
- Materialized Subgraphs
- Incremental Subgraph Updates
- When to Use Subgraphs
- Conclusion
Hypergraphs: Beyond Binary Relationships
- The Limitation of Binary Edges
- What Is a Hypergraph?
- Real-World Examples
- Hypergraphs in DuckDB: The Bipartite Approach
- Querying Hypergraphs
- The Bipartite Rule
- Advantages of the Bipartite Approach
- With DuckPGQ: Property Graph Modeling
- Performance Considerations
- Relational Models for Hypergraphs
Metagraphs: Graphs About Graphs
- The Next Level: Relationships as First-Class Citizens
- The Fundamental Insight: Homoiconicity
- Why Metagraphs Matter for AI
- Relational Models for Metagraphs
- The Bipartite Edge-Node Pattern
- Four Fundamental Edge Types
- Wiring the Metagraph: A Complete Example
- Querying the Metagraph
- Meta-Reasoning: Edges About Edges
- Recursive Causal Chains
- PageRank on the Metagraph
- DuckPGQ: Property Graph over the Metagraph
- The Contains Pattern: Three Levels
- Performance: When Metagraphs Become Too Deep
- Temporal Snapshots
- Metagraph Statistics
- From LadybugDB to DuckDB: The Translation
- Summary
Semantic Spacetime
- The Two Organizing Axes
- Temporality: The Third Dimension
- Implementing Semantic Spacetime in DuckDB
- Temporal Snapshot Queries
- Knowledge Decay Queries
- Extending with Causal and Temporal Relations
- The Time Tree
- Abstract Time Nodes
- Complete Semantic Spacetime DDL
DuckDB as a Triple Store
- RDF Triples: The Data Model
- Modeling Triples in DuckDB
- SPARQL-like Queries in SQL
- Named Graphs: Organizing Triples
- Converting Between LPG and RDF
- Importing RDF/Turtle Data into DuckDB
- Exporting DuckDB Graph Data as RDF Triples
- When to Use a Triple Store vs a Property Graph Model
- Hybrid Approach: Property Graph + Triple Store
- Complete Example: Importing an Ontology
- Summary
Vector Indexes in DuckDB
- Why Vectors Matter for Graphs
- DuckDB’s Vector Similarity Search Extension
- Storing Embeddings in DuckDB
- Creating Vector Indexes
- Querying with Vector Search
- Vector Search + Graph Traversal Pipeline
- Which Nodes Get Embeddings?
- Embedding Strategy: Model Selection
- DuckDB-Specific: Fetching Embeddings from an API
- Combining Vector Search with Full-Text Search
- Performance: HNSW in DuckDB vs Dedicated Vector Stores
- Complete Example: Semantic Search Over a Knowledge Graph
Designing Graph Memory for AI Agents
- The Complete Memory Ontology
- Edge-Node Tables for All Relationship Types
- The Wiring: Participation Tables
- Building the Time Tree in DuckDB
- Query Patterns: The Agent’s Questions
- Complete Memory DDL
- Initializing the Time Tree
- Putting It Together: A Memory in Action
- Summary
Agentic Memory Patterns
- The Retrieval Pipeline: From Query to Context
- Working Memory vs Long-Term Memory
- Episodic Memory: Events Linked by Time
- Semantic Memory: Facts and Relationships
- Procedural Memory: Action Sequences
- Memory Consolidation: From Events to Permanence
- Forgetting: Temporal Decay and Expiry
- Implementation: A MemoryAgent Class in Python
- Summary
Hybrid RAG Pipeline in DuckDB
- The Problem with Flat RAG
- The Hybrid Approach: Vectors + Graphs + Algorithms
- Schema: The Graph RAG Data Model
- Stages 1 & 2: Chunking and Entity Extraction
- Stage 3: Graph Construction and PageRank
- Stage 4: Hybrid Retrieval with RRF
- Complete Python Implementation: HybridGraphRAG
- Comparison with LadybugDB
- When to Use DuckDB RAG vs Dedicated Graph Databases
- Summary
Promise Graphs for Agent AI
- The Problem with Command-Based Thinking
- Core Principles of Promise Theory
- Multi-Layered Architecture
- Why Promise Graphs Matter for Autonomous AI
- The Complete Promise Graph Ontology
- Complete Example: Two AI Agents Coordinating a Task
- Lifecycle Queries
- Summary
The Relational Bridge: DuckDB as Graph Gateway
- Reading from External Databases with ATTACH
- Importing Relational Data into Graph Structures
- Bulk Import: COPY FROM for Parquet, CSV, JSON
- The ETL Pipeline: Relational → Graph → Enriched Relational
- Practical Pattern: Enriching PostgreSQL with Graph Intelligence
- DuckDB as Analytics Layer
- Complete Example: Building a Customer Knowledge Graph
- Summary
DuckDB as a Bridge for Data Pipelines
- The Bridge Concept
- Reading from Heterogeneous Sources
- The Bridge Pattern for Knowledge Graphs
- DuckDB as the Analytical Backbone for AI Agent Workflows
- Building a Data Pipeline That Crosses Tool Boundaries
- DuckDB’s COPY TO for Exporting
- The Federated Query Pattern
- Why DuckDB Beats Custom ETL Scripts
- Practical Example: Building a Content Knowledge Graph
- Conclusion
GraphDuck Memory: Putting It All Together
- Architecture Overview
- The GraphDuckMemory Python Class
- Usage Example
- Complete Schema Walkthrough
- Comparison with LadybugDB Memory
- When to Choose GraphDuck vs LadybugDB
- Extending the Schema for Your Domain
- Conclusion: A Unified Memory Architecture
Do We Need a “Graph Database” for Knowledge Graphs?
- The Question Nobody Wants to Ask
- What a “Graph Database” Actually Gives You
- The Case Against Dedicated Graph Databases
- Graphs Can Be Represented in Many Ways
- The Embeddable Database Argument
- Lessons from the Field
- Memory Is Not Retrieval
- When You Actually Need a Graph Database
- The GraphDuck Philosophy
- Further Reading
Afterword: From SQL to Cypher and Back
- The Progression in Retrospect
- SQL and Cypher Are Complements, Not Alternatives
- When to Use Each Approach
- The Future of Graph Support in DuckDB
- Private AI and Embedded Databases
- Final Thoughts
- Resources