GraphDuck: DuckDB for Embedded AI Agents and Graphs

You don't need a graph database. You need graph thinking inside DuckDB. GraphDuck takes you from SQL adjacency lists to metagraphs, hypergraphs, and hybrid Graph RAG pipelines — all inside DuckDB. Learn to model knowledge graphs, build AI agent memory systems, run graph algorithms, and combine vector search with graph traversal in a single embedded database. Every concept comes with runnable code. No infrastructure required.

Minimum price: $27.45
Suggested price: $37.80 (author earns $30.24)

Formats: PDF, EPUB, WEB

About the Book

You don't need a separate graph database. You need graph thinking inside the database you already have. GraphDuck teaches you how to model, query, and reason over sophisticated graph structures entirely within DuckDB, the embedded analytical database that runs anywhere from your laptop to a serverless function. No infrastructure to manage, no new query language to learn first, no data synchronization between systems. One database, one schema, one file.

The book begins where every data practitioner already stands: SQL. You will build directed graphs from simple adjacency lists, implement PageRank with recursive CTEs, and traverse shortest paths, all in pure SQL you can run today. From there, the book progressively introduces richer structures: property graphs with typed edges and node labels, RDF-like triple stores for semantic data, hypergraphs that capture multi-entity relationships, and metagraphs where edges themselves become first-class objects that can participate in other relationships.

But GraphDuck goes further than modeling. The book shows you how to build real systems on top of these structures. You will design an ontology for AI agent memory (episodic, semantic, and procedural) stored as a graph inside DuckDB. You will implement Promise Graphs that track agent commitments, assessments, and outcomes. You will build a Hybrid Graph RAG pipeline that combines HNSW vector similarity search with graph traversal using Reciprocal Rank Fusion, all in a single query.

Along the way, you will learn to use DuckDB's graph extensions: DuckPGQ for SQL/PGQ pattern matching (the graph syntax from the SQL:2023 standard) and graph algorithms like PageRank, shortest path, community detection, and centrality analysis, available as simple SQL table functions. Every concept comes with runnable code. Every chapter builds on the previous one.
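Here is a minimal sketch of the adjacency-list starting point: a plain edge table plus a recursive CTE that computes shortest hop counts from one node. It is an illustration under stated assumptions and not code from the book. The toy graph, the table name `edges`, and the helper `shortest_hops` are all hypothetical. The query runs through Python's built-in `sqlite3` module so that the sketch needs no install. It is standard `WITH RECURSIVE` SQL, a syntax DuckDB also supports.

```python
import sqlite3

# Hypothetical toy graph stored as a directed edge list (src -> dst).
EDGES = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "alice"),  # cycle back to the start node
    ("carol", "dave"),
    ("alice", "erin"),
    ("erin", "dave"),
]

# Breadth-first expansion with hop counts. UNION (not UNION ALL) discards
# duplicate (node, depth) rows, and the depth guard bounds the recursion,
# so the alice -> bob -> carol -> alice cycle cannot recurse forever.
SHORTEST_HOPS = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM reach AS r
    JOIN edges AS e ON e.src = r.node
    WHERE r.depth < ?
)
SELECT node, MIN(depth) AS hops
FROM reach
GROUP BY node
ORDER BY hops, node
"""


def shortest_hops(edges, start, max_depth=10):
    """Return {node: fewest hops from start} for nodes within max_depth."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
        con.executemany("INSERT INTO edges VALUES (?, ?)", edges)
        rows = con.execute(SHORTEST_HOPS, (start, max_depth)).fetchall()
    finally:
        con.close()
    return dict(rows)


if __name__ == "__main__":
    print(shortest_hops(EDGES, "alice"))
```

`MIN(depth)` picks the shortest hop count for each reachable node, even when the recursion reaches that node again along longer paths or around the cycle.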
The arc is deliberate: from flat tables to temporal knowledge graphs, from a single SELECT to a complete retrieval-augmented generation pipeline.

## This book is for you if:

- You know SQL and want to add graph modeling to your analytical workflows without adopting a new database
- You are building AI agents and need structured memory, knowledge graphs, or hybrid retrieval, without the operational overhead of a graph database
- You want to understand hypergraphs, metagraphs, and semantic spacetime not just as theory, but as working SQL you can execute and extend
- You believe the best infrastructure is the infrastructure you don't have to manage

## What you will learn:

- How to model directed graphs, property graphs, hypergraphs, metagraphs, and triple stores in DuckDB SQL
- How to implement graph algorithms (PageRank, shortest path, community detection, centrality) using DuckPGQ
- How to build HNSW vector indexes and combine vector similarity with graph traversal
- How to design ontologies for AI agent memory, promise tracking, and temporal knowledge
- How to build a complete Hybrid Graph RAG pipeline in a single embedded database
- When you actually need a dedicated graph database, and when DuckDB is the better choice
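The Hybrid Graph RAG pipeline merges vector and graph results with Reciprocal Rank Fusion, and the core of that technique fits in a few lines. Each ranked list contributes 1 / (k + rank) for every item it contains, and items are then re-sorted by their summed score. The following is a minimal Python sketch that assumes the conventional k = 60. The function name and the two input rankings are hypothetical stand-ins for HNSW search hits and graph-traversal hits. The book's own pipeline may compute or weight these stages differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Every list adds 1 / (k + rank) to each id it contains (ranks start at 1);
    an id absent from a list simply gets nothing from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical chunk ids from the two retrieval stages.
    vector_hits = ["c3", "c1", "c7"]  # e.g. nearest neighbours by embedding
    graph_hits = ["c1", "c9", "c3"]   # e.g. chunks reached by graph traversal
    for item_id, score in reciprocal_rank_fusion([vector_hits, graph_hits]):
        print(f"{item_id}\t{score:.5f}")
```

RRF uses only rank positions. The raw scores from each retriever, such as cosine similarity or traversal depth, therefore never need to be put on a common scale. This is the main reason RRF suits heterogeneous retrievers.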


About the Author

Volodymyr Pavlyshyn

Hey, I am Volodymyr.

Seasoned Developer's Journey from COBOL to Web 3.0, SSI, Privacy-First Edge AI, and Beyond

As a seasoned developer with over 20 years of experience, I have worked with various programming languages, including some that are considered "dead," such as COBOL and Smalltalk. However, my passion for innovation and for embracing cutting-edge technology has led me to focus on the emerging fields of Web 5.0, Self-Sovereign Identity (SSI), AI agents, knowledge graphs, agentic memory systems, and the architecture of a decentralized world that empowers data democratization.

A firm believer in the potential of agent systems and the concept of a "soft" internet, I am dedicated to exploring and promoting these transformative ideas. In addition to writing, I enjoy sharing my knowledge and insights through video blogging. Most of my Medium posts serve as supplementary content to the videos on my YouTube channel, which you can explore here: https://www.youtube.com/c/VolodymyrPavlyshyn.

Join me on this exciting journey as we delve into the future of technology and the possibilities it holds.

Table of Contents

About This Book

  1. What You Will Learn

About the Cover

About the Author

Introduction: Why Graph Modeling in DuckDB?

  1. The Problem: Graph Databases Require Separate Infrastructure
  2. DuckDB as an Analytical Swiss Army Knife
  3. The Thesis: Sophisticated Graph Structures Live in DuckDB
  4. The SQL-First Approach: Start with What You Know
  5. The Progression: Building Graph Sophistication Layer by Layer
  6. Graph Extensions: Bringing Cypher and PGQL into DuckDB
  7. The Hybrid Approach: Vectors + Graphs in One Engine
  8. Connection to the LadybugDB Book
  9. Connection to the Edge AI Book
  10. What This Book Covers: Chapter by Chapter
  11. Who This Book Is For
  12. Prerequisites
  13. How to Use This Book
  14. A Note on the Journey Ahead

Getting Started with DuckDB

  1. What Is DuckDB?
  2. Installation
  3. In-Memory vs Persistent Databases
  4. DuckDB CLI Shell Basics
  5. Python API Basics
  6. The Extension System
  7. Why DuckDB Is Uniquely Suited for Graph Experiments
  8. Project Structure Recommendations
  9. Next Steps

Why Embedded Databases Matter for Private AI Agents

  1. Privacy: Data Never Leaves the Device
  2. Latency: Zero Network Round-Trips
  3. Portability: The Entire Knowledge Graph Is a Single File
  4. Simplicity: No Docker, No Server Configuration
  5. Cost: Zero Infrastructure Cost for Inference-Time Retrieval
  6. The Embedded Database Landscape
  7. Why DuckDB Specifically
  8. Embedded Databases as the Memory Layer for Autonomous Agents
  9. The Privacy Argument for Enterprise
  10. A Real-World Scenario: The Private AI Agent
  11. Bridge to the Rest of This Book

The Property Graph Model

  1. Introduction to Graphs
  2. Directed vs Undirected Graphs
  3. The Labeled Property Graph Model
  4. LPG vs RDF: When to Use Which
  5. Mapping Relational Models to Graphs
  6. Adjacency List and Edge Table Patterns
  7. Building a Simple Social Graph in DuckDB
  8. Basic Graph Queries
  9. Limitations of Pure SQL Approach
  10. Why We Need Something Better

Graph Modeling in Pure SQL

  1. Deep Dive into SQL-Only Implementation
  2. Schema Design Patterns
  3. Recursive CTEs for Path Traversal
  4. PageRank in Pure SQL
  5. Community Detection Approximation
  6. Building a Knowledge Graph
  7. Indexing Strategies for Graph Queries
  8. Performance Considerations
  9. DuckDB-Specific Optimizations
  10. Complete Working Example: Software Project Knowledge Graph

From SQL to Cypher: DuckDB Graph Extensions

  1. The Leap from SQL to Graph Query Languages
  2. Introduction to Cypher
  3. Introduction to PGQL
  4. DuckDB Graph Extensions Ecosystem
  5. Translating Pure SQL Examples to Graph Extensions
  6. Variable-Length Paths
  7. Shortest Path Queries
  8. The SQL/PGQ Standard and DuckDB’s Alignment
  9. When to Use Pure SQL vs Graph Extensions
  10. Side-by-Side Comparisons
  11. Limitations and Edge Cases
  12. Conclusion and Next Steps

Graph Algorithms in DuckDB: DuckPGQ and Onager

  1. Two Extensions, Two Philosophies
  2. Installation
  3. Setting Up a Test Graph
  4. DuckPGQ: Property Graph Declaration
  5. DuckPGQ Graph Algorithms
  6. Onager: The Analytics Powerhouse
  7. Applying Algorithms to Our Metagraph
  8. Applying Algorithms to Hypergraphs
  9. DuckPGQ + Onager: Combined Workflows
  10. Algorithm Selection Guide
  11. Performance Considerations
  12. Pure SQL Fallback
  13. Summary

Typed Graphs and Ontologies

  1. Your Schema IS Your Ontology
  2. A Note on Schema Evolution
  3. Designing Typed Node Tables
  4. Designing Typed Edge Tables
  5. Beyond JSON: DuckDB’s Native Type System for Graph Properties
  6. DuckPGQ: Property Graphs as Ontology
  7. The Wiring Matrix: Allowed Connections
  8. Enforcing Ontological Constraints
  9. A Complete Knowledge Graph Ontology
  10. Conclusion

Subgraphs and Graph Partitioning

  1. Why Subgraphs Matter
  2. Named Subgraphs with Views
  3. Property Filtering Subgraphs
  4. Combining Subgraph Filters
  5. Multiple Property Graphs Over Shared Tables
  6. Connected Components in SQL
  7. Practical Example: Topic Subgraph
  8. Materialized Subgraphs
  9. Incremental Subgraph Updates
  10. When to Use Subgraphs
  11. Conclusion

Hypergraphs: Beyond Binary Relationships

  1. The Limitation of Binary Edges
  2. What is a Hypergraph?
  3. Real-World Examples
  4. Hypergraphs in DuckDB: The Bipartite Approach
  5. Querying Hypergraphs
  6. The Bipartite Rule
  7. Advantages of the Bipartite Approach
  8. With DuckPGQ: Property Graph Modeling
  9. Performance Considerations
  10. Relational Models for Hypergraphs

Metagraphs: Graphs About Graphs

  1. The Next Level: Relationships as First-Class Citizens
  2. The Fundamental Insight: Homoiconicity
  3. Why Metagraphs Matter for AI
  4. Relational Models for Metagraphs
  5. The Bipartite Edge-Node Pattern
  6. Four Fundamental Edge Types
  7. Wiring the Metagraph: A Complete Example
  8. Querying the Metagraph
  9. Meta-Reasoning: Edges About Edges
  10. Recursive Causal Chains
  11. PageRank on the Metagraph
  12. DuckPGQ: Property Graph over the Metagraph
  13. The Contains Pattern: Three Levels
  14. Performance: When Metagraphs Become Too Deep
  15. Temporal Snapshots
  16. Metagraph Statistics
  17. From LadybugDB to DuckDB: The Translation
  18. Summary

Semantic Spacetime

  1. The Two Organizing Axes
  2. Temporality: The Third Dimension
  3. Implementing Semantic Spacetime in DuckDB
  4. Temporal Snapshot Queries
  5. Knowledge Decay Queries
  6. Extending with Causal and Temporal Relations
  7. The Time Tree
  8. Abstract Time Nodes
  9. Complete Semantic Spacetime DDL

DuckDB as a Triple Store

  1. RDF Triples: The Data Model
  2. Modeling Triples in DuckDB
  3. SPARQL-like Queries in SQL
  4. Named Graphs: Organizing Triples
  5. Converting Between LPG and RDF
  6. Importing RDF/Turtle Data into DuckDB
  7. Exporting DuckDB Graph Data as RDF Triples
  8. When to Use Triple Store vs Property Graph Model
  9. Hybrid Approach: Property Graph + Triple Store
  10. Complete Example: Importing an Ontology
  11. Summary

Vector Indexes in DuckDB

  1. Why Vectors Matter for Graphs
  2. DuckDB’s Vector Similarity Search Extension
  3. Storing Embeddings in DuckDB
  4. Creating Vector Indexes
  5. Querying with Vector Search
  6. Vector Search + Graph Traversal Pipeline
  7. Which Nodes Get Embeddings?
  8. Embedding Strategy: Model Selection
  9. DuckDB-Specific: Fetching Embeddings from an API
  10. Combining Vector Search with Full-Text Search
  11. Performance: HNSW in DuckDB vs Dedicated Vector Stores
  12. Complete Example: Semantic Search Over a Knowledge Graph

Designing Graph Memory for AI Agents

  1. The Complete Memory Ontology
  2. Edge-Node Tables for All Relationship Types
  3. The Wiring: Participation Tables
  4. Building the Time Tree in DuckDB
  5. Query Patterns: The Agent’s Questions
  6. Complete Memory DDL
  7. Initializing the Time Tree
  8. Putting It Together: A Memory in Action
  9. Summary

Agentic Memory Patterns

  1. The Retrieval Pipeline: From Query to Context
  2. Working Memory vs Long-Term Memory
  3. Episodic Memory: Events Linked by Time
  4. Semantic Memory: Facts and Relationships
  5. Procedural Memory: Action Sequences
  6. Memory Consolidation: From Events to Permanence
  7. Forgetting: Temporal Decay and Expiry
  8. Implementation: A MemoryAgent Class in Python
  9. Summary

Hybrid RAG Pipeline in DuckDB

  1. The Problem with Flat RAG
  2. The Hybrid Approach: Vectors + Graphs + Algorithms
  3. Schema: The Graph RAG Data Model
  4. Stage 1 & 2: Chunking and Entity Extraction
  5. Stage 3: Graph Construction and PageRank
  6. Stage 4: Hybrid Retrieval with RRF
  7. Complete Python Implementation: HybridGraphRAG
  8. Comparison with LadybugDB
  9. When to Use DuckDB RAG vs Dedicated Graph Databases
  10. Summary

Promise Graphs for Agent AI

  1. The Problem with Command-Based Thinking
  2. Core Principles of Promise Theory
  3. Multi-Layered Architecture
  4. Why Promise Graphs Matter for Autonomous AI
  5. The Complete Promise Graph Ontology
  6. Complete Example: Two AI Agents Coordinating a Task
  7. Lifecycle Queries
  8. Summary

The Relational Bridge: DuckDB as Graph Gateway

  1. Reading from External Databases with ATTACH
  2. Importing Relational Data into Graph Structures
  3. Bulk Import: COPY FROM for Parquet, CSV, JSON
  4. The ETL Pipeline: Relational → Graph → Enriched Relational
  5. Practical Pattern: Enriching PostgreSQL with Graph Intelligence
  6. DuckDB as Analytics Layer
  7. Complete Example: Building a Customer Knowledge Graph
  8. Summary

DuckDB as a Bridge for Data Pipelines

  1. The Bridge Concept
  2. Reading from Heterogeneous Sources
  3. The Bridge Pattern for Knowledge Graphs
  4. DuckDB as the Analytical Backbone for AI Agent Workflows
  5. Building a Data Pipeline That Crosses Tool Boundaries
  6. DuckDB’s COPY TO for Exporting
  7. The Federated Query Pattern
  8. Why DuckDB Beats Custom ETL Scripts
  9. Practical Example: Building a Content Knowledge Graph
  10. Conclusion

GraphDuck Memory: Putting It All Together

  1. Architecture Overview
  2. The GraphDuckMemory Python Class
  3. Usage Example
  4. Complete Schema Walkthrough
  5. Comparison with LadybugDB Memory
  6. When to Choose GraphDuck vs LadybugDB
  7. Extending the Schema for Your Domain
  8. Conclusion: A Unified Memory Architecture

Do We Need a “Graph Database” for Knowledge Graphs?

  1. The Question Nobody Wants to Ask
  2. What a “Graph Database” Actually Gives You
  3. The Case Against Dedicated Graph Databases
  4. Graphs Can Be Represented in Many Ways
  5. The Embeddable Database Argument
  6. Lessons from the Field
  7. Memory Is Not Retrieval
  8. When You Actually Need a Graph Database
  9. The GraphDuck Philosophy
  10. Further Reading

Afterword: From SQL to Cypher and Back

  1. The Progression in Retrospect
  2. SQL and Cypher Are Complements, Not Alternatives
  3. When to Use Each Approach
  4. The Future of Graph Support in DuckDB
  5. Private AI and Embedded Databases
  6. Final Thoughts
  7. Resources
