Name: GraphDuck: DuckDB for Embedded AI Agents and Graphs
Brand: Leanpub
Price: 27.45 USD
Availability: InStock

You don't need a separate graph database. You need graph thinking inside the database you already have. GraphDuck teaches you how to model, query, and reason over sophisticated graph structures entirely within DuckDB, the embedded analytical database that runs everywhere, from your laptop to a serverless function. No infrastructure to manage, no new query language to learn first, no data synchronization between systems. One database, one schema, one file.

The book begins where every data practitioner already stands: SQL. You will build directed graphs from simple adjacency lists, implement PageRank with recursive CTEs, and traverse shortest paths, all in pure SQL you can run today. From there, the book progressively introduces richer structures: property graphs with typed edges and node labels, RDF-like triple stores for semantic data, hypergraphs that capture multi-entity relationships, and metagraphs where edges themselves become first-class objects that can participate in other relationships.

But GraphDuck goes further than modeling. The book shows you how to build real systems on top of these structures. You will design an ontology for AI agent memory (episodic, semantic, and procedural) stored as a graph inside DuckDB. You will implement Promise Graphs that track agent commitments, assessments, and outcomes. You will build a Hybrid Graph RAG pipeline that combines HNSW vector similarity search with graph traversal using Reciprocal Rank Fusion, all in a single query.

Along the way, you will learn to use DuckDB's graph extensions: DuckPGQ for SQL/PGQ pattern matching (the graph syntax from the SQL:2023 standard) and graph algorithms like PageRank, shortest path, community detection, and centrality analysis, available as simple SQL table functions.

Every concept comes with runnable code. Every chapter builds on the previous one. The arc is deliberate: from flat tables to temporal knowledge graphs, from a single SELECT to a complete retrieval-augmented generation pipeline.

## This book is for you if:

- You know SQL and want to add graph modeling to your analytical workflows without adopting a new database
- You are building AI agents and need structured memory, knowledge graphs, or hybrid retrieval, without the operational overhead of a graph database
- You want to understand hypergraphs, metagraphs, and semantic spacetime, not just as theory, but as working SQL you can execute and extend
- You believe the best infrastructure is the infrastructure you don't have to manage

## What you will learn:

- How to model directed graphs, property graphs, hypergraphs, metagraphs, and triple stores in DuckDB SQL
- How to implement graph algorithms (PageRank, shortest path, community detection, centrality) using DuckPGQ
- How to build HNSW vector indexes and combine vector similarity with graph traversal
- How to design ontologies for AI agent memory, promise tracking, and temporal knowledge
- How to build a complete Hybrid Graph RAG pipeline in a single embedded database
- When you actually need a dedicated graph database, and when DuckDB is the better choice
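
The description mentions combining vector similarity search with graph traversal using Reciprocal Rank Fusion (RRF). As a rough illustration of the fusion step only, here is a minimal sketch in plain Python. It is not the book's implementation, which is described as a single SQL query. The function name, the document ids, and the two input rankings are hypothetical, and `k=60` is the commonly used default constant rather than a value taken from the book.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores 1 / (k + rank) in every list that contains it
    (rank starts at 1), and the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists: one from a vector index, one from a graph traversal.
vector_hits = ["doc_a", "doc_b", "doc_c"]
graph_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([vector_hits, graph_hits])
print(fused)
```

Because RRF uses only rank positions, the two retrievers never need comparable score scales, which is one reason it suits mixing a distance-based vector search with a graph-based ranking.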

About This Book

What You Will Learn

About the Cover

About the Author

Introduction: Why Graph Modeling in DuckDB?

The Problem: Graph Databases Require Separate Infrastructure
DuckDB as an Analytical Swiss Army Knife
The Thesis: Sophisticated Graph Structures Live in DuckDB
The SQL-First Approach: Start with What You Know
The Progression: Building Graph Sophistication Layer by Layer
Graph Extensions: Bringing Cypher and PGQL into DuckDB
The Hybrid Approach: Vectors + Graphs in One Engine
Connection to the LadybugDB Book
Connection to the Edge AI Book
What This Book Covers: Chapter by Chapter
Who This Book Is For
Prerequisites
How to Use This Book
A Note on the Journey Ahead

Getting Started with DuckDB

What Is DuckDB?
Installation
In-Memory vs Persistent Databases
DuckDB CLI Shell Basics
Python API Basics
The Extension System
Why DuckDB Is Uniquely Suited for Graph Experiments
Project Structure Recommendations
Next Steps

Why Embedded Databases Matter for Private AI Agents

Privacy: Data Never Leaves the Device
Latency: Zero Network Round-Trips
Portability: The Entire Knowledge Graph Is a Single File
Simplicity: No Docker, No Server Configuration
Cost: Zero Infrastructure Cost for Inference-Time Retrieval
The Embedded Database Landscape
Why DuckDB Specifically
Embedded Databases as the Memory Layer for Autonomous Agents
The Privacy Argument for Enterprise
A Real-World Scenario: The Private AI Agent
Bridge to the Rest of This Book

The Property Graph Model

Introduction to Graphs
Directed vs Undirected Graphs
The Labeled Property Graph Model
LPG vs RDF: When to Use Which
Mapping Relational Models to Graphs
Adjacency List and Edge Table Patterns
Building a Simple Social Graph in DuckDB
Basic Graph Queries
Limitations of Pure SQL Approach
Why We Need Something Better

Graph Modeling in Pure SQL

Deep Dive into SQL-Only Implementation
Schema Design Patterns
Recursive CTEs for Path Traversal
PageRank in Pure SQL
Community Detection Approximation
Building a Knowledge Graph
Indexing Strategies for Graph Queries
Performance Considerations
DuckDB-Specific Optimizations
Complete Working Example: Software Project Knowledge Graph

From SQL to Cypher: DuckDB Graph Extensions

The Leap from SQL to Graph Query Languages
Introduction to Cypher
Introduction to PGQL
DuckDB Graph Extensions Ecosystem
Translating Pure SQL Examples to Graph Extensions
Variable-Length Paths
Shortest Path Queries
The SQL/PGQ Standard and DuckDB’s Alignment
When to Use Pure SQL vs Graph Extensions
Side-by-Side Comparisons
Limitations and Edge Cases
Conclusion and Next Steps

Graph Algorithms in DuckDB: DuckPGQ and Onager

Two Extensions, Two Philosophies
Installation
Setting Up a Test Graph
DuckPGQ: Property Graph Declaration
DuckPGQ Graph Algorithms
Onager: The Analytics Powerhouse
Applying Algorithms to Our Metagraph
Applying Algorithms to Hypergraphs
DuckPGQ + Onager: Combined Workflows
Algorithm Selection Guide
Performance Considerations
Pure SQL Fallback
Summary

Typed Graphs and Ontologies

Your Schema IS Your Ontology
A Note on Schema Evolution
Designing Typed Node Tables
Designing Typed Edge Tables
Beyond JSON: DuckDB’s Native Type System for Graph Properties
DuckPGQ: Property Graphs as Ontology
The Wiring Matrix: Allowed Connections
Enforcing Ontological Constraints
A Complete Knowledge Graph Ontology
Conclusion

Subgraphs and Graph Partitioning

Why Subgraphs Matter
Named Subgraphs with Views
Property Filtering Subgraphs
Combining Subgraph Filters
Multiple Property Graphs Over Shared Tables
Connected Components in SQL
Practical Example: Topic Subgraph
Materialized Subgraphs
Incremental Subgraph Updates
When to Use Subgraphs
Conclusion

Hypergraphs: Beyond Binary Relationships

The Limitation of Binary Edges
What is a Hypergraph?
Real-World Examples
Hypergraphs in DuckDB: The Bipartite Approach
Querying Hypergraphs
The Bipartite Rule
Advantages of the Bipartite Approach
With DuckPGQ: Property Graph Modeling
Performance Considerations
Relational Models for Hypergraphs

Metagraphs: Graphs About Graphs

The Next Level: Relationships as First-Class Citizens
The Fundamental Insight: Homoiconicity
Why Metagraphs Matter for AI
Relational Models for Metagraphs
The Bipartite Edge-Node Pattern
Four Fundamental Edge Types
Wiring the Metagraph: A Complete Example
Querying the Metagraph
Meta-Reasoning: Edges About Edges
Recursive Causal Chains
PageRank on the Metagraph
DuckPGQ: Property Graph over the Metagraph
The Contains Pattern: Three Levels
Performance: When Metagraphs Become Too Deep
Temporal Snapshots
Metagraph Statistics
From LadybugDB to DuckDB: The Translation
Summary

Semantic Spacetime

The Two Organizing Axes
Temporality: The Third Dimension
Implementing Semantic Spacetime in DuckDB
Temporal Snapshot Queries
Knowledge Decay Queries
Extending with Causal and Temporal Relations
The Time Tree
Abstract Time Nodes
Complete Semantic Spacetime DDL

DuckDB as a Triple Store

RDF Triples: The Data Model
Modeling Triples in DuckDB
SPARQL-like Queries in SQL
Named Graphs: Organizing Triples
Converting Between LPG and RDF
Importing RDF/Turtle Data into DuckDB
Exporting DuckDB Graph Data as RDF Triples
When to Use Triple Store vs Property Graph Model
Hybrid Approach: Property Graph + Triple Store
Complete Example: Importing an Ontology
Summary

Vector Indexes in DuckDB

Why Vectors Matter for Graphs
DuckDB’s Vector Similarity Search Extension
Storing Embeddings in DuckDB
Creating Vector Indexes
Querying with Vector Search
Vector Search + Graph Traversal Pipeline
Which Nodes Get Embeddings?
Embedding Strategy: Model Selection
DuckDB-Specific: Fetching Embeddings from an API
Combining Vector Search with Full-Text Search
Performance: HNSW in DuckDB vs Dedicated Vector Stores
Complete Example: Semantic Search Over a Knowledge Graph

Designing Graph Memory for AI Agents

The Complete Memory Ontology
Edge-Node Tables for All Relationship Types
The Wiring: Participation Tables
Building the Time Tree in DuckDB
Query Patterns: The Agent’s Questions
Complete Memory DDL
Initializing the Time Tree
Putting It Together: A Memory in Action
Summary

Agentic Memory Patterns

The Retrieval Pipeline: From Query to Context
Working Memory vs Long-Term Memory
Episodic Memory: Events Linked by Time
Semantic Memory: Facts and Relationships
Procedural Memory: Action Sequences
Memory Consolidation: From Events to Permanence
Forgetting: Temporal Decay and Expiry
Implementation: A MemoryAgent Class in Python
Summary

Hybrid RAG Pipeline in DuckDB

The Problem with Flat RAG
The Hybrid Approach: Vectors + Graphs + Algorithms
Schema: The Graph RAG Data Model
Stage 1 & 2: Chunking and Entity Extraction
Stage 3: Graph Construction and PageRank
Stage 4: Hybrid Retrieval with RRF
Complete Python Implementation: HybridGraphRAG
Comparison with LadybugDB
When to Use DuckDB RAG vs Dedicated Graph Databases
Summary

Promise Graphs for Agent AI

The Problem with Command-Based Thinking
Core Principles of Promise Theory
Multi-Layered Architecture
Why Promise Graphs Matter for Autonomous AI
The Complete Promise Graph Ontology
Complete Example: Two AI Agents Coordinating a Task
Lifecycle Queries
Summary

The Relational Bridge: DuckDB as Graph Gateway

Reading from External Databases with ATTACH
Importing Relational Data into Graph Structures
Bulk Import: COPY FROM for Parquet, CSV, JSON
The ETL Pipeline: Relational → Graph → Enriched Relational
Practical Pattern: Enriching PostgreSQL with Graph Intelligence
DuckDB as Analytics Layer
Complete Example: Building a Customer Knowledge Graph
Summary

DuckDB as a Bridge for Data Pipelines

The Bridge Concept
Reading from Heterogeneous Sources
The Bridge Pattern for Knowledge Graphs
DuckDB as the Analytical Backbone for AI Agent Workflows
Building a Data Pipeline That Crosses Tool Boundaries
DuckDB’s COPY TO for Exporting
The Federated Query Pattern
Why DuckDB Beats Custom ETL Scripts
Practical Example: Building a Content Knowledge Graph
Conclusion

GraphDuck Memory: Putting It All Together

Architecture Overview
The GraphDuckMemory Python Class
Usage Example
Complete Schema Walkthrough
Comparison with LadybugDB Memory
When to Choose GraphDuck vs LadybugDB
Extending the Schema for Your Domain
Conclusion: A Unified Memory Architecture

Do We Need a “Graph Database” for Knowledge Graphs?

The Question Nobody Wants to Ask
What a “Graph Database” Actually Gives You
The Case Against Dedicated Graph Databases
Graphs Can Be Represented in Many Ways
The Embeddable Database Argument
Lessons from the Field
Memory Is Not Retrieval
When You Actually Need a Graph Database
The GraphDuck Philosophy
Further Reading

Afterword: From SQL to Cypher and Back

The Progression in Retrospect
SQL and Cypher Are Complements, Not Alternatives
When to Use Each Approach
The Future of Graph Support in DuckDB
Private AI and Embedded Databases
Final Thoughts
Resources

