Part 6: Applied AI Projects
The final part of the book presents three self-contained projects that put earlier techniques to work on practical problems. Each chapter stands alone — you can read them in any order — but together they demonstrate the range of what Swift can do when you combine modern concurrency, Foundation networking, and a little AI thinking.
We begin with the Knowledge Base Navigator, a modern reimagining of the classic Knowledge Graph Navigator (KGN) from my earlier books. Instead of wiring symbolic NLP to raw SPARQL queries, this version delegates everything to Google’s Gemini API in a two-stage pipeline: the LLM first extracts and disambiguates entities from natural-language input, then retrieves detailed encyclopedic facts and analyzes relationships between the entities you select. The result is an interactive command-line tool built entirely with URLSession, Codable, and Swift Concurrency — zero external dependencies — that lets you explore any topic conversationally.
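To give a flavor of the zero-dependency approach, here is a minimal sketch of one pipeline stage as a single Gemini call built from URLSession, Codable types, and async/await. The model name, API version, and the `askGemini` helper are illustrative assumptions; the chapter's actual request and response types are richer.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Minimal Codable shapes for a Gemini generateContent exchange.
struct GeminiRequest: Codable {
    struct Content: Codable { let parts: [Part] }
    struct Part: Codable { let text: String }
    let contents: [Content]
}

struct GeminiResponse: Codable {
    struct Candidate: Codable { let content: GeminiRequest.Content }
    let candidates: [Candidate]
}

// One pipeline stage: send a prompt, return the model's text reply.
// Hypothetical helper; the model name and endpoint version are assumptions.
func askGemini(_ prompt: String, apiKey: String) async throws -> String {
    let url = URL(string: "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=\(apiKey)")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        GeminiRequest(contents: [.init(parts: [.init(text: prompt)])]))

    let (data, _) = try await URLSession.shared.data(for: request)
    let reply = try JSONDecoder().decode(GeminiResponse.self, from: data)
    return reply.candidates.first?.content.parts.first?.text ?? ""
}
```

With a helper like this, the entity-extraction stage is just one prompt, for example `try await askGemini("List the named entities in: \(input)", apiKey: key)`, and the fact-retrieval and relationship-analysis stages reuse the same call with different prompts.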
The Anomaly Detection chapter implements a Gaussian anomaly detection model trained on the University of Wisconsin Breast Cancer dataset. The algorithm fits a per-feature Gaussian distribution to “normal” training examples, tunes an epsilon threshold against a cross-validation set, and evaluates precision, recall, and F1 on held-out test data. This chapter highlights an important real-world scenario: when anomalies are rare and your training data is heavily imbalanced, anomaly detection often outperforms standard supervised classification. The Swift implementation includes log-transform preprocessing to push features toward Gaussian distributions and ASCII histogram visualization for exploratory data analysis.
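As a rough sketch of the technique (not the chapter's exact code), fitting the model amounts to computing a mean and variance per feature, scoring examples by their log-density under independent Gaussians, and sweeping a threshold on the cross-validation set to maximize F1. The type and function names below are illustrative.

```swift
import Foundation

// Per-feature Gaussian parameters fitted to the "normal" training examples.
struct GaussianModel {
    let means: [Double]
    let variances: [Double]

    init(fittingTo normal: [[Double]]) {
        let n = Double(normal.count)
        let d = normal[0].count
        let mu = (0..<d).map { j in normal.reduce(0.0) { $0 + $1[j] } / n }
        means = mu
        variances = (0..<d).map { j in
            max(normal.reduce(0.0) { $0 + pow($1[j] - mu[j], 2) } / n, 1e-12)
        }
    }

    // Sum of per-feature log-densities under the independent-Gaussian model.
    func logProbability(of x: [Double]) -> Double {
        var total = 0.0
        for j in 0..<means.count {
            let diff = x[j] - means[j]
            total += -0.5 * (log(2 * .pi * variances[j]) + diff * diff / variances[j])
        }
        return total
    }
}

// Choose epsilon on the cross-validation set by maximizing F1;
// `labels` is true for anomalous examples.
func bestEpsilon(model: GaussianModel, cv: [[Double]], labels: [Bool]) -> Double {
    let scores = cv.map { model.logProbability(of: $0) }
    guard let lo = scores.min(), let hi = scores.max(), lo < hi else { return -Double.infinity }
    var best = (epsilon: lo, f1: 0.0)
    for eps in stride(from: lo, through: hi, by: (hi - lo) / 1000) {
        let flagged = scores.map { $0 < eps }   // low probability means anomaly
        let tp = zip(flagged, labels).filter { $0 && $1 }.count
        let fp = zip(flagged, labels).filter { $0 && !$1 }.count
        let fn = zip(flagged, labels).filter { !$0 && $1 }.count
        guard tp > 0 else { continue }
        let precision = Double(tp) / Double(tp + fp)
        let recall = Double(tp) / Double(tp + fn)
        let f1 = 2 * precision * recall / (precision + recall)
        if f1 > best.f1 { best = (eps, f1) }
    }
    return best.epsilon
}
```

The log-transform preprocessing happens before fitting: replacing a skewed feature x with log(x + 1) pushes its distribution toward something closer to Gaussian, which is what the model assumes.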
Part 6 concludes with AutoContext, which tackles the problem of using small, local LLMs with limited context windows on large document collections. It implements a hybrid retrieval system that combines BM25 (lexical/keyword search) with vector similarity (semantic search using Gemini embeddings) to identify the most relevant text chunks for any query. The results are merged, deduplicated, and formatted into a compact, targeted prompt ready for any LLM. This is the same Retrieval-Augmented Generation pattern we saw earlier in the book, but with a self-contained BM25 implementation and a focus on generating prompts rather than directly querying a model — making it compatible with any backend, from a local Ollama model to a large cloud service.
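The heart of that pipeline is the scoring and merging step. The sketch below shows one way it can look: a compact BM25 scorer over pre-tokenized chunks plus a reciprocal-rank-fusion merge of the lexical and vector rankings. The BM25 parameters, the fusion constant, and the function names are assumptions for illustration; the chapter's merging strategy may differ.

```swift
import Foundation

// A minimal BM25 scorer over pre-tokenized chunks (lowercased word arrays).
// k1 and b are the usual BM25 free parameters.
struct BM25 {
    let k1 = 1.5, b = 0.75
    let chunks: [[String]]
    let avgLength: Double
    let docFrequency: [String: Int]

    init(chunks: [[String]]) {
        self.chunks = chunks
        avgLength = Double(chunks.reduce(0) { $0 + $1.count }) / Double(chunks.count)
        var df: [String: Int] = [:]
        for chunk in chunks {
            for term in Set(chunk) { df[term, default: 0] += 1 }
        }
        docFrequency = df
    }

    // BM25 score of one chunk against a tokenized query.
    func score(query: [String], chunk index: Int) -> Double {
        let chunk = chunks[index]
        let length = Double(chunk.count)
        var counts: [String: Int] = [:]
        for term in chunk { counts[term, default: 0] += 1 }
        return query.reduce(0.0) { total, term in
            guard let df = docFrequency[term], let tf = counts[term] else { return total }
            let idf = log((Double(chunks.count) - Double(df) + 0.5) / (Double(df) + 0.5) + 1)
            let tfPart = Double(tf) * (k1 + 1) /
                (Double(tf) + k1 * (1 - b + b * length / avgLength))
            return total + idf * tfPart
        }
    }
}

// Combine the two ranked lists of chunk indices with reciprocal rank fusion
// (one possible merging strategy), deduplicating chunks both retrievers found.
func fuse(bm25Ranked: [Int], vectorRanked: [Int], topK: Int) -> [Int] {
    var combined: [Int: Double] = [:]
    for (rank, id) in bm25Ranked.enumerated() { combined[id, default: 0] += 1.0 / Double(rank + 60) }
    for (rank, id) in vectorRanked.enumerated() { combined[id, default: 0] += 1.0 / Double(rank + 60) }
    return combined.sorted { $0.value > $1.value }.prefix(topK).map(\.key)
}
```

The vector ranking typically comes from cosine similarity between the query's Gemini embedding and each chunk's embedding; only the ranked chunk indices reach the fusion step, so the two retrievers can be built and tested independently before their results are merged into the final prompt.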
By the end of Part 6 you will have built an AI-powered knowledge exploration tool, implemented a classical machine learning algorithm from scratch, and constructed a hybrid retrieval pipeline that bridges the gap between large document collections and small-context language models.