Part 5: Apple Intelligence — On-Device LLMs with FoundationModels

Apple’s FoundationModels framework, introduced with macOS 26 (Tahoe), gives Swift developers direct access to the system’s built-in language model — no API keys, no network requests, and no data leaving the device. This part explores the framework in two chapters that build increasingly sophisticated on-device AI tools.

The first chapter builds a streaming chat tool that verifies model availability, initializes a LanguageModelSession with a system prompt and temperature setting, and enters an interactive read-eval-print loop (REPL). Each response is streamed to the terminal as it is generated, producing a real-time typewriter effect. A DispatchSource signal handler lets you press Control-C to cancel a long response without killing the process, a small but important usability detail.
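A minimal sketch of that chat loop follows. It assumes the FoundationModels API surface described in the chapter (SystemLanguageModel availability, LanguageModelSession, streamResponse, GenerationOptions); the instructions text, the 0.7 temperature, and names like currentResponse are illustrative, and the code assumes each streamed element exposes the cumulative response text so far.

```swift
import Foundation
import FoundationModels

// Bail out early if Apple Intelligence is off or the model is still downloading.
guard case .available = SystemLanguageModel.default.availability else {
    print("The on-device model is not available on this Mac.")
    exit(1)
}

let session = LanguageModelSession(
    instructions: "You are a concise, helpful assistant."  // illustrative system prompt
)

// The in-flight response task, so the SIGINT handler can cancel it.
var currentResponse: Task<Void, Never>?

// Replace the default Control-C behavior: cancel the stream, keep the REPL alive.
signal(SIGINT, SIG_IGN)
let sigint = DispatchSource.makeSignalSource(signal: SIGINT, queue: .main)
sigint.setEventHandler { currentResponse?.cancel() }
sigint.resume()

while true {
    print("> ", terminator: "")
    guard let line = readLine(), !line.isEmpty else { continue }
    currentResponse = Task {
        do {
            // Assumes each streamed element is the cumulative response text,
            // so only the unseen suffix is printed for the typewriter effect.
            var printed = 0
            for try await partial in session.streamResponse(
                to: line,
                options: GenerationOptions(temperature: 0.7)  // illustrative value
            ) {
                print(partial.dropFirst(printed), terminator: "")
                printed = partial.count
            }
            print()
        } catch is CancellationError {
            print("\n[cancelled]")  // Control-C landed mid-response
        } catch {
            print("\nError: \(error)")
        }
    }
    await currentResponse?.value
}
```

Keeping the response in a Task is what makes the cancellation work: the signal handler only has to call cancel() on it, and the REPL resumes at the next prompt.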

The second chapter takes this further with an AI coding assistant. The tool walks a project directory, reads every Swift, Python, and Lisp source file it finds, and asks the on-device model to summarize each file and the project as a whole. It then drops into a streaming chat loop so you can ask follow-up questions about the codebase. The summarization pass uses temperature: 0 for deterministic, factual output, while the chat session uses a separate LanguageModelSession with temperature: 0.2 for slightly more creative responses. Run it inside any repository and you get an instant, context-aware, fully private coding companion.
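A sketch of the summarization pass under the same API assumptions is below. The prompt wording, the fileSummaries aggregation, and helper names like walker and factual are illustrative, not the chapter's exact code.

```swift
import Foundation
import FoundationModels

let sourceExtensions: Set<String> = ["swift", "py", "lisp"]
let root = URL(fileURLWithPath: CommandLine.arguments.dropFirst().first ?? ".")

// temperature: 0 keeps the summarization pass deterministic and factual.
let summarizer = LanguageModelSession(
    instructions: "You summarize source code accurately and concisely."  // illustrative
)
let factual = GenerationOptions(temperature: 0)

// Walk the project tree and summarize each matching source file.
var fileSummaries: [String] = []
let walker = FileManager.default.enumerator(at: root, includingPropertiesForKeys: nil)
while let entry = walker?.nextObject() as? URL {
    guard sourceExtensions.contains(entry.pathExtension) else { continue }
    // Skip files that can't be read as UTF-8 rather than aborting the walk.
    guard let source = try? String(contentsOf: entry, encoding: .utf8) else { continue }
    let reply = try await summarizer.respond(
        to: "Summarize this file in a few sentences:\n\n\(source)",
        options: factual)
    fileSummaries.append("\(entry.lastPathComponent): \(reply.content)")
}

// Fold the per-file summaries into one project-level overview.
let overview = try await summarizer.respond(
    to: "Describe this project based on its file summaries:\n\n"
        + fileSummaries.joined(separator: "\n"),
    options: factual)

// A separate, slightly more creative session powers the follow-up chat loop,
// with the project overview baked into its instructions.
let chat = LanguageModelSession(
    instructions: "Answer questions about this codebase.\n\n\(overview.content)"
)
// Follow-ups then stream with the higher temperature, e.g.:
// chat.streamResponse(to: question, options: GenerationOptions(temperature: 0.2))
```

The two-session split mirrors the design described above: one deterministic session builds the context, and a second session with a modest temperature answers questions against it.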

By the end of Part 5 you will have built two practical tools with Apple’s on-device AI framework — a general-purpose streaming chat and a context-aware coding assistant — all running entirely on your Mac with complete privacy.