Unleash the Power of Local AI with C# and .NET
Are you a C# developer looking to integrate cutting-edge AI without relying on expensive, slow, and privacy-invading cloud APIs? Book 9: Edge AI & Local Inference is the definitive guide to running Large Language Models (LLMs), computer vision, and audio models directly on your users' hardware.
Move beyond Python wrappers. This volume teaches you how to architect high-performance, native .NET solutions using ONNX Runtime, LlamaSharp, and Microsoft.ML. You will learn to build applications that are offline-capable, lightning-fast, and completely private.
What's Inside:
- Local LLM Integration: Run Llama 3, Phi-3, and Mistral locally using quantized models (GGUF/ONNX).
- Offline RAG Systems: Build a private "Chat with your Data" pipeline using local vector databases and embedding models.
- Hardware Acceleration: Optimize inference using CUDA, DirectML, and NPUs directly from C#.
- Real-Time Vision & Audio: Implement YOLOv8 object detection and Whisper transcription with low, predictable latency.
- Professional Architecture: Master asynchronous streaming, memory management for VRAM, and thread-safe UI integration (WPF/WinForms).
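To give a flavor of the approach, here is a minimal sketch of streaming local inference with LlamaSharp. The model filename, context size, and GPU layer count are illustrative placeholders, not recommendations from the book:

```csharp
using System;
using System.Threading.Tasks;
using LLama;
using LLama.Common;

class Demo
{
    static async Task Main()
    {
        // Configure a quantized GGUF model (path is a placeholder);
        // GpuLayerCount offloads that many layers to the GPU.
        var parameters = new ModelParams("models/phi-3-mini.Q4_K_M.gguf")
        {
            ContextSize = 4096,
            GpuLayerCount = 20
        };

        using var weights = LLamaWeights.LoadFromFile(parameters);
        using var context = weights.CreateContext(parameters);
        var executor = new InteractiveExecutor(context);

        // Stream tokens to the console as they are generated,
        // instead of waiting for the full completion.
        await foreach (var token in executor.InferAsync(
            "Explain ONNX in one sentence.",
            new InferenceParams { MaxTokens = 128 }))
        {
            Console.Write(token);
        }
    }
}
```

The same streaming pattern carries over to WPF and WinForms, where tokens are marshalled to the UI thread as they arrive.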
Whether you are building a smart IoT gateway, a privacy-focused desktop tool, or a high-throughput local server, this book provides the production-ready code and architectural patterns you need.
Stop paying per token. Start building on the Edge.
Table of Contents
Chapter 1: Cloud vs Local - Privacy, Latency, and Cost
Chapter 2: Understanding Model Formats - ONNX vs GGUF
Chapter 3: Quantization Explained (FP16, INT8, INT4)
Chapter 4: Hardware Acceleration - CUDA, DirectML, and NPUs
Chapter 5: Setting up the Local Environment
Chapter 6: Introduction to LlamaSharp
Chapter 7: Loading GGUF Models (Llama 3, Phi-3)
Chapter 8: Managing Context Windows Locally
Chapter 9: Streaming Inference to the Console
Chapter 10: Stateful Chat Sessions in Local Memory
Chapter 11: Introduction to Microsoft.ML
Chapter 12: Running BERT for Text Classification
Chapter 13: Object Detection with YOLO and ONNX
Chapter 14: Text-to-Speech (TTS) with Local Models
Chapter 15: Whisper.net - Local Audio Transcription
Chapter 16: Integrating AI into WPF/Windows Forms
Chapter 17: Background Processing without Freezing UI
Chapter 18: Offline RAG - Querying Local Files
Chapter 19: Fine-Tuning Basics (LoRA concepts)
Chapter 20: Capstone - Building a Private, Offline Coding Assistant
If printed, this ebook would span over 350 pages. Each chapter is structured into theoretical foundations, an annotated basic example, an annotated advanced example, and five coding exercises based on real-world scenarios, with complete solutions.
Also check out the other books in this series.