Your homelab runs a dozen services. When one breaks at 2am, you SSH in, run the same three commands, and go back to bed. Next month, it breaks again. You do the samething. Again.
This book ends that loop.
"Self-Healing Infrastructure" walks you through building a complete, production-grade infrastructure stack — entirely managed as code. From Proxmox cluster provisioning to Docker fleet management, DNS, reverse proxy, monitoring, backups, and a private Git server, every layer is codified in Ansible playbooks that you can run, read, and adapt.
But the real differentiator is Chapter 8: the Remediation Bridge.No other homelab guide covers autonomous self-healing. The remediation bridge sits between Alertmanager and your Ansible playbooks. When an alert fires, the bridge receives the webhook, maps it to the right playbook, dispatches the fix against the affected host, and sends you a notification — all in under 60 seconds. You don't wake up. You don't SSH in. The homelab fixes itself.
What you'll build across 11 chapters:
- Proxmox VM/LXC cluster provisioned through Ansible
- Declarative Docker Compose fleet with automated container updates
- BIND9 DNS and Caddy reverse proxy with automatic TLS — managed as code
- Prometheus, Grafana, Loki, and Alertmanager monitoring stack
- Fleet-wide encrypted Restic backups to local and remote targets
- Gitea as your infrastructure's single source of truth
- The remediation bridge: alert → playbook → fix → notify, no human required
This isn't a collection of blog posts stitched together. Every chapter builds on the last. Every playbook runs in production. Every decision is explained — not just what the YAML does, but why it's written that way, what failed before it worked, and how to adapt it to your stack.
If you run a homelab and you're tired of being the on-call engineer for your own house, this book is for you. 44,000+ words. 11 Chapters. Real configs from a real homelab.