This book is a collection of 28 chapters on SRE concepts such as observability, monitoring, Service Level Objectives (SLOs), alerting, resilience and debugging.
Learn how to analyze application and service crashes and freezes, navigate through process user space, and diagnose heap corruption, memory and handle leaks, CPU spikes, blocked threads, deadlocks, wait chains, and much more using the WinDbg debugger. The course covers more than 50 crash dump analysis patterns from x86 and x64 process memory dumps.
Stress is unavoidable. Failure is part of the system. Resilient by Design shows how to recover faster, stay stable under pressure, and keep going when things hit hard, without burnout clichés or motivational noise.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics and Observability Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from 15 April 2024 to 14 November 2025.
Build bulletproof Spring Boot microservices—from monolith migration to domain-driven design and event-driven patterns—while mastering the production essentials of resilience, observability, and zero‑downtime delivery. Turn complex domains into clean, scalable services with bounded contexts, aggregates, repositories, and domain events, then ship faster with rate‑limited APIs, backoff‑retries, and Kubernetes rollouts. If leading Java teams to reliable, cloud‑ready microservices is the goal, this is the hands‑on guide that gets systems into production with confidence.
Learn how to analyze .NET application and service crashes and freezes, navigate memory dump space (managed and unmanaged code), and diagnose corruption, leaks, CPU spikes, blocked threads, deadlocks, wait chains, resource contention, and much more using WinDbg on Windows and LLDB on Linux. Covers 22 .NET memory dump analysis patterns, plus 21 additional unmanaged patterns.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from 15 April 2023 to 14 April 2024.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from 15 August 2021 to 14 April 2023.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from August 2020 to 14 August 2021.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from December 2019 to July 2020.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from December 2018 to November 2019.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from June 2017 to November 2018.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from October 2016 to May 2017.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from March to September 2016.
This reference volume consists of revised, edited, cross-referenced, and thematically organized articles from the Software Diagnostics Institute and the Software Diagnostics Library (former Crash Dump Analysis blog) about software diagnostics, root cause analysis, debugging, crash and hang dump analysis, and software trace and log analysis written from August 2015 to February 2016.