3. Introduction
Would your organization survive an estimated loss of over $100 million over a single weekend?
That was the initial estimated direct cost of an accidental shutdown at British Airways (Butler2017) in May 2017, which brought down most of their IT systems and left 75,000 passengers stranded worldwide for a couple of days. British Airways’ share value also plummeted by over $170 million.
Besides the discussion on what led to the disaster (initial blame fell on a power surge (Hern2017)) and how it could have been avoided, we are interested in the fact that it took British Airways over 2 days to get their systems back up and running.
Did they not store artifacts released into production in a secure and readily available repository? Did they not regularly provision (manually or automatically) a minified replica of their production environments?
We can only speculate, but the key here is that every organization should be able to answer these and other questions positively if it wants to be resilient and recover quickly from disaster. Coincidentally, the same practices that help with recoverability also support faster and safer delivery.
In today’s world, being able to quickly release changes to our software (including infrastructure updates), either incrementally or from scratch, is mandatory for survival.
3.1 What is software releasability?
Software releasability is the capability to release changes to our IT systems with minimal delays, 24x7. To achieve software releasability, we must design and evolve the delivery system to continuously improve reliability and reduce time to feedback for everyone involved in delivery - from developers to testers, operations, security and business owners.
A resilient delivery system allows us to recover swiftly from mistakes and even disasters in our production systems, safe in the knowledge that our pipelines will be up and running, feedback will be provided in a timely fashion, and, crucially, any production system can be fully rebuilt from scratch (notwithstanding production data issues).
Increasing software releasability encompasses an adequate pipeline design (including its evolution over time), a reasoned choice of toolset, version control of all the pieces involved in a pipeline, and care for the operability, scalability, security, testability and usability of our delivery system.
3.2 What constitutes a delivery system?
A delivery system encompasses all the tooling, configuration and practices required to get a change from idea into our customer’s hands, by progressing all the required artifacts through a delivery pipeline. Such a system needs to evolve along with both the software technology and the organization’s processes.
In particular, the delivery system includes at least these components:
- Continuous Integration (CI) tool
- Pipeline orchestration tool, also known as a Continuous Delivery (CD) tool
- CI + CD infrastructure
- Orchestration plugins and 3rd party tools
- Pipeline definitions
- Source code repositories
Throughout this book we will be referencing and giving examples of the above components.
3.3 What does resilient delivery feel like?
The best way to describe what resilient delivery means is to highlight how it feels.
Everyone involved in product development today, from developers to testers or product owners, is under pressure to deliver results as soon as possible. Fast feedback is paramount to validate results. Resilient delivery therefore means that upgrading to the latest major version of a pipeline tool, adding a new plugin or changing the pipeline configuration does not stop delivery or delay feedback for hours. Changes to the delivery system itself are deployed transparently and quickly rolled back in case of failure (applying patterns such as immutable infrastructure (Stella2015) and blue-green deployments (Fowler2010), which we cover in chapter 5). There is no need for maintenance windows, downtime in our delivery system or after-hours updates.
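As a rough illustration of the blue-green idea applied to the delivery system itself, the sketch below decides which of two CI server instances should receive traffic after an upgrade. The function name, the instance labels and the health-check callback are all hypothetical, and a real setup would switch a load balancer or DNS entry rather than return a string.

```python
from typing import Callable


def choose_active_instance(
    current: str,
    candidate: str,
    is_healthy: Callable[[str], bool],
) -> str:
    """Return the instance that should serve teams after an upgrade attempt.

    `current` is the instance serving traffic today (say, "blue") and
    `candidate` is the freshly provisioned, upgraded one ("green").
    `is_healthy` is a hypothetical check, for example running a smoke-test
    pipeline against the given instance.
    """
    if is_healthy(candidate):
        # Upgrade verified: switch traffic to the new instance.
        return candidate
    # Upgrade failed verification: keep (roll back to) the current instance.
    return current


if __name__ == "__main__":
    # Simulated health checks; a real check would exercise the CI server.
    print(choose_active_instance("blue", "green", lambda name: True))
    print(choose_active_instance("blue", "green", lambda name: False))
```

Because the old instance is left untouched until the new one passes its checks, a failed upgrade costs teams nothing more than a delayed switch.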
A resilient delivery system gracefully handles peak loads of build and pipeline runs while maintaining efficient resource usage. This is all about scalability, and is especially relevant in organizations with a large number of teams, which makes capacity planning for CI/CD challenging. There are several ways to go about this, from distributing the delivery system itself to auto-scaling with cloud resources (see chapter 4). The point is to prevent queued builds and pipelines that force teams to wait for feedback due to an unexpected lack of capacity (Christian Deger calls this the “deployment pipeline elasticity” (Deger2018a)).
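One simple way to think about elasticity is to derive the number of build agents from current demand, bounded by a floor and a ceiling. The sketch below is only an illustration: the function name, parameters and default limits are assumptions, and real CI tools and cloud providers expose their own scaling mechanisms.

```python
import math


def desired_agents(
    queued_builds: int,
    running_builds: int,
    builds_per_agent: int = 1,   # assumed concurrency per agent
    min_agents: int = 1,         # assumed floor, so feedback is never blocked
    max_agents: int = 20,        # assumed ceiling, to keep cost under control
) -> int:
    """Return how many build agents should be running for the current demand."""
    demand = queued_builds + running_builds
    needed = math.ceil(demand / builds_per_agent)
    # Clamp the result between the configured floor and ceiling.
    return max(min_agents, min(max_agents, needed))


if __name__ == "__main__":
    print(desired_agents(queued_builds=5, running_builds=3))
```

The floor keeps at least one agent available during quiet periods, while the ceiling caps spending during spikes; both values would need tuning against real usage data.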
Issues with CI/CD infrastructure or tools during regular operation are detected and handled swiftly in a resilient delivery system, with adequate (aggregated) logging, monitoring and alert mechanisms in place. Impact on development teams, while not 100% avoidable, is at least greatly minimized. For instance, a build does not fail due to lack of disk space because resource usage is being monitored and alerts get triggered when a defined threshold gets hit. Anomalies are dealt with before they become blocking issues for the teams. Chapter 6 covers these aspects.
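The disk space example can be sketched in a few lines. This is a minimal illustration using Python's standard `shutil.disk_usage`; the function names and the 85% threshold are assumptions, and in practice the alert would be sent to a monitoring system rather than returned as a string.

```python
import shutil
from typing import Optional


def check_threshold(used: int, total: int, threshold: float) -> Optional[str]:
    """Return an alert message if used/total reaches the threshold, else None."""
    fraction = used / total
    if fraction >= threshold:
        return f"ALERT: disk is {fraction:.0%} full (threshold {threshold:.0%})"
    return None


def disk_usage_alert(path: str, threshold: float = 0.85) -> Optional[str]:
    """Check the filesystem that contains `path` against an assumed threshold."""
    usage = shutil.disk_usage(path)  # named tuple with total, used and free bytes
    return check_threshold(usage.used, usage.total, threshold)


if __name__ == "__main__":
    # "." stands in for a build agent's workspace directory.
    message = disk_usage_alert(".")
    print(message or "Disk usage is below the threshold")
```

Running a check like this on a schedule, well before the disk is actually full, is what turns a failed build into a routine clean-up task.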
Resiliency means that disaster recovery is possible (almost) at the click of a button, at least in terms of getting systems ready to deploy again. With nothing but source repositories for setup and configuration of our CI/CD toolchain and infrastructure - as well as the application repositories themselves - we can recreate the entire delivery system from scratch on a regular basis, not only after a disaster. Supporting practices here include (application) pipeline-as-code and CI/CD infrastructure as code, and others described in chapter 2.
Perhaps the greatest benefit of a predictable and reliable delivery system is removing the unnecessary stress of not knowing if or when we can recover from disaster. It’s not hard to imagine the high levels of stress and management pressure on the engineers working to bring BA’s systems back to normal operations. Preventing teams and individuals from suffering burnout is more valuable than anything else.
3.4 Warning signs of software delivery debt
Strong software releasability capabilities are key to rapid and reliable delivery of modern software systems. However, it’s easy to underestimate their importance until a failure with serious consequences takes place. Like testability or operability, releasability is a “silent enabler” for delivering and running software effectively:
“Low drama flow doesn’t look like progress to most people” - John Cutler (Cutler2017)
The analogy here is that a low drama software release looks effortless, just changes flowing smoothly through the delivery pipeline. But without putting in the necessary work on an on-going basis, releases become painful, undesired but necessary procedures - like going to the dentist (for many of us, at least).
An ad-hoc, “as needed” approach to releasing software can work temporarily in a small startup or engineering department where everyone in the team is more or less abreast of how everything works. But changes in the software, technology and tools quickly pile up and any team, no matter how brilliant, will eventually find it impossible to keep the entire delivery process in their heads. Issues start to mount because of the increasing number of moving parts in our delivery system. Consistency and reproducibility take a hit, and more of the developers’ time gets spent fixing those issues rather than working on the software itself.
Some warning signs that the delivery approach needs re-thinking include:
- time from committing changes until they are deployed to production has increased significantly
- dependency conflicts (for instance different teams depend on different versions of the same component in the same environment) becoming more frequent
- drifting release processes between teams (a new engineer in the team has to re-learn how to release)
- most commits are at end-of-day to avoid wait time during the day, leading to the pipeline being broken in the morning more often than not
- issues in the CI/CD toolchain or infrastructure detected by developers first, and taking a long time to diagnose
- having to rebuild artifacts for a redeploy because an artifact management repository and policy are not in place
- failing manual steps that only one engineer knows how to fix
- no simple method for determining if a change has been released to production or is still “in the pipeline”
These signs of an ad-hoc approach to software releasability, while rare at first, tend to appear together and compound quickly, to the point where delivery becomes extremely slow, causing revenue and/or reputation loss for the organization.
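Two of the warning signs above, growing time from commit to production and not knowing whether a change has been released, become easier to spot once commit and deployment timestamps are recorded side by side. The sketch below assumes a hypothetical `Change` record; in practice this data would come from the version control system and the deployment tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Change:
    """Hypothetical record linking a commit to its production deployment."""
    commit_id: str
    committed_at: datetime
    deployed_at: Optional[datetime] = None  # None means still in the pipeline

    @property
    def released(self) -> bool:
        return self.deployed_at is not None


def median_lead_time(changes: List[Change]) -> Optional[timedelta]:
    """Median time from commit to production, over released changes only."""
    seconds = [
        (c.deployed_at - c.committed_at).total_seconds()
        for c in changes
        if c.released
    ]
    if not seconds:
        return None
    return timedelta(seconds=median(seconds))


def still_in_pipeline(changes: List[Change]) -> List[str]:
    """Commit ids that have not yet reached production."""
    return [c.commit_id for c in changes if not c.released]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)  # made-up sample data
    sample = [
        Change("a1", start, start + timedelta(hours=1)),
        Change("b2", start, start + timedelta(hours=3)),
        Change("c3", start),
    ]
    print(median_lead_time(sample), still_in_pipeline(sample))
```

Tracking the median over time, rather than any single release, makes a gradual slowdown visible before it turns into a blocking problem.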
3.5 Why invest in software releasability?
Software releasability has costs, no doubt. Infrastructure (especially to support scalability), people (we cover different organization models for CI/CD in chapter 8) and tooling all have costs.
But investing in the necessary CI/CD capabilities means we save the time and effort (usually taken out of product development) previously spent on things like fixing the pipeline, restoring dependencies, searching for artifacts, or rolling forward deployment fixes.
On the customer side, new releases become an eventless routine, instead of a scheduling nightmare with consecutive delays. Downtime during release is minimized, meaning higher availability and less disruption to the business and/or end users. Also, rollbacks in case of blocking issues become a straightforward process.
3.6 Relationship to Continuous Delivery
This book wouldn’t exist without the groundbreaking work in the “Continuous Delivery” book by Dave Farley and Jez Humble (HumbleFarley2010), a compendium of good practices around building, testing, deploying, and managing software. In fact, software releasability is one of the outcomes of fully implementing continuous delivery practices and principles (you can find a list of the book’s chapters and practices that can serve as a checklist for your continuous delivery adoption at Skelton2016).
You could look at the practices and patterns described in this book as an extension of the “Continuous Delivery” (CD) book. We’ve seen them work well across different clients and we hope this book provides a contextualized approach for teams adopting continuous delivery.
The deployment pipeline is a key technique introduced by Farley and Humble. Since the book came out in 2010, pipelines and the tooling around them have evolved significantly as more and more organizations adopted CD. This book is fundamentally about the sustainability part in Farley and Humble’s definition of Continuous Delivery:
“The ability to get changes of all types, into production, or into the hands of users, safely and quickly in a sustainable way”
We believe sustainability of the delivery system is a key enabler for speed and safety in the deployment pipeline.
Our aim with this book is in part to share the patterns and anti-patterns we’ve collected from helping customers move to CD and experiencing their benefits and their pain, respectively. A core pattern here is to focus first on goals, principles and practices and later on automation and tooling.
3.7 What this book is (not) about
This book focuses on good practices around releasing software that emerged from both our consulting work with multiple clients and industry experience. Although it touches on multiple aspects of software delivery, it does not specifically address the following:
- How to write good build scripts (or Makefiles for that matter)
- Software operability and testability (if you’re interested in those topics please see the dedicated books in this book series)
- Defining the contents of a software release (this is highly contextual, our only advice is to keep them as small as you can)
- Deciding whether a given software release is a “go” or “no go” (again highly contextual, we only recommend asking yourselves “how confident are we with the testing we’ve done?”)
- How to build applications in C#, Java or any other stack (there are plenty of resources available out there, we would not be adding anything new)
- Change management processes (although we might have a suggestion or two if your delivery is being held back by slow change approval boards)
- Deployment strategies (we recommend reading chapters 6 and 10 of the “Continuous Delivery” book and looking at examples of how other organizations implemented said strategies)
3.8 How to use this book
Each chapter can be read independently and contains enough detail to be understood and acted on without reading any of the other chapters first (although reading the full book will certainly provide a more comprehensive understanding of the concepts and practices and how they relate to each other).
Chapter 1 explains why we need to consider our pipelines and the delivery system as a product, rather than just another tool (or set of tools). Focusing on operability (as detailed in SkeltonThatcher2018) is fundamental for safe and rapid delivery today.
Chapters 2 to 6 detail the benefits and techniques for ensuring the delivery system is: recoverable, operable, scalable, testable, usable, and measurable.
Chapter 7 explains why mapping the value stream activities in the pipeline is crucial for reducing time to deliver and promoting global, rather than local, optimization of the delivery flow.
Chapter 8 covers different team structures supporting software releasability and the pros and cons of each.
3.9 Feedback and suggestions
We’d welcome feedback and suggestions for changes: please contact us at publications@confluxdigital.net, via @ReleasabilityY on Twitter, or on the Leanpub discussion at https://leanpub.com/SoftwareReleasability/feedback.
Chris O’Dell & Manuel Pais