There is a specific way engineering teams tend to fail at DORA metrics.
They look at the four measurements:
- Deployment frequency
- Lead time for changes
- Change failure rate
- Mean time to recovery
And treat them as four separate goals.
They:
- Assign an owner to each one.
- Set improvement targets for each one.
- Track progress on each one independently.
- Celebrate when individual numbers move in the right direction.
Six months later:
- Deployment frequency has improved significantly.
- Lead time has come down.
- Change failure rate has gotten worse.
- Mean time to recovery has gotten worse.
The team has two metrics that look better and two that look worse, and nobody is sure whether things have improved overall.
This is what happens when DORA metrics are treated as a checklist rather than as a system.
What Makes Them a System
The DORA metrics are not independent measurements.
They are interconnected signals about a single underlying thing:
How well a software delivery system converts developer work into reliable production outcomes.
The relationships between them are structural.
Improving one metric while ignoring the rest almost always puts pressure on the rest, in ways that are predictable in hindsight and invisible in the moment.
Deployment Frequency and Change Failure Rate
Deployment frequency and change failure rate have the most direct relationship.
Deploying more often creates more opportunities for regressions to reach production.
When teams improve deployment frequency without corresponding improvements to testing infrastructure, change failure rate rises.
The pipeline is faster. The software arriving through it is less reliably validated. More failures reach production.
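The arithmetic behind this can be sketched in a few lines. This is an illustration only: it assumes every deployment fails independently with the same probability, and the function name and the figures are hypothetical.

```python
def expected_failures(deploys_per_week: float, change_failure_rate: float) -> float:
    """Expected number of failed changes reaching production per week."""
    if not 0.0 <= change_failure_rate <= 1.0:
        raise ValueError("change_failure_rate must be between 0 and 1")
    return deploys_per_week * change_failure_rate


# Hypothetical team: 5 deploys a week with a 10% change failure rate.
before = expected_failures(5, 0.10)
# Four times the frequency, with validation left exactly as it was.
after_same_rate = expected_failures(20, 0.10)
# Four times the frequency, with validation weakened along the way.
after_worse_rate = expected_failures(20, 0.15)

print(before, after_same_rate, after_worse_rate)
```

Even with an unchanged failure rate, the absolute number of failures grows with frequency. When the rate itself rises, the two effects multiply.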
Lead Time and Change Failure Rate
Lead time and change failure rate have a similar relationship.
When lead time is reduced by removing friction from the review and merge process without asking why that friction existed, the validation the friction was providing stops happening.
Code moves faster. Failures that were being caught in review reach production instead.
Mean Time to Recovery and Reliability
Mean time to recovery and reliability have a reinforcing relationship.
Teams that invest in observability infrastructure recover from failures faster.
Faster recovery reduces the cost of individual failures.
Lower cost per failure changes how teams think about acceptable change failure rate.
The system develops a tolerance for failures that might otherwise be prevented.
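A small sketch shows how that tolerance can hide in an aggregate number. It assumes, purely for illustration, that each failure costs exactly the mean time to recovery in downtime and that failures never overlap; the function name and figures are hypothetical.

```python
def weekly_downtime_hours(
    deploys_per_week: float, change_failure_rate: float, mttr_hours: float
) -> float:
    """Rough weekly downtime: expected failures times hours to recover from each."""
    expected_failures = deploys_per_week * change_failure_rate
    return expected_failures * mttr_hours


# Hypothetical baseline: 10 deploys a week, 10% fail, 2 hours to recover.
baseline = weekly_downtime_hours(10, 0.10, 2.0)
# Recovery is twice as fast, but twice as many changes now fail.
faster_recovery = weekly_downtime_hours(10, 0.20, 1.0)

print(baseline, faster_recovery)
```

Under these assumptions both configurations produce the same weekly downtime, even though the second one has twice as many production failures. A team watching only downtime would not see the change failure rate doubling.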
None of these relationships make it impossible to improve DORA metrics sustainably.
They make it impossible to improve them sustainably by treating each metric as an isolated optimization target.
The Correct Reading of Each Metric
Reading DORA metrics as a system means understanding what each one is actually telling you rather than what the number appears to say on its own.
Deployment Frequency
Deployment frequency is a signal about delivery cadence.
High deployment frequency indicates that the team has removed barriers between completed work and production delivery.
But the signal only means what it appears to mean when change failure rate is also healthy.
High deployment frequency with high change failure rate means the team is delivering failures frequently, not value frequently.
Lead Time for Changes
Lead time for changes is a signal about flow efficiency.
Short lead time indicates that work moves smoothly from development through review, testing, and deployment without accumulating in queues.
But short lead time achieved by skipping validation steps is not flow efficiency.
It is validation debt that will appear in change failure rate and mean time to recovery.
Change Failure Rate
Change failure rate is the signal most directly connected to engineering quality practices.
Low change failure rate indicates that the validation infrastructure is catching behavioral regressions before they reach production.
It is the metric that testing investment most directly influences and the metric that reveals most clearly whether improvements to deployment frequency and lead time are genuine or illusory.
Mean Time to Recovery
Mean time to recovery is a signal about organizational resilience.
Fast recovery indicates that the team can identify failures quickly and restore service efficiently.
It is influenced by:
- Observability infrastructure
- Incident response practices
- Deployment architecture
It does not substitute for low change failure rate.
Recovering quickly from failures that should not have happened is not the same as not having those failures.
Reliability
Reliability, which DORA added as a fifth measure alongside the original four, is the cumulative signal.
It captures whether the system as a whole is maintaining stability under the pace of change.
A team can have individually acceptable scores on the other four metrics and still have degrading reliability if the aggregate pace of change consistently exceeds what the validation and recovery infrastructure can absorb.
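One way to keep the first four signals side by side is to compute them from the same deployment records in a single pass. The sketch below is a minimal illustration, not a standard implementation. The `Deployment` record, its field names, and `dora_snapshot` are hypothetical. Lead time is assumed to run from commit to production deploy, and recovery time from the failed deploy to service restoration. Reliability is left out because it typically depends on service-level data that deployment records do not contain.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_snapshot(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four delivery metrics together for one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Only failures with a recorded restoration time count towards recovery.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_week": len(deployments) / (period_days / 7),
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Hypothetical week with three deployments, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=6), failed=True,
               restored_at=t0 + timedelta(hours=7)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
snapshot = dora_snapshot(sample, period_days=7)
print(snapshot)
```

Returning all four values from one function is a deliberate choice: it makes it harder to report deployment frequency without the change failure rate that gives it meaning.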
The System Behavior That Produces Sustainable Improvement
Teams that improve all five DORA metrics sustainably tend to make changes to the underlying delivery system rather than to the individual metrics.
The changes that produce sustainable improvement share a common characteristic:
They address the root causes of metric values rather than the metric values themselves.
Improving Change Failure Rate
Change failure rate improves when regression testing infrastructure catches more failures before deployment.
Not when:
- Deployment gates are tightened to block more releases.
- Rollback processes are improved.
- Teams simply become more conservative about shipping.
Those changes affect other metrics but do not reduce the number of failures reaching production.
The sustainable improvement comes from improving validation quality.
Improving Lead Time
Lead time improves when code review bottlenecks are addressed upstream.
Common causes include:
- Large pull requests that are difficult to review quickly.
- Unclear change scope.
- Flaky tests that require investigation before approval.
Addressing those root causes reduces lead time without removing validation.
Improving Deployment Frequency
Deployment frequency increases sustainably when change failure rate is low enough that more frequent deployment does not produce proportionally more failures.
The frequency improvement that lasts is the one that follows testing infrastructure improvement rather than preceding it.
Improving Mean Time to Recovery
Mean time to recovery improves when observability investment comes before increases in deployment frequency.
Teams that add observability tooling after reliability problems emerge are instrumenting a system they cannot yet see clearly.
Teams that invest in observability before accelerating delivery are building the foundation for fast recovery before it becomes urgently necessary.
What Treating Them as a System Actually Looks Like
In practice, treating DORA metrics as a system means making improvement decisions based on the relationships between metrics rather than on individual metric values.
Scenario 1: Low Deployment Frequency, Low Change Failure Rate
When deployment frequency is low but change failure rate is also low, the bottleneck is likely in the deployment process itself.
Potential causes include:
- Approval gates
- Manual deployment steps
- Pipeline architecture limitations
Increasing deployment frequency is appropriate.
Scenario 2: Low Deployment Frequency, High Change Failure Rate
When deployment frequency is low and change failure rate is high, the bottleneck is in testing infrastructure.
Increasing deployment frequency before addressing change failure rate will produce more frequent failures.
The correct intervention is testing infrastructure improvement first.
Scenario 3: Long Lead Time
When lead time is long, the cause determines the intervention.
Examples include:
- Slow code reviews → Improve review workflow.
- Slow pipeline execution → Improve test execution architecture.
- Frequent deployment gate blocks → Improve change failure rate.
The metric alone does not reveal the solution. The surrounding system behavior does.
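A sketch of that reasoning: break lead time into stages, find the stage where changes wait longest, and look up the matching intervention. The stage names, the `INTERVENTIONS` mapping, and the sample durations are hypothetical; a real pipeline would need its own stage definitions and timing data.

```python
from datetime import timedelta

# Hypothetical stage names mapped to the interventions listed above.
INTERVENTIONS = {
    "review": "improve review workflow",
    "pipeline": "improve test execution architecture",
    "deployment_gate": "improve change failure rate",
}


def slowest_stage(stage_durations: dict[str, timedelta]) -> tuple[str, str]:
    """Return the stage with the longest duration and a suggested intervention."""
    if not stage_durations:
        raise ValueError("need at least one stage duration")
    stage = max(stage_durations, key=lambda name: stage_durations[name])
    return stage, INTERVENTIONS.get(stage, "investigate further")


# Hypothetical median time a change spends in each stage.
example = {
    "review": timedelta(hours=30),
    "pipeline": timedelta(hours=2),
    "deployment_gate": timedelta(hours=5),
}
stage, action = slowest_stage(example)
print(stage, "->", action)
```

The total lead time in this example is 37 hours, but that figure alone would not show that most of it is spent waiting for review.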
Scenario 4: High Mean Time to Recovery
When mean time to recovery is high, both observability infrastructure and incident response processes deserve examination.
The weaker of the two is usually the highest-leverage place to invest.
Scenario 5: Degrading Reliability
When reliability is degrading while other metrics appear acceptable, the pace of change is likely exceeding what the validation and recovery infrastructure can safely absorb.
The appropriate response is not necessarily to slow delivery.
The better response is often to invest in what makes the current delivery pace sustainable:
- Testing infrastructure
- Observability systems
- Recovery processes
The Diagnostic Value of Reading Them Together
The most useful thing about DORA metrics read as a system is that the pattern of values across all five metrics is more informative than any individual value.
Pattern: Fast Delivery, Poor Reliability
Characteristics:
- High deployment frequency
- Low lead time
- High change failure rate
- High mean time to recovery
Diagnosis:
Testing infrastructure problem.
The delivery process is fast. The validation process is insufficient.
Pattern: High Quality, Slow Releases
Characteristics:
- Low deployment frequency
- High lead time
- Low change failure rate
- Low mean time to recovery
Diagnosis:
Unnecessary friction problem.
Quality is good. Something is blocking releases that does not need to be blocking them.
Pattern: Reliable Delivery, Slow Recovery
Characteristics:
- High deployment frequency
- Low lead time
- Low change failure rate
- High mean time to recovery
Diagnosis:
Observability problem.
The team is shipping reliably but recovering slowly when failures occur.
Each pattern points to a different root cause and a different intervention.
None of those interventions are visible when metrics are read in isolation.
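The three patterns can be written down as a small lookup over combined metric levels. This is a sketch under stated assumptions: the `MetricLevels` type and `diagnose` function are hypothetical, lead time is left out to keep the example small, and deciding what counts as "high" for a given team is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricLevels:
    high_deployment_frequency: bool
    high_change_failure_rate: bool
    high_mean_time_to_recovery: bool


def diagnose(levels: MetricLevels) -> str:
    """Map a combination of metric levels to the diagnosis named in the text."""
    freq = levels.high_deployment_frequency
    cfr = levels.high_change_failure_rate
    mttr = levels.high_mean_time_to_recovery
    if freq and cfr and mttr:
        return "testing infrastructure problem"   # fast delivery, poor reliability
    if not freq and not cfr and not mttr:
        return "unnecessary friction problem"     # high quality, slow releases
    if freq and not cfr and mttr:
        return "observability problem"            # reliable delivery, slow recovery
    return "no named pattern; examine the metrics together"


print(diagnose(MetricLevels(True, True, True)))
print(diagnose(MetricLevels(False, False, False)))
print(diagnose(MetricLevels(True, False, True)))
```

The function takes all the levels at once and cannot return a diagnosis from a single value, which mirrors the point that the pattern carries the information.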
Why DORA Metrics Are a System
That is why DORA metrics are a system.
Not because someone decided to group several measurements together.
But because the underlying delivery system they measure is itself a system.
And systems are only legible when you look at how their components relate to each other rather than at each component alone.