Improving developer productivity is often the white whale of engineering leadership. Every CTO and VPE wants to move faster, ship more features, and maintain high quality, yet few have a concrete, data-backed strategy for achieving it. Instead of relying on gut feelings or flawed metrics like "lines of code," leaders need to turn to rigorous research. Dr. Nicole Forsgren, a partner at Microsoft Research and co-author of the award-winning book Accelerate, has spent years analyzing what actually drives high-performing technology organizations. By understanding the interplay between developer experience, culture, and tooling, organizations can transform their engineering velocity without sacrificing stability.
Key Takeaways
- Speed and stability are correlated: Contrary to traditional belief, moving faster actually improves stability because it forces smaller batch sizes and reduces the "blast radius" of changes.
- DORA metrics are the standard for delivery: To measure software delivery performance, focus on Lead Time, Deployment Frequency, Mean Time to Restore (MTTR), and Change Failure Rate.
- Use the SPACE framework for holistic measurement: Do not rely on a single metric. Combine Satisfaction, Performance, Activity, Communication, and Efficiency to get a balanced view.
- Trust subjective data: Telemetry tells you what happened, but surveys tell you why. If system data conflicts with developer sentiment, the developers are usually right.
- Start with the "Four Box" framework: Before measuring data, define your goals in words to ensure you aren't optimizing for the wrong outcomes.
The Relationship Between Speed and Stability
For decades, the prevailing wisdom in IT management—often derived from frameworks like ITIL—was that speed comes at the cost of quality. The assumption was that to protect production environments, organizations needed heavy change advisory boards and long lead times. Research from the DORA (DevOps Research and Assessment) team has definitively disproven this.
Speed and stability move in tandem. High-performing organizations deploy frequently with short lead times and maintain low failure rates. The mechanism behind this is batch size. When teams deploy less frequently (e.g., once a month), they bundle massive amounts of code changes together. If an incident occurs, disentangling that "ball of mud" to find the root cause is difficult and time-consuming.
As Forsgren explains: "If you're pushing all the time, it's going to be very, very small changes which means you have a smaller blast radius. Which means when you push and you have an error in production, it's going to be easier to debug."
Conversely, elite performers push code on demand. Because the changes are granular, remediation is fast, and the cognitive load on the developer is lower because the code is still fresh in their mind.
Benchmarks for Elite Performance
If you are trying to gauge where your organization stands, compare your metrics against the DORA benchmarks for elite performers:
- Deployment Frequency: On-demand (multiple deploys per day).
- Lead Time for Changes: Less than one day (from code commit to running in production).
- Time to Restore Service: Less than one hour.
- Change Failure Rate: Between 0% and 15%.
It is important to note that precision down to the minute is less important than the general category. Whether your lead time is 4 hours or 6 hours matters less than whether it is measured in hours versus months.
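To make these definitions concrete, here is a minimal Python sketch that derives the four DORA metrics from a hypothetical deployment log. The record format and the sample numbers are illustrative assumptions, not a prescribed schema; in practice, commit and deploy timestamps would come from your CI/CD system and restore times from your incident tracker.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment log: when each change was committed, when it
# reached production, whether it caused a failure, and (if so) how many
# minutes the service took to restore.
deploys = [
    {"commit": "2024-03-01T09:00", "deploy": "2024-03-01T13:30", "failed": False, "restore_min": None},
    {"commit": "2024-03-01T10:15", "deploy": "2024-03-01T15:00", "failed": True,  "restore_min": 42},
    {"commit": "2024-03-02T08:40", "deploy": "2024-03-02T11:10", "failed": False, "restore_min": None},
    {"commit": "2024-03-02T14:05", "deploy": "2024-03-02T16:55", "failed": False, "restore_min": None},
]

def hours_between(start: str, end: str) -> float:
    fmt = "%Y-%m-%dT%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600

# Lead Time for Changes: median hours from code commit to production.
lead_time_h = median(hours_between(d["commit"], d["deploy"]) for d in deploys)

# Deployment Frequency: deploys per day over the observed window.
days = {d["deploy"][:10] for d in deploys}
freq = len(deploys) / len(days)

# Change Failure Rate: share of deploys that caused a production failure.
cfr = sum(d["failed"] for d in deploys) / len(deploys)

# Time to Restore Service: median restore time across failed deploys.
restores = [d["restore_min"] for d in deploys if d["failed"]]
restore_min = median(restores) if restores else 0

print(f"Lead time: {lead_time_h:.1f} h | Deploys/day: {freq:.1f} | "
      f"CFR: {cfr:.0%} | Restore: {restore_min} min")
```

Remember the guidance above: the output matters at the level of category (hours versus months, on-demand versus monthly), not decimal precision.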
Moving Beyond Delivery: The SPACE Framework
While DORA metrics are excellent for measuring the software delivery pipeline, they do not capture the full picture of developer productivity. To address this, Forsgren and her colleagues developed the SPACE framework. This approach acknowledges that software development is a complex, creative task that cannot be reduced to a single number.
The SPACE framework outlines five dimensions of productivity:
- S - Satisfaction and Well-being: How developers feel about their work, tools, and culture. This is highly correlated with burnout and retention.
- P - Performance: The outcome of the system, such as reliability or feature adoption.
- A - Activity: A count of actions, such as Jira tickets closed or pull requests merged.
- C - Communication and Collaboration: How teams work together, including documentation searchability and meeting density.
- E - Efficiency and Flow: The ability to complete work with minimal interruptions and high focus.
The Rule of Three
When implementing SPACE, you do not need to measure all five dimensions at once, but you should never rely on just one. The recommendation is to pick metrics from at least three different dimensions. This creates a system of checks and balances.
For example, if a leader focuses solely on Activity (e.g., number of pull requests), developers might spam small, low-quality updates to game the metric. By pairing Activity with Performance (quality) and Satisfaction (developer sentiment), you ensure that an increase in speed isn't causing burnout or technical debt.
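As an illustration of this checks-and-balances pairing, the sketch below combines one hypothetical metric from each of three dimensions (Activity, Performance, Satisfaction) and flags when an activity gain is accompanied by degradation elsewhere. The metric names, values, and thresholds are assumptions for the example, not recommended targets.

```python
# Hypothetical quarterly readings for one team: one metric per chosen
# SPACE dimension. All names and thresholds are illustrative.
metrics = {
    "activity":     {"prs_merged_per_week": 38,  "baseline": 30},
    "performance":  {"change_failure_rate": 0.22, "threshold": 0.15},
    "satisfaction": {"survey_score_of_5":   2.9,  "threshold": 3.5},
}

warnings = []
if metrics["activity"]["prs_merged_per_week"] > metrics["activity"]["baseline"]:
    # Activity is up -- only celebrate if the counterbalancing
    # dimensions have not degraded.
    if metrics["performance"]["change_failure_rate"] > metrics["performance"]["threshold"]:
        warnings.append("Throughput rose but failure rate exceeds threshold: possible quality trade-off.")
    if metrics["satisfaction"]["survey_score_of_5"] < metrics["satisfaction"]["threshold"]:
        warnings.append("Throughput rose but satisfaction is low: possible burnout risk.")

print("\n".join(warnings) or "Activity gains look sustainable.")
```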
The Critical Role of Subjective Data
A common pitfall in engineering leadership is dismissing survey data in favor of "hard" system telemetry. Leaders often argue, "people lie, but systems don't." In reality, systems are frequently incomplete or misconfigured. You might have excellent deployment speed, but your developers might be achieving it through unsustainable heroics and overtime.
Subjective data—gathered through surveys and interviews—provides the necessary context for objective data. It can reveal friction points that telemetry misses, such as a confusing code review process or a hostile work environment.
As Forsgren puts it: "If there is ever a disagreement between the surveys and the instrumentation... almost every time that I've ever heard of, the surveys are correct and not the instrumentation."
Even the most advanced engineering organizations, including Google, rely heavily on quarterly surveys to triangulate their system data. If your dashboard says productivity is up, but your developers say they are miserable and blocked, you have a productivity problem.
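One lightweight way to operationalize this triangulation is to put telemetry-derived scores and survey scores side by side and flag teams where they diverge. The sketch below assumes both have been normalized to a 0-to-1 scale; the team names, scores, and divergence threshold are illustrative assumptions.

```python
# Hypothetical triangulation: pair each team's telemetry-derived
# productivity score with its survey score and surface disagreements.
teams = [
    {"name": "payments", "telemetry_score": 0.85, "survey_score": 0.40},
    {"name": "search",   "telemetry_score": 0.70, "survey_score": 0.75},
]

DIVERGENCE_THRESHOLD = 0.2  # illustrative cutoff for "worth investigating"

for team in teams:
    gap = team["telemetry_score"] - team["survey_score"]
    if abs(gap) > DIVERGENCE_THRESHOLD:
        # Per the guidance above: when the two disagree, trust the survey
        # first and treat the instrumentation as suspect.
        print(f"{team['name']}: telemetry and sentiment diverge by {gap:+.2f} "
              f"-- interview developers before trusting the dashboard.")
```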
Implementation Strategy: The Four Box Framework
When teams decide to measure productivity, they often jump straight to the data they already have available. This is a mistake. It leads to measuring what is easy rather than what matters. To avoid this, utilize the "Four Box" framework to clarify your thinking before you write a single SQL query.
How to Use the Four Box Framework
Draw four boxes: two on top (Words), two on the bottom (Data).
- Box 1 (Concept): Start with the abstract concept you want to influence (e.g., "Customer Satisfaction").
- Box 2 (Outcome): Define what that concept leads to in plain English (e.g., "Return Customers").
- Box 3 (Metric for Concept): Now, look for data that proxies the concept (e.g., NPS scores or survey results).
- Box 4 (Metric for Outcome): Finally, determine the data that proves the outcome (e.g., renewal rates or referral links).
By forcing this "Words to Data" translation, you align stakeholders on the strategy before debating the validity of specific metrics. If the correlation between your chosen metrics fails, you can troubleshoot whether the data is bad, or if the original hypothesis (the relationship between the words) was flawed.
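The worksheet translates naturally into a small script: encode the two word boxes and the two metric boxes, then test whether the metric-level data actually correlates. The example below uses statistics.correlation (Python 3.10+) with made-up NPS and renewal figures purely to illustrate the troubleshooting step described above.

```python
from statistics import correlation  # requires Python 3.10+

# The Four Box worksheet as a data structure (contents are illustrative).
four_box = {
    "concept_words":  "Customer Satisfaction",   # Box 1
    "outcome_words":  "Return Customers",        # Box 2
    "concept_metric": "NPS score",               # Box 3
    "outcome_metric": "Renewal rate",            # Box 4
}

# Made-up quarterly observations for the two metric boxes.
nps_scores    = [31, 35, 42, 40, 48, 52]
renewal_rates = [0.71, 0.74, 0.78, 0.77, 0.83, 0.86]

r = correlation(nps_scores, renewal_rates)
print(f"{four_box['concept_metric']} vs {four_box['outcome_metric']}: r = {r:.2f}")
if r < 0.3:
    # Weak correlation: either the data is a poor proxy, or the hypothesis
    # in the word boxes ("satisfaction drives returns") needs revisiting.
    print("Check data quality first, then revisit the words-level hypothesis.")
```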
The Impact of AI on Developer Productivity
The rise of Generative AI and tools like GitHub Copilot is fundamentally shifting the nature of software engineering. We are moving from an era of writing code to an era of reviewing code. Early research suggests that developers using AI tools spend significantly more time reviewing suggestions than typing syntax.
This shift requires a re-evaluation of how we measure productivity. If a developer uses AI to complete a task 50% faster, the goal shouldn't necessarily be to reduce headcount. The value comes from freeing up cognitive load. Developers can now tackle more complex architectural problems or innovation challenges that were previously deprioritized due to time constraints.
However, AI introduces new variables that must be tracked, specifically Trust and Reliability. Organizations will need to measure how often developers accept AI suggestions and whether over-reliance on these tools is introducing subtle quality issues or degrading the learning process for junior engineers.
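As a sketch of what such tracking might look like, the example below computes an acceptance rate and a post-acceptance churn rate per developer from hypothetical suggestion counts. The event fields and flagging thresholds are assumptions for illustration; real AI coding tools expose different telemetry.

```python
# Hypothetical AI-assist usage events (field names are assumptions).
events = [
    {"dev": "a", "suggested": 120, "accepted": 54,  "reverted_after_accept": 6},
    {"dev": "b", "suggested": 200, "accepted": 150, "reverted_after_accept": 30},
]

for e in events:
    acceptance = e["accepted"] / e["suggested"]
    # Post-acceptance churn: accepted suggestions later reverted, a rough
    # (assumed) proxy for subtle quality issues slipping through review.
    churn = e["reverted_after_accept"] / e["accepted"]
    flag = " <- review for over-reliance" if acceptance > 0.6 and churn > 0.15 else ""
    print(f"dev {e['dev']}: acceptance {acceptance:.0%}, post-accept churn {churn:.0%}{flag}")
```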
Conclusion
Improving developer productivity is not about buying a "DevOps tool" or setting arbitrary targets for lines of code. It requires a systematic approach to removing friction and building a culture of psychological safety. Whether you use DORA to optimize your delivery pipeline or SPACE to balance team health, the goal remains the same: an environment where developers can do their best work.
For leaders looking to start today, the advice is simple: Go talk to your developers. Ask them what is slowing them down. Their answers will likely be more accurate than any dashboard you currently have. Once you identify the friction, use the frameworks above to measure your progress in removing it.