Developer productivity expert Nicole Forsgren reveals how DORA and SPACE frameworks help engineering teams move faster while maintaining quality and stability.
Key Takeaways
- Moving faster actually improves code quality by reducing batch sizes and blast radius of changes
- Elite teams deploy on demand with less than one day lead time and under 15% change failure rates
- Company size shows no statistically significant difference in performance - small and large organizations achieve elite status at similar rates
- The SPACE framework requires measuring at least three of its five dimensions: Satisfaction, Performance, Activity, Communication, and Efficiency
- Developer satisfaction surveys often reveal insights that system metrics miss, making human feedback irreplaceable
- Speed and stability metrics move together - teams that deploy more frequently have more stable systems
- Clear problem definition prevents 80% of productivity improvement failures before they start
- AI tools are shifting developer work from writing code to reviewing code, requiring new productivity measurement approaches
- Both top-down executive buy-in and bottom-up developer engagement are essential for successful productivity initiatives
Timeline Overview
- 00:00–07:55 — Nicole's background: From IBM software engineer to PhD researcher, founding DORA, and her current dual role at Microsoft Research leading developer productivity research and cross-company infrastructure improvements
- 07:55–13:43 — Unpacking the terms "developer productivity," "developer experience," and "DevOps": How these related but distinct concepts work together, with productivity as output measurement, developer experience as user experience for developers, and DevOps as enabling capabilities
- 13:43–22:33 — The DORA framework and benchmarks for success: Four key metrics that revealed speed and stability move together, elite performance benchmarks (deploy on demand, <1 day lead time, <1 hour recovery, <15% failure rate), and why smaller batch sizes create stability
- 22:33–29:23 — Why company size doesn't matter and working backward from capabilities: No statistical difference between small and large company performance, with retail as the only industry outlier performing better due to survival pressure during digital transformation
- 29:23–41:29 — The SPACE framework, choosing metrics, and measuring satisfaction: Five-dimension framework for balanced measurement (Satisfaction, Performance, Activity, Communication, Efficiency), why teams need at least three dimensions, and combining system data with developer surveys
- 41:29–47:42 — Common pitfalls and current book project: 80% of initiatives fail due to unclear problem definition, importance of both top-down and bottom-up buy-in, and Nicole's upcoming book on practical measurement implementation
- 47:42–54:04 — How the DevOps space has progressed and AI's impact: Evolution from having to prove DevOps value to widespread acceptance, AI shifting work from writing to reviewing code, and new measurement challenges around trust and tool effectiveness
- 54:04–57:32 — First steps and communication importance: Starting with clear problem definition and existing data, Google as implementation example, and making work accessible to key audiences through clear communication
- 57:32–68:56 — Nicole's Four-Box framework and decision-making advice: Visual framework for hypothesis testing with words and data boxes, decision-making spreadsheets with weighted criteria, and the importance of knowing what not to do
The Foundation: Understanding Developer Productivity Terminology
Most organizations struggle with basic definitions before diving into measurement. Developer productivity focuses on how much teams accomplish over time, requiring holistic measurement because software development is fundamentally collaborative work. The sustainability aspect becomes crucial - true productivity improvements should reduce burnout rather than increase it.
Developer experience represents the user experience for developers themselves. This encompasses friction-free processes, predictable workflows, and reduced uncertainty in daily development tasks. When developers face constant friction in their tools and processes, productivity suffers regardless of individual talent levels.
DevOps serves as the bridge between these concepts, providing the capabilities, tools, and cultural practices that enable faster and more reliable software delivery. However, many organizations mistakenly treat DevOps as a product category rather than a set of organizational capabilities that must be developed over time.
- The measurement challenge requires balancing multiple perspectives - system metrics alone miss critical insights that only developers can provide about their daily experience and challenges
- Cultural transformation accompanies technical improvements - the most successful productivity initiatives address both tooling friction and team dynamics simultaneously
- Holistic approaches prevent optimization traps - focusing solely on speed metrics without considering quality and developer well-being creates unsustainable performance gains
- Terminology alignment prevents wasted effort - teams often spend months working toward different goals because they never clarified whether they're addressing culture, tooling friction, or process efficiency
- Business case development becomes easier - when organizations understand these distinctions, they can better communicate value propositions to leadership and secure necessary resources
- Cross-functional collaboration improves - product managers, engineering leaders, and developers work more effectively when they share common vocabulary around productivity concepts
The DORA Framework: Four Metrics That Changed Everything
The DORA research program discovered something revolutionary about software delivery: speed and stability move together with strong statistical significance. This finding challenged decades of conventional wisdom about needing to choose between moving fast and maintaining quality.
The four key metrics split into two categories. Speed metrics include lead time (commit to production deployment) and deployment frequency (how often code ships). Stability metrics encompass mean time to recovery (incident resolution speed) and change failure rate (percentage of deployments requiring human intervention). A minimal sketch of computing all four from deployment and incident records appears after the list below.
Elite performers demonstrate remarkable benchmarks: deploying on demand, achieving lead times under one day, recovering from incidents in less than an hour, and maintaining change failure rates between zero and fifteen percent. These numbers might seem aggressive, but they represent achievable targets for well-functioning development organizations.
- Smaller batch sizes create stability gains - frequent deployments with minimal changes reduce the blast radius when problems occur, making debugging and recovery dramatically faster
- Statistical significance validates the approach - the correlation between speed and stability metrics holds across thousands of organizations and multiple years of data collection
- Benchmark categories provide directional guidance - while precise timing matters less than consistent improvement, knowing industry performance levels helps teams set realistic goals
- Lead time measurement focuses on deployment pipeline effectiveness - the metric captures how quickly teams receive feedback on their changes, not just raw deployment speed
- Recovery time indicates system resilience - elite teams design for failure and practice incident response, making quick recovery a competitive advantage rather than a lucky accident
- Change failure rates reflect development practices - teams with good testing, code review, and deployment automation naturally achieve lower failure rates without sacrificing delivery speed
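To make the four metrics concrete, here is a minimal Python sketch of how a team might summarize them from its own delivery data. The `Deployment` and `Incident` record shapes, their field names, and the single `failed` flag per deployment are assumptions for illustration; in practice this data would come from CI/CD and incident-management tooling, and the fields will differ.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median

# Hypothetical record shapes; real pipelines would populate these from
# CI/CD and incident-management systems.
@dataclass
class Deployment:
    commit_time: datetime   # when the change was committed
    deploy_time: datetime   # when it reached production
    failed: bool            # did this deployment require remediation?

@dataclass
class Incident:
    opened: datetime
    resolved: datetime

def dora_metrics(deployments: list[Deployment], incidents: list[Incident], window_days: int) -> dict:
    """Summarize the four DORA metrics over a reporting window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment in the window")
    lead_hours = [(d.deploy_time - d.commit_time).total_seconds() / 3600 for d in deployments]
    restore_hours = [(i.resolved - i.opened).total_seconds() / 3600 for i in incidents]
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
        "mean_time_to_recovery_hours": mean(restore_hours) if restore_hours else 0.0,
    }
```

Even a rough version of this calculation is usually enough to place a team against the elite benchmarks described above.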
Why Moving Faster Actually Improves Quality
The counterintuitive relationship between speed and stability challenges traditional change management approaches. Organizations historically implemented lengthy approval processes believing they prevented problems, but research reveals these create larger, more dangerous deployments.
When teams deploy less frequently, they accumulate larger batches of changes. These big releases create massive blast radii when problems occur. Developers struggle to identify which specific change caused issues among hundreds of modifications, extending recovery times significantly.
Conversely, frequent deployments with small changes make problems easier to isolate and fix. If something breaks after a deployment containing three small changes, teams quickly identify and resolve the issue. The mental context remains fresh for developers, eliminating the need to re-familiarize themselves with months-old code.
- Merge conflicts decrease with frequent integration - teams avoid the pain of large merges by integrating changes continuously, reducing the complexity of combining different developers' work
- Debugging becomes surgical rather than exploratory - small changes make root cause analysis straightforward, turning incident response from detective work into systematic problem-solving
- Developer context switching decreases - when problems surface quickly, developers still maintain mental models of their recent changes rather than needing to rebuild that context from scratch
- Risk perception shifts from deployment to development - teams focus on writing better code rather than avoiding deployments, improving overall engineering practices
- Feedback loops accelerate learning - rapid deployment cycles provide faster validation of assumptions, helping teams course-correct before investing too much effort in wrong directions
- Production confidence increases - regular, successful deployments build team confidence in their systems and processes, reducing the fear that often drives risk-averse behaviors
Company Size Doesn't Determine Performance
One of the most surprising research findings reveals no statistical difference in performance capabilities between small and large organizations. Both startup teams and enterprise engineering groups achieve elite performance levels at similar rates, challenging common assumptions about organizational constraints.
Large companies typically claim complexity disadvantages - more dependencies, legacy systems, and regulatory requirements. Small companies counter that they lack resources, funding, and specialized expertise. The research shows both groups can overcome their perceived limitations through focused capability development.
Retail organizations showed the only statistically significant difference, actually performing better than other industries. This likely reflects survival pressure during the retail apocalypse - companies that couldn't achieve elite performance simply didn't survive the transition to digital commerce and cloud-based scaling requirements.
- Excuse-making patterns appear universally - organizations consistently blame their unique constraints rather than focusing on developing fundamental capabilities that predict success
- Resource allocation matters more than resource quantity - small teams with focused improvement efforts often outperform large teams spreading attention across too many initiatives simultaneously
- Survival pressure drives performance - industries facing existential threats tend to develop better practices faster than those in comfortable market positions
- Legacy system challenges are surmountable - large organizations with significant technical debt can still achieve elite performance through strategic modernization and architectural improvements
- Startup advantages are temporary - small companies must build sustainable practices early rather than relying on informal processes that break at scale
- Performance distribution stays consistent - across all company sizes, roughly the same percentage achieves elite, high, medium, and low performance levels
The SPACE Framework: Choosing Balanced Metrics
While DORA provides specific metrics for software delivery performance, many teams need guidance selecting appropriate measurements for other productivity improvement areas. The SPACE framework addresses this gap by providing five dimensions for metric selection rather than prescriptive metrics.
Satisfaction and well-being capture developer sentiment through surveys and interviews. Performance measures outcomes like reliability and efficiency. Activity counts discrete actions such as pull requests or commits. Communication and collaboration encompass coordination patterns and documentation quality. Efficiency and flow track how smoothly work moves through systems and processes.
Teams should select at least three dimensions simultaneously to maintain balance. Activity metrics alone - like lines of code or commit frequency - create perverse incentives. Combining activity with satisfaction and efficiency provides a more complete picture that encourages sustainable improvement rather than gaming behaviors; a minimal coverage-check sketch appears after the list below.
- Three-dimension minimum prevents tunnel vision - measuring across multiple categories forces teams to consider trade-offs and unintended consequences of optimization efforts
- Balance creates sustainable improvements - teams avoid the boom-bust cycles that come from optimizing single metrics at the expense of overall system health
- Context determines specific metric choices - SPACE provides a thinking framework rather than prescriptive measurements, allowing teams to select metrics appropriate for their situation and available data
- Qualitative data complements quantitative measurements - developer surveys reveal insights about system usability and process friction that automated metrics cannot capture
- Implementation flexibility accommodates organizational constraints - teams can start with easily available metrics and gradually add more sophisticated measurements as their capability develops
- Gaming resistance improves with metric diversity - developers find it much harder to manipulate measurements across multiple dimensions compared to single-metric systems
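As a lightweight guard against single-metric tunnel vision, a team could record which SPACE dimension each candidate metric covers and reject any dashboard that spans fewer than three. The sketch below is illustrative only; the metric names and their dimension assignments are assumptions, since SPACE deliberately leaves those choices to each team.

```python
# A minimal "dimension coverage" check for a proposed metric set.
# Metric names and dimension assignments are illustrative assumptions.
SPACE_DIMENSIONS = {
    "developer_satisfaction_survey": "Satisfaction",
    "change_failure_rate": "Performance",
    "pull_requests_merged": "Activity",
    "code_review_turnaround": "Communication",
    "lead_time_for_changes": "Efficiency",
}

def covered_dimensions(chosen_metrics: list[str]) -> set[str]:
    """Return the set of SPACE dimensions touched by the chosen metrics."""
    return {SPACE_DIMENSIONS[m] for m in chosen_metrics if m in SPACE_DIMENSIONS}

def is_balanced(chosen_metrics: list[str], minimum: int = 3) -> bool:
    """SPACE guidance: measure across at least three dimensions at once."""
    return len(covered_dimensions(chosen_metrics)) >= minimum

# Activity alone is rejected; adding satisfaction and efficiency passes.
assert not is_balanced(["pull_requests_merged"])
assert is_balanced(["pull_requests_merged",
                    "developer_satisfaction_survey",
                    "lead_time_for_changes"])
```

Encoding the rule this way keeps the three-dimension minimum visible whenever someone proposes changing the dashboard.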
Measuring What Matters: Combining Systems and Surveys
Effective productivity measurement combines data from systems with insights from people. While automated metrics scale easily and provide objective measurements, they miss crucial context about developer experience and system usability that only humans can provide.
System metrics excel at capturing lead times, deployment frequencies, and error rates. These measurements run continuously without manual intervention and provide historical trends for analysis. However, they cannot reveal whether teams achieve good numbers through sustainable practices or unsustainable heroics.
Developer surveys and interviews fill critical gaps. They reveal when systems appear functional but require extensive workarounds, when code exists outside version control, or when teams fear deploying despite having the technical capability. The most advanced organizations with comprehensive instrumentation still survey developers regularly because human insights remain irreplaceable; a minimal cross-validation sketch appears after the list below.
- Survey frequency should match decision cycles - quarterly surveys provide sufficient insight for strategic adjustments without creating survey fatigue among development teams
- Incentive alignment improves data quality - developers rarely have reasons to lie about system problems they want fixed, making their feedback naturally reliable
- System blind spots require human detection - automated metrics cannot identify shadow work, unofficial processes, or workarounds that significantly impact productivity
- Heroics versus sustainability distinction needs human insight - good metrics achieved through unsustainable effort predict future problems that only developer feedback can reveal early
- Historical context enriches current measurements - experienced developers provide valuable perspective on whether current metrics represent temporary fluctuations or meaningful trends
- Cross-validation strengthens measurement confidence - when system metrics and developer feedback align, teams can confidently proceed with improvement initiatives
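One simple way to act on that cross-validation is to compare a normalized system-health score against a survey satisfaction score for each team and surface the places where the two disagree, since disagreement is often where good-looking numbers hide heroics or workarounds. This is a minimal sketch; the team names, the 0-1 and 1-5 scales, and the thresholds are all illustrative assumptions.

```python
# Cross-validate system data with survey data per team.
# Thresholds and the 1-5 satisfaction scale are illustrative assumptions.
def flag_disagreements(system_scores: dict[str, float],
                       survey_scores: dict[str, float],
                       system_threshold: float = 0.7,
                       survey_threshold: float = 3.5) -> list[str]:
    """Return teams where system metrics and developer sentiment point in
    opposite directions - the places worth a closer look."""
    flagged = []
    for team in system_scores.keys() & survey_scores.keys():
        healthy_system = system_scores[team] >= system_threshold   # normalized 0-1
        satisfied_devs = survey_scores[team] >= survey_threshold   # 1-5 survey scale
        if healthy_system != satisfied_devs:
            flagged.append(team)
    return flagged

# Example: "payments" deploys often, but developers report sustained heroics.
print(flag_disagreements(
    {"payments": 0.9, "search": 0.4},
    {"payments": 2.8, "search": 2.9},
))  # -> ['payments']
```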
Implementation: Avoiding Common Pitfalls
Eighty percent of productivity improvement initiatives fail because teams never clearly defined their goals. Organizations frequently launch efforts to "improve developer experience" without specifying whether they're addressing tool friction, cultural issues, or process inefficiencies - three completely different problems requiring distinct solutions.
Successful implementations require both top-down executive support and bottom-up developer engagement. Executives must understand business value and prioritize improvements appropriately. Developers must trust that measurement serves improvement rather than performance evaluation. Without both perspectives aligned, initiatives either lack resources or face active resistance.
Communication becomes critical throughout implementation. Teams must translate technical improvements into business language for leadership while ensuring developers understand how measurements connect to their daily pain points. This dual communication requirement often determines whether initiatives gain sustainable momentum.
- Problem definition prevents scope creep - teams spending weeks clarifying goals avoid months of misdirected effort working on the wrong challenges
- Executive buy-in ensures resource allocation - productivity improvements require sustained investment in tools, training, and process changes that need leadership commitment
- Developer trust enables honest feedback - if teams suspect measurements will be used for individual performance evaluation, they provide less useful data for improvement efforts
- Cultural and technical changes proceed simultaneously - successful initiatives address both human and system factors rather than assuming technical fixes alone solve productivity problems
- Value communication requires business translation - engineering leaders must articulate productivity improvements in terms of customer value, competitive advantage, and revenue impact
- Measurement journey planning prevents perfectionism - teams starting with available data and gradually improving measurement sophistication avoid analysis paralysis
The AI Revolution: Changing How We Work and Measure
Artificial intelligence tools are fundamentally shifting developer work patterns from writing code to reviewing code. Research shows developers now spend approximately fifty percent of their time reviewing AI-generated code rather than writing it from scratch, creating new productivity measurement challenges.
Traditional metrics assume humans write most code, but AI assistance changes the cognitive model. Instead of measuring typing speed or lines produced, teams need metrics that capture code review quality, AI tool effectiveness, and the higher-level problem-solving that humans provide when AI handles routine implementation.
Trust and reliability emerge as new measurement dimensions. Teams must understand when to rely on AI suggestions versus when human judgment becomes essential. Over-reliance on AI tools without proper review creates new categories of technical debt and security vulnerabilities.
- Cognitive load shifts from creation to evaluation - developers need different skills and measurement approaches when their primary task becomes reviewing rather than writing code
- Productivity definitions require updating - traditional metrics like commit frequency become less meaningful when AI can generate large amounts of code quickly
- Learning patterns change for novice developers - teams must consider how AI assistance affects skill development and knowledge transfer for junior engineers
- Review quality becomes critical - as AI generates more code, human review skills become more important for maintaining system quality and security
- Tool effectiveness varies by context - measurement systems need to capture when AI assistance helps versus when it introduces friction or incorrect solutions
- Expertise remains essential for complex problems - while AI handles routine tasks effectively, human creativity and judgment become more valuable for architectural decisions and novel problem-solving
Getting Started: First Steps for Any Team
Teams beginning productivity improvement efforts should start with problem definition rather than metric selection. Spend one week clarifying what specific challenges need addressing - whether tool friction, process inefficiency, or cultural dysfunction - before choosing measurement approaches.
Look for existing data related to identified problems. This might include deployment logs, incident reports, or informal developer feedback. Starting with available information provides quick wins while building momentum for more sophisticated measurement systems later (a minimal baseline sketch appears after the list below).
The quick check tool at dora.dev provides teams with immediate benchmarking and identifies likely constraint areas based on industry patterns. This assessment takes minutes but provides months of improvement direction for most teams.
- Problem clarity prevents measurement confusion - teams knowing what they want to improve can select appropriate metrics rather than measuring everything hoping to find insights
- Existing data provides immediate value - most organizations have more useful productivity data available than they realize, requiring analysis rather than new collection systems
- Quick wins build improvement momentum - early successes with simple measurements help teams gain confidence and resources for more ambitious productivity initiatives
- Industry benchmarking provides realistic goals - understanding where peer organizations perform helps teams set achievable targets rather than unrealistic expectations
- Systematic approaches beat ad hoc efforts - following established frameworks like DORA and SPACE provides structure that prevents teams from reinventing measurement approaches
- One-week timeframe enables rapid progress - productivity assessment and initial improvement planning can happen quickly with focused effort rather than extended analysis phases
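As one concrete example of starting from existing data, even a raw deployment log is enough to establish a deployment-frequency baseline before any new instrumentation is built. The sketch below assumes a hypothetical log format of one ISO-8601 timestamp per deployment; real CI/CD logs will look different, but the idea carries over.

```python
from collections import Counter
from datetime import datetime

# A minimal baseline from data you already have. The log format (one
# ISO-8601 timestamp per deployment, one per line) is a hypothetical
# stand-in for whatever your CI/CD system actually records.
def deploys_per_week(log_lines: list[str]) -> Counter:
    """Count production deployments per ISO week from raw log lines."""
    weeks = Counter()
    for line in log_lines:
        ts = datetime.fromisoformat(line.strip())
        year, week, _ = ts.isocalendar()
        weeks[f"{year}-W{week:02d}"] += 1
    return weeks

sample = ["2024-05-06T14:02:00", "2024-05-08T09:30:00", "2024-05-15T16:45:00"]
print(deploys_per_week(sample))  # -> Counter({'2024-W19': 2, '2024-W20': 1})
```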
These frameworks provide proven approaches for measuring and improving developer productivity without requiring massive upfront investments. Teams starting small and building measurement capability gradually achieve better results than those attempting comprehensive systems immediately.
The combination of clear goals, balanced metrics, and both human and system perspectives creates sustainable productivity improvements that benefit developers, organizations, and ultimately customers through faster delivery of valuable software features.