Becoming evidence-guided | Itamar Gilad (Gmail, YouTube, Microsoft)

Google Plus failed due to opinion-based development, while Gmail's Tabbed Inbox succeeded through evidence. Former Google product leader Itamar Gilad explains how to stop relying on HiPPOs and start validating ideas with rigorous testing and evidence-gathering.

When Google launched Google Plus, it was an all-hands-on-deck initiative. Fearing the meteoric rise of Facebook, Google created a dedicated division, reorganized teams, and poured millions of hours into building a social layer across all its products. It was a classic example of "opinion-based development"—a top-down conviction that the market needed this specific solution. Despite the massive investment and brilliant engineering, users simply didn’t want it. The project was eventually shut down.

Contrast this with the development of the "Tabbed Inbox" in Gmail (the feature that sorts email into Primary, Social, and Promotions tabs). The team started out skeptical, and several early ideas were rejected. But instead of deferring to a HiPPO (Highest Paid Person's Opinion), the team used a rigorous process of evidence gathering, starting with "Wizard of Oz" tests built on fake HTML facades. The result is a feature used by billions of people today. Itamar Gilad, a former product leader at Google and author of Evidence Guided, argues that the difference between these two outcomes lies in a systematic approach to replacing guesswork with data.

Key Takeaways

  • Shift from "Plan and Execute" to "Evidence-Guided": Moving away from top-down mandates and rigid roadmaps allows teams to validate assumptions before committing expensive engineering resources.
  • Adopt the GIST Framework: Organize product development through Goals, Ideas, Steps, and Tasks to bridge the gap between high-level strategy and daily agile execution.
  • Use Metrics Trees over Simple KPIs: Instead of focusing solely on revenue, map out the "North Star" metric (value delivered) and the "Top Business KPI" (value captured) to see how granular inputs drive success.
  • The Confidence Meter: Stop guessing at the "Confidence" variable in ICE scoring. Use a standardized meter to objectively rate evidence strength, from gut feeling (low) to launch data (high).
  • Test Before You Build: "Wizard of Oz" tests, smoke tests, and "fish fooding" (testing within the immediate team) can invalidate bad ideas long before production code is written.

The High Cost of Opinion-Based Development

In many organizations, product decisions are driven by the most persuasive executive or the most exciting PowerPoint deck. Gilad refers to this as "opinion-based development." This approach relies on the "Plan and Execute" model: a leader has an idea, a detailed roadmap is created, and the team spends months building it, only to discover at launch that the market is indifferent.

The Google Plus saga serves as the ultimate cautionary tale: the company had the resources and the talent, but not the validation. Conversely, the Gmail Tabbed Inbox succeeded because the team assumed its initial ideas were wrong. It ran rapid discovery cycles, testing low-fidelity prototypes to gauge user interest before writing production code.

"Behind every terrible idea that was ever launched, someone thought it was great... If we had come with hard data and said, 'Listen, things are not actually panning out the way you guys are expecting,' the discussion would have been very different."

To replicate the success of the Gmail team, organizations need a structural change. They need a meta-framework that incorporates Lean Startup, Design Thinking, and Agile into a cohesive system. Gilad calls this GIST.

The GIST Framework

The GIST model breaks product planning and execution into four distinct layers, moving from strategic context to tactical action.

1. Goals

Goals define the end state. However, many companies confuse goals with plans, setting objectives like "Launch feature X by Q3." True goals measure outcomes, not outputs. Gilad advocates for the "Value Exchange Loop," in which you measure both the value delivered to the customer and the value captured by the business.

  • North Star Metric: Measures value created for the user (e.g., for WhatsApp, this is "messages sent").
  • Top Business KPI: Measures value captured (e.g., revenue or market share).

To make this actionable, teams should build a Metrics Tree. This visualization breaks down the North Star metric into its constituent parts—acquisition, activation, retention, etc.—allowing specific teams to own the variables they can actually influence.
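A metrics tree can be sketched as a small recursive data structure. The decomposition below is hypothetical and not taken from the article: it assumes a WhatsApp-style North Star (weekly messages sent) breaks down into registered accounts × weekly active rate × messages per active user, and that each inner node is either the product or the sum of its children. All numbers are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import prod
from typing import Optional


@dataclass
class Metric:
    """A node in a metrics tree (hypothetical sketch).

    Leaves carry a measured value; inner nodes derive their value from
    their children using `combine` ("product" or "sum").
    """
    name: str
    value: Optional[float] = None
    combine: str = "product"
    children: list[Metric] = field(default_factory=list)

    def compute(self) -> float:
        if not self.children:
            if self.value is None:
                raise ValueError(f"leaf metric {self.name!r} has no value")
            return self.value
        parts = [child.compute() for child in self.children]
        if self.combine == "product":
            return prod(parts)
        if self.combine == "sum":
            return sum(parts)
        raise ValueError(f"unknown combine rule {self.combine!r}")

    def leaves(self) -> list[str]:
        """Names of the input metrics a team could own and move directly."""
        if not self.children:
            return [self.name]
        return [name for child in self.children for name in child.leaves()]


# Hypothetical North Star decomposition with made-up numbers.
north_star = Metric("weekly_messages_sent", children=[
    Metric("weekly_active_users", children=[
        Metric("registered_accounts", value=1_000_000),
        Metric("weekly_active_rate", value=0.4),
    ]),
    Metric("messages_per_active_user", value=25),
])

print(north_star.compute())  # 10000000.0
print(north_star.leaves())
```

The leaves are the variables individual teams can own; a `"sum"` node would model, for example, new users plus returning users feeding an active-user count.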

2. Ideas

Ideas are hypothetical ways to achieve the goals. The challenge is that companies are often flooded with ideas from stakeholders, customers, and competitors. The default behavior is to pick the favorite idea and build it. A better approach is to use an objective prioritization framework like ICE (Impact, Confidence, Ease).
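A minimal ICE ranking might look like the following. It assumes the multiplicative form (Impact × Confidence × Ease, each on a 0-10 scale) that is commonly used with ICE; averaging the three is another variant. The idea names and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    name: str
    impact: float      # 0-10: expected effect on the goal metric
    confidence: float  # 0-10: strength of the supporting evidence
    ease: float        # 0-10: higher means cheaper to build and test

    @property
    def ice(self) -> float:
        # Multiplicative ICE score (assumption; some teams average instead).
        return self.impact * self.confidence * self.ease


def rank(ideas):
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


# Hypothetical backlog: the executive favorite has high impact but almost no evidence.
backlog = [
    Idea("executive_pet_feature", impact=9, confidence=0.1, ease=2),
    Idea("redesigned_onboarding", impact=6, confidence=3, ease=5),
    Idea("referral_program", impact=7, confidence=1, ease=4),
]

for idea in rank(backlog):
    print(f"{idea.name:24} ICE={idea.ice:.1f}")
```

Because the score is a product, near-zero confidence drags the pet feature to the bottom even though its impact estimate is the highest.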

3. Steps

Steps are the bridge between ideas and execution. A "step" is not a development milestone; it is a learning milestone. Before building the full product, teams should define a series of validation steps—surveys, smoke tests, prototypes, or betas—that progressively reduce risk.

4. Tasks

Tasks are the familiar realm of Agile and Jira—the actual work of engineering and design. The GIST framework ensures that these tasks are not just focused on delivery, but are aligned with specific validation steps and broader goals.

Quantifying Certainty: The Confidence Meter

One of the biggest flaws in traditional prioritization is the "Confidence" variable in the ICE score. Product managers often guess their confidence level, assigning an arbitrary "8 out of 10" because they personally like the idea. To solve this, Gilad introduced the Confidence Meter.

The Confidence Meter transforms confidence from a subjective feeling into an objective score based on evidence type. It functions like a thermometer, ranging from zero (low confidence) to ten (high confidence).

  • Low Confidence (0.01 - 0.1): Opinions, themes, and strategy decks. Even if the CEO thinks it's a good idea, it still rates as a low-confidence guess.
  • Medium-Low Confidence (0.5 - 1.0): Estimates, plans, and reviews by experts.
  • Medium Confidence (2.0 - 3.0): Anecdotal data, surveys, and competitive analysis. Knowing a competitor has a feature does not prove it will work for you.
  • Medium-High Confidence (4.0 - 5.0): Validation tests. This includes "fake door" tests or usability studies with prototypes.
  • High Confidence (6.0+): MVP launches, A/B tests, and full rollout data.

"Just by breaking the question into these three questions [Impact, Confidence, Ease], we usually have a slightly better discussion than just 'my idea is better than yours.' But then there's the third element which is confidence... I wanted to help people realize when they have strong evidence and when it's weak evidence."

By using this tool, teams can objectively say, "This is a great idea, but our confidence is currently a 0.5. We need to run a cheap test to get our confidence to a 3 before we invest engineering time."
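The meter can be sketched as a lookup from evidence type to confidence. This is an illustrative assumption rather than Gilad's published tool: the evidence labels are hypothetical, the bands come from the list above, and the rule "take the single strongest piece of evidence, at the bottom of its band" is one conservative reading.

```python
from typing import Iterable

# Bands copied from the Confidence Meter levels listed above. Assigning each
# (hypothetical) evidence label to a band is a simplifying assumption.
BANDS = {
    "opinion": (0.01, 0.1),
    "strategy_deck": (0.01, 0.1),
    "estimate": (0.5, 1.0),
    "expert_review": (0.5, 1.0),
    "anecdotal_data": (2.0, 3.0),
    "survey": (2.0, 3.0),
    "competitive_analysis": (2.0, 3.0),
    "fake_door_test": (4.0, 5.0),
    "prototype_usability_study": (4.0, 5.0),
    "mvp_launch": (6.0, 10.0),
    "ab_test": (6.0, 10.0),
    "rollout_data": (6.0, 10.0),
}


def confidence(evidence: Iterable[str]) -> float:
    """Score an idea by its single strongest piece of evidence.

    Uses the bottom of that evidence's band and does not add weak evidence
    together: ten opinions still count as opinions.
    """
    return max((BANDS[item][0] for item in evidence), default=0.0)


print(confidence(["opinion", "estimate"]))            # 0.5
print(confidence(["opinion", "estimate", "survey"]))  # 2.0
```

Not summing evidence is a deliberate choice in this sketch: it keeps a pile of opinions from masquerading as validation.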

Mastering "Steps" and Execution

The "Steps" layer of GIST is where the actual work of being evidence-guided happens. It requires a shift from "building to launch" to "building to learn." Gilad suggests a progression of testing methods that increase in fidelity and cost as confidence grows.

The Art of Faking It

You do not need to build software to test an idea. During the Gmail Tabs project, the team used a Wizard of Oz test. They showed users a facade of an inbox. Behind the scenes, a human manually sorted the emails. The users were amazed by the "algorithm," validating the demand without a single line of categorization code being written.
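In software terms, a Wizard of Oz test puts a person behind the same interface a real component would later implement. The sketch below is hypothetical (the article does not describe the Gmail prototype's internals): the inbox view depends only on a `categorize` method, so a person-backed categorizer can stand in for the algorithm and later be swapped for real code without touching the view. The `ask_human` callback is an assumed stand-in for whatever channel the operator uses.

```python
from __future__ import annotations

from typing import Callable, Protocol

TABS = ("Primary", "Social", "Promotions")


class Categorizer(Protocol):
    def categorize(self, subject: str, sender: str) -> str: ...


class WizardOfOzCategorizer:
    """Looks automated to the user; every decision is made by a person."""

    def __init__(self, ask_human: Callable[[str, str], str]) -> None:
        self.ask_human = ask_human
        self.decisions: list[tuple[str, str]] = []  # kept for later analysis

    def categorize(self, subject: str, sender: str) -> str:
        label = self.ask_human(subject, sender)
        if label not in TABS:
            raise ValueError(f"operator chose unknown tab {label!r}")
        self.decisions.append((subject, label))
        return label


def build_inbox(emails: list[tuple[str, str]], categorizer: Categorizer) -> dict[str, list[str]]:
    """Sort (subject, sender) pairs into tabs; the view never knows who decided."""
    tabs: dict[str, list[str]] = {tab: [] for tab in TABS}
    for subject, sender in emails:
        tabs[categorizer.categorize(subject, sender)].append(subject)
    return tabs


# Scripted stand-in for the human operator, only so the demo runs unattended.
def operator(subject: str, sender: str) -> str:
    if sender.startswith("deals@"):
        return "Promotions"
    if sender.startswith("notify@"):
        return "Social"
    return "Primary"


wizard = WizardOfOzCategorizer(operator)
inbox = build_inbox(
    [("Dinner Friday?", "mom@example.com"),
     ("50% off shoes", "deals@shop.example"),
     ("New follower", "notify@social.example")],
    wizard,
)
print(inbox)
```

In a live session, `ask_human` could be as simple as `lambda subject, sender: input(f"{sender}: {subject} > ")`; the logged decisions double as labeled data if the idea survives.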

Other validation steps include:

  • Assessment: Business modeling and assumption mapping.
  • Fact Finding: Data analysis and user interviews.
  • Tests: Smoke tests and concierge tests.
  • Experiments: A/B tests and multivariate tests.
  • Release: Staged rollouts and "fish fooding" (internal testing within the immediate team, a precursor to dogfooding).
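The idea that steps grow in fidelity and cost as confidence rises can be sketched as a ladder. The relative costs and the confidence each kind of step can support are hypothetical values loosely keyed to the Confidence Meter bands above. The planner also assumes every step passes; in practice, a failed test is a reason to stop or rethink the idea.

```python
from __future__ import annotations

from typing import NamedTuple


class Step(NamedTuple):
    kind: str
    relative_cost: int        # hypothetical effort units
    confidence_after: float   # meter reading a passing result could support


# Ordered from cheapest/lowest-fidelity to most expensive (values are assumptions).
LADDER = [
    Step("assessment", 1, 0.5),
    Step("fact_finding", 2, 2.0),
    Step("test", 5, 4.0),
    Step("experiment", 10, 6.0),
    Step("release", 20, 8.0),
]


def plan_steps(current: float, target: float) -> list[Step]:
    """Climb the ladder from the cheapest step that can still add evidence,
    stopping as soon as a step could reach the target confidence."""
    plan: list[Step] = []
    for step in LADDER:
        if step.confidence_after <= current:
            continue  # this kind of step cannot raise confidence any further
        plan.append(step)
        if step.confidence_after >= target:
            break
    return plan


plan = plan_steps(current=0.5, target=3.0)
print([s.kind for s in plan])               # ['fact_finding', 'test']
print(sum(s.relative_cost for s in plan))   # 7
```

With these assumed values, moving an idea from 0.5 to 3 means a round of fact finding followed by a cheap test, with no experiment or release needed yet.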

The GIST Board

To manage this process, teams can replace or augment their traditional roadmaps with a GIST Board. This board visualizes the work not as a timeline of features, but as a hierarchy of validation:

  1. Goals: The top 3-4 Key Results the team is targeting this quarter.
  2. Ideas: The ranked list of hypotheses the team is exploring to hit those goals.
  3. Steps: The immediate validation experiments (e.g., "Run usability study," "Build smoke test").

This structure engages engineers in the discovery process. Instead of mindlessly moving tickets across a Kanban board, the engineering team understands why they are building something and participates in the experiments that validate the work.
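As a data structure, the board is a simple three-level hierarchy. The sketch below is a hypothetical text rendering, not a format Gilad prescribes: goals hold ideas ordered by ICE score, and ideas hold their validation steps, while tasks would stay in the team's existing tracker. The key result, ideas, and scores are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class Idea:
    name: str
    ice: float
    steps: list[Step] = field(default_factory=list)


@dataclass
class Goal:
    key_result: str
    ideas: list[Idea] = field(default_factory=list)


def render_board(goals: list[Goal]) -> str:
    """Render goals -> ideas (highest ICE first) -> steps as indented text."""
    lines: list[str] = []
    for goal in goals:
        lines.append(f"GOAL  {goal.key_result}")
        for idea in sorted(goal.ideas, key=lambda i: i.ice, reverse=True):
            lines.append(f"  IDEA  {idea.name} (ICE {idea.ice:g})")
            for step in idea.steps:
                mark = "x" if step.done else " "
                lines.append(f"    [{mark}] {step.description}")
    return "\n".join(lines)


board = [
    Goal("Raise weekly active rate to 45%", ideas=[
        Idea("Referral program", ice=28, steps=[Step("Interview 10 users")]),
        Idea("Redesigned onboarding", ice=90, steps=[
            Step("Run usability study", done=True),
            Step("Build smoke test"),
        ]),
    ]),
]
print(render_board(board))
```

Sorting ideas by ICE at render time keeps the board honest: when a step raises or lowers an idea's confidence, its position changes without anyone editing a roadmap.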

Conclusion

Becoming evidence-guided does not require an overnight transformation. For large organizations, the shift can start small. Teams can begin by clarifying their North Star metric or adopting the Confidence Meter to facilitate more objective roadmap discussions. The goal is to move away from the "build trap" where success is measured by shipping features, and toward a culture where success is measured by outcomes and validated learning.

As Gilad notes, even Steve Jobs—often cited as the ultimate "opinion-based" genius—was eventually swayed by evidence to launch the iPhone, a product he initially resisted. If the visionary founder of Apple could embrace evidence, your organization can too.
