Anthropic has released a pivotal new study, "Measuring AI Agent Autonomy in Practice," offering a rare glimpse into how businesses and developers deploy AI agents in real-world environments rather than theoretical settings. The research, which analyzes usage patterns from the company’s Claude Code product and public API, reveals that while software engineering remains the dominant use case, roughly half of all agentic activity now falls outside coding, signaling a major shift toward general-purpose automation.
Key Takeaways
- Broadening Utility: While software engineering accounts for roughly 50% of tool calls, significant adoption is occurring in back-office automation (9.1%), marketing (4.4%), and finance (4.0%).
- The Trust Paradox: Experienced "power users" grant agents twice as much autonomy (40% auto-approval) as novices, yet they interrupt the AI nearly twice as often to guide outcomes.
- Capability Overhang: The median agent interaction lasts only 45 seconds, but the 99.9th percentile of workflows shows agents successfully managing tasks running as long as 45 minutes, suggesting current human usage lags behind model potential.
- Interactive Complexity: As task complexity rises, agents are statistically more likely to pause and ask for clarification than humans are to interrupt for corrections.
Moving Beyond Theoretical Benchmarks
For the past year, the AI industry has largely relied on benchmarks like METR's time-horizon study to gauge agent efficacy. These metrics typically measure the duration of tasks an AI can complete at specific success thresholds—usually 50% or 80%. However, Anthropic’s new research argues that these idealized settings, devoid of human interaction, fail to capture the reality of enterprise deployment.
In a professional context, a 50% success rate is untenable. Consequently, Anthropic focused its methodology on "turn duration"—the time elapsed between an agent starting a task and stopping—within its Claude Code environment. This approach separates what a model can theoretically achieve from how much autonomy humans actually grant it in practice.
The study highlights a "capability overhang." While the median turn duration has held steady at around 45 seconds, turn durations at the 99.9th percentile grew from roughly 25 minutes to 45 minutes between October 2023 and January 2024. This growth suggests that while the technology handles long-duration autonomy, human workflow adaptation is still catching up.
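The turn-duration metric is simple to reproduce over session logs. Below is a minimal sketch; the input format and the sample data are illustrative, since the study does not publish its schema:

```python
import math
import statistics

def turn_duration_stats(durations, q=0.999):
    """Median and q-th quantile (nearest-rank) of agent turn durations.

    `durations` is a list of per-turn durations in seconds; this input
    format is hypothetical, not taken from the study.
    """
    vals = sorted(durations)
    median = statistics.median(vals)
    # Nearest-rank quantile: smallest value with at least q*n values <= it.
    rank = max(1, math.ceil(q * len(vals)))
    return median, vals[rank - 1]

# Illustrative log: 999 routine ~45-second turns plus two 45-minute outliers.
durations = [45] * 999 + [2700] * 2
median, p999 = turn_duration_stats(durations)
print(median, p999)  # 45 2700
```

The skewed sample mirrors the study's finding: a long-tailed distribution where the median stays flat while the extreme percentiles stretch.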
The Evolution of Human-Agent Collaboration
Perhaps the most significant finding in the report concerns the "accumulation of trust." Anthropic tracked how users interact with agents over time, revealing a distinct behavioral gap between novices and experienced users.
New users tend to utilize "full auto-approval"—allowing the AI to execute a chain of actions without checks—roughly 20% of the time. For experienced users, this figure doubles to 40%. However, this increased trust does not equate to passivity. The data shows that experienced users interrupt the AI 9% of the time, compared to just 5% for new users.
"The higher interrupt rate may also reflect active monitoring by users who have more honed instincts for when their intervention is needed."
This dynamic mirrors the relationship between a manager and a junior employee. As the manager gains confidence in the employee, they allow more autonomy but become more adept at spotting specific moments where intervention ensures a better outcome. The study suggests that "autonomy" in the enterprise is not about the absence of humans, but the refinement of human oversight.
Shifting Domains: The Rise of General-Purpose Agents
While Claude Code is nominally a developer tool, the usage data paints a picture of a "code-enabled general-purpose agent." Software engineering represents approximately half of all tool calls, but the remaining activity is diversifying rapidly.
Breakdown of Agent Deployment by Domain:
- Software Engineering: ~50%
- Back-Office Automation: 9.1%
- Marketing & Copywriting: 4.4%
- Sales & CRM: 4.3%
- Finance & Accounting: 4.0%
This distribution indicates that non-engineers are increasingly leveraging agentic workflows to handle complex, multi-step processes. The implication is that code is becoming the medium through which general business logic is executed, rather than the end product itself.
Interaction Patterns and Future Autonomy
The study also analyzed why workflows pause. Humans primarily interrupt agents to provide missing context or corrections (32% of interruptions). Conversely, the agents themselves most frequently stop to present the user with a choice between different approaches (35% of self-stops).
This bidirectional feedback loop implies that the future of AI agents is not necessarily "set and forget" automation, but "competent autonomy." Users are looking for systems that respect "blast radius" boundaries—keeping databases and production environments safe—while skipping trivial confirmation prompts.
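The "blast radius" idea can be made concrete as a small policy layer that auto-approves low-risk actions, blocks destructive ones, and asks for confirmation otherwise. The sketch below is hypothetical; the rule patterns and action names are invented for illustration and do not come from the study or from any particular product:

```python
import fnmatch

# Hypothetical policy: glob patterns mapped to decisions, checked in order.
RULES = [
    ("bash:rm -rf*", "block"),       # destructive shell commands
    ("db:drop*", "block"),           # anything that could hit production data
    ("bash:git push*", "confirm"),   # reversible, but externally visible
    ("read:*", "auto"),              # reads carry no blast radius
    ("edit:*", "auto"),              # local edits are cheap to undo
]

def decide(action: str) -> str:
    """Return 'auto', 'confirm', or 'block' for a proposed agent action."""
    for pattern, verdict in RULES:
        if fnmatch.fnmatch(action, pattern):
            return verdict
    return "confirm"  # default: ask the human

print(decide("read:src/main.py"))      # auto
print(decide("bash:rm -rf /tmp"))      # block
print(decide("bash:git push origin"))  # confirm
```

The ordering matters: destructive patterns are matched before the permissive catch-alls, so a broad "auto" rule can never override a narrow "block" rule above it.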
As the market moves toward models capable of 6-hour independent work cycles, as predicted by industry leaders at OpenAI and Anthropic, the focus will likely shift from raw model capability to the sophistication of the interactive layer. The next phase of development will require interfaces that facilitate long-duration autonomy while allowing humans to intervene efficiently when the strategic direction drifts.