New data from the Model Evaluation and Threat Research (METR) lab indicates that AI agent capabilities are accelerating at an unprecedented rate, with performance doubling approximately every 1.5 months. This rapid scaling, exemplified by the record-breaking benchmarks of Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.3, has forced a re-evaluation of AI "scaling walls" and sparked viral discourse about the potential for a global economic crisis by 2028.
Key Points
- AI agent performance is now doubling every 1.5 months, a significant acceleration from the seven-month rate recorded in early 2024.
- Claude Opus 4.6 achieved a 14.5-hour "time horizon," tripling the capabilities of its predecessor and nearly saturating existing benchmarks.
- A new report from Catrini Research predicts a transition to a "capital-based society" that could lead to widespread labor displacement and market volatility.
- While some experts warn of a "global intelligence crisis," others argue the METR data is noisy and that market defensibility remains a barrier to total AI disruption.
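The two doubling rates above imply wildly different trajectories when compounded. A minimal sketch of the arithmetic (the function name and the one-year window are illustrative choices, not figures from the source):

```python
def projected_horizon(h0_hours: float, months: float, doubling_months: float) -> float:
    """Extrapolate a time horizon that doubles every `doubling_months` months."""
    return h0_hours * 2 ** (months / doubling_months)

# One year out from a 14.5-hour horizon, under each measured doubling rate:
fast = projected_horizon(14.5, 12, 1.5)  # 2^8 doublings -> ~3,712 hours
slow = projected_horizon(14.5, 12, 7.0)  # ~2^1.7 doublings -> ~48 hours
```

The gap between the two projections, roughly two orders of magnitude after a single year, is why the shift in doubling time dominates the discussion more than any one benchmark score.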
The New Frontier of Agentic Scaling
The METR lab’s "Moore’s Law for AI Agents" chart has become a central fixture in economic discourse, tracking the complexity of tasks AI agents can reliably solve. The metric, known as the "time horizon," measures the difficulty of a software engineering task based on how long it takes a human professional to complete it. According to the latest update, Claude Opus 4.6 has reached a 14.5-hour time horizon at a 50% success rate, the largest generational jump in the study’s history.
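To make the metric concrete: a 50% time horizon can be estimated by fitting a logistic curve of success probability against log task length, then solving for the length at which predicted success crosses 50%. The sketch below uses invented task outcomes and a plain gradient-ascent fit; it mirrors the shape of this kind of methodology, not the lab's actual code or data.

```python
import math

# Hypothetical task outcomes: (human completion time in minutes, agent success).
# These values are illustrative only, not real benchmark data.
results = [
    (5, 1), (10, 1), (20, 1), (40, 1), (80, 1), (120, 1),
    (240, 1), (240, 0), (480, 1), (480, 0), (870, 1), (870, 0),
    (1200, 0), (1800, 0), (2400, 0),
]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_time_horizon(data, lr=0.5, steps=20000):
    """Fit p(success) = sigmoid(a - b * (log2(t) - mean)) by gradient ascent
    on the log-likelihood, then solve p = 0.5 for the task length t."""
    xs = [math.log2(t) for t, _ in data]
    mean_x = sum(xs) / len(xs)  # centering speeds up convergence
    a = b = 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for (t, y), x in zip(data, xs):
            p = sigmoid(a - b * (x - mean_x))
            ga += y - p                     # gradient w.r.t. intercept a
            gb += (y - p) * -(x - mean_x)   # gradient w.r.t. slope b
        a += lr * ga / len(data)
        b += lr * gb / len(data)
    # p = 0.5 exactly when a - b * (x - mean_x) = 0, i.e. x = mean_x + a / b
    return 2 ** (mean_x + a / b)

horizon_minutes = fit_time_horizon(results)
print(f"Estimated 50% time horizon: {horizon_minutes / 60:.1f} hours")
```

With mixed successes and failures around the crossover, the fitted horizon lands between the longest cleanly-solved tasks and the shortest cleanly-failed ones, which is the quantity the chart tracks over successive model releases.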
This leap follows a period of skepticism where critics argued that AI development had hit a performance plateau. However, the data suggests that post-training improvements and new model architectures are pushing capabilities further than anticipated. Investor Nick Carter highlighted the significance of these findings, noting the speed of the shift.
"This is the most important chart in the world, and it's going absolutely ballistic."
Understanding the Benchmark
It is important to distinguish the "time horizon" from actual execution time. If an AI agent solves a problem in two minutes that would take a human two hours, the time horizon is recorded as two hours. The METR study focuses on the capability frontier—the limit of what these agents can achieve—rather than production-ready reliability. While a 50% success rate is insufficient for commercial deployment, the consistent upward trend serves as a leading indicator for future productivity gains.
Technical Caveats and Data Saturation
Despite the excitement, researchers at METR and within the broader AI community have issued warnings regarding the latest results. The 14.5-hour horizon for Opus 4.6 is so high that it has begun to saturate the existing task set. With few tasks in the benchmark exceeding the 14-hour mark, the confidence intervals for the data have widened significantly.
Researcher David Re cautioned against treating the current figures as absolute certainty, emphasizing the volatility of the measurements.
"Seems like a lot of people are taking this as gospel. When we say the measurement is extremely noisy, we really mean it. Concretely, if the task distribution we're using here was just a tiny bit different, we could have measured a time horizon of 8 hours or 20 hours."
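The sensitivity described in the quote is easy to reproduce: bootstrap-resampling even a small, invented task set swings the estimated horizon across a wide interval. The midpoint estimator below is a crude stand-in for a full logistic fit, and every number here is illustrative.

```python
import math
import random

random.seed(42)

# Hypothetical benchmark: (minutes, success). Note how few tasks exceed
# the ~14-hour mark -- the saturation problem described above.
tasks = [
    (5, 1), (15, 1), (30, 1), (60, 1), (120, 1), (240, 1), (240, 0),
    (480, 1), (480, 0), (870, 1), (870, 0), (1200, 0), (1500, 0),
]

def midpoint_horizon(sample):
    """Crude stand-in for a logistic fit: the geometric midpoint between
    the average log-length of solved and unsolved tasks."""
    solved = [math.log2(t) for t, ok in sample if ok]
    failed = [math.log2(t) for t, ok in sample if not ok]
    return 2 ** ((sum(solved) / len(solved) + sum(failed) / len(failed)) / 2)

estimates = []
for _ in range(2000):
    boot = [random.choice(tasks) for _ in tasks]
    # a resample can miss one class entirely; skip those draws
    if any(ok for _, ok in boot) and any(not ok for _, ok in boot):
        estimates.append(midpoint_horizon(boot))

estimates.sort()
low = estimates[len(estimates) // 20]       # ~5th percentile
high = estimates[-(len(estimates) // 20)]   # ~95th percentile
print(f"90% bootstrap interval: {low / 60:.1f}h to {high / 60:.1f}h")
```

Resampling the same small task pool produces interval endpoints that differ by a large multiple, which is exactly the "8 hours or 20 hours" spread the researchers warn about: with few long tasks, swapping a handful of them in or out moves the estimate dramatically.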
The 2028 Global Intelligence Crisis
The technical acceleration has fueled a parallel narrative about the socio-economic consequences of abundant intelligence. A viral research note from Catrini Research, titled The 2028 Global Intelligence Crisis, posits that AI will soon move from sector-specific utility to the broad application of cheap machine intelligence. The firm argues that this shift will transform economic activity from a household-based model to one dominated by capital owners, potentially leading to a collapse in traditional employment.
However, the "doomsday" thesis has met resistance from economists and market analysts. Dan Hockenmeier argues that building a functional app is only a small part of a successful business, noting that "liquid marketplaces" and operational reliability are more defensible than the underlying software. Similarly, economist Guy Burger questioned the internal consistency of the crisis narrative, asking why the wealth generated by AI owners would not eventually fuel new forms of GDP growth and employment.
As markets begin to reprice assets in response to these agentic breakthroughs, the focus shifts toward how traditional industries will integrate—or resist—these autonomous systems. The coming months will likely see a push for updated benchmarking methodologies as the industry outgrows current measurement standards, providing a clearer picture of whether this exponential growth can be sustained into the late 2020s.