The rapid rise of Large Language Models (LLMs) has sparked a fierce debate about the nature of intelligence. Are these systems on a direct, linear path to Artificial General Intelligence (AGI) simply by scaling up compute and data, or are they hitting a fundamental ceiling? Vishal Misra, a Professor of Computer Science at Columbia University, argues that while current models are extraordinary, they are operating within a specific mathematical framework—Bayesian inference—that limits their potential to reach human-level cognition.
Key Takeaways
- LLMs function as sophisticated Bayesian inference engines, updating their "beliefs" about next-token probabilities based on the context provided in a prompt.
- Scaling, while powerful, is not a panacea; current models are trapped in a cycle of correlation and cannot inherently perform the causal reasoning required for AGI.
- True AGI requires two breakthroughs: plasticity (the ability to learn continually without forgetting) and the transition from correlation to causation.
- Human intelligence relies on mental simulations to navigate the world—a process closer to Kolmogorov complexity—whereas current LLMs excel at Shannon entropy, which focuses on statistical patterns rather than underlying causal truth.
The Mechanics of LLMs: A Bayesian Perspective
To understand the limitations of LLMs, one must first understand what they are actually doing at the architectural level. Misra posits that an LLM can be viewed as a gargantuan matrix: every row represents a unique prompt, every column a token in the vocabulary, and the entries along a row form a probability distribution over the next token.
When you provide a prompt like "protein," the model draws on its training data to assign probabilities to the next possible words, such as "synthesis" or "shake." As you add more context, the model performs a Bayesian update, narrowing the distribution of possible outcomes. While critics initially pushed back against the idea that deep learning models are "Bayesian," Misra’s research—using "Bayesian wind tunnels"—demonstrated that these models perform Bayesian inference with incredible mathematical precision.
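The update Misra describes can be made concrete with a toy sketch. All numbers below are invented for illustration—a real LLM derives them implicitly from its training corpus—but the mechanics are exactly Bayes' rule: added context re-weights beliefs about what the prompt is "about," which in turn shifts the next-token distribution.

```python
# Toy Bayesian view of next-token prediction. All probabilities are
# invented for illustration; a real LLM learns them from its corpus.

# Prior belief about the latent "topic" behind the prompt "protein"
prior = {"biochemistry": 0.6, "fitness": 0.4}

# Likelihood of the added context "after my workout" under each topic
likelihood = {"biochemistry": 0.05, "fitness": 0.70}

# Bayesian update: P(topic | context) ∝ P(context | topic) * P(topic)
unnorm = {t: prior[t] * likelihood[t] for t in prior}
z = sum(unnorm.values())
posterior = {t: p / z for t, p in unnorm.items()}

# Next-token distributions conditioned on each topic
next_token = {
    "biochemistry": {"synthesis": 0.8, "shake": 0.2},
    "fitness":      {"synthesis": 0.1, "shake": 0.9},
}

# Marginal next-token distribution after the update
pred = {}
for topic, w in posterior.items():
    for tok, p in next_token[topic].items():
        pred[tok] = pred.get(tok, 0.0) + w * p

print(posterior)  # belief shifts sharply toward "fitness"
print(pred)       # ...and therefore toward the token "shake"
```

Adding two words of context flips the most likely continuation from "synthesis" to "shake"—the narrowing of the distribution that Misra's "wind tunnel" experiments measure at scale.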
The Statistical Ceiling: Shannon Entropy vs. Kolmogorov Complexity
The core of the disconnect between current LLM capabilities and AGI lies in the distinction between statistical correlation and causal reality. Misra highlights the contrast between Shannon entropy and Kolmogorov complexity to illustrate this.
Shannon entropy is concerned with the predictability of data—the ability to correlate inputs to likely outputs. This is where LLMs shine; they are arguably the best tools ever created for capturing statistical associations. However, AGI demands an understanding of the world's structure, which is more akin to Kolmogorov complexity: finding the shortest, most efficient program or rule that describes a phenomenon.
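The distinction can be seen in miniature. The sketch below compares two strings with identical character statistics—so identical Shannon entropy—but very different description lengths. Kolmogorov complexity is uncomputable, so compressed size via `zlib` is used here as a crude, standard proxy for it; the point is only that entropy is blind to structure that a short program can capture.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Per-character Shannon entropy (bits) of a string's empirical distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A highly structured string: generated by a tiny "program" ("ab" repeated)
structured = "ab" * 200

# The same characters shuffled: identical statistics, but no short rule
chars = list(structured)
random.seed(0)
random.shuffle(chars)
scrambled = "".join(chars)

h_structured = shannon_entropy(structured)
h_scrambled = shannon_entropy(scrambled)

# Compressed size as a rough stand-in for description length
c_structured = len(zlib.compress(structured.encode()))
c_scrambled = len(zlib.compress(scrambled.encode()))

print(h_structured, h_scrambled)  # identical: 1.0 bit per character
print(c_structured, c_scrambled)  # the repetitive string compresses far better
```

Both strings look the same to an entropy-style, frequency-counting observer; only a description-length view notices that one of them is generated by a two-character rule. That gap is the gap Misra points to between statistical prediction and finding the underlying program.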
"Deep learning is beautiful. It is extremely powerful. It does association. The second [rung] is intervention in the [causal] hierarchy. Deep learning models do not do that."
Einstein’s theory of relativity serves as the ultimate benchmark. Einstein didn’t just look at more data points; he identified a new representation of space-time that rendered Newtonian mechanics a special case of a larger truth. An LLM trained only on pre-1916 physics would struggle to reach this conclusion because it is tethered to the "data gravity" of existing, incorrect correlations.
Why Scaling Isn't Enough for AGI
There is a prevailing industry sentiment that simply adding more tokens and compute will eventually result in consciousness or true reasoning. Misra disputes this, pointing to the structural differences between silicon-based models and human cognition.
Humans possess plasticity; our brains evolve and retain learning throughout our lives. LLMs, by contrast, are frozen after training. Even when an LLM performs "in-context learning," it is merely using the current conversation as a temporary scratchpad. Once the chat is closed, the "knowledge" gained evaporates. To reach AGI, models need a mechanism for continual learning that avoids "catastrophic forgetting"—a significant open challenge in AI research.
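Catastrophic forgetting is easy to reproduce in miniature. The sketch below (a deliberately minimal toy, not a claim about transformer training) fits a one-parameter linear model on one task with gradient descent, then continues training only on a conflicting task. Because the second phase never revisits the first task's data, the update rule simply overwrites what was learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr=0.1, steps=500):
    """Plain gradient descent on mean squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Two conflicting tasks sharing the same parameter:
# task A wants y = +2x, task B wants y = -3x.
X = rng.normal(size=(100, 1))
y_a = 2 * X[:, 0]
y_b = -3 * X[:, 0]

w = np.zeros(1)
w = train(w, X, y_a)            # phase 1: learn task A
err_a_before = mse(w, X, y_a)   # essentially zero

w = train(w, X, y_b)            # phase 2: train ONLY on task B
err_a_after = mse(w, X, y_a)    # task A is now "forgotten"

print(err_a_before, err_a_after)
```

With shared parameters and no replay of old data, performance on the earlier task collapses—the toy version of the problem that continual-learning research (replay buffers, regularization schemes, and the like) tries to solve.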
Moving Toward Causality
If scaling is not the final answer, where should research focus? Misra suggests that the path to AGI lies in shifting from association to causation. This involves moving toward architectures capable of intervention and counterfactual simulation.
"To get to what is called AGI, I think there are two things that need to happen. One is this plasticity... Secondly, we have to move from correlation to causation."
This shift requires adopting frameworks like Judea Pearl’s causal hierarchy, which moves beyond simple prediction into the realm of "what if" scenarios. By enabling models to build internal causal models of the world, rather than just optimizing for the next token, researchers may finally break the current ceiling of intelligence.
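The gap between association (rung one of Pearl's hierarchy) and intervention (rung two) can be demonstrated with a small simulation. In the toy structural causal model below—numbers invented for illustration—a hidden confounder Z drives both X and Y, while X has no causal effect on Y at all. Observing X=1 still raises the probability of Y, but intervening to set X=1 does not.

```python
import random

random.seed(0)
N = 100_000

def estimate_p_y1_given_x1(do_x=None):
    """Estimate P(Y=1 | X=1) in a toy SCM: Z -> X, Z -> Y, no edge X -> Y.
    If do_x is given, X is set by intervention instead of its mechanism."""
    ys_when_x1 = []
    for _ in range(N):
        z = random.random() < 0.5  # hidden confounder
        if do_x is not None:
            x = do_x               # do(X = do_x): sever the Z -> X link
        else:
            x = random.random() < (0.9 if z else 0.1)
        y = random.random() < (0.8 if z else 0.2)  # Y depends only on Z
        if x:
            ys_when_x1.append(y)
    return sum(ys_when_x1) / len(ys_when_x1)

p_y_given_x1 = estimate_p_y1_given_x1()          # observational (rung 1)
p_y_do_x1 = estimate_p_y1_given_x1(do_x=True)    # interventional (rung 2)

print(p_y_given_x1)  # high: seeing X=1 is evidence that Z=1
print(p_y_do_x1)     # ~0.5: forcing X=1 does nothing to Y
```

A purely associative learner, trained only on observational samples, would conclude that X predicts Y and stop there; answering the "what if we set X?" question requires a causal model of the data-generating process—precisely the capability Misra argues current architectures lack.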
Ultimately, while current LLMs represent a triumph of engineering and statistical modeling, they are not yet thinking machines. They are highly optimized engines etched into silicon, performing matrix multiplications with remarkable elegance. Recognizing the difference between statistical mastery and genuine causal reasoning is the first step toward building systems that don't just predict the world, but truly understand it.