The rapid rise of Large Language Models (LLMs) has sparked a fierce debate about the nature of intelligence. Are these systems on a direct, linear path to Artificial General Intelligence (AGI) simply by scaling up compute and data, or are they hitting a fundamental ceiling? Vishal Misra, a Professor of Computer Science at Columbia University, argues that while current models are extraordinary, they are operating within a specific mathematical framework—Bayesian inference—that limits their potential to reach human-level cognition.
Key Takeaways
- LLMs function as sophisticated Bayesian inference engines, updating their "beliefs" about next-token probabilities based on the context provided in a prompt.
- Scaling, while powerful, is not a panacea; current models are trapped in a cycle of correlation and cannot inherently perform the causal reasoning required for AGI.
- True AGI requires two breakthroughs: plasticity (the ability to learn continually without forgetting) and the transition from correlation to causation.
- Human intelligence relies on mental simulations to navigate the world—a process closer to Kolmogorov complexity—whereas current LLMs excel at Shannon entropy, which focuses on statistical patterns rather than underlying causal truth.
The Mechanics of LLMs: A Bayesian Perspective
To understand the limitations of LLMs, one must first understand what they are actually doing at the architectural level. Misra posits that an LLM can be viewed as a gargantuan matrix: every row represents a unique prompt, every column a token in the vocabulary, and the entries along a row form a probability distribution over the next token.
When you provide a prompt like "protein," the model draws on its training data to assign probabilities to the next possible words, such as "synthesis" or "shake." As you add more context, the model performs a Bayesian update, narrowing the distribution of possible outcomes. While critics initially pushed back against the idea that deep learning models are "Bayesian," Misra’s research—using "Bayesian wind tunnels"—demonstrated that these models perform Bayesian inference with incredible mathematical precision.
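The update Misra describes can be made concrete with a toy sketch. All numbers below are invented for illustration—a real LLM derives them implicitly from its training corpus—but the mechanics are exactly Bayes' rule: added context re-weights beliefs about what the prompt is "about," which in turn shifts the next-token distribution.

```python
# Toy Bayesian view of next-token prediction. All probabilities are
# invented for illustration; a real LLM learns them from its corpus.

# Prior belief about the latent "topic" behind the prompt "protein"
prior = {"biochemistry": 0.6, "fitness": 0.4}

# Likelihood of the added context "after my workout" under each topic
likelihood = {"biochemistry": 0.05, "fitness": 0.70}

# Bayesian update: P(topic | context) ∝ P(context | topic) * P(topic)
unnorm = {t: prior[t] * likelihood[t] for t in prior}
z = sum(unnorm.values())
posterior = {t: p / z for t, p in unnorm.items()}

# Next-token distributions conditioned on each topic
next_token = {
    "biochemistry": {"synthesis": 0.8, "shake": 0.2},
    "fitness":      {"synthesis": 0.1, "shake": 0.9},
}

# Marginal next-token distribution after the update
pred = {}
for topic, w in posterior.items():
    for tok, p in next_token[topic].items():
        pred[tok] = pred.get(tok, 0.0) + w * p

print(posterior)  # belief shifts sharply toward "fitness"
print(pred)       # ...and therefore toward the token "shake"
```

Adding two words of context flips the most likely continuation from "synthesis" to "shake"—the narrowing of the distribution that Misra's "wind tunnel" experiments measure at scale.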
The Statistical Ceiling: Shannon Entropy vs. Kolmogorov Complexity
The core of the disconnect between current LLM capabilities and AGI lies in the distinction between statistical correlation and causal reality. Misra highlights the contrast between Shannon entropy and Kolmogorov complexity to illustrate this.
Shannon entropy is concerned with the predictability of data—the ability to correlate inputs to likely outputs. This is where LLMs shine; they are arguably the best tools ever created for capturing statistical associations. However, AGI demands an understanding of the world's structure, which is more akin to Kolmogorov complexity: finding the shortest, most efficient program or rule that describes a phenomenon.
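The distinction can be seen in miniature. The sketch below compares two strings with identical character statistics—so identical Shannon entropy—but very different description lengths. Kolmogorov complexity is uncomputable, so compressed size via `zlib` is used here as a crude, standard proxy for it; the point is only that entropy is blind to structure that a short program can capture.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Per-character Shannon entropy (bits) of a string's empirical distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A highly structured string: generated by a tiny "program" ("ab" repeated)
structured = "ab" * 200

# The same characters shuffled: identical statistics, but no short rule
chars = list(structured)
random.seed(0)
random.shuffle(chars)
scrambled = "".join(chars)

h_structured = shannon_entropy(structured)
h_scrambled = shannon_entropy(scrambled)

# Compressed size as a rough stand-in for description length
c_structured = len(zlib.compress(structured.encode()))
c_scrambled = len(zlib.compress(scrambled.encode()))

print(h_structured, h_scrambled)  # identical: 1.0 bit per character
print(c_structured, c_scrambled)  # the repetitive string compresses far better
```

Both strings look the same to an entropy-style, frequency-counting observer; only a description-length view notices that one of them is generated by a two-character rule. That gap is the gap Misra points to between statistical prediction and finding the underlying program.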
"Deep learning is beautiful. It is extremely powerful. It does association. The second [rung] is intervention in the [causal] hierarchy. Deep learning models do not do that."
Einstein’s theory of relativity serves as the ultimate benchmark. Einstein didn’t just look at more data points; he identified a new representation of space-time that rendered Newtonian mechanics a special case of a larger truth. An LLM trained only on pre-1916 physics would struggle to reach this conclusion because it is tethered to the "data gravity" of existing, incorrect correlations.
Why Scaling Isn't Enough for AGI
There is a prevailing industry sentiment that simply adding more tokens and compute will eventually result in consciousness or true reasoning. Misra disputes this, pointing to the structural differences between silicon-based models and human cognition.
Humans possess plasticity; our brains evolve and retain learning throughout our lives. LLMs, by contrast, are frozen after training. Even when an LLM performs "in-context learning," it is merely using the current conversation as a temporary scratchpad. Once the chat is closed, the "knowledge" gained evaporates. To reach AGI, models need a mechanism for continual learning that avoids "catastrophic forgetting"—a significant open challenge in AI research.
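Catastrophic forgetting is easy to reproduce in miniature. The sketch below (a deliberately minimal toy, not a claim about transformer training) fits a one-parameter linear model on one task with gradient descent, then continues training only on a conflicting task. Because the second phase never revisits the first task's data, the update rule simply overwrites what was learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr=0.1, steps=500):
    """Plain gradient descent on mean squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Two conflicting tasks sharing the same parameter:
# task A wants y = +2x, task B wants y = -3x.
X = rng.normal(size=(100, 1))
y_a = 2 * X[:, 0]
y_b = -3 * X[:, 0]

w = np.zeros(1)
w = train(w, X, y_a)            # phase 1: learn task A
err_a_before = mse(w, X, y_a)   # essentially zero

w = train(w, X, y_b)            # phase 2: train ONLY on task B
err_a_after = mse(w, X, y_a)    # task A is now "forgotten"

print(err_a_before, err_a_after)
```

With shared parameters and no replay of old data, performance on the earlier task collapses—the toy version of the problem that continual-learning research (replay buffers, regularization schemes, and the like) tries to solve.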
Moving Toward Causality
If scaling is not the final answer, where should research focus? Misra suggests that the path to AGI lies in shifting from association to causation. This involves moving toward architectures capable of intervention and counterfactual simulation.
"To get to what is called AGI, I think there are two things that need to happen. One is this plasticity... Secondly, we have to move from correlation to causation."
This shift requires adopting frameworks like Judea Pearl’s causal hierarchy, which moves beyond simple prediction into the realm of "what if" scenarios. By enabling models to build internal causal models of the world, rather than just optimizing for the next token, researchers may finally break the current ceiling of intelligence.
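The gap between association (rung one of Pearl's hierarchy) and intervention (rung two) can be demonstrated with a small simulation. In the toy structural causal model below—numbers invented for illustration—a hidden confounder Z drives both X and Y, while X has no causal effect on Y at all. Observing X=1 still raises the probability of Y, but intervening to set X=1 does not.

```python
import random

random.seed(0)
N = 100_000

def estimate_p_y1_given_x1(do_x=None):
    """Estimate P(Y=1 | X=1) in a toy SCM: Z -> X, Z -> Y, no edge X -> Y.
    If do_x is given, X is set by intervention instead of its mechanism."""
    ys_when_x1 = []
    for _ in range(N):
        z = random.random() < 0.5  # hidden confounder
        if do_x is not None:
            x = do_x               # do(X = do_x): sever the Z -> X link
        else:
            x = random.random() < (0.9 if z else 0.1)
        y = random.random() < (0.8 if z else 0.2)  # Y depends only on Z
        if x:
            ys_when_x1.append(y)
    return sum(ys_when_x1) / len(ys_when_x1)

p_y_given_x1 = estimate_p_y1_given_x1()          # observational (rung 1)
p_y_do_x1 = estimate_p_y1_given_x1(do_x=True)    # interventional (rung 2)

print(p_y_given_x1)  # high: seeing X=1 is evidence that Z=1
print(p_y_do_x1)     # ~0.5: forcing X=1 does nothing to Y
```

A purely associative learner, trained only on observational samples, would conclude that X predicts Y and stop there; answering the "what if we set X?" question requires a causal model of the data-generating process—precisely the capability Misra argues current architectures lack.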
Ultimately, while current LLMs represent a triumph of engineering and statistical modeling, they are not yet thinking machines. They are highly optimized engines etched into silicon, performing matrix multiplications with remarkable elegance. Recognizing the difference between statistical mastery and genuine causal reasoning is the first step toward building systems that don't just predict the world, but truly understand it.