State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI | Lex Fridman Podcast #490

Sebastian Raschka and Nathan Lambert join Lex Fridman to dissect the "DeepSeek moment" and the shift from brute-force scaling to reasoning efficiency. Inside the technical realities of RLVR, open weights, and the evolving path toward AGI in 2026.

The artificial intelligence landscape has shifted dramatically in the last year, moving from a brute-force race for ever-larger parameter counts to a nuanced battle over reasoning capabilities, inference efficiency, and open-weight accessibility. The "DeepSeek moment" of early 2025, when a Chinese lab released state-of-the-art models at a fraction of the expected training cost, served as a wake-up call, accelerating competition across Silicon Valley and beyond.

In a wide-ranging conversation, machine learning researchers Sebastian Raschka and Nathan Lambert joined Lex Fridman to dissect the technical realities behind the hype. From the mechanics of Reinforcement Learning with Verifiable Rewards (RLVR) to the shifting definitions of AGI, the consensus is clear: while the fundamental Transformer architecture remains dominant, the methods we use to train and interact with these models are undergoing a radical transformation.

Key Takeaways

  • The "DeepSeek Moment" redefined open models: High-performance Chinese models like DeepSeek-R1 have challenged Western dominance by offering state-of-the-art capabilities as open weights, forcing a shift in how US labs approach proprietary moats.
  • Scaling laws have evolved, not stalled: While pre-training scaling is becoming prohibitively expensive, the industry has pivoted to "inference-time scaling"—allowing models to "think" longer to improve accuracy—and post-training optimization.
  • RLVR is the new RLHF: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a critical technique for math and coding, allowing models to self-correct and learn through "aha moments" rather than just human preference data.
  • Coding is becoming architectural: As AI tools like Claude Code and Cursor advance, the role of a developer is shifting from writing syntax to system design and managing agentic workflows.
  • Transformers are still king: Despite experiments with text diffusion and state-space models (like Mamba), the autoregressive Transformer remains the undisputed standard for state-of-the-art performance.

The Geopolitical Landscape: Open Weights vs. Closed Gardens

The narrative of 2025 and 2026 is defined by a tension between open-weight models, largely driven by Chinese labs, and the proprietary API-based services of US giants. The release of DeepSeek-R1 was a watershed moment, proving that efficient training techniques could produce frontier-level intelligence without the massive compute budgets previously thought necessary.

However, declaring a "winner" is complex. While Chinese labs like DeepSeek, Qwen, and MiniMax are winning the hearts of developers through open access, US labs like OpenAI, Anthropic, and Google retain structural advantages in infrastructure and productization.

To demarcate the point in time when we're recording this, the hype over Anthropic's Claude Opus 4.5 model has been absolutely insane. It's almost gotten to the point where it feels like a bit of a meme in terms of the hype.

The ecosystem is currently bifurcated by business models. US companies have successfully convinced enterprise and consumer markets to pay for intelligence via subscriptions (ChatGPT, Gemini). In contrast, Chinese firms, facing export controls and a domestic market less accustomed to paying for software, have leveraged open weights to build global influence and standard-setting power. This has created a vibrant "open" ecosystem where developers can run near-frontier models locally, a capability that ensures knowledge and technology remain fluid rather than locked within a single corporate vault.

The Evolution of Scaling Laws: From Pre-Training to Inference

A persistent debate in the AI community is whether scaling laws—the observation that more data and compute yield smarter models—are hitting a plateau. The consensus among researchers is that scaling is far from dead; it has simply changed venues. The "low-hanging fruit" of pre-training scaling has largely been harvested, leading to diminishing returns on raw parameter expansion. The new frontier lies in post-training and inference scaling.

The Rise of Reasoning Models

The introduction of "reasoning" models, exemplified by OpenAI’s o1 and DeepSeek-R1, marks a shift from fast, intuitive responses (System 1) to slower, deliberate problem-solving (System 2). This is achieved through inference-time scaling, where a model generates "hidden thoughts" or chain-of-thought tokens before producing a final answer. By allocating compute to "thinking time," models can verify their own logic, backtrack from errors, and produce significantly more accurate results in domains like mathematics and coding.
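
As a concrete illustration, here is a minimal sketch of one popular inference-time scaling strategy, self-consistency: sample several independent chains of thought and majority-vote over their final answers. The `generate` function and the "Answer:" convention are illustrative assumptions, not any particular lab's API.

```python
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for an LLM sampling call that returns
    a chain of thought ending in a line like 'Answer: 42'."""
    raise NotImplementedError("plug in a real model API here")

def extract_answer(completion: str) -> str:
    # Keep whatever follows the last 'Answer:' marker.
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    """Trade inference compute for accuracy: sample n_samples chains
    of thought at nonzero temperature, then majority-vote the answers."""
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```

Spending 16 samples instead of one is exactly the "thinking longer" trade: more tokens at inference time, with no change to the model's weights.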

RLVR vs. RLHF

This shift is powered by a change in training methodology. Historically, models were fine-tuned using Reinforcement Learning from Human Feedback (RLHF), where humans ranked model responses based on preference. While effective for style and tone, RLHF has limitations in objective accuracy.

The industry is now embracing Reinforcement Learning with Verifiable Rewards (RLVR). In domains where a correct answer can be programmatically verified (such as code compiling or a math problem resulting in the correct number), models can be trained to optimize for the correct outcome regardless of the path taken. This allows the model to "learn" reasoning strategies that humans might not even explicitly demonstrate.
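
To make "verifiable" concrete, here is a minimal sketch of the two reward types mentioned above, assuming tasks where ground truth is available. The answer-parsing convention is an assumption, and real pipelines run generated code in a sandbox rather than a bare `exec`.

```python
import re

def math_reward(completion: str, ground_truth: str) -> float:
    """Reward is computed by a program, not a preference model:
    1.0 if the model's final answer matches the known-correct one."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    if not matches:
        return 0.0  # no parseable final answer -> no reward
    return 1.0 if matches[-1].strip() == ground_truth.strip() else 0.0

def code_reward(solution_code: str, test_code: str) -> float:
    """Reward a generated program only if the unit tests pass.
    (Illustration only: production systems sandbox this step.)"""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # define the model's functions
        exec(test_code, namespace)      # run assertion-based tests
        return 1.0
    except Exception:
        return 0.0
```

Because the reward cares only about the outcome, the model is free to discover intermediate reasoning steps that no human annotator ever wrote down.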

To do the best RLHF you might not need the extra 10 or 100x of compute, but to do the best RLVR you do... The best runs you can let run for an extra 10x and you get a few x performance.

Architecture: The Endurance of the Transformer

Despite the frenetic pace of AI development, the underlying architecture has remained surprisingly stable. We are essentially still refining the architecture introduced in the "Attention Is All You Need" paper and popularized by GPT-2. The advancements we see today, such as Mixture of Experts (MoE), Grouped Query Attention, and Multi-Head Latent Attention, are optimizations rather than fundamental rewrites.

While researchers are exploring alternatives, such as Text Diffusion models (which generate text in parallel rather than sequentially) and linear attention mechanisms, none have yet dethroned the autoregressive Transformer for general-purpose intelligence. The dominance of the Transformer suggests that the current path to AGI may rely less on discovering a new "brain" architecture and more on optimizing how data flows through the systems we already have.
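
As one concrete example of these optimizations, here is a shape-level PyTorch sketch of Grouped Query Attention, in which several query heads share each key/value head to shrink the KV cache. The head counts are illustrative, and the causal mask is omitted for brevity.

```python
import torch

def grouped_query_attention(x, w_q, w_k, w_v, n_q_heads=8, n_kv_heads=2):
    """Grouped Query Attention: n_q_heads query heads share n_kv_heads
    key/value heads. Vanilla multi-head attention is the special case
    n_q_heads == n_kv_heads; fewer KV heads means a smaller KV cache.
    Expected weight shapes: w_q is (d, d); w_k and w_v are
    (d, n_kv_heads * head_dim), where head_dim = d // n_q_heads."""
    B, T, d = x.shape
    hd = d // n_q_heads  # per-head dimension
    q = (x @ w_q).view(B, T, n_q_heads, hd).transpose(1, 2)
    k = (x @ w_k).view(B, T, n_kv_heads, hd).transpose(1, 2)
    v = (x @ w_v).view(B, T, n_kv_heads, hd).transpose(1, 2)
    # Repeat each KV head so it serves a whole group of query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / hd**0.5  # (causal mask omitted)
    out = scores.softmax(dim=-1) @ v              # (B, heads, T, hd)
    return out.transpose(1, 2).reshape(B, T, d)
```

The attention math is unchanged; only the bookkeeping around heads differs, which is why changes like this count as refinements rather than new architectures.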

The Data Quality Imperative

With architecture stabilizing, the differentiator becomes data. The era of scraping the raw web is evolving into an era of synthetic data generation and heavy curation. Models are increasingly trained on data generated by other models (and verified by humans), or on high-quality proprietary datasets. This "mid-training" phase, where models are specialized on complex reasoning traces or scientific papers before fine-tuning, is where much of the "intelligence" is currently being baked in.
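
A sketch of that curation loop, reusing the hypothetical `generate` and `math_reward` helpers from the earlier sketches: this is rejection sampling, where only traces that a verifier accepts make it into the training set.

```python
def curate_reasoning_data(problems, n_attempts=4):
    """Rejection-sampling curation: for each problem, sample up to
    n_attempts reasoning traces and keep the first verified-correct
    one as a (prompt, completion) training example."""
    dataset = []
    for prompt, ground_truth in problems:
        for _ in range(n_attempts):
            trace = generate(prompt)  # hypothetical LLM call (see above)
            if math_reward(trace, ground_truth) == 1.0:
                dataset.append({"prompt": prompt, "completion": trace})
                break  # one verified trace per problem suffices here
    return dataset
```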

Education and the "Build from Scratch" Philosophy

As AI begins to automate significant portions of coding—with senior developers shipping substantial amounts of AI-generated code—the question arises: how do we learn? If an AI can write a Python script or debug a React app, does a human need to understand the underlying syntax?

Sebastian Raschka argues that building models from scratch remains the single most effective way to demystify AI. While high-level libraries like Hugging Face Transformers are excellent for production, they bury the core mechanics under layers of abstraction that make them poor material for learning. By coding a small GPT-style model from the ground up, implementing attention mechanisms and backpropagation manually, learners gain an intuition that cannot be prompted.

I truly believe in the machine learning/computer science world, the best way to learn and understand something is to build it yourself from scratch... The code doesn't lie.
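
In that spirit, here is a from-scratch sketch (micrograd-style, not code from any particular book) of the scalar autograd machinery that frameworks hide behind `loss.backward()`. Seeing the chain rule applied by hand is the kind of intuition Raschka is pointing at.

```python
import math

class Value:
    """Minimal scalar autograd node: enough to see what a framework's
    backward() call actually does under the hood."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._local_grads = local_grads  # d(self)/d(parent)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1 - t * t,))

    def backward(self):
        # Topologically order the graph, then apply the chain rule
        # from the output back toward the leaves.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, g in zip(v._parents, v._local_grads):
                p.grad += g * v.grad

# A one-neuron forward/backward pass: y = tanh(w*x + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.1)
y = (w * x + b).tanh()
y.backward()
print(w.grad, x.grad)  # gradients computed by hand-rolled backprop
```

PyTorch's autograd does the same thing over tensors; once you have written this loop yourself, the framework stops being magic.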

There is a "Goldilocks zone" for using AI in education. Using it to bypass struggle entirely leads to a hollow understanding. However, using it as a sophisticated tutor—one that can unblock you after you've wrestled with a problem—accelerates mastery. The danger lies in losing the "joy of debugging," that specific dopamine hit that comes from solving a hard problem, which is essential for deep learning.

Conclusion: The Jagged Path to the Future

We are not heading toward a uniform Artificial General Intelligence (AGI) that instantly solves all problems. Instead, we are seeing a "jagged" capability curve. Models are becoming superhuman coders and math solvers while potentially remaining mediocre at tasks requiring distributed system design or nuanced social judgment.

The state of AI in 2026 is defined by the industrialization of reasoning. The transition from "chatbot" to "agent"—software that can actively use tools, execute code, and browse the web—is underway. For developers and researchers, the agency lies not in passively consuming these tools, but in understanding their mechanics. Whether through building small models from scratch or contributing to the evaluation of frontier models, the most valuable skill in this era is the ability to look past the hype and understand the machinery driving the intelligence.
