
Millions of books died so Claude could live | The Vergecast

Did Anthropic shred millions of books to train Claude? Court docs expose "Project Panama," an industrial effort to scan physical texts for data. The Vergecast explores this legal twist, plus Netflix's potential theatrical shift and the state of smart home tech.


New court documents have shed light on "Project Panama," an internal initiative by AI startup Anthropic to destructively scan physical books on a massive scale for model training, distinguishing its data acquisition strategy from competitors who relied largely on digital piracy. The revelation, detailed during a recent discussion on The Vergecast, underscores the complex legal battles facing the artificial intelligence sector, even as the media industry grapples with Netflix’s potential pivot toward theatrical releases and the smart home market faces ongoing interoperability hurdles.

Key Points

  • Project Panama: Anthropic utilized industrial "hydraulic powered cutting machines" to slice spines off physical books for rapid digitization, aiming to ingest high-quality data that wasn't available on the open internet.
  • Legal Distinctions: Recent court rulings suggest that while training AI on copyrighted books may constitute "fair use," the method of acquiring that data (e.g., piracy vs. physical purchase) remains a major legal vulnerability for tech companies.
  • Netflix's Theatrical Pivot: Amid rumors that Netflix may acquire Warner Bros. Discovery, experts suggest the company may need to embrace theatrical releases to combat slowing engagement growth and "eventize" its content.
  • Smart Home Fragmentation: Despite the promise of the Matter standard, Google Home’s lack of support for basic button controls is hampering the rollout of affordable devices from major retailers like IKEA.

Inside Project Panama: The Race for Data

As the race to build general superintelligence accelerates, AI companies have exhausted easily accessible internet data and are turning to more traditional—and legally fraught—sources. According to reporting by the Washington Post discussed on the program, Anthropic launched "Project Panama" in late 2023. The project’s goal was to close the gap with rivals like OpenAI by ingesting the contents of physical books, which are viewed as higher-quality data sources than general web scrapes.

While competitors reportedly relied on "shadow libraries" (vast pirated repositories of digital books such as LibGen), Anthropic attempted a hybrid approach. After initially using pirated datasets, the company hired a former Google Books executive and turned to massive used-book warehouses to acquire physical copies legally. The books were then run through industrial cutters to remove their bindings before being fed into high-speed scanners.

"The goal was to quote 'destructively scan all the books in the world.' It sounds like something a Bond villain would set out to do... It’s more efficient to slice the spines off and scan them. So you've just got a stack of pages."

The distinction between the data source and the training process has become central to copyright litigation. Judges in recent cases against Anthropic and Meta have signaled that the act of training a model on books is likely "transformative" and therefore fair use. However, the legal liability has shifted to the acquisition method. Companies that relied on pirated shadow libraries face significant exposure over how they obtained the files, whereas Anthropic’s purchase of physical books was an attempt to mitigate this specific risk.

Netflix and the Future of the Box Office

Beyond the tech sector, the entertainment industry is facing a potential consolidation wave, with speculation mounting regarding Netflix buying Warner Bros. Discovery. This potential merger forces a re-evaluation of Netflix's long-standing refusal to prioritize theatrical releases.

Historically, Netflix avoided theaters to retain exclusivity for its streaming platform. However, data indicates that films with a theatrical window perform significantly better on streaming services due to increased marketing awareness and perceived quality. With Netflix’s engagement growth slowing—showing only a 2% year-over-year increase recently—the company may need to leverage theaters to build "fandom" and sustain subscriber interest.

"Netflix... engagement is slowing across the board on their platform... If you look at what people are kind of spending the most time watching, a lot of it is film and a lot of it is licensed film. And so it's those films that have theatrical releases."

Industry analysts argue that for theaters to survive the decline of the mid-budget "40 million dollar movie," exhibitors must pivot toward becoming communal hubs. This could involve "rowdy screenings," sing-alongs, or nostalgic re-releases that prioritize social experiences over passive viewing, effectively turning cinemas into venues akin to sports bars for film fans.

Smart Home Stagnation

In consumer technology, the promise of the "Matter" interoperability standard continues to clash with reality. IKEA has recently launched a line of affordable smart home controllers, but users are reporting significant functionality issues, particularly with Google Home.

Despite the Matter standard being designed to unify ecosystems, Google Home currently does not support the "generic switch" device type, rendering IKEA’s new smart buttons useless on the platform. Furthermore, the complexity of managing Thread networks—specifically the inability to easily merge networks from Amazon and other providers—has left consumers struggling with basic connectivity.

"In trying to make it simple, it's become almost impossible... This is the first mass market manufacturer to really go all in on a matter over thread product. So I think this is going to start kickstart a lot of solutions to some of these problems."

While the hardware is becoming more accessible and affordable, the software infrastructure provided by major tech giants remains a bottleneck for mass adoption.

What's Next

As legal battles over AI training data proceed toward trials and potential settlements, the industry expects clearer guidelines on the distinction between data use and data acquisition. Simultaneously, the entertainment sector awaits regulatory filings that could confirm whether Netflix will acquire a legacy studio, a move that would fundamentally alter the economics of movie theaters. In the smart home space, pressure is mounting on Google and Amazon to resolve Thread and Matter implementation issues before the next wave of mass-market devices launches in April.
