AWS S3 is arguably the backbone of the modern internet. It is the world’s largest cloud storage service, yet the engineering reality behind it is often shrouded in mystery. How do you manage hundreds of exabytes of data while guaranteeing reliability at a scale where hardware failure is not an anomaly, but a constant state of existence?
In a recent discussion with Milan, the VP of Data and Analytics at AWS who has spent over a decade running S3, we got a rare look at the internal machinery of this massive system. From a fleet of hard drives that, stacked one on top of another, would reach the International Space Station, to formal mathematical proofs that keep the code correct, the engineering behind S3 offers profound lessons in distributed systems, consistency models, and the evolution of cloud infrastructure.
Key Takeaways
- Unfathomable Scale: S3 holds over 500 trillion objects and processes over a quadrillion requests annually, managing tens of millions of hard drives across 120 availability zones.
- The Shift to Strong Consistency: AWS successfully migrated S3 from eventual to strong consistency without increasing cost or latency, utilizing a new replicated journal data structure and cache coherency protocols.
- Formal Methods for Correctness: At S3’s scale, standard testing is insufficient. The team uses automated reasoning—mathematical proofs incorporated into the deployment pipeline—to guarantee system behavior.
- Handling Correlated Failures: Reliability is achieved by designing for "failure allowances" and mitigating correlated failures where entire racks or zones might go down simultaneously.
- Evolution of Primitives: S3 is evolving beyond simple object storage into structured data handling with S3 Tables (for Apache Iceberg) and S3 Vectors for AI workloads.
The Physics of Storage at Hyper-Scale
To understand the engineering challenges of S3, one must first grasp the sheer physicality of the operation. It is easy to think of the cloud as ephemeral, but fundamentally, it is comprised of disks spinning in servers, racked in rows, housed in massive physical buildings.
The numbers are staggering. S3 currently stores hundreds of exabytes of data. For context, a single exabyte is 1,000 petabytes. The system handles hundreds of millions of transactions per second worldwide.
If you imagine stacking all of our drives one on top of another, it would go all the way to the International Space Station and just about back.
This physical footprint spans tens of millions of hard drives across 38 regions. When operating at this volume, the "edge cases" of hardware failure become daily routine. The system is designed not to prevent failure, but to swallow it whole without the customer ever noticing a blip in durability.
From Eventual to Strong Consistency
When S3 launched in 2006, it was built primarily for e-commerce assets like product images and backups. For those use cases, eventual consistency was an acceptable trade-off for high availability: if an application wrote an object, it was fine for that object not to appear in a list request immediately, as long as subsequent reads eventually reflected the write.
However, as customers like Netflix and Pinterest began building massive data lakes on S3, the requirements changed. Analytics workloads and big data applications require strong consistency—where a read immediately following a write is guaranteed to return the new data.
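To make the guarantee concrete, here is a minimal sketch of what read-after-write consistency looks like from the client's side, using boto3; the bucket name and keys are placeholders. With strong consistency, both the GET and the LIST reflect the PUT immediately, with no retry loop.

```python
# Minimal sketch of read-after-write consistency from a client's perspective.
# Bucket name and keys are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-analytics-bucket"

# Write a new object...
s3.put_object(
    Bucket=BUCKET,
    Key="events/2024/01/part-0001.json",
    Body=b'{"ok": true}',
)

# ...and with strong consistency, both of these reflect the write immediately.
obj = s3.get_object(Bucket=BUCKET, Key="events/2024/01/part-0001.json")
assert obj["Body"].read() == b'{"ok": true}'

listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="events/2024/01/")
assert any(
    o["Key"] == "events/2024/01/part-0001.json"
    for o in listing.get("Contents", [])
)
```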
The Engineering Breakthrough
Transitioning a system of S3’s size from eventual to strong consistency is one of the decade's most significant, yet quietest, engineering feats. Typically, moving to strong consistency incurs a penalty in either latency or cost. AWS managed to do it with neither.
To achieve this, the engineering team invented a new distributed data structure: a replicated journal. When a write enters the system, it flows sequentially through nodes chained together. Each storage node learns the sequence number of the value along with the value itself. This was combined with a novel cache coherency protocol that included a "failure allowance," ensuring that even if some servers failed during the request, the system could maintain consistency without sacrificing availability.
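The exact design is internal to AWS, but a toy model helps illustrate the shape of the idea: writes carry monotonically increasing sequence numbers through a chain of replicas, and reads are served only by a replica that has caught up, with a small failure allowance so the chain tolerates down nodes. This sketch is purely illustrative, not S3's actual implementation.

```python
# Toy model of a replicated journal: writes flow through a chain of nodes, each
# node records (sequence number, value), and a read is served only from a
# replica that has caught up to the acknowledged sequence number.
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    name: str
    healthy: bool = True
    applied_seq: int = 0
    store: dict = field(default_factory=dict)  # key -> (seq, value)

    def apply(self, seq: int, key: str, value: bytes) -> None:
        if not self.healthy:
            raise RuntimeError(f"node {self.name} is down")
        self.store[key] = (seq, value)
        self.applied_seq = seq


class ReplicatedJournal:
    def __init__(self, nodes: list, failure_allowance: int = 1):
        self.nodes = nodes
        self.failure_allowance = failure_allowance  # replicas allowed to be down
        self.next_seq = 0

    def write(self, key: str, value: bytes) -> int:
        """Push the write down the chain; acknowledge once enough replicas apply it."""
        self.next_seq += 1
        applied = 0
        for node in self.nodes:  # head -> tail
            try:
                node.apply(self.next_seq, key, value)
                applied += 1
            except RuntimeError:
                pass  # a failed node is tolerated within the allowance
        if applied < len(self.nodes) - self.failure_allowance:
            raise RuntimeError("not enough replicas acknowledged the write")
        return self.next_seq

    def read(self, key: str, min_seq: int) -> bytes:
        """Serve the read only from a replica that has seen at least min_seq."""
        for node in self.nodes:
            if node.healthy and node.applied_seq >= min_seq and key in node.store:
                return node.store[key][1]
        raise RuntimeError("no sufficiently up-to-date replica available")


journal = ReplicatedJournal([JournalNode("a"), JournalNode("b"), JournalNode("c")])
seq = journal.write("photos/cat.jpg", b"...bytes...")
assert journal.read("photos/cat.jpg", min_seq=seq) == b"...bytes..."
```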
Ensuring Correctness with Formal Methods
At a certain scale, you cannot simply test your way to correctness. The combinatorics of potential states in a distributed system spanning millions of nodes are too vast to cover with integration tests alone. To solve this, S3 leans heavily on automated reasoning.
If computer science and math got married and had kids, it would be automated reasoning.
Automated reasoning applies formal logic to prove the correctness of algorithms. AWS uses this to mathematically prove that their consistency models hold up under every possible condition. These proofs are not just academic exercises; they are integrated into the development lifecycle. When an engineer checks in code related to the index subsystem or consistency logic, formal methods verify that the change has not regressed the system's consistency guarantees.
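AWS engineers have written publicly about using tools such as TLA+ and lightweight formal methods for this kind of verification. As a rough illustration of the "proof in the pipeline" idea, the sketch below exhaustively checks a read-after-write invariant over a small state space of the toy journal from the previous example, and the deployment gate fails if any reachable state violates it. The `journal_model` module name is hypothetical.

```python
# Toy "proof as a pipeline step": exhaustively check a consistency invariant
# over every reachable state of a small model before allowing a deploy.
# Real teams use tools such as TLA+; this brute-force checker is only a sketch.
import itertools
import sys

from journal_model import JournalNode, ReplicatedJournal  # the toy model above (hypothetical module)


def check_read_after_write() -> bool:
    writes = [("k", b"v1"), ("k", b"v2")]
    # Explore every single-node failure (within the allowance) and every write order.
    for failed in [None, 0, 1, 2]:
        for order in itertools.permutations(writes):
            nodes = [JournalNode(n) for n in ("a", "b", "c")]
            if failed is not None:
                nodes[failed].healthy = False
            journal = ReplicatedJournal(nodes, failure_allowance=1)
            for key, value in order:
                seq = journal.write(key, value)
                if journal.read(key, min_seq=seq) != value:
                    return False  # invariant violated: a read missed its write
    return True


if __name__ == "__main__":
    # Gate the deployment on the check: a non-zero exit blocks the rollout.
    sys.exit(0 if check_read_after_write() else 1)
```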
11 Nines of Durability
S3 is famous for its promise of "11 nines" (99.999999999%) of durability. Validating this promise requires more than math: it requires active auditing. Under the hood, S3 runs over 200 microservices. A significant portion of these are "auditor systems" that constantly inspect every byte across the fleet. If bit rot or a drive failure is detected, repair systems automatically kick in to regenerate the data from redundant copies stored across different fault domains.
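A toy auditor makes the pattern visible: recompute a checksum for every stored copy, and when one no longer matches the digest recorded at write time, rewrite it from a healthy copy in another fault domain. Real S3 uses erasure coding and far more sophisticated repair machinery; this sketch simplifies to whole replicas, and all names are hypothetical.

```python
# Illustrative auditor loop: detect a corrupted copy by checksum mismatch and
# regenerate it from a healthy copy stored in a different fault domain.
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict, expected: str) -> dict:
    """replicas maps fault domain -> object bytes; expected is the recorded digest."""
    healthy = {d: blob for d, blob in replicas.items() if digest(blob) == expected}
    if not healthy:
        raise RuntimeError("all copies corrupted; escalate to deeper recovery")
    source = next(iter(healthy.values()))
    # Rewrite every corrupted copy from a healthy one.
    return {d: (blob if d in healthy else source) for d, blob in replicas.items()}


copies = {"az-1": b"hello", "az-2": b"hellx", "az-3": b"hello"}  # az-2 has bit rot
repaired = audit_and_repair(copies, expected=digest(b"hello"))
assert all(digest(blob) == digest(b"hello") for blob in repaired.values())
```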
Managing Correlated Failures
In distributed systems, independent failures are manageable. The true enemy is correlated failure—when a single event causes a large segment of the infrastructure to fail simultaneously. This could be a power outage affecting a rack, a network partition affecting an Availability Zone (AZ), or a software bug affecting a specific version of a service.
S3 mitigates this by aggressively spreading data. Objects are not just replicated; they are stored across physically separate availability zones. This ensures that even if a total blackout occurs in one facility, the data remains accessible from another. The system is architected with "failure allowances," assuming that at any given second, a certain percentage of the fleet is in a failed state, yet the aggregate service remains healthy.
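The placement logic below is a toy version of that idea: spread the shards of an erasure-coded object across availability zones and verify that losing any single zone still leaves enough shards to reconstruct the data. The 6-of-9 coding parameters are illustrative, not S3's real scheme.

```python
# Toy placement check: spread k+m erasure-coded shards across availability zones
# so that losing any one zone still leaves at least k shards to rebuild the object.
from itertools import cycle


def place_shards(num_shards: int, zones: list) -> dict:
    """Round-robin shards across zones so no single zone holds too many."""
    return {shard: zone for shard, zone in zip(range(num_shards), cycle(zones))}


def survives_zone_loss(placement: dict, k: int) -> bool:
    zones = set(placement.values())
    return all(
        sum(1 for z in placement.values() if z != lost) >= k
        for lost in zones
    )


K, M = 6, 3  # any 6 of 9 shards are enough to reconstruct the object
placement = place_shards(K + M, ["az-1", "az-2", "az-3"])
assert survives_zone_loss(placement, K)  # a full-AZ blackout is tolerated
```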
Evolving Primitives: Tables and Vectors
While S3 started as a simple key-value object store, the rise of modern data stacks and AI has forced it to evolve. The concept of the "Data Lake" is turning into a "Data Ocean," and customers need more intelligent ways to manage this depth.
S3 Tables
With the rise of open table formats like Apache Iceberg, customers began managing tabular data directly on S3. To support this, AWS launched S3 Tables. This primitive treats Iceberg tables as native S3 resources, handling maintenance tasks like compaction automatically. This moves the complexity of table management from the client side into the storage layer itself.
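Compaction is the clearest example of that shift. The sketch below shows, locally and in miniature, what the operation amounts to: many small Parquet files merged into one larger file so that queries open fewer objects. S3 Tables performs this kind of maintenance inside the storage layer itself; the PyArrow code and file paths here are only an illustration.

```python
# What "compaction" buys you, in miniature: merge many small Parquet files into
# one larger file so readers touch fewer objects. Paths are hypothetical.
import glob

import pyarrow as pa
import pyarrow.parquet as pq


def compact(small_file_glob: str, output_path: str) -> None:
    tables = [pq.read_table(path) for path in sorted(glob.glob(small_file_glob))]
    pq.write_table(pa.concat_tables(tables), output_path)  # one big file instead of many


compact("warehouse/events/part-*.parquet", "warehouse/events/compacted-0001.parquet")
```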
S3 Vectors for AI
The most recent evolution is the introduction of S3 Vectors. As AI models rely heavily on embeddings (long lists of numbers that encode semantic meaning), the need to store and search billions of vectors has exploded.
Unlike traditional vector databases that keep everything in expensive memory, S3 devised a way to store vectors on disk while maintaining low latency. The service achieves this by pre-computing "vector neighborhoods": clusters of similar vectors. When a query comes in, the system only loads the relevant neighborhood into memory, allowing for sub-100 millisecond warm query times at a scale of trillions of vectors.
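A toy version of the neighborhood idea looks like a classic IVF index: cluster the embeddings ahead of time, store each cluster separately, and at query time scan only the clusters whose centroids are closest to the query. This is an illustration of the general technique, not S3 Vectors' actual internals.

```python
# Toy "vector neighborhoods": cluster embeddings up front, then scan only the
# few nearest clusters at query time instead of the whole corpus.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(5_000, 128)).astype(np.float32)  # pretend embeddings

# Build phase: assign every vector to its nearest of a few sampled centroids.
centroids = vectors[rng.choice(len(vectors), size=16, replace=False)]
assignments = np.argmin(
    np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1
)
neighborhoods = {c: np.where(assignments == c)[0] for c in range(len(centroids))}


def query(q: np.ndarray, n_probe: int = 2, top_k: int = 5) -> np.ndarray:
    """Load and scan only the n_probe nearest neighborhoods."""
    nearest_clusters = np.argsort(np.linalg.norm(centroids - q, axis=1))[:n_probe]
    candidates = np.concatenate([neighborhoods[c] for c in nearest_clusters])
    dists = np.linalg.norm(vectors[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:top_k]]  # ids of the closest vectors found


print(query(rng.normal(size=128).astype(np.float32)))
```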
Scale must be to your advantage. You can't build something where the bigger you get, the worse your performance gets.
The Culture of Technical Fearlessness
Maintaining a system that underpins the internet requires a specific engineering culture. Milan highlighted two conflicting but essential Amazon tenets that drive the S3 team: "Respect what came before" and "Be technically fearless."
Engineers must respect the massive legacy codebase that works reliably for millions of customers. However, they cannot be paralyzed by it. They must be fearless enough to rewrite core consistency logic or introduce entirely new data primitives like vectors.
This balance is maintained through a deep sense of ownership. In the world of S3, engineers do not just write code; they own the "byte." They are responsible for the lifecycle, durability, and correctness of that data from the moment it enters the system until it is deleted. It is a reminder that even at the scale of exabytes, success depends on the curiosity and diligence of individual engineers.
Conclusion
Amazon S3 is no longer just a "simple storage service." It has evolved into a complex, living organism that adapts to the changing needs of the software industry—from hosting static images to powering the next generation of AI agents. By leveraging formal methods, inventing new distributed data structures, and rigorously managing failure, the S3 team has proven that it is possible to grow a system to incomprehensible scales without sacrificing performance or reliability.