Ref:

image

Bucket: A logical container for objects. The bucket name is globally unique. To upload data to S3, we must first create a bucket.

Object: An object is an individual piece of data we store in a bucket. It contains object data (also called payload) and metadata. Object data can be any sequence of bytes we want to store. The metadata is a set of name-value pairs that describe the object.
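A minimal sketch of the two definitions above, not S3's actual implementation. The names `StoredObject` and `BucketNamespace` are hypothetical; the sketch only models the three rules stated here: bucket names are unique, a bucket must exist before an upload, and an object is payload plus name-value metadata.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object: opaque payload bytes plus name-value metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class BucketNamespace:
    """Hypothetical registry that enforces globally unique bucket names."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {object key -> StoredObject}

    def create_bucket(self, name):
        if name in self._buckets:
            raise ValueError(f"bucket name already taken: {name}")
        self._buckets[name] = {}

    def put_object(self, bucket, key, obj):
        # Uploading requires that the bucket was created first.
        if bucket not in self._buckets:
            raise KeyError(f"create the bucket first: {bucket}")
        self._buckets[bucket][key] = obj

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]
```

For example, `ns.put_object("photos", "cat.jpg", StoredObject(b"...", {"content-type": "image/jpeg"}))` only succeeds after `ns.create_bucket("photos")`.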

image

image

image

Timeline of features

image

Ref: https://highscalability.com/behind-aws-s3s-massive-scale/

image

image

image

image

image

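The design philosophy of object storage is very similar to that of the UNIX file system. In UNIX, a file's metadata lives in an inode, which is kept separate from the data blocks that hold the file's contents.

Object storage works similarly: object metadata and object data are stored separately.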
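One way to picture the UNIX analogy, as a rough sketch only. It assumes the object store keeps metadata and data in two separate components, the way UNIX keeps inodes apart from data blocks. `MetadataStore`, `DataStore`, `put`, and `get` are hypothetical names, not S3 APIs.

```python
import uuid


class DataStore:
    """Holds raw bytes, addressed only by an opaque ID (like disk blocks)."""

    def __init__(self):
        self._blobs = {}

    def write(self, payload):
        object_id = uuid.uuid4().hex
        self._blobs[object_id] = payload
        return object_id

    def read(self, object_id):
        return self._blobs[object_id]


class MetadataStore:
    """Maps an object name to its ID and attributes (the inode-like role)."""

    def __init__(self):
        self._entries = {}

    def record(self, name, object_id, size):
        self._entries[name] = {"object_id": object_id, "size": size}

    def lookup(self, name):
        return self._entries[name]


def put(meta, data, name, payload):
    object_id = data.write(payload)              # 1. persist the bytes
    meta.record(name, object_id, len(payload))   # 2. then publish the name


def get(meta, data, name):
    entry = meta.lookup(name)                    # 1. resolve name -> ID
    return data.read(entry["object_id"])         # 2. fetch the bytes by ID
```

The point of the split is that the data store never needs to know object names, and the metadata store never needs to hold payload bytes.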

image

Architecture

image

High level, ref: https://newsletter.systemdesign.one/p/s3-architecture

image

S3 is said to be composed of more than 300 microservices.

It tries to follow the core design principle of simplicity.

You can divide its architecture into four high-level services:

image

image

image

image

Upload

image

image

image

Download

image

Multi-part upload

How can we optimize performance when uploading large files to an object storage service such as S3?

image

image

image

image

You can now break your larger objects into chunks and upload a number of chunks in parallel. If the upload of a chunk fails, you can simply restart it. You’ll be able to improve your overall upload speed by taking advantage of parallelism.

Ref: https://blog.bytebytego.com/p/how-to-upload-a-large-file-to-s3 and https://aws.amazon.com/blogs/aws/amazon-s3-multipart-upload/
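The idea quoted above (split into chunks, upload in parallel, retry only the chunk that failed) can be sketched without touching AWS at all. This is an in-memory simulation under stated assumptions: `InMemoryMultipartStore` is a hypothetical stand-in for the storage service, failures are injected artificially, and parts are numbered from 1.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryMultipartStore:
    """Hypothetical stand-in for an object store's multipart API."""

    def __init__(self, fail_once=()):
        self._parts = {}                   # part number -> bytes
        self._fail_once = set(fail_once)   # parts whose first attempt fails
        self._lock = threading.Lock()

    def upload_part(self, part_number, chunk):
        with self._lock:
            if part_number in self._fail_once:
                self._fail_once.remove(part_number)
                raise IOError(f"simulated failure for part {part_number}")
            self._parts[part_number] = chunk

    def complete(self):
        # Reassemble the object by concatenating parts in part-number order.
        return b"".join(self._parts[n] for n in sorted(self._parts))


def split_into_parts(data, part_size):
    """Return (part_number, chunk) pairs, numbering parts from 1."""
    return [
        (i // part_size + 1, data[i:i + part_size])
        for i in range(0, len(data), part_size)
    ]


def upload_with_retry(store, part_number, chunk, max_attempts=3):
    """Retry only this part; other parts are unaffected by its failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            store.upload_part(part_number, chunk)
            return attempt
        except IOError:
            if attempt == max_attempts:
                raise


def multipart_upload(store, data, part_size, workers=4):
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_with_retry, store, n, c) for n, c in parts]
        for future in futures:
            future.result()   # re-raise if any part exhausted its retries
    return store.complete()
```

Because parts may finish in any order, the store reassembles them by part number rather than by arrival time.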

image

Storage Fleet

image

Hard Drives

image

image

Replication

image

Heat Management at Scale

image

image

But as the system aggregates millions of workloads, the underlying traffic to the storage flattens out remarkably. Aggregate demand yields a smoother, more predictable throughput.

When you aggregate on a large enough scale, a single workload cannot influence the aggregate peak.

The problem then becomes much easier to solve: you simply need to balance out a smooth demand rate across many disks.
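A toy simulation can make the flattening effect concrete. It is only a sketch under made-up assumptions: every workload is independent, idle most of the time, and bursts with the same fixed probability and rate. None of these numbers come from S3.

```python
import random


def bursty_workload(rng, ticks, burst_probability=0.05, burst_rate=100):
    """One workload: idle most of the time, with occasional request bursts."""
    return [
        burst_rate if rng.random() < burst_probability else 0
        for _ in range(ticks)
    ]


def peak_to_mean(series):
    """How spiky a traffic series is: 1.0 means perfectly flat."""
    mean = sum(series) / len(series)
    return max(series) / mean if mean else float("inf")


def aggregate_peak_to_mean(num_workloads, ticks=1000, seed=42):
    """Sum many independent bursty workloads and measure the result."""
    rng = random.Random(seed)
    total = [0] * ticks
    for _ in range(num_workloads):
        for t, load in enumerate(bursty_workload(rng, ticks)):
            total[t] += load
    return peak_to_mean(total)
```

Comparing `aggregate_peak_to_mean(1)` with `aggregate_peak_to_mean(1000)` should show the ratio dropping sharply toward 1 as more workloads are summed, which is the sense in which a single workload stops influencing the aggregate peak.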