Needle in a haystack: efficient storage of billions of photos

Ref:

image

The Photos application is one of Facebook’s most popular features, and it operates at a scale of billions of images. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 60 billion images and 1.5PB of storage.

These numbers pose a significant challenge for the Facebook photo storage infrastructure.
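
As a quick sanity check on these figures, the arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes a decimal petabyte (10**15 bytes), which the text does not specify.

```python
# Figures quoted in the text above.
TOTAL_IMAGES = 60_000_000_000   # 60 billion stored images
SIZES_PER_PHOTO = 4             # four sizes generated per uploaded photo
TOTAL_BYTES = 1.5e15            # 1.5 PB, ASSUMING decimal petabytes

# Number of original uploads implied by the stored image count.
uploaded_photos = TOTAL_IMAGES // SIZES_PER_PHOTO

# Average size of one stored image across all four sizes.
avg_image_bytes = TOTAL_BYTES / TOTAL_IMAGES

print(f"uploaded photos: {uploaded_photos:,}")
print(f"average stored image: {avg_image_bytes / 1000:.0f} KB")
```

Under that assumption the numbers correspond to about 15 billion uploaded photos and an average of roughly 25 KB per stored image, so the workload is dominated by a very large number of small files.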

image

image

image

image

NFS photo infrastructure

The old photo infrastructure consisted of several tiers:

image

Since each image is stored in its own file, the namespace directories and file inodes generate an enormous amount of metadata on the storage tier. The amount of metadata far exceeds the caching abilities of the NFS storage tier, resulting in multiple I/O operations per photo upload or read request. The whole photo serving infrastructure is bottlenecked on the high metadata overhead of the NFS storage tier, which is one of the reasons why Facebook relies heavily on CDNs to serve photos. Two additional optimizations were deployed to mitigate this problem to some degree.

The major lesson we learned is that CDNs by themselves do not offer a practical solution to serving photos on a social networking site.

CDNs do effectively serve the hottest photos (profile pictures and photos that have been recently uploaded), but a social networking site like Facebook also generates a large number of requests for less popular (often older) content, which we refer to as the long tail.
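
To get a feel for why the long tail matters, here is a minimal sketch of how many requests a cache of the most popular photos could absorb. It assumes a Zipf-like popularity curve, where the photo at rank r is requested in proportion to 1 / r**s. That distribution, the function name `head_hit_ratio`, and the example numbers are assumptions made for illustration and do not come from the source.

```python
def head_hit_ratio(num_photos: int, cached: int, s: float = 1.0) -> float:
    """Fraction of requests served by caching the `cached` most popular photos.

    ASSUMPTION: request frequency at popularity rank r is proportional
    to 1 / r**s (a Zipf-like curve). This is an illustrative model only.
    """
    if not 0 <= cached <= num_photos:
        raise ValueError("cached must be between 0 and num_photos")
    weights = [1.0 / rank ** s for rank in range(1, num_photos + 1)]
    return sum(weights[:cached]) / sum(weights)


# Hypothetical example: cache the hottest 1% of one million photos.
ratio = head_hit_ratio(num_photos=1_000_000, cached=10_000)
print(f"requests served from cache: {ratio:.2f}")
print(f"requests falling through to storage: {1 - ratio:.2f}")
```

With these hypothetical parameters about 0.68 of requests hit the cache, which leaves roughly a third going to the backing store. Under such a model, growing the cache helps only slowly, which is consistent with the point that the storage tier itself has to serve long-tail reads efficiently.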

image

image

image

Issues

image

At least 3 disk I/O operations are needed to read a single photo, and even more because of nested directory inodes.

image
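
The count above can be sketched as a simple model of reading a photo that is stored as its own file: one or more reads to translate the path into an inode number, one read for the inode, and one read for the data. The function and its parameters are hypothetical and ignore real-world details such as partial caching or block sizes.

```python
def disk_reads_for_photo(uncached_dir_levels: int = 1,
                         inode_cached: bool = False) -> int:
    """Estimate disk reads needed to serve one photo stored as its own file.

    uncached_dir_levels: directory levels that must be read from disk to
        translate the file path into an inode number (hypothetical parameter).
    inode_cached: whether the file's inode is already held in memory.
    """
    if uncached_dir_levels < 0:
        raise ValueError("uncached_dir_levels must be non-negative")
    lookup_reads = uncached_dir_levels       # path -> inode number
    inode_reads = 0 if inode_cached else 1   # load the inode itself
    data_reads = 1                           # read the photo bytes
    return lookup_reads + inode_reads + data_reads


print(disk_reads_for_photo())                       # flat directory, cold cache
print(disk_reads_for_photo(uncached_dir_levels=3))  # three nested directories
print(disk_reads_for_photo(0, inode_cached=True))   # all metadata in memory
```

In this model the three calls give 3, 5, and 1 reads respectively. The last case shows what is gained when all metadata fits in memory: only the read of the photo data itself remains.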

Requirements

image

image

Haystack

image

image

image

Components

image

image

image

image

Reads from Store

image

Upload

image

image

Index File

image

Layout of Haystack Index file

image

image

Deletes

image

Infrastructure details

image

Storage

image

HTTP server

image