At Facebook, we have unique storage scalability challenges when it comes to our data warehouse. Our warehouse stores upwards of 300 PB of Hive data, with an incoming daily rate of about 600 TB. In the last year, the warehouse has seen a 3× growth in the amount of data stored. Given this growth trajectory, storage efficiency is and will continue to be a focus for our warehouse infrastructure.
600 TB of incoming data per day is mind-blowing. I can’t fathom it. And it’s great that they’re sharing this information. There can’t be that many entities dealing with this scale of data storage, and the others likely aren’t sharing what they’ve learned. This is the cutting edge of computer science.
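To make those figures a little easier to fathom, here is a rough back-of-envelope sketch in Python. It is not from Facebook's post. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a perfectly steady ingest rate. It also ignores compression, replication, deletion, and the fact that the daily rate itself is growing. The function names are made up for illustration.

```python
# Back-of-envelope arithmetic for the figures quoted above.
# Assumptions (not from the source): decimal units and a constant ingest rate.
TB = 10**12                      # bytes in a terabyte (decimal)
PB = 10**15                      # bytes in a petabyte (decimal)
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate_gb_per_s(tb_per_day):
    """Average sustained ingest rate in GB/s for a given daily volume in TB."""
    return tb_per_day * TB / SECONDS_PER_DAY / 10**9


def days_to_accumulate(total_pb, tb_per_day):
    """Days needed to reach total_pb at a constant tb_per_day, with nothing deleted."""
    return total_pb * PB / (tb_per_day * TB)


print(f"{ingest_rate_gb_per_s(600):.2f} GB/s sustained")
print(f"{days_to_accumulate(300, 600):.0f} days to reach 300 PB at 600 TB/day")
```

Under those assumptions, 600 TB per day works out to roughly 7 GB arriving every second, around the clock, and 300 PB is about 500 days' worth of data at the current rate.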