Monday, February 18, 2013

Systems: Facebook Taking a Page From Sun ZFS?

Large vendors like Google have long built their own systems for their data centers. Facebook is following Google in building its own, and is now contemplating taking a page from Sun's ZFS storage for its own flash optimizations.

[Facebook Server, courtesy Ars Technica]

Rotating Rust & Flash:
Facebook recognizes that traditional hard disks require significant power. Sun recognized over half a decade ago that flash could be used to reduce this power consumption. Sun designed the ZFS file system to leverage two different kinds of flash: enterprise-grade flash for the write log, and lower-quality flash for the read cache. By choosing wisely where to put lower-quality flash in the storage tier and combining it with ZFS, Sun was able to increase performance and reduce power consumption while using fewer high-performance hard disks. If a cheap flash cell goes bad, it held only cached data: the actual data is still safe on the real storage, where it can be read at slower speeds and automatically re-cached by ZFS.
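The read-cache half of that design can be illustrated with a toy model. This is only a minimal sketch of the idea, not how ZFS is implemented: the class name ReadCacheTier, its methods, and the use of plain dictionaries to stand in for flash and disks are all assumptions made up for illustration.

```python
class ReadCacheTier:
    """Toy read path: cheap flash cache in front of slower, authoritative disks.

    Hypothetical illustration only; names and structure are not taken from ZFS.
    """

    def __init__(self, backing_store):
        self.backing_store = backing_store  # authoritative copy (the disks)
        self.cache = {}                     # stands in for cheap flash
        self.bad_cells = set()              # keys whose cached copy has failed
        self.disk_reads = 0                 # how often we fell back to disk

    def fail_cell(self, key):
        """Simulate a cheap flash cell going bad."""
        self.bad_cells.add(key)

    def read(self, key):
        if key in self.cache and key not in self.bad_cells:
            return self.cache[key]          # fast path: served from flash
        # Cache miss or failed cell: fall back to the slower authoritative copy.
        self.disk_reads += 1
        value = self.backing_store[key]
        # Re-cache the block; we pretend it lands on a healthy cell.
        self.bad_cells.discard(key)
        self.cache[key] = value
        return value


tier = ReadCacheTier({"photo-1": b"jpeg bytes"})
tier.read("photo-1")       # first read comes from disk and warms the cache
tier.read("photo-1")       # served from flash
tier.fail_cell("photo-1")  # the cheap flash cell dies
tier.read("photo-1")       # data still comes back, from disk, and is re-cached
```

The point of the sketch is that losing a cache cell costs one slow read and nothing more, which is why lower-quality flash is tolerable in this position but not for the write log.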

Facebook has finally figured out that cheap flash could reduce storage costs. Note the discussion of using lower-quality NAND flash in this data center article:
Data center-class flash is typically far more expensive than spinning disks, but Frankovsky says there may be a way to make it worth it. "If you use the class of NAND [flash] in thumb drives, which is typically considered sweep or scrap NAND, and you use a really cool kind of controller algorithm to characterize which cells are good and which cells are not, you could potentially build a really high-performance cold storage solution at very low cost," he said.
A good thing cannot be kept from the market indefinitely. Now, if the Facebook team would just consult someone who has been doing what they are discussing for half a decade, they could finish their contemplations. Their goal is pretty simple:

Facebook is burdened with lots of "cold storage," stuff written once and rarely accessed again. Even there, Frankovsky wants to increasingly use flash because of the failure rate of spinning disks. With tens of thousands of devices in operation, "we don't want technicians running around replacing hard drives," he said.
Cheap flash does go bad. Facebook does not want engineers running around replacing spinning disks, but it needs to understand that it does not want them running around replacing flash chips either. This analysis was a driving force behind the ZFS architecture.

[Illumos Logo]
On Hold in Illumos:
The discussion had hit the OpenSolaris splinters regarding making the contents of the flash Level 2 ARC (L2ARC) cache persist across reboots. Of course, this simple change could offer some amazing possibilities for enhanced performance after a reboot (as well as some additional life expectancy for the cache components, since they would not have to be rewritten).
Unfortunately for Facebook, they could have been enjoying this competitive edge half a decade ago, had they implemented on Solaris, OpenSolaris, or one of the OpenSolaris splinters, which are so prevalent in the storage provider arena. How Facebook decides to do its implementation will be an interesting question. Will they back up their cheap flash data on more cheap flash (which is still more expensive than disk storage), or will they use less expensive storage to provide the redundancy for the flash, as Sun designed, Oracle now leverages, and many other storage providers (e.g. Illumos, Nexenta) now leverage?
