
Tuesday, October 6, 2015

Flash? The End of Disk?

[Growing capacity shipped, courtesy The Register]


A short article in The Register touches on a topic that is not very popular in an industry driven by innovation. Silicon for retaining storage is growing by leaps and bounds, but will it overtake Disk?
Samsung expects the NAND flash industry to have capacity to produce up to 253 exabytes of total storage capacity by 2020, essentially "an impressive 3x increase relative to the current industry capacity".
The article points out that this is expected to account for less than 10 per cent of the total storage capacity the industry will need by 2020.
There is a tall mountain to climb... and even capable climbers need enough rope to reach the summit!

Not So Fast...

It seems there may not be enough rope to climb this mountain yet. Disk will be around longer than some expect; sometimes technology is constrained by economics.
[If] every 10,000PB of NAND capacity costs $20bn, then to catch up with HDD capacity shipped in 2019, the flash industry would have to spend $2tn. We don't think it is going to happen unless flash capacity $/GB leaving the foundry is sustainably lower than that of disk.
The cost of manufacturing disks is lower than the cost of manufacturing silicon, and that investment gap must ultimately be covered, whether by capital markets or by the people purchasing the products.
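For readers who want to check the arithmetic behind that quote, the implied numbers work out as follows (the ~1,000,000PB of HDD capacity shipped is inferred from the quote's own figures, not an independent estimate):

# $20bn buys 10,000PB of NAND foundry capacity, so matching ~1,000,000PB of
# HDD shipments requires 1,000,000 / 10,000 = 100 tranches of $20bn.
echo $(( 1000000 / 10000 * 20 ))    # prints 2000, i.e. $2,000bn = $2tn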


Tuesday, August 18, 2015

ZFS: Flash & Cache 2015q1


Abstract:

The concept of Storage Tiering has existed from the time computing came into existence. ZFS was one of the first mainstream file systems to consider automatic storage tiering during its initial design phase. Advances have recently been made in ZFS to make better use of cache.

Multiple kinds of Flash?

Flash comes primarily in two different types: highly reliable single-level cell (SLC) memory and multi-level cell (MLC) memory. The EE Times published a technical article describing them.
SLC... NAND flash cells... Both writing and erasing are done gradually to avoid over-stressing, which can degrade the lifetime of the cell.  
MLC... packing more than one bit in a single flash storage cell... allows for a doubling or tripling of the data density with just a small increase in the cost and size of the overall silicon. 
The read bandwidths between SLC and MLC are comparable
If MLC packs so much more data, why bother with SLC? There is no "free lunch"; there are differences between SLC and MLC in real world applications, as the EE Times article describes.
MLC can more than double the density [over SLC] with almost no die size penalty, and hence no manufacturing cost penalty beyond possibly yield loss.
...
Access and programming times [for MLC] are two to three times slower than for the single-level [SLC] design.
...
The endurance of SLC NAND flash is 10 to 30 times more than MLC NAND flash
...
difference in operating temperature, are the main reasons why SLC NAND flash is considered industrial-grade
...
The error rate for MLC NAND flash is 10 to 100 times worse than that of SLC NAND flash and degrades more rapidly with increasing program/erase cycles
...
The floating gates can lose electrons at a very slow rate, on the order of an electron every week to every month. With the various values in multi-level cells only differentiated by 10s to 100s of electrons, however, this can lead to data retention times that are measured in months, rather than years. This is one of the reasons for the large difference between SLC and MLC data retention and endurance. Leakage is also increased by higher temperatures, which is why MLC NAND flash is generally only appropriate for commercial temperature range applications.
It is important to understand the capabilities of Flash technology in order to determine how to extract the best economics from it.

ZFS Usage of Flash and Cache

The use of MLC as cache in a proper storage hierarchy is impossible to ignore. A doubling of storage capacity at almost no cost impact is a deal nearly too good to pass up! How does one place such a technology into a storage system?

When a missing block of data can result in loss of data on the persistent storage pool, then a highly reliable Flash is required. The ZFS Intent Log (ZIL), normally stored on the same drives as the managed data set, was architected with an external Synchronous Write Log (SLOG) option to leverage SLC NAND Flash. The SLC flash units are normally mirrored and placed in front of all writes going to the disk units. There is a dramatic speed improvement whenever writes are committed to the flash, since committing the writes directly to disk takes vastly longer, and those writes can be streamed to disk after the random writes have been coalesced in Flash. This was affectionately referred to as "LogZilla".
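As an illustration only (the pool name tank and the device names are hypothetical), a mirrored pair of SLC devices might be attached to an existing pool as a separate intent log along these lines:

# Attach a mirrored pair of SLC flash devices as a separate log (SLOG);
# synchronous writes land on the mirrored flash and are streamed to disk later.
zpool add tank log mirror c2t0d0 c3t0d0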

If the data is residing on persistent storage (i.e. disks), then the loss of a block of data merely results in a cache miss, so the data is never lost. With ZFS, the Level 2 Adaptive Read Cache (L2ARC) was architected to leverage MLC NAND Flash. There is a dramatic speed improvement whenever reads hit the MLC before going to disk. This was affectionately referred to as "ReadZilla".
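Again purely as a sketch (same hypothetical pool and device names), an MLC device would be added as a cache vdev rather than a log:

# Add an MLC flash device as a Level 2 ARC read cache; no mirroring is needed,
# since a failed or missing cache block simply falls back to the disks.
zpool add tank cache c4t0d0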

Two things to be cautious about regarding flash: electrons leak away over time, and just reading data can cause data corruption (read disturb). To compensate for factors such as these, ZFS was architected with error detection & correction built inherently into the file system.
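Those checksums can be exercised on demand; a minimal sketch, assuming a pool named tank:

# Walk every block, verify checksums, and repair from redundancy where possible
zpool scrub tank
# Report any checksum errors detected on the flash or disk devices
zpool status -v tank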

Performance Boosts in ZFS from 2010 to 2015

ZFS has been running in production for a very long time. Many improvements have been made recently to advance the "State of The Art" for both Flash and Disk!

Re-Architecture of ZFS Adaptive Read Cache

Consolidate Data and Metadata Lists
"the reARC project.. No more separation of data and metadata and no more special protection. This improvement led to fewer lists to manage and simpler code, such as shorter lock hold times for eviction"
Deduplication of ARC Memory Blocks
"Multiple clones of the same data share the same buffers for read accesses and new copies are only created for a write access. It has not escaped our notice that this N-way pairing has immense consequences for virtualization technologies. As VMs are used, the in-memory caches that are used to manage multiple VMs no longer need to inflate, allowing the space savings to be used to cache other data. This improvement allows Oracle to boast the amazing technology demonstration of booting 16,000 VMs simultaneously."
Increase Scalability through Diversifying Lock Type and Increasing Lock Quantity
"The entire MRU/MFU list insert and eviction processes have been redesigned. One of the main functions of the ARC is to keep track of accesses, such that most recently used data is moved to the head of the list and the least recently used buffers make their way towards the tail, and are eventually evicted. The new design allows for eviction to be performed using a separate set of locks from the set that is used for insertion. Thus, delivering greater scalability.
...
the main hash table was modified to use more locks placed on separate cache lines improving the scalability of the ARC operations"
Stability of ARC Size: Suppress Growths, Smaller Shrinks
"The new model grows the ARC less aggressively when approaching memory pressure and instead recycles buffers earlier on. This recycling leads to a steadier ARC size and fewer disruptive shrink cycles... the amount by which we do shrink each time is reduced to make it less of a stress for each shrink cycle."
 Faster Sequential Resilvering of Full Large Capacity Disk Rebuilds
"We split the algorithm in two phases. The populating phase and the iterating phase. The populating phase is mostly unchanged... except... instead of issuing the small random IOPS, we generate a new on disk log of them. After having iterated... we now can sort these blocks by physical disk offset and issue the I/O in ascending order. "
On-Disk ZFS Intent Log Optimization under Heavy Loads
"...thundering herds, a source of system inefficiency... Thanks to the ZIL train project, we now have the ability to break down convoys into smaller units and dispatch them into smaller ZIL level transactions which are then pipelined through the entire data center.

With logbias set to throughput, the new code is attempting to group ZIL transactions in sets of approximately 40 operations which is a compromise between efficient use of ZIL and reduction of the convoy effect. For other types of synchronous operations we group them into sets representing about ~32K of data to sync."
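The logbias behavior described in that quote is exposed as a per-dataset property; a minimal sketch (the dataset name is hypothetical):

# Favor overall ZIL throughput over low-latency commits for this dataset
zfs set logbias=throughput tank/streams
# Revert to the default latency-optimized behavior
zfs set logbias=latency tank/streams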
ZFS Input/Output Priority Inversion
"prefetching I/Os... was handled... at a lower priority operation than... a regular read... Before reARC, the behavior was that after an I/O prefetch was issued, a subsequent read of the data that arrived while the I/O prefetch was still pending, would block waiting on the low priority I/O prefetch completion. In the end, the reARC project and subsequent I/O restructuring changes, put us on the right path regarding this particular quirkiness. Fixing the I/O priority inversion..."
While all of these improvements provide a vastly superior file system as far as performance is concerned, there is yet another movement in the industry which really changed the way flash is used in Solaris with ZFS. As flash becomes less expensive, its use in systems will increase. A laser focus was placed upon optimizing the L2ARC, making it vastly more usable.

ZFS Level 2 Adaptive Read Cache (L2ARC) Memory Footprint Reduction
"buffers were tracked in the L2ARC (the SSD based secondary ARC) using the same structure that was used by the main primary ARC. This represented about 170 bytes of memory per buffer. The reARC project was able to cut down this amount by more than 2X to a bare minimum that now only requires about 80 bytes of metadata per L2 buffers."

ZFS Level 2 Adaptive Read Cache (L2ARC) Persistence on Reboot
"the new L2ARC has an on-disk format that allows it to be reconstructed when a pool is imported... this L2ARC import is done asynchronously with respect to the pool import and is designed to not slow down pool import or concurrent workloads. Finally that initial L2ARC import mechanism was made scalable with many import threads per L2ARC device."
With large storage systems, regular reboots are devastating to the performance of the cache. Flushing the cache and re-populating it also shortens the life span of the flash. With the disk blocks already present in the L2ARC, performance improves immediately after a reboot. This also brings the benefit of using inexpensive flash media as a persistent cache, while competing systems must use expensive Enterprise Flash to facilitate persistent storage.
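Whether the cache devices came back warm after an import can be sanity-checked from the pool statistics; a sketch, assuming a pool named tank:

# The cache section of the output shows how much of the L2ARC is already allocated
zpool iostat -v tank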

Conclusions:

Solaris continues to advance, using engineering and technology to provide higher performance at a lower price point than competing solutions. The changes to Solaris continue to drive down the cost of high performance systems at a faster pace than the mere drop in price of the commodity hardware that competing systems depend upon.

Tuesday, March 31, 2015

Security: 2015q1 Concerns

Viruses, Worms, Vulnerabilities and Spyware concerns during and just prior to 2015 Q1.

  • [2015-03-07] Litecoin-mining code found in BitTorrent app, freeloaders hit the roof
    "μTorrent users are furious after discovering their favorite file-sharing app is quietly bundled with a Litecoin mining program. The alt-coin miner is developed by distributed computing biz Epic Scale, and is bundled in some installations of μTorrent, which is a Windows BitTorrent client. Some peeps are really annoyed that Epic's code is running in the background while they illegally pirate torrent movies and Adobe Creative Suite Linux ISOs, and say they didn't ask for it to be installed."

  • [2015-03-06] FREAKing HELL: All Windows versions vulnerable to SSL snoop
    "Microsoft has confirmed that its implementation of SSL/TLS in all versions of Windows is vulnerable to the FREAK encryption-downgrade attack. This means if you're using the firm's Windows operating system, an attacker on your network can potentially force Internet Explorer and other software using the Windows Secure Channel component to deploy weak encryption over the web. Intercepted HTTPS connections can be easily cracked, revealing sensitive details such as login cookies and banking information, but only if the website or service at the other end is still supporting 1990s-era cryptography (and millions of sites still are)."

  • [2015-03-05] Broadband routers: SOHOpeless and vendors don't care
    "Home and small business router security is terrible. Exploits emerge with depressing regularity, exposing millions of users to criminal activities. Many of the holes are so simple as to be embarrassing. Hard-coded credentials are so common in small home and office routers, comparatively to other tech kit, that only those with tin-foil hats bother to suggest the flaws are deliberate."
  • [2015-03-05] Obama criticises China's mandatory backdoor tech import rules
    "US prez Barack ‪Obama has criticised China's new tech rules‬, urging the country to reverse the policy if it wants a business-as-usual situation with the US to continue. As previously reported, proposed new regulations from the Chinese government would require technology firms to create backdoors and provide source code to the Chinese government before technology sales within China would be authorised. China is also asking that tech companies adopt Chinese encryption algorithms and disclose elements of their intellectual property."
  • [2015-03-05] Sales up at NSA SIM hack scandal biz Gemalto
    "Sales at the world's biggest SIM card maker, Gemalto, which was last month revealed to have been hacked by the NSA and GCHQ, rose by five per cent to €2.5bn (£1.8bn) in 2014. Following the hack, the company's share price fell by $470m last month. In February, it was revealed that the NSA and Britain's GCHQ had hacked the company to harvest the encryption keys, according to documents leaked by former NSA sysadmin, whistleblower Edward Snowden."

  • [2015-02-24] SSL-busting adware: US cyber-plod open fire on Comodo's PrivDog
    "Essentially, Comodo's firewall and antivirus package Internet Security 2014, installs a tool called PrivDog by default. Some versions of this tool intercept encrypted HTTPS traffic to force ads into webpages. PrivDog, like the Lenovo-embarrassing Superfish, does this using a man-in-the-middle attack: it installs a custom root CA certificate on the Windows PC, and then intercepts connections to websites. Web browsers are fooled into thinking they are talking to legit websites, such as online banks and secure webmail, when in fact they are being tampered with by PrivDog so it can inject adverts. If that's not bad enough, PrivDog turns invalid HTTPS certificates on the web into valid ones: an attacker on your network can point your computer at an evil password-stealing website dressed up as your online bank, and you'd be none the wiser thanks to PrivDog."
  • [2015-02-23] Psst, hackers. Just go for the known vulnerabilities
    "Every one of the top ten vulnerabilities exploited in 2014 took advantage of code written years or even decades ago, according to HP, which recorded an increase in the level of mobile malware detected. “Many of the biggest security risks are issues we’ve known about for decades, leaving organisations unnecessarily exposed,” said Art Gilliland, senior vice president and general manager, Enterprise Security Products, HP. “We can’t lose sight of defending against these known vulnerabilities by entrusting security to the next silver bullet technology; rather, organisations must employ fundamental security tactics to address known vulnerabilities and in turn, eliminate significant amounts of risk," he added."

[Chinese Virus Image, courtesy WatchChinaTimes.com]
  • [2015-02-20] So long, Lenovo, and no thanks for all the super-creepy Superfish
    "Chinese PC maker Lenovo has published instructions on how to scrape off the Superfish adware it installed on its laptops – but still bizarrely insists it has done nothing wrong. That's despite rating the severity of the deliberate infection as "high" on its own website. Well played, Lenonope. Superfish was bundled on new Lenovo Windows laptops with a root CA certificate so it could intercept even HTTPS-protected websites visited by the user and inject ads into the pages. Removing the Superfish badware will leave behind the root certificate – allowing miscreants to lure Lenovo owners to websites masquerading as online banks, webmail and other legit sites, and steal passwords in man-in-the-middle attacks."

  • [2015-02-15] Mozilla's Flash-killer 'Shumway' appears in Firefox nightlies
    "Open source SWF player promises alternative to Adobe's endless security horror. In November 2012 the Mozilla Foundation announced “Project Shumway”, an effort to create a “web-native runtime implementation of the SWF file format.” Two-and-a-bit years, and a colossal number of Flash bugs later, Shumway has achieved an important milestone by appearing in a Firefox nightly, a step that suggests it's getting closer to inclusion in the browser. Shumway's been available as a plugin for some time, and appears entirely capable of handling the SWF files."

  • [2015-01-29] What do China, FBI and UK have in common? All three want backdoors...
    "The Chinese government wants backdoors added to all technology imported into the Middle Kingdom as well as all its source code handed over. Suppliers of hardware and software must also submit to invasive audits, the New York Times reports. The new requirements, detailed in a 22-page document approved late last year, are ostensibly intended to strengthen the cybersecurity of critical Chinese industries. Ironically, backdoors are slammed by computer security experts because the access points are ideal for hackers to exploit as well as g-men."
     
  • [2015-01-15] Console hacker DDoS bot runs on lame home routers
    "Console DDoSers Lizard Squad are using insecure home routers for a paid service that floods target networks, researchers say. The service crawls the web looking for home and commercial routers secured using lousy default credentials that could easily be brute-forced and then added to its growing botnet. Researchers close to a police investigation into Lizard Squad shared details of the attacks with cybercrime reporter Brian Krebs. The attacks used what was described as a 'crude' spin-off of a Linux trojan identified in November that would spread from one router to another, and potentially to embedded devices that accept inbound telnet connections. High-capacity university routers were also compromised in the botnet which according to the service boasted having run 17,439 DDoS attacks or boots at the time of writing."
  • [2014-12-14] CoolReaper pre-installed malware creates backdoor on Chinese Androids
    "Security researchers have discovered a backdoor in Android devices sold by Coolpad, a Chinese smartphone manufacturer. The “CoolReaper” vuln has exposed over 10 million users to potential malicious activity. Palo Alto Networks reckons the malware was “installed and maintained by Coolpad despite objections from customers”. It's common for device manufacturers to install software on top of Google’s Android mobile operating system to provide additional functionality or to customise Android devices. Some mobile carriers install applications that gather data on device performance. But CoolReaper operates well beyond the collection of basic usage data, acting as a true backdoor into Coolpad devices - according to Palo Alto.CoolReaper has been identified on 24 phone models sold by Coolpad."

  • [2014-11-24] Regin: The super-spyware the security industry has been silent about
    "A public autopsy of sophisticated intelligence-gathering spyware Regin is causing waves today in the computer security world... On Sunday, Symantec published a detailed dissection of the Regin malware, and it looks to be one of the most advanced pieces of spyware code yet found. The software targets Windows PCs, and a zero-day vulnerability said to be in Yahoo! Messenger, before burrowing into the kernel layer. It hides itself in own private area on hard disks, has its own virtual filesystem, and encrypts and morphs itself multiple times to evade detection. It uses a toolkit of payloads to eavesdrop on the administration of mobile phone masts, intercept network traffic, pore over emails, and so on... Kaspersky's report on Regin today shows it has the ability to infiltrate GSM phone networks. The malware can receive commands over a cell network, which is unusual."




Monday, February 18, 2013

Systems: Facebook Taking a Page From Sun ZFS?


Abstract:
Large vendors like Google have long created their own systems for their data centers. Facebook follows Google in creating its own systems for its data centers, but contemplates taking a page from Sun's ZFS storage for its own flash optimizations.


[Facebook Server, courtesy ARS Technica]

Rotating Rust & Flash:
Facebook recognizes that traditional hard disks require significant power. Sun, over a half-decade ago, recognized that Flash could be used to reduce this power consumption. Sun designed the ZFS file system to leverage two different kinds of flash: enterprise grade flash for the write log, and lower quality flash for read cache. By wisely choosing where to put lower quality flash in the storage tier, they were able to increase performance and reduce power consumption with fewer high-performance hard disks by combining the technology with ZFS. If the cheaper flash cells go bad, they are only cache; the actual data still resides on the real storage, which can be accessed at slower speeds, and the data can be automatically re-cached by ZFS.


Facebook has finally figured out that cheap Flash could reduce storage costs. Note the discussion on using lower quality NAND flash in this data center article.
Data center-class flash is typically far more expensive than spinning disks, but Frankovsky says there may be a way to make it worth it. "If you use the class of NAND [flash] in thumb drives, which is typically considered sweep or scrap NAND, and you use a really cool kind of controller algorithm to characterize which cells are good and which cells are not, you could potentially build a really high-performance cold storage solution at very low cost," he said.
A good thing cannot be kept from the market indefinitely. Now, if the Facebook team could just inquire with someone who has been doing what they are discussing for a half-decade, they could finish their contemplations. Their goal is pretty simple:

Facebook is burdened with lots of "cold storage," stuff written once and rarely accessed again. Even there, Frankovsky wants to increasingly use flash because of the failure rate of spinning disks. With tens of thousands of devices in operation, "we don't want technicians running around replacing hard drives," he said.
Cheap flash does go bad. While Facebook does not want technicians running around replacing spinning disks, they need to understand that they also do not want to be doing the same thing replacing flash chips. This analysis was a driving force behind the ZFS architecture.

[Illumos Logo]
On Hold in Illumos:
The discussion had already hit the OpenSolaris splinters regarding making the flash Level 2 ARC cache persistent across reboots. Of course, this simple change could offer some amazing possibilities for enhanced performance after a reboot (as well as some additional life expectancy for the cache components, which would not have to be re-written.)
Thoughts...
Unfortunately for Facebook, they could have been gaining this competitive edge a half-decade ago, had they implemented on Solaris, OpenSolaris, or one of the OpenSolaris splinters, which are so prevalent in the storage provider arenas. How Facebook decides to do its implementation will be an interesting question. Will they back up their cheap flash data on more cheap flash (which is still more expensive than disk storage), or will they use less expensive storage to provide the redundancy for the flash, as Sun designed, Oracle now leverages, and many other storage providers (i.e. Illumos, Nexenta, etc.) now leverage?

Wednesday, November 14, 2012

Automatic Storage Tiering

Image courtesy: WWPI.

Abstract:
Automatic Storage Tiering, or Hierarchical Storage Management, is the process of placing data onto the storage that is most cost effective, while meeting basic accessibility and efficiency requirements. There has been much movement in storage management over the past half-decade.



Early History:
When computers were first being built on boards, RAM (most expensive) held the most volatile data while ROM held the least changing data. EPROM's provided a way for users to occasionally change mostly static data (requiring a painfully slow erasing mechanism using light and a special burning mechanism using high voltage), placing the technology in a middle tier. EEPROM's provided a simpler way to update data on the same machine, without removing the chip for burning. Tape was created to archive storage for longer periods of time, but it was slow, so it went to the bottom of the capacity distribution pyramid. Rotating disks (sometimes referred to as rotating rust) were created, taking a middle storage tier. As disks became faster, the fastest disks (15,000 RPM) moved towards the upper part of the middle tier while slower disks (5,400 RPM or slower) moved towards the bottom part of the middle tier. Eventually, consumer-grade (not very reliable) IDE and SATA disks became available, occupying the higher areas of the lowest tier, slightly above tape.

Things seemed to settle out for awhile in the storage hierarchy; RAM, ROM, EEPROM, 15K FibreChannel/SCSI/SAS Disks, 7.2K FibreChannel/SCSI/SAS Disks, IDE/SATA, Tape - until the creation of ZFS by Sun Microsystems.
Sun Microsystems Logo
Storage Management Revolution:
In the early 2000's, Sun Microsystems started to invest more in flash technology. They anticipated a revolution in storage management with the increased performance of a technology called "Flash", which is little more than EEPROM's. These devices became known as Solid State Drives or SSD's. In 2005, Sun released ZFS under their Solaris 10 operating system and started adding features that included flash acceleration.

There was a general problem that Sun noticed: flash was either cheap with low reliability (Multi-Level Cell or MLC) or expensive with higher reliability (Single-Level Cell or SLC). It was also noted that flash was not as reliable as disks (for many write-heavy environments.) The basic idea was to turn automatic storage management on its head with the introduction of flash to ZFS under Solaris: RAM via ARC (Adaptive Read Cache), cheap Flash via L2ARC (Level 2 Adaptive Read Cache), expensive Flash via the Write Log, and cheap disks (protected with parity & CRC's at the block level.)
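Put together, that hierarchy maps directly onto the pool layout; the following is a sketch only, with hypothetical device names:

# Cheap disks hold the data, expensive SLC holds the intent log, cheap MLC
# holds the second-level read cache; RAM (the ARC) needs no configuration.
zpool create tank \
    mirror c1t0d0 c1t1d0 \
    log mirror c2t0d0 c2t1d0 \
    cache c3t0d0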

Sun Fire X4500 aka Thumper, courtesy Sun Microsystems

For the next 5-7 years, the industry experienced hand-wringing over what to do with the file system innovation introduced by Sun Microsystems under Solaris ZFS. More importantly, ZFS is a 128 bit file system, meaning massive data storage is now possible, on a scale that previously could not be imagined. The OS and File System were also open-sourced, meaning it was a 100% open solution.

With this technology built into all of their servers, the approach became increasingly important as Sun released mid-range storage systems based upon it, with a combination of high capacity, lower price, and higher speed access. Low end solutions based upon ZFS also started to hit the market, as well as super-high-end solutions in MPP clusters.

Drobo-5D, courtesy Drobo

Recent Vendor Advancements - Drobo:
A small-business centric company called Data Robotics, or Drobo, released a small RAID system with an innovative feature: add drives of different sizes, and the system adds capacity and reliability on-the-fly, with the loss of some disk space. While there is some disk space loss, the user is protected against the failure of any drive in the RAID, regardless of size, and a drive of any size can replace the bad unit.
Drobo-Mini, courtesy Drobo
The Drobo was very innovative, and they moved further into Automatic Storage Tiering with the release of flash drive recognition. When an SSD (an expensive flash drive) is added to the system, it is recognized and used to accelerate the storage of the rotating disks. A short review of the product was done by IT Business Edge. With a wide range of products, from the portable market (leveraging 2.5 inch drives), to the midsize market (leveraging 3.5 inch drives), to the small business market (leveraging up to 12 3.5 inch drives), this is a challenging low to medium end competitor.

The challenge with Drobo: it is a proprietary hardware solution, sitting on top of fairly expensive hardware, for the low to medium end market. This is not your $150 RAID box, where you can stick in a couple of drives and go.
 


Recent Vendor Advancements - Apple:
Not to be left out, Apple was going to bundle ZFS into every Apple Macintosh they were to ship. Soon enough, Apple canceled their ZFS announcement, canceled their open-source ZFS announcement, placed job ads for people to create a new file system, and went into hibernation for years.

Apple recently released a Mac OSX feature they referred to as their Fusion Drive. Mac Observer noted:
"all writes take place on the SSD drive, and are later moved to the mechanical drive if needed, resulting in faster initial writes. The Fusion will be available for the new iMac and new Mac mini models announced today"
Once again, the market was hit with an innovative product, bundled into an Operating System, not requiring proprietary hardware.


Implications for Network Management:
With continual improvement in storage technology, this will place an interesting burden on Network, Systems, and Storage Management vendors. How does one manage all of these solutions in an Open way, using Open protocols, such as SNMP?
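As a sketch of what such open management could look like (the hostname and community string are hypothetical), the standard Host Resources MIB already exposes basic capacity data over SNMP:

# Poll a storage appliance's standard storage table over SNMP
snmpwalk -v2c -c public storage-array.example.com HOST-RESOURCES-MIB::hrStorageTable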

The vendors to "crack" the management "nut" may be in the best position to have their product accepted into existing heterogeneous storage managed shops, or possibly supplant them.

Tuesday, August 14, 2012

ZFS: A Multi-Year Case Study in Moving From Desktop Mirroring (Part 3)

Abstract:
ZFS was created by Sun Microsystems to innovate the storage subsystem of computing systems by simultaneously expanding capacity & security exponentially while collapsing the formerly striated layers of storage (i.e. volume managers, file systems, RAID, etc.) into a single layer in order to deliver capabilities that would normally be very complex to achieve. One such innovation introduced in ZFS was the ability to dynamically add additional disks to an existing filesystem pool, remove the old disks, and dynamically expand the pool for filesystem usage. This paper discusses the upgrade of high capacity yet low cost mirrored external media under ZFS.

Case Study:
A particular Media Design House had formerly used multiple sets of external mirrored storage on desktops, as well as racks of archived optical media, in order to meet their storage requirements. A pair of (formerly high-end) 400 Gigabyte Firewire drives lost a drive. An additional pair of (formerly high-end) 500 Gigabyte Firewire drives lost a drive within a month. A media wall of CD's and DVD's was getting cumbersome to retain.

First Upgrade:
A newer version of Solaris 10 was released, which included more recent features. The Media House was pleased to adopt Update 8, with the possibility of supporting the Level 2 ARC for increased read performance and Intent Logging for increased write performance. A 64 bit PCI card supporting gigabit ethernet was used on the desktop SPARC platform, serving mirrored 1.5 Terabyte "green" disks over "green" gigabit ethernet switches. The Media House determined this configuration performed adequately.

ZIL Performance Testing:
Testing was performed to determine the benefit of leveraging a new feature in ZFS called the ZFS Intent Log, or ZIL. Testing was done with consumer grade USB flash devices in different configurations. It was determined that any flash could be utilized for the ZIL to gain a performance increase, but an enterprise grade SSD provided the best improvement, about 20% with the commonly used throughput loads of large file writes going to the mirror. It was decided at that point to hold off on the use of the SSD's, since the existing performance was adequate.

External USB Drive Difficulties:
The original Seagate 1.5 TB drives were working well in the mirrored pair, but one drive was "flaky" (it often reported errors and made a lot of "clicking" noises). The errors were reported in the "/var/adm/messages" log.

# more /var/adm/messages
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/usb@4,2/storage@1/disk@0,0 (sd17):
Jul 15 13:16:13 Ultra60         Error for Command: write(10)  Error Level: Retryable
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Requested Block: 973089160   Error Block: 973089160
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Vendor: Seagate  Serial Number:            
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Sense Key: Not Ready
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   ASC: 0x4 (LUN initializing command required), ASCQ: 0x2, FRU: 0x0
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/usb@4,2/storage@1/disk@0,0 (sd17):
Jul 15 13:16:13 Ultra60         Error for Command: write(10)  Error Level: Retryable
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Requested Block: 2885764654  Error Block: 2885764654
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Vendor: Seagate  Serial Number:            
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Sense Key: Not Ready
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   ASC: 0x4 (LUN initializing command required), ASCQ: 0x2, FRU: 0x0


It was clear that one drive was unreliable, but in a ZFS pair, the unreliable drive was not a significant liability.

Mirrored Capacity Constraints:
Eventually, the 1.5 TB pair was out of capacity.
# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool2  1.36T  1.33T  25.5G    98%  ONLINE  -
Point of Decision:
It was time to perform the drive upgrade. 2 TB drives had previously been purchased and were ready to be concatenated to the original set. Instead of concatenating the 2 TB drives to the 1.5 TB drives, as originally planned, a straight swap would be done, to eliminate the "flaky" drive in the 1.5 TB pair. The 1.5 TB pair could then be used for other, less critical purposes.

Target Drives to Swap:
The target drives to swap were both external USB. The zpool command provides the device names.
$ zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The
        pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0

errors: No known data errors
The earlier OS upgrade can be noted in the status output: the pool was not upgraded, since the new features were not yet required. The old ZFS version is just fine for this engagement, and it preserves the ability to move the drives to another SPARC in the office without having to worry about being on a newer version of Solaris 10.
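The on-disk version can be confirmed without committing to an upgrade; a minimal sketch against this pool:

# List any pools still running an older on-disk format
zpool upgrade
# Show the exact version number of this pool
zpool get version zpool2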

Scrubbing Production Dataset:
The production data set should be scrubbed, to validate that no silent data corruption was introduced to the set over the years by the "flaky" drive.
Ultra60/root# zpool scrub zpool2

It will take some time for the system to complete the operation, but the business can continue to function as the system performs the bit-by-bit checksum verification and repair across the 1.5TB of media.
Ultra60/root# zpool status zpool2
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The
        pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 39h33m with 0 errors on Wed Jul 18 00:27:19 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0

errors: No known data errors
While the scrub is running, 'zpool status' provides a time estimate so the consumer knows roughly when the operation will complete. Once the scrub is over, the 'zpool status' output above shows the time absorbed by the scrub.

Adding New Drives:
The new drives will be attached to create a 4-way mirror: the additional 2TB disks will be added to the existing 1.5TB mirrored set.
Ultra60/root# time zpool attach zpool2 c5t0d0s0 c8t0d0
real    0m21.39s
user    0m0.73s
sys     0m0.55s

Ultra60/root# time zpool attach zpool2 c8t0d0 c9t0d0

real    1m27.88s
user    0m0.77s
sys     0m0.59s
Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 0.00% done, 1043h38m to go
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0  42.1M resilvered
            c9t0d0    ONLINE       0     0     0  42.2M resilvered

errors: No known data errors
The second drive took more time to attach, since the first drive was already in the process of resilvering. After waiting a while, the estimates get better. Adding the additional pair to the existing pair, to make a 4-way mirror, completed in not much longer than it took to mirror a single drive - partially because each drive is on a dedicated USB port and the drives are split between 2 PCI buses.
Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 45h32m with 0 errors on Sun Aug  5 01:36:57 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0  1.34T resilvered
            c9t0d0    ONLINE       0     0     0  1.34T resilvered

errors: No known data errors

Detaching Old Small Drives

The 4-way mirror is very good for redundancy, but the purpose of this activity was to move the data from 2 smaller drives (where one drive was less reliable) to two newer drives, which should both be more reliable. The old disks now need to be detached.
Ultra60/root# time zpool detach zpool2 c4t0d0s0

real    0m1.43s
user    0m0.03s
sys     0m0.06s

Ultra60/root# time zpool detach zpool2 c5t0d0s0

real    0m1.36s
user    0m0.02s
sys     0m0.04s

As one can see, the activity to remove the mirrored drives from the 4-way mirror is very fast. The integrity of the pool can be validated through the zpool status command.

Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 45h32m with 0 errors on Sun Aug  5 01:36:57 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zpool2      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0  1.34T resilvered
            c9t0d0  ONLINE       0     0     0  1.34T resilvered

errors: No known data errors

Expanding the Pool

The pool is still the same size as the former drives. Under older versions of ZFS, the pool would automatically extend. Under newer versions, the extension is a manual process. (This is partially because there is no way to shrink a pool after a provisioning error, so the ZFS developers now make the administrator perform this step on purpose!)

Using Auto Expand Property

One option is to use the autoexpand option.
Ultra60/root# zpool set autoexpand=on zpool2

This feature may not be available, depending on the version of ZFS.  If it is not available, you may get the following error:

cannot set property for 'zpool2': invalid property 'autoexpand'

If you fall into this category, other options exist.

Using Online Expand Option

Another option is to use the online expand option.
Ultra60/root# zpool online -e zpool2 c8t0d0 c9t0d0

If this option is not available under the version of ZFS being used, the following error may occur:
invalid option 'e'
usage:
        online ...
Once again, if you fall into this category, other options exist.

Using Export / Import Option

When using an older version of ZFS, performing a zpool replace on both disks (individually) would have caused an automatic expansion. In other words, had that approach been taken, this step might have been unnecessary.

That approach would have nearly doubled the re-silvering time, however. The judgment call, in this case, was to build a 4-way mirror and shorten the overall re-silvering time.

With this old version of ZFS, taking the volume offline via export and bringing it back online via import is a safe and reasonably short method of forcing the growth.

Ultra60/root# zpool set autoexpand=on zpool2
cannot set property for 'zpool2': invalid property 'autoexpand'

Ultra60/root# time zpool export zpool2

real    9m15.31s
user    0m0.05s
sys     0m3.94s

Ultra60/root# zpool status
no pools available

Ultra60/root# time zpool import zpool2

real    0m19.30s
user    0m0.06s
sys     0m0.33s

Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool2      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0

errors: No known data errors

Ultra60/root# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool2  1.81T  1.34T   486G    73%  ONLINE  -
As noted above, an outage of roughly 9 minutes, weighed against a saving of roughly 40 hours of re-silvering, was determined to be an effective trade-off.




Sunday, October 16, 2011

ZFS: A Multi-Year Case Study in Moving From Desktop Mirroring (Part 2)



Abstract:
ZFS was created by Sun Microsystems to innovate the storage subsystem of computing systems by simultaneously expanding capacity & security exponentially while collapsing the formerly striated layers of storage (i.e. volume managers, file systems, RAID, etc.) into a single layer in order to deliver capabilities that would normally be very complex to achieve. One such innovation introduced in ZFS was the ability to place inexpensive limited-life solid state storage (FLASH media), which may offer fast (or at least more deterministic) random read or write access, into the storage hierarchy where it can enhance the performance of less deterministic rotating media. This paper discusses the use of various configurations of inexpensive flash to enhance the write performance of high capacity yet low cost mirrored external media with ZFS.

Case Study:
A particular Media Design House had formerly used multiple sets of external mirrored storage on desktops, as well as racks of archived optical media, in order to meet their storage requirements. A pair of (formerly high-end) 400 Gigabyte Firewire drives lost a drive. An additional pair of (formerly high-end) 500 Gigabyte Firewire drives lost a drive within a month. A media wall of CD's and DVD's was getting cumbersome to retain.

First Upgrade:
A newer version of Solaris 10 was released, which included more recent features. The Media House was pleased to adopt Update 8, with the possibility of supporting the Level 2 ARC for increased read performance and Intent Logging for increased write performance.

The Media House did not see the need to purchase flash for read or write logging at this time. The mirrored 1.5 Terabyte SAN performed adequately.


Second Upgrade:
The Media House started becoming concerned, about a year later, when 65% of their 1.5 Terabyte SAN storage had been consumed.
Ultra60/root# zpool list

NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool2  1.36T   905G   487G    65%  ONLINE  -
The decision to invest in an additional pair of 2 Terabyte drives for the SAN was an easy one. The external Seagate Expansion drives were selected because of the reliability of the former drives and the built-in power management, which would reduce power consumption.

Additional storage was purchased for the network, but if there was going to be an upgrade, a major question included: what kind of common flash media would perform best for the investment?


Multiple Flash Sticks or Solid State Disk?

Understanding that Flash Media normally has a high Write latency, the question in everyone's mind is: what would perform better, an army of flash sticks or a solid state disk?

This simple question started what became a testing rat hole, where people ask the question but the responses often come from anecdotal assumptions. The media house was interested in the real answer.

Testing Methodology

It was decided that the copying of large files to/from large drive pairs was the most accurate way to simulate the day to day operations of the design house. This is what they do with media files, so this is how the storage should be tested.

The first set of tests surrounded testing the write cache in different configurations.
  • The USB sticks would each use a dedicated 400Mbit port
  • USB stick mirroring would occur across 2 different PCI buses
  • 4x consumer grade 8 Gigabyte USB sticks from MicroCenter were procured
  • Approximately 900 Gigabytes of data would be copied during each test run
  • The same source mirror was used: the 1.5TB mirror
  • The same destination mirror would be used: the 2TB mirror
  • The same Ultra60 Creator 3D with dual 450MHz processors would be used
  • The SAN platform was maxed out at 2 GB of ECC RAM
  • The destination drives would be destroyed and re-mirrored between tests
  • Solaris 10 Update 8 would be used
The Base System
# Check patch release
Ultra60/root# uname -a
SunOS Ultra60 5.10 Generic_141444-09 sun4u sparc sun4u


# check OS release
Ultra60/root# cat /etc/release
Solaris 10 10/09 s10s_u8wos_08a SPARC
Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009


# check memory size
Ultra60/root# prtconf | grep Memory
Memory size: 2048 Megabytes


# status of zpool, show devices
Ultra60/root# zpool status zpool2
pool: zpool2
state: ONLINE
scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0

errors: No known data errors
The Base Test: No Write Cache

A standard needed to be created by which each additional run could be tested against. This base test was a straight create and copy.

ZFS is tremendously fast at creating a mirrored pool. A 2TB mirrored pool takes only 4 seconds to create on an old dual 450MHz UltraSPARC II.
# Create mirrored pool of 2x 2.0TB drives
Ultra60/root# time zpool create -m /u003 zpool3 mirror c8t0d0 c9t0d0

real 0m4.09s
user 0m0.74s
sys 0m0.75s
The data to be copied with source and destination storage is easily listed.
# show source and destination zpools
Ultra60/root# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
zpool2 1.36T 905G 487G 65% ONLINE -
zpool3 1.81T 85.5K 1.81T 0% ONLINE -

The copy of over 900 GB between mirrored USB pairs takes about 41 hours.
# perform copy of 905GBytes of data from old source to new destination zpool
Ultra60/root# cd /u002 ; time cp -r . /u003
real 41h6m14.98s
user 0m47.54s
sys 5h36m59.29s
The time to destroy the 2 TB mirrored pool holding 900GB of data was about 2 seconds.
# erase and unmount new destination zpool
Ultra60/root# time zpool destroy zpool3
real 0m2.19s
user 0m0.02s
sys 0m0.14s
Another Base Test: Quad Mirrored Write Cache

The ZFS Intent Log can be split from the mirror onto higher throughput media, for the purpose of speeding writes. Because this is a write cache, it is extremely important that this media is redundant - a loss to the write cache can result in a corrupt pool and loss of data.

The first test was to create a quad mirrored write cache. With 2 GB of RAM, there is absolutely no way that the quad 8 GB sticks would ever have more than a fraction of their flash used, but the hope is that such a small amount of flash in use would allow the commodity sticks to perform well.

The 4x 8GB sticks were inserted into the system, they were found, formatted (see this article for additional USB stick handling under Solaris 10), and the system was now ready to accept them for creating a new destination pool.

Creation of 4x mirror ZFS Intent Log with 2TB mirror took longer - 20 seconds.
# Create mirrored pool with 4x 8GB USB sticks for ZIL for highest reliability
Ultra60/root# time zpool create -m /u003 zpool3 \
mirror c8t0d0 c9t0d0 \
log mirror c1t0d0s0 c2t0d0s0 c6t0d0s0 c7t0d0s0
real 0m20.01s
user 0m0.77s
sys 0m1.36s
The new zpool clearly shows the 4-way mirrored intent log.
# status of zpool, show devices
Ultra60/root# zpool status zpool3
pool: zpool3
state: ONLINE
scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zpool3        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0
            c9t0d0    ONLINE       0     0     0
        logs
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c6t0d0s0  ONLINE       0     0     0
            c7t0d0s0  ONLINE       0     0     0

errors: No known data errors
No copy was done using the quad mirrored USB ZIL, because this level of redundancy was not needed.

A destroy of the 4 way mirrored ZIL with 2TB mirrored zpool still only took 2 seconds.

# destroy zpool3 to create without mirror for highest throughput
Ultra60/root# time zpool destroy zpool3
real 0m2.19s
user 0m0.02s
sys 0m0.14s
The intention of this setup was just to see if it was possible, ensure the USB sticks were functioning, and determine if adding an unreasonable amount of redundant ZIL to the system created any odd performance behaviors. Clearly, if this is acceptable, nearly every other realistic scenario that is tried will be fine.

Scenario One: 4x Striped USB Stick ZIL

The first scenario to test will be the 4-way striped USB Stick ZFS Intent Log. With 4 USB sticks, 2 sticks on each PCI bus, each stick on a dedicated USB 2.0 port - this should offer the greatest amount of throughput from these commodity flash sticks, but the least amount of security from a failed stick.
# Create zpool without mirror to round-robin USB sticks for highest throughput (dangerous)
Ultra60/root# time zpool create -m /u003 zpool3 \
mirror c8t0d0 c9t0d0 \
log c1t0d0s0 c2t0d0s0 c6t0d0s0 c7t0d0s0
real 0m19.17s
user 0m0.76s
sys 0m1.37s

# list zpools
Ultra60/root# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
zpool2 1.36T 905G 487G 65% ONLINE -
zpool3 1.81T 87K 1.81T 0% ONLINE -


# show status of zpool including devices
Ultra60/root# zpool status zpool3
pool: zpool3
state: ONLINE
scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        zpool3        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0
            c9t0d0    ONLINE       0     0     0
        logs
          c1t0d0s0    ONLINE       0     0     0
          c2t0d0s0    ONLINE       0     0     0
          c6t0d0s0    ONLINE       0     0     0
          c7t0d0s0    ONLINE       0     0     0
errors: No known data errors


# start copy of 905GB of data from mirrored 1.5TB to 2.0TB
Ultra60/root# cd /u002 ; time cp -r . /u003
real 37h12m43.54s
user 0m49.27s
sys 5h30m53.29s

# destroy it again for new test
Ultra60/root# time zpool destroy zpool3
real 0m2.77s
user 0m0.02s
sys 0m0.56s
The zpool creation took 19 seconds and the destroy almost 3 seconds, but the copy time decreased from 41 to 37 hours, about a 10% saving... with no redundancy.

Scenario Two: 2x Mirrored USB ZIL on 2TB Mirrored Pool

The quad striped ZIL offered a 10% boost with no redundancy; what if we instead added two mirrored pairs of USB ZIL sticks, offering write striping for speed and mirroring for redundancy?
# create zpool3 with pair of mirrored intent USB intent logs
Ultra60/root# time zpool create -m /u003 zpool3 mirror c8t0d0 c9t0d0 \
log mirror c1t0d0s0 c2t0d0s0 mirror c6t0d0s0 c7t0d0s0
real 0m19.20s
user 0m0.79s
sys 0m1.34s

# view new pool with pair of mirrored intent logs
Ultra60/root# zpool status zpool3
pool: zpool3
state: ONLINE
scrub: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        zpool3        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0
            c9t0d0    ONLINE       0     0     0
        logs
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c6t0d0s0  ONLINE       0     0     0
            c7t0d0s0  ONLINE       0     0     0
errors: No known data errors

# run capacity test
Ultra60/root# cd /u002 ; time cp -r . /u003
real 37h9m52.78s
user 0m48.88s
sys 5h31m30.28s


# destroy it again for new test
Ultra60/root# time zpool destroy zpool3
real 0m21.99s
user 0m0.02s
sys 0m0.31s
The results were almost identical. A 10% improvement in speed was measured. Splitting the commodity 8GB USB sticks into mirrored pairs offered redundancy without sacrificing performance.

If 4 USB sticks are to be purchased for a ZIL, don't bother striping all 4; split them into mirrored pairs and get the same 10% boost in speed.


Scenario Three: Vertex OCZ Solid State Disk

Purchasing 4 USB sticks for the purpose of a ZIL starts to approach the purchase price of a fast SATA SSD drive. On the UltraSPARC II processors, the drivers for SATA are lacking, so that is not necessarily a clear option.

The decision was made to test a USB to SATA conversion kit with the SSD and run a single SSD ZIL.
# new flash disk, format
Ultra60/root# format -e
Searching for disks...done
AVAILABLE DISK SELECTIONS:
...
2. c1t0d0
/pci@1f,2000/usb@1,2/storage@4/disk@0,0
...


# create zpool3 with SATA-to-USB flash disk intent USB intent log
Ultra60/root# time zpool create -m /u003 zpool3 mirror c8t0d0 c9t0d0 log c1t0d0
real 0m5.07s
user 0m0.74s
sys 0m1.15s
# show zpool3 with intent log
Ultra60/root# zpool status zpool3
  pool: zpool3
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool3      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0
        logs
          c1t0d0    ONLINE       0     0     0

# run capacity test
Ultra60/root# cd /u002 ; time cp -r . /u003
real 32h57m40.99s
user 0m52.04s
sys 5h43m13.31s
The single SSD over a SATA to USB interface provided a 20% boost in throughput.

In Conclusion

Using commodity parts, a ZFS SAN can have the write performance boosted using USB sticks by 10% as well as by 20% using an SSD. The SSD is a more reliable device and better choice for ZIL.