Abstract:
ZFS was created by Sun Microsystems to modernize the storage subsystem of
computing systems, simultaneously expanding capacity and data protection
while collapsing the formerly striated layers of storage
(i.e. volume managers, file systems, RAID, etc.) into a single layer in
order to deliver capabilities that would normally be very complex to
achieve. One such innovation introduced in ZFS was the ability to dynamically attach additional disks to an existing pool, remove the old disks, and expand the pool for filesystem usage. This paper discusses
the upgrade of high-capacity yet low-cost mirrored external media under ZFS.
Case Study:
A particular
Media Design House had formerly used multiple mirrored external drives
on desktops, as well as racks of archived optical media, in order to meet
their storage requirements. A pair of (formerly high-end) 400 Gigabyte
Firewire drives lost a drive. An additional pair of (formerly high-end)
500 Gigabyte Firewire drives experienced a drive loss within a month. A media wall of CD's and DVD's was becoming cumbersome to maintain.
First Upgrade:
A
newer version of Solaris 10 was released, which included more recent
features. The Media House was pleased to accept Update 8, with the
possibility of supporting the Level 2 ARC for increased read performance and
Intent Logging for increased write performance. A 64-bit PCI card supporting gigabit Ethernet was installed in the desktop SPARC platform, serving mirrored 1.5 Terabyte "green" disks over "green" gigabit Ethernet switches. The Media House determined this configuration performed adequately.
ZIL Performance Testing:
Testing was performed to determine the benefit of leveraging a new feature in ZFS called the ZFS Intent Log, or ZIL. Testing was done across consumer-grade USB SSD's in different configurations. It was determined that any flash device could be utilized as a separate ZIL to gain a performance increase, but an
enterprise-grade SSD provided the best improvement, roughly 20% with the commonly used workload of large file writes going to the mirror. It was decided at that point to hold off on the use of the SSD's, since the existing performance was adequate.
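Had the SSD been adopted later, attaching it as a dedicated log device is a one-line operation. The following is a minimal sketch only; the device name c10t0d0 is hypothetical and would vary by system.
Ultra60/root# zpool add zpool2 log c10t0d0
A dedicated log device primarily accelerates synchronous writes, which is part of why the measured benefit depends so heavily on the workload.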
External USB Drive Difficulties:
The original Seagate 1.5 TB drives had been working well in the mirrored pair, but one drive was "flaky" (it often reported errors and produced a lot of audible "clicking"). The errors were recorded in the "/var/adm/messages" log.
# more /var/adm/messages
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/usb@4,2/storage@1/disk@0,0 (sd17):
Jul 15 13:16:13 Ultra60 Error for Command: write(10) Error Level: Retryable
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] Requested Block: 973089160 Error Block: 973089160
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] Vendor: Seagate Serial Number:
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] Sense Key: Not Ready
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] ASC: 0x4 (LUN initializing command required), ASCQ: 0x2, FRU: 0x0
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/usb@4,2/storage@1/disk@0,0 (sd17):
Jul 15 13:16:13 Ultra60 Error for Command: write(10) Error Level: Retryable
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] Requested Block: 2885764654 Error Block: 2885764654
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] Vendor: Seagate Serial Number:
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] Sense Key: Not Ready
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice] ASC: 0x4 (LUN initializing command required), ASCQ: 0x2, FRU: 0x0
It was clear that one drive was unreliable, but in a ZFS mirror the unreliable drive was not a significant liability: every block is checksummed, so a bad read from the flaky drive is detected and satisfied from its healthy partner.
Mirrored Capacity Constraints:
Eventually, the 1.5 TB pair was out of capacity.
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
zpool2 1.36T 1.33T 25.5G 98% ONLINE -
Point of Decision:
It was time to perform the drive upgrade. 2 TB drives had previously been purchased and were ready to be concatenated to the original set. Instead of concatenating the 2 TB drives to the 1.5 TB drives, as originally planned, a straight swap would be done, to eliminate the "flaky" drive in the 1.5 TB pair. The 1.5 TB pair could then be redeployed for other, less critical uses.
Target Drives to Swap:
The target drives to swap were both external USB. The 'zpool status' command provides the device names.
$ zpool status
pool: zpool2
state: ONLINE
status: The pool is formatted using an older on-disk format. The
pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
zpool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
c4t0d0s0 ONLINE 0 0 0
c5t0d0s0 ONLINE 0 0 0
errors: No known data errors
The earlier OS upgrade can be noted in the status output above: the pool itself was not upgraded, since the new features were not yet required. The old ZFS on-disk version is fine for this engagement, and it preserves the ability to move the drives to another SPARC in the office without having to worry about whether that host is on a newer update of Solaris 10.
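To confirm which pools are at an older on-disk version, and which versions the running release supports, the zpool upgrade command can be run without arguments (it only reports; it does not modify the pool), and with the -v flag to list the supported versions. The exact output varies by Solaris release.
Ultra60/root# zpool upgrade
Ultra60/root# zpool upgrade -v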
Scrubbing Production Dataset:
The production data set should be scrubbed, to validate that no silent data corruption was introduced to the set over the years by the "flaky" drive.
Ultra60/root# zpool scrub zpool2
It will take some time for the system to complete the operation, but the business can continue to function while the system performs the block-by-block checksum verification and repair across the 1.5 TB of media.
Ultra60/root# zpool status zpool2
pool: zpool2
state: ONLINE
status: The pool is formatted using an older on-disk format. The
pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: scrub completed after 39h33m with 0 errors on Wed Jul 18 00:27:19 2012
config:
NAME STATE READ WRITE CKSUM
zpool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
c4t0d0s0 ONLINE 0 0 0
c5t0d0s0 ONLINE 0 0 0
errors: No known data errors
While the scrub is running, 'zpool status' provides a time estimate so the consumer knows roughly when the operation will complete. Once the scrub is over, the 'zpool status' output above shows the total time absorbed by the scrub.
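For reference, a status check issued while a scrub is still in flight reports progress and an estimate on the scrub line. The figures below are illustrative only and were not captured from this system.
Ultra60/root# zpool status zpool2 | grep scrub
 scrub: scrub in progress for 2h10m, 5.42% done, 37h51m to go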
Adding New Drives:
The new drives will be placed in a 4-way mirror: the additional 2 TB disks will be attached to the existing 1.5 TB mirrored set.
Ultra60/root# time zpool attach zpool2 c5t0d0s0 c8t0d0
real 0m21.39s
user 0m0.73s
sys 0m0.55s
Ultra60/root# time zpool attach zpool2 c8t0d0 c9t0d0
real 1m27.88s
user 0m0.77s
sys 0m0.59s
Ultra60/root# zpool status
pool: zpool2
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h1m, 0.00% done, 1043h38m to go
config:
NAME STATE READ WRITE CKSUM
zpool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
c4t0d0s0 ONLINE 0 0 0
c5t0d0s0 ONLINE 0 0 0
c8t0d0 ONLINE 0 0 0 42.1M resilvered
c9t0d0 ONLINE 0 0 0 42.2M resilvered
errors: No known data errors
The second drive took more time to attach, since the first drive was already in the process of resilvering. After waiting a while, the estimates improve. Adding the additional pair to the existing pair, to make a 4-way mirror, completed in not much longer than it would take to mirror a single drive - partially because each drive is on a dedicated USB port and the drives are split between 2 PCI buses.
Ultra60/root# zpool status
pool: zpool2
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: resilver completed after 45h32m with 0 errors on Sun Aug 5 01:36:57 2012
config:
NAME STATE READ WRITE CKSUM
zpool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
c4t0d0s0 ONLINE 0 0 0
c5t0d0s0 ONLINE 0 0 0
c8t0d0 ONLINE 0 0 0 1.34T resilvered
c9t0d0 ONLINE 0 0 0 1.34T resilvered
errors: No known data errors
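The old drives should not be detached until the resilver shown above has completed. A minimal polling loop such as the following (the 10 minute interval is arbitrary) can watch for completion unattended:
Ultra60/root# while zpool status zpool2 | grep 'resilver in progress' > /dev/null; do sleep 600; done; echo resilver complete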
Detaching Old Small Drives:
The 4-way mirror is very good for redundancy, but the purpose of this activity was to move the data from 2 smaller drives (where one drive was less reliable) to two newer drives, which should both be more reliable. The old disks now need to be detached.
Ultra60/root# time zpool detach zpool2 c4t0d0s0
real 0m1.43s
user 0m0.03s
sys 0m0.06s
Ultra60/root# time zpool detach zpool2 c5t0d0s0
real 0m1.36s
user 0m0.02s
sys 0m0.04s
As one can see, the activity to remove the mirrored drives from the 4-way mirror is very fast. The integrity of the pool can be validated through the zpool status command.
Ultra60/root# zpool status
pool: zpool2
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: resilver completed after 45h32m with 0 errors on Sun Aug 5 01:36:57 2012
config:
NAME STATE READ WRITE CKSUM
zpool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
c8t0d0 ONLINE 0 0 0 1.34T resilvered
c9t0d0 ONLINE 0 0 0 1.34T resilvered
errors: No known data errors
Expanding the Pool:
The pool is still the same size as the former drives. Under older versions of ZFS, the pool would automatically extend; under newer versions, the extension is a manual process. (This is partially because there is no way to shrink a pool after a provisioning error, so the ZFS developers now require the administrator to grow the pool on purpose!)
Using Auto Expand Property:
One option is to use the autoexpand option.
Ultra60/root# zpool set autoexpand=on zpool2
This feature may not be available, depending on the version of ZFS. If it is not available, you may get the following error:
cannot set property for 'zpool2': invalid property 'autoexpand'
If you fall into this category, other options exist.
Using Online Expand Option:
Another option is to use the online expand option.
Ultra60/root# zpool online -e zpool2 c8t0d0 c9t0d0
If this option is not available under the version of ZFS being used, the following error may occur:
invalid option 'e'
usage:
online ...
Once again, if you fall into this category, other options exist.
Using Export / Import Option:
When using an older version of ZFS, running 'zpool replace' on both disks (individually) would have caused an automatic expansion. In other words, had that approach been taken, this step may have been unnecessary in this case.
It would, however, have nearly doubled the re-silvering time. The judgment call, in this case, was to build a 4-way mirror and shorten the overall completion time.
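For completeness, the replace-based alternative would have looked roughly like the following, swapping one new drive in for each old drive and letting the first resilver finish before issuing the second replace:
Ultra60/root# zpool replace zpool2 c4t0d0s0 c8t0d0
Ultra60/root# zpool replace zpool2 c5t0d0s0 c9t0d0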
With this old version of ZFS, taking the pool offline via 'zpool export' and bringing it back online via 'zpool import' is a safe and reasonably quick method of forcing the growth.
Ultra60/root# zpool set autoexpand=on zpool2
cannot set property for 'zpool2': invalid property 'autoexpand'
Ultra60/root# time zpool export zpool2
real 9m15.31s
user 0m0.05s
sys 0m3.94s
Ultra60/root# zpool status
no pools available
Ultra60/root# time zpool import zpool2
real 0m19.30s
user 0m0.06s
sys 0m0.33s
Ultra60/root# zpool status
pool: zpool2
state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on older software versions.
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
zpool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
c8t0d0 ONLINE 0 0 0
c9t0d0 ONLINE 0 0 0
errors: No known data errors
Ultra60/root# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
zpool2 1.81T 1.34T 486G 73% ONLINE -
As noted above, an outage of roughly 9 minutes in exchange for saving roughly 40 hours of additional re-silvering was determined to be an effective trade-off.