Friday, August 31, 2012

Installing Solaris on Former Non-Solaris Disks


Abstract:
When installing [SPARC] Solaris on a disk drive that previously held a non-Solaris operating system, the installer may terminate after the system identification steps with the error "One of the following problems exists: Hardware failure Unformatted disk". This is due to the lack of a proper disk label. The user can exit the installer, perform the labeling exercise, and then restart the installation.

What's in a Label?
The label describes to the operating system what is on the media. It holds a table of slices; some legacy systems support up to 16 slices, while SPARC Solaris uses 8. There are two supported label types: SMI and EFI. SMI is the traditional VTOC label used with UFS filesystems (and is required for SPARC boot disks), while EFI is typically used with ZFS filesystems.

Slices on a label may overlap: slice 2 conventionally covers the entire disk (encapsulating all other slices), slice 0 holds the root filesystem (and boot code), slice 1 normally holds swap (augmenting physical memory by acting as virtual memory), and the remaining slices can be used for other filesystems such as /var (so a growing log file does not take down the system) or /export/home (so a user's home directory cannot grow out of control and exhaust the root filesystem).
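To inspect how an existing Solaris disk is sliced, the VTOC can be printed with the standard prtvtoc utility against slice 2, the whole-disk slice; the device name below is only an example for this sketch.

# prtvtoc /dev/rdsk/c0t0d0s2
(each slice is listed with its tag, flags, first sector, sector count, and last sector)
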
Labeling a Disk:
After the failed installation, the user will be dropped to a root "#" prompt. The disk can now be labeled through the "format" command and the system rebooted to the cdrom install media. Choose the SMI label if prompted - this has been tested up to Solaris 10 Update 10.


# format -e
format> disk
(choose disk)
format> label
[0] SMI Label
[1] EFI Label
Specify Label type[1]: 0
format> quit
# cd / ; init 0

ok boot cdrom

Conclusion:
Non-Sun and non-Oracle disks can be used on older equipment to provide a storage or performance boost when installing Solaris 10. This procedure was used with Solaris 10 Update 10.

Wednesday, August 29, 2012

ZFS: A Multi-Year Case Study in Moving From Desktop Mirroring (Part 4)

Abstract:
ZFS was created by Sun Microsystems to innovate the storage subsystem of computing systems by simultaneously expanding capacity & security exponentially while collapsing the formerly striated layers of storage (i.e. volume managers, file systems, RAID, etc.) into a single layer in order to deliver capabilities that would normally be very complex to achieve. One such innovation introduced in ZFS was the ability to dynamically add additional disks to an existing filesystem pool, remove the old disks, and dynamically expand the pool for filesystem usage. This paper discusses the upgrade of high capacity yet low cost mirrored external media under ZFS.

Case Study:
A particular Media Design House had formerly used multiple mirrored external drives on desktops, as well as racks of archived optical media, in order to meet their storage requirements. A pair of (formerly high-end) 400 Gigabyte Firewire drives lost a drive. An additional pair of (formerly high-end) 500 Gigabyte Firewire drives lost a drive within a month of that. A media wall of CDs and DVDs was becoming cumbersome to retain.

First Upgrade - Migration to Solaris:
A newer version of Solaris 10 was released, which included more recent features. The Media House was pleased to accept Update 8, with the possibility of supporting the Level 2 ARC for increased read performance and the ZFS Intent Log for increased write performance. A 64-bit PCI card supporting gigabit ethernet was added to the desktop SPARC platform, serving mirrored 1.5 Terabyte "green" disks over "green" gigabit ethernet switches. The Media House determined this configuration performed adequately.


ZIL Performance Testing:
Testing was performed to determine the benefit of leveraging a dedicated device for the ZFS Intent Log, or ZIL. Testing was done across consumer-grade USB SSDs in different configurations. It was determined that any flash device used for the ZIL gave a performance increase, but an enterprise-grade SSD provided the best improvement, roughly 20% under the common workload of large file writes going to the mirror. It was decided at that point to hold off on the SSDs, since the existing performance was adequate.
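Had the SSDs been adopted, attaching one as a dedicated log device would look roughly like the sketch below; the device names are hypothetical, and a mirrored pair of log devices is the more cautious configuration.

Ultra60/root# zpool add zpool2 log c10t0d0
(or, to mirror the intent log across two SSDs)
Ultra60/root# zpool add zpool2 log mirror c10t0d0 c11t0d0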

Second Upgrade - Drives Replaced:
One of the USB drives had exhibited some odd behavior from the time it was purchased, but it was decided the drives behaved well enough under ZFS mirroring. Eventually, that drive started to perform poorly and was logging occasional errors. When the drives were nearly out of capacity, they were upgraded from a 1.5 TB mirror to a 2 TB mirror.

Third Upgrade - SPARC Upgraded:
The Ultra60 desktop was being moved to a new location in the media house. A PM (preventative maintenance) was conducted to remove dust, but the Ultra60 did not boot in the new location. It was time to move the storage to a newer server.

The old Ultra60 was a nice unit, with 2 GB of RAM and dual 450 MHz UltraSPARC-II CPUs, but it did not offer some of the features that modern servers offer. An updated V240 platform was chosen: dual 1.5 GHz UltraSPARC-IIIi CPUs, 4 GB of RAM, redundant power supplies, and an upgraded UPS.

Listing the Drives:

After booting the new system and attaching the USB drives, a general "disks" command was run to force a discovery of the drives. Whether this step is strictly required is debatable, but it is a habit many seasoned system administrators keep.
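On Solaris 10, devfsadm is the supported way to rescan for newly attached devices, with "disks" surviving as a legacy wrapper; a minimal sketch, using the verbose flag only to show which device links get created.

V240/root$ devfsadm -v -c disk
(or, the legacy form)
V240/root$ disks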

Listing the drives is simple to do through an "ls" of the raw device links:
V240/root$ ls -la /dev/rdsk/c*0
lrwxrwxrwx 1 root root 46 Jan  2  2010 /dev/rdsk/c0t0d0s0 -> ../../devices/pci@1e,600000/ide@d/sd@0,0:a,raw
lrwxrwxrwx 1 root root 47 Jan  2  2010 /dev/rdsk/c1t0d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@0,0:a,raw
lrwxrwxrwx 1 root root 47 Jan  2  2010 /dev/rdsk/c1t1d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@1,0:a,raw
lrwxrwxrwx 1 root root 47 Mar 25  2010 /dev/rdsk/c1t2d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@2,0:a,raw
lrwxrwxrwx 1 root root 47 Sep  4  2010 /dev/rdsk/c1t3d0s0 -> ../../devices/pci@1c,600000/scsi@2/sd@3,0:a,raw
lrwxrwxrwx 1 root root 59 Aug 14 21:20 /dev/rdsk/c3t0d0 -> ../../devices/pci@1e,600000/usb@a/storage@2/disk@0,0:wd,raw
lrwxrwxrwx 1 root root 58 Aug 14 21:20 /dev/rdsk/c3t0d0s0 -> ../../devices/pci@1e,600000/usb@a/storage@2/disk@0,0:a,raw
lrwxrwxrwx 1 root root 59 Aug 14 21:20 /dev/rdsk/c4t0d0 -> ../../devices/pci@1e,600000/usb@a/storage@1/disk@0,0:wd,raw
lrwxrwxrwx 1 root root 58 Aug 14 21:20 /dev/rdsk/c4t0d0s0 -> ../../devices/pci@1e,600000/usb@a/storage@1/disk@0,0:a,raw

The USB storage was recognized by the operating system. ZFS, however, may not automatically recognize a pool whose drives are plugged into different USB ports on the new machine; the pool becomes visible through the "zpool import" command.
V240/root$ zpool status
no pools available
V240/root$ zpool list
no pools available
V240/root$ zpool import
  pool: zpool2
    id: 10599167846544478303
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        zpool2      ONLINE
          mirror    ONLINE
            c3t0d0  ONLINE
            c4t0d0  ONLINE

Importing Drives on New Platform:
Since the drives were taken from another platform, ZFS tried to warn the administrator, but the admin is all too well aware that the old Ultra60 is dead and that importing the drive mirror is exactly what is desired.
V240/root$ time zpool import zpool2
cannot import 'zpool2': pool may be in use from other system, it was last accessed by Ultra60 (hostid: 0x80c6e89a) on Mon Aug 13 20:10:14 2012
use '-f' to import anyway

real    0m6.48s
user    0m0.01s
sys     0m0.05s

The drives are ready for import; using the force flag makes the storage available.
V240/root$ time zpool import -f zpool2

real    0m23.64s
user    0m0.02s
sys     0m0.08s

The pool was imported quickly.
V240/root$ zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool2      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0

errors: No known data errors
V240/root$ zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool2  1.81T  1.34T   480G    74%  ONLINE  -
The storage move to the newer SPARC server went very smoothly.
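Although not part of the original session, a scrub after a chassis move is a cheap sanity check and can run while the pool remains in service; a minimal sketch:

V240/root$ zpool scrub zpool2
V240/root$ zpool status zpool2
(status reports scrub progress and, eventually, the completion time)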

Conclusions:
ZFS for this ongoing engagement has proved very reliable. The ability to reduce rebuild time from days to seconds, upgrade underlying OS releases, retain compatibility with older file system releases, increase write throughput by adding consumer or commercial grade flash storage, recover from drive failures, and recover from chassis failure demonstrates the robustness of ZFS as the basis for a storage system.

Tuesday, August 28, 2012

VMWare Resolves Some Issues

VMWare 5.1 Resolves Some Issues

Abstract:
With the advent of simple and cost-effective virtualization under Solaris 10 - Zones, LDoms, and VirtualBox - pressure has been placed upon the dominant virtualization vendors to create less expensive alternatives. VMWare, after being purchased by EMC, decided to move in the opposite direction, making purchasing of VMWare very difficult with the odd pricing constraints introduced in ESXi 5.0 in July 2011. The market has now moved into 2012, and ESXi 5.1 has been released, fixing some of VMWare's problems.

Compatibility Issue Resolved:
Previously, if customers wanted to move an older VM to newer hardware, the VMs needed to be upgraded; in other words, there were compatibility issues which needed to be resolved. VMs created under ESX Server 3.5 and later will now run under ESXi 5.1 unchanged. This is good news for service providers.

No Longer Windows Bound:
Customers running VMWare ESXi were previously required to use a lousy Microsoft Windows platform to manage it. When managing an ESXi server in a DMZ, this makes little sense for a service provider. This has now been resolved with a web interface.

Memory Tax Issue Resolved:
The pricing constraints of ESXi 5.0 forced service providers to decide: is VMWare the correct hypervisor for the job, and are Windows and/or Linux worth the aggravation of being nickel-and-dimed to death? When pricing hardware and hypervisor licensing for a new cluster - where a managed services provider is purchasing infrastructure before the first customer deal is sold and does not yet know how much memory each instance will require - how does one know how much to buy?

Clearly, EMC's VMWare did not have a clue. The confusion the pricing created for managed service providers negatively impacted purchasing of other EMC software products such as ITOI (aka Ionix, aka SMARTS), RSA Archer, enVision, etc. If a managed service provider cannot determine what to buy, it will not buy from that vendor. Solaris is clearly the better choice for Network Management, and other vendors are clearly the better choice for tools bound to VMWare & Windows.

The removal of the memory constraints in ESXi 5.1 was a good move and simplifies pricing. EMC software is now in a better position to compete against other virtualization platforms.

Outstanding Core Issues:
For reasonable flexibility in the data center, when there is a spike in usage there needs to be a way to easily migrate heavily used live instances to less utilized hypervisors. Dynamic migration with autobalancing is included with Oracle LDoms, but is not quite there yet with VMWare.

When dealing with network virtualization, if one is trying to emulate a WAN environment, one can spin up dozens of zones under a Solaris 11 platform and apply WAN characteristics to the virtual network (latency, throughput, etc.); an illustrative sketch follows. Technology like Solaris Crossbow is missing from VMWare.
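As a rough sketch of the idea under Solaris 11 (the link names are invented for illustration): an etherstub acts as a virtual switch, VNICs on the etherstub are handed to zones, and Crossbow's bandwidth cap approximates a slow WAN link. Latency shaping would require an additional tool beyond these commands.

# dladm create-etherstub wanstub0
# dladm create-vnic -l wanstub0 wanvnic0
# dladm set-linkprop -p maxbw=2M wanvnic0
(cap the virtual link to roughly T1/E1-class throughput)
# dladm show-linkprop -p maxbw wanvnic0
(verify the cap; the VNIC is then assigned to a zone via zonecfg)
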
Conclusions:
VMWare is a great benefit to the Windows and Linux world, but constraints by the vendor made purchasing difficult and implementation less desirable. Some of the issues have been resolved, but management is not yet what it needs to be for managed service providers.

Tuesday, August 14, 2012

ZFS: A Multi-Year Case Study in Moving From Desktop Mirroring (Part 3)

Abstract:
ZFS was created by Sun Microsystems to innovate the storage subsystem of computing systems by simultaneously expanding capacity & security exponentially while collapsing the formerly striated layers of storage (i.e. volume managers, file systems, RAID, etc.) into a single layer in order to deliver capabilities that would normally be very complex to achieve. One such innovation introduced in ZFS was the ability to dynamically add additional disks to an existing filesystem pool, remove the old disks, and dynamically expand the pool for filesystem usage. This paper discusses the upgrade of high capacity yet low cost mirrored external media under ZFS.

Case Study:
A particular Media Design House had formerly used multiple mirrored external drives on desktops, as well as racks of archived optical media, in order to meet their storage requirements. A pair of (formerly high-end) 400 Gigabyte Firewire drives lost a drive. An additional pair of (formerly high-end) 500 Gigabyte Firewire drives lost a drive within a month of that. A media wall of CDs and DVDs was becoming cumbersome to retain.

First Upgrade:
A newer version of Solaris 10 was released, which included more recent features. The Media House was pleased to accept Update 8, with the possibility of supporting the Level 2 ARC for increased read performance and the ZFS Intent Log for increased write performance. A 64-bit PCI card supporting gigabit ethernet was added to the desktop SPARC platform, serving mirrored 1.5 Terabyte "green" disks over "green" gigabit ethernet switches. The Media House determined this configuration performed adequately.

ZIL Performance Testing:
Testing was performed to determine the benefit of leveraging a dedicated device for the ZFS Intent Log, or ZIL. Testing was done across consumer-grade USB SSDs in different configurations. It was determined that any flash device used for the ZIL gave a performance increase, but an enterprise-grade SSD provided the best improvement, roughly 20% under the common workload of large file writes going to the mirror. It was decided at that point to hold off on the SSDs, since the existing performance was adequate.

External USB Drive Difficulties:
The original Seagate 1.5 TB drives had been working well in the mirrored pair. One drive, however, was "flaky" (it often reported errors and produced a lot of "clicking"). The errors were recorded in the "/var/adm/messages" log.

# more /var/adm/messages
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/usb@4,2/storage@1/disk@0,0 (sd17):
Jul 15 13:16:13 Ultra60         Error for Command: write(10)  Error Level: Retryable
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Requested Block: 973089160   Error Block: 973089160
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Vendor: Seagate  Serial Number:            
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Sense Key: Not Ready
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   ASC: 0x4 (LUN initializing command required), ASCQ: 0x2, FRU: 0x0
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/usb@4,2/storage@1/disk@0,0 (sd17):
Jul 15 13:16:13 Ultra60         Error for Command: write(10)  Error Level: Retryable
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Requested Block: 2885764654  Error Block: 2885764654
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Vendor: Seagate  Serial Number:            
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   Sense Key: Not Ready
Jul 15 13:16:13 Ultra60 scsi: [ID 107833 kern.notice]   ASC: 0x4 (LUN initializing command required), ASCQ: 0x2, FRU: 0x0


It was clear that one drive was unreliable, but in a ZFS pair, the unreliable drive was not a significant liability.
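Beyond the syslog entries, Solaris keeps running per-device error counters, which give a quick read on just how "flaky" a drive is; the standard iostat utility reports them (the sd17 instance in the log can then be matched to its cXtYdZ name in the listing).

Ultra60/root# iostat -En
(each device is summarized with its Soft, Hard, and Transport error counts plus vendor and model details)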

Mirrored Capacity Constraints:
Eventually, the 1.5 TB pair was out of capacity.
# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool2  1.36T  1.33T  25.5G    98%  ONLINE  -
Point of Decision:
It was time to perform the drive upgrade. 2 TB drives had previously been purchased and were ready to be concatenated to the original set. Instead of concatenating the 2 TB drives to the 1.5 TB drives, as originally planned, a straight swap would be done to eliminate the "flaky" drive in the 1.5 TB pair. The 1.5 TB pair could then be reused for other, less critical purposes.

Target Drives to Swap:
The target drives to swap were both external USB. The zpool command provides the device names.
$ zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0

errors: No known data errors
Note from the status output that the pool was not upgraded during the earlier OS upgrade, since the newer ZFS features were not yet required. The old pool version is just fine for this engagement: the newer features are not needed, and it preserves the ability to move the drives to another SPARC in the office without worrying about whether that machine runs a newer release of Solaris 10.
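To see which on-disk version a pool is running and what the installed software supports, the zpool upgrade subcommand can be used read-only; nothing is modified unless a pool name (or -a) is supplied.

Ultra60/root# zpool upgrade
(lists pools formatted with older versions)
Ultra60/root# zpool upgrade -v
(lists the pool versions and features this software release supports)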

Scrubbing Production Dataset:
The production data set should be scrubbed to validate that no silent data corruption was introduced to the set over the years by the "flaky" drive.
Ultra60/root# zpool scrub zpool2

It takes some time for the system to complete the operation, but the business can continue to function while the system verifies (and, where needed, repairs) the checksum of every block across the 1.5 TB of media.
Ultra60/root# zpool status zpool2
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 39h33m with 0 errors on Wed Jul 18 00:27:19 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0

errors: No known data errors
While the scrub is running, 'zpool status' provides a time estimate so the consumer has a rough idea of when the operation will complete. Once the scrub is over, the output above shows the total time the scrub absorbed (39h33m in this case).
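Given how long a scrub takes on this USB media, it may be worth scheduling one on a regular basis rather than ad hoc; a minimal sketch of a monthly entry in root's crontab (the schedule itself is an assumption, not part of the original engagement).

Ultra60/root# crontab -e
(then add a line such as the following, for 02:00 on the first of each month)
0 2 1 * * /usr/sbin/zpool scrub zpool2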

Adding New Drives:
The new drives will first be placed into a 4-way mirror: the two additional 2 TB disks will be attached to the existing 1.5 TB mirrored set.
Ultra60/root# time zpool attach zpool2 c5t0d0s0 c8t0d0
real    0m21.39s
user    0m0.73s
sys     0m0.55s

Ultra60/root# time zpool attach zpool2 c8t0d0 c9t0d0

real    1m27.88s
user    0m0.77s
sys     0m0.59s
Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 0.00% done, 1043h38m to go
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0  42.1M resilvered
            c9t0d0    ONLINE       0     0     0  42.2M resilvered

errors: No known data errors
The second drive took more time to attach, since the first drive was already in the process of resilvering. After waiting a while, the estimates improve. Attaching the additional pair to the existing pair, to make a 4-way mirror, completed in not much longer than it took to mirror a single drive - partially because each drive is on a dedicated USB port and the drives are split between 2 PCI buses.
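While the resilver runs, per-device throughput can be watched without interfering with the operation; a minimal sketch using the standard zpool iostat subcommand, where the trailing 5 is a sampling interval in seconds.

Ultra60/root# zpool iostat -v zpool2 5
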
Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 45h32m with 0 errors on Sun Aug  5 01:36:57 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zpool2        ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t0d0s0  ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c8t0d0    ONLINE       0     0     0  1.34T resilvered
            c9t0d0    ONLINE       0     0     0  1.34T resilvered

errors: No known data errors

Detaching Old Small Drives

The 4-way mirror is excellent for redundancy, but the purpose of this activity was to move the data from the two smaller drives (one of which was less reliable) to the two newer drives, which should both be more reliable. The old disks now need to be detached.
Ultra60/root# time zpool detach zpool2 c4t0d0s0

real    0m1.43s
user    0m0.03s
sys     0m0.06s

Ultra60/root# time zpool detach zpool2 c5t0d0s0

real    0m1.36s
user    0m0.02s
sys     0m0.04s

As one can see, detaching the old drives from the 4-way mirror is very fast. The integrity of the pool can be validated through the zpool status command.

Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: resilver completed after 45h32m with 0 errors on Sun Aug  5 01:36:57 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zpool2      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0  1.34T resilvered
            c9t0d0  ONLINE       0     0     0  1.34T resilvered

errors: No known data errors

Expanding the Pool

The pool is still the same size as the former drives. Under older versions of ZFS, the pool would extend automatically; under newer versions, the extension is a manual process. (This is partially because there is no way to shrink a pool after a provisioning error, so the ZFS developers now make the administrator expand a pool on purpose.)

Using Auto Expand Property

One option is to use the autoexpand option.
Ultra60/root# zpool set autoexpand=on zpool2

This feature may not be available, depending on the version of ZFS.  If it is not available, you may get the following error:

cannot set property for 'zpool2': invalid property 'autoexpand'

If you fall into this category, other options exist.

Using Online Expand Option

Another option is to use the online expand option.
Ultra60/root# zpool online -e zpool2 c8t0d0 c9t0d0

If this option is not available under the version of ZFS being used, the following error may occur:
invalid option 'e'
usage:
        online ...
Once again, if you fall into this category, other options exist.

Using Export / Import Option

When using an older version of ZFS, running the zpool replace operation on both disks (individually) would have caused an automatic expansion. In other words, had that approach been taken, this step might have been unnecessary in this case.

That approach would have nearly doubled the re-silvering time, however, since each replacement disk resilvers in turn. The judgment call in this case was to build the 4-way mirror and accept a manual expansion step in exchange for a shorter overall completion time.
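For reference, the replace-based path would have looked roughly like the sketch below; it was not run in this engagement, the device names are those from the session above, and each replace triggers its own full resilver.

Ultra60/root# zpool replace zpool2 c4t0d0s0 c8t0d0
(wait for the resilver to finish, then)
Ultra60/root# zpool replace zpool2 c5t0d0s0 c9t0d0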

With this old version of ZFS, taking the pool offline via 'zpool export' and bringing it back online via 'zpool import' is a safe and reasonably quick method of forcing the growth.

Ultra60/root# zpool set autoexpand=on zpool2
cannot set property for 'zpool2': invalid property 'autoexpand'

Ultra60/root# time zpool export zpool2

real    9m15.31s
user    0m0.05s
sys     0m3.94s

Ultra60/root# zpool status
no pools available

Ultra60/root# time zpool import zpool2

real    0m19.30s
user    0m0.06s
sys     0m0.33s

Ultra60/root# zpool status
  pool: zpool2
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool2      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c9t0d0  ONLINE       0     0     0

errors: No known data errors

Ultra60/root# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zpool2  1.81T  1.34T   486G    73%  ONLINE  -
As noted above, roughly 9 minutes of outage in exchange for saving some 40 hours of additional re-silvering was determined to be an effective trade-off.