Monday, October 19, 2015

Solaris 11.2: Extending ZFS rpool Under Virtualized x86

Solaris 11.2: Extending ZFS "rpool" Under Virtualized x86

Abstract

Often, shortly after an OS is first installed, a project requires resources or redundancy beyond what was originally in scope. An early solution was to add disks and mount additional file systems on them, but those file systems always sat beside the original one, pushing the effort of using them onto applications. Virtual file systems made it possible to add or mount additional storage anywhere in the file system tree. Volume managers were later created to provide volumes that file systems could sit on top of, with tweaks to the file systems to allow expansion. In the modern world, file systems like ZFS provide all of those capabilities. In a virtualized environment, the underlying disks are no longer even disks and can be extended through shared storage, making file systems like ZFS even more important.

[Solaris Zone/Container Virtualization for Solaris 10+]

Use Cases

This document discusses use cases where Solaris 11.2 is installed in an x86 environment on top of VMware, and a vSphere administrator extends the virtual disks on which the ZFS root file system was installed.

Two specific use cases are evaluated:
1) A simple Solaris 11.2 x86 installation with a single "rpool" Root Pool where it needs a mirror and was sized too small.
2) A more complex Solaris 11.2 x86 installation with a mirrored "rpool" Root Pool where it was sized too small.

A final Use Case is evaluated, which can be applied after either one of the previous cases:
3) Extend swap space on a ZFS "rpool" Root Pool

In ZFS terminology, "autoexpand" is the feature that lets a pool grow to fill an extended underlying virtual disk file. The VMware vSphere procedure for extending the virtual disk itself is out of scope for this article. This process is expected to work with other hypervisors as well.


[Solaris Logo, courtesy former Sun Microsystems]

Use Case 1: Simple Complexity OS Installation Problem

Problem Background: Single Disk Lacks Redundancy and Capacity

When a simple Solaris 11.2 installation occurs, the OS may originally reside on a single disk.
sun9999/root# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       0     0     0
          c2t1d0  ONLINE       0     0     0

errors: No known data errors

sun9999/root#

As the platform becomes more important, the root pool may require additional disk space (beyond the original 230 GB) as well as additional redundancy (beyond the single disk).
sun9999/root# zpool list
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  228G   182G  46.4G  79%  1.00x  ONLINE  -

sun9999/root#

Under Solaris, these attributes can be augmented without additional software or reboots.
[Sun Microsystems Logo]

Solution: Add and Extend Virtual Disks

Solaris systems on x86 are increasingly deployed under VMware. Virtual disks may be the original allocation, and these disks can be added and later even extended by the hypervisor. It may take some time before Solaris 11 recognizes that the underlying virtual disks have changed and can be extended. The disks must be carefully identified before making any changes. Only three steps are actually required; the remainder is careful identification and verification.
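Stripped of all verification, the three required steps boil down to the commands below. This is a minimal sketch that only prints the commands rather than executing them, since they require a live Solaris host; the device names are the ones from this article's example and must be verified on your own system first.

```shell
# Sketch: the three commands actually required for this use case.
# Device names c2t1d0 (existing rpool disk) and c2t0d0 (new mirror
# candidate) come from this article's example; verify yours first.
existing=c2t1d0
candidate=c2t0d0
cmds="zpool attach -f rpool $existing $candidate
zpool set autoexpand=on rpool
devfsadm -Cv"
# Printed, not executed: these need a live Solaris host.
printf '%s\n' "$cmds"
```

Everything else in this section is identifying which disks those should be, and confirming the pool's health before and after.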

[OCZ solid state hard disk]

Identifying the Disk Candidates

The disks can be identified with "format" command.
sun9999/root# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c2t0d0
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c2t1d0
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c2t2d0
          /pci@0,0/pci15ad,1976@10/sd@2,0

Specify disk (enter its number):

The three disks identified above are clearly virtual, but the role of each disk is unclear.

The "zpool status" performed earlier identified Disk "1" as a root pool disk.

The older-style Virtual File System Table will show other disks with older file system types. In the following case, Disk "2" clearly holds a UFS file system, which cannot be used for root.
sun9999/root# grep c2 /etc/vfstab
/dev/dsk/c2t2d0s0 /dev/rdsk/c2t2d0s0 /u000 ufs 1 yes onerror=umount
This leaves Disk "0", to be verified via "format", as a possible candidate for root mirroring.
Specify disk (enter its number): 0
selecting c2t0d0
[disk formatted]
Note: detected additional allowable expansion storage space that can be
added to current SMI label's computed capacity.
Select to adjust the label capacity.
...
format>
Solaris 11.2 has noted that Disk "0" can also be extended.

The "format" command will also verify the other sliced.
Specify disk (enter its number): 1
selecting c2t1d0
[disk formatted]
/dev/dsk/c2t1d0s1 is part of active ZFS pool rpool. Please see zpool(1M).

...
format> disk
...

Specify disk (enter its number)[1]: 2
selecting c2t2d0
[disk formatted]
Warning: Current Disk has mounted partitions.
/dev/dsk/c2t2d0s0 is currently mounted on /u000. Please see umount(1M).

format> quit

sun9999/root#

Clearly, Disk "0" is the only disk available for mirroring the root pool.

[Sun Microsystems Storage Server]
Adding Disk "0" to Root Pool "rpool"

It was already demonstrated that the single "c2t1d0" device is in the "rpool" and that the new candidate disk is "c2t0d0". To create a mirror, use "zpool attach" to attach the new candidate device to the existing device, then observe progress with "zpool status" until resilvering is complete.
sun9999/root# zpool attach -f rpool c2t1d0 c2t0d0
Make sure to wait until resilver is done before rebooting.
sun9999/root# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
        Run 'zpool status -v' to see device specific details.
  scan: resilver in progress since Thu Oct 15 17:19:49 2015
    184G scanned
    39.5G resilvered at 135M/s, 21.09% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t0d0  DEGRADED     0     0     0  (resilvering)

errors: No known data errors
sun9999/root#
The previous resilver suggests that future maintenance on the mirror, with a similar amount of data, may take roughly 20 minutes.
[Seagate External Hard Disk]

Extending Root Pool "rpool"

Verify there is a known good mirror so the root pool can be extended safely.
sun9999/root# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 184G in 0h19m with 0 errors on Thu Oct 15 17:39:34 2015
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0

errors: No known data errors


sun9999/root#

The newly added "c2t0d0" virtual disk has been automatically extended by zpool.
sun9999/root# prtvtoc -h /dev/dsk/c2t0d0
       0     24    00        256    524288    524543
       1      4    00     524544 1048035039 1048559582
       8     11    00  1048559583     16384 1048575966
sun9999/root# prtvtoc -h /dev/dsk/c2t1d0
       0     24    00        256    524288    524543
       1      4    00     524544 481803999 482328542
       8     11    00  482328543     16384 482344926
sun9999/root#
Next, once the hypervisor has resized the underlying "c2t1d0" virtual disk, enable autoexpand (auto-extend) on rpool.
sun9999/root# zpool set autoexpand=on rpool
sun9999/root# zpool get autoexpand rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  autoexpand  on     local

sun9999/root#
Run "devfsadm" so Solaris detects the new size of the resized "c2t1d0" disk.
sun9999/root# devfsadm -Cv
...
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s14
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s15
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s8
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s9
sun9999/root#
The expansion should now take place, nearly instantaneously.

[Oracle Logo]

Verifying the Root Pool "rpool" Expansion

Note that the original "c2t1d0" disk was extended.
sun9999/root# prtvtoc -h /dev/dsk/c2t0d0
       0     24    00        256    524288    524543
       1      4    00     524544 1048035039 1048559582
       8     11    00  1048559583     16384 1048575966

sun9999/root# prtvtoc -h /dev/dsk/c2t1d0
       0     24    00        256    524288    524543
       1      4    00     524544 1048035039 1048559582
       8     11    00  1048559583     16384 1048575966


sun9999/root#
The disk space is now extended to 500 GB.
sun9999/root# zpool list
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  498G   184G  314G  37%  1.00x  ONLINE  -

sun9999/root#
This is also a good time to scrub the new disks to ensure there are no errors; the scrub will take about an hour.

sun9999/root# zpool scrub rpool
sun9999/root# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 1h3m with 0 errors on Thu Oct 15 19:58:09 2015
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0

errors: No known data errors
sun9999/root#

The Solaris installation on the ZFS Root Pool "rpool" is healthy.

[Oracle Servers]

Use Case 2: Medium Complexity OS Installation

Problem: Mirrored Disks Lack Capacity

The previous section was extremely detailed; this section will be briefer. Like the previous section, there is a lack of capacity in the root pool. Unlike the previous section, this pool is already mirrored.

Solution: Extend Mirrored Root Pool "rpool"

The following use case simply extends the Solaris 11 Root Pool "rpool" after the VMware administrator has already increased the size of the root virtual disks. Only two steps are actually required.
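Since the pool is already mirrored, those two required steps reduce to the commands below. This sketch only prints the commands, since executing them requires a live Solaris host on which both virtual disks backing the rpool mirror have already been grown by the hypervisor.

```shell
# Sketch: the only two commands required once the hypervisor has
# grown both virtual disks backing the rpool mirror.
cmds='zpool set autoexpand=on rpool
devfsadm -Cv'
# Printed, not executed: these need a live Solaris host.
printf '%s\n' "$cmds"
```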

Extend Root Pool "rpool"

The following steps take only seconds to run.

sun9998/root# zpool list
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  228G   179G  48.9G  78%  1.00x  ONLINE  -


sun9998/root# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 99.1G in 0h11m with 0 errors on Tue Apr  7 15:48:39 2015
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0

errors: No known data errors


sun9998/root# echo | format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c2t0d0
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c2t2d0
          /pci@0,0/pci15ad,1976@10/sd@2,0
       2. c2t3d0
          /pci@0,0/pci15ad,1976@10/sd@3,0
Specify disk (enter its number): Specify disk (enter its number):

sun9998/root# zpool set autoexpand=on rpool
sun9998/root# zpool get autoexpand rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  autoexpand  on     local


sun9998/root# devfsadm -Cv
devfsadm[7155]: verbose: removing file: /dev/dsk/c2t0d0s10
devfsadm[7155]: verbose: removing file: /dev/dsk/c2t0d0s11
...

devfsadm[7155]: verbose: removing file: /dev/rdsk/c2t3d0s8
devfsadm[7155]: verbose: removing file: /dev/rdsk/c2t3d0s9

sun9998/root# zpool list
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  498G   179G  319G  35%  1.00x  ONLINE  -


sun9998/root#

And, the effort is done, as fast as you can type the commands.

[Sun Microsystems Flash Module]

Verify Root Pool "rpool"

The following verification is for the paranoid: the scrub is kicked off in the background, performance is monitored for about 20 seconds at 2-second polling intervals, and the verification may take roughly 1 to 5 hours, depending on how busy the system and its I/O subsystem are.

sun9998/root# zpool scrub rpool

sun9998/root# zpool iostat rpool 2 10
          capacity     operations    bandwidth
pool   alloc   free   read  write   read  write
-----  -----  -----  -----  -----  -----  -----
rpool   179G   319G     11    111  1.13M  2.55M
rpool   179G   319G    121      5  5.58M  38.0K
rpool   179G   319G    103    189  6.15M  2.53M
rpool   179G   319G    161      8  4.60M   118K
rpool   179G   319G     82      3  10.3M  16.0K
rpool   179G   319G    199    113  6.38M  1.56M
rpool   179G   319G     31      5  1.57M  38.0K
rpool   179G   319G    117      3  9.64M  18.0K
rpool   179G   319G     30     96  2.28M  1.74M
rpool   179G   319G     24      4  3.12M  36.0K

sun9998/root# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 4h32m with 0 errors on Fri Oct 16 00:42:28 2015
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0

errors: No known data errors
sun9998/root#
The Solaris installation and ZFS Root Pool "rpool" are healthy.

Use Case 3: Add Swap in a ZFS "rpool" Root Pool

Problem: Swap Space Lacking

After more disk space is added to the ZFS "rpool" Root Pool, it may be desirable to extend the swap space. This must be done as a separate operation, after the "rpool" has already been extended.

Solution: Add Swap to ZFS and the Virtual File System Table

The user community determines they need to increase swap from 12 GB to 20 GB, but they cannot afford a reboot. Two steps are required:
1) add swap space
2) make the swap space permanent
First, existing swap space must be understood.

Review Swap Space

Swap space can be reviewed for reservation, activation, and persistence with "swap", "zfs", and "grep".
sun9999/root# zfs list rpool/swap
NAME         USED  AVAIL  REFER  MOUNTPOINT
rpool/swap  12.4G   306G  12.0G  -


sun9999/root# swap -l -h
swapfile                 dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 279,1     4K      12G      12G


sun9999/root# grep swap /etc/vfstab
swap                      -  /tmp    tmpfs  - yes     -
/dev/zvol/dsk/rpool/swap  -  -       swap   - no      -


sun9999/root# 
Note, the "zfs list" above will only work with a single swap dataset. When adding a second swap dataset, a different methodology must be used.

Swap Space Dataset Creation

Adding swap space to the existing root pool without a reboot requires creating another dataset. To increase from 12 GB to 20 GB, the additional dataset should be 8 GB. This takes a split second.
sun9999/root# zfs create -V 8G rpool/swap2
sun9999/root# 
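The sizing arithmetic behind the 8 GB figure can be sketched in shell; the command is printed rather than executed, since running it requires a live Solaris host with the rpool present.

```shell
# Sizing math from this scenario: grow swap from 12 GB to 20 GB
# without a reboot, by adding a second swap volume dataset.
current_gb=12
target_gb=20
add_gb=$((target_gb - current_gb))
# Print (rather than run) the dataset-creation command:
printf 'zfs create -V %dG rpool/swap2\n' "$add_gb"
```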
The swap dataset is now ready to be activated manually.

Swap Space Activation


The swap space is activated using the "swap" command. This takes a split second.
sun9999/root# swap -a /dev/zvol/dsk/rpool/swap2

sun9999/root# swap -l -h
swapfile                    dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap  279,1        4K      12G      12G
/dev/zvol/dsk/rpool/swap2 279,3        4K     8.0G     8.0G

sun9999/root#
This swap space is only temporary, until the next reboot.

Swap Space Persistence

To make the swap space persist across a reboot, it must be added to the Virtual File System Table.
sun9999/root# cp -p /etc/vfstab /etc/vfstab.2015_10_16_dh
sun9999/root# vi /etc/vfstab

(add the following line)
/dev/zvol/dsk/rpool/swap2  -  -       swap   - no      -
sun9999/root#
The added swap space will now be activated automatically upon the next reboot.

Swap Space Validation

Verify the ZFS swap datasets, the active swap devices, and the persistent vfstab entries:
sun9999/root# zfs list | grep swap
rpool/swap                         12.4G   298G  12.0G  -
rpool/swap2                        8.25G   297G  8.00G  -


sun9999/root# swap -l -h
swapfile                    dev    swaplo   blocks     free 
/dev/zvol/dsk/rpool/swap  279,1        4K      12G      12G
/dev/zvol/dsk/rpool/swap2 279,3        4K     8.0G     8.0G


sun9999/root# grep swap /etc/vfstab
swap                       -   /tmp  tmpfs  -  yes     -
/dev/zvol/dsk/rpool/swap   -   -     swap   -  no      -
/dev/zvol/dsk/rpool/swap2  -   -     swap   -  no      -


sun9999/root#
Note that the "zfs list" command now uses "grep" to capture multiple datasets.
A total of [12 GB + 8 GB =] 20 GB of swap is now available.

Conclusions

Most of the above document is fluff, filled with paranoia, checking important items multiple times to ensure no data loss. Very few commands are required to perform the mirroring and root pool extension. Solaris provides a seamless methodology at the OS level for activities that are often painful under other operating systems or require additional third-party software.
