Solaris 11.2: Extending ZFS "rpool" Under Virtualized x86
Abstract
When an OS is first installed, a project often ends up needing more resources or redundancy than were originally in scope. Adding disks as additional file systems was an early solution, but each new disk sat alongside the original file system, and applications were left to sort out which one to use. Virtual file systems were created so that additional storage could be mounted anywhere in a file system tree. Volume managers came later, providing volumes for file systems to sit on, along with tweaks to file systems to allow expansion. Modern file systems like ZFS provide all of those capabilities. In a virtualized environment, the underlying disks are no longer even physical disks and can be extended using shared storage, making file systems like ZFS even more important.
Use Cases
This document discusses use cases where Solaris 11.2 was installed on x86 under VMware and a vSphere administrator extends the virtual disks on which the ZFS root file system was installed.
Two specific use cases are evaluated:
1) A simple Solaris 11.2 x86 installation with a single-disk "rpool" Root Pool that needs a mirror and was sized too small.
2) A more complex Solaris 11.2 x86 installation with a mirrored "rpool" Root Pool that was sized too small.
A final Use Case is evaluated, which can be applied after either one of the previous cases:
3) Extend swap space on a ZFS "rpool" Root Pool
ZFS uses the term "autoexpand" for a pool growing to fill an extended virtual disk. The VMware vSphere virtual disk extension itself is out of scope for this article. The process is expected to work with other hypervisors.
Use Case 1: Simple Complexity OS Installation
Problem Background: Single Disk Lacks Redundancy and Capacity
A simple Solaris 11.2 installation may originally have been made on a single disk.
sun9999/root# zpool status
pool: rpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
c2t1d0 ONLINE 0 0 0
errors: No known data errors
sun9999/root#
As the platform becomes more important, additional disk space (beyond the original 230GB) may be required in the root pool, as well as additional redundancy (beyond the single disk).
sun9999/root# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
rpool 228G 182G 46.4G 79% 1.00x ONLINE -
sun9999/root#
Under Solaris, these attributes can be augmented without additional software or reboots.
Solution: Add and Extend Virtual Disks
Solaris systems under x86 are increasingly deployed under VMware. The original allocation may be virtual disks, and the hypervisor can add more disks and later even extend them. It will take some time before Solaris 11 recognizes that the underlying virtual disks have changed and can be extended. The disks must be carefully identified before making any changes. Only the 3 steps in purple are required.
Identifying the Disk Candidates
The disks can be identified with the "format" command.
sun9999/root# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c2t0d0
/pci@0,0/pci15ad,1976@10/sd@0,0
1. c2t1d0
/pci@0,0/pci15ad,1976@10/sd@1,0
2. c2t2d0
/pci@0,0/pci15ad,1976@10/sd@2,0
Specify disk (enter its number):
The 3 disks identified above are clearly virtual, but the role of each disk is unclear.
The "zpool status" performed earlier identified Disk "1" as a root pool disk.
The older style Virtual File System Table will show other disks with older file system types. In the following case, Disk "2" clearly holds a UFS file system, so it cannot be used for the root pool.
sun9999/root# grep c2 /etc/vfstab
/dev/dsk/c2t2d0s0 /dev/rdsk/c2t2d0s0 /u000 ufs 1 yes onerror=umount
This leaves us with Disk "0", to be verified via format, which may be a good candidate for root mirroring.
Specify disk (enter its number): 0
selecting c2t0d0
[disk formatted]
Note: detected additional allowable expansion storage space that can be
added to current SMI label's computed capacity.
Select to adjust the label capacity.
...
format>
Solaris 11.2 has noted that Disk "0" can also be extended.
The "format" command will also verify the other disks.
Specify disk (enter its number): 1
selecting c2t1d0
[disk formatted]
/dev/dsk/c2t1d0s1 is part of active ZFS pool rpool. Please see zpool(1M).
...
format> disk
...
Specify disk (enter its number)[1]: 2
selecting c2t2d0
[disk formatted]
Warning: Current Disk has mounted partitions.
/dev/dsk/c2t2d0s0 is currently mounted on /u000. Please see umount(1M).
format> quit
sun9999/root#
Clearly, no other disk is available, with the exception of Disk "0", for mirroring the root pool.
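The elimination above can be sketched as a small filter. This is a minimal illustration, not part of the original procedure: the function names list_disks and candidate_disks are hypothetical, and the disks to exclude are still gathered by hand from "zpool status" and /etc/vfstab.

```shell
# list_disks: read `format` menu output on stdin and print one disk name per line.
# It matches menu lines such as "0. c2t0d0" and ignores the device path lines.
list_disks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ { print $2 }'
}

# candidate_disks: print each disk name from stdin that is NOT listed in the
# arguments (the disks already known to be in use).
candidate_disks() {
    while read -r disk; do
        in_use=no
        for used in "$@"; do
            [ "$disk" = "$used" ] && in_use=yes
        done
        [ "$in_use" = no ] && echo "$disk"
    done
    return 0
}
```

On a live system this might be driven with something like `echo | format | list_disks | candidate_disks c2t1d0 c2t2d0`. Any candidate it prints should still be inspected with "format" before it is attached.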
Adding Disk "0" to Root Pool "rpool"
It was already demonstrated that the single "c2t1d0" device is in "rpool" and that the new candidate disk is "c2t0d0". To create a mirror, use "zpool attach" to attach the candidate disk to the existing disk, then observe progress with "zpool status" until resilvering is completed.
sun9999/root# zpool attach -f rpool c2t1d0 c2t0d0
Make sure to wait until resilver is done before rebooting.
sun9999/root# zpool status
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function in a degraded state.
action: Wait for the resilver to complete.
Run 'zpool status -v' to see device specific details.
scan: resilver in progress since Thu Oct 15 17:19:49 2015
184G scanned
39.5G resilvered at 135M/s, 21.09% done, 0h18m to go
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c2t1d0 ONLINE 0 0 0
c2t0d0 DEGRADED 0 0 0 (resilvering)
errors: No known data errors
sun9999/root#
The previous resilver suggests future maintenance on the mirror with similar data may take ~20 minutes.
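Waiting for the resilver can be scripted rather than watched. The following is a minimal sketch that assumes the "resilver in progress" wording shown in the status output above. The function names and the POLL_SECONDS variable are hypothetical.

```shell
# resilver_in_progress: succeed if `zpool status` text on stdin reports an
# active resilver, based on the "scan:" line wording.
resilver_in_progress() {
    grep 'resilver in progress' > /dev/null
}

# wait_for_resilver: poll `zpool status` for the given pool until the
# resilver is no longer reported. POLL_SECONDS defaults to 60.
wait_for_resilver() {
    pool=$1
    while zpool status "$pool" | resilver_in_progress; do
        sleep "${POLL_SECONDS:-60}"
    done
}
```

Calling `wait_for_resilver rpool` before a planned reboot would block until the mirror is in sync. It does not check for errors, so "zpool status" should still be reviewed afterward.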
Extending Root Pool "rpool"
Verify there is a known good mirror so the root pool can be extended safely.
sun9999/root# zpool status
pool: rpool
state: ONLINE
scan: resilvered 184G in 0h19m with 0 errors on Thu Oct 15 17:39:34 2015
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c2t1d0 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
errors: No known data errors
sun9999/root#
The newly added "c2t0d0" virtual disk has been automatically extended by zpool.
sun9999/root# prtvtoc -h /dev/dsk/c2t0d0
0 24 00 256 524288 524543
1 4 00 524544 1048035039 1048559582
8 11 00 1048559583 16384 1048575966
sun9999/root# prtvtoc -h /dev/dsk/c2t1d0
0 24 00 256 524288 524543
1 4 00 524544 481803999 482328542
8 11 00 482328543 16384 482344926
sun9999/root#
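The "prtvtoc" figures above are sector counts, which makes the two disks hard to compare at a glance. A small conversion helps, assuming 512-byte sectors (the "-h" flag suppresses the header that would normally state the sector size, so this is an assumption). The function names are hypothetical.

```shell
# sectors_to_gib: convert a sector count to GiB, truncated to one decimal.
# Assumes 512-byte sectors, so 2097152 sectors make up one GiB.
sectors_to_gib() {
    tenths=$(( $1 * 10 / 2097152 ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

# slice_gib: read `prtvtoc -h` output on stdin and print the size in GiB of
# the slice whose number is given as the first argument.
# Column 1 is the slice number and column 5 is the sector count.
slice_gib() {
    while read -r slice tag flags first count last rest; do
        [ "$slice" = "$1" ] && sectors_to_gib "$count"
    done
    return 0
}
```

Under that assumption, slice 1 on "c2t0d0" (1048035039 sectors) works out to about 499.7 GiB, while slice 1 on "c2t1d0" (481803999 sectors) is still about 229.7 GiB. The mirror can only be as large as its smaller side, which is why the pool has not grown yet.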
Next, enable autoexpand on rpool so the pool resizes once the "c2t1d0" disk has been resized.
sun9999/root# zpool set autoexpand=on rpool
sun9999/root# zpool get autoexpand rpool
NAME PROPERTY VALUE SOURCE
rpool autoexpand on local
sun9999/root#
Detect the new disk size for the existing "c2t1d0" disk that was resized.
sun9999/root# devfsadm -Cv
...
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s14
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s15
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s8
devfsadm[13903]: verbose: removing file: /dev/rdsk/c2t1d0s9
sun9999/root#
The expansion should now take place, nearly instantaneously.
Verifying the Root Pool "rpool" Expansion
Note that the original "c2t1d0" disk was extended.
sun9999/root# prtvtoc -h /dev/dsk/c2t0d0
0 24 00 256 524288 524543
1 4 00 524544 1048035039 1048559582
8 11 00 1048559583 16384 1048575966
sun9999/root# prtvtoc -h /dev/dsk/c2t1d0
0 24 00 256 524288 524543
1 4 00 524544 1048035039 1048559582
8 11 00 1048559583 16384 1048575966
sun9999/root#
The disk space is now extended to 500GB.
sun9999/root# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
rpool 498G 184G 314G 37% 1.00x ONLINE -
sun9999/root#
This is not a bad time to scrub the new disks to ensure there are no errors; it will take about 1 hour.
sun9999/root# zpool scrub rpool
sun9999/root# zpool status
pool: rpool
state: ONLINE
scan: scrub repaired 0 in 1h3m with 0 errors on Thu Oct 15 19:58:09 2015
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c2t1d0 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
errors: No known data errors
sun9999/root#
The Solaris installation on the ZFS Root Pool "rpool" is healthy.
Use Case 2: Medium Complexity OS Installation
Problem: Mirrored Disks Lack Capacity
The previous section was extremely detailed; this section will be briefer. Like the previous section, there is a lack of capacity in the root pool. Unlike the previous section, this pool is already mirrored.
Solution: Extend Mirrored Root Pool "rpool"
The following use case is merely to extend the Solaris 11 Root Pool "rpool" after the VMware Administrator had already increased the size of the root virtual disks. Note, only the two steps in purple are required.
Extend Root Pool "rpool"
The following steps take only seconds to run.
sun9998/root# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
rpool 228G 179G 48.9G 78% 1.00x ONLINE -
sun9998/root# zpool status
pool: rpool
state: ONLINE
scan: resilvered 99.1G in 0h11m with 0 errors on Tue Apr 7 15:48:39 2015
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c2t3d0 ONLINE 0 0 0
errors: No known data errors
sun9998/root# echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c2t0d0
/pci@0,0/pci15ad,1976@10/sd@0,0
1. c2t2d0
/pci@0,0/pci15ad,1976@10/sd@2,0
2. c2t3d0
/pci@0,0/pci15ad,1976@10/sd@3,0
Specify disk (enter its number): Specify disk (enter its number):
sun9998/root# zpool set autoexpand=on rpool
sun9998/root# zpool get autoexpand rpool
NAME PROPERTY VALUE SOURCE
rpool autoexpand on local
sun9998/root# devfsadm -Cv
devfsadm[7155]: verbose: removing file: /dev/dsk/c2t0d0s10
devfsadm[7155]: verbose: removing file: /dev/dsk/c2t0d0s11
...
devfsadm[7155]: verbose: removing file: /dev/rdsk/c2t3d0s8
devfsadm[7155]: verbose: removing file: /dev/rdsk/c2t3d0s9
sun9998/root# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
rpool 498G 179G 319G 35% 1.00x ONLINE -
sun9998/root#
And, the effort is done, as fast as you can type the commands.
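The transcript above can be condensed into a small dry-run wrapper, useful for reviewing the commands before touching a root pool. This is a sketch only. It assumes the two required steps are "zpool set autoexpand=on" and "devfsadm -Cv", and the names run, expand_pool, and DRY_RUN are hypothetical.

```shell
# run: print the command when DRY_RUN is 1 (the default here),
# otherwise execute it as given.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# expand_pool: hypothetical helper wrapping the commands from the transcript.
# It enables autoexpand, rebuilds the device links, then lists the pool.
expand_pool() {
    pool=$1
    run zpool set autoexpand=on "$pool"
    run devfsadm -Cv
    run zpool list "$pool"
}
```

With DRY_RUN left at its default, `expand_pool rpool` only prints the three commands. Setting `DRY_RUN=0` would execute them, which should only be done after the virtual disks have been extended by the hypervisor.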
Verify Root Pool "rpool"
The following verification is for the paranoid. The scrub is kicked off in the background, performance is monitored for about 20 seconds at 2-second polls, and the verification may take about 1-5 hours (depending on how busy the system or I/O subsystem is).
sun9998/root# zpool scrub rpool
sun9998/root# zpool iostat rpool 2 10
capacity operations bandwidth
pool alloc free read write read write
----- ----- ----- ----- ----- ----- -----
rpool 179G 319G 11 111 1.13M 2.55M
rpool 179G 319G 121 5 5.58M 38.0K
rpool 179G 319G 103 189 6.15M 2.53M
rpool 179G 319G 161 8 4.60M 118K
rpool 179G 319G 82 3 10.3M 16.0K
rpool 179G 319G 199 113 6.38M 1.56M
rpool 179G 319G 31 5 1.57M 38.0K
rpool 179G 319G 117 3 9.64M 18.0K
rpool 179G 319G 30 96 2.28M 1.74M
rpool 179G 319G 24 4 3.12M 36.0K
sun9998/root# zpool status
pool: rpool
state: ONLINE
scan: scrub repaired 0 in 4h32m with 0 errors on Fri Oct 16 00:42:28 2015
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c2t3d0 ONLINE 0 0 0
errors: No known data errors
sun9998/root#
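Because the scrub can run for hours, the final check is easy to automate. The following sketch assumes the "scan:" and "errors:" line wording shown in the status output above. The function name scrub_clean is hypothetical.

```shell
# scrub_clean: read `zpool status` output on stdin and succeed only if it
# reports a completed scrub with 0 errors and no known data errors.
scrub_clean() {
    status=$(cat)
    printf '%s\n' "$status" | grep 'scan: scrub repaired .* with 0 errors' > /dev/null &&
        printf '%s\n' "$status" | grep 'errors: No known data errors' > /dev/null
}
```

For example, `zpool status rpool | scrub_clean && echo "rpool scrub clean"` would print the message only once the scrub has finished without errors. While a scrub is still running, the check simply fails.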
The Solaris installation on the ZFS Root Pool "rpool" is healthy.
Use Case 3: Add Swap in a ZFS "rpool" Root Pool
Problem: Swap Space Lacking
After more disk space is added to the ZFS "rpool" Root Pool, it may be desired to extend the swap space. This must be done in another operation, after the "rpool" is already extended.
Solution: Add Swap to ZFS and the Virtual File System Table
The user community determines they need to increase swap from 12 GB to 20 GB, but they cannot afford a reboot. There are 2 steps required:
1) add swap space
2) make swap space permanent
First, existing swap space must be understood.
Review Swap Space
Swap space can be reviewed for reservation, activation, and persistence with "zfs", "swap", and "grep", respectively.
sun9999/root# zfs list rpool/swap
NAME USED AVAIL REFER MOUNTPOINT
rpool/swap 12.4G 306G 12.0G -
sun9999/root# swap -l -h
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 279,1 4K 12G 12G
sun9999/root# grep swap /etc/vfstab
swap - /tmp tmpfs - yes -
/dev/zvol/dsk/rpool/swap - - swap - no -
sun9999/root#
Note, the "zfs list" above will only work with a single swap dataset. When adding a second swap dataset, a different methodology must be used.
Swap Space Dataset Creation
Adding swap space to the existing root pool, without a reboot, requires adding another dataset. To increase from 12 GB to 20 GB, the additional dataset should be 8 GB. This takes a split second.
sun9999/root# zfs create -V 8G rpool/swap2
sun9999/root#
Swap dataset is now ready to be manually activated.
Swap Space Activation
The swap space is activated using the "swap" command. This takes a split second.
sun9999/root# swap -a /dev/zvol/dsk/rpool/swap2
sun9999/root# swap -l -h
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 279,1 4K 12G 12G
/dev/zvol/dsk/rpool/swap2 279,3 4K 8.0G 8.0G
sun9999/root#
This swap space is only temporary, until the next reboot.
Swap Space Persistence
To make the swap space persistent across reboots, it must be added to the Virtual File System Table.
sun9999/root# cp -p /etc/vfstab /etc/vfstab.2015_10_16_dh
sun9999/root# vi /etc/vfstab
(add the following line)
/dev/zvol/dsk/rpool/swap2 - - swap - no -
sun9999/root#
The added swap space will now be activated automatically, upon the next reboot.
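The manual "vi" edit can also be expressed as an idempotent append, which is safer to re-run. The following is a minimal sketch: the function name add_swap_entry is hypothetical, and it writes the same seven fields as the line added above.

```shell
# add_swap_entry: append a swap line for a device to a vfstab-style file,
# unless a line whose first field is that device is already present.
# Usage: add_swap_entry FILE DEVICE
add_swap_entry() {
    file=$1
    device=$2
    while read -r dev rest; do
        [ "$dev" = "$device" ] && return 0
    done < "$file"
    printf '%s\t-\t-\tswap\t-\tno\t-\n' "$device" >> "$file"
}
```

It could be tried first against the backup copy, for example `add_swap_entry /etc/vfstab.2015_10_16_dh /dev/zvol/dsk/rpool/swap2`, and the result compared with the hand-edited file before being trusted on the live /etc/vfstab.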
Swap Space Validation
Commands to verify: zfs swap datasets, active swap datasets, and persistent datasets
sun9999/root# zfs list | grep swap
rpool/swap 12.4G 298G 12.0G -
rpool/swap2 8.25G 297G 8.00G -
sun9999/root# swap -l -h
swapfile dev swaplo blocks free
/dev/zvol/dsk/rpool/swap 279,1 4K 12G 12G
/dev/zvol/dsk/rpool/swap2 279,3 4K 8.0G 8.0G
sun9999/root# grep swap /etc/vfstab
swap - /tmp tmpfs - yes -
/dev/zvol/dsk/rpool/swap - - swap - no -
/dev/zvol/dsk/rpool/swap2 - - swap - no -
sun9999/root#
Note, the zfs list command now uses a "grep", to capture multiple datasets.
A total of [12G + 8G =] 20GB is now available in swap.
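The total can also be computed from the "swap -l -h" output instead of by hand. This sketch assumes every device reports its size in the "blocks" column with a "G" suffix, as in the listing above. Other units are ignored, and the function name swap_total_g is hypothetical.

```shell
# swap_total_g: read `swap -l -h` output on stdin and print the sum of the
# "blocks" column (field 4) for rows whose size ends in "G".
# awk's numeric conversion uses the leading number, so "8.0G" counts as 8.
swap_total_g() {
    awk '$4 ~ /G$/ { total += $4 + 0 } END { print total + 0 "G" }'
}
```

On the system above, `swap -l -h | swap_total_g` would be expected to report the same 20G total.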
Conclusions
Most of the above document is fluff, filled with paranoia, checking important items multiple times to ensure no data loss. Very few commands are required to perform the mirroring and root pool extension. Solaris provides a seamless methodology at the OS level to perform activities which are often painful under other operating systems or require additional 3rd party software.