Wednesday, April 8, 2009

ZFS: Managing Storage for Network Management

Abstract

Network management systems normally demand close attention to data availability. The disk subsystem chosen for data storage can significantly enhance that availability. The ZFS infrastructure in OpenSolaris provides one of the more robust file systems for keeping data secure and available.

Introduction to ZFS

ZFS is a comprehensive system for managing data. It combines many of the features normally compartmentalized and commoditized by storage and system vendors into a single component. With ZFS comes the ability to provide:
  • pooling of storage resources into volumes
  • snapshots of volumes for backup and restoration purposes
  • serialization of volumes for backup and restoration purposes
  • cloning of volumes for system or application duplication
  • promotion of clones for new system or application installations
  • compression of data for increased disk throughput and capacity
  • single or double parity to compensate for one or two failed devices in a volume
  • silent error correction for data corruption due to hardware errors
  • sharing of data across clustered systems hosted in zones on the same platform
  • sharing of data with other operating systems via NFS or CIFS
  • sharing of data with other hardware platforms via iSCSI
This article discusses a very simple implementation leveraging ZFS; a few of the dataset-level features listed above are sketched immediately below.
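As a rough illustration of the snapshot, clone, and sharing features in the list above, the commands below are a minimal sketch. The pool name u000 and the dataset u000/data are placeholders for this sketch (the pool itself is created later in the article), and the sharing properties available vary with the OpenSolaris release.

# take a point-in-time snapshot of a dataset, then serialize it to a file for backup
zfs snapshot u000/data@nightly
zfs send u000/data@nightly > /backup/u000-data-nightly.zfs

# clone the snapshot for a duplicate application instance, then promote the clone
zfs clone u000/data@nightly u000/data-test
zfs promote u000/data-test

# share the dataset with other operating systems over NFS
zfs set sharenfs=on u000/data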

Preparing Disk Devices

Disks can be added to your system; in this case, I will add a variety of small external SCSI disks on controller 1. After adding the disks, the following commands are used to find those disks in the operating system and slice them for use. Normally, I partition disks so they are all exactly the same size (the size of the smallest disk), so that replacements are easy to find in case of failure; a shortcut for copying a slice layout between disks is sketched after the format session below.
Ultra2/root $ disks

Ultra2/root $ ls -al /dev/rdsk/c1*0
lrwxrwxrwx 1 root root 55 2009-04-08 13:18 /dev/rdsk/c1t2d0s0 -> ../../devices/sbus@1f,0/SUNW,fas@1,8800000/sd@2,0:a,raw
lrwxrwxrwx 1 root root 55 2009-04-08 13:18 /dev/rdsk/c1t3d0s0 -> ../../devices/sbus@1f,0/SUNW,fas@1,8800000/sd@3,0:a,raw
lrwxrwxrwx 1 root root 55 2009-04-08 15:20 /dev/rdsk/c1t4d0s0 -> ../../devices/sbus@1f,0/SUNW,fas@1,8800000/sd@4,0:a,raw

Ultra2/root $ format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
...
       2. c1t2d0
          /sbus@1f,0/SUNW,fas@1,8800000/sd@2,0
       3. c1t3d0
          /sbus@1f,0/SUNW,fas@1,8800000/sd@3,0
       4. c1t4d0
          /sbus@1f,0/SUNW,fas@1,8800000/sd@4,0

Specify disk (enter its number): 2
selecting c1t2d0
[disk formatted]

FORMAT MENU:
format> partition

PARTITION MENU:
partition> print
Current partition table (unnamed):
Total disk cylinders available: 3984 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 - 3980        2.00GB    (3981/0/0) 4191993
  1 unassigned    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 3983        2.00GB    (3984/0/0) 4195152
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6 unassigned    wm       0               0         (0/0/0)          0
  7 unassigned    wm       0               0         (0/0/0)          0


partition> quit

FORMAT MENU:
...
format> disk

AVAILABLE DISK SELECTIONS:
...
Specify disk (enter its number)[2]: 3
selecting c1t3d0
[disk formatted]
format> partition

PARTITION MENU:
...
partition> print
Current partition table (unnamed):
Total disk cylinders available: 3981 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 - 3980        2.00GB    (3981/0/0) 4191993
  1 unassigned    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 3980        2.00GB    (3981/0/0) 4191993
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6 unassigned    wm       0               0         (0/0/0)          0
  7 unassigned    wm       0               0         (0/0/0)          0
partition> quit

FORMAT MENU:
...
format> disk

AVAILABLE DISK SELECTIONS:
...
Specify disk (enter its number)[3]: 4
selecting c1t4d0
[disk formatted]

format> partition

PARTITION MENU:
...
partition> print
Current partition table (unnamed):
Total disk cylinders available: 3981 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 - 3980        2.00GB    (3981/0/0) 4191993
  1 unassigned    wu       0               0         (0/0/0)          0
  2     backup    wu       0 - 3980        2.00GB    (3981/0/0) 4191993
  3 unassigned    wm       0               0         (0/0/0)          0
  4 unassigned    wm       0               0         (0/0/0)          0
  5 unassigned    wm       0               0         (0/0/0)          0
  6 unassigned    wm       0               0         (0/0/0)          0
  7 unassigned    wm       0               0         (0/0/0)          0

partition> quit

FORMAT MENU:
format> quit
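Rather than stepping through format interactively on every disk, the slice table from one disk can usually be copied to the others with prtvtoc and fmthard. This is only a sketch, assuming the devices shown above; the source should be the smallest disk so its layout fits on the rest.

# copy the VTOC (slice table) from c1t3d0 to another disk of equal or greater size
prtvtoc /dev/rdsk/c1t3d0s2 | fmthard -s - /dev/rdsk/c1t4d0s2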

Creating a ZFS Pool

ZFS can use disks, slices, or files as its underlying storage. These resources must be gathered into a pool before ZFS can make use of them. A pool can be a plain stripe across devices, a mirror, or a raidz configuration with parity and silent error correction. Pools are created and managed with the "zpool" command.
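For comparison, the three basic layouts look roughly like the sketch below; the pool names here are placeholders and are not used in the rest of this article.

# plain stripe: capacity of all devices, no redundancy
zpool create stripepool c1t2d0s0 c1t3d0s0

# two-way mirror: capacity of one device, survives a single failure
zpool create mirrorpool mirror c1t2d0s0 c1t3d0s0

# single-parity raidz: capacity of all but one device, survives a single failure
zpool create raidzpool raidz c1t2d0s0 c1t3d0s0 c1t4d0s0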

To create a pool "u000" using "raidz" (with parity and silent error correction) across the slices prepared above, the following commands can be executed.
Ultra2/root $ zpool status
no pools available

Ultra2/root $ zpool create u000 raidz c1t2d0s0 c1t3d0s0 c1t4d0s0

Ultra2/root $ zpool status
pool: u000
state: ONLINE
scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        u000          ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c1t2d0s0  ONLINE       0     0     0
            c1t3d0s0  ONLINE       0     0     0
            c1t4d0s0  ONLINE       0     0     0

errors: No known data errors


Ultra2/root $ df -h | egrep '(u000|File)'
Filesystem size used avail capacity Mounted on
u000 3.9G 24K 3.9G 1% /u000

Ultra2/root $ zfs list
NAME USED AVAIL REFER MOUNTPOINT
u000 122K 3.91G 24.0K /u000
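With the pool online, individual datasets can be carved out of it for different applications. The dataset name and quota below are only illustrative and are not part of the session above.

# create a dataset within the pool and cap how much of the pool it may consume
zfs create u000/netmgmt
zfs set quota=1g u000/netmgmt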

Adding Features to a ZFS Pool

ZFS offers a variety of features, including compression, to make the most of the capacity and throughput of a disk subsystem. Often, after parity is added to a RAID configuration, compression is enabled in order to reclaim some of the space consumed by the parity.
Ultra2/root $ zfs set compression=on u000
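Whether compression is paying off can be checked later through the pool's properties; a quick sketch:

# confirm the setting and report the achieved compression ratio
zfs get compression,compressratio u000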

Determining Failure in a ZFS Pool

If a disk in a ZFS pool fails and you are using some form of redundancy (e.g. raidz), then the status will indicate that you have a failed device. In the following example, target drive 2 has failed in a three-disk zpool.
Ultra2/root $ zpool status
pool: u000
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scrub: resilver completed after 0h0m with 0 errors on Sun Apr 12 00:30:43 2009
config:

        NAME          STATE     READ WRITE CKSUM
        u000          DEGRADED     0     0     0
          raidz1      DEGRADED     0     0     0
            c1t2d0s0  FAULTED      0     0     0
            c1t3d0s0  ONLINE       0     0     0
            c1t4d0s0  ONLINE       0     0     0

errors: No known data errors
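For routine monitoring of a network management server, it is convenient to check pool health without reading the full status output. The 'zpool status -x' form reports only pools that are not healthy; a minimal sketch:

# print a short summary, listing only pools with problems
zpool status -x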

Manually Replacing a Failed Drive in a ZFS Pool

To replace a failed drive in a zpool, the hardware must be replaced with a drive of equal or greater size. After the physical replacement is done, a 'zpool replace' command is issued against the pool and device.
Ultra2/root $ zpool replace u000 c1t2d0s0

Ultra2/root $ zpool status
pool: u000
state: DEGRADED
scrub: resilver completed after 0h0m with 0 errors on Sun Apr 12 04:07:47 2009
config:

        NAME               STATE     READ WRITE CKSUM
        u000               DEGRADED     0     0     0
          raidz1           DEGRADED     0     0     0
            replacing      DEGRADED     0     0     0
              c1t2d0s0/old FAULTED      0     0     0  corrupted data
              c1t2d0s0     ONLINE       0     0     0
            c1t3d0s0       ONLINE       0     0     0
            c1t4d0s0       ONLINE       0     0     0

errors: No known data errors

Ultra2/root $ zpool status
pool: u000
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Sun Apr 12 04:07:47 2009
config:

        NAME          STATE     READ WRITE CKSUM
        u000          ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c1t2d0s0  ONLINE       0     0     0
            c1t3d0s0  ONLINE       0     0     0
            c1t4d0s0  ONLINE       0     0     0

errors: No known data errors
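If the replacement drive ends up at a different target rather than in the original position, both the old and new devices can be named on the replace command. A sketch, where c1t5d0s0 is only a hypothetical device used for illustration:

# replace the faulted device with a new device at a different target
zpool replace u000 c1t2d0s0 c1t5d0s0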

Automatically Recovering a Failed Drive in a ZFS Pool

It is possible to set a property on a ZFS pool so that when a failed drive is replaced, the pool will automatically recover. This greatly simplifies system administration: all that is required is to insert a new drive (of similar or higher capacity) into the same position as the old drive, and the system will automatically resilver the pool.
Ultra2/root $ zpool set autoreplace=on u000
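The current setting can be confirmed from the pool's property list; a quick sketch:

# verify that automatic replacement is enabled on the pool
zpool get autoreplace u000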


