Monday, October 7, 2019

Solaris 11.4: Eliminating Silent Data Corruption

Abstract:

Storage capacity has been growing geometrically for decades, and as it has grown, a problem referred to as Silent Data Corruption has been observed. Forward-thinking engineers at Sun Microsystems created ZFS to manage this risk by having discovery & correction occur passively & automatically upon future reads & writes. Oracle later purchased Sun Microsystems and introduced proactive, automated discovery & correction on a monthly basis, as part of Solaris 11.4.

The Problem:

Silent Data Corruption has been measured by various industry players dealing with massive quantities of storage. Greenplum, a database software company specializing in large-scale data warehousing and analytics, reported that its fast database faced silent corruption every 15 minutes.[9] As another example, a real-life study performed by NetApp on more than 1.5 million HDDs over 41 months found more than 400,000 silent data corruptions, of which more than 30,000 were not detected by the hardware RAID controller. Another study, performed by CERN over six months and involving about 97 petabytes of data, found that about 128 megabytes of data became permanently corrupted. As storage continues to expand, resolving silent corruption becomes ever more important.

The Passive Solution:

Jeff Bonwick at Sun Microsystems created ZFS specifically to address ever-increasing data storage quantities. Rather than a 32 bit file system rooted in 30 year old technology, ZFS was engineered as a 128 bit file system, projected to accommodate data growth for the next 30 years. With such a massive quantity of data to be retained, Silent Data Corruption was addressed by computing a checksum on every block during the write and verifying it on future reads. If the checksum does not match on a read, a redundant copy of the block on the ZFS pool is automatically read, and the formerly bad block is rewritten with correct data. This self-healing feature was unique to Solaris at the time.
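
ZFS exposes this behavior as a per-dataset property. The session below is an illustrative sketch rather than output captured from the system above, reusing the article's rpool; a value of on selects the default checksum algorithm, and a stronger algorithm can be chosen with, for example, zfs set checksum=sha256.
sc25client01/root# zfs get checksum rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  checksum  on     default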

A system administrator can force every allocated block to be read and verified via an operation referred to as a "scrub".
sc25client01/root# zpool list rpool
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  416G   296G  120G  71%  1.00x  ONLINE  -


sc25client01/root# zpool scrub rpool

sc25client01/root#
This scrub will continue in the background until every allocated block on every disk has been read. The scrub reads data at a throttled rate, so it does not interfere with the operation of the platform or its applications.
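
While a scrub is running, its progress appears in the scan line of zpool status, and a running scrub can be cancelled with zpool scrub -s. The session below is an illustrative sketch, not captured from this system; the exact wording of the progress line varies by release.
sc25client01/root# zpool status rpool | grep scan
  scan: scrub in progress since Mon Oct  7 09:12:01 2019

sc25client01/root# zpool scrub -s rpool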


The Proactive Solution:

With the release of Solaris 11.4 (formerly known as Solaris 12), a scrub that reads every allocated block in each storage pool is scheduled by default once a month. By reading every block monthly, silent data corruption can be rooted out and corrected automatically, a feature unique to Oracle's Solaris!
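
The schedule itself is surfaced as a pool property. The session below is a sketch assuming the scrubinterval property as documented for Solaris 11.4; the one-month default and the manual value for disabling automatic scrubs shown here are assumptions to verify against the zpool man page on your release.
sun9781/root# zpool get scrubinterval rpool
NAME   PROPERTY       VALUE  SOURCE
rpool  scrubinterval  1m     default

sun9781/root# zpool set scrubinterval=manual rpool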

Under an older OS release (Solaris 11.3 SRU 31), notice that the lastscrub property does not exist.
sc25client01/root# uname -a
SunOS sc01client01 5.11 11.3 sun4v sparc sun4v

sc25client01/root# pkg list entire
NAME (PUBLISHER) VERSION                    IFO
entire           0.5.11-0.175.3.31.0.6.0    i--

sc25client01/root# zpool get lastscrub rpool
bad property list: invalid property 'lastscrub'
For more info, run: zpool help get
Under a modern OS release (Solaris 11.4 SRU 13), the last scrub occurred less than a month ago.
sun9781/root# uname -a
SunOS sun1824-cd 5.11 11.4.13.4.0 sun4v sparc sun4v

sun9781/root# pkg list entire
NAME (PUBLISHER) VERSION                    IFO
entire           11.4-11.4.13.0.1.4.0       i--

sun9781/root# zpool get lastscrub rpool
NAME   PROPERTY   VALUE   SOURCE
rpool  lastscrub  Sep_10  local
The details of the last scrub can be seen through the status subcommand.
sun9781/root# zpool list
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  278G  36.9G  241G  13%  1.00x  ONLINE  -

sun9781/root# zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: scrub repaired 0 in 16m24s with 0 errors on Tue Sep 10 03:42:44 2019

config:
        NAME                       STATE      READ WRITE CKSUM
        rpool                      ONLINE        0     0     0
          mirror-0                 ONLINE        0     0     0
            c0t5000CCA0251CF0F0d0  ONLINE        0     0     0
            c0t5000CCA0251E4BC8d0  ONLINE        0     0     0

errors: No known data errors
The scrub above read the pool's 36.9 gigabytes of allocated data in 16 minutes and 24 seconds, roughly 38 megabytes per second, and found no errors to correct. Note that a scrub reads only allocated blocks, not the pool's full 278 gigabyte capacity.

Conclusions:

Network Management is well aware that the more storage is needed, the more critical the data recovery process becomes. Redundancy through an advanced file system like ZFS, under a managed-services-class operating system like Solaris, is a good choice. Solaris 11.4 keeps data healthy, no matter what quantity of physical disks is managed or data is retained.
