A quick demo of the ZFS hot spare feature. We cover ZFS in the Oracle University course at our Minneapolis location.
After the install was complete, I added four 2 GB drives so ZFS had some drives to use.
bash-3.00# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0d0
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c0d1
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@1,0
       2. c1d1
          /pci@0,0/pci-ide@7,1/ide@1/cmdk@1,0
       3. c2t0d0
          /pci@0,0/pci1000,30@10/sd@0,0
       4. c2t1d0
          /pci@0,0/pci1000,30@10/sd@1,0
There were no existing ZFS pools:
bash-3.00# zpool list
no pools available
So I created a pool named brian that mirrors two drives and has a third drive as a hot spare:
bash-3.00# zpool create brian mirror c0d1 c1d1 spare c2t0d0
bash-3.00# zpool status brian
  pool: brian
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        brian       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
        spares
          c2t0d0    AVAIL

errors: No known data errors
Note that a spare is now listed in the zpool status output. Spares can be shared by multiple pools. Eric Schrock, who wrote the code for this, tells us that there is now an FMA agent, zfs-retire, which subscribes to vdev failure faults and automatically initiates replacements if any hot spares are available.
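To make the point about shared spares concrete, here is a rough sketch of registering one spare device in several pools. It relies only on the standard zpool add POOL spare DEVICE form; the function name add_shared_spare and the ZPOOL override variable are my own assumptions for illustration, not part of ZFS.

```shell
#!/bin/sh
# Command used to talk to ZFS. Overriding it (for example ZPOOL=echo) gives a
# dry run that only prints what would be executed. This variable is an
# illustrative assumption, not something zpool itself reads.
ZPOOL=${ZPOOL:-zpool}

# add_shared_spare DEVICE POOL...
# Adds DEVICE as a hot spare to every pool named on the command line.
# Stops at the first pool where the add fails.
add_shared_spare() {
    dev=$1
    shift
    for pool in "$@"; do
        $ZPOOL add "$pool" spare "$dev" || return 1
    done
}
```

With a second pool in place (the name otherpool is hypothetical), add_shared_spare c2t0d0 otherpool would make the same disk available as a spare there too. Running it with ZPOOL=echo first is a cheap way to check the commands before touching real pools.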
Now I simulate a failure by taking c0d1 offline, then use zpool replace so the spare takes over:
bash-3.00# zpool offline brian c0d1
Bringing device c0d1 offline
bash-3.00# zpool replace brian c0d1 c2t0d0
bash-3.00# zpool status brian
  pool: brian
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed with 0 errors on Sun Jun 22 11:55:46 2008
config:

        NAME          STATE     READ WRITE CKSUM
        brian         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              c0d1    OFFLINE      0     0     0
              c2t0d0  ONLINE       0     0     0
            c1d1      ONLINE       0     0     0
        spares
          c2t0d0      INUSE     currently in use

errors: No known data errors
Note that the spare is now marked as INUSE but is still listed under spares. The replacement is only temporary: once the original device is replaced, the spare returns to the list of available spares.
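If you would rather not eyeball the status output, the spare states can be pulled out with a few lines of awk. This is only a sketch: the function name spare_states is made up for this post, and it assumes the status layout shown above, where a line containing just "spares" starts the spare list and the "errors:" line ends it.

```shell
#!/bin/sh
# spare_states: read `zpool status` text on stdin and print one
# "DEVICE STATE" line for each hot spare (for example "c2t0d0 INUSE").
spare_states() {
    awk '
        $1 == "pool:"   { in_spares = 0 }        # a new pool section begins
        $1 == "spares"  { in_spares = 1; next }  # header of the spare list
        $1 == "errors:" { in_spares = 0 }        # end of the config section
        in_spares && NF >= 2 { print $1, $2 }    # device name and its state
    '
}
```

Typical use would be zpool status brian | spare_states, which for the output above should report the spare c2t0d0 with state INUSE. The "spare" line inside the mirror is not matched, because only the exact word "spares" starts the list.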
Now I replace the “failed” drive and the spare returns to the AVAIL state.
bash-3.00# zpool replace brian c0d1 c2t1d0
bash-3.00# zpool status brian
  pool: brian
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jun 22 11:58:02 2008
config:

        NAME        STATE     READ WRITE CKSUM
        brian       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
        spares
          c2t0d0    AVAIL

errors: No known data errors
And finally, I remove the spare from the pool now that it is no longer required:
bash-3.00# zpool remove brian c2t0d0
bash-3.00# zpool status brian
  pool: brian
 state: ONLINE
 scrub: resilver completed with 0 errors on Sun Jun 22 11:58:02 2008
config:

        NAME        STATE     READ WRITE CKSUM
        brian       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0

errors: No known data errors