Upgrading NAS #2 SAS drives to double capacity

pete_c

Guru
NAS #2 is now at 88% capacity, so I decided to upgrade it by doubling the size of the SAS drives.  Easier than I thought.  What is nice is that I can still utilize the NAS in a degraded state.  It is slow to resilver each new drive, roughly 24 hours per disk.
 
This is a learning experience for me using XigmaNAS
 
1 - Log on to your NAS box and verify that your ZFS pool is online and the disks are healthy.
2 - Shut down the NAS box.
3 - I started with the top SAS drive here.  Remove it and write down its serial number.
4 - Power up the NAS box.
5 - Log on to your NAS box.
6 - Using the ZFS tools menu, "offline" the disk with the serial number written down above.
7 - Your pool will now enter a degraded state because we have forced a disk offline. This is intentional.
8 - Shut down your NAS box.
9 - Remove the documented hard drive and replace it with the new hard drive.
10 - Power up the NAS box.
11 - Go to the ZFS tools menu and replace the offline drive with the new SAS drive, using the serial number of the new drive.
12 - It will automatically take you back to the status screen, which should show the pool "resilvering" and the specific disk "replacing".
13 - Note that for me this is taking close to 24 hours, sometimes a bit more than a day, per disk.

14 - After resilvering each disk I see this message on the disks configuration page:
Configuration information about devices is different from physical devices. Please remove those devices and re-add them or run import disks with clear configuration option enabled.
15 - Sync the disks to the web GUI using:
Disks > ZFS > Configuration > Synchronize
 
I never like to have an array 'live' while replacing drives.  Yeah, it's supposed to be 'ok' to do it, but if it goes wrong you're often faced with spending MORE time rebuilding/restoring from backup than you would if you started blank and reloaded from backups.  

But, yeah, ZFS has come a long way.  What NAS are you running?
 
XigmaNAS (which is really a fork of FreeNAS, now TrueNAS) x 2 boxes
 
Both are DIY boxes a few levels above SOHO boxes with embedded custom firmware, running BSD / XigmaNAS (similar to pfSense).
 
The older NAS is a backup of the newer NAS and is kept offline.  IE: same SAS drives and capacity.
 
It is very slow to resilver as I upgrade each drive.

The ZFS drive replacement / upgrade procedure uses the same commands regardless of which NAS OS you are running.
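For reference, a minimal sketch of the same operation from a shell; the pool name ZFS0 and device names like da1p1 are only examples here, so check your own layout with zpool status first:

zpool status ZFS0                # confirm the pool is healthy before starting
zpool offline ZFS0 da1p1         # force the old disk offline (pool goes DEGRADED)
# ...shut down, swap in the new physical drive, power back up...
zpool replace ZFS0 da1p1 da1     # start resilvering onto the new disk
zpool status ZFS0                # watch the resilver progress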
 
I started this endeavor a long time ago, building NAS box #2 from scratch and going to ZFS; first replicating NAS box #1 from a RAID 5 setup to the new ZFS pool on NAS #2.
 
Then I redid NAS #1 with ZFS and copied the NAS #2 ZFS data to the new ZFS pool on NAS #1.  NAS #1 is currently offline.
 
I already tinkered with settings to speed things up and instead messed up the resilvering process, extending it to 6 days per drive.  I then reset the configuration, shut the box down during resilvering, and rebooted; all is fine so far.   Most important are my pictures / home videos / music, which are replicated. IE: the music is also replicated to two automobiles.
 
System > Advanced > sysctl.conf or loader.conf and adding:
 
vfs.zfs.scrub_delay = 0
vfs.zfs.resilver_min_time_ms = 5000
vfs.zfs.resilver_delay = 0
vfs.zfs.top_maxinflight = 128
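These can also be applied at runtime over SSH without a reboot (a sketch; these are the legacy FreeBSD ZFS tunable names, so they may not exist under newer OpenZFS releases):

sysctl vfs.zfs.scrub_delay=0
sysctl vfs.zfs.resilver_min_time_ms=5000
sysctl vfs.zfs.resilver_delay=0
sysctl vfs.zfs.top_maxinflight=128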
 
2 drives upgraded here; 6 more to go.  The resilvering process is slow, and I'm really not paying much attention to it until it is complete.  I have been using the NAS to stream media while it is resilvering and it works fine.
 
The backplane on the box is made for quick replacement of drives while the system is live, but I did not want to do that.
 
aaaand just this morning I wake up to a 10tb drive in a NAS showing errors.  Ya f'ng JINXED me!   :angry2:
 
I'm also one that does not believe in tempting fate trying to R&R spinning rust drives from live trays just because the system claims to support it.  
 
Geez Bill.  Maybe you should consider ZFS now eh?  The ZFS pools / raid are very redundant.
 
Yes, here I'm using Hitachi Enterprise SAS drives. Much smaller though, at 2TB each and now going to 4TB each (~16TB to ~30TB).   I built two identical boxes, each with 8 x 2TB drives (duplicate MFG drives).
 
The old ones were EMC formatted drives and I had to remove them from their trays and reformat them for use.  
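For anyone hitting the same thing: EMC arrays often ship SAS drives formatted with 520-byte sectors, and they need to be reformatted back to 512-byte sectors before most systems will use them.  A sketch with sg_format from the sg3_utils port; the device name is only an example, and the operation is destructive and can take hours per drive:

sg_format --format --size=512 /dev/da1    # low-level reformat to 512-byte sectors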
 
They are much louder than the SATA drives.  The NAS boxes are in the basement server rack and I do not hear them upstairs anyhow...
 
I am working on replacing disks #3 and #4 in the next couple of days.  Here is a little text relating to using software and hardware RAID versus ZFS:
 
If your hardware has enough ram to use ZFS, then you should be using ZFS, and not any other form of software or hardware RAID. Software and hardware RAID DO NOT provide the kind of robust data protection that ZFS redundancy and checksums provide. ZFS checksums are done at the block level and can detect and repair bitrot transparently. Software and hardware RAID DO NOT provide these block-level protections. Software and hardware RAID will ignorantly pass corrupted data to the system AND there is no mechanism for the system to respond with, “This data is corrupt. Please consult some other part of the RAID.”
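A quick way to see those block-level checks at work is the CKSUM column in pool status; non-zero counts mean ZFS detected (and, given redundancy, repaired) corrupt blocks.  A short sketch using the pool name from this thread:

zpool status -v ZFS0    # READ / WRITE / CKSUM error counters per device
zpool clear ZFS0        # reset the counters after investigating the cause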
 
Update 11th of January, 2022
 
Working on updating drive #3 today.
 
I set autoexpand to on before starting this endeavor.
 

zpool get autoexpand ZFS0
NAME  PROPERTY    VALUE   SOURCE
ZFS0  autoexpand  off     default

zpool set autoexpand=on ZFS0

 
This was a mistake as the resilvering process went to 7 days.
 
So I shut off autoexpand
 

zpool set autoexpand=off ZFS0

Shut down NAS
 
Restarted it. The resilver estimate is now way better than 7 days.
 

Tue Jan 11 10:07:13 CST 2022
pool: ZFS0
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Jan 11 09:15:55 2022
    1.06T scanned at 3.28G/s, 32.9G issued at 102M/s, 12.9T total
    4.01G resilvered, 0.25% done, 1 days 12:44:09 to go
config:

    NAME                        STATE     READ WRITE CKSUM
    ZFS0                        DEGRADED     0     0     0
      raidz2-0                  DEGRADED     0     0     0
        da0p1                   ONLINE       0     0     0
        replacing-1             DEGRADED     0     0     3
          5421859058317337756   OFFLINE      0     0     0  was /dev/da1p1
          da7                   ONLINE       0     0     0
        da6                     ONLINE       0     0     0
        da5                     ONLINE       0     0     0
        da1p1                   ONLINE       0     0     0
        da2p1                   ONLINE       0     0     0
        da3p1                   ONLINE       0     0     0
        da4p1                   ONLINE       0     0     0

errors: No known data errors

Not going to tinker with this as it will autoexpand when all drives are updated automagically.
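If the pool does not grow on its own once the last drive is done (for instance because autoexpand was left off), the extra capacity can also be claimed manually; a sketch, with the device name only as an example:

zpool online -e ZFS0 da0p1    # expand this vdev member to use the full disk
zpool list ZFS0               # SIZE / EXPANDSZ should now reflect the larger drives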
 
Update 14th of January, 2022
 
Finished with update to drive #4 of 8 drives.
 
Last drive update took almost 2 days (48 hours).
 
What is nice is that the NAS / ZFS continues to work fine in a degraded state while updating drives.
 
Paying attention to logging entries now, and I keep seeing the USB HID device for the UPS disconnecting and reconnecting.
 
It is using the BSD NUT (Network UPS Tools) configuration, and it appears to be loading two drivers for the same USB port.
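For comparison, a single-driver NUT setup normally has exactly one section per UPS in ups.conf; a sketch only, since the section name and description are made up and XigmaNAS generates this file from its GUI settings:

# /usr/local/etc/nut/ups.conf (path may differ on XigmaNAS)
[ups]
        driver = usbhid-ups
        port = auto
        desc = "Rack UPS"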
 
 
Update 16th of January, 2022
 
Taking a break here before updating disk #5 this week.
 
Doing a ZFS scrub, which can be done via the command line or the GUI.  I have not done this to date, and it is recommended at least once a month.
 
Explicit ZFS Data Scrubbing
 
The simplest way to check data integrity is to initiate an explicit scrubbing of all data within the pool. This operation traverses all the data in the pool once and verifies that all blocks can be read. Scrubbing proceeds as fast as the devices allow, though the priority of any I/O remains below that of normal operations. This operation might negatively impact performance, though the pool's data should remain usable and nearly as responsive while the scrubbing occurs. To initiate an explicit scrub, use the zpool scrub command. 
 
zpool scrub ZFS0
 
Via SSH:


 ~# zpool status -v ZFS0
  pool: ZFS0
 state: ONLINE
  scan: scrub in progress since Sun Jan 16 09:38:11 2022
2.70T scanned at 1.55G/s, 1.57T issued at 921M/s, 12.7T total
0 repaired, 12.35% done, 0 days 03:31:12 to go
config:


NAME        STATE     READ WRITE CKSUM
ZFS0        ONLINE       0     0     0
  raidz2-0  ONLINE       0     0     0
    da7     ONLINE       0     0     0
    da6     ONLINE       0     0     0
    da5     ONLINE       0     0     0
    da4     ONLINE       0     0     0
    da0p1   ONLINE       0     0     0
    da1p1   ONLINE       0     0     0
    da2p1   ONLINE       0     0     0
    da3p1   ONLINE       0     0     0


errors: No known data errors
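Since the recommendation is a monthly scrub, it can also be scheduled rather than run by hand, either from the XigmaNAS cron page or with a plain crontab entry; a sketch, assuming the pool name ZFS0:

# run a scrub at 03:00 on the first day of every month
0 3 1 * * /sbin/zpool scrub ZFS0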

Decided to upgrade power supply today.  Current stock Silverstone power supply is around 300 watts.
 
Getting this one tomorrow from Amazon.
 
EVGA 220-G5-0650-X1 Super Nova 650 G5, 80 Plus Gold 650W, Fully Modular, ECO Mode with Fdb Fan, 10 Year Warranty, Compact 150mm Size, Power Supply

 
Power supply will be a PITA to replace as I have to pull the 8 drive tray out to get to it.
 
Decided today to also upgrade the backup NAS, which is a copy of the Silverstone NAS, with new drives doubling its capacity.
 
And I just got the drive re-integrated with my NAS.  This after ordering one from Amazon and getting it shipped in nothing more than a flimsy mylar envelope and a skin of thin bubble wrap.  No chance in Hell a drive would have survived that sort of careless packaging.  So I ordered one from Bestbuy... and it came in a box but LOOSE inside.  Straight back. 

Walked into a Microcenter and bought one.  I mean, there's no guarantee someone didn't randomly man-handle that one, but at least I have some vague hopes it was treated better than the two poorly-shipped ones!

Took 11 hours for it to rebuild.  Fingers-crossed.

The one that died may still be in warranty, so I'll jump through those hoops later this week.
 
The upshot to this is it reminded me to check on some two-drive mirrors in other machines and found their e-mail notification wasn't configured properly (due to password changes).  So yay for finding that BEFORE drives failed.  
 
I have a 24-bay chassis that's been out of commission for nearly a decade now; I keep meaning to resurrect it and get a ZFS setup installed on it.  But now that drive prices have crept back up again I'm even less motivated to bring it back online.
 
Good news Bill!!!
 
Yes, going baby steps here...
 
Are you using an over the counter NAS or a DIY built NAS?
 
What specs does your NAS have (CPU, Memory, Raid card?)
 
It's an older QNAP and does a pretty decent job overall.  It has an Intel processor and I could change how it boots to run something else, but their overall feature set is pretty decent, and they keep the software pretty regularly updated.  Granted, not without reason given past security failings and such, but I don't run much else on them that would carry those security risks (nor is my network security left lax enough to let that happen anyway).

I mean, I know /how/ to do all the dirty work to make a DIY setup, but this has been a good set-and-forget arrangement to hold a bunch of media files.  Thus freeing me to torment myself and family with /other/ tinkering.
 