Upgrading NAS #2 SAS drives to double capacity

Originally, in NAS DIY learning mode, I went to the "Serve The Home" forum for the NAS build.  Most of the work was updating the firmware on the LSI (IBM) RAID controller card, which was well documented on the forum.  I had no dependencies on the new NAS since I was still using the old US Robotics 8700 NAS, so I tinkered some with whatever SATA drives I had around.
 
I have since moved from Samba sharing to NFS sharing, as it seems more efficient and faster to me.
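For reference, a minimal NFS sketch of the kind of setup described (the dataset path, subnet, NAS address, and mount point here are assumptions, not from the post):

```
# NAS-side /etc/exports (hypothetical dataset path and LAN subnet):
/mnt/ZFS0 -alldirs -network 192.168.1.0 -mask 255.255.255.0

# Linux client-side /etc/fstab entry (hypothetical NAS address and mount point):
192.168.1.50:/mnt/ZFS0  /mnt/nas  nfs  rw,noatime  0  0
```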
 
Got the new modular power supply today (one-day shipping from Amazon).  Going to try to swap power supplies without removing or disconnecting the NAS sub-chassis or tray; being modular, there will be fewer wires inside the case.  For the SAS drives I went to those thin SAS/SATA cables to the LSI controller.  I replaced the old Silverstone fans with better fans and got the HDD temps down to around 31-33 °C.  After today's endeavor I will replace the last 4 SAS drives.
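A dry-run sketch of how the drive temps can be checked from the shell with smartmontools (device names da0..da7 match the post; the "-d scsi" flag is my assumption for SAS drives behind an LSI controller — the loop only prints the commands rather than running them):

```shell
# Print a smartctl temperature check for each SAS drive (dry run).
# Remove the echo/quoting to actually run the commands on the NAS.
cmds=""
for n in 0 1 2 3 4 5 6 7; do
    cmd="smartctl -d scsi -a /dev/da$n | grep -i temperature"
    echo "$cmd"
    cmds="$cmds$cmd
"
done
```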
 
19th January, 2022
 
 
Sent the power supply back to Amazon as it was the same size as the current 600-watt Silverstone PS in the box.  May replace it with a 1000-watt PS.
 
Currently resilvering replacement drive #5 of 8.  Ran into a little glitch: the first time, it resilvered new drive #5 but still had the old drive's details in place, so ZFS was running in a degraded state.  The ZFS message said I had to add the new drive to the pool or replace it, which didn't make any sense.  So I replaced it again with the same drive and ZFS is resilvering again.  No errors the first time I did this.  Back to another day of resilvering.
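A sketch of the per-bay swap flow described above (the pool name ZFS0 is from later in the post; device da4 is a hypothetical example; DRYRUN=echo prints each step instead of running it — unset it on the real NAS):

```shell
# Dry-run sketch of one drive swap and resilver.
DRYRUN=echo
pool=ZFS0
disk=da4
$DRYRUN zpool offline "$pool" "$disk"   # take the old drive out of service
# ...hot-swap the physical drive in that bay, then:
$DRYRUN zpool replace "$pool" "$disk"   # start the resilver onto the new drive
$DRYRUN zpool status "$pool"            # should show "resilver in progress",
                                        # not DEGRADED with the old disk still listed
```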
 
20th of January, 2022
 
Still upgrading here.
 
Interesting videos to watch.
 
RAID vs HBA SAS controllers | What's the difference? Which is better? (Oct 22, 2021)
 
[youtube]http://youtu.be/xEbQohy6v8U[/youtube]
 
SATA vs SAS As Fast As Possible (old Dec 31, 2015)
 
[youtube]http://youtu.be/5ADpSMtEQxY[/youtube]
 
21st of January, 2022
 
Resilvering drive #6 now.  2 more to go.  While the NAS continues to work fine during resilvering, it is showing slower transfer speeds.
 
Did a scrub yesterday before installing drive #6.  
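The scrub-between-swaps step can be sketched as a dry run (pool name ZFS0 from the post; the variables just hold the commands here):

```shell
# Dry-run sketch: scrub before the next swap so the resilver starts from a verified pool.
scrub_cmd="zpool scrub ZFS0"
watch_cmd="zpool status -v ZFS0"   # shows scan progress and any checksum errors
echo "$scrub_cmd"
echo "$watch_cmd"
```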
 
23rd of January, 2022
 
 
Resilvering drive #7 today.  Speaking to a Drobo NAS user yesterday: it is just as slow to do this kind of update with a Drobo NAS.

The Drobo NAS update main GUI screen has much more overall information and is very graphical. The XigmaNAS GUI screens are more text oriented.
 
25th of January, 2022
 
Finished replacing drive #8 today.  The drive pool still shows the old capacity even with autoexpand on.
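A quick property check can confirm autoexpand is really on; with autoexpand=off the pool keeps its old size even after every drive has been upsized (dry-run sketch, pool name from the post):

```shell
# Dry-run sketch: verify (and if needed enable) the autoexpand pool property.
get_cmd="zpool get autoexpand ZFS0"
set_cmd="zpool set autoexpand=on ZFS0"   # only needed if the get reports "off"
echo "$get_cmd"
echo "$set_cmd"
```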
 
I thought it would show the new capacity after finishing the last drive.  It did not until I ran the following command per device on the ZFS pool.
 
zpool online -e ZFS0 /dev/da0
 
After the above command, the expanded (double) capacity was showing.  I ran it for the rest of the devices anyhow.
 
zpool online -e ZFS0 /dev/da1
zpool online -e ZFS0 /dev/da2
zpool online -e ZFS0 /dev/da3
zpool online -e ZFS0 /dev/da4
zpool online -e ZFS0 /dev/da5
zpool online -e ZFS0 /dev/da6
zpool online -e ZFS0 /dev/da7
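The eight commands above can be sketched as one loop ("echo" makes it a dry run that prints each command; drop it to actually notify ZFS of the larger devices):

```shell
# Dry-run loop over the pool's devices (pool name and da0..da7 from the post).
pool=ZFS0
for n in 0 1 2 3 4 5 6 7; do
    echo "zpool online -e $pool /dev/da$n"
done
```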
 
Now zpool list shows:
 


zpool list 
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
ZFS0    29T  12.7T  16.3T        -         -     1%    43%  1.00x  ONLINE  -

 

Next scrubbing the ZFS0 pool. (do not really need to do this).
 
This only takes 2-3 hours.
 
All complete for NAS #1.  Next is NAS #2, which is off.
 
26th of January, 2022
 
Started to see disconnects here from the LSI controller (the one with RAM).  No errors on the drives.

I saw this a few times when upgrading the drives during resilvering.  I would shut down and cold-start the NAS and the errors would be gone.

I would sometimes see this when copying multiple 10-20 GB files (4K movies): transfer speeds would be very fast and then slow down.

After some googling, it might be related to the memory cache buffers.

Noticed this doing another scrub of the newly updated NAS.

For the time being I have shut off some of the cache and am no longer seeing the disconnects.  Could be related to faster drive speeds, controller memory cache, et al.

The base Xeon server motherboard is running an Intel(R) Core(TM) i3-3245 CPU @ 3.40GHz with 16 GB of RAM.  After shutting off the external controller RAM cache I noticed increased temperatures on the CPU cores.
 
I have had the two Intel Gb NICs on the NAS connected to two Gb ports on the managed switch for years now.
 
Set vfs.zfs.cache to off for now.
 
Reading this to tweak: the ZFS Tuning Guide for BSD, and the XigmaNAS advanced tuning options in the loader.conf file.
 
The issue of kernel memory exhaustion is a complex one, involving the interaction between disk speeds, application loads and the special caching ZFS does. Faster drives will write the cached data faster but will also fill the caches up faster. Generally, larger and faster drives will need more memory for ZFS.
 
So I will have to (though I do not want to) tweak some kernel variables in the BSD ZFS configuration.
 
I am very used to the 116 MB/s transfer speeds that I have been seeing; e.g., when using SFTP on Linux for copying to and from the NAS I see live transfer speeds.

I read somewhere that the LSI controller draws a lot of power, so I ordered a new 1000-watt modular power supply to replace the 600-watt Silverstone anyhow.

I also read that cheap cables can cause issues, so I will be replacing the SAS cables going to the backplane.
 
28th of January, 2022
 
Transfer speeds appeared slower here at 75 MB/s.  Reset the buffers to defaults and speeds are back up to 116 MB/s now.
 
Attempting to see if the new 1000-watt SFX power supply will fit inside the case today.  If not, I will leave the old 600-watt Silverstone SFX PS in place.

Revisiting performance values.

Suggested:
Scrub and Resilver Performance
If you're getting horrible performance during a scrub or resilver, the following sysctls can be set:

vfs.zfs.scrub_delay=0
vfs.zfs.top_maxinflight=128
vfs.zfs.resilver_min_time_ms=5000
vfs.zfs.resilver_delay=0

Mine are currently shown but not enabled; vfs.zfs.top_maxinflight is not shown as an option:

vfs.zfs.scrub_delay
vfs.zfs.resilver_min_time_ms
vfs.zfs.resilver_delay
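The three tunables that do show up here can be set at runtime with sysctl; a dry-run sketch using the values suggested above ("echo" prints the commands instead of running them):

```shell
# Dry-run sketch: apply the suggested scrub/resilver tunables at runtime.
for kv in vfs.zfs.scrub_delay=0 \
          vfs.zfs.resilver_min_time_ms=5000 \
          vfs.zfs.resilver_delay=0; do
    echo "sysctl $kv"
done
```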

For the loader.conf file, the suggested settings are:

#Assuming 8GB of memory

#If Ram = 4GB, set the value to 512M
#If Ram = 8GB, set the value to 1024M
vfs.zfs.arc_min="1024M"

#Ram x 0.5 - 512 MB
vfs.zfs.arc_max="3584M"

#Ram x 2
vm.kmem_size_max="16G"

#Ram x 1.5
vm.kmem_size="12G"

#The following were copied from FreeBSD ZFS Tuning Guide
#https://wiki.freebsd.org/ZFSTuningGuide

# Disable ZFS prefetching
# http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
# Increases overall speed of ZFS, but when disk flushing/writes occur,
# system is less responsive (due to extreme disk I/O).
# NOTE: Systems with 4 GB of RAM or more have prefetch enabled by default.
vfs.zfs.prefetch_disable="1"

# Decrease ZFS txg timeout value from 30 (default) to 5 seconds. This
# should increase throughput and decrease the "bursty" stalls that
# happen during immense I/O with ZFS.
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
# default in FreeBSD since ZFS v28
vfs.zfs.txg.timeout="5"

# Increase number of vnodes; we've seen vfs.numvnodes reach 115,000
# at times. Default max is a little over 200,000. Playing it safe...
# If numvnodes reaches maxvnode performance substantially decreases.
kern.maxvnodes=250000

# Set TXG write limit to a lower threshold. This helps "level out"
# the throughput rate (see "zpool iostat"). A value of 256MB works well
# for systems with 4 GB of RAM, while 1 GB works well for us w/ 8 GB on
# disks which have 64 MB cache.

# NOTE: in v27 or below, this tunable is called 'vfs.zfs.txg.write_limit_override'.
vfs.zfs.write_limit_override=1073741824

I have inserted the above settings but not enabled them.
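Once the loader.conf values are enabled and the box rebooted, the effective settings can be read back with sysctl. A dry-run sketch (note that newer FreeBSD releases rename some of these, e.g. vfs.zfs.arc.max instead of vfs.zfs.arc_max):

```shell
# Dry-run sketch: print the read-back commands for a few of the loader tunables above.
for t in vfs.zfs.arc_min vfs.zfs.arc_max vm.kmem_size vfs.zfs.txg.timeout; do
    echo "sysctl -n $t"
done
```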
 
Again purchased the wrong PS for the DS380; sending it back today.  I like the return policy with Amazon.


Model No.    SST-SX600-G (currently installed in the DS380)
Form factor  SFX
Dimensions   125 mm (W) x 63.5 mm (H) x 100 mm (D)
             4.92" (W) x 2.5" (H) x 3.94" (D)

Model No.    EVGA 220-GT-1000-X1 (purchased and returning)
Form factor  Internal
Dimensions   150 mm (W) x 86 mm (H) x 150 mm (L)
             5.9" (W) x 3.4" (H) x 5.9" (D)

The DS380 requires an SFX PSU with the standard 100 mm depth.

 
 
1st of February, 2022
 
I have a UPS connected which works fine, but I kept getting disconnect messages in my system logs.


usbhid-ups[9710]: Got disconnected by another driver: Device busy

 
SSHed to the BSD NAS box and checked the USB connections.  Using NUT on BSD.
 

Code:
lsusb

Protocol spec without prior Class and Subclass spec at line 23281
Bus /dev/usb Device /dev/ugen1.4: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS
 

The first message is related to the USB IDs database.
 
To fix the device-busy message I did the following:
 


mount -uw /


Edited the devd.conf file:

nomatch 10 {
    match "bus" "uhub[0-9]+";
    match "vendor" "0x0764";
    match "product" "0x0501";
};

 

Code:
mount -ur /

service devd restart
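The whole fix can be sketched as one dry-run sequence (DRYRUN=echo prints each step; unset it to run for real on the embedded XigmaNAS install):

```shell
# Dry-run sketch of the devd.conf fix on an embedded install.
DRYRUN=echo
$DRYRUN mount -uw /            # remount the embedded root read-write
# ...add the nomatch block for vendor 0x0764 / product 0x0501 to devd.conf...
$DRYRUN mount -ur /            # remount read-only again
$DRYRUN service devd restart   # reload devd with the new rule
```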
 

 
 
What NAS are you running?  Because for freenas I saw some posts about the USB driver getting erroneously loaded twice, causing that kind of problem.

Granted, this was a while ago, don't know if it's still relevant: https://www.truenas.com/community/threads/got-disconnected-by-another-driver-device-busy.40511/

I'm with you on wanting logs to be free of extraneous messages.  And USB connections are often a lot more annoying than "they should be".  It's the one thing that keeps me from going with VMs for some things.  Just having a straight USB connection into the host running the software is tedious enough with USB.  Doing pass-through to a VM is often a hopeless time sink.  What works over one reboot never works again, or worse, only works sometimes.
 
Running BSD XigmaNAS here.  BSD is BSD, and I am seeing the same issues as many folks running BSD.  The above fixed the log messages.  I do have the latest USB IDs file loaded, so I am not touching that.  I have no issues with pfSense connected to another CyberPower UPS, or with the secondary pfSense box connected to an APC UPS.
 
The messages were there every 15 minutes or so.  Now they are gone.
 
It is interesting that I see these messages only on the XigmaNAS box.  That said, it is working fine now:
 
Here is the raw NUT info:


battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 20
battery.mfr.date: CPS
battery.runtime: 1750
battery.runtime.low: 300
battery.type: PbAcid
battery.voltage: 13.6
battery.voltage.nominal: 12
device.mfr: CPS
device.model:  CP 1500C
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/ugen1.4
driver.parameter.synchronous: no
driver.version: 2.7.4.1
driver.version.data: CyberPower HID 0.5
driver.version.internal: 0.43
input.transfer.high: 140
input.transfer.low: 90
input.voltage: 117.0
input.voltage.nominal: 120
output.voltage: 116.0
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.load: 19
ups.mfr: CPS
ups.model:  CP 1500C
ups.productid: 0501
ups.realpower.nominal: 388
ups.status: OL
ups.test.result: Done and warning
ups.timer.shutdown: -60
ups.timer.start: 0
ups.vendorid: 0764
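For reference, a dump like the one above comes from the NUT client tool; the UPS name CPS1 (as configured in ups.conf) is my reading of the ps listing later in the thread:

```shell
# Dry-run sketch: query the NUT server for the full UPS variable dump.
query="upsc CPS1@localhost"
echo "$query"
```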

 
 
I am using a Xeon motherboard with 16 GB of RAM and could run Oracle VirtualBox VMs on the box today, but I do not.

I am running Oracle VirtualBox VMs on the Ubuntu 20.04 automation box just fine.  The automation box runs HomeSeer and Home Assistant.

I use the Oracle VirtualBox VMs for running embedded Windows for SAPI use in HomeSeer.

I have no issues with the USB 2.0 and USB 3.0 pass-through, which I use for RFID and W800 X10 sensors.
 
2nd of February, 2022
 
The above-mentioned fix did not eliminate the logging errors.


Feb 2 06:28:49 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 2 05:42:32 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 2 05:22:23 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 2 05:04:23 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 2 04:38:45 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 2 04:17:33 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 2 02:12:16 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 1 21:19:24 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 1 21:18:51 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 1 19:43:43 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 1 14:42:09 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 1 13:41:35 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 1 12:33:33 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy
Feb 1 09:49:41 ics-raid-00 usbhid-ups: Got disconnected by another driver: Device busy

 
 
3rd of February, 2022
 
 
Maybe found the issue.  I think I was accidentally loading the drivers twice by not putting in the root password when configuring it, assuming the configuration already knew the root password.

I would originally get an error when enabling the service, even though it would run.  I disabled the UPS NUT service, then re-enabled it using the root password and saw no error.
 
Now watching the logs.

Even one log entry after a few hours is still too much for me.

Feb 3 13:53:30 usbhid-ups: Got disconnected by another driver: Device busy

Running ps -awx -l, I see this:

0 4036    1 2 20 0 12168 3024 select Ss - 0:01.04 /usr/local/libexec/nut/usbhid-ups -a CPS1 -u root
0 4038    1 0 20 0 15456 4948 select Ss - 0:00.28 /usr/local/sbin/upsd -u root
0 4073    1 0 20 0 15820 5220 nanslp Is - 0:00.02 /usr/local/bin/upslog -s CPS1@localhost -l /var/log/ups.log -i 3
0 4150    1 0 52 0 15228 4980 piperd Is - 0:00.00 /usr/local/sbin/upsmon -u root localhost
0 4152 4150 0 20 0 15792 5200 nanslp S  - 0:00.27 /usr/local/sbin/upsmon -u root localhost
PID 4036 is running only one instance.

fstat | grep -i usb

root     usbhid-ups  4036 text /usr/local  79875 -r-xr-xr-x  214536  r
root     usbhid-ups  4036   wd /var        33046 drw-------     512  r
root     usbhid-ups  4036 root /               2 drwxr-xr-x     512  r
root     usbhid-ups  4036    0 /dev           52 crw-rw-rw-    null rw
root     usbhid-ups  4036    1 /dev           52 crw-rw-rw-    null rw
root     usbhid-ups  4036    2 /dev           52 crw-rw-rw-    null rw
root     usbhid-ups  4036    3* local dgram fffff8007374d200 <-> fffff8007367cc00
root     usbhid-ups  4036    4* local stream fffff8007374d100
root     usbhid-ups  4036    5 /dev          116 crw-rw----  usb/1.4.0 rw
root     usbhid-ups  4036    6 /dev          116 crw-rw----  usb/1.4.0 rw
root     usbhid-ups  4036    7* local stream fffff800737cb800
root     usbhid-ups  4036    8* local stream fffff800737cb500 <-> fffff800737cb600
Edited this post at 18:47 Central time and the earlier formatting got trashed.
 
Rebooted the NAS box.  Checked the boot-up logs and, a couple of hours later, I do not see any errors in the logs.

IE: usbhid-ups: Got disconnected by another driver: Device busy
 