NAS lagging

I recently upgraded my NAS from a QNAP TS-231P to a QNAP TS-932PX. The new setup uses 4 SSDs in RAID 6 for file sharing and 5 HDDs in RAID 6 for backups. It also runs an additional cloud backup every night.

About 10 people in our office use the NAS for file sharing, and it’s joined to Active Directory on Windows Server 2019.

Everything worked perfectly for several months. However, sometime last month users started reporting very slow performance every morning. Browsing folders or opening files can take anywhere from 2 to 30 seconds.

If I reboot the NAS in the morning, it works normally for the rest of the day. But the next morning the lag comes back, and the only way to fix it again is another reboot.

I contacted QNAP support, and their higher-level technicians tried troubleshooting but couldn’t find any clear issues. They suggested monitoring the swap file and possibly adding more RAM, but that doesn’t seem logical because:

  • RAM usage never goes above 40%

  • CPU usage stays under 10%

  • The 2.5Gb network is barely utilized

  • I have already removed and rejoined the NAS to Active Directory

  • Ping latency between users, the server, and the NAS is always around 1 ms

  • The disks are running at their proper speed

After two weeks of troubleshooting, I’m running out of ideas. My next steps might be to remove it from Active Directory to test, or possibly reset and reformat the NAS.

Thanks for any suggestions.

So a couple things:

Never assume CPU usage = CPU load. They are different: you can have low CPU usage and a high CPU load, which will absolutely affect your system. To see CPU load, open an SSH shell and run the command top. There will be a line named Load average consisting of 3 numbers, representing the 1-, 5-, and 15-minute averages of the number of threads running on or waiting for the CPU. If the load average is greater than the number of CPU threads on the unit, you will see slow speeds. For example, my TS-873A has 4 physical cores, each capable of 2 threads, which = 8 threads. So when my TS-873A's load gets over 8, it begins to slow down. If you are having speed issues, check this while it is slow.
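The load-average check above can be scripted so you don't have to eyeball it. A minimal sketch, assuming a Linux shell with /proc/loadavg and /proc/cpuinfo available (both standard on Linux, including QTS; cut, grep, and awk are an assumption on very old firmwares):

```shell
#!/bin/sh
# Compare the 1-minute load average against the number of CPU threads.
load1=$(cut -d ' ' -f 1 /proc/loadavg)       # 1-minute load average
threads=$(grep -c ^processor /proc/cpuinfo)  # logical CPUs (cores x SMT)
# sh has no floating-point math, so let awk do the comparison
overloaded=$(awk -v l="$load1" -v t="$threads" \
    'BEGIN { if (l > t) print "yes"; else print "no" }')
echo "1-min load $load1 on $threads threads; overloaded: $overloaded"
```

If "overloaded" comes back yes during the morning slowdown, that confirms the load-average theory before you start stopping apps.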


I have seen issues on my QNAP where zombie processes keep running, consuming resources, and don't get shut down properly. I had an issue a month ago where the load was showing huge numbers. It turned out to be caused by something in Hybrid Backup Sync.

What is your NAS doing overnight that could cause the slowdown? Backups take a lot of resources and will slow things down if they haven't completed before the next day, and the same goes for other apps, etc.

So steps to resolve this:

1.) Start by looking at top, check your load, and if it's high, see whether an app is showing high CPU usage in the process list below it.

2.) Begin stopping potentially high-usage apps like Container Station, Virtualization Station, etc. Stop one application at a time from App Center, then watch for a few minutes to see if the load starts to drop. If it does, you found the offending app. Keep going until you find the app eating your load.

3.) With the offending app stopped, reboot the NAS.

4.) After the NAS reboots, restart the app.

5.) Monitor your loading to see if the problem still occurs.
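Since the slowdown hits in the morning, you can also let a small logging loop do the watching for you overnight. A sketch, assuming busybox/GNU userland; the iteration count and sleep interval are shortened here for illustration (in practice loop across the whole morning window with sleep 60):

```shell
#!/bin/sh
# Snapshot the load average and process list periodically so the process
# driving the load up can be identified after the fact.
LOG=/tmp/load_watch.log       # on QTS, a /share/... path survives reboots
: > "$LOG"                    # truncate any previous log
for i in 1 2 3; do            # demo: 3 iterations; use `while true` for real
    date >> "$LOG"
    cat /proc/loadavg >> "$LOG"
    ps aux 2>/dev/null | head -n 10 >> "$LOG"  # busybox ps tolerates "aux"
    sleep 1                   # use 60 for real monitoring
done
echo "captured $(wc -l < "$LOG") lines in $LOG"
```

Then in the morning, grep the log for the window where the load average jumped and see which processes appear alongside it.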

These issues can take some time and effort to figure out. If you keep pressing QNAP support to look into this, they will help you, but it will take some gymnastics, and you will need to explain everything you are doing. Ask for the case to be escalated, and make sure they log into your NAS from Helpdesk and look at things.

This is during the slowdown on Feb 27; I see similar load averages on other days during slowdowns:

[~] # top
Mem: 3710592K used, 325376K free, 54016K shrd, 1495552K buff, 298496K cached
CPU:  6.5% usr  0.0% sys  0.0% nic  0.0% idle 93.4% io  0.0% irq  0.0% sirq
Load average: 22.89 20.50 13.37 2/879 17322
PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
9727     1 admin    S    2314m 58.3   3  2.1 /sbin/hal_daemon -f
17316 14395 admin    R     3520  0.0   1  2.1 top
24832 17582 admin    S <  3060m 77.1   0  0.0 /usr/local/apache/bin/apache_proxy
29175     1 admin    S    1406m 35.4   1  0.0 {cc3-fastcgi} python /share/CACHED
14546     1 admin    S    1238m 31.1   1  0.0 /usr/local/sbin/qulogdb --defaults
26475     1 admin    S    1210m 30.4   0  0.0 {p2pagent} /share/CACHEDEV2_DATA/.
30866 18157 admin    S    1205m 30.3   3  0.0 {apache_proxys} /usr/local/apache/
31531     1 admin    S     967m 24.3   2  0.0 /usr/local/mariadb/bin/mysqld --de
15705 15704 admin    S     856m 21.5   0  0.0 /usr/local/sbin/ncd
12976     1 admin    S     855m 21.5   1  0.0 /usr/local/mariadb/bin/mysqld --de
23248     1 admin    S     838m 21.1   2  0.0 /usr/local/mariadb/bin/mysqld --de
15614     1 admin    S     486m 12.2   2  0.0 /usr/local/sbin/ncdb --defaults-fi
7338     1 admin    S     425m 10.7   2  0.0 /mnt/ext/opt/Python/bin/python ./m
26437     1 admin    S     326m  8.2   0  0.0 tunnelagent
5698     1 admin    S     296m  7.4   1  0.0 /sbin/cs_qdaemon
11507     1 admin    S     266m  6.7   3  0.0 /mnt/ext/opt/Python/bin/python /mn
14644 14636 admin    S     260m  6.5   2  0.0 /usr/local/sbin/qulogd
14366     1 admin    S     260m  6.5   0  0.0 /usr/local/sbin/qulogd
17118     1 admin    S     260m  6.5   2  0.0 /usr/local/sbin/qulogd
31916     1 admin    S     249m  6.2   3  0.0 /usr/local/bin/qsyncsrv_monitor -p

This is now, when nobody's working:

Mem: 3723328K used, 312640K free, 46592K shrd, 1653312K buff, 219584K cached
CPU:  0.9% usr  1.3% sys  0.0% nic 97.6% idle  0.0% io  0.0% irq  0.0% sirq
Load average: 1.28 1.32 1.40 1/823 6903
PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
9838     1 admin    S    2251m 56.7   1  0.9 /sbin/hal_daemon -f
31602     1 admin    S     903m 22.7   2  0.0 /usr/local/mariadb/bin/mysqld --defaults-file=/etc/
9890     1 admin    S    46976  1.1   3  0.0 {nmd} python /usr/local/network/nmd/nmd.pyc
7426     1 admin    S    14912  0.3   0  0.0 /mnt/ext/opt/netmgr/util/redis/redis-server *:0
6856 28205 admin    R     4096  0.1   0  0.0 top
3808 17771 admin    S <  3060m 77.1   2  0.0 /usr/local/apache/bin/apache_proxy -k start -f /etc
29211     1 admin    S    1406m 35.4   0  0.0 {cc3-fastcgi} python /share/CACHEDEV3_DATA/.qpkg/Hy
14657     1 admin    S    1238m 31.1   3  0.0 /usr/local/sbin/qulogdb --defaults-file=/mnt/ext/op
26675     1 admin    S    1209m 30.4   0  0.0 /share/CACHEDEV2_DATA/.qpkg/CloudLink/bin/p2pagent
30877 18338 admin    S    1205m 30.3   2  0.0 /usr/local/apache/bin/apache_proxys -k start -f /et
13123     1 admin    S     855m 21.5   0  0.0 /usr/local/mariadb/bin/mysqld --defaults-file=/usr/
23639     1 admin    S     838m 21.1   0  0.0 /usr/local/mariadb/bin/mysqld --defaults-file=/etc/
15746 15742 admin    S     827m 20.8   2  0.0 /usr/local/sbin/ncd
15651     1 admin    S     485m 12.2   2  0.0 /usr/local/sbin/ncdb --defaults-file=/mnt/ext/opt/N
7388     1 admin    S     425m 10.7   3  0.0 /mnt/ext/opt/Python/bin/python ./manage.pyc runfcgi
27172     1 admin    S     328m  8.2   0  0.0 tunnelagent
5757     1 admin    S     296m  7.4   1  0.0 /sbin/cs_qdaemon
11649     1 admin    S     266m  6.7   1  0.0 /mnt/ext/opt/Python/bin/python /mnt/ext/opt/netmgr/
14759 14751 admin    S     260m  6.5   2  0.0 /usr/local/sbin/qulogd
31944     1 admin    S     249m  6.2   1  0.0 /usr/local/bin/qsyncsrv_monitor -pid:31940 -reg:/sh
23915     1 admin    S     248m  6.2   0  0.0 /usr/local/sbin/pp_qcoolied -f /etc/config/pp_qcool
4329     1 admin    S     227m  5.7   3  0.0 /sbin/lvmetad
20500     1 admin    S     219m  5.5   2  0.0 /usr/sbin/rsyslogd -f /etc/rsyslog_only_klog.conf -
30153     1 admin    S     194m  4.8   1  0.0 {php-fpm-proxy} php-fpm: master process (/etc/php-f
30157 30153 admin    S     194m  4.8   2  0.0 {php-fpm-proxy} php-fpm: pool www
30158 30153 admin    S     194m  4.8   1  0.0 {php-fpm-proxy} php-fpm: pool www
31833     1 admin    S     192m  4.8   3  0.0 /sbin/qsyncsrv_dbm -b
20970     1 admin    S     174m  4.4   1  0.0 /sbin/qShield
20965     1 admin    S     174m  4.4   3  0.0 qNoticeEngined: Write notice is enabled…
19864     1 admin    S     169m  4.2   1  0.0 /usr/local/bin/rfsd_qmonitor -f:/tmp/rfsd_qmonitor.
18780     1 admin    S     168m  4.2   2  0.0 /mnt/ext/opt/Python/bin/python2 /sbin/wsd.py
20967     1 admin    S     167m  4.2   2  0.0 qLogEngined: Write log is enabled…
10396 10393 admin    S     149m  3.7   0  0.0 /usr/local/bin/rates_monitor_start
18367 32316 admin    S     135m  3.4   2  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
4595 32316 admin    S     129m  3.2   0  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
24710 32316 admin    S     126m  3.1   3  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
3771 32316 admin    S     126m  3.1   1  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
9922  9890 admin    S     116m  2.9   2  0.0 {ncaas} python /usr/local/network/nmd/nmd.pyc
9923  9890 admin    S     116m  2.9   1  0.0 {qserviced} python /usr/local/network/nmd/nmd.pyc
32424 32316 admin    S     113m  2.8   2  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
32316     1 admin    S     113m  2.8   0  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
14751     1 admin    S     108m  2.7   2  0.0 /usr/local/sbin/qulogd
10729  9808 admin    S <   107m  2.7   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
15742     1 admin    S     106m  2.6   2  0.0 /usr/local/sbin/ncd
21592     1 admin    S     104m  2.6   2  0.0 /usr/bin/qsnapman
29655     1 admin    S      99m  2.5   2  0.0 /usr/bin/RTRR_MANAGER
9846  9808 admin    S <    98m  2.4   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
15727     1 admin    S    99520  2.4   0  0.0 /usr/local/sbin/ncloud
9847  9808 admin    S <  97152  2.3   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
9902  9808 admin    S <  96832  2.3   3  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
32321 32316 admin    S    95360  2.3   0  0.0 {cleanupd} /usr/local/samba/sbin/smbd -l /var/log -
32319 32316 admin    S    95232  2.3   3  0.0 {smbd-notifyd} /usr/local/samba/sbin/smbd -l /var/l
7037  9808 admin    S <  93312  2.3   2  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
9808     1 admin    S <  86912  2.1   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
19856     1 admin    S    77056  1.9   2  0.0 /sbin/rfsd -i -f /etc/rfsd.conf
30107 30105 admin    S <  73792  1.8   0  0.0 /home/httpd/cgi-bin/qsync/qsyncsrv.fcgi

I also fed the stats to an AI:

https://x.com/i/grok/share/c0ff440f825b46b8a25c50b0050f0416

So a load of 22.89 is going to be brutally slow.
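One more detail from that capture: the CPU line shows 93.4% io, meaning the CPU is mostly idle, waiting on disk I/O, and that wait is exactly what inflates the load average. A quick, hedged way to see which processes are stuck waiting on storage is to list those in uninterruptible ("D") sleep; column 8 of ps aux is the state column on procps, and the column index is an assumption on busybox, where the output format differs:

```shell
#!/bin/sh
# Processes in "D" state are blocked in the kernel waiting on I/O.
dlist=$(ps aux 2>/dev/null | awk 'NR > 1 && $8 ~ /^D/ { print $2, $11 }')
dcount=$(printf '%s' "$dlist" | grep -c .)
echo "processes waiting on I/O: $dcount"
if [ -n "$dlist" ]; then echo "$dlist"; fi
```

Run that during the morning slowdown; if the same daemons keep showing up in D state, they are the ones hammering your disks.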

So now you need to figure out what is causing that. It's hard to tell which specific process it is, so look at the following:

1.) Are you running backups overnight?
2.) Did those backups finish before morning?
3.) What happens if you don’t run a backup overnight?
4.) Start stopping applications one by one when the load is high. Start with things like Container Station, Virtualization Station, Web Server, etc. Take them down one by one, wait a few minutes each time, and see what happens to the load. When you stop the app in question, the load will drop pretty quickly.
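Step 4 can also be done from SSH instead of App Center. On QTS, the qpkg_service CLI starts and stops apps (assumption: present on QTS 4.3+; the app names below are placeholders, so run qpkg_service on your unit to list the real ones). This sketch defaults to a dry run that only prints what it would do:

```shell
#!/bin/sh
# Stop candidate apps one at a time, then give the load average time to fall.
# DRY_RUN=1 makes this safe to paste; set DRY_RUN=0 on the NAS itself.
DRY_RUN=1
planned=0
for app in ContainerStation Virtualization QWebServer; do  # placeholder names
    cmd="qpkg_service stop -n $app"
    planned=$((planned + 1))
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $cmd"
    else
        $cmd
        sleep 300                         # wait for the 5-min average to react
        cut -d ' ' -f 1-3 /proc/loadavg   # did the load drop?
    fi
done
echo "$planned apps queued"
```

The 5-minute wait matters: the 1-minute load average reacts quickly, but the 5-minute number is what tells you the drop is real.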

Unfortunately, your TS-932PX uses an ARM Cortex-A57 CPU. It's an upgrade from your TS-231P, but barely. If you are using this NAS in a business setting, you need to buy a different NAS. The ARM Cortex-A57 is a fine CPU, but really for specific embedded applications; it's just not what you want running a file server for 10 people. I'm sorry to tell you this, but you bought the wrong NAS as an upgrade. At the very least, move to an x86 NAS, and the lowest I would go there is an AMD Ryzen V1500B type model like the TS-x73A. Better yet would be an i5 or i7, but those start to get pricier.


The NAS is used strictly for file sharing. It’s not hosting any servers. TS-231P worked fine for years. I upgraded to TS-932PX just to increase the disk space.

The NAS works fast the whole day if rebooted in the morning. Backups finish at night and are not running in the morning.

Your suggestion about the high load average is the most helpful thing so far and gives me a direction for further troubleshooting.

Definitely not good.

Just a side note: if it really was just a space issue, you could have upgraded the drives in the 231P. That's what I did (QNAP TS-231P-US, 2x18TB Seagate Exos).

I usually suspect the storage devices in this type of situation. Normally I'd suggest checking the drive stats and SMART diagnostics, but I'm not sure what's available for SSDs (I don't have any in my own NAS units).

Maybe post the SSD model numbers and see if it triggers anything for someone else.

There is a bottleneck in there somewhere driving that CPU load up.

You could also enter a support ticket with QNAP and they can investigate.
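On the SMART question above: QTS does ship smartctl from smartmontools (assumption: the exact path and device naming vary by model, and SSD attributes differ from HDD ones). A hedged sketch that degrades gracefully when a device isn't present:

```shell
#!/bin/sh
# Run a SMART health check on each SATA device, skipping anything that
# isn't actually there (the glob stays literal when nothing matches).
checked=0
for d in /dev/sd?; do
    checked=$((checked + 1))
    echo "== $d =="
    smartctl -H "$d" 2>/dev/null || echo "no SMART data for $d (missing device or smartctl)"
done
echo "looked at $checked device path(s)"
```

For the SSDs, smartctl -A on each device also reports the wear/remaining-life attributes, which is the first thing I'd look at for a morning-only slowdown on flash.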

The disks have been tested and are running at proper speed and IOPS. I did migrate from the QNAP TS-231P, so I'm thinking about resetting the whole thing and starting from a fresh setup.

5x WD Red Plus for RAID6 nightly backups

IOPS 73, 117, 77, 125, 113

MB/s 176, 177, 176, 194

4x WD Red SA500 2TB RAID6 storage

IOPS 40434, 40701, 40627, 40011

MB/s 537, 537, 538, 538