NAS lagging

I recently upgraded my NAS from a QNAP TS-231P to a QNAP TS-932PX. The new setup uses 4 SSDs in RAID 6 for file sharing and 5 HDDs in RAID 6 for backups. It also runs an additional cloud backup every night.

About 10 people in our office use the NAS for file sharing, and it’s joined to Active Directory on Windows Server 2019.

Everything worked perfectly for several months. However, sometime last month users started reporting very slow performance every morning. Browsing folders or opening files can take anywhere from 2 to 30 seconds.

If I reboot the NAS in the morning, it works normally for the rest of the day. But the next morning the lag comes back, and the only way to fix it again is another reboot.

I contacted QNAP support, and their higher-level technicians tried troubleshooting but couldn’t find any clear issues. They suggested monitoring the swap file and possibly adding more RAM, but that doesn’t seem logical because:

  • RAM usage never goes above 40%

  • CPU usage stays under 10%

  • Network (2.5Gb) is barely utilized

  • I removed and rejoined the NAS to Active Directory

  • Ping latency between users, the server, and the NAS is always around 1 ms.

  • The disks are running at their proper speed.

After two weeks of troubleshooting, I’m running out of ideas. My next steps might be to remove it from Active Directory to test, or possibly reset and reformat the NAS.

Thanks for any suggestions.

So a couple things:

Never assume CPU usage = CPU load. They are different. You can have low CPU usage and a high CPU load, which will absolutely affect your system. To see the CPU load, open an SSH shell and run the command top. There will be a line named Load average that starts with 3 numbers. They are the averages over the last 1, 5 and 15 minutes of the number of threads that are running or waiting to run. On Linux that also counts threads stuck waiting on disk I/O, which is how the load can be high while CPU usage stays low. If the load average is greater than the number of cores on the unit, you will see slow speeds. For example, my TS-873A has 4 real cores, each capable of 2 threads, which = 8 logical cores. So when the load on my TS-873A gets over 8, it begins to slow down. If you are having speed issues, check this when it is slow.
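
If Python is available in your SSH session, here is a minimal sketch of the same check. The function name load_per_core is made up for illustration. It reads the three load averages that top shows and divides them by the number of logical cores the OS reports:

```python
import os

def load_per_core(load_avg, cores):
    """Return each load average divided by the logical core count."""
    return tuple(round(value / cores, 2) for value in load_avg)

if __name__ == "__main__":
    cores = os.cpu_count() or 1      # logical cores reported by the OS
    load_avg = os.getloadavg()       # 1, 5 and 15 minute averages
    ratios = load_per_core(load_avg, cores)
    print("cores:", cores, "load:", load_avg, "per core:", ratios)
    if ratios[0] > 1.0:
        print("1-minute load exceeds the core count; expect slowdowns")
```

A per-core value around 1.0 or below is generally comfortable. Values well above 1.0 mean work is queuing up.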

I have seen issues on my QNAP where zombie processes keep running and consuming resources because they don’t get shut down properly. I had an issue a month ago where the load was showing huge numbers. Turns out it was caused by something in Hybrid Backup Sync.

What process or application is your NAS running, say, overnight that could cause the slowdown? Backups can take a lot of resources and slow things down if they did not complete before the next day. So can other apps.

So steps to resolve this:

1.) Start by looking at top and checking your load. If it’s high, see whether an app is showing high CPU usage in the process list below.

2.) Begin stopping potentially high-usage apps like Container Station, Virtualization Station, etc. Stop one application from App Center, then watch for a few minutes to see if the load starts to drop. If it does, you found the offending app. Keep doing this until you find the app driving up your load.

3.) With the offending app stopped, reboot the NAS.

4.) After the NAS reboots, restart the app.

5.) Monitor your load to see if the problem still occurs.

These issues can take some time and effort to figure out. If you keep pressing QNAP support to look into this, they will help you. It will take some gymnastics with them and you will need to explain everything about what you are doing. Ask for the case to be escalated. Make sure they log into your NAS from Helpdesk and look at things.

This is during the slowdown on Feb 27. I see a similar load average on other days during the slowdown.

[~] # top
Mem: 3710592K used, 325376K free, 54016K shrd, 1495552K buff, 298496K cached
CPU:  6.5% usr  0.0% sys  0.0% nic  0.0% idle 93.4% io  0.0% irq  0.0% sirq
Load average: 22.89 20.50 13.37 2/879 17322
PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
9727     1 admin    S    2314m 58.3   3  2.1 /sbin/hal_daemon -f
17316 14395 admin    R     3520  0.0   1  2.1 top
24832 17582 admin    S <  3060m 77.1   0  0.0 /usr/local/apache/bin/apache_proxy
29175     1 admin    S    1406m 35.4   1  0.0 {cc3-fastcgi} python /share/CACHED
14546     1 admin    S    1238m 31.1   1  0.0 /usr/local/sbin/qulogdb --defaults
26475     1 admin    S    1210m 30.4   0  0.0 {p2pagent} /share/CACHEDEV2_DATA/.
30866 18157 admin    S    1205m 30.3   3  0.0 {apache_proxys} /usr/local/apache/
31531     1 admin    S     967m 24.3   2  0.0 /usr/local/mariadb/bin/mysqld --de
15705 15704 admin    S     856m 21.5   0  0.0 /usr/local/sbin/ncd
12976     1 admin    S     855m 21.5   1  0.0 /usr/local/mariadb/bin/mysqld --de
23248     1 admin    S     838m 21.1   2  0.0 /usr/local/mariadb/bin/mysqld --de
15614     1 admin    S     486m 12.2   2  0.0 /usr/local/sbin/ncdb --defaults-fi
7338     1 admin    S     425m 10.7   2  0.0 /mnt/ext/opt/Python/bin/python ./m
26437     1 admin    S     326m  8.2   0  0.0 tunnelagent
5698     1 admin    S     296m  7.4   1  0.0 /sbin/cs_qdaemon
11507     1 admin    S     266m  6.7   3  0.0 /mnt/ext/opt/Python/bin/python /mn
14644 14636 admin    S     260m  6.5   2  0.0 /usr/local/sbin/qulogd
14366     1 admin    S     260m  6.5   0  0.0 /usr/local/sbin/qulogd
17118     1 admin    S     260m  6.5   2  0.0 /usr/local/sbin/qulogd
31916     1 admin    S     249m  6.2   3  0.0 /usr/local/bin/qsyncsrv_monitor -p

This is now when nobody’s working

Mem: 3723328K used, 312640K free, 46592K shrd, 1653312K buff, 219584K cached
CPU:  0.9% usr  1.3% sys  0.0% nic 97.6% idle  0.0% io  0.0% irq  0.0% sirq
Load average: 1.28 1.32 1.40 1/823 6903
PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
9838     1 admin    S    2251m 56.7   1  0.9 /sbin/hal_daemon -f
31602     1 admin    S     903m 22.7   2  0.0 /usr/local/mariadb/bin/mysqld --defaults-file=/etc/
9890     1 admin    S    46976  1.1   3  0.0 {nmd} python /usr/local/network/nmd/nmd.pyc
7426     1 admin    S    14912  0.3   0  0.0 /mnt/ext/opt/netmgr/util/redis/redis-server *:0
6856 28205 admin    R     4096  0.1   0  0.0 top
3808 17771 admin    S <  3060m 77.1   2  0.0 /usr/local/apache/bin/apache_proxy -k start -f /etc
29211     1 admin    S    1406m 35.4   0  0.0 {cc3-fastcgi} python /share/CACHEDEV3_DATA/.qpkg/Hy
14657     1 admin    S    1238m 31.1   3  0.0 /usr/local/sbin/qulogdb --defaults-file=/mnt/ext/op
26675     1 admin    S    1209m 30.4   0  0.0 /share/CACHEDEV2_DATA/.qpkg/CloudLink/bin/p2pagent
30877 18338 admin    S    1205m 30.3   2  0.0 /usr/local/apache/bin/apache_proxys -k start -f /et
13123     1 admin    S     855m 21.5   0  0.0 /usr/local/mariadb/bin/mysqld --defaults-file=/usr/
23639     1 admin    S     838m 21.1   0  0.0 /usr/local/mariadb/bin/mysqld --defaults-file=/etc/
15746 15742 admin    S     827m 20.8   2  0.0 /usr/local/sbin/ncd
15651     1 admin    S     485m 12.2   2  0.0 /usr/local/sbin/ncdb --defaults-file=/mnt/ext/opt/N
7388     1 admin    S     425m 10.7   3  0.0 /mnt/ext/opt/Python/bin/python ./manage.pyc runfcgi
27172     1 admin    S     328m  8.2   0  0.0 tunnelagent
5757     1 admin    S     296m  7.4   1  0.0 /sbin/cs_qdaemon
11649     1 admin    S     266m  6.7   1  0.0 /mnt/ext/opt/Python/bin/python /mnt/ext/opt/netmgr/
14759 14751 admin    S     260m  6.5   2  0.0 /usr/local/sbin/qulogd
31944     1 admin    S     249m  6.2   1  0.0 /usr/local/bin/qsyncsrv_monitor -pid:31940 -reg:/sh
23915     1 admin    S     248m  6.2   0  0.0 /usr/local/sbin/pp_qcoolied -f /etc/config/pp_qcool
4329     1 admin    S     227m  5.7   3  0.0 /sbin/lvmetad
20500     1 admin    S     219m  5.5   2  0.0 /usr/sbin/rsyslogd -f /etc/rsyslog_only_klog.conf -
30153     1 admin    S     194m  4.8   1  0.0 {php-fpm-proxy} php-fpm: master process (/etc/php-f
30157 30153 admin    S     194m  4.8   2  0.0 {php-fpm-proxy} php-fpm: pool www
30158 30153 admin    S     194m  4.8   1  0.0 {php-fpm-proxy} php-fpm: pool www
31833     1 admin    S     192m  4.8   3  0.0 /sbin/qsyncsrv_dbm -b
20970     1 admin    S     174m  4.4   1  0.0 /sbin/qShield
20965     1 admin    S     174m  4.4   3  0.0 qNoticeEngined: Write notice is enabled…
19864     1 admin    S     169m  4.2   1  0.0 /usr/local/bin/rfsd_qmonitor -f:/tmp/rfsd_qmonitor.
18780     1 admin    S     168m  4.2   2  0.0 /mnt/ext/opt/Python/bin/python2 /sbin/wsd.py
20967     1 admin    S     167m  4.2   2  0.0 qLogEngined: Write log is enabled…
10396 10393 admin    S     149m  3.7   0  0.0 /usr/local/bin/rates_monitor_start
18367 32316 admin    S     135m  3.4   2  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
4595 32316 admin    S     129m  3.2   0  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
24710 32316 admin    S     126m  3.1   3  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
3771 32316 admin    S     126m  3.1   1  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
9922  9890 admin    S     116m  2.9   2  0.0 {ncaas} python /usr/local/network/nmd/nmd.pyc
9923  9890 admin    S     116m  2.9   1  0.0 {qserviced} python /usr/local/network/nmd/nmd.pyc
32424 32316 admin    S     113m  2.8   2  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
32316     1 admin    S     113m  2.8   0  0.0 /usr/local/samba/sbin/smbd -l /var/log -D -s /etc/c
14751     1 admin    S     108m  2.7   2  0.0 /usr/local/sbin/qulogd
10729  9808 admin    S <   107m  2.7   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
15742     1 admin    S     106m  2.6   2  0.0 /usr/local/sbin/ncd
21592     1 admin    S     104m  2.6   2  0.0 /usr/bin/qsnapman
29655     1 admin    S      99m  2.5   2  0.0 /usr/bin/RTRR_MANAGER
9846  9808 admin    S <    98m  2.4   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
15727     1 admin    S    99520  2.4   0  0.0 /usr/local/sbin/ncloud
9847  9808 admin    S <  97152  2.3   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
9902  9808 admin    S <  96832  2.3   3  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
32321 32316 admin    S    95360  2.3   0  0.0 {cleanupd} /usr/local/samba/sbin/smbd -l /var/log -
32319 32316 admin    S    95232  2.3   3  0.0 {smbd-notifyd} /usr/local/samba/sbin/smbd -l /var/l
7037  9808 admin    S <  93312  2.3   2  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
9808     1 admin    S <  86912  2.1   0  0.0 /usr/local/samba/sbin/winbindd -s /etc/config/smb.c
19856     1 admin    S    77056  1.9   2  0.0 /sbin/rfsd -i -f /etc/rfsd.conf
30107 30105 admin    S <  73792  1.8   0  0.0 /home/httpd/cgi-bin/qsync/qsyncsrv.fcgi

I also fed the stats to AI

https://x.com/i/grok/share/c0ff440f825b46b8a25c50b0050f0416

So a load of 22.89 is going to be brutally slow.
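
The header lines of that top output carry more than the load figure. Here is a rough Python sketch that parses the two header lines exactly as posted. The helper names are hypothetical. It also pulls out the io column, which reads 93.4% while idle reads 0.0%:

```python
import re

def parse_cpu_line(line):
    """Turn 'CPU:  6.5% usr  0.0% sys ...' into {'usr': 6.5, 'sys': 0.0, ...}."""
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)% (\w+)", line)}

def parse_load_line(line):
    """Return the 1, 5 and 15 minute load averages as floats."""
    return tuple(float(x) for x in line.split()[2:5])

# Both strings are copied from the slow-morning top output in this thread.
cpu = parse_cpu_line("CPU:  6.5% usr  0.0% sys  0.0% nic  0.0% idle 93.4% io  0.0% irq  0.0% sirq")
load = parse_load_line("Load average: 22.89 20.50 13.37 2/879 17322")

print("io wait:", cpu["io"], "idle:", cpu["idle"], "load:", load)
```

The CPU column in the process list only shows cores 0 to 3, so a 1-minute load near 23 is several times the core count. A high io figure with low usr and sys usually means the CPU is waiting on storage rather than computing, so it may be worth looking at disk activity as well as at individual apps.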

So now, you need to figure out what is causing that. It’s hard to tell what process specifically that is. So look at the following:

1.) Are you running backups overnight?
2.) Did those backups finish before morning?
3.) What happens if you don’t run a backup overnight?
4.) Start stopping applications one by one when the load is high. Start with things like Container Station, Virtualization Station, Web Server, etc. Take them down one by one. Wait a few minutes each time and see what happens to the load. When you find the app in question, it will drop pretty quickly.
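
To make the "wait a few minutes and see what happens to the load" part easier, here is a small sketch, assuming Python is available on the NAS. The function names are made up. It prints a timestamped load reading at a fixed interval so you can line up a drop with the app you just stopped:

```python
import os
import time

def format_sample(timestamp, load_avg):
    """Format one log line: local time followed by the three load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return "{} {:.2f} {:.2f} {:.2f}".format(stamp, *load_avg)

def watch(samples, interval_seconds):
    """Print the load average every interval_seconds, samples times."""
    for _ in range(samples):
        print(format_sample(time.time(), os.getloadavg()), flush=True)
        time.sleep(interval_seconds)

# Example call (about five minutes of readings):
# watch(samples=20, interval_seconds=15)
```

Left running overnight with the output redirected to a file, the same loop would also show roughly when the load starts climbing, which could help tie it to a scheduled job.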

Unfortunately, your TS-932PX uses an ARM Cortex-A57 CPU. It’s an upgrade from your TS-231P, but barely. If you are using this NAS in a business situation, you need to buy a different NAS. The ARM Cortex-A57 is a great MPU, but really for specific embedded applications. It’s just not what you want to use to run a file server for 10 people. I’m sorry to tell you this, but you bought the wrong NAS as an upgrade. You should at the very least move to an x86 NAS, and the very lowest I would go there would be an AMD Ryzen V1500B type model like the TS-x73A NAS. Better yet would be an i5 or an i7, but these start to get more pricey.

The NAS is used strictly for file sharing. It’s not hosting any servers. TS-231P worked fine for years. I upgraded to TS-932PX just to increase the disk space.

The NAS is fast the whole day if rebooted in the morning. Backups finish overnight and are not running in the morning.

Your suggestion about high load average was the most helpful thing so far and gives me a direction for further troubleshooting.

Definitely not good.

Just a side note, if it really was just a space issue, you could have upgraded the drives in the 231P. That’s what I did (QNAP TS-231P-US 2x18TB Seagate Exos)

I usually suspect the storage devices in this type of situation. Normally, I’d suggest checking the drive stats and SMART diagnostics, but I’m not sure what’s available to you with SSDs (as I don’t have any in my own NAS units).

Maybe post the SSD model numbers and see if it triggers anything for someone else.

There is a bottleneck in there somewhere causing the CPU Load delay.

You could also enter a support ticket with QNAP and they can investigate.

The disks have been tested and are running at the proper speed and IOPS. I did migrate from the QNAP TS-231P, so I’m thinking about resetting the whole thing and starting from a fresh setup.

5x WD Red Plus for RAID6 nightly backups

IOPS 73, 117, 77, 125, 113

MB/s 176, 177, 176, 194

4x WD Red SA500 2TB RAID6 storage

IOPS 40434, 40701, 40627, 40011

MB/s 537, 537, 538, 538

So it is possible that there are some zombie threads left over from your backups that are not shutting down properly. I’ve seen this before.

What backup app are you using?

If you are using Hybrid Backup Sync do this:

1.) Shut down HBS
2.) Reboot the NAS
3.) Start HBS

There’s something about shutting down the app before rebooting that kills all these leftover processes so they don’t start up again, whereas a plain reboot basically brings things back the way they were.
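
To check for leftover processes before and after that sequence, here is a minimal sketch that counts processes by state using the standard Linux /proc layout. I’m assuming QTS exposes /proc in the usual way, and the function names are hypothetical. State Z marks zombies. State D marks uninterruptible sleep, typically waiting on disk, which Linux counts toward the load average along with runnable tasks:

```python
import os
from collections import Counter

def state_from_stat(stat_text):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    return stat_text.rsplit(")", 1)[1].split()[0]

def count_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, Z, ...)."""
    counts = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts  # no /proc on this system
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                counts[state_from_stat(handle.read())] += 1
        except (OSError, IndexError):
            continue  # process exited while we were reading it
    return counts

if __name__ == "__main__":
    print(dict(count_states()))
```

This counts processes, not individual threads, so the D count will not add up exactly to the load average. A pile of D or Z entries that survives stopping HBS would still be a useful hint.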

I disabled the backup service completely and it was still slow the next morning. My latest finding is that it’s probably related to the swap file, which grows to 800+ MB. Nobody knows why. Tech support will connect remotely today.

I chose the TS-932PX because of the number of drives it supports: 4x 2.5” and 5x 3.5”, which is very convenient. I’ve been using Synology NAS units my whole life, and went with QNAP because this office already had a QNAP, which made it easy to transfer the settings. I kind of regret not going with Synology.

So did you reboot the NAS after disabling your backup software?

800 MB swap should not be a problem. My swap is at 1 Gig right now and my NAS is running fine.

Have you done as I suggested and slowly shut down apps/services you are running to see what is going on?

Also, silly me for not asking this: Are you using Qsirch? Qsirch will take up an enormous amount of system resources until it indexes your entire drive.

I don’t think you are using that based on your TOP results but I have to ask. I went back and looked at the TOP results you posted. Something stands out:

First how much memory do you have?

What are you using that uses Apache? That’s the built-in web server. You should not need to have it enabled unless you are running a web server app of some kind. These apps are all taking up some solid virtual memory, and the %VSZ number is what is troubling. I just took a look at my rather loaded TS-873A, and while my apps are all using some solid virtual memory numbers, none of them are above 12% Virtual Size.

Based on your TOP output, you have used 3.71 GB of memory. You have 325 MB of memory free. This is your problem. You need more memory especially if you have 10 people sharing your NAS. Are you running with the stock 4 GB that came with the unit?
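
For reference, here is the arithmetic on that Mem: line as a short Python sketch. It assumes BusyBox-style top, where used is simply total minus free and therefore includes the buff and cached figures. That is an assumption about this top build, not something confirmed in the thread:

```python
# Values in KiB, copied from the "Mem:" line of the slow-morning top output.
used, free, buffers, cached = 3710592, 325376, 1495552, 298496

KIB_PER_GIB = 1024 * 1024

total = used + free                  # assumes used = total - free
reclaimable = buffers + cached       # buffers/page cache the kernel can usually drop
used_by_programs = used - reclaimable

for label, kib in [("total", total),
                   ("buffers + cached", reclaimable),
                   ("used minus buffers/cache", used_by_programs)]:
    print("{:<26} {:>8} KiB  {:.2f} GiB".format(label, kib, kib / KIB_PER_GIB))
```

Under that assumption the total works out to roughly the stock 4 GB, and a sizable part of the used figure is buffers and cache rather than application memory. It is worth keeping that in mind when judging how tight memory really is.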

You should run at least 16 Gig if not 32 Gig IMO.

This is NOT a QNAP vs. Synology issue. This is a NAS w/o proper resources applied to it.

The old NAS had 1GB of RAM and it ran fine for years.

This is from tech support. The QNAP devs are currently investigating it.

  • Two HDDs (sda and sdb) are experiencing very high I/O utilization (~100%), which makes the RAID pool and NAS enter a busy state.

  • Free memory drops to around 450 MB, and the CPU load average reaches 24.9.
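
For anyone wanting to watch that utilization figure themselves, here is a sketch based on the standard Linux /proc/diskstats format. I’m assuming QTS exposes it unchanged, and the function names are made up. The 13th field of each line is the total time in milliseconds the device has spent doing I/O, so the change between two readings divided by the elapsed time gives a busy percentage similar to what iostat reports as %util:

```python
import time

def parse_io_ms(diskstats_text):
    """Return {device: milliseconds spent doing I/O} from /proc/diskstats text."""
    io_ms = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            io_ms[fields[2]] = int(fields[12])  # 13th field: time doing I/O (ms)
    return io_ms

def utilization(before, after, elapsed_seconds):
    """Percent of the interval each device was busy."""
    elapsed_ms = elapsed_seconds * 1000.0
    return {dev: round(100.0 * (after[dev] - before[dev]) / elapsed_ms, 1)
            for dev in after if dev in before}

def sample(path="/proc/diskstats"):
    """Read and parse the current counters."""
    with open(path) as handle:
        return parse_io_ms(handle.read())

# Example: two readings five seconds apart.
# first = sample(); time.sleep(5); second = sample()
# print(utilization(first, second, 5))
```

Comparing sda and sdb against the other members of the same RAID group during a slow morning would show whether those two drives really stand out.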

10 people accessing shared files over SMB shouldn’t require more RAM on the NAS.

I checked and the web server is not enabled. From checking online, it’s the Apache proxy used by QTS.

Suit yourself. You can ignore our suggestions, but I am telling you that 4 Gig of RAM is not sufficient. You have high virtual memory utilization because your apps are sucking up the memory you have and don’t have enough room to operate. I don’t care if your old NAS had 1 Gig and ran fine for years. It likely ran an older OS as well. My old IBM PC had 128k. It’s not enough today!

You can ignore our advice if you choose, but then there is nothing more we can do to help you.

Apache is taking up a huge amount of virtual memory resources. I don’t see this on my NAS at all. That’s why I asked what you are using Apache for. On my NAS, it’s so far down the list, I can’t see it.

You also have a peer to peer agent (p2pagent) running as well that’s taking up memory resources.

You have never stated whether you have attempted to shut down other apps to see what happens. That has been my suggestion from the beginning, but you keep saying all you have is people sharing files. What we are seeing is large amounts of virtual memory usage: the apps that are running are doing something and filling up the available memory, so they are having to run with some of their resources cached to disk, which is why it’s all so slow.

But whatever…

Web server is not checked. I don’t know how to disable Apache, and why is it running anyway? I thought it was a part of the QNAP OS.

I don’t know how to shut down apps that are not part of the NAS store, and no apps are installed from the NAS store except for the backup, which is currently disabled.

Nothing should be cached anywhere. We’re only using basic folder shares. Something is not right with the configuration. I’m going to rebuild the NAS from scratch. I have a feeling something didn’t go right with the transition from the old NAS.

I really appreciate your help.

I think that might be a great idea as it would eliminate a number of variables.

I’m confused, though, by the steps you actually took when you say you “transitioned” from your old NAS. What did you do, and how? It may be pointless, but it may provide some insight into what is going on. It would be unfortunate if there was an easy solution, or if you ended up reconfiguring the very thing that is having an adverse effect.

It is, but the problem is that it appears it’s using quite a bit of memory and you don’t have a lot of memory.

You shut down apps from the App Center. Click the arrow next to the app and select Stop.

If all you are using is file sharing and Hybrid Backup Sync, there may be other apps you can stop as well. Worth looking at.

OK, when I say cached, I mean the OS caching your apps to disk because you don’t have enough memory. You have all these processes running, you run out of memory, and the OS then uses hard drive space as virtual memory. It writes part or all of the apps in memory to the hard drive. Then, when you want to access them, it has to read them back from disk into memory, which is slow. This is the “swap” file.

You can try rebuilding the NAS. But I’m still thinking you are going to be running out of memory. Four gigabytes in a modern system is just not enough, especially for a server…
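
If you want to put a number on swap use, here is a minimal sketch that reads the standard SwapTotal and SwapFree entries from /proc/meminfo. I’m assuming QTS exposes the file in the usual Linux format, and the function names are hypothetical:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into {name: value_in_kib}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def swap_used_kib(meminfo):
    """Swap in use = SwapTotal - SwapFree."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
    except OSError:
        info = {}  # no /proc/meminfo on this system
    print("swap used: {:.0f} MiB".format(swap_used_kib(info) / 1024))
```

The amount of swap in use only tells you how much has been pushed out, not how often it is being read back. The pswpin and pswpout counters in /proc/vmstat climbing quickly during a slow period would be a stronger sign that swapping itself is causing the delay.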

I don’t have anything installed except for HBS3 which is currently disabled.

I moved the disks from the old NAS and restored a config backup file on the new NAS.

I’m going to do a full reset and reformat this weekend.

The NAS became a lot faster after doing a full factory reset and disk re-format.

Good. I hope that your memory issues don’t return. If they do, then your solution will be to get more RAM. It’s never a bad thing.