I recently upgraded my NAS from a QNAP TS-231P to a QNAP TS-932PX. The new setup uses 4 SSDs in RAID 6 for file sharing and 5 HDDs in RAID 6 for backups. It also runs an additional cloud backup every night.
About 10 people in our office use the NAS for file sharing, and it’s joined to Active Directory on Windows Server 2019.
Everything worked perfectly for several months. However, sometime last month users started reporting very slow performance every morning. Browsing folders or opening files can take anywhere from 2 to 30 seconds.
If I reboot the NAS in the morning, it works normally for the rest of the day. But the next morning the lag comes back, and the only way to fix it again is another reboot.
I contacted QNAP support, and their higher-level technicians tried troubleshooting but couldn’t find any clear issues. They suggested monitoring the swap file and possibly adding more RAM, but that doesn’t seem logical because:
RAM usage never goes above 40%
CPU usage stays under 10%
Network (2.5Gb) is barely utilized
I removed and rejoined the NAS to Active Directory
Ping latency between users, the server, and the NAS is always around 1 ms.
The disks are running at their proper speed.
After two weeks of troubleshooting, I’m running out of ideas. My next steps might be to remove it from Active Directory to test, or possibly reset and reformat the NAS.
Never assume CPU usage = CPU load. They are different. You can have low CPU usage and a high CPU load, which will absolutely affect your system. To see CPU load, open an SSH shell and run the command top. There will be a line named Load Average that consists of 3 numbers. The numbers are the averages over the last 1 minute, 5 minutes and 15 minutes. On Linux, each one is the average number of processes that are running, waiting for a CPU, or stuck waiting on disk I/O, which is why the load can be high while CPU usage stays low. If the load average is greater than the number of cores on the unit, then you will see slow speeds. For example, my TS-873A has 4 real cores, each capable of 2 threads, which = 8 logical cores. So when my TS-873A gets over 8 it begins to slow down. If you are having speed issues, check this when it is slow.
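To make the load-versus-cores comparison concrete, here is a minimal Python sketch. It assumes Python 3 is available on the NAS (it may need to be installed first), and the helper names load_per_core and describe are made up for illustration. The "load above core count means slow" threshold is the rule of thumb from this post, not a hard limit.

```python
import os


def load_per_core(load_avg, cores):
    """Return the load average divided by the number of logical CPUs."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def describe(load_avg, cores):
    """Apply the rule of thumb: load above the core count means trouble."""
    return "overloaded" if load_per_core(load_avg, cores) > 1.0 else "ok"


if __name__ == "__main__":
    # os.getloadavg() returns the same 1/5/15 minute numbers that top shows.
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # logical CPUs, i.e. threads
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(f"{label}: load {value:.2f} on {cores} CPUs -> {describe(value, cores)}")
```

Run it while the NAS is slow. A 15 minute figure well above the CPU count suggests the problem has been building for a while rather than being a short spike.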
I have seen issues on my QNAP where zombie processes that don’t get shut down properly keep running and consuming resources. I had an issue a month ago where the load was showing huge numbers. Turns out it was caused by something in Hybrid Backup Sync.
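One way to check for stuck processes is to count process states straight from /proc. The sketch below is only an illustration: it assumes Python 3 on the NAS and a standard Linux /proc layout, and the function names are hypothetical. It reports zombies (state Z) and processes in uninterruptible sleep (state D). The D state is worth watching because those processes count toward the load average even though they use no CPU.

```python
import glob
from collections import Counter


def parse_state(stat_text):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # Format is "pid (comm) state ...". The command name may itself contain
    # spaces or ')', so cut at the LAST closing parenthesis.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states(stat_texts):
    """Count how many processes are in each state."""
    return Counter(parse_state(text) for text in stat_texts)


def read_all_stats():
    """Read every /proc/<pid>/stat file that is still readable."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path, errors="replace") as f:
                texts.append(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return texts


if __name__ == "__main__":
    counts = count_states(read_all_stats())
    print("zombie (Z):", counts.get("Z", 0))
    print("uninterruptible sleep (D):", counts.get("D", 0))
```

If the D count is high in the morning and near zero after a reboot, that points toward something waiting on storage or a network mount rather than toward raw CPU.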
What process or application is your NAS doing say overnight or something like that which could cause the slowdown? Backups can take a lot of resources and slow things down if they did not complete before the next day. So can things with other apps, etc.
So steps to resolve this:
1.) Start by looking at top and seeing what your load is. If it’s high, see if there’s an app showing high CPU usage in the process list below.
2.) Begin stopping potentially high usage apps like Container Station, Virtualization Station, etc. Stop one application at a time from App Center. Watch for a few minutes to see if the load starts to drop. If it does, you found the offending app. Keep doing this until you find the app driving up your load.
3.) With the offending app stopped, reboot the NAS.
4.) After the NAS reboots, restart the app.
5.) Monitor your load to see if the problem still occurs.
These issues can take some time and effort to figure out. If you keep pressing QNAP support to look into this, they will help you. It will take some gymnastics with them and you will need to explain everything about what you are doing. Ask for the case to be escalated. Make sure they log into your NAS from Helpdesk and look at things.
So now, you need to figure out what is causing that. It’s hard to tell what process specifically that is. So look at the following:
1.) Are you running backups overnight?
2.) Did those backups finish before morning?
3.) What happens if you don’t run a backup overnight?
4.) Start stopping applications one by one when the load is high. Start with things like Container Station, Virtualization Station, Web Server, etc. Take them down one by one. Wait a few minutes each time and see what happens to the load. When you find the app in question, it will drop pretty quickly.
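To answer the overnight questions above without sitting up and watching top, the load could be logged to a file and compared against the backup schedule the next morning. This is a rough sketch assuming Python 3 on the NAS. The function names and the example path are hypothetical, so pick a share that actually exists on your unit.

```python
import os
import time
from datetime import datetime


def format_sample(when, load):
    """Format one CSV row: ISO timestamp followed by the three load averages."""
    one, five, fifteen = load
    return f"{when.isoformat(timespec='seconds')},{one:.2f},{five:.2f},{fifteen:.2f}"


def log_load(path, samples, interval_seconds):
    """Append `samples` rows to `path`, one every `interval_seconds`."""
    with open(path, "a") as f:
        for i in range(samples):
            f.write(format_sample(datetime.now(), os.getloadavg()) + "\n")
            f.flush()  # keep the row even if the NAS is rebooted mid-run
            if i < samples - 1:
                time.sleep(interval_seconds)
```

For example, log_load("/share/Public/load.csv", 144, 300) would record one row every 5 minutes for 12 hours. The time at which the load starts climbing, and whether it falls again after the backups finish, should narrow down which job is responsible.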
Unfortunately, your TS-932PX has an ARM Cortex-A57 CPU. It’s an upgrade from your TS-231P, but barely. If you are using this NAS in a business situation, you need to buy a different NAS. The ARM Cortex-A57 is a great MPU but really for specific embedded applications. It’s just not what you want to use to run a file server for 10 people. I’m sorry to tell you this but you bought the wrong NAS as an upgrade. You should at the very least move to an x86 NAS, and the lowest I would go there would be an AMD Ryzen V1500B type model like the TS-x73A NAS. Better yet would be an i5 or an i7, but these start to get more pricey.
The NAS is used strictly for file sharing. It’s not hosting any servers. TS-231P worked fine for years. I upgraded to TS-932PX just to increase the disk space.
The NAS runs fast the whole day if rebooted in the morning. Backups finish during the night and are not running in the morning.
Your suggestion about high load average was the most helpful thing ever so far and gives me a direction for troubleshooting further.
Just a side note, if it really was just a space issue, you could have upgraded the drives in the 231P. That’s what I did (QNAP TS-231P-US 2x18TB Seagate Exos)
I usually suspect the storage devices in this type of situation. Normally, I’d suggest checking the drive stats and SMART diagnostics, but I’m not sure what’s available to you with SSDs (as I don’t have any in my own NAS units).
Maybe post the SSD model numbers and see if it triggers anything for someone else.
There is a bottleneck in there somewhere causing the CPU Load delay.
You could also enter a support ticket with QNAP and they can investigate.
The disks have been tested and are running at proper speed and IOPS. I did migrate from the QNAP TS-231P, so I’m thinking about resetting the whole thing and starting from a fresh setup.