Lost my Volume

I have a TS-453D running QTS 5.2.8.3359, and I access the system from a Mac using Firefox.
The system has four Seagate IronWolf Pro 4TB disks configured as RAID 5 with a hot spare.
The hot spare developed a problem and disappeared from the system, so I had it replaced with a new drive.
The new drive performed better than the others, so I decided to use it as a data drive.
I removed one of the drives and the system began syncing.
While it was syncing, another drive had R/W errors.
I re-inserted the drive I had removed earlier, but it is now presented as a free drive.
I have scanned the faulty drive for bad blocks twice, but its health status still shows an “Error” indication.
The volume is now read-only.
Is there a way to reintroduce the drive I removed to the RAID group and restart syncing?
Any help would be greatly appreciated

Yeah, well, a couple of things here.

1.) First of all, what do you mean that you have a hot spare? Do you mean you have one disk that is never used? You could instead have had a four-disk RAID 5, since RAID 5 by definition has redundancy. You have had a disk sitting there doing effectively nothing when it could have been part of the array, giving you more storage.

2.) The performance of a RAID array is driven by the performance of the “worst” drive in the bunch, so adding a better-performing drive to the RAID did nothing to help.

3.) Your RAID was rebuilding when you swapped drives the second time, so you effectively acted as if you had another failed drive. I pray you have a backup. One should never, ever remove any drive from the array while it is rebuilding.

4.) Reinserting the first drive you removed won’t help, as the array now sees that drive as gone and was rebuilding the storage space without it.

5.) You could try reinserting the drive that had the R/W errors but I’m not sure that will work either.

Maybe someone else can provide better insight, but I do hope you have a backup.

Yeah... with no backups, your only chance would be to contact a data recovery specialist... that’s gonna be expensive.

Thank you for your response

On number 3: I removed only one drive, so that the hot spare would take over.

While it was syncing, another drive presented read/write errors.

I will try number 5

I was counting on a recovery idea from you but I get it, no miracles today

Thanks dolbyman

I understand. But the RAID was busy rebuilding, and while it was rebuilding, consider that another drive failed. But did it actually fail, or did you just start getting errors? Still, by removing that drive you basically introduced another failure. It would have been better to simply wait for the RAID to fully sync before removing that drive. But this is the problem with RAID 5: you can only afford to lose a single drive. If you lose a drive and then lose a second drive during the rebuild process (which can take days), then you lose everything.
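
To put a rough number on that rebuild window, here is a back-of-the-envelope sketch in Python. The 2% annualized failure rate and three-day rebuild are illustrative assumptions, not measured figures:

```python
# Back-of-the-envelope estimate of losing a second drive during a
# RAID 5 rebuild. The AFR and rebuild time below are illustrative
# assumptions only; real drives and real rebuilds vary widely.

AFR = 0.02            # assumed 2% annualized failure rate per drive
REBUILD_DAYS = 3      # assumed rebuild duration
SURVIVING_DRIVES = 3  # drives left in a 4-disk RAID 5 after one failure

# Probability that a given drive fails during the rebuild window.
p_one = AFR * (REBUILD_DAYS / 365)

# Probability that at least one surviving drive fails while the
# array has no redundancy left.
p_any = 1 - (1 - p_one) ** SURVIVING_DRIVES

print(f"Chance of a second failure during rebuild: {p_any:.4%}")
# ~0.05% with these inputs -- small, but correlated wear and the
# extra rebuild stress push the real-world number higher.
```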

I do hope you have a backup of this data.

And for the record: don’t sit there in the future with another “hot standby” drive; that’s a waste of a hard drive. Either add it to your array to utilize it as storage space in RAID 5 (which already has a “hot spare” built in) or make your array RAID 6, which gives you even more redundancy. If you aren’t going to use the storage space of that hot-standby drive, then utilize a RAID 6 configuration.

I never removed another drive. It doesn’t get any clearer than that, right?

It started giving errors, and the system prompted for a bad sector check. The drives are still in place.

Now, I don’t know where this notion of a hot spare being built into RAID 5 comes from. You probably mean that a RAID 5 group is able to sustain a single drive failure, because that’s what it is. The hot spare is used in case you lose that one drive, so the data syncs with the other drives and your exposure to a second error, as far as time is concerned, is minimized. That is its usefulness.

Having a “hot” spare generally doesn’t work well. If the drive is powered up and running, then the motor and runtime end up being the same as the other drives, shortening its lifespan. Yes, the actuator perhaps isn’t used as much, but unless you run a production server, it is typically better to have a “cold” spare sitting on the shelf ready to go.

RAID 5 (6, 10, or any level) should never be considered a “backup”. This has been covered to death in a million articles.

Any RAID failure, or any disk in an array that starts throwing errors, causes a huge amount of effort by the system to recover or sync. This OFTEN causes other drives to start throwing errors or fail AT THE SAME TIME, as they are generally all of the same vintage and usage; but even if they are not, a resync causes a lot of stress.

Having said all that, the only prevention (for others reading this, as it may be too late for you) is a backup to another device (or preferably devices) as part of a backup strategy that matches the level of importance and variability of your data.

As @dolbyman pointed out, once you have pulled disks, resyncs have started, and one or more other drives are throwing errors, then assuming you don’t have a backup, your best plan is a data recovery specialist BEFORE YOU START MESSING AROUND AND MAKING THINGS WORSE.

You may want to check by entering a support ticket with QNAP support, but you are more likely to make things WORSE if you mess around with it at this point.


Thank you for your time dosborne.

Actually having a hot spare has proven quite useful to me over the years as well as to many others. I am an IT professional, not an engineer, and one of the companies I worked for was EMC.

RAID is not backup, for more than one reason, nor was it ever treated as such, and you are right to point that out for the benefit of whoever reads this.

The drives were not bought at the same time, but even if they were, statistically speaking it is rare for errors to appear only a few minutes after a sync commences. Yes, resyncing is indeed a stressful operation, but the drives are less than two years old.

I went through it with ChatGPT, and a useful piece of advice was to get onto the system via SMB or SSH and try to copy files. This way I was able to see some of the files, but the directory with the data most crucial to me needed cleaning, and that was a “forbidden” operation. The conclusion was to shut the system down and try to mount the drives via Linux.
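
For anyone else reading: QNAP volumes are normally layered md RAID + LVM (+ ext4), so a bare mount of a single disk on Linux won’t work. Below is a minimal read-only examination sketch with hypothetical device names; it is an outline of the common approach, not a QNAP-verified procedure, and partition 3 being the data partition is an assumption:

```python
# Read-only inspection of QNAP disks attached to a plain Linux box.
# QTS stacks md RAID + LVM on top of the raw partitions, so the array
# must be assembled before anything can be mounted. Device names are
# hypothetical; none of these commands write to the disks.
import subprocess

def run(cmd):
    """Print and run a command; everything below is read-only."""
    print(f"$ {' '.join(cmd)}")
    subprocess.run(cmd, check=False)

# 1. Inspect the md RAID metadata on each data partition (partition 3
#    is commonly the data partition on QNAP disks -- an assumption).
for disk in ("sda", "sdb", "sdc", "sdd"):
    run(["mdadm", "--examine", f"/dev/{disk}3"])

# 2. Attempt a read-only assembly; --run allows a degraded array to start.
run(["mdadm", "--assemble", "--scan", "--readonly", "--run"])

# 3. Look for the LVM layer that QTS puts on top of the md device.
run(["vgscan"])
run(["lvscan"])

# 4. If a logical volume shows up, it can be mounted read-only, e.g.:
# run(["mount", "-o", "ro", "/dev/vg1/lv1", "/mnt/recovery"])
```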

Rare, perhaps. But I’ve seen many posts in the old forum where this very thing happened. It isn’t a random suggestion; it is based on evidence from many people’s actual circumstances. It makes sense: syncing puts a drive under probably the most load it will ever see. The more drives you have, the greater the chance that one will be stressed to the point where something bad happens. Sure, it may be rare, but the odds increase above zero.

To each their own, but also as an “IT Professional” with 40 years in the business, I would not be depending on ChatGPT just yet :) :) ChatGPT should have pointed out what an extremely complex process it is to “mount” a RAID array on a standard Linux box, as QNAP systems in particular have multiple layers that need to be dealt with. Running random chat-suggested commands, or making attempts to load the filesystem without knowing what you are doing, puts any chance of actual data recovery in jeopardy.

I strongly encourage you to contact the actual professionals, starting with QNAP support, to see if they can advise you properly, with actual knowledge of their system internals. If they are not able to provide you direction, then as mentioned, you may have to resort to data recovery specialists.

Accessing the files via SMB/FTP/SSH was a good suggestion, and in rare cases does provide recovery of some files. Ultimately, it seems that the possibility of recovery will come down to how bad the remaining drive that is throwing errors really is.

I wish you success in your endeavour and am confident support can get you at least part of the way there.

Not relying on ChatGPT is a fair point. That’s why I came here first; I trust people here more than any AI. Consulting ChatGPT was a move of desperation. The idea of finding the files via SMB was a good one, though.


OK, I read it differently. I read it as you having pulled the drive that was starting to throw errors and put the first drive you removed in that slot.

It’s not a hot spare per se, but since the total usable capacity is (number_of_drives − 1) × size_of_smallest_drive, you have redundancy built in, in case a drive fails. It is lost on me how a “hot standby” drive would work, as it is not part of the RAID array. Even if it has all your data, that makes no difference: as soon as you add it to the RAID, it is going to start rebuilding and writing the data to the drive as needed for its position in the array. But you do you…
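
As a worked example of that capacity formula, here is a small Python sketch (the helper is hypothetical, using the 4 TB drives from this thread):

```python
# Usable capacity of a RAID 5 group: (N - 1) * size of smallest drive.
# One drive's worth of space goes to parity, spread across all members.

def raid5_usable_tb(drive_sizes_tb):
    """Usable capacity in TB for a RAID 5 group (one drive of parity)."""
    n = len(drive_sizes_tb)
    return (n - 1) * min(drive_sizes_tb)

print(raid5_usable_tb([4, 4, 4]))      # 8 TB: 3-disk array, spare sits idle
print(raid5_usable_tb([4, 4, 4, 4]))   # 12 TB: all 4 drives in the array
```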

Bottom line is when a RAID is syncing you can’t mess with removing drives.

At this point, I would open a ticket with QNAP.

To prevent the situation from worsening, please refrain from swapping hard drives or making any adjustments to the system settings at this time.

We strongly recommend that you back up your important data as soon as possible and submit a support ticket to us. Our Support Team will then assist you with further analysis and resolution. Thank you!

Hi Steve,

Since I have not opened a support ticket before, could you please tell me the appropriate way to do it?

The system is shut down

Regarding this issue, please submit a support ticket directly via the Helpdesk app on your NAS, or you can also do so by visiting this URL: Customer Service - QNAP

Thank you!

Regarding the hot spare: you assign a hot spare drive to the RAID group or pool, and an indication saying “Spare” appears next to that drive. So essentially it is part of the RAID group, and it automatically kicks in and starts syncing the moment a drive fails. Therefore the time you are exposed to a second drive failure is shorter than with a manual intervention, and you get more time to procure a replacement drive.
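
For what it’s worth, the Linux md layer that QTS builds on behaves the same way: a disk added to an already-complete array is registered as a spare, and md promotes it and starts rebuilding onto it automatically when a member fails. A rough sketch, with hypothetical device names:

```python
# Sketch of hot-spare behavior at the Linux md layer underneath QTS.
# Device names are hypothetical.
import subprocess

def run(cmd):
    print(f"$ {' '.join(cmd)}")
    subprocess.run(cmd, check=False)

# Adding a disk to an array whose member slots are already full
# registers it as a spare rather than an active member.
run(["mdadm", "/dev/md1", "--add", "/dev/sde3"])

# /proc/mdstat lists the spare with an "(S)" flag; when a member fails,
# md begins rebuilding onto it immediately, shrinking the window during
# which the array has no redundancy.
run(["cat", "/proc/mdstat"])
```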

Except you need to note that the additional stress of a sync to the new spare may cause other drives to throw errors or fail completely. This is not theoretical; the risk is real, even if relatively low.

Remember the rule: RAID is NOT a BACKUP. A hot spare is NOT a BACKUP.

:)

So if a disk fails you are not going to replace it?

You miss the point. People rely on RAID and do not have backups. This puts them in a position where they think they are safe; reality is different. The message of this post, echoed all over the internet, is that the only way to protect your data is to back it up to an external medium.

Doing ANYTHING with your volume or physical disks introduces risk. Just take a look through this forum where people count on RAID, or snapshots, or volumes within the same NAS, then do stuff randomly and are surprised when they can no longer get to their data.

It is a well known fact that rebuilding an array causes stress on all the drives.

Every user should have a backup STRATEGY.

A backup strategy identifies the risks and has a plan based on the value of the data. Are you protecting against a disk error, a drive failure, multiple drive failures, a NAS failure, a fire, a flood, theft, accidental file deletion, intentional file deletion, malware, a power failure, a network failure, a cloud failure, malware against the NAS, malware against the cloud service, etc.

Some data is important, some isn’t. Every data set needs a matching strategy. Some doesn’t need backup (could be downloaded again), some just needs a simple copy to another location, some needs versioning, some needs replication, some needs staged backups, some needs active backups.

Only the user can decide what is right, BUT THEY MUST BE INFORMED DECISIONS USING ACTUAL DATA. Thinking you are safe because you set up RAID 1, for example, doesn’t protect you against accidentally erasing a file you need, theft, NAS failure, etc. With the exact same disks set up as two volumes instead, maybe a staged backup to a second drive is more important for that user’s use case, maybe not.

Each situation and data set is unique. There is no one right answer. It’s about educating the user so they can decide how valuable the data is to them and pick a strategy that probably includes multiple solutions. For example: some data doesn’t need to be backed up, some goes on a RAID array, some gets copied to the cloud, some gets replicated to another NAS, some gets copied to a USB drive and stored in a safety deposit box. As a home user, I use all these strategies and more; each data set gets matched to a backup strategy based on the value I place on the data.

Maybe YOUR data and YOUR setup works with a RAID with a hot spare. This will not protect you against multiple drive failure, file corruption or deletion, malware, theft, fire, power failure or NAS failure (unless your setup can be migrated to a new NAS - but that isn’t really a “backup”).

Again, this may work for you, but a hot spare is not a magical solution that will save you 100% of the time.

(There are some thoughts that mixing drive models or drives of different ages also increases the risk (above zero), but it doesn’t sound like you’d be too receptive to hearing about that :) )


Further, replacing the hardware is not the same as replacing the data.

A simple solution in a desperate situation: before replacing the drive, take a (new) backup of the data. (Ideally, have backups all the time, of course, so this isn’t as critical.) Even in degraded mode, a backup is less risky than a rebuild. Then, once the data has been saved, deal with the hardware issue.
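
As a concrete illustration of “backup before rebuild”, something along these lines copies the data off a degraded but still-readable volume before any drives are touched. The paths are hypothetical (the QTS data path shown is a common default, but check your own system):

```python
# Copy data off the degraded volume to an external disk before any
# drive swap or rebuild starts. rsync's archive mode preserves
# permissions and timestamps and only reads from the source.
import subprocess

SOURCE = "/share/CACHEDEV1_DATA/"  # typical QTS data path -- an assumption
DEST = "/mnt/usb_backup/"          # hypothetical mounted external drive

subprocess.run(
    ["rsync", "-a", "--progress", SOURCE, DEST],
    check=False,  # don't raise if rsync exits nonzero on unreadable files
)
```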