I have an IronWolf drive that is reporting an abnormal SMART status with a bad sector count of 10, but the IronWolf Health Monitor says the drive is healthy. I’m scanning for bad blocks, but is there anything else I need to do? The drive only has 492 days of runtime.
Can you test the disk externally? If it really has errors, it should still be covered under warranty.
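A rough sketch of an external check from a Linux box, assuming the dock passes SMART through and the drive shows up as /dev/sdX (placeholder device name):
smartctl -H /dev/sdX   # overall health verdict
smartctl -x /dev/sdX   # full attributes, device statistics, and error logs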
Bad block scan is still running, but has reported a URE. IHM still says the drive is healthy.
I’m beginning to wonder if the IHM actually means anything, or if the QNAP is TOO sensitive…but the URE tells me that the health monitor is useless.
Once the bad block scan finishes, what’s the threshold for Seagate warranty replacement? Does it have to completely fail? Or can I tell them “X sectors have failed, send me a replacement and I’ll zero the failing drive and ship it back to you.”
16 bad blocks detected. Ran a full SMART test, got a “Fatal or unknown error”.
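For reference, the command-line equivalent of a full SMART test is roughly (with /dev/sdX as whatever the drive enumerates as):
smartctl -t long /dev/sdX      # start an extended self-test; runs in the background on the drive
smartctl -l selftest /dev/sdX  # read the result once it finishes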
IHM still says the drive is fine. Will Seagate replace it under warranty if IHM doesn’t report any problems? I guess I’ll call support on Monday and ask.
Seagate will want you to test it with their own tool.
When SMART detects an issue, even if IHM indicates the drive is still operational, that doesn’t rule out underlying risks with the hard drive.
Based on the principle that your data is paramount, we still recommend that you back up your data and replace the hard drive. This will help prevent any unforeseen data loss.
Seagate issued me an RMA for the drive. I’ll detach it, connect it to a server via a dock, and use the shred command to zero the device before shipping it back.
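Roughly what I have in mind, assuming the drive shows up as /dev/sdX on the server (double-check the device name first, since this is destructive):
shred -v /dev/sdX           # default: 3 passes of pseudorandom data
shred -v -n 0 -z /dev/sdX   # alternative: a single pass of zeros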
QUESTION: This is a RAID6 array. Will it still be usable in DEGRADED state? Or will the storage pool go into read-only mode?
SO, I detached the drive, pulled it, removed it from the tray, and stuck it in a USB 3.0 SATA dock attached to one of my Linux servers. I used the shred command to wipe the drive ahead of returning it to Seagate. Then I ran smartctl, and SMART still reports the faults. The fine print on the Seagate warranty paperwork says that they will reject the claim if no trouble is found. So I downloaded and installed SeaTools and performed a long test on the drive, which it passed. So they would likely just ship it back to me. I remounted it in the tray, popped it back into the NAS, and it’s rebuilding. The status says Healthy.
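(Worth noting: the SMART error log and self-test history that smartctl shows are lifetime logs stored on the drive itself, so those entries survive the wipe and the passing SeaTools test. They can be read with something like:
smartctl -l error /dev/sdX   # ATA error log, where the old UNC/WP entries live
with /dev/sdX again being a placeholder.)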
From a technology standpoint: modern drives have spare sectors, and if a sector is marked bad, the firmware will remap it to a spare. By default, the shred command writes 3 passes of random data to the drive… I aborted it about a third of the way through the 3rd pass, and no errors were reported during the shred. It’s entirely likely that the writes resulted in the bad blocks being remapped, and now everything is good.
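A quick way to see whether anything was actually remapped, assuming /dev/sdX again:
smartctl -A /dev/sdX | grep -Ei 'realloc|pending|uncorrect'   # reallocated, pending, and uncorrectable sector counters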
I don’t trust the drive, though. Not fully. I was PLANNING on upgrading the drives next year. THIS drive will be the FIRST to get swapped out for a larger WD RED.
SMART should show that there were reallocated sectors, though (there’s an attribute for it)
Weirdly enough, THAT stat is 0:
retired_block_count: Value: 100, Worst: 100, Threshold: 10, Raw value: 0
In the FULL SMART stats that I got from using smartctl:
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x03 0x020 4 0 — Number of Reallocated Logical Sectors
0x03 0x028 4 20 — Read Recovery Attempts
0x03 0x030 4 0 — Number of Mechanical Start Failures
0x03 0x038 4 0 — Number of Realloc. Candidate Logical Sectors
0x03 0x040 4 3 — Number of High Priority Unload Events
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 18 — Number of Reported Uncorrectable Errors
0x04 0x010 4 0 — Resets Between Cmd Acceptance and Completion
0x04 0x018 4 0 -D- Physical Element Status Changed
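(That table is the drive’s device statistics log; it can be pulled on its own with
smartctl -l devstat /dev/sdX
and it is also included in the smartctl -x output.)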
In the FARM logs:
FARM Log Page 3: Error Statistics
Unrecoverable Read Errors: 0
Unrecoverable Write Errors: 0
Number of Reallocated Sectors: 0
Number of Read Recovery Attempts: 20
Number of Mechanical Start Failures: 0
Number of Reallocated Candidate Sectors: 0
Number of ASR Events: 24
Uncorrectable errors: 0
Cumulative Lifetime Unrecoverable Read errors due to ERC: 0
…
Cum Lifetime Unrecoverable by head 7:
Cumulative Lifetime Unrecoverable Read Repeating: 18
Cumulative Lifetime Unrecoverable Read Unique: 0
FARM Log Page 5: Reliability Statistics
Error Rate (SMART Attribute 1 Raw): 0x000000000bf750ae
Error Rate (SMART Attribute 1 Normalized): 83
Error Rate (SMART Attribute 1 Worst): 64
Seek Error Rate (SMART Attr 7 Raw): 0x0000000022dc4e9e
Seek Error Rate (SMART Attr 7 Normalized): 88
Seek Error Rate (SMART Attr 7 Worst): 60
High Priority Unload Events: 3
Helium Pressure Threshold Tripped: 0
LBAs Corrected By Parity Sector: 1
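For anyone wanting to pull the same FARM pages: recent smartmontools (7.4 or newer, if I recall correctly) can read them from Seagate drives with:
smartctl -l farm /dev/sdX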
Based on this, it looks like the drive experienced a transient fault, the QNAP freaked out, and now everything seems to be ok. (A pending sector that is later rewritten successfully gets cleared rather than reallocated, which would explain why the reallocated counts are all 0.) Of course, those 18 errors are logged.
Error 18 occurred at disk power-on lifetime: 11814 hours (492 days + 6 hours)
Error: UNC at LBA = 0x0fffffff = 268435455
Error 17:
Error: WP at LBA = 0x0fffffff = 268435455
And of course, the self test history shows:
#3 Extended offline Completed: read failure 90% 11830 1580896880
Subsequent testing says everything is ok. Once the rebuild is done, I intend to initiate ANOTHER full SMART test, and then a bad block scan. This is REALLY weird, and that’s coming from someone who has had to troubleshoot transient failures before. This doesn’t exactly boost my confidence in Seagate any. I suppose if it happens again, I’ll initiate a RAID scrub (for those more familiar with mdadm, “echo repair > /sys/block/md1/md/sync_action”), which will basically read back every stripe, validate the parity, and rewrite it wherever it doesn’t match. On RAID 5, block errors can fatally corrupt your data, but with RAID 6 the second parity block can usually resolve them. Removing the drive, wiping it, then putting it back in and letting the system rebuild it works too, but takes longer. The upside is: a wiped drive CAN’T screw up the parity calculations.
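For reference, the relevant md knobs, assuming md1 is the data array as in the command above:
echo check  > /sys/block/md1/md/sync_action    # read-only scrub: count parity mismatches
echo repair > /sys/block/md1/md/sync_action    # scrub and rewrite any mismatched parity
cat /proc/mdstat                               # watch progress
cat /sys/block/md1/md/mismatch_cnt             # mismatches found by the last check/repair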
You can do a RAID scrub via GUI too
Those FARM values have been in the news recently as the only indicators of the massive fraud involving mislabelled used Seagate drives.