I/O errors, but after running badblocks everything works again: how is that possible?
TL;DR
An HDD seemed damaged: I was unable to format a partition (`mkfs.ext4` I/O errors), even with a newly created GPT table, and a SMART test showed some errors. I was about to throw the disk away. Before that, out of curiosity, I ran a full `badblocks` test. Big surprise: it didn’t detect any bad blocks! I went back to GParted and created a GPT table plus a few partitions. Everything works fine now! What did `badblocks` do?
The full story
I am trying to make sense of what just happened: I was about to throw an HDD away because I was unable to create partitions on it and SMART showed some errors. Before throwing the disk away I just wanted to play a little with `badblocks`, and… big surprise: `badblocks` seems to have repaired my disk! I didn’t even know it could do that! So I am happy now: I can use my disk and it works fine, but I am still trying to figure out what just happened.
It’s a 4TB Seagate HDD that I hadn’t used in a few years. I plugged it into a SATA ↔ USB adapter (the adapter works fine; I use it with several other HDDs). With GParted I created a new GPT partition table, and then a partition. The operation could not complete; there was a `mkfs.ext4` I/O error:
(...)
Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: 0/895
mke2fs 1.46.2 (28-Feb-2021)
mkfs.ext4: Input/output error while writing out and closing file system
I tried several times, with different USB adapters, different USB cables, and different USB ports. It never worked.
I then ran a SMART short test:
# smartctl -t short -C /dev/sde
(...)
# smartctl -a /dev/sde
(...)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed: read failure 90% 528 191105024
(...)
Obviously the HDD seems defective, right? So I was about to throw it away, but ran a `badblocks` test first:
# badblocks -wvs -t random -b 4096 /dev/sde
Checking for bad blocks in read-write mode
From block 0 to 976754645
Testing with random pattern: done
Reading and comparing: done
Pass completed, 0 bad blocks found. (0/0/0 errors)
The test took about 19 hours (4TB disk) and didn’t show any errors. I was very surprised!
Back in GParted, I created a new GPT table and some partitions; everything went smoothly.
I ended up running some copy tests I usually do to check the disk’s performance, and everything seems normal (155MB/s R/W when copying big files).
I also ran another SMART short test; this time it completed without error:
# smartctl -t short -C /dev/sde
(...)
# smartctl -a /dev/sde
(...)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed without error 00% 549 -
# 2 Short captive Completed: read failure 90% 528 191105024
(...)
Can someone make sense of that? It’s as if running `badblocks` somehow repaired my HDD. How is that possible? Is `badblocks` even supposed to do that?
Note: more info is available if needed (full SMART output and full GParted results).
Yes, `badblocks` can have that effect. This is not really by design; it happens because hard drives can remap failing blocks, and will do so when they encounter a failed block during a write (since there’s no data that can be lost). By writing to every single accessible sector on the drive, `badblocks` gives the drive ample opportunity to do so; and if the drive’s spare capacity is sufficient to remap all the failed blocks, `badblocks` won’t see anything amiss.
If you run `smartctl -a` on the drive, you should see that it has a non-zero “reallocated sector count” (attribute 5). This indicates that it has remapped sectors.
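To pull just that number out of the attribute table, something like the following sketch may help. It assumes the usual ATA attribute layout printed by `smartctl -A`, where the attribute ID is the first column and the raw value is the tenth; `smart_raw_value` is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# smart_raw_value is a hypothetical helper, not a smartmontools command.
# It reads `smartctl -A` output on stdin and prints the raw value of one
# SMART attribute. Assumption: ATA-style table with the attribute ID in
# column 1 and RAW_VALUE starting in column 10.
smart_raw_value() {
    attr_id="${1:-5}"   # default: attribute 5, the reallocated sector count
    # Print column 10 of the matching row; exit non-zero if no row matched.
    awk -v id="$attr_id" '$1 == id { print $10; found = 1 } END { exit !found }'
}

# Usage (needs root and a real drive; /dev/sde is the device from the question):
#   smartctl -A /dev/sde | smart_raw_value 5
```

A non-zero result would be consistent with the drive having remapped sectors. If the function exits with a non-zero status, the attribute was not found in the output, which can happen when the drive or the USB bridge does not expose the ATA attribute table in this form.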
While the drive may work fine now, this does indicate that it has problems, so it should be treated with suspicion; if part of its storage has failed, more is liable to fail in the not-too-distant future.
See also SSD: `badblocks` / `e2fsck -c` vs reallocated/remapped sectors.