I/O errors, but after running badblocks everything works again: how is that possible?


The HDD seemed damaged. I was unable to format a partition (mkfs.ext4 I/O errors), even with a newly created GPT table, and a SMART test showed errors. I was about to throw the disk away. Before that, out of curiosity, I ran a full badblocks test. Big surprise: it didn’t detect any bad blocks! I went back to GParted and created a GPT table plus a few partitions. Everything works fine now! What did badblocks do?

The full story

I am trying to make sense of what just happened: I was about to throw an HDD away because I was unable to create partitions on it, and SMART showed some errors. Before throwing the disk away I just wanted to play a little with badblocks, and… big surprise: badblocks seems to have repaired my disk! I didn’t even know that it could do that! So I am happy now, I can indeed use my disk and it works fine, but I am still trying to figure out what just happened!

It’s a 4TB Seagate HDD that I hadn’t used in a few years. I plugged it into a SATA ↔ USB adapter (the adapter works fine, I use it with several other HDDs). With GParted I created a new GPT partition table, and then a partition. It was unable to finish; there was a mkfs.ext4 I/O error:

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: 0/895
mke2fs 1.46.2 (28-Feb-2021)
mkfs.ext4: Input/output error while writing out and closing file system

I tried several times, with different USB adapters, different USB cables, different USB ports. Never worked.

I then ran a SMART short test:

# smartctl -t short -C /dev/sde

# smartctl -a /dev/sde
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%       528         191105024

Obviously the HDD seems defective, right? So I was about to throw it away, but ran a badblocks test first:

# badblocks -wvs -t random -b 4096 /dev/sde
Checking for bad blocks in read-write mode
From block 0 to 976754645
Testing with random pattern: done                                                 
Reading and comparing: done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

The test lasted about 19 hours (4TB disk) and didn’t show any errors. I was very surprised!

Back in GParted, I created a new GPT table and some partitions, and everything went smoothly.

I ended up running some copy tests I usually do to check the disk’s performance, and everything seems normal (155MB/s read/write when copying big files).

I also ran another SMART short test, and it completed without error this time:

# smartctl -t short -C /dev/sde

# smartctl -a /dev/sde
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed without error       00%       549         -
# 2  Short captive       Completed: read failure       90%       528         191105024

Can someone make sense of that? It’s as if running badblocks somehow repaired my HDD. How is that possible? Is badblocks even supposed to do that?

Note: more info is available if needed (full SMART output and full GParted results).

Asked By: ChennyStar


Yes, badblocks can have that effect. It is not really by design: hard drives can remap failing blocks, and will do so when they encounter a failed block during a write (since the sector is about to be overwritten, there’s no data that can be lost). By writing to every single accessible sector in the drive, badblocks gives the drive ample opportunity to do so; and if the drive’s spare capacity is sufficient to remap all the failed blocks, badblocks won’t see anything amiss.
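The remapping behaviour described above can be sketched as a toy model in POSIX shell. This is only an illustration, not how any real firmware works: the sector numbers, the size of the spare pool, and the names `BAD`, `REMAPPED`, `SPARES`, `write_sector` and `read_sector` are all made up for the example.

```shell
# Toy model of sector remapping (hypothetical, not real firmware logic).
# BAD      : weak sector numbers, space-delimited (made-up values)
# REMAPPED : sectors the "drive" has relocated to a spare
# SPARES   : remaining spare sectors (made-up value)
BAD=" 3 7 "
REMAPPED=" "
SPARES=4

# write_sector N: a write to a weak sector triggers a remap if a spare is left.
write_sector() {
    case "$BAD" in
        *" $1 "*)
            # Already relocated: the write lands on the spare and succeeds.
            case "$REMAPPED" in *" $1 "*) return 0 ;; esac
            # No spare left: the write fails with an I/O error.
            [ "$SPARES" -gt 0 ] || return 1
            SPARES=$((SPARES - 1))
            REMAPPED="$REMAPPED$1 "
            ;;
    esac
    return 0
}

# read_sector N: a read of a weak sector fails until it has been remapped.
read_sector() {
    case "$BAD" in
        *" $1 "*)
            case "$REMAPPED" in *" $1 "*) return 0 ;; esac
            return 1
            ;;
    esac
    return 0
}
```

In this model, `read_sector 3` fails at first, which corresponds to the read failure in the first SMART test. After one write pass over every sector (what `badblocks -w` does), every read succeeds and `SPARES` has dropped by two, which is why the second pass finds nothing wrong.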

If you run smartctl -a on the drive, you should see that it has a non-zero “reallocated sector count” (attribute 5). This indicates that it has remapped sectors.
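As a convenience, the relevant counters can be pulled out of the attribute table with a small filter. This is a minimal sketch, assuming the usual ten-column layout of `smartctl -A` output with the raw value in the last column; the function name `smart_sector_counts` is made up here, and attributes 197 and 198 (pending and offline-uncorrectable sectors) are included only as related counters that many drives report.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# NAME=RAW_VALUE for the attributes related to bad sectors.
# Assumes the standard attribute table: ID in column 1, name in column 2,
# raw value in column 10.
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}
```

It would be used as `smartctl -A /dev/sde | smart_sector_counts` (run as root). Behind some USB adapters, smartctl may need an explicit device type such as `-d sat` to reach the drive.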

While the drive may work fine now, this does indicate that it has problems, so it should be treated with suspicion; if part of its storage has failed, more is liable to fail in the not-too-distant future.

See also SSD: `badblocks` / `e2fsck -c` vs reallocated/remapped sectors.

Answered By: Stephen Kitt