Are there any filesystems with builtin data repairing via checksums?
I’ve read that ZFS and Btrfs verify checksums, but they don’t use the checksums themselves to repair data; they can only restore it from a full local copy or a mirror copy.
On the other hand, RAR archives have supported data redundancy for a long time, with a configurable amount. The larger the amount, the higher the probability of a successful recovery. The same goes for dvdisaster, which can create .ecc files with recovery data, although on a separate medium.
Many advanced media, like optical discs or hard disks, have a low-level ECC check implemented in the drive controller, so it’s less necessary at higher levels of abstraction. But others, like cheap microSD cards, may lack it and are noticeably unreliable.
So, there are ECC checks on hardware level and application level, but are there any ECC-backed filesystems?
Yes, they’re called, collectively, "RAID": Redundant Array of Independent Disks.
The best that’s out there for non-RAID configurations includes ext4 on Linux. I believe there are others. But I believe these don’t fix errors so much as catch them when writing.
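Linux has the dm-integrity layer, with which you can add per-sector integrity checking to any block device. On its own it detects corruption rather than correcting it, so actual correction needs redundancy or an error-correcting code on top.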
Sadly, it’ll be relatively bad at actually solving the issues the unreliable SD cards pose:
The most typical failure mode is the card just not working anymore. That typically happens once the card has used up the wear that its physically available memory, managed by wear leveling, can sustain. There is nothing you can do about that except write and read less. Adding error-coding information, counter-intuitively, hurts here, because it amplifies the amount of data you write and read. But that’s just an amplification by 1/r, r being the rate of the code.
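To put a number on that 1/r factor, here is a minimal sketch. The function name is made up for illustration, and the (255, 223) parameters in the usage note are just a commonly cited Reed-Solomon configuration, not something any particular SD card or filesystem is known to use.

```python
def amplification(data_symbols: int, total_symbols: int) -> float:
    """I/O amplification 1/r of a code with rate r = data_symbols / total_symbols.

    data_symbols  (k): payload symbols per codeword
    total_symbols (n): payload plus redundancy symbols per codeword
    """
    if not 0 < data_symbols <= total_symbols:
        raise ValueError("need 0 < data_symbols <= total_symbols")
    # 1 / r = 1 / (k / n) = n / k
    return total_symbols / data_symbols
```

For example, amplification(223, 255) is roughly 1.14, so every byte of payload costs about 14% extra writes and reads, while a rate-1/2 code (amplification(1, 2)) doubles them.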
After the SD card has applied its built-in error correction, the data you read is either correct, or the errors are block-local and correlated. If you need to correct these, you will have to use a code whose codewords span multiple logical blocks of the SD card. That again means read and write amplification, but this time by an integer factor of at least two. So, that’s actually significantly worse.
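As an illustration of a code that spans several logical blocks, here is a toy sketch: one XOR parity block per stripe, plus a CRC32 per block to find out which block went bad. The 512-byte block size, the function names and the stripe layout are all assumptions made for the example; no existing filesystem is being described here.

```python
import zlib
from functools import reduce

BLOCK_SIZE = 512  # assumed logical block size, for illustration only


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_stripe(data_blocks):
    """Append one parity block and compute a CRC32 for every block."""
    blocks = list(data_blocks) + [xor_blocks(data_blocks)]
    checksums = [zlib.crc32(block) for block in blocks]
    return blocks, checksums


def repair_stripe(blocks, checksums):
    """Rebuild at most one corrupted block from all the other blocks."""
    bad = [i for i, (block, crc) in enumerate(zip(blocks, checksums))
           if zlib.crc32(block) != crc]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("more than one bad block; single parity cannot repair this")
    # The missing block is the XOR of every other block in the stripe.
    survivors = [block for i, block in enumerate(blocks) if i != bad[0]]
    repaired = list(blocks)
    repaired[bad[0]] = xor_blocks(survivors)
    return repaired
```

Note where the amplification comes from: repairing one bad block means reading every other block in the stripe, and changing one data block means rewriting the parity block as well, so each small write turns into at least two.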
So, in all honesty, if your problem is unreliable flash storage, the appropriate response is to deal with it between the physical flash and the point where things appear as blocks of memory to your storage system; in other words, in the flash translation layer (FTL) within the SD card. That would additionally allow you to apply soft decoding for additional coding gain, and to use codes designed for the asymmetric channel (typically a Z-channel) that flash memory represents at the lowest level. These properties are lost through the decoding/decision and deinterleaving happening in the FTL itself, and that loss is hard to compensate for on the data you get from the SD card.
There, you would directly choose a code that fulfills your reliability requirements. The problem is that the worse the physical flash memory is, and the more reliable you want the storage to be, the lower your code rate gets, meaning that you need more flash cells per bit of data. That is exactly the trade-off that makes any flash-based storage device either cheap and less reliable or expensive and more reliable.
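A toy calculation makes that trade-off visible. The sketch below assumes the simplest possible code, n-fold repetition with majority voting, and independent symmetric bit flips with probability p. Both are simplifying assumptions for illustration only: the real flash channel is asymmetric, and real controllers use far better codes such as BCH or LDPC. The function names are made up.

```python
from math import comb


def residual_error(p: float, n: int) -> float:
    """Probability that a majority vote over n copies of a bit is wrong,
    assuming each copy flips independently with probability p (n odd)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that the majority is well defined")
    # The vote fails when more than half of the copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


def copies_needed(p: float, target: float, max_n: int = 99) -> int:
    """Smallest odd n whose residual error is at most target (code rate 1/n)."""
    for n in range(1, max_n + 1, 2):
        if residual_error(p, n) <= target:
            return n
    raise ValueError("target not reachable within max_n copies")
```

With a target residual error of 1e-3, copies_needed(0.01, 1e-3) gives 3 but copies_needed(0.1, 1e-3) gives 9: under these assumptions, a cell that is ten times worse needs three times as many cells per bit for the same reliability.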
So, with unreliable SD cards, you’ve basically lost. There might be a window where a bit of coding on your PC could correct errors without causing more errors than it prevents, but you’d really need to run a large study on how long it takes for your SD cards to fail before you could settle on a code rate. That isn’t worth the trouble: you’re not going to buy 100000 cards from the same factory run just to figure out how to make them 0.1% more reliable. You’d just buy or order more reliable cards.
Sorry.
What you could of course do is add true redundancy by using independent cards in a mirroring or parity scheme that balances data fairly evenly across them, but the usual caveats for any kind of RAID apply: you need to make sure that, at the moment you need to restore one of the underlying volumes from the others, it’s not already too late, and that the intense recovery read load doesn’t uncover or even cause further errors, which would then be uncorrectable. Again, cheap SD cards are the worst commercially available choice for that, because the quality of information on their reliability is low, and so is their individual device reliability.
Concluding, I don’t really see a practical scenario where you would want to make reliable storage out of unreliable SD cards.