Using Btrfs to eliminate corrupted archives

I am trying to get my personal filesystem in order.

I want to tar everything and organize it into proper directories.

I have many HDDs and a few SSDs.

I want to use Btrfs for its checksum and redundancy features.

The question I bring to you now is: how do I configure a data-scrubbing daemon that automatically detects when a randomly lost HDD sector causes an archive to fail its checksum verification, and then copies a backup onto another sector?

I’m under the impression that this backup can actually be on the same HDD. It’s unlikely I’m going to lose an entire drive all at once through normal aging and wear. In the future I plan to expand my backup filesystem to multiple HDDs, but for now I’m interested in having duplicates scattered across multiple sectors of the same drive.

My goal is to never again have to deal with losing a random JPEG or text file.

I do have multiple HDDs I can dedicate to a single large, redundant, self-correcting Btrfs filesystem. But I don’t have multiple drives of the same size.

Please link me to some good reading material covering what I’m interested in here.

Asked By: nope

||

How do I configure a data-scrubbing daemon that automatically detects when a randomly lost HDD sector causes an archive to fail its checksum verification, and then copies a backup onto another sector?

It’s built into Btrfs. Try:

btrfs scrub start <mountpoint>
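A single `btrfs scrub start` makes one pass over the filesystem and then stops, so the "daemon" part is really just a schedule. Below is a minimal sketch assuming systemd and a filesystem mounted at the placeholder path /mnt/archive. The unit names btrfs-scrub-archive.service and btrfs-scrub-archive.timer are hypothetical. Some distributions already ship ready-made scrub units (for example the btrfsmaintenance package), and those may be preferable.

```ini
# /etc/systemd/system/btrfs-scrub-archive.service (hypothetical name)
[Unit]
Description=Scrub the Btrfs filesystem mounted at /mnt/archive

[Service]
Type=oneshot
# -B keeps the scrub in the foreground, so systemd sees when it finishes.
# Adjust the binary path to the output of `command -v btrfs`.
ExecStart=/usr/bin/btrfs scrub start -B /mnt/archive

# /etc/systemd/system/btrfs-scrub-archive.timer (hypothetical name)
[Unit]
Description=Monthly scrub of /mnt/archive

[Timer]
OnCalendar=monthly
# If the machine was off at the scheduled time, run the scrub at next boot.
Persistent=true

[Install]
WantedBy=timers.target
```

Enable the schedule with `systemctl enable --now btrfs-scrub-archive.timer`.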

A scrub has never yet found an error for me, but I expect that if one is found:

  • It would be reported in the kernel log output
  • It would be corrected if possible, that is, as long as the storage policy is not single, so that a second good copy exists to repair from.
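To see what a scrub actually found, you have a few options. `btrfs scrub status <mountpoint>` summarizes the last run. The kernel log (`journalctl -k` or `dmesg`) carries the per-block messages. `btrfs device stats <mountpoint>` keeps cumulative per-device error counters. The helper below is a hypothetical sketch: report_btrfs_errors is not part of btrfs-progs. It assumes the usual `[/dev/sdX].counter_name value` line format of `btrfs device stats` and filters that output down to the non-zero counters, so it could be chained after a scheduled scrub.

```shell
# Hypothetical helper (not part of btrfs-progs): reads `btrfs device stats`
# output on stdin, e.g. "[/dev/sda].corruption_errs 0", prints only the
# counters whose value is non-zero, and returns 1 if there were any.
report_btrfs_errors() {
    awk 'NF == 2 && $2 != 0 { print; bad = 1 }
         END { exit bad ? 1 : 0 }'
}
```

Usage: `btrfs device stats /mnt/archive | report_btrfs_errors || echo "Btrfs reported errors"`.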

Since you have multiple devices, you can use raid1 or raid5 or raid6 as your storage policy for both metadata and data. Be aware that raid5 and raid6 were introduced much more recently and may not be considered as stable (trustworthy) as the rest of Btrfs.
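For example, a raid1 filesystem across drives of different sizes could be set up roughly as follows. The device names /dev/sdW to /dev/sdZ and the mount point /mnt/archive are placeholders. Note that mkfs.btrfs destroys any existing data on the listed devices. The second variant converts an already-mounted filesystem in place instead.

```shell
# New filesystem: two copies of both data and metadata, always placed on
# two different devices. This wipes the listed devices!
mkfs.btrfs --data raid1 --metadata raid1 /dev/sdX /dev/sdY /dev/sdZ

# Or, for an existing filesystem mounted at /mnt/archive (placeholder):
# add another device, then rewrite existing chunks with the raid1 profile.
btrfs device add /dev/sdW /mnt/archive
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/archive
```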

But I don’t have multiple drives of the same size

With Btrfs, that’s perfectly OK, unlike with block-level RAID.

Using different-sized drives doesn’t limit the effective capacity to the size of the smaller drive unless you have exactly 2 drives. If you have more than 2 drives, some data can be on drives 1+2 while other data is on 3+4, still other data on 1+4, and so on. It can potentially balance out quite well. If it gets out of balance over time (perhaps due to uneven data churn), you can just run btrfs balance later on, but that can take a while.
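The capacity argument can be made concrete with a little arithmetic. raid1 stores two copies, each on a different device. As a rule-of-thumb estimate, the usable space is therefore the smaller of two figures: half the total raw space, and the total space of every device except the largest (every chunk on the largest device needs a mirror somewhere else). The helper raid1_usable below is hypothetical and only does that arithmetic. It ignores metadata overhead and chunk-allocation rounding, so treat its result as an upper-bound estimate. For an existing filesystem, `btrfs filesystem usage <mountpoint>` reports the real figures.

```shell
# Hypothetical helper: estimate usable raid1 space from device sizes,
# all given as integers in the same unit (e.g. GB).
raid1_usable() {
    total=0 largest=0
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then largest=$size; fi
    done
    half=$((total / 2))
    rest=$((total - largest))
    # Limited by half the raw space, or by the space available outside the
    # largest device to hold its mirrors, whichever is smaller.
    if [ "$rest" -lt "$half" ]; then echo "$rest"; else echo "$half"; fi
}
```

For example, `raid1_usable 4000 3000 2000 1000` prints 5000, while `raid1_usable 4000 1000` prints 1000, which is the two-drive case where the smaller drive sets the limit.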

Related: ideal btrfs storage scheme to incorporate external USB HDD as backup media

Answered By: Celada

Side note regarding the comment stating that "Btrfs only offers DUP for metadata": Btrfs nowadays supports DUP for data as well (and already did well before 2022).
https://zejn.net/b/2017/04/30/single-device-data-redundancy-with-btrfs/

# mkfs.btrfs --data dup --metadata dup /dev/sdX
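For a filesystem that already exists on a single drive, the same profile can likely be applied afterwards with a conversion balance instead of reformatting. The sketch below uses /mnt/archive as a placeholder mount point. DUP halves the usable space. Because both copies sit on the same disk, it protects against bad sectors but not against losing the whole drive, which matches the trade-off described in the question. On SSDs, the drive firmware may internally deduplicate identical blocks, so DUP there may offer less protection than on an HDD.

```shell
# Rewrite existing data and metadata chunks so every block is stored twice
# on the same device (DUP profile).
btrfs balance start -dconvert=dup -mconvert=dup /mnt/archive

# Check the result: the Data and Metadata lines should now report DUP.
btrfs filesystem df /mnt/archive
```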
Answered By: Alban Browaeys