ZFS pool metadata corrupt
I am an idiot. Getting my offsite backups set up was on my list and, you guessed it, I didn’t get around to it before this happened. I actually thought I had set up local backups properly, but it turns out I hadn’t. Anyway:
I’m new to ZFS. I am running Proxmox, and enabled passthrough on 9 drives on an HBA card to a TrueNAS VM for a pool. I have two NVMe drives, though I think I only set one of them up for caching, and one SSD for Proxmox. For reasons that aren’t clear to me, my zpool became corrupted yesterday. My Proxmox host seems aware of the pool, which is odd to me because I created the pool in the TrueNAS guest.
I have tried running zpool import with the -fFX flags. I’m not sure if I should be running these commands on the host or the guest. I’ve also tried with --readonly=on and (on the host) I’ve tried setting echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata, although I haven’t tried doing that before trying to import the zpool on the guest, because frankly, I’m a bit freaked out that both the host and guest seem to have access to the pool and I’m not sure that that isn’t contributing to the problem.
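To keep the attempts straight, here is a minimal sketch that only prints the import variants, least invasive first, without running any of them. The ordering is an assumption for illustration, and print_import_attempts is a made-up helper name. As far as the zpool-import man page goes, -o readonly=on is the documented spelling of the read-only option, and -n (dry run) and -X (extreme rewind) are only meaningful together with -F.

```shell
#!/bin/sh
# Print (do not run) zpool import variants, least invasive first.
# The ordering is an assumption for illustration, not official guidance.
print_import_attempts() {
  pool="$1"
  # Plain read-only import; -f overrides the "last accessed by another system" check.
  echo "zpool import -o readonly=on -f $pool"
  # Dry run of a rewind: reports whether -F could make the pool importable.
  echo "zpool import -o readonly=on -f -F -n $pool"
  # Rewind to an earlier transaction group if the latest ones are unreadable.
  echo "zpool import -o readonly=on -f -F $pool"
  # Extreme rewind (-X is only valid together with -F); last resort.
  echo "zpool import -o readonly=on -f -F -X $pool"
}

print_import_attempts Seabreeze
```

Whichever side runs these, it should probably be only one of host or guest at a time, since importing the same pool from both is exactly the situation the '-f' warning is about.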
The error I’m getting is that the metadata is corrupt.
I don’t know if this is related, but this happened around the time I was trying to get a GPU installed and PCIe/GPU passthrough enabled in Proxmox for that device.
root@proxmox:~# zpool import
   pool: Seabreeze
     id: 821564149027342835
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:

        Seabreeze     FAULTED  corrupted data
          raidz2-0    FAULTED  corrupted data
            sdf2      ONLINE
            sdh2      ONLINE
            sdc2      ONLINE
            sde2      ONLINE
            sdj2      ONLINE
            sdb2      ONLINE
            sdg2      ONLINE
            sdd2      ONLINE
            sdi2      ONLINE
root@proxmox:~#
truenas% sudo zpool import
   pool: Seabreeze
     id: 821564149027342835
  state: FAULTED
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        Seabreeze                                       FAULTED  corrupted data
          raidz2-0                                      FAULTED  corrupted data
            gptid/bb911e9d-c067-11ec-b393-734570047b00  ONLINE
            gptid/bbb5c9f6-c067-11ec-b393-734570047b00  ONLINE
            gptid/bba92ac5-c067-11ec-b393-734570047b00  ONLINE
            gptid/bbbf0f87-c067-11ec-b393-734570047b00  ONLINE
            gptid/bbda0fa2-c067-11ec-b393-734570047b00  ONLINE
            gptid/bc03effa-c067-11ec-b393-734570047b00  ONLINE
            gptid/bc114e59-c067-11ec-b393-734570047b00  ONLINE
            gptid/bbd0f901-c067-11ec-b393-734570047b00  ONLINE
            gptid/bc18eaf4-c067-11ec-b393-734570047b00  ONLINE
truenas%
Is my data recoverable?
I used zdb -u -l to dump a list of uberblocks, set vfs.zfs.spa.load_verify_data to 0, and used a combination of -o readonly=on and -T with the txg of an older uberblock to at least get to where the data is present, in read-only form. From there I was able to see with zpool status -v which files were corrupt, then decrypt the pool, and file-level copy the data out to an external HDD.
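A rough sketch of the uberblock step, in case it helps someone else. It assumes the zdb -u -l output has one "txg = N" line followed by a "timestamp = N UTC = ..." line per uberblock; the exact layout may differ between OpenZFS versions. list_txgs is a made-up helper name, the sample text is invented, and the device path and txg in the comments are placeholders.

```shell
#!/bin/sh
# Read zdb uberblock output on stdin and print "txg timestamp" pairs,
# newest txg first, with duplicates (same txg seen in several labels) removed.
list_txgs() {
  awk '
    $1 == "txg" && $2 == "=" { txg = $3 }
    $1 == "timestamp" && $2 == "=" && txg != "" { print txg, $3; txg = "" }
  ' | sort -k1,1nr -u
}

# Demo on invented sample text in the assumed format.
list_txgs <<'EOF'
    Uberblock[0]
	magic = 0000000000bab10c
	txg = 100
	timestamp = 1650000000 UTC = (date string)
    Uberblock[1]
	magic = 0000000000bab10c
	txg = 132
	timestamp = 1650000160 UTC = (date string)
EOF

# Against a real disk it would look roughly like this (placeholders, not run here):
#   zdb -u -l /dev/gptid/<partition> | list_txgs
#   zpool import -o readonly=on -f -T <txg> Seabreeze
```

Picking a txg a little older than the newest one and importing read-only means nothing is written to the pool while you check whether the data is there.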