zfs pool metadata corrupt
I am an idiot. I had it on my list to get my offsite backups set up and ... you guessed it, I didn’t get around to it before this happened. I actually thought I had set up local backups properly, but it turns out that no, I hadn’t. Anyway:
I’m new to ZFS. I’m running Proxmox, and enabled passthrough of 9 drives on an HBA card to a TrueNAS VM for a pool. I have two NVMe drives, though I think I only set one of them up for caching, and one SSD for Proxmox itself. For reasons that aren’t clear to me, my zpool corrupted yesterday. My Proxmox host seems aware of the pool, which is odd to me because I created the pool in the TrueNAS guest.
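As a sanity check (these are read-only status queries and don’t change anything), something like this on each system should show whether the pool is actually imported there or merely visible:

zpool list                  # lists Seabreeze only if this system currently has the pool imported
zpool status Seabreeze      # "cannot open 'Seabreeze': no such pool" here just means it isn't imported on this system
lsblk -o NAME,SIZE,MODEL    # on the Proxmox host: shows whether the host can still see the raw disks directly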
I have tried running zpool import with the -f, -F, -FX, and -fFX flags. I’m not sure if I should be running these commands on the host or the guest. I’ve also tried importing with -o readonly=on, and (on the host) I’ve tried echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata, although I haven’t tried doing that before trying to import the zpool on the guest, because frankly, I’m a bit freaked out that both the host and guest seem to have access to the pool, and I’m not sure that isn’t contributing to the problem.
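Spelled out, the attempts above look roughly like this (a sketch, from least to most aggressive; run from one system at a time, never from both):

zpool import                                 # scan only; lists importable pools and makes no changes
zpool import -o readonly=on -f Seabreeze     # read-only import; -f overrides the "pool may be active on another system" check
zpool import -o readonly=on -fF Seabreeze    # -F asks for recovery mode, rewinding to the last importable txg
zpool import -o readonly=on -fFX Seabreeze   # -X is the "extreme rewind" variant of -F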
The error I’m getting is that the metadata is corrupt.
I don’t know if this is related, but this happened around the time I was trying to get a GPU installed and PCIe/GPU passthrough enabled in Proxmox for that device.
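In case the PCIe passthrough config grabbed the wrong device (I haven’t confirmed whether that happened), checking which kernel driver each PCI device is bound to on the Proxmox host should show it; the HBA should be on its normal SAS/SATA driver, not vfio-pci:

lspci -nnk    # for the HBA entry, "Kernel driver in use:" should name the storage driver (e.g. mpt3sas), not vfio-pci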
Proxmox:
root@proxmox:~# zpool import
   pool: Seabreeze
     id: 821564149027342835
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:
         Seabreeze     FAULTED  corrupted data
           raidz2-0    FAULTED  corrupted data
             sdf2      ONLINE
             sdh2      ONLINE
             sdc2      ONLINE
             sde2      ONLINE
             sdj2      ONLINE
             sdb2      ONLINE
             sdg2      ONLINE
             sdd2      ONLINE
             sdi2      ONLINE
root@proxmox:~#
TrueNAS:
truenas% sudo zpool import
   pool: Seabreeze
     id: 821564149027342835
  state: FAULTED
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:
         Seabreeze                                       FAULTED  corrupted data
           raidz2-0                                      FAULTED  corrupted data
             gptid/bb911e9d-c067-11ec-b393-734570047b00  ONLINE
             gptid/bbb5c9f6-c067-11ec-b393-734570047b00  ONLINE
             gptid/bba92ac5-c067-11ec-b393-734570047b00  ONLINE
             gptid/bbbf0f87-c067-11ec-b393-734570047b00  ONLINE
             gptid/bbda0fa2-c067-11ec-b393-734570047b00  ONLINE
             gptid/bc03effa-c067-11ec-b393-734570047b00  ONLINE
             gptid/bc114e59-c067-11ec-b393-734570047b00  ONLINE
             gptid/bbd0f901-c067-11ec-b393-734570047b00  ONLINE
             gptid/bc18eaf4-c067-11ec-b393-734570047b00  ONLINE
truenas%
Is my data recoverable?
Update: I used zdb -u -l to dump a list of uberblocks, set vfs.zfs.spa.load_verify_metadata and vfs.zfs.spa.load_verify_data to 0, and used a combination of -n, -N, -R /some/Mountpoint, -o readonly=on, and -T with the txg of an older uberblock to at least get to where the data is present, in read-only form. From there I was able to see with zpool status -v which files were corrupt, then decrypt the pool and file-level copy the data out to an external HDD.
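For completeness, the sequence looked roughly like this (the gptid is just one of the member disks from the output above; <older_txg>, the external-drive path, and the exact unlock/mount/copy steps are placeholders rather than the literal commands I ran):

zdb -u -l /dev/gptid/bb911e9d-c067-11ec-b393-734570047b00                         # dump the vdev labels and their uberblocks (with txgs) from one member disk
sysctl vfs.zfs.spa.load_verify_metadata=0                                         # skip metadata verification during import (TrueNAS CORE / FreeBSD sysctls)
sysctl vfs.zfs.spa.load_verify_data=0                                             # skip data verification during import
zpool import -f -N -o readonly=on -R /some/Mountpoint -T <older_txg> Seabreeze    # read-only import at an older uberblock's txg, without mounting datasets
zpool status -v Seabreeze                                                         # lists the files with unrecoverable errors
zfs load-key -r Seabreeze                                                         # unlock the encrypted datasets (assumes native ZFS encryption)
zfs mount -a                                                                      # mount the datasets; they come up read-only since the pool is read-only
rsync -a /some/Mountpoint/Seabreeze/ /mnt/external-hdd/                           # file-level copy out to the external HDD (paths are examples)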