VM "not a bootable disk" after upgrade from Promox 6.x to 7.x
Today I decided to upgrade Proxmox Virtual Environment (PVE)
from V6.2.4 to V7.1.
I did it according to the following steps:
- First upgrade to the most recent 6.x version (apt-get update followed by a full upgrade).
- Then I followed these steps: Upgrade Proxmox VE from 6.x to 7.0.
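The core of the repository switch in the 6-to-7 guide is pointing the Debian sources from buster to bullseye before the final dist-upgrade. A minimal sketch of that substitution, shown on a sample line rather than the live /etc/apt/sources.list (the real file may contain extra entries, e.g. an enterprise repo):

```shell
# Demonstrate the buster -> bullseye repository switch on a sample line.
# On the real host this would be applied in place with:
#   sed -i 's/buster/bullseye/g' /etc/apt/sources.list
src='deb http://ftp.debian.org/debian buster main contrib'
echo "$src" | sed 's/buster/bullseye/g'
# -> deb http://ftp.debian.org/debian bullseye main contrib
```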
During the upgrade I had some errors about "disk not found"
which I was worried about, but the upgrade just continued.
(Maybe they have a relation with the issues I’m now running into.)
After the upgrade finished, I restarted the host to complete the upgrade.
The LXC containers were started (start on boot) without any issue.
Some VMs (start on boot) also worked directly.
Some of the VMs gave the error
Boot failed: not a bootable disk
in the console and kept rebooting.
After some Googling I found a post suggesting that rebooting the host sometimes helps,
so I rebooted again, and after that all the VMs gave the concerning error.
I Googled for hours and found a lot of similar issues.
The only thing that worked for most of the VMs was restoring the backup.
Unfortunately, this does not work for the most important machine,
the mail server.
It contains 170 GB of mail, and the only backups I have are Proxmox backups (7 disk images). None of them works, as they all give the same issue.
- How can I make the VM boot again?
If it’s not fixable, is there a way to enter the disk so I can get the data?
- How did this occur? Is it my fault? Is it a bug? Is it a known issue?
- I don’t dare to reboot Proxmox anymore,
as I’m scared that other VMs will also break permanently.
How can I be sure that this won’t happen again,
or at least make sure the backups work?
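One cheap safeguard against exactly this failure mode: on an MBR disk, the partition table lives in the first 512-byte sector, so it can be saved with dd before an upgrade and written back if it gets destroyed. A sketch on a scratch image file (disk.img is a stand-in; on a real system you would point dd at the actual device, e.g. /dev/sda inside the guest):

```shell
# Back up sector 0 (MBR boot code + DOS partition table) of a disk.
# disk.img stands in for the real block device in this illustration.
truncate -s 1M disk.img
printf 'FAKE-MBR' | dd of=disk.img conv=notrunc status=none   # stand-in contents
dd if=disk.img of=parttable.bak bs=512 count=1 status=none    # save first sector
# Restore later with: dd if=parttable.bak of=disk.img conv=notrunc
wc -c < parttable.bak   # -> 512
```

Note this only covers the MBR partition table itself; GPT disks keep a backup header at the end of the disk and are better saved with a dedicated tool.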
Some important facts:
- I’m 100% sure all the VMs worked fine before the Proxmox upgrade
- I created a backup of each machine to a network share (backup server)
before I executed any update related command
- The backups are made via the Proxmox web interface
- All the VMs run on Ubuntu 20.04 LTS, including the mail server
- I tried to set the BIOS to UEFI (without success)
- I’m not a Proxmox pro, so if extra information is required,
please explain how I can obtain it, to avoid unnecessary posts
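On the UEFI attempt: the firmware is a per-VM setting in Proxmox, and a guest whose Ubuntu was installed in BIOS mode keeps its boot loader in the MBR, so flipping the setting to UEFI cannot make it boot; it would also need an EFI system partition and an EFI disk. Sketch of the relevant config line (values shown are the defaults, not taken from this VM's config):

```
# Per-VM firmware setting in /etc/pve/qemu-server/<vmid>.conf (sketch):
bios: seabios   # default; matches a BIOS/MBR-installed guest
# bios: ovmf    # UEFI; requires an EFI disk (efidisk0) and a UEFI-installed guest
```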
Machine UUID a238b981-27dd-4ebd-acee-1a9ee97d66a1
Booting from Hard Disk...
Boot failed: not a bootable disk

[Manually transcribed from a screenshot.]
pveversion -v output:

root@hv1:/home/axxmin# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.4: 6.4-11
pve-kernel-5.3: 6.1-6
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
Backup configuration from the concerning backup:

balloon: 6144
boot: cdn
bootdisk: sata0
cores: 2
ide2: none,media=cdrom
memory: 12288
name: axx-mcow-srv01
net0: virtio=4E:86:95:6A:FC:46,bridge=vmbr20,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: vm_instances:vm-140-disk-0,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=a238b981-27dd-4ebd-acee-1a9ee97d66a1
sockets: 1
vmgenid: 971eb84d-7502-4a68-97af-66c595c011b9
#qmdump#map:sata0:drive-sata0:vm_instances:raw:
The VM config:

root@hv1:~# qm config 140
balloon: 6144
boot: cdn
bootdisk: sata0
cores: 2
ide2: none,media=cdrom
memory: 12288
name: axx-mcow-srv01
net0: virtio=4E:86:95:6A:FC:46,bridge=vmbr20,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: vm_instances:vm-140-disk-0,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=a238b981-27dd-4ebd-acee-1a9ee97d66a1
sockets: 1
vmgenid: 66102f99-158b-451b-a8e2-187ebed7b183
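Comparing the backup's embedded config with the current VM config shows they are identical except for vmgenid (which is regenerated on restore); the disk reference and size are unchanged, which points at corruption inside the disk image rather than a lost config. A sketch of that comparison with trimmed copies of the two configs (file names are just for this illustration):

```shell
# Diff the backup config against the current VM config, trimmed to the
# lines that matter here.
cat > backup.conf <<'EOF'
sata0: vm_instances:vm-140-disk-0,size=200G
vmgenid: 971eb84d-7502-4a68-97af-66c595c011b9
EOF
cat > current.conf <<'EOF'
sata0: vm_instances:vm-140-disk-0,size=200G
vmgenid: 66102f99-158b-451b-a8e2-187ebed7b183
EOF
diff backup.conf current.conf || true   # only the vmgenid lines differ
```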
I found the thread "After Backup – boot failed: not a bootable disk"
on the Proxmox Support Forum, which matches my understanding of the problem.
As I don’t have a separate backup of the partition table,
I tried a "boot-repair disk".
This did not fix my VM,
but it gave me some extra information, which may be useful.
After booting GParted Live, I can say that the partition table is gone.
I tried to rebuild it
(see Restore Damaged or Corrupted Linux Partition Table),
but this did not work either.
Is it possible to create a VM with exactly the same configuration (including disk size), install Ubuntu on it with the same disk settings (we always use the defaults), and then copy that partition table to the "broken" server?
I could "fix" the partition table so the server booted again.
After that I was able to migrate all the stuff to a new fresh server
to make sure everything works as it should.
I did it with the steps below.
- Boot the server with the boot repair disk.
- Close the automatically started tool "boot repair".
- Open a terminal window and execute the command
sudo testdisk /dev/sda.
This will start a (command-line) tool to scan your disk (sda) for partitions.
- Confirm the disk with Proceed.
- Select the used partition table type.
I had a "BIOS" (MBR) disk, so I needed Intel.
If you have a "UEFI" disk, select EFI GPT.
- Go for Analyse to analyse your disk.
- Continue with Quick Search to execute a quick scan.
Depending on the disk, this can take anywhere from seconds to an hour.
- I think in some cases a Quick Search will suffice,
but I noticed that a Deeper Search is more accurate
and increases the chance of fixing the disk.
In my case the quick scan was not enough.
A deeper search will probably find multiple partitions.
- You can browse through the files on a found partition by pressing P.
You can go back by selecting the .. entry or pressing Q,
but be careful: press Q one time too many and you need to start over again!
- Find the partition with the etc directory inside
and mark it as P (primary) using the arrow keys (← or →).
- In my case there was a second partition with a Linux-like file structure,
so I set this one as "primary bootable",
but I noticed that this partition is not always found.
If it’s not found, it suffices (in my tests) to mark the partition
with the etc directory as primary.
(See the screenshot of my test machine below.)
- Confirm the settings with Enter (Continue),
then choose Write to write the new partition table to the disk.
- After the partition table is written, open the boot-repair tool
(which we closed at the beginning) to fix the Linux boot loader (GRUB).
- You can perform the fix with Recommended repair.
If this option is not available, restart the boot-repair tool,
as it was probably still active during the partition table scan.
I was able to fix our mail server this way.
I also restored the (broken) backups of machines I had meanwhile
recovered from a working backup onto some test machines,
and I was able to fix all five machines I tested this on.
I hope this answer can help someone else.
Thank you VERY much for posting your solution – saved my life!
Situation: I made adjustments (CPU resources) to 5 Debian LAMP VMs in a Proxmox 7.2-7 environment and restarted them to make the changes effective. NONE of them came back up; all failed to boot with the well-known "not a bootable disk" error.
In my case, there were two details different in the repair process:
- sudo testdisk /dev/sda failed with the message "unable to open file or device /dev/sda: Not a directory", although the device in question is sda.
Resolution: I entered just sudo testdisk,
and after the logging options I could select the disk from the device list.
- Quick Search resulted in this:
Partition                Start          End     Size in sectors
>* Linux               0  32 33    121 157 36        1951744 [boot]
 P Linux             121 157 37   3161  11 50       48828416 [system]
 P Linux Swap       3161  11 51   3647  99 36        7813120
 L Linux            3647 164 38  13054  10 12      151113728 [data1]
I did not need to change/toggle anything;
just went to the next screen with Enter, and then chose "write".
From there it worked like written above.
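As a sanity check on the Quick Search listing above: TestDisk prints start/end as cylinder head sector triples, and with the usual 255-head, 63-sectors-per-track translation (an assumption; TestDisk shows the geometry it uses), the size column can be recomputed from those coordinates:

```shell
# Convert a CHS triple to an LBA sector number (255 heads, 63 sectors/track).
chs_to_lba() { echo $(( ($1 * 255 + $2) * 63 + $3 - 1 )); }
start=$(chs_to_lba 0 32 33)     # [boot] start: (0*255+32)*63 + 32 = 2048
end=$(chs_to_lba 121 157 36)    # [boot] end                      = 1953791
echo $(( end - start + 1 ))     # -> 1951744, the listed "Size in sectors"
```

The recomputed value matching the printed one is a good sign the detected partition boundaries are consistent before choosing Write.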
Maybe my small hints can help someone, too.
This is being tracked as Proxmox bug 2874.