VM "not a bootable disk" after upgrade from Promox 6.x to 7.x
Today I decided to upgrade Proxmox Virtual Environment (PVE)
from V6.2.4 to V7.1.
I did it according to the following steps:
- First upgrade to the most recent 6.x version (apt-get update followed by a full upgrade).
- Then I followed these steps: Upgrade Proxmox VE from 6.x to 7.0.
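The core of the repository switch in the 6-to-7 guide is pointing the Debian sources from buster to bullseye before the final dist-upgrade. A minimal sketch of that substitution, shown on a sample line rather than the live /etc/apt/sources.list (the real file may contain extra entries, e.g. an enterprise repo):

```shell
# Demonstrate the buster -> bullseye repository switch on a sample line.
# On the real host this would be applied in place with:
#   sed -i 's/buster/bullseye/g' /etc/apt/sources.list
src='deb http://ftp.debian.org/debian buster main contrib'
echo "$src" | sed 's/buster/bullseye/g'
# -> deb http://ftp.debian.org/debian bullseye main contrib
```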
During the upgrade I had some errors about "disk not found"
which I was worried about, but the upgrade just continued.
(Maybe they have a relation with the issues I’m now running into.)
After the upgrade finished, I restarted the host to complete the upgrade.
The LXC containers were started (start on boot) without any issue.
Some VMs (start on boot) also worked directly.
Some of the VMs gave the error
Boot failed: not a bootable disk
in the console and kept rebooting.
After some Googling I found a post suggesting that rebooting the host sometimes helps,
so I rebooted again, and after that all the VMs gave the concerning error.
I Googled for hours and found a lot of similar issues.
The only thing that worked for most of the VMs was restoring the backup.
Unfortunately, this does not work for the most important machine,
the mail server.
It contains 170 GB of mail, and the only backups I have are Proxmox backups (7 disk images). None of them works, as they all give the same issue.
- How can I make the VM boot again?
If it’s not fixable, is there a way to enter the disk so I can get the data?
- How did this occur? Is it my fault? Is it a bug? Is it a known issue?
- I don’t dare to reboot Proxmox anymore,
as I’m scared that other VMs will also break permanently.
How can I be sure that this won’t happen again,
or at least make sure the backups work?
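One cheap safeguard against exactly this failure mode: on an MBR disk, the partition table lives in the first 512-byte sector, so it can be saved with dd before an upgrade and written back if it gets destroyed. A sketch on a scratch image file (disk.img is a stand-in; on a real system you would point dd at the actual device, e.g. /dev/sda inside the guest):

```shell
# Back up sector 0 (MBR boot code + DOS partition table) of a disk.
# disk.img stands in for the real block device in this illustration.
truncate -s 1M disk.img
printf 'FAKE-MBR' | dd of=disk.img conv=notrunc status=none   # stand-in contents
dd if=disk.img of=parttable.bak bs=512 count=1 status=none    # save first sector
# Restore later with: dd if=parttable.bak of=disk.img conv=notrunc
wc -c < parttable.bak   # -> 512
```

Note this only covers the MBR partition table itself; GPT disks keep a backup header at the end of the disk and are better saved with a dedicated tool.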
Some important facts:
- I’m 100% sure all the VMs worked fine before the Proxmox upgrade
- I created a backup of each machine to a network share (backup server)
before I executed any update related command
- The backups are made via the Proxmox web interface
- All the VMs run on Ubuntu 20.04 LTS, including the mail server
- I tried to set the BIOS to UEFI (without success)
- I’m not a Proxmox pro, so if extra information is required,
please explain how I can obtain it, to avoid unnecessary posts
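On the UEFI attempt: the firmware is a per-VM setting in Proxmox, and a guest whose Ubuntu was installed in BIOS mode keeps its boot loader in the MBR, so flipping the setting to UEFI cannot make it boot; it would also need an EFI system partition and an EFI disk. Sketch of the relevant config line (values shown are the defaults, not taken from this VM's config):

```
# Per-VM firmware setting in /etc/pve/qemu-server/<vmid>.conf (sketch):
bios: seabios   # default; matches a BIOS/MBR-installed guest
# bios: ovmf    # UEFI; requires an EFI disk (efidisk0) and a UEFI-installed guest
```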
Machine UUID a238b981-27dd-4ebd-acee-1a9ee97d66a1
Booting from Hard Disk...
Boot failed: not a bootable disk

[Manually transcribed from a screenshot.]
pveversion -v output:

root@hv1:/home/axxmin# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.4: 6.4-11
pve-kernel-5.3: 6.1-6
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
Backup configuration from the concerning backup:

balloon: 6144
boot: cdn
bootdisk: sata0
cores: 2
ide2: none,media=cdrom
memory: 12288
name: axx-mcow-srv01
net0: virtio=4E:86:95:6A:FC:46,bridge=vmbr20,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: vm_instances:vm-140-disk-0,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=a238b981-27dd-4ebd-acee-1a9ee97d66a1
sockets: 1
vmgenid: 971eb84d-7502-4a68-97af-66c595c011b9
#qmdump#map:sata0:drive-sata0:vm_instances:raw:
The VM config:

root@hv1:~# qm config 140
balloon: 6144
boot: cdn
bootdisk: sata0
cores: 2
ide2: none,media=cdrom
memory: 12288
name: axx-mcow-srv01
net0: virtio=4E:86:95:6A:FC:46,bridge=vmbr20,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: vm_instances:vm-140-disk-0,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=a238b981-27dd-4ebd-acee-1a9ee97d66a1
sockets: 1
vmgenid: 66102f99-158b-451b-a8e2-187ebed7b183
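Comparing the backup's embedded config with the current VM config shows they are identical except for vmgenid (which is regenerated on restore); the disk reference and size are unchanged, which points at corruption inside the disk image rather than a lost config. A sketch of that comparison with trimmed copies of the two configs (file names are just for this illustration):

```shell
# Diff the backup config against the current VM config, trimmed to the
# lines that matter here.
cat > backup.conf <<'EOF'
sata0: vm_instances:vm-140-disk-0,size=200G
vmgenid: 971eb84d-7502-4a68-97af-66c595c011b9
EOF
cat > current.conf <<'EOF'
sata0: vm_instances:vm-140-disk-0,size=200G
vmgenid: 66102f99-158b-451b-a8e2-187ebed7b183
EOF
diff backup.conf current.conf || true   # only the vmgenid lines differ
```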
I found the thread "After Backup – boot failed: not a bootable disk"
on the Proxmox Support Forum, which matches my understanding of the problem.
As I don’t have a separate backup of the partition table,
I tried a "boot-repair disk".
This did not fix my VM,
but it gave me some extra information, which may be useful.
After booting GParted Live, I can say that the partition table is gone.
I tried to rebuild it
(see Restore Damaged or Corrupted Linux Partition Table),
but this did not work either.
Is it possible to create a VM with exactly the same configuration (including disk size), install Ubuntu on it with the same disk settings (we always use the defaults), and then copy that partition table to the "broken" server?
I could "fix" the partition table so the server booted again.
After that I was able to migrate all the stuff to a new fresh server
to make sure everything works as it should.
I did it with the steps below.
- Boot the server with the boot repair disk.
- Close the automatically started tool "boot repair".
- Open a terminal window and execute the command
sudo testdisk /dev/sda.
This will start a (command-line) tool to scan your disk (sda) for partitions.
- Confirm the disk with Proceed.
- Select the used partition table type.
I had a "BIOS" (MBR) disk, so I needed Intel.
If you have a "UEFI" disk, select EFI GPT.
- Go for Analyse to analyse your disk.
- Continue with Quick Search to execute a quick scan.
Depending on the disk, this can take anywhere from seconds to an hour.
- I think in some cases a Quick Search will suffice,
but I noticed that a Deeper Search is more accurate
and increases the chance of fixing the disk.
In my case the quick scan was not enough.
A deeper search will probably find multiple partitions.
- You can browse through the files on a found partition by pressing P.
You can go back by selecting the .. entry or pressing Q,
but be careful: press Q one time too many and you need to start over again!
- Find the partition with the etc directory inside
and mark it as P (primary) using the arrow keys (← or →).
- In my case there was a second partition with a Linux-like file structure,
so I set this one as "primary bootable",
but I noticed that this partition is not always found.
If it’s not found, it suffices (in my tests) to mark the partition
with the etc directory as primary.
(See the screenshot of my test machine below.)
- Confirm the settings with Enter (Continue),
then choose Write to write the new partition table to the disk.
- After the partition table is written, open the boot-repair tool
(which we closed at the beginning) to fix the Linux boot loader (GRUB).
- You can perform the fix with Recommended repair.
If this option is not available, restart the boot-repair tool,
as it was probably still active during the partition table scan.
I was able to fix our mail server this way.
I also restored the (broken) backups of machines I had meanwhile
recovered from a working backup onto some test machines,
and I was able to fix all five machines I tested this on.
I hope this answer can help someone else.
Thank you VERY much for posting your solution – saved my life!
Situation: I made adjustments (CPU resources) to 5 Debian LAMP VMs in a Proxmox 7.2-7 environment and restarted them to make the changes effective. NONE of them came back up; all failed to boot with the well-known "not a bootable disk" error.
In my case, there were two details different in the repair process:
- sudo testdisk /dev/sda failed with the message "unable to open file or device /dev/sda: Not a directory", although the device in question is sda.
Resolution: I entered just sudo testdisk,
and after the logging options I could select the disk from the device list.
- Quick Search resulted in this:
Partition                Start          End     Size in sectors
>* Linux               0  32 33    121 157 36        1951744 [boot]
 P Linux             121 157 37   3161  11 50       48828416 [system]
 P Linux Swap       3161  11 51   3647  99 36        7813120
 L Linux            3647 164 38  13054  10 12      151113728 [data1]
I did not need to change/toggle anything;
just went to the next screen with Enter, and then chose "write".
From there it worked like written above.
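As a sanity check on the Quick Search listing above: TestDisk prints start/end as cylinder head sector triples, and with the usual 255-head, 63-sectors-per-track translation (an assumption; TestDisk shows the geometry it uses), the size column can be recomputed from those coordinates:

```shell
# Convert a CHS triple to an LBA sector number (255 heads, 63 sectors/track).
chs_to_lba() { echo $(( ($1 * 255 + $2) * 63 + $3 - 1 )); }
start=$(chs_to_lba 0 32 33)     # [boot] start: (0*255+32)*63 + 32 = 2048
end=$(chs_to_lba 121 157 36)    # [boot] end                      = 1953791
echo $(( end - start + 1 ))     # -> 1951744, the listed "Size in sectors"
```

The recomputed value matching the printed one is a good sign the detected partition boundaries are consistent before choosing Write.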
Maybe my small hints can help someone, too.
This is being tracked as Proxmox bug 2874.