Moving to pvgrub2 on the fly

on September 16, 2014

In my previous post, I explained how to build a grub2 standalone image that is capable of booting a Xen domU without having to put any grub configuration or binary into the domU, and, most importantly, without having to put a copy of the domU kernel and initrd into the dom0.

The next question is: how do we convert all running systems to use pvgrub2? In this post, I’ll explain how we switched running virtual machines over on the fly, so that they use pvgrub2 at their next reboot.


In these examples, I’m using the Xen xm toolstack on Debian Wheezy with Xen 4.1.

When using Xen, every virtual machine has its own configuration file that describes which kernel the domU should use, where its disk image lives, and so on. It typically looks a bit like this example:

name = "example.domU"
kernel = "/usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1"
ramdisk = "/usr/lib/linux-domu-kernels/initrd.img-3.14-1-amd64-3.14.12-1"
root = "/dev/xvda ro"
vcpus = 4
memory = 1024
vif = [
disk = [

Now when starting this virtual machine, we use the command xm create. Why isn’t this command called xm start? Because when bringing the virtual machine to life, the configuration file is imported into an internal datastore of Xen. After the ‘creation’ of the domU, any change to the configuration file will not take effect as long as the domain exists. Only after a shutdown or destroy, followed by another create, is the configuration file read again and the domain recreated. When doing a reboot from within the virtual machine, the configuration that is present in the internal Xen datastore is reused, and the configuration file is not read again.

So we need to reboot everything?

Actually, this means that in order to change from using the explicit kernel version to pvgrub2 right now, we need to change all configuration files, then shut down and start all virtual machines again. There’s only one problem with that approach… Our customers won’t like it. And to be honest, we also don’t like it, because it sounds so stupid that there must be a more clever way to do it. 🙂

Another approach is to just change all configuration files, remove all kernel and initrd images from the dom0s and then just see what happens. But that means there will be quite some time in which there will be a slow shift from not being able to reboot at all (because the referenced kernel image is no longer available) to being able to reboot, using the installed (or just upgraded) kernel inside the vm. I’ve seen virtual machines in our network with an uptime of more than 800 days, which have been migrated back and forth between the same or slightly different newer hardware during their existence… It’s no fun to have no idea for a few years whether the virtual machine you’re working on will come back on the net after you reboot it or not…

There must be a better solution to this!

Did I just say Xen has an internal datastore where it maintains a copy of the configuration file?

Yes, I did, multiple times. It’s called XenStore, which is a small hierarchical database. Its contents can be read with xenstore-ls or xenstore-read, and written with xenstore-write. It’s also possible to use the C or Python bindings to program against XenStore directly, but for what’s happening here, that would be overengineering.

The xenstore-ls program can dump a textual representation of what’s inside the xenstore. The -f flag displays it with the full path of each key. Here’s part of the output of xenstore-ls -f on a test system where I just started the example.domU shown above:

/local/domain/63 = ""
/local/domain/63/vm = "/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee"
/local/domain/63/device = ""
/local/domain/63/device/vbd = ""
/local/domain/63/device/vbd/51712 = ""
/local/domain/63/device/vbd/51712/virtual-device = "51712"
/local/domain/63/device/vbd/51712/device-type = "disk"
/local/domain/63/device/vbd/51712/protocol = "x86_64-abi"
/local/domain/63/device/vbd/51712/backend-id = "0"
/local/domain/63/device/vbd/51712/state = "4"
/local/domain/63/device/vbd/51712/backend = "/local/domain/0/backend/vbd/63/51712"
/local/domain/63/device/vbd/51712/ring-ref = "8"
/local/domain/63/device/vbd/51712/event-channel = "27"
/local/domain/63/device/vbd/51712/feature-persistent = "1"
/local/domain/63/device/vif = ""
/local/domain/63/device/vif/0 = ""
/local/domain/63/device/vif/0/mac = "00:16:3e:00:0b:2e"
/local/domain/63/device/vif/0/handle = "0"
/local/domain/63/device/vif/0/protocol = "x86_64-abi"
/local/domain/63/device/vif/0/backend-id = "0"
/local/domain/63/device/vif/0/state = "4"
/local/domain/63/device/vif/0/backend = "/local/domain/0/backend/vif/63/0"
/local/domain/63/device/vif/0/tx-ring-ref = "768"
/local/domain/63/device/vif/0/rx-ring-ref = "769"
/local/domain/63/device/vif/0/event-channel = "28"
/local/domain/63/device/vif/0/request-rx-copy = "1"
/local/domain/63/device/vif/0/feature-rx-notify = "1"
/local/domain/63/device/vif/0/feature-sg = "1"
/local/domain/63/device/vif/0/feature-gso-tcpv4 = "1"
/local/domain/63/device/vif/0/feature-gso-tcpv6 = "1"
/local/domain/63/device/vif/0/feature-ipv6-csum-offload = "1"
/local/domain/63/device/console = ""
/local/domain/63/device/console/0 = ""
/local/domain/63/device/console/0/protocol = "x86_64-abi"
/local/domain/63/device/console/0/state = "1"
/local/domain/63/device/console/0/backend-id = "0"
/local/domain/63/device/console/0/backend = "/local/domain/0/backend/console/63/0"
/local/domain/63/control = ""
/local/domain/63/control/platform-feature-multiprocessor-suspend = "1"
/local/domain/63/error = ""
/local/domain/63/memory = ""
/local/domain/63/memory/target = "1048576"
/local/domain/63/guest = ""
/local/domain/63/hvmpv = ""
/local/domain/63/data = ""
/local/domain/63/device-misc = ""
/local/domain/63/device-misc/vif = ""
/local/domain/63/device-misc/vif/nextDeviceID = "1"
/local/domain/63/device-misc/console = ""
/local/domain/63/device-misc/console/nextDeviceID = "1"
/local/domain/63/console = ""
/local/domain/63/console/ring-ref = "4288551"
/local/domain/63/console/port = "2"
/local/domain/63/console/limit = "1048576"
/local/domain/63/console/type = "xenconsoled"
/local/domain/63/console/tty = "/dev/pts/1"
/local/domain/63/image = ""
/local/domain/63/image/entry = "18446744071588053488"
/local/domain/63/image/loader = "generic"
/local/domain/63/image/hv-start-low = "18446603336221196288"
/local/domain/63/image/guest-os = "linux"
/local/domain/63/image/hypercall-page = "18446744071578849280"
/local/domain/63/image/guest-version = "2.6"
/local/domain/63/image/pae-mode = "yes"
/local/domain/63/image/paddr-offset = "0"
/local/domain/63/image/virt-base = "18446744071562067968"
/local/domain/63/image/suspend-cancel = "1"
/local/domain/63/image/features = ""
/local/domain/63/image/features/pae-pgdir-above-4gb = "1"
/local/domain/63/image/features/writable-page-tables = "0"
/local/domain/63/image/xen-version = "xen-3.0"
/local/domain/63/cpu = ""
/local/domain/63/cpu/3 = ""
/local/domain/63/cpu/3/availability = "online"
/local/domain/63/cpu/1 = ""
/local/domain/63/cpu/1/availability = "online"
/local/domain/63/cpu/2 = ""
/local/domain/63/cpu/2/availability = "online"
/local/domain/63/cpu/0 = ""
/local/domain/63/cpu/0/availability = "online"
/local/domain/63/store = ""
/local/domain/63/store/ring-ref = "4288552"
/local/domain/63/store/port = "1"
/local/domain/63/description = ""
/local/domain/63/name = "example.domU"
/local/domain/63/domid = "63"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image = "(linux (kernel /usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1) (ramdisk /usr/lib/linux-domu-kernels/initrd.img-3.14-1-amd64-3.14.12-1) (args 'root=/dev/xvda ro \..."
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/ostype = "linux"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/kernel = "/usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/cmdline = "root=/dev/xvda ro "
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/ramdisk = "/usr/lib/linux-domu-kernels/initrd.img-3.14-1-amd64-3.14.12-1"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vbd = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vbd/51712 = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vbd/51712/frontend = "/local/domain/63/device/vbd/51712"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vbd/51712/frontend-id = "63"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vbd/51712/backend-id = "0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vbd/51712/backend = "/local/domain/0/backend/vbd/63/51712"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vif = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vif/0 = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vif/0/frontend = "/local/domain/63/device/vif/0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vif/0/frontend-id = "63"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vif/0/backend-id = "0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/vif/0/backend = "/local/domain/0/backend/vif/63/0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/console = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/console/0 = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/console/0/frontend = "/local/domain/63/device/console/0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/console/0/frontend-id = "63"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/console/0/backend-id = "0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/device/console/0/backend = "/local/domain/0/backend/console/63/0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/on_xend_stop = "ignore"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/pool_name = "Pool-0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/shadow_memory = "0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/uuid = "10e5aded-1a79-42a8-0d89-8e7e4f7862ee"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/on_reboot = "restart"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/start_time = "1410905669.59"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/on_poweroff = "destroy"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/bootloader_args = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/on_xend_start = "ignore"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/on_crash = "restart"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/xend = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/xend/restart_count = "0"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/vcpus = "4"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/vcpu_avail = "15"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/bootloader = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/name = "example.domU"

Read the XenStore documentation to learn what this information means, and why it’s split into the /local/ and /vm/ trees.

Can we mess around with XenStore a bit?

Well, for example, at the /vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/kernel location we see that the currently running virtual machine thinks it’s using /usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1 as kernel image, and it will start to look for that file again when you reboot it.
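The link between the two trees can be followed with a bit of text processing. Below is a minimal sketch that works on xenstore-ls -f output read from stdin; the function names vm_path_of_domid and kernel_of_vm are made up for this example, and the sample input consists of three lines copied from the dump above.

```shell
#!/bin/sh
# Print the /vm/<uuid> path that belongs to a domain id.
# Input: `xenstore-ls -f` output on stdin. $1 = domain id, e.g. 63
vm_path_of_domid() {
    sed -n "s|^/local/domain/$1/vm = \"\(.*\)\"\$|\1|p"
}

# Print the value of <vm path>/image/kernel from the same kind of dump.
# $1 = vm path, e.g. /vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee
kernel_of_vm() {
    sed -n "s|^$1/image/kernel = \"\(.*\)\"\$|\1|p"
}

# Sample input: three lines taken from the dump shown earlier.
dump='/local/domain/63/vm = "/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee"
/local/domain/63/name = "example.domU"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/kernel = "/usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1"'

vmpath=$(printf '%s\n' "$dump" | vm_path_of_domid 63)
printf '%s\n' "$dump" | kernel_of_vm "$vmpath"
```

On a live dom0 the same functions could be fed with xenstore-ls -f directly instead of the sample text.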

At first, I tried to change the kernel, cmdline and ramdisk values to the values I observed when looking at the output of a virtual machine that has been started using pvgrub2. Using xenstore-write it’s easily possible to write new values to the XenStore. But, after rebooting the domU, the values just changed back to the old values again. O_o

After an afternoon of debugging and browsing through the code of Xend, I found out that my assumption that the huge /image field was made up of a combination of the ostype, kernel, cmdline and ramdisk values was just totally wrong. Actually, the long image value is read, and then the separate ostype, kernel, etc. fields are written back again.

Let’s see what the value of /vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image actually is, because it’s been truncated in the xenstore-ls output:

# xenstore-read /vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image
(linux (kernel /usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1) (ramdisk /usr/lib/linux-domu-kernels/initrd.img-3.14-1-amd64-3.14.12-1) (args 'root=/dev/xvda ro ') (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (HV_START_LOW 18446603336221196288) (FEATURES '!writable_page_tables|pae_pgdir_above_4gb') (VIRT_BASE 18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET 0) (GUEST_OS linux) (HYPERCALL_PAGE 18446744071578849280) (LOADER generic) (SUSPEND_CANCEL 1) (PAE_MODE yes) (ENTRY 18446744071588053488) (XEN_VERSION xen-3.0)))

Oh wow. That’s a lot of stuff. Let’s see what it looks like when creating a domU using pvgrub2 directly…

# xenstore-read /vm/fdb2a823-97c4-64f6-88cd-2a1a02fbc05a/image
(linux (kernel /usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile) (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (ENTRY 0) (XEN_VERSION xen-3.0) (GUEST_OS GRUB) (VIRT_BASE 0) (LOADER generic)))
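To compare two image values without reading through all the parentheses, a small filter can pull the kernel path out of them. This is only a sketch: image_kernel is a name made up for this example, and it assumes the value starts with "(linux (kernel ...)", as both values above do.

```shell
#!/bin/sh
# Print the kernel path found in an image value read from stdin.
# Assumes the value begins with "(linux (kernel <path>)".
image_kernel() {
    sed -n 's/^(linux (kernel \([^)]*\)).*/\1/p'
}

# The image value of the domU that was created with pvgrub2 directly.
new="(linux (kernel /usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile) (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (ENTRY 0) (XEN_VERSION xen-3.0) (GUEST_OS GRUB) (VIRT_BASE 0) (LOADER generic)))"

printf '%s\n' "$new" | image_kernel
```

On a dom0, the output of xenstore-read for the image key of a vm path can be piped into image_kernel in the same way.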

Now, what would happen when I just write this image value back into the xenstore information of a domU that has been started with an old configuration file?

# xenstore-write /vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image "(linux (kernel /usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile) (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (ENTRY 0) (XEN_VERSION xen-3.0) (GUEST_OS GRUB) (VIRT_BASE 0) (LOADER generic)))"
# xenstore-ls -f
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image = "(linux (kernel /usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile) (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (ENTRY 0) (XEN_VERSION xen-3.0)\..."
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/ostype = "linux"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/kernel = "/usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/cmdline = "root=/dev/xvda ro "
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/ramdisk = "/usr/lib/linux-domu-kernels/initrd.img-3.14-1-amd64-3.14.12-1"

Excellent (do mr burns gesture now)! The long image field has been updated, but the separate fields still show the old kernel, cmdline and ramdisk values.

Now just try a reboot from inside the virtual machine! And, maybe first just install some other kernel than the 3.14.12-1 above so it’s possible to quickly check if the old values have been used again after reboot, or whether pvgrub2 has been used, booting into the new kernel image.

root@example.domU:~# reboot

Broadcast message from root@example.domU (pts/0) (Tue Sep 16 22:46:57 2014):

The system is going down for reboot NOW!
root@example.domU:~# Connection to example.domU closed by remote host.
Connection to example.domU closed.

After the reboot is finished, we can see that pvgrub2 has been used to boot it, and that the boot was successful. When looking at xm console example.domU during the reboot, you can actually see it happen. The xenstore output now shows updated values for kernel and the other separate fields:

# xenstore-ls -f
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image = "(linux (kernel /usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile) (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (ENTRY 0) (XEN_VERSION xen-3.0)\..."
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/ostype = "linux"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/kernel = "/usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile"
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/cmdline = ""
/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/ramdisk = ""

Let’s do it for real

Using what we learned so far, it should be possible to convert all of the running virtual machines to use pvgrub2 the next time they do a reboot.

Doing so is actually very easy. Just grab the domain ids from the xm list output, then look up the vm path in xenstore using the /local/domain info and change the image value:

for i in $(xm list | tail -n +3 | awk '{ print $2 }'); do
    # Resolve the /vm/<uuid> path of this domain id
    vmpath=$(xenstore-read "/local/domain/$i/vm" | cut -d \" -f 1)
    echo "$vmpath"
    # Overwrite the image value that is used at the next reboot
    xenstore-write "$vmpath/image" "(linux (kernel /usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile) (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (ENTRY 0) (XEN_VERSION xen-3.0) (GUEST_OS GRUB) (VIRT_BASE 0) (LOADER generic)))"
done
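Before running something like this on production hosts, it can be reassuring to see the exact commands first. The following is a sketch of a dry-run variant, not what we actually ran: pvgrub2_image and print_convert_commands are hypothetical helper names, and the image value is simply the one shown above with the pvgrub2 path as a parameter. It only prints the xenstore-write commands and does not touch XenStore.

```shell
#!/bin/sh
# Build the image value for a given pvgrub2 binary ($1).
pvgrub2_image() {
    printf '(linux (kernel %s) (superpages 0) (videoram 4) (pci ()) (nomigrate 0) (tsc_mode 0) (notes (ENTRY 0) (XEN_VERSION xen-3.0) (GUEST_OS GRUB) (VIRT_BASE 0) (LOADER generic)))\n' "$1"
}

# Read vm paths from stdin, one per line, and print the xenstore-write
# command for each of them instead of executing it.
print_convert_commands() {
    image=$(pvgrub2_image "$1")
    while IFS= read -r vmpath; do
        printf 'xenstore-write %s/image "%s"\n' "$vmpath" "$image"
    done
}

grub=/usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile
printf '%s\n' /vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee | print_convert_commands "$grub"
```

For the example domU, this prints the same xenstore-write command that was used by hand earlier in this post.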

During the migration, it’s possible to look up which virtual machines still haven’t been rebooted using pvgrub2 by looking at the separate kernel, cmdline and ramdisk fields in xenstore. Before the next reboot they still show the old kernel path; after it, they show the pvgrub2 path and empty cmdline and ramdisk values.
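That check can be scripted as well. Here is a minimal sketch that filters xenstore-ls -f output and prints the vm path of every domU whose separate kernel field is not yet the pvgrub2 binary. The name pending_vms is made up for this example, and the two sample lines are the before and after values of example.domU shown earlier.

```shell
#!/bin/sh
# Print the vm path of every domU whose image/kernel field differs from
# the pvgrub2 binary ($1). Input: `xenstore-ls -f` output on stdin.
pending_vms() {
    awk -v grub="$1" '
        $1 ~ "^/vm/[^/]+/image/kernel$" {
            value = $3
            gsub("\"", "", value)          # strip the surrounding quotes
            if (value != grub) {
                sub("/image/kernel$", "", $1)
                print $1
            }
        }'
}

grub=/usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile
before='/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/kernel = "/usr/lib/linux-domu-kernels/vmlinuz-3.14-1-amd64-3.14.12-1"'
after='/vm/10e5aded-1a79-42a8-0d89-8e7e4f7862ee/image/kernel = "/usr/lib/pvgrub2/grub-x86_64-xen-xvda-fire-ze-missile"'

printf '%s\n' "$before" | pending_vms "$grub"   # still pending: prints the vm path
printf '%s\n' "$after" | pending_vms "$grub"    # already rebooted: prints nothing
```

On a dom0, piping xenstore-ls -f into pending_vms gives the list of domUs that still have to be rebooted.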