yahoo-eng-team team mailing list archive

[Bug 2125567] [NEW] Nova allows migration between nodes with different backend (ceph or not)


Public bug reported:

We made a (human) mistake when setting up a compute node: we did not
activate Ceph on it, while all the other compute nodes have a Ceph
backend.
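
Concretely, the other nodes carry the usual libvirt RBD settings in
nova.conf, while the new node was left at the default file-based
images_type. The values below are our guess at a typical setup (the
pool name is taken from the rbd command further down):

  [libvirt]
  images_type = rbd
  images_rbd_pool = nova
  images_rbd_ceph_conf = /etc/ceph/ceph.conf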

We then cold migrated a VM from a compute node with a Ceph backend to
that new compute node, which placed the VM's system disk under
/var/lib/nova/instances/<UUID>/disk. Instead of refusing the cold
migration, Nova happily accepted it. As a consequence, our customer got
a fresh new disk built from the Glance image the VM had originally been
created with, and booted from that.

If this is a feature, it is not a funny one. The customer had downtime
because his VM lost its system disk contents, and we initially thought
the VM's data was lost. In fact, the data was still in Ceph, just no
longer used. We dumped it from Ceph like this:

rbd -p nova export UUID_disk /srv/data/UUID_disk.raw

and imported that raw file into Glance, so the customer could destroy
the VM and re-create it from that image.
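
With the openstack CLI, the import step looks roughly like this (the
image name is ours, pick anything meaningful):

  openstack image create --disk-format raw --container-format bare \
      --file /srv/data/UUID_disk.raw recovered-UUID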

So, to avoid surprises, Nova should either:
- convert the disk from Ceph and create the disk file on the destination compute node (the ideal scenario; see the sketch after this list)
- refuse the cold migration
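
As a rough sketch of what that conversion would amount to, assuming
qemu-img was built with rbd support (note that Nova's file backend
normally uses a qcow2 overlay on a base image under _base, so this is
an illustration, not a drop-in fix):

  qemu-img convert -f raw -O qcow2 \
      rbd:nova/UUID_disk:conf=/etc/ceph/ceph.conf \
      /var/lib/nova/instances/UUID/disk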

If we want to keep things as they are right now, then at least there
should be an option in the scheduler (or a policy in the API?) so that
this type of cold migration, which may lose the system disk, can be
forbidden.
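
Purely for illustration, such a knob could look like this in nova.conf
(this option does not exist today; the name is invented):

  [scheduler]
  # hypothetical: refuse cold migrations between hosts whose
  # storage backends differ
  require_matching_storage_backend = true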

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2125567

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2125567/+subscriptions