yahoo-eng-team team mailing list archive

[Bug 2047182] [NEW] BFV VM may be unexpectedly moved to different AZ

 

Public bug reported:

A VM may be unexpectedly moved to a different AZ when:
- each availability zone has a separate storage cluster (the [cinder]/cross_az_attach option helps to enforce that), and
- default_schedule_zone is not set.
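
For illustration, a nova.conf fragment matching these two conditions might look like the sketch below. The option names are the ones mentioned above; the comments are my own reading of the setup, and the rest of the file is omitted.

```ini
[DEFAULT]
# default_schedule_zone is intentionally left unset, so a request
# without an explicit AZ is not pinned to any zone.

[cinder]
# Refuse to attach a volume that lives in a different AZ than the instance.
cross_az_attach = False
```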

When a VM is created from a pre-existing volume, nova stores that
volume's availability zone in request_specs, which prevents the VM from
being moved to a different AZ during resize/migrate[1]. In this case,
everything works fine.

Unfortunately, problems start in the following cases:
a) the VM is created with the --boot-from-volume argument, which dynamically creates a volume for the VM
b) the VM has only an ephemeral volume

Let's focus on case a), because case b) may be not working "by design".

The _get_volume_from_bdms() method considers only pre-existing volumes[2]. A volume that will be created later on with `--boot-from-volume` does not exist yet, so nova cannot fetch its availability zone.
As a result, request_specs contains '"availability_zone": null' and the VM can be moved to a different AZ during resize/migrate. Because storage is not shared between AZs, this breaks the VM.
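
The difference between the two boot paths can be sketched with a small, self-contained model. This is not nova's code: BlockDeviceMapping, EXISTING_VOLUMES, az_from_bdms and build_request_spec are hypothetical names that only mimic the behaviour described above, under the assumption that the AZ is derived from volumes only when cross_az_attach is disabled and no AZ was requested.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BlockDeviceMapping:
    """Hypothetical, heavily simplified block device mapping."""
    source_type: str                  # "volume" = pre-existing, "image" = created later
    destination_type: str             # "volume" or "local"
    volume_id: Optional[str] = None   # known only for pre-existing volumes


# Hypothetical stand-in for a Cinder lookup: volume id -> availability zone.
EXISTING_VOLUMES: Dict[str, str] = {"vol-1": "az1"}


def az_from_bdms(bdms: List[BlockDeviceMapping]) -> Optional[str]:
    """Return the AZ of the first pre-existing volume, or None if there is none."""
    for bdm in bdms:
        if bdm.volume_id is not None and bdm.volume_id in EXISTING_VOLUMES:
            return EXISTING_VOLUMES[bdm.volume_id]
    return None


def build_request_spec(requested_az: Optional[str],
                       bdms: List[BlockDeviceMapping],
                       cross_az_attach: bool = False,
                       default_schedule_zone: Optional[str] = None) -> dict:
    """Decide which AZ, if any, gets stored in the request spec."""
    az = requested_az
    if az is None and not cross_az_attach:
        # Only volumes that already exist can contribute an AZ here.
        az = az_from_bdms(bdms)
    if az is None:
        az = default_schedule_zone
    return {"availability_zone": az}


# Pre-existing volume: the spec is pinned to the volume's AZ.
pre_existing = build_request_spec(
    None, [BlockDeviceMapping("volume", "volume", "vol-1")])

# Boot from volume: the volume does not exist yet, so nothing is pinned.
boot_from_volume = build_request_spec(
    None, [BlockDeviceMapping("image", "volume")])
```

In this model pre_existing carries "az1", while boot_from_volume carries None, which corresponds to the '"availability_zone": null' case that later allows a cross-AZ resize/migrate.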

It's not easy to fix because:
- the nova API is not aware of the designated AZ at the time it places request_specs in the DB
- looking at the schedule_and_build_instances method[3], we do not create the cinder volumes before downcalling to the compute agent. We also do not allow upcalls from the compute agent to the API DB in general, so it's hard to update request_specs after the volume is created.

Unfortunately, at this point I don't see any easy way to fix this issue.
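
This does not fix anything, but affected instances could at least be spotted by checking whether a stored request spec pins an AZ. The helper below is a rough sketch: pinned_az is a hypothetical name, and it assumes the spec is stored as JSON with the fields either at the top level or nested under a "nova_object.data" key. How the JSON is read from the DB is left out.

```python
import json
from typing import Optional


def pinned_az(spec_json: str) -> Optional[str]:
    """Return the AZ stored in a serialized request spec, or None if unpinned."""
    spec = json.loads(spec_json)
    # Assumption: the fields may be wrapped in a "nova_object.data" envelope.
    data = spec.get("nova_object.data", spec)
    return data.get("availability_zone")


# A spec like the one described in this report: no AZ recorded.
unpinned = pinned_az('{"nova_object.data": {"availability_zone": null}}')

# A spec created from a pre-existing volume in a hypothetical zone "az1".
pinned = pinned_az('{"nova_object.data": {"availability_zone": "az1"}}')
```

An instance whose spec yields None here is one that the scheduler is free to place in any AZ on resize/migrate.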

[1] https://github.com/openstack/nova/blob/d28a55959e50b472e181809b919e11a896f989e3/nova/compute/api.py#L1268C19
[2] https://github.com/openstack/nova/blob/d28a55959e50b472e181809b919e11a896f989e3/nova/compute/api.py#L1247
[3] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1646

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2047182

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2047182/+subscriptions