
yahoo-eng-team team mailing list archive

[Bug 1817927] [NEW] device tagging support is not checked during move operations

 

Public bug reported:

When creating a server with block device mapping (BDM) or port tags, the
compute service on the host picked by the scheduler checks whether the
underlying virt driver supports device tags and, if not, aborts the
build (rather than rescheduling it to an alternate host):

https://github.com/openstack/nova/blob/6efa3861a5a829ba5883ff191e2552b063028bb0/nova/compute/manager.py#L2114
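A simplified sketch of the kind of build-time check described above,
assuming the driver advertises a "supports_device_tagging" capability
flag. The Device class, check_device_tagging function and
BuildAbortException below are hypothetical stand-ins, not nova's actual
code:

```python
from dataclasses import dataclass
from typing import List, Optional


class BuildAbortException(Exception):
    """Hypothetical stand-in for the error that aborts (not reschedules) a build."""


@dataclass
class Device:
    # A block device mapping or port request; tag is None when untagged.
    name: str
    tag: Optional[str] = None


def check_device_tagging(devices: List[Device], capabilities: dict) -> None:
    """Abort the build if any device is tagged but the driver cannot expose tags.

    'capabilities' is assumed to be a dict of driver capability flags that
    may contain a boolean "supports_device_tagging" key.
    """
    tagged = [d.name for d in devices if d.tag]
    if tagged and not capabilities.get("supports_device_tagging", False):
        raise BuildAbortException(
            "virt driver does not support device tagging; tagged devices: %s"
            % ", ".join(tagged))
```

Untagged requests pass regardless of driver support; only tagged requests
on a driver without the capability are rejected. The bug is that nothing
equivalent runs when an existing server is moved.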

However, the same check is not performed for any move operation, such as
cold/live migration, evacuate or unshelve.

For example, suppose I have two compute hosts, A and B, where A supports
device tagging but B does not. I create a server with device tags on
host A and then shelve offload it. Later, when I unshelve the instance,
host A is unavailable (either at capacity or down for maintenance), so
the instance lands on host B, which does not support device tags. Now
the guest cannot get device tag metadata via the config drive or the
metadata API because the virt driver does not provide that information,
yet the unshelve operation did not fail.

This has always been a gap in the initial device tag support, since the
scheduler does no filtering to pick a host that supports device tagging,
and there is no policy rule in the API to disallow device tagging if the
cloud does not support it, e.g. if the cloud runs only the vcenter or
ironic drivers.
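To illustrate the gap, here is a toy model (not nova's scheduler) of the
unshelve scenario. Host selection that looks only at capacity picks host
B, whereas requiring a device-tagging trait leaves no valid host, so the
operation would fail up front instead of silently losing tag metadata.
The host inventory and the pick_host function are hypothetical:

```python
from typing import AbstractSet, Dict, Optional

# Hypothetical inventory: host name -> free vCPUs and traits it reports.
HOSTS: Dict[str, dict] = {
    "A": {"free_vcpus": 0, "traits": {"COMPUTE_DEVICE_TAGGING"}},  # at capacity
    "B": {"free_vcpus": 8, "traits": set()},  # no device tagging support
}


def pick_host(vcpus: int,
              required_traits: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return the first host with enough capacity that has every required trait."""
    for name, info in sorted(HOSTS.items()):
        if info["free_vcpus"] >= vcpus and required_traits <= info["traits"]:
            return name
    return None  # roughly analogous to a NoValidHost result


# Without a trait requirement the tagged server silently lands on B.
print(pick_host(2))
# With the requirement there is no valid host, so the move fails visibly.
print(pick_host(2, {"COMPUTE_DEVICE_TAGGING"}))
```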

The solution probably involves adding a placement request filter that
builds on this change:

https://review.openstack.org/#/c/538498/

That change exposes compute driver capabilities as traits in placement.
We could then pass the required traits via the RequestSpec to a
placement request filter, which would add those traits to the GET
/allocation_candidates call made by the scheduler. For device tags, we
would require a compute node with the "COMPUTE_DEVICE_TAGGING" trait.
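A minimal sketch of such a request filter, assuming a heavily simplified
RequestSpec that records device tags and required traits. The RequestSpec
class, require_device_tagging_filter and allocation_candidates_url are
hypothetical illustrations, not nova's API; only the "resources" and
"required" query parameters of GET /allocation_candidates come from the
placement API:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set
from urllib.parse import urlencode

DEVICE_TAGGING_TRAIT = "COMPUTE_DEVICE_TAGGING"


@dataclass
class RequestSpec:
    # Hypothetical stand-in for nova's RequestSpec.
    resources: Dict[str, int]
    device_tags: List[str] = field(default_factory=list)
    required_traits: Set[str] = field(default_factory=set)


def require_device_tagging_filter(spec: RequestSpec) -> None:
    """Hypothetical request filter: require the trait if any device is tagged."""
    if spec.device_tags:
        spec.required_traits.add(DEVICE_TAGGING_TRAIT)


def allocation_candidates_url(spec: RequestSpec) -> str:
    """Build the GET /allocation_candidates path and query string for a spec."""
    params = {
        "resources": ",".join(
            "%s:%d" % (rc, amount) for rc, amount in sorted(spec.resources.items())),
    }
    if spec.required_traits:
        params["required"] = ",".join(sorted(spec.required_traits))
    # Keep ':' and ',' readable, as in placement's documented query syntax.
    return "/allocation_candidates?" + urlencode(params, safe=":,")


spec = RequestSpec({"VCPU": 1, "MEMORY_MB": 512}, device_tags=["db-disk"])
require_device_tagging_filter(spec)
print(allocation_candidates_url(spec))
```

Because the filter runs on the RequestSpec, it would apply to every
scheduling request (create, migrate, evacuate, unshelve), which is what
closes the gap described above. Note that the "required" query parameter
is only accepted by placement from microversion 1.17, so the scheduler
would need to request at least that version.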

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1817927

Title:
  device tagging support is not checked during move operations

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1817927/+subscriptions