yahoo-eng-team team mailing list archive

[Bug 1702932] Re: Unshelving an offloaded server with volume attachments may not attach to the guest in multi-cell env

 

Turns out this was invalid. Volume attach works for a shelved offloaded
instance with cells v2 because the request context is targeted to the
cell that the instance lives in when we look up the instance in the API
code, in nova.compute.api.API._get_instance. The BDM is then created
using that same context, so it ends up in the same cell as the instance
rather than in cell0.
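
A toy model may make the difference between the suspected failure and
the actual behavior easier to see. This is only a sketch and is not nova
code: CellDatabase, RequestContext, get_instance, create_bdm and
get_bdms_for_instance are hypothetical stand-ins, and the instance
mapping is reduced to a plain dict. The one idea borrowed from the
explanation above is that looking up the instance targets the context
at the instance's cell before the BDM is written.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CellDatabase:
    """Toy stand-in for one cell's database (hypothetical, not a nova class)."""
    name: str
    instances: Dict[str, dict] = field(default_factory=dict)
    bdms: List[dict] = field(default_factory=list)


@dataclass
class RequestContext:
    """Toy request context; `db` is the database that reads and writes hit."""
    db: CellDatabase


def get_instance(ctxt, instance_uuid, instance_mappings, cells):
    """Look up the instance and target the context at the instance's cell.

    Assumption: this loosely mirrors what the email says happens in
    nova.compute.api.API._get_instance; the real code is more involved.
    """
    ctxt.db = cells[instance_mappings[instance_uuid]]
    return ctxt.db.instances[instance_uuid]


def create_bdm(ctxt, instance_uuid, volume_id):
    """Write a BDM record to whichever database the context points at."""
    bdm = {"instance_uuid": instance_uuid, "volume_id": volume_id}
    ctxt.db.bdms.append(bdm)
    return bdm


def get_bdms_for_instance(cell, instance_uuid):
    """What the compute service in `cell` would see on unshelve."""
    return [b for b in cell.bdms if b["instance_uuid"] == instance_uuid]


def make_cells():
    """Two databases: cell0 (API default) and cell1 (where the instance lives)."""
    cells = {"cell0": CellDatabase("cell0"), "cell1": CellDatabase("cell1")}
    cells["cell1"].instances["inst-1"] = {"uuid": "inst-1"}
    return cells, {"inst-1": "cell1"}


def attach_without_targeting():
    """The suspected bug: the context keeps pointing at cell0."""
    cells, _mappings = make_cells()
    ctxt = RequestContext(db=cells["cell0"])
    create_bdm(ctxt, "inst-1", "vol-1")
    return cells


def attach_with_targeting():
    """The actual behavior: the instance lookup retargets the context first."""
    cells, mappings = make_cells()
    ctxt = RequestContext(db=cells["cell0"])
    get_instance(ctxt, "inst-1", mappings, cells)
    create_bdm(ctxt, "inst-1", "vol-1")
    return cells
```

In this model, attach_without_targeting() leaves the BDM in cell0, so a
query against cell1 finds nothing, which is the failure the bug report
predicted. attach_with_targeting() writes the BDM to cell1, where the
compute-side query finds it.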

** Changed in: nova
       Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1702932

Title:
  Unshelving an offloaded server with volume attachments may not attach
  to the guest in multi-cell env

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  This is based on code inspection currently but it looks like this
  should fail in the following case:

  https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/api.py#L3723

  When we attach a volume to a shelved offloaded server, we create the
  BDM in the API. If the API is configured to point at cell0, then the
  BDM would be created in cell0.

  When we unshelve the instance, conductor asks the scheduler for a host
  (which is in some cell) and we build the instance in that cell. This
  could be a different cell because we currently don't restrict that in
  the conductor task manager when unshelving like we do for migrate:

  https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/conductor/tasks/migrate.py#L63-L66

  The fact that we don't restrict where the instance goes when it's
  unshelved is a separate bug.

  When unshelving the instance, it gets built on some compute and we
  pull the BDMs from the database configured for that cell (should be
  cell1, cell2, ..., cellN - some specific non-cell0 database).

  https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/manager.py#L4513

  If the BDM was created in the API in cell0, it shouldn't come back
  from that query in the compute manager code.

  What's most confusing about this is that Tempest has tests that attach
  and detach a volume on a shelved offloaded instance:

  https://github.com/openstack/tempest/blob/21dd8a5ee2ab5a068cbb20d0468bd5f444fef59a/tempest/api/compute/volumes/test_attach_volume.py#L148

  And those are passing on the devstack change that runs with multiple
  cells and configures the API to use cell0 for the [database] section
  where the BDM would live:

  https://review.openstack.org/#/c/473565/

  Unless maybe that test is broken.

  We are configured to run ssh validation in the gate jobs on master
  (pike) though, so the test is counting the number of partitions on the
  guest before and after the unshelve operation to see that they show
  up. It's also listing volume attachments for the instance after
  unshelve.


