yahoo-eng-team team mailing list archive

[Bug 1702932] [NEW] Unshelving an offloaded server with volume attachments may not attach to the guest in multi-cell env

 

Public bug reported:

This is currently based only on code inspection, but it looks like the
following case should fail:

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/api.py#L3723

When we attach a volume to a shelved offloaded server, we create the
block device mapping (BDM) in the API. If the API's [database] section
points at cell0, the BDM is created in the cell0 database.

When we unshelve the instance, conductor asks the scheduler for a host
(which is in some cell) and we build the instance in that cell. This
could be a different cell, because the conductor task manager does not
currently restrict the cell when unshelving the way it does for migrate:

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/conductor/tasks/migrate.py#L63-L66

The fact that we don't restrict where the instance goes when it's
unshelved is a separate bug.
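
To make the difference between an unrestricted and a cell-restricted
host selection concrete, here is a minimal sketch. It is not nova code:
select_host, the candidate list, and the host and cell names are all
hypothetical, and the real scheduler and conductor logic is considerably
more involved.

```python
# Hypothetical sketch, not nova code: hosts are (host_name, cell_name) pairs.


def select_host(candidates, required_cell=None):
    """Return the first candidate, optionally restricted to one cell."""
    for host, cell in candidates:
        if required_cell is None or cell == required_cell:
            return host, cell
    raise LookupError("no valid host found")


# Assumed scheduler results, in preference order.
candidates = [("compute-a", "cell2"), ("compute-b", "cell1")]

# Unshelve today (per the description above): any cell may be picked.
unrestricted = select_host(candidates)

# Migrate-style behaviour: the selection is limited to a given cell.
pinned = select_host(candidates, required_cell="cell1")
```

With no restriction the first candidate wins regardless of its cell;
with required_cell set, only hosts in that cell are considered.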

On unshelve, the instance gets built on some compute host, and we pull
the BDMs from the database configured for that cell (which should be
cell1, cell2, ..., cellN: some specific non-cell0 database).

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/manager.py#L4513

If the BDM was created by the API in cell0, it would not be returned by
that query in the compute manager code, so the volume would not be
attached to the guest.
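
The suspected mismatch can be illustrated with a toy model. This is only
a sketch under stated assumptions: the dict-based "databases" and the
functions api_attach_volume and compute_get_bdms are hypothetical
stand-ins, not nova APIs, and the API is assumed to write to cell0 while
the instance is unshelved into cell1.

```python
# Toy model only: each "database" maps an instance UUID to its BDMs.
# None of these names are nova APIs.
from collections import defaultdict

databases = {
    "cell0": defaultdict(list),
    "cell1": defaultdict(list),
}

# Assumption: the API's [database] section points at cell0.
API_DATABASE = "cell0"


def api_attach_volume(instance_uuid, volume_id):
    """Mimic the API creating the BDM for a shelved offloaded server."""
    databases[API_DATABASE][instance_uuid].append({"volume_id": volume_id})


def compute_get_bdms(cell, instance_uuid):
    """Mimic the compute manager reading BDMs from its own cell database."""
    return list(databases[cell].get(instance_uuid, []))


api_attach_volume("inst-1", "vol-1")

# The instance is unshelved onto a host in cell1, so compute queries cell1.
bdms_seen_by_compute = compute_get_bdms("cell1", "inst-1")
print(bdms_seen_by_compute)  # empty list: the BDM only exists in cell0
```

In this model the compute side sees no BDMs for the instance, which is
the behaviour the bug title describes.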

What's most confusing is that Tempest has tests for attaching and
detaching a volume on a shelved offloaded instance:

https://github.com/openstack/tempest/blob/21dd8a5ee2ab5a068cbb20d0468bd5f444fef59a/tempest/api/compute/volumes/test_attach_volume.py#L148

Those tests are passing on the devstack change that runs with multiple
cells and points the API's [database] section at cell0, which is where
the BDM would live:

https://review.openstack.org/#/c/473565/

Unless maybe that test is broken.

The gate jobs on master (pike) are configured to run SSH validation,
though, so the test counts the partitions on the guest before and after
the unshelve operation to check that the volume shows up. It also lists
the volume attachments for the instance after unshelve.
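
As a rough illustration of the partition-count check, the sketch below
assumes the guest reports its devices in a format like /proc/partitions.
The helper names and the sample output are made up for the example, and
the actual Tempest helper may work differently.

```python
def count_partitions(proc_partitions_text):
    """Count device rows in text shaped like /proc/partitions."""
    count = 0
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields (major, minor, #blocks, name) and
        # start with a number; the header row and blank lines do not.
        if len(fields) == 4 and fields[0].isdigit():
            count += 1
    return count


def volume_visible(before_text, after_text):
    """True if exactly one more device shows up after the unshelve."""
    return count_partitions(after_text) == count_partitions(before_text) + 1


# Made-up sample output, captured before shelve and after unshelve.
BEFORE = """major minor  #blocks  name

 253        0    1048576 vda
 253        1    1038336 vda1
"""
AFTER = BEFORE + " 253       16    1048576 vdb\n"
```

If the BDM were missing on unshelve, the before and after counts would
be equal and a check like volume_visible(BEFORE, AFTER) would fail.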

** Affects: nova
     Importance: High
     Assignee: Dan Smith (danms)
         Status: In Progress


** Tags: cells shelve volumes

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1702932

Title:
  Unshelving an offloaded server with volume attachments may not attach
  to the guest in multi-cell env

Status in OpenStack Compute (nova):
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1702932/+subscriptions

