yahoo-eng-team team mailing list archive
Message #65686
[Bug 1702932] [NEW] Unshelving an offloaded server with volume attachments may not attach to the guest in multi-cell env
Public bug reported:
This is based only on code inspection so far, but it looks like the
following case should fail:
https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/api.py#L3723
When we attach a volume to a shelved offloaded server, we create the BDM
(block device mapping) in the API. If the API is configured to point at
cell0, then the BDM is created in the cell0 database.
When we unshelve the instance, conductor asks the scheduler for a host
(which is in some cell) and we build the instance in that cell. This
could be a different cell, because the conductor task manager currently
doesn't restrict the target cell when unshelving the way it does for
migrate:
https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/conductor/tasks/migrate.py#L63-L66
The fact we don't restrict where the instance goes when it's unshelved
is a separate bug.
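To illustrate the kind of restriction meant here, the following is a minimal sketch and not nova code. The helper `hosts_in_cell`, the host names, and the `host_to_cell` mapping are all hypothetical; the only idea taken from the report is that candidate hosts could be limited to the cell the instance already lives in.

```python
def hosts_in_cell(candidate_hosts, host_to_cell, instance_cell):
    """Keep only the candidate hosts that belong to the instance's cell.

    Hypothetical helper for illustration; nova's scheduler and conductor
    do not expose a function with this name.
    """
    return [h for h in candidate_hosts if host_to_cell.get(h) == instance_cell]


# Made-up topology: two compute hosts in cell1, one in cell2.
host_to_cell = {"compute-a": "cell1", "compute-b": "cell2", "compute-c": "cell1"}

# An instance whose records live in cell1 would only be offered cell1 hosts.
allowed = hosts_in_cell(["compute-a", "compute-b", "compute-c"], host_to_cell, "cell1")
print(allowed)
```

Without a filter like this, any of the three hosts could be picked, including one in a different cell.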
When unshelving the instance, it gets built on some compute and we pull
the BDMs from the database configured for that cell (should be cell1,
cell2, ..., cellN - some specific non-cell0 database).
https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/manager.py#L4513
If the BDM was created by the API in cell0, that query in the compute
manager code won't return it, so the volume would not be attached to the
guest.
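The suspected mismatch can be modelled with a toy example. This is only a sketch under stated assumptions: `ToyCellDatabases` and its methods are made up for illustration and do not correspond to nova's object or database APIs. It shows a record written to one cell's database being invisible to a lookup against another cell's database.

```python
from collections import defaultdict


class ToyCellDatabases:
    """Toy stand-in for per-cell databases (hypothetical, not nova code)."""

    def __init__(self):
        # cell name -> list of (instance_uuid, volume_id) records
        self._bdms = defaultdict(list)

    def create_bdm(self, cell, instance_uuid, volume_id):
        self._bdms[cell].append((instance_uuid, volume_id))

    def bdms_for_instance(self, cell, instance_uuid):
        return [vol for (uuid, vol) in self._bdms[cell] if uuid == instance_uuid]


dbs = ToyCellDatabases()

# Attach while shelved offloaded: the API writes the BDM to the database it
# is configured with, assumed here to be cell0.
dbs.create_bdm("cell0", "inst-1", "vol-1")

# Unshelve lands on a compute in cell1, which queries its own cell database.
found_on_compute = dbs.bdms_for_instance("cell1", "inst-1")
found_in_cell0 = dbs.bdms_for_instance("cell0", "inst-1")
print(found_on_compute, found_in_cell0)
```

In this model the compute-side lookup comes back empty while the record sits in cell0, which is the failure the code inspection suggests.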
What's most confusing about this is that Tempest has tests that
attach/detach a volume to a shelved offloaded instance:
https://github.com/openstack/tempest/blob/21dd8a5ee2ab5a068cbb20d0468bd5f444fef59a/tempest/api/compute/volumes/test_attach_volume.py#L148
And those are passing on the devstack change that runs with multiple
cells and configures the API to use cell0 for the [database] section
where the BDM would live:
https://review.openstack.org/#/c/473565/
Unless maybe that test is broken.
We are configured to run ssh validation in the gate jobs on master
(pike) though, so the test is counting the number of partitions on the
guest before and after the unshelve operation to see that they show up.
It's also listing volume attachments for the instance after unshelve.
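As a rough idea of what a before/after partition count involves, here is a minimal sketch. It assumes the guest's partitions are read from text in the `/proc/partitions` format; the function name and the sample device names are hypothetical, and this is not the Tempest implementation.

```python
def count_partitions(proc_partitions_text):
    """Count data rows in /proc/partitions-style text (illustrative only)."""
    count = 0
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with a numeric major number;
        # the header row and the blank line are skipped.
        if len(fields) == 4 and fields[0].isdigit():
            count += 1
    return count


# Made-up sample output from a guest before the volume shows up.
before = """major minor  #blocks  name

 253        0    1048576 vda
 253        1    1038336 vda1
"""
# The same guest after a new disk appears.
after = before + " 253       16    1048576 vdb\n"

print(count_partitions(before), count_partitions(after))
```

If the attached volume reaches the guest after unshelve, the second count should be higher than the first.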
** Affects: nova
Importance: High
Assignee: Dan Smith (danms)
Status: In Progress
** Tags: cells shelve volumes
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1702932
Title:
Unshelving an offloaded server with volume attachments may not attach
to the guest in multi-cell env
Status in OpenStack Compute (nova):
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1702932/+subscriptions