yahoo-eng-team team mailing list archive
Message #76961
[Bug 1815791] [NEW] Race condition causes Nova to shut off a successfully deployed baremetal server
Public bug reported:
When booting a baremetal server with Nova, we see Ironic report a
successful power on:
ironic-conductor.log:2019-02-13 10:52:15.901 7 INFO ironic.conductor.utils [req-774350ce-9392-4096-b66c-20ad3d794e4e 7a9b1ac45e084e7cbeb9cb740ffe8d08 41ea8af8d00e46438c7be3b182bbb53f - default default] Successfully set node a00696d5-32ba-475e-9528-59bf11cffea6 power state to power on by power on.
But seconds later, Nova (a) triggers a power state sync and then (b)
decides the node is in state "power off" and shuts it down:
nova-compute.log:2019-02-13 10:52:17.289 7 DEBUG nova.compute.manager [req-9bcae7d4-4201-40ea-a66c-c5954117f0e4 - - - - -] Triggering sync for uuid dcb4f055-cda4-4d61-ba8f-976645c4e92a _sync_power_states /usr/lib/python2.7/site-packages/nova/compute/manager.py:7516
nova-compute.log:2019-02-13 10:52:17.295 7 DEBUG oslo_concurrency.lockutils [-] Lock "dcb4f055-cda4-4d61-ba8f-976645c4e92a" acquired by "nova.compute.manager.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:327
nova-compute.log:2019-02-13 10:52:17.344 7 WARNING nova.compute.manager [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 4, current VM power_state: 4
nova-compute.log:2019-02-13 10:52:17.345 7 DEBUG nova.compute.api [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Going to try to stop instance force_stop /usr/lib/python2.7/site-packages/nova/compute/api.py:2291
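For readers of the WARNING line above: the power_state values are integer codes, and in Nova's nova.compute.power_state module 4 is SHUTDOWN. A minimal sketch for decoding that line follows; the helper name and the regular expression are illustrative only and are not part of Nova.

```python
import re

# Integer codes as defined in nova.compute.power_state.
POWER_STATES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}

# Message portion of the WARNING line quoted in this report.
LINE = (
    "Instance shutdown by itself. Calling the stop API. "
    "Current vm_state: active, current task_state: None, "
    "original DB power_state: 4, current VM power_state: 4"
)

PATTERN = re.compile(
    r"original DB power_state: (\d+), current VM power_state: (\d+)"
)


def decode_power_states(line):
    """Return (db_state, vm_state) names, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    db_code, vm_code = (int(group) for group in match.groups())
    return (
        POWER_STATES.get(db_code, "UNKNOWN"),
        POWER_STATES.get(vm_code, "UNKNOWN"),
    )


print(decode_power_states(LINE))
```

Both the database value and the value Nova read for the sync decode to SHUTDOWN, even though Ironic had already reported the node as powered on.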
It looks like Nova is using stale cache data to make this decision.
jroll on IRC suggests a solution may look like
https://review.openstack.org/#/c/636699/ (bypass cache data to verify
power state before shutting down the server).
This is with nova @ ad842aa and ironic @ 4404292.
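To illustrate the suspected race and the general shape of the suggested fix, here is a minimal self-contained sketch. It is not the code from the review linked above and does not use Nova or Ironic APIs: FakeIronic, CachingDriver, sync_power_state, and the verify_fresh flag are all hypothetical names, and the caching behaviour is an assumption based on the description in this report.

```python
POWER_ON = "power on"
POWER_OFF = "power off"


class FakeIronic:
    """Hypothetical stand-in for the Ironic API (not a real client)."""

    def __init__(self):
        self.power_state = POWER_OFF

    def get_power_state(self, node_uuid):
        return self.power_state


class CachingDriver:
    """Hypothetical driver that caches each node's power state."""

    def __init__(self, ironic):
        self.ironic = ironic
        self._cache = {}

    def refresh_cache(self, node_uuid):
        self._cache[node_uuid] = self.ironic.get_power_state(node_uuid)

    def get_power_state(self, node_uuid, use_cache=True):
        if use_cache and node_uuid in self._cache:
            return self._cache[node_uuid]
        state = self.ironic.get_power_state(node_uuid)
        self._cache[node_uuid] = state
        return state


def sync_power_state(driver, node_uuid, verify_fresh):
    """Return True if the sync would decide to stop the instance."""
    state = driver.get_power_state(node_uuid)
    if state == POWER_OFF and verify_fresh:
        # Assumed fix: bypass the cache before acting on "power off".
        state = driver.get_power_state(node_uuid, use_cache=False)
    return state == POWER_OFF


node = "a00696d5-32ba-475e-9528-59bf11cffea6"
ironic = FakeIronic()
driver = CachingDriver(ironic)

driver.refresh_cache(node)     # cache records "power off" during deploy
ironic.power_state = POWER_ON  # deploy finishes, node is powered on

stale_decision = sync_power_state(driver, node, verify_fresh=False)
fixed_decision = sync_power_state(driver, node, verify_fresh=True)
print(stale_decision, fixed_decision)
```

With the stale cache alone the sync decides to stop a node that is actually running; re-querying the backend before acting avoids that, at the cost of one extra API call on the rare "power off" path.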
** Affects: nova
Importance: Undecided
Assignee: Jim Rollenhagen (jim-rollenhagen)
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1815791