yahoo-eng-team team mailing list archive
Message #77277
[Bug 1815791] Re: Race condition causes Nova to shut off a successfully deployed baremetal server
Reviewed: https://review.openstack.org/636699
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=19cb8280232fd3b0ba0000a475d061ea9fb10e1a
Submitter: Zuul
Branch: master
commit 19cb8280232fd3b0ba0000a475d061ea9fb10e1a
Author: Jim Rollenhagen <jim@xxxxxxxxxxxxxxxxxx>
Date: Wed Feb 13 12:59:53 2019 -0500
ironic: check fresh data when sync_power_state doesn't line up
We return cached data to sync_power_state to avoid pummeling the ironic
API. However, this can lead to a race condition where an instance is
powered on, but nova thinks it should be off and calls stop(). Check
again without the cache when this happens to make sure we don't
unnecessarily kill an instance.
Closes-Bug: #1815791
Change-Id: I907b69eb689cf6c169a4869cfc7889308ca419d5
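The idea in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the actual change: FakeIronic, CachedDriver, should_stop and the use_cache parameter are hypothetical names made up for this example, and the real nova ironic driver is structured differently.

```python
POWER_ON = "power on"
POWER_OFF = "power off"


class FakeIronic:
    """Hypothetical stand-in for the ironic API; counts how often it is queried."""

    def __init__(self, power_state=POWER_OFF):
        self.power_state = power_state
        self.calls = 0

    def get_power_state(self):
        self.calls += 1
        return self.power_state


class CachedDriver:
    """Hypothetical driver that answers from a cache unless asked not to."""

    def __init__(self, api):
        self.api = api
        # The cache is filled once here; it can go stale afterwards.
        self._cached = api.get_power_state()

    def get_power_state(self, use_cache=True):
        if not use_cache:
            # Bypass the cache: fetch fresh data and remember it.
            self._cached = self.api.get_power_state()
        return self._cached


def should_stop(driver, expected_state):
    """Return True only if fresh data confirms the state does not line up."""
    if driver.get_power_state(use_cache=True) == expected_state:
        # Cheap path: the cached value agrees, no extra API call needed.
        return False
    # The cached value disagrees, so check again without the cache
    # before deciding to stop the instance.
    return driver.get_power_state(use_cache=False) != expected_state


# Race from the bug report: the cache was filled while the node was still
# off, then the deploy finished and the node was powered on.
api = FakeIronic(POWER_OFF)
driver = CachedDriver(api)
api.power_state = POWER_ON
stale = driver.get_power_state()
decision = should_stop(driver, expected_state=POWER_ON)

# Control case: the node really is off, so the stop is still requested.
api_off = FakeIronic(POWER_OFF)
decision_off = should_stop(CachedDriver(api_off), expected_state=POWER_ON)
```

In this sketch the extra API call happens only when the cached value disagrees with the expected state, so the common case still avoids hitting the ironic API on every sync.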
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1815791
Title:
Race condition causes Nova to shut off a successfully deployed
baremetal server
Status in OpenStack Compute (nova):
Fix Released
Bug description:
When booting a baremetal server with Nova, we see Ironic report a
successful power on:
ironic-conductor.log:2019-02-13 10:52:15.901 7 INFO ironic.conductor.utils [req-774350ce-9392-4096-b66c-20ad3d794e4e 7a9b1ac45e084e7cbeb9cb740ffe8d08 41ea8af8d00e46438c7be3b182bbb53f - default default] Successfully set node a00696d5-32ba-475e-9528-59bf11cffea6 power state to power on by power on.
But seconds later, Nova (a) triggers a power state sync and then (b)
decides the node is in state "power off" and shuts it down:
nova-compute.log:2019-02-13 10:52:17.289 7 DEBUG nova.compute.manager [req-9bcae7d4-4201-40ea-a66c-c5954117f0e4 - - - - -] Triggering sync for uuid dcb4f055-cda4-4d61-ba8f-976645c4e92a _sync_power_states /usr/lib/python2.7/site-packages/nova/compute/manager.py:7516
nova-compute.log:2019-02-13 10:52:17.295 7 DEBUG oslo_concurrency.lockutils [-] Lock "dcb4f055-cda4-4d61-ba8f-976645c4e92a" acquired by "nova.compute.manager.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:327
nova-compute.log:2019-02-13 10:52:17.344 7 WARNING nova.compute.manager [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 4, current VM power_state: 4
nova-compute.log:2019-02-13 10:52:17.345 7 DEBUG nova.compute.api [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Going to try to stop instance force_stop /usr/lib/python2.7/site-packages/nova/compute/api.py:2291
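For readers unfamiliar with the numeric power states in the WARNING line above, here is a small sketch that decodes them. The mapping is assumed to match the constants in nova/compute/power_state.py and should be checked against the release in use; describe_power_states is a hypothetical helper written for this example.

```python
import re

# Assumed to mirror nova/compute/power_state.py; verify against your release.
POWER_STATES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}

# Matches the two power_state fields in the nova-compute WARNING message.
FIELD_RE = re.compile(r"(original DB|current VM) power_state: (\d+)")


def describe_power_states(log_line):
    """Map each power_state field found in a log line to its symbolic name."""
    return {
        match.group(1): POWER_STATES.get(int(match.group(2)), "UNKNOWN")
        for match in FIELD_RE.finditer(log_line)
    }


line = (
    "Instance shutdown by itself. Calling the stop API. "
    "Current vm_state: active, current task_state: None, "
    "original DB power_state: 4, current VM power_state: 4"
)
decoded = describe_power_states(line)
```

Under that assumption, both fields in the warning decode to SHUTDOWN, even though ironic had reported a successful power on about a second earlier.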
It looks like Nova is using stale cache data to make this decision.
jroll on IRC suggests a solution may look like
https://review.openstack.org/#/c/636699/ (bypass cache data to verify
power state before shutting down the server).
This is with nova @ ad842aa and ironic @ 4404292.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1815791/+subscriptions