yahoo-eng-team team mailing list archive

[Bug 1815791] [NEW] Race condition causes Nova to shut off a successfully deployed baremetal server

 

Public bug reported:

When booting a baremetal server with Nova, we see Ironic report a
successful power on:

  ironic-conductor.log:2019-02-13 10:52:15.901 7 INFO ironic.conductor.utils [req-774350ce-9392-4096-b66c-20ad3d794e4e 7a9b1ac45e084e7cbeb9cb740ffe8d08 41ea8af8d00e46438c7be3b182bbb53f - default default] Successfully set node a00696d5-32ba-475e-9528-59bf11cffea6 power state to power on by power on.

But seconds later, Nova (a) triggers a power state sync and then (b)
decides the node is in state "power off" and shuts it down:

	nova-compute.log:2019-02-13 10:52:17.289 7 DEBUG nova.compute.manager [req-9bcae7d4-4201-40ea-a66c-c5954117f0e4 - - - - -] Triggering sync for uuid dcb4f055-cda4-4d61-ba8f-976645c4e92a _sync_power_states /usr/lib/python2.7/site-packages/nova/compute/manager.py:7516
	nova-compute.log:2019-02-13 10:52:17.295 7 DEBUG oslo_concurrency.lockutils [-] Lock "dcb4f055-cda4-4d61-ba8f-976645c4e92a" acquired by "nova.compute.manager.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:327
	nova-compute.log:2019-02-13 10:52:17.344 7 WARNING nova.compute.manager [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 4, current VM power_state: 4
	nova-compute.log:2019-02-13 10:52:17.345 7 DEBUG nova.compute.api [-] [instance: dcb4f055-cda4-4d61-ba8f-976645c4e92a] Going to try to stop instance force_stop /usr/lib/python2.7/site-packages/nova/compute/api.py:2291
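
For readers decoding the WARNING line above, here is a small sketch that
maps the numeric power_state values to names. The numbers are assumed to
match the constants in nova.compute.power_state (check them against the
nova revision in use), and describe_power_state is a hypothetical helper,
not part of Nova.

```python
# Assumed to mirror the constants in nova.compute.power_state;
# verify against the nova tree you are actually running.
POWER_STATE_NAMES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}


def describe_power_state(code):
    """Hypothetical helper: return a readable name for a power_state code."""
    return POWER_STATE_NAMES.get(code, "UNKNOWN(%d)" % code)


# The log reports "current VM power_state: 4".
print(describe_power_state(4))
```

Under that assumption, both the DB value and the driver-reported value in
the log read as SHUTDOWN while vm_state is active, which is what leads
Nova to call the stop API.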

It looks like Nova is using stale cache data to make this decision.

jroll on IRC suggests a solution may look like
https://review.openstack.org/#/c/636699/ (bypass cache data to verify
power state before shutting down the server).
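
A minimal, self-contained sketch of that idea follows. It is not the code
from the review above and does not use Nova or Ironic APIs: FakeIronic,
CachedNodeView, sync_power_state and the use_cache / verify_fresh
parameters are all hypothetical names chosen for illustration. It only
shows how a cached "power off" reading can trigger a stop, and how
re-querying the source of truth before acting avoids it.

```python
import time

POWER_ON = "power on"
POWER_OFF = "power off"


class FakeIronic:
    """Hypothetical stand-in for Ironic; holds the authoritative state."""

    def __init__(self):
        self.power_state = POWER_OFF

    def get_power_state(self, node_id):
        return self.power_state


class CachedNodeView:
    """Hypothetical driver-side cache of node power states."""

    def __init__(self, ironic, ttl=60.0):
        self.ironic = ironic
        self.ttl = ttl
        self._cache = {}  # node_id -> (power_state, fetched_at)

    def get_power_state(self, node_id, use_cache=True):
        entry = self._cache.get(node_id)
        if use_cache and entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]  # may be stale
        state = self.ironic.get_power_state(node_id)
        self._cache[node_id] = (state, time.monotonic())
        return state


def sync_power_state(view, node_id, verify_fresh):
    """Return the action a periodic sync would take for an active instance."""
    state = view.get_power_state(node_id)
    if state == POWER_OFF and verify_fresh:
        # Bypass the cache before doing anything destructive.
        state = view.get_power_state(node_id, use_cache=False)
    return "stop" if state == POWER_OFF else "noop"


ironic = FakeIronic()
view = CachedNodeView(ironic)
view.get_power_state("node-1")   # cache is filled while the node is off
ironic.power_state = POWER_ON    # deploy finishes, node powers on

print(sync_power_state(view, "node-1", verify_fresh=False))  # acts on stale data
print(sync_power_state(view, "node-1", verify_fresh=True))   # re-checks first
```

The extra uncached query is only paid on the path that would otherwise
stop the instance, so the common case still benefits from the cache.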

This is with nova @ ad842aa and ironic @ 4404292.

** Affects: nova
     Importance: Undecided
     Assignee: Jim Rollenhagen (jim-rollenhagen)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1815791
