yahoo-eng-team team mailing list archive

[Bug 1639914] Re: Race condition in nova compute during snapshot

 

Yes, what is happening is that the server is deleted via the API after
confirmation that the image has been created, but nova-compute is in
fact still working on the snapshot when it receives the delete request,
so it deletes the server and aborts the snapshot creation. This
scenario was uncovered while testing with rally (rally uses the nova
API):

https://github.com/openstack/rally/blob/master/rally/plugins/openstack/scenarios/nova/servers.py#L281-L282

So I don't think it is an invalid test case; there needs to be some
guard in compute to handle these conflicting operations.
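
To make that concrete, here is a minimal sketch of such a guard. This
is not nova's actual code: the Instance and Conflict types are
stand-ins, and the task-state names are assumptions modelled on nova's
task_states module.

    from dataclasses import dataclass
    from typing import Optional

    # Task states during which a delete would race the snapshot; the
    # names are assumed, modelled on nova's task_states module.
    SNAPSHOT_TASK_STATES = {
        "image_snapshot",
        "image_snapshot_pending",
        "image_pending_upload",
        "image_uploading",
    }

    class Conflict(Exception):
        """Stand-in for an HTTP 409 Conflict response."""

    @dataclass
    class Instance:
        uuid: str
        task_state: Optional[str] = None

    def guard_delete(instance):
        """Reject a delete while a snapshot of this instance is in flight."""
        if instance.task_state in SNAPSHOT_TASK_STATES:
            raise Conflict("instance %s is snapshotting; retry the delete"
                           % instance.uuid)

An alternative to rejecting the request outright would be to queue the
delete until the snapshot finishes; either way the two operations stop
overlapping.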


** Changed in: nova
       Status: Invalid => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1639914

Title:
  Race condition in nova compute during snapshot

Status in OpenStack Compute (nova):
  New

Bug description:
  Creating a snapshot and then immediately deleting the instance seems
  to cause a race condition. I was able to re-create it on the latest
  devstack, installed on 8th November.

  This can be reproduced with the following commands.

  1. nova boot --flavor m1.large --image 6d4259ce-5873-42cb-8cbe-9873f069c149 testinstance

  id | bef22f9b-ade4-48a1-86c4-b9a007897eb3

  2. nova image-create bef22f9b-ade4-48a1-86c4-b9a007897eb3 testinstance-snap ; nova delete bef22f9b-ade4-48a1-86c4-b9a007897eb3
  Request to delete server bef22f9b-ade4-48a1-86c4-b9a007897eb3 has been accepted.
  3. nova image-list doesn't show the snapshot

  4. nova list doesn't show the instance
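
  The same sequence can be scripted to tighten the race window. A
  sketch using python-novaclient (the credentials and endpoint below
  are placeholders, and the exact Client() arguments vary by release):

      import time
      from novaclient import client

      # Placeholder credentials/endpoint; adjust for the environment.
      nova = client.Client("2", "admin", "secret", "demo",
                           "http://controller:5000/v2.0")

      flavor = nova.flavors.find(name="m1.large")
      server = nova.servers.create(
          name="testinstance",
          image="6d4259ce-5873-42cb-8cbe-9873f069c149",
          flavor=flavor)

      # Wait for the instance to become ACTIVE before snapshotting.
      while nova.servers.get(server.id).status != "ACTIVE":
          time.sleep(2)

      # Snapshot, then delete immediately: this is the race window.
      nova.servers.create_image(server.id, "testinstance-snap")
      nova.servers.delete(server.id)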

  The nova-compute log indicates a race condition while executing the
  CLI commands in step 2 above:

  <182>1 2016-10-28T14:46:41.830208+00:00 hyper1 nova-compute 30056 - [40521 levelname="INFO" component="nova-compute" funcname="nova.compute.manager" request_id="req-e9e4e899-e2a7-4bf8-bdf1-c26f5634cfda" user="51fa0172fbdf495e89132f7f4574e750" tenant="00ead348c5f9475f8940ab29cd767c5e" instance="[instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] " lineno="/usr/lib/python2.7/site-packages/nova/compute/manager.py:2249"] nova.compute.manager Terminating instance
  <183>1 2016-10-28T14:46:42.057653+00:00 hyper1 nova-compute 30056 - [40521 levelname="DEBUG" component="nova-compute" funcname="nova.compute.manager" request_id="req-1c4cf749-a6a8-46af-b331-f70dc1e9f364" user="51fa0172fbdf495e89132f7f4574e750" tenant="00ead348c5f9475f8940ab29cd767c5e" instance="[instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] " lineno="/usr/lib/python2.7/site-packages/nova/compute/manager.py:420"] nova.compute.manager Cleaning up image ae9ebf4b-7dd6-4615-816f-c2f3c7c08530 decorated_function /usr/lib/python2.7/site-packages/nova/compute/manager.py:420
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] Traceback (most recent call last):
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 416, in decorated_function
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     *args, **kwargs)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3038, in snapshot_instance
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     task_states.IMAGE_SNAPSHOT)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3068, in _snapshot_instance
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     update_task_state)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1447, in snapshot
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     guest.save_memory_state()
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 363, in save_memory_state
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     self._domain.managedSave(0)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     rv = execute(f, *args, **kwargs)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     six.reraise(c, e, tb)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     rv = meth(*args, **kwargs)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1397, in managedSave
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3]     if ret == -1: raise libvirtError ('virDomainManagedSave() failed', dom=self)
  30056 TRACE nova.compute.manager [instance: bef22f9b-ade4-48a1-86c4-b9a007897eb3] libvirtError: operation failed: domain is no longer running

  Nova compute should make sure the managed save has completed before
  attempting to destroy the domain.
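
  One way to enforce that ordering is a per-instance lock on the
  compute node, so that terminate waits for any in-flight snapshot.
  Nova serializes similar operations with oslo.concurrency's
  synchronized decorators; the sketch below only illustrates the idea
  with stdlib threading and is not nova's code:

      import threading
      from collections import defaultdict

      # One lock per instance UUID; snapshot and terminate share it.
      _instance_locks = defaultdict(threading.Lock)

      def snapshot_instance(uuid, do_snapshot):
          # do_snapshot wraps the driver call that ends in managedSave().
          with _instance_locks[uuid]:
              do_snapshot()

      def terminate_instance(uuid, do_destroy):
          # The domain is destroyed only after any in-flight snapshot
          # has released the lock, so managedSave() never sees a dead
          # domain.
          with _instance_locks[uuid]:
              do_destroy()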

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1639914/+subscriptions

