yahoo-eng-team team mailing list archive

[Bug 1371677] [NEW] Race in resource tracker causes 500 response on deleting during verify_resize state

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Matthew Treinish <mtreinish@xxxxxxxxxx>
Date: Fri, 19 Sep 2014 16:13:01 -0000
Reply-to: Bug 1371677 <1371677@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

During a tempest run, the
tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_verify_resize_state
test occasionally fails when it attempts to delete a server in the verify_resize state. The failure is caused by a 500 response returned from nova. According to the nova-api log, the 500 is caused by an rpc call that never receives a response:

http://logs.openstack.org/10/110110/40/check/check-tempest-dsvm-postgres-full/4cd8a81/logs/screen-n-api.txt.gz#_2014-09-19_10_24_07_221

Looking at the n-cpu logs for the handling of that rpc call yields:

http://logs.openstack.org/10/110110/40/check/check-tempest-dsvm-postgres-full/4cd8a81/logs/screen-n-cpu.txt.gz#_2014-09-19_10_24_31_404

This looks like it comes from an attempt to update the resource
tracker, triggered by the server deletion. However, according to the
tempest log, the volume in that failure comes from a different test,
in the test class ServerRescueNegativeTestJSON. It appears that the
tearDownClass for that test class runs concurrently with the failed
test and causes a race in the resource tracker: the volume it expects
to be there disappears, so it fails when it goes to get the size.
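
To illustrate the shape of the suspected race, here is a minimal,
self-contained sketch. It is not nova code and not a proposed fix:
FakeVolumeStore, VolumeNotFound, total_disk_usage and the on_listed
hook are all hypothetical names invented for this example. It only
shows how a volume that vanishes between the listing step and the size
lookup can break a usage calculation, and one way such a lookup could
tolerate the disappearance.

```python
class VolumeNotFound(Exception):
    """Raised when a volume vanishes between listing and inspection."""


class FakeVolumeStore:
    """Hypothetical stand-in for the compute host's view of its volumes."""

    def __init__(self, sizes):
        self._sizes = dict(sizes)

    def list_volumes(self):
        # Snapshot of the volume names known at this moment.
        return list(self._sizes)

    def get_size(self, name):
        try:
            return self._sizes[name]
        except KeyError:
            raise VolumeNotFound(name)

    def delete(self, name):
        self._sizes.pop(name, None)


def total_disk_usage(store, on_listed=None):
    """Sum volume sizes, skipping volumes deleted after the listing."""
    names = store.list_volumes()
    if on_listed is not None:
        # Hook used only to simulate a concurrent teardown
        # deterministically, without real threads.
        on_listed()
    total = 0
    for name in names:
        try:
            total += store.get_size(name)
        except VolumeNotFound:
            # The volume disappeared after we listed it; ignore it
            # instead of failing the whole update.
            continue
    return total


store = FakeVolumeStore({"rescue-vol": 10, "resize-vol": 20})
# Simulate another test's teardown deleting its volume mid-update.
usage = total_disk_usage(store, on_listed=lambda: store.delete("rescue-vol"))
```

Without the except clause, the lookup for "rescue-vol" would raise and
abort the whole calculation, which is analogous to the rpc call that
never returns a response in the logs above.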

Full logs for an example run that tripped this are here:
http://logs.openstack.org/10/110110/40/check/check-tempest-dsvm-postgres-full/4cd8a81

** Affects: nova
     Importance: Undecided
         Status: Confirmed


** Tags: compute

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1371677

Follow ups

[Bug 1371677] Re: Race in resource tracker causes 500 response on deleting during verify_resize state
From: Thierry Carrez, 2014-10-01