yahoo-eng-team team mailing list archive
Message #21807
[Bug 1371677] [NEW] Race in resource tracker causes 500 response on deleting during verify_resize state
Public bug reported:
During a tempest run, the
tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_verify_resize_state
test occasionally fails when it attempts to delete a server in the verify_resize state. The failure is caused by a 500 response returned from nova. According to the nova-api log, the 500 is caused by an rpc call that never receives a response:
http://logs.openstack.org/10/110110/40/check/check-tempest-dsvm-postgres-full/4cd8a81/logs/screen-n-api.txt.gz#_2014-09-19_10_24_07_221
Looking at the n-cpu logs for the handling of that rpc call yields:
http://logs.openstack.org/10/110110/40/check/check-tempest-dsvm-postgres-full/4cd8a81/logs/screen-n-cpu.txt.gz#_2014-09-19_10_24_31_404
This appears to come from a resource tracker update triggered by the
server deletion. However, according to the tempest log, the volume
named in that failure belongs to a different test class,
ServerRescueNegativeTestJSON. It appears the tearDownClass for that
test class runs concurrently with the failed test and causes a race in
the resource tracker: the volume it expects to find disappears, so the
attempt to get its size fails.
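The suspected sequence can be sketched roughly as follows. This is a
simplified illustration of the race, not nova's resource tracker code:
FakeVolumeStore, VolumeNotFound, total_disk_usage and the
tolerate_missing flag are all hypothetical names made up for this
example. The tolerant mode is only one possible way to handle a volume
that vanishes between listing and size lookup, shown here for contrast.

```python
class VolumeNotFound(Exception):
    """Hypothetical error raised when a volume no longer exists."""


class FakeVolumeStore:
    """Stand-in for a volume backend (assumption, not nova's API)."""

    def __init__(self, sizes):
        self._sizes = dict(sizes)

    def delete(self, name):
        self._sizes.pop(name, None)

    def get_size(self, name):
        try:
            return self._sizes[name]
        except KeyError:
            raise VolumeNotFound(name)


def total_disk_usage(store, volume_names, tolerate_missing=False):
    """Sum the sizes of volumes named in a previously taken listing."""
    total = 0
    for name in volume_names:
        try:
            total += store.get_size(name)
        except VolumeNotFound:
            if not tolerate_missing:
                raise
            # The volume was removed after the listing was taken; skip it.
    return total


store = FakeVolumeStore({"vol-a": 10, "vol-b": 20})
listing = ["vol-a", "vol-b"]  # listing taken at the start of the update
store.delete("vol-b")         # concurrent teardown removes one volume

# Strict lookup fails because the listing is now stale.
try:
    total_disk_usage(store, listing)
    strict_failed = False
except VolumeNotFound:
    strict_failed = True

# Tolerant lookup skips the vanished volume and counts the rest.
tolerant_total = total_disk_usage(store, listing, tolerate_missing=True)
```

In the strict case the unhandled error stands in for the failure seen
in the n-cpu log, which would leave the rpc caller without a reply.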
Full logs for an example run that tripped this are here:
http://logs.openstack.org/10/110110/40/check/check-tempest-dsvm-postgres-full/4cd8a81
** Affects: nova
Importance: Undecided
Status: Confirmed
** Tags: compute
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1371677
Title:
Race in resource tracker causes 500 response on deleting during
verify_resize state
Status in OpenStack Compute (Nova):
Confirmed
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1371677/+subscriptions