[Bug 1464259] Re: Volume tests fail often with rbd backend
Reviewed: https://review.openstack.org/254428
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4f2a46987cf705d5dea84e97ef2006342cc5d9c4
Submitter: Jenkins
Branch: master
commit 4f2a46987cf705d5dea84e97ef2006342cc5d9c4
Author: Matt Riedemann <mriedem@xxxxxxxxxx>
Date: Mon Dec 7 14:49:18 2015 -0800
Make sure bdm.volume_id is set after auto-creating volumes
The test_create_ebs_image_and_check_boot test in Tempest does the
following (sketched in code after the list):
1. create volume1 from an image
2. boot server1 from volume1 with delete_on_termination=True and wait
for the server to be ACTIVE
3. create snapshot from server1 (creates image and volume snapshots)
4. delete server1
5. create server2 from the image snapshot (don't wait for it to be
ACTIVE - this auto-creates volume2 from the volume snapshot in
cinder and attaches server2 to it)
6. delete server2 (could still be building/attaching volumes in the
background)
7. cleanup
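A minimal sketch of that sequence in Python; create_volume, boot_server,
and the other helpers are hypothetical placeholders, not the exact
Tempest client API:

    # Hypothetical condensation of test_create_ebs_image_and_check_boot.
    volume1 = create_volume(image_ref=IMAGE_REF)
    server1 = boot_server(block_device_mapping=[{
        'uuid': volume1['id'],
        'source_type': 'volume',
        'destination_type': 'volume',
        'delete_on_termination': True,
    }])
    wait_for_server_status(server1, 'ACTIVE')

    # Snapshotting a volume-backed server creates both an image
    # snapshot and a cinder volume snapshot.
    image_snapshot = create_server_snapshot(server1)
    delete_server(server1)

    # server2 boots from the image snapshot; nova auto-creates volume2
    # from the volume snapshot and attaches it in the background.
    server2 = boot_server(image_ref=image_snapshot['id'])

    # Deleted without waiting for ACTIVE, so this races with the
    # asynchronous volume create/attach above.
    delete_server(server2)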
There is a race between booting server2, which creates and attaches
volume2, and deleting server2 before it is active.
The volume attach completes and updates bdm.volume_id in the database
before we get to _shutdown_instance, but after the delete request has
reached the API. The compute API can therefore read stale BDMs and pass
them over RPC to the compute manager. So we add a check in
_shutdown_instance: if we have potentially stale volume BDMs, refresh
the list from the database.
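A minimal sketch of the shape of that check, assuming nova's
BlockDeviceMappingList object API (not the verbatim patch):

    # In _shutdown_instance: the bdms list came over RPC and may predate
    # the volume attach, so a volume BDM without a volume_id is suspect.
    if bdms and any(bdm.is_volume and not bdm.volume_id for bdm in bdms):
        LOG.info('Potentially stale volume BDMs found; reloading from '
                 'the database.', instance=instance)
        bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
            context, instance.uuid)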
The instance.uuid locks in build_and_run_instance and terminate_instance
provide the mutex on the compute host, so bdm.volume_id should be set in
the database after the volume attach and before terminate_instance gets
the lock. bdm.volume_id could still be None in _shutdown_instance if the
volume create fails, but in that case there is nothing to tear down in
cinder anyway.
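The serialization looks roughly like this in nova's compute manager
(decorator form approximated from the tree, not quoted verbatim):

    # Both paths lock on instance.uuid, so terminate cannot interleave
    # with the build/attach for the same instance on this host.
    @utils.synchronized(instance.uuid)
    def do_build_and_run_instance(context, instance, request_spec):
        # volume create/attach updates bdm.volume_id under this lock
        ...

    @utils.synchronized(instance.uuid)
    def do_terminate_instance(instance, bdms):
        # _shutdown_instance runs only once the build lock is released
        ...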
In the case of the race bug, deleting the volume snapshot in cinder
fails because volume2 was never deleted by nova, so the test fails in
teardown. Note that a race is still possible; this change does not
eliminate it, but it should narrow the window.
This also cleans up the logging in attach_block_devices since there
may not be a volume_id at that point (depending on bdm.source_type).
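A minimal sketch of what that cleanup implies (approximate; the real
helper lives in nova/virt/block_device.py and handles more source
types):

    # Only mention a volume_id when the BDM actually carries one; for
    # snapshot/image/blank sources it is not populated yet at this
    # point in the boot.
    if bdm.get('volume_id'):
        LOG.info('Booting with volume %(volume_id)s at %(mountpoint)s',
                 {'volume_id': bdm.volume_id,
                  'mountpoint': bdm['mount_device']})
    else:
        LOG.info('Booting with volume from %(source)s at %(mountpoint)s',
                 {'source': bdm.source_type,
                  'mountpoint': bdm['mount_device']})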
Closes-Bug: #1464259
Change-Id: Ib60d60a5af35be89ad8afbcf44fcffe0b0ce2876
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1464259
Title:
Volume tests fail often with rbd backend
Status in Cinder:
Triaged
Status in OpenStack Compute (nova):
Fix Released
Status in tempest:
Invalid
Bug description:
http://logs.openstack.org/02/173802/5/check/check-tempest-dsvm-full-ceph/a72aac1/logs/screen-n-api.txt.gz?level=TRACE#_2015-06-11_09_04_19_511
2015-06-11 09:04:19.511 ERROR nova.api.ec2 [req-0ac81d78-2717-4dd2-80e2-d94363b55ac8 EC2VolumesTest-442487008 EC2VolumesTest-1066393631] Unexpected InvalidInput raised: Invalid input received: Invalid volume: Volume still has 1 dependent snapshots. (HTTP 400) (Request-ID: req-4586b5d2-7212-4ddd-af79-43ad8ba7ea58)
2015-06-11 09:04:19.511 ERROR nova.api.ec2 [req-0ac81d78-2717-4dd2-80e2-d94363b55ac8 EC2VolumesTest-442487008 EC2VolumesTest-1066393631] Environment: {"HTTP_AUTHORIZATION": "AWS4-HMAC-SHA256 Credential=a5e9253350ce4a249ddce8b7c1c798c2/20150611/0/127/aws4_request,SignedHeaders=host;x-amz-date,Signature=304830ed947f7fba3143887b08d1e47faa18d4b59782c0992727cb7593f586b4", "SCRIPT_NAME": "", "REQUEST_METHOD": "POST", "HTTP_X_AMZ_DATE": "20150611T090418Z", "PATH_INFO": "/", "SERVER_PROTOCOL": "HTTP/1.0", "CONTENT_LENGTH": "60", "HTTP_USER_AGENT": "Boto/2.38.0 Python/2.7.6 Linux/3.13.0-53-generic", "RAW_PATH_INFO": "/", "REMOTE_ADDR": "127.0.0.1", "wsgi.url_scheme": "http", "SERVER_PORT": "8773", "CONTENT_TYPE": "application/x-www-form-urlencoded; charset=UTF-8", "HTTP_HOST": "127.0.0.1:8773", "SERVER_NAME": "127.0.0.1", "GATEWAY_INTERFACE": "CGI/1.1", "REMOTE_PORT": "45819", "HTTP_ACCEPT_ENCODING": "identity"}
http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRUMyVm9sdW1lc1Rlc3RcIiBBTkQgbWVzc2FnZTpcIlVuZXhwZWN0ZWQgSW52YWxpZElucHV0IHJhaXNlZDogSW52YWxpZCBpbnB1dCByZWNlaXZlZDogSW52YWxpZCB2b2x1bWU6IFZvbHVtZSBzdGlsbCBoYXMgMSBkZXBlbmRlbnQgc25hcHNob3RzXCIgQU5EIHRhZ3M6XCJzY3JlZW4tbi1hcGkudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MzQwMzAyMTUwODd9
10 hits in 7 days in check and gate, hitting on the ceph and glusterfs
jobs.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1464259/+subscriptions