
yahoo-eng-team team mailing list archive

[Bug 1464259] Re: Volumes tests fail often with rbd backend


Reviewed:  https://review.openstack.org/254428
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4f2a46987cf705d5dea84e97ef2006342cc5d9c4
Submitter: Jenkins
Branch:    master

commit 4f2a46987cf705d5dea84e97ef2006342cc5d9c4
Author: Matt Riedemann <mriedem@xxxxxxxxxx>
Date:   Mon Dec 7 14:49:18 2015 -0800

    Make sure bdm.volume_id is set after auto-creating volumes
    
    The test_create_ebs_image_and_check_boot test in Tempest does the
    following:
    
    1. create volume1 from an image
    2. boot server1 from volume1 with delete_on_termination=True and wait
       for the server to be ACTIVE
    3. create snapshot from server1 (creates image and volume snapshots)
    4. delete server1
    5. create server2 from the image snapshot (don't wait for it to be
       ACTIVE - this auto-creates volume2 from the volume snapshot in
       cinder and attaches server2 to it)
    6. delete server2 (could still be building/attaching volumes in the
       background)
    7. cleanup
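
    As an illustration only (the helper names below are made up, not the
    real Tempest API), the racing sequence looks roughly like this:

        # Pseudocode sketch of the test flow; every helper here is hypothetical.
        volume1 = create_volume_from_image(image_ref)                  # step 1
        server1 = boot_from_volume(volume1,                            # step 2
                                   delete_on_termination=True)
        wait_for_status(server1, 'ACTIVE')
        image_snap = snapshot_server(server1)                          # step 3
        delete_server(server1)                                         # step 4
        server2 = boot_from_image(image_snap)                          # step 5: volume2 is
                                                                       # created/attached async
        delete_server(server2)                                         # step 6: races with the
                                                                       # attach still in flight
        cleanup()                                                      # step 7: fails if volume2
                                                                       # was never torn down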
    
    There is a race between booting server2 (which creates and attaches
    volume2) and deleting server2 before it is active.
    
    The volume attach completes and updates bdm.volume_id in the DB before
    we get to _shutdown_instance, but after the delete request has already
    reached the API. The compute API can therefore read potentially stale
    BDMs and pass them over RPC to the compute service. So we add a check in
    _shutdown_instance for potentially stale volume BDMs and refresh the list
    from the database if any are found.
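
    A minimal sketch of that kind of check, assuming nova's objects API; the
    helper name is made up and this is not the exact committed change:

        # Illustrative only.  If a volume BDM passed over RPC still has no
        # volume_id, re-read the BDM list from the database before tearing
        # anything down in cinder.  Assumes: from nova import objects
        def _refresh_bdms_if_stale(context, instance, bdms):
            if any(bdm.is_volume and bdm.volume_id is None for bdm in bdms):
                bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
                    context, instance.uuid)
            return bdms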
    
    The instance.uuid locks in build_and_run_instance and terminate_instance
    provide the mutex on the compute host, so bdm.volume_id should be set in
    the database after the volume attach and before terminate_instance gets
    the lock. bdm.volume_id could still be None in _shutdown_instance if the
    volume create fails, but there is nothing to tear down in cinder in that
    case anyway.
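
    That serialization can be pictured with oslo.concurrency locks keyed on
    the instance UUID; this is a simplified sketch, and attach_volumes and
    shutdown_instance are placeholder names rather than the real compute
    manager methods:

        from oslo_concurrency import lockutils

        # Both paths serialize on the same lock name (the instance UUID), so
        # terminate cannot run until the build/attach path releases the lock.
        def build_and_run_instance(instance):
            with lockutils.lock(instance.uuid):
                attach_volumes(instance)      # persists bdm.volume_id in the DB

        def terminate_instance(instance):
            with lockutils.lock(instance.uuid):
                shutdown_instance(instance)   # now sees the updated BDMs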
    
    In the case of the race bug, deleting the volume snapshot in cinder fails
    because volume2 was never deleted by nova, so the test fails in teardown.
    Note that there is still potential for a race here; this change does not
    eliminate it, but it should narrow the race window.
    
    This also cleans up the logging in attach_block_devices since there
    may not be a volume_id at that point (depending on bdm.source_type).
    
    Closes-Bug: #1464259
    
    Change-Id: Ib60d60a5af35be89ad8afbcf44fcffe0b0ce2876


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1464259

Title:
  Volumes tests fail often with rbd backend

Status in Cinder:
  Triaged
Status in OpenStack Compute (nova):
  Fix Released
Status in tempest:
  Invalid

Bug description:
  http://logs.openstack.org/02/173802/5/check/check-tempest-dsvm-full-
  ceph/a72aac1/logs/screen-n-api.txt.gz?level=TRACE#_2015-06-11_09_04_19_511

  2015-06-11 09:04:19.511 ERROR nova.api.ec2 [req-0ac81d78-2717-4dd2-80e2-d94363b55ac8 EC2VolumesTest-442487008 EC2VolumesTest-1066393631] Unexpected InvalidInput raised: Invalid input received: Invalid volume: Volume still has 1 dependent snapshots. (HTTP 400) (Request-ID: req-4586b5d2-7212-4ddd-af79-43ad8ba7ea58)
  2015-06-11 09:04:19.511 ERROR nova.api.ec2 [req-0ac81d78-2717-4dd2-80e2-d94363b55ac8 EC2VolumesTest-442487008 EC2VolumesTest-1066393631] Environment: {"HTTP_AUTHORIZATION": "AWS4-HMAC-SHA256 Credential=a5e9253350ce4a249ddce8b7c1c798c2/20150611/0/127/aws4_request,SignedHeaders=host;x-amz-date,Signature=304830ed947f7fba3143887b08d1e47faa18d4b59782c0992727cb7593f586b4", "SCRIPT_NAME": "", "REQUEST_METHOD": "POST", "HTTP_X_AMZ_DATE": "20150611T090418Z", "PATH_INFO": "/", "SERVER_PROTOCOL": "HTTP/1.0", "CONTENT_LENGTH": "60", "HTTP_USER_AGENT": "Boto/2.38.0 Python/2.7.6 Linux/3.13.0-53-generic", "RAW_PATH_INFO": "/", "REMOTE_ADDR": "127.0.0.1", "wsgi.url_scheme": "http", "SERVER_PORT": "8773", "CONTENT_TYPE": "application/x-www-form-urlencoded; charset=UTF-8", "HTTP_HOST": "127.0.0.1:8773", "SERVER_NAME": "127.0.0.1", "GATEWAY_INTERFACE": "CGI/1.1", "REMOTE_PORT": "45819", "HTTP_ACCEPT_ENCODING": "identity"}

  http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRUMyVm9sdW1lc1Rlc3RcIiBBTkQgbWVzc2FnZTpcIlVuZXhwZWN0ZWQgSW52YWxpZElucHV0IHJhaXNlZDogSW52YWxpZCBpbnB1dCByZWNlaXZlZDogSW52YWxpZCB2b2x1bWU6IFZvbHVtZSBzdGlsbCBoYXMgMSBkZXBlbmRlbnQgc25hcHNob3RzXCIgQU5EIHRhZ3M6XCJzY3JlZW4tbi1hcGkudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MzQwMzAyMTUwODd9

  10 hits in 7 days, in both check and gate, on the ceph and glusterfs
  jobs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1464259/+subscriptions

