yahoo-eng-team team mailing list archive

[Bug 1470437] Re: ImageCacheManager raises Permission denied error on nova compute in race condition

 

Reviewed:  https://review.openstack.org/185549
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ec9d5e375e208686d33b9259b039cc009bded42e
Submitter: Jenkins
Branch:    master

commit ec9d5e375e208686d33b9259b039cc009bded42e
Author: Ankit Agrawal <ankit11.agrawal@xxxxxxxxxxx>
Date:   Mon Aug 10 16:27:57 2015 +1000

    libvirt: Race condition leads to instance in error
    
    ImageCacheManager deletes the base image while the image backend is
    copying it to the instance path, causing the instance to go into the
    error state.
    
    Acquire a lock before removing an image from the cache. If libvirt is
    copying the image to the instance path, the image cache manager cannot
    remove it until libvirt has finished copying the image.
    
    Closes-Bug: 1256838
    Closes-Bug: 1470437
    Co-Authored-By: Michael Still <mikal@xxxxxxxxxxx>
    Depends-On: I337ce28e2fc516c91bec61ca3639ebff0029ad49
    Change-Id: I376cc951922c338669fdf3f83da83e0d3cea1532
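
The locking idea in the commit message above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the code from
the nova change: the helper names (_lock_for, copy_from_cache,
remove_from_cache) are hypothetical, and a plain in-process threading.Lock
stands in for whatever lock primitive the real change uses. A service with
several processes would need an inter-process lock instead.

```python
import os
import shutil
import threading

# Hypothetical per-path lock registry (illustration only).
_registry_guard = threading.Lock()
_locks = {}


def _lock_for(path):
    """Return the lock associated with a cached base file path."""
    with _registry_guard:
        return _locks.setdefault(path, threading.Lock())


def copy_from_cache(base_file, target):
    """Copy the base image while holding its lock."""
    with _lock_for(base_file):
        shutil.copyfile(base_file, target)


def remove_from_cache(base_file):
    """Remove the base image only when no copy holds the lock.

    Returns True if the file was removed, False if it was already gone.
    """
    with _lock_for(base_file):
        if os.path.exists(base_file):
            os.remove(base_file)
            return True
        return False
```

Because both functions take the same per-path lock, a removal that starts
during a copy blocks until the copy has finished.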


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1470437

Title:
  ImageCacheManager raises Permission denied error on nova compute in
  race condition

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  ImageCacheManager raises Permission denied error on nova compute in
  race condition

  While creating an instance snapshot, nova calls the guest.launch method
  of the libvirt driver, which changes the base file's permissions and
  changes its owner from openstack to libvirt-qemu (with the qcow2 image
  backend). If ImageCacheManager is about to update the last access time
  of this base file and an instance snapshot calls guest.launch just
  before the update, ImageCacheManager raises a Permission denied error
  in nova compute from os.utime().
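
  The failure mode can be shown in isolation. The sketch below is not
  nova's fix and does not address the ownership change itself; it only
  illustrates how a periodic task could treat a permission error from
  os.utime() as non-fatal. The function touch_base_file and its utime
  parameter are hypothetical names introduced for this illustration.

```python
import errno
import os


def touch_base_file(base_file, utime=os.utime):
    """Try to refresh the access/modification time of base_file.

    Returns True on success and False if the call is refused with
    EACCES or EPERM. Any other OSError is re-raised.
    """
    try:
        utime(base_file, None)
        return True
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EPERM):
            return False
        raise
```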

  Steps to reproduce:
  1. Set image_cache_manager_interval=120 in nova.conf and use the qcow2 image backend.
  2. Add a 60-second sleep in the _handle_base_image method of libvirt.imagecache, just before the call to os.utime().
  3. Restart the nova services.
  4. Create an instance from an image.
  $ nova boot --image 5e1659aa-6d38-44e8-aaa3-4217337436c0 --flavor 1 instance-1
  5. Check that the instance is in the active state.
  6. Go to the n-cpu screen and watch the imagecache manager logs until it reaches the sleep statement added in step 2.
  7. Send an instance snapshot request while the imagecache manager is waiting in the sleep.
  $ nova image-create 19c7900b-73d5-4c2e-b129-5e2a6b13f396 instance-1-snap
  8. The instance snapshot request changes the base file owner to libvirt-qemu by calling the guest.launch method of the libvirt driver.
  9. When the imagecache manager comes out of the sleep and executes os.utime(), it raises the following Permission denied error in nova compute.

  2015-07-01 01:51:46.794 ERROR nova.openstack.common.periodic_task [req-a03fa45f-ffb9-48dd-8937-5b0414c6864b None None] Error during ComputeManager._run_image_cache_manager_pass
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task   File "/opt/stack/nova/nova/openstack/common/periodic_task.py", line 224, in run_periodic_tasks
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task     task(self, context)
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task   File "/opt/stack/nova/nova/compute/manager.py", line 6177, in _run_image_cache_manager_pass
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task     self.driver.manage_image_cache(context, filtered_instances)
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6252, in manage_image_cache
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task     self.image_cache_manager.update(context, all_instances)
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task   File "/opt/stack/nova/nova/virt/libvirt/imagecache.py", line 668, in update
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task     self._age_and_verify_cached_images(context, all_instances, base_dir)
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task   File "/opt/stack/nova/nova/virt/libvirt/imagecache.py", line 598, in _age_and_verify_cached_images
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task     self._handle_base_image(img, base_file)
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task   File "/opt/stack/nova/nova/virt/libvirt/imagecache.py", line 570, in _handle_base_image
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task     os.utime(base_file, None)
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task OSError: [Errno 13] Permission denied: '/opt/stack/data/nova/instances/_base/8d2c340dcce68e48a75457b1e91457feed27aef5'
  2015-07-01 01:51:46.794 TRACE nova.openstack.common.periodic_task
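
  The interleaving from the reproduction steps can be simulated without
  nova or libvirt. The sketch below is a toy model under loud
  assumptions: FakeBaseFile is a hypothetical in-memory stand-in for the
  cached base image, its permission check is deliberately simplified to
  "only the owner may touch the file", and two threading.Event objects
  replace the 60-second sleep so that the ordering is deterministic.

```python
import threading


class FakeBaseFile:
    """In-memory stand-in for a cached base image (hypothetical)."""

    def __init__(self, owner):
        self.owner = owner

    def utime(self, caller):
        # Simplified permission check: only the current owner may touch.
        if caller != self.owner:
            raise PermissionError(13, "Permission denied")


def run_race():
    """Replay the race and return the error seen by the cache manager."""
    base = FakeBaseFile(owner="openstack")
    paused = threading.Event()   # cache manager has reached the pause
    chowned = threading.Event()  # snapshot path has changed the owner
    result = {}

    def cache_manager():
        paused.set()             # stands in for the sleep added in step 2
        chowned.wait()
        try:
            base.utime("openstack")
            result["error"] = None
        except PermissionError as exc:
            result["error"] = exc

    def snapshot():
        paused.wait()
        base.owner = "libvirt-qemu"  # stands in for the ownership change
        chowned.set()

    threads = [threading.Thread(target=cache_manager),
               threading.Thread(target=snapshot)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["error"]
```

  In this model run_race() always returns a PermissionError, because the
  owner changes between the moment the cache manager pauses and the
  moment it touches the file.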

  Expected result: guest.launch should not update the base file
  permissions and owner to libvirt-qemu.  Base file owner should remain
  unchanged.

  Actual result: Libvirt is updating the base file owner which causes
  permission issues in nova.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1470437/+subscriptions

