
yahoo-eng-team team mailing list archive

[Bug 1646181] Re: NFS: Fail to boot VM out of large snapshots (30GB+)

 

Reviewed:  https://review.openstack.org/443752
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=52310fa8645cc10b91de7d2b4e10a3b42d4ef073
Submitter: Jenkins
Branch:    master

commit 52310fa8645cc10b91de7d2b4e10a3b42d4ef073
Author: Eric Harney <eharney@xxxxxxxxxx>
Date:   Thu Mar 9 11:25:53 2017 -0500

    Bump prlimit cpu time for qemu-img from 2 to 8
    
    Users have reported that the current CPU limit is not
    sufficient for processing large enough images when
    downloading images to volumes.
    
    This mirrors a similar increase made in Nova (b78b1f8ce).
    
    Closes-Bug: #1646181
    
    Change-Id: I5edea7d1d19fd991e51dca963d2beb7004177498
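
For context, the limit being raised here is the CPU cap that the oslo.concurrency prlimit wrapper applies around qemu-img calls (the "--cpu=2" visible in the failing command below). A rough sketch of what the change amounts to, using the oslo_concurrency/oslo_utils APIs rather than the exact Cinder source:

    # Sketch only -- illustrates the prlimit applied to qemu-img, not the
    # literal Cinder code. ProcessLimits is translated into setrlimit() calls
    # by the "python -m oslo_concurrency.prlimit" wrapper seen in the logs.
    from oslo_concurrency import processutils
    from oslo_utils import units

    QEMU_IMG_LIMITS = processutils.ProcessLimits(
        cpu_time=8,                  # was 2; raised so large images can be inspected
        address_space=1 * units.Gi)  # the 1 GiB cap behind --as=1073741824

    def qemu_img_info(path):
        # With prlimit= set, execute() re-launches the command under the
        # oslo_concurrency.prlimit wrapper.
        out, _err = processutils.execute(
            'env', 'LC_ALL=C', 'LANG=C', 'qemu-img', 'info', path,
            prlimit=QEMU_IMG_LIMITS)
        return out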


** Changed in: cinder
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1646181

Title:
  NFS: Fail to boot VM out of large snapshots (30GB+)

Status in Cinder:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) newton series:
  Fix Committed

Bug description:
  Description
  ===========
  Using NFS shared storage, booting a VM out of a small snapshot (1GB) works fine.
  However, when I try to do the same with a large snapshot (30GB+), it fails, regardless of the OpenStack release (Newton or Mitaka).

  Steps to reproduce
  ==================
  A chronological list of steps that reproduces the issue:
  * I have OpenStack RDO Newton (or Mitaka) installed and functional
  * I boot a VM out of a QCOW2 image of about 1GB
  * Then I log into that VM and create a large file (33GB) to inflate the VM image
  * Then I shut off the VM and take a snapshot of it that I call "largeVMsnapshotImage"

  Alternatively to the steps above:
  * I have a snapshot from a large VM (30GB+) that I upload to Glance and call "largeVMsnapshotImage"

  Then:
  * I try to boot a new VM out of that snapshot, using the same network
  * The image appears to be copied to the compute node, but the VM creation fails on the "qemu-img info" command

  If I run the same command manually, it works:
  /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885
  image: /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885
  file format: raw
  virtual size: 80G (85899345920 bytes)
  disk size: 37G
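
  To confirm the limit is the culprit, a small Python sketch (my own, not Nova code) can measure how many CPU seconds the unrestricted command actually consumes; anything around or above 2 seconds gets killed under the default --cpu=2 cap:

    # Measure CPU time consumed by "qemu-img info" on the cached base image.
    import os
    import resource
    import subprocess

    cmd = ['qemu-img', 'info',
           '/var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885']
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.check_call(cmd, env=dict(os.environ, LC_ALL='C', LANG='C'))
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    print('qemu-img info used %.2f CPU seconds (the failing run is capped at 2)' % cpu)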

  However, in the logs it fails and the VM creation is aborted; see this excerpt from nova-compute.log on the compute node:
  ...
  2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] BuildAbortException: Build of instance d6889ea2-f277-40e5-afdc-b3b0698537ed aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885 : Unexpected error while running command.
  2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885
  2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Exit code: -9
  2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Stdout: u''
  2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Stderr: u''
  ...

  
  Expected result
  ===============
  The VM should have been created and booted out of the large snapshot image.

  Actual result
  =============
  The qemu-img command fails with exit code -9 when Nova runs it under the prlimit wrapper, and the instance build is aborted.
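
  Exit code -9 means the process was killed with SIGKILL, which is what the kernel does when the RLIMIT_CPU hard limit set by the oslo_concurrency.prlimit wrapper (--cpu=2) is exceeded; that also explains the empty stdout/stderr. A minimal sketch of the mechanism, assuming plain setrlimit/subprocess rather than the actual oslo code:

    # Re-apply the same caps the prlimit wrapper uses and watch the child
    # get SIGKILLed once it burns through 2 CPU seconds.
    import resource
    import subprocess

    def set_limits():
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))                    # --cpu=2
        resource.setrlimit(resource.RLIMIT_AS, (1073741824, 1073741824))   # --as=1073741824

    proc = subprocess.Popen(
        ['qemu-img', 'info',
         '/var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885'],
        preexec_fn=set_limits)
    print(proc.wait())   # -9 here means the child was killed by SIGKILL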

  Environment
  ===========
  1. Running RDO Newton on CentOS 7.2 (or Oracle Linux 7.2); reproduced on RDO Mitaka as well

     If this is from a distro please provide:
     [root@controller ~]# rpm -qa|grep nova
  openstack-nova-console-14.0.0-1.el7.noarch
  puppet-nova-9.4.0-1.el7.noarch
  python-nova-14.0.0-1.el7.noarch
  openstack-nova-novncproxy-14.0.0-1.el7.noarch
  openstack-nova-conductor-14.0.0-1.el7.noarch
  openstack-nova-api-14.0.0-1.el7.noarch
  openstack-nova-common-14.0.0-1.el7.noarch
  openstack-nova-scheduler-14.0.0-1.el7.noarch
  openstack-nova-serialproxy-14.0.0-1.el7.noarch
  python2-novaclient-6.0.0-1.el7.noarch
  openstack-nova-cert-14.0.0-1.el7.noarch

  
  2. Which hypervisor did you use?
     KVM
     
     details:
     [root@compute4 nova]# rpm -qa|grep -Ei "kvm|qemu|libvirt"
  libvirt-gobject-0.1.9-1.el7.x86_64
  libvirt-gconfig-0.1.9-1.el7.x86_64
  libvirt-daemon-1.2.17-13.0.1.el7.x86_64
  qemu-kvm-common-1.5.3-105.el7.x86_64
  qemu-img-1.5.3-105.el7.x86_64
  ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
  libvirt-client-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-nodedev-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-lxc-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-kvm-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-secret-1.2.17-13.0.1.el7.x86_64
  libvirt-python-1.2.17-2.el7.x86_64
  libvirt-daemon-config-network-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-config-nwfilter-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-storage-1.2.17-13.0.1.el7.x86_64
  qemu-kvm-1.5.3-105.el7.x86_64
  libvirt-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-interface-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-network-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-nwfilter-1.2.17-13.0.1.el7.x86_64
  libvirt-daemon-driver-qemu-1.2.17-13.0.1.el7.x86_64
  libvirt-glib-0.1.9-1.el7.x86_64

  
  3. Which storage type did you use?
     NFS
     From a Sun ZFS storage system or just from a plain server (both environments show the same issue)
     [root@compute4 nova]# mount | grep -i instance
  172.31.254.254:/nova on /var/lib/nova/instances type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.31.0.114,local_lock=none,addr=172.31.254.254)


  4. Which networking type did you use?
     Neutron with Open vSwitch

  Logs & Configs
  ==============

  From nova-compute.log:

  2016-11-30 11:34:50.855 29776 INFO nova.compute.resource_tracker [req-dc5e7f75-5ae5-4929-a7b5-f4fdb95c171e - - - - -] Compute_service record updated for c00b05:c00b05
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Instance failed to spawn
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Traceback (most recent call last):
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2078, in _build_resources
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     yield resources
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1920, in _build_and_run_instance
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     block_device_info=block_device_info)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2571, in spawn
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     admin_pass=admin_password)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2975, in _create_image
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     fallback_from_host)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3075, in _create_and_inject_local_root
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     instance, size, fallback_from_host)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6537, in _try_fetch_image_cache
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     size=size)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 218, in cache
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     *args, **kwargs)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 565, in create_image
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     self.verify_base_size(base, size)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 265, in verify_base_size
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     base_size = self.get_disk_size(base)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 277, in get_disk_size
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     return disk.get_disk_size(name)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 148, in get_disk_size
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     return images.qemu_img_info(path).virtual_size
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 67, in qemu_img_info
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     raise exception.InvalidDiskInfo(reason=msg)
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52 : Unexpected error while running command.
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Exit code: -9
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stdout: u''
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stderr: u''
  2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]
  2016-11-30 11:35:11.005 29776 WARNING nova.image.glance [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] No protocol specified in for api_server 'http://10.240.121.13:9292', please update [glance] api_servers with fully qualified url including scheme (http / https)
  2016-11-30 11:35:11.016 29776 INFO nova.virt.libvirt.driver [-] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] During wait destroy, instance disappeared.
  2016-11-30 11:35:11.045 29776 INFO nova.virt.libvirt.driver [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Deleting instance files /var/lib/nova/instances/1e1c2b5a-803a-4701-bc71-254699243f7d_del
  2016-11-30 11:35:11.056 29776 INFO nova.virt.libvirt.driver [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Deletion of /var/lib/nova/instances/1e1c2b5a-803a-4701-bc71-254699243f7d_del complete
  2016-11-30 11:35:11.194 29776 INFO nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Took 0.18 seconds to destroy the instance on the hypervisor.
  2016-11-30 11:35:11.610 29776 INFO nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Took 0.42 seconds to deallocate network for instance.
  2016-11-30 11:35:11.610 29776 WARNING nova.image.glance [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] No protocol specified in for api_server 'http://10.240.121.13:9292', please update [glance] api_servers with fully qualified url including scheme (http / https)
  2016-11-30 11:35:11.789 29776 WARNING nova.image.glance [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] No protocol specified in for api_server 'http://10.240.121.13:9292', please update [glance] api_servers with fully qualified url including scheme (http / https)
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Build of instance 1e1c2b5a-803a-4701-bc71-254699243f7d aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52 : Unexpected error while running command.
  Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52
  Exit code: -9
  Stdout: u''
  Stderr: u''
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Traceback (most recent call last):
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1779, in _do_build_and_run_instance
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     filter_properties)
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1971, in _build_and_run_instance
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]     reason=e.format_message())
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] BuildAbortException: Build of instance 1e1c2b5a-803a-4701-bc71-254699243f7d aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52 : Unexpected error while running command.
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Exit code: -9
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stdout: u''
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stderr: u''
  2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]
  2016-11-30 11:35:11.881 29776 INFO nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Took 0.09 seconds to deallocate network for instance.
  2016-11-30 11:35:12.177 29776 INFO nova.compute.manager [-] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] During sync_power_state the instance has a pending task (spawning). Skip.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1646181/+subscriptions

