yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #59591
[Bug 1646181] Re: NFS: Fail to boot VM out of large snapshots (30GB+)
** Also affects: nova/newton
Importance: Undecided
Status: New
** Changed in: nova/newton
Status: New => Confirmed
** Changed in: nova/newton
Importance: Undecided => Medium
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1646181
Title:
NFS: Fail to boot VM out of large snapshots (30GB+)
Status in OpenStack Compute (nova):
In Progress
Status in OpenStack Compute (nova) newton series:
Confirmed
Bug description:
Description
===========
Using NFS Shared storage, when I try to boot a VM out of a smaller snapshot (1GB) it works fine.
Although, when i try to do the same out of a larger snapshot (30GB+) it fails regardless of the OpenStack Release Newton or Mitaka.
Steps to reproduce
==================
A chronological list of steps which will bring off the
issue you noticed:
* I have OpenStack RDO MNewton (or Mitaka) installed and functional
* I boot a VM out of a QCOW2 image of about 1GB
* Then I loginto that VM and create a large file (33GB) to inflat the VM image
* then I shutoff the VM and take a snapshot of it that i call "largeVMsnapshotImage"
Alternatively to the steps above,
* I have a snapshot from a large VM (30GB+) that I upload in glance and call "largeVMsnapshotImage"
Then I do:
* then I try to boot a new VM out of that snapshot using the same network
* Although the image seems to be copied to the compute node, the VM Creation fails on "qemu-img info" command
If I run the same command manually, it works:
/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885
image: /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885
file format: raw
virtual size: 80G (85899345920 bytes)
disk size: 37G
Although, in the logs it fails and the VM Creation is interrupted, see log from nova-compute.log on the compute node:
...
2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] BuildAbortException: Build of instance d6889ea2-f277-40e5-afdc-b3b0698537ed aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885 : Unexpected error while running command.
2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/2b54e1ca13134ceb7fc489d58d7aa6fd321b1885
2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Exit code: -9
2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Stdout: u''
2016-11-29 17:52:23.581 10284 ERROR nova.compute.manager [instance: d6889ea2-f277-40e5-afdc-b3b0698537ed] Stderr: u''
...
Expected result
===============
The VM should have been created/booted out of the larg snapshot image.
Actual result
=============
The command fails with exit code -9 when Noiva
Environment
===========
1. Running RDO Newton on Centos 7.2 (or Oracle Linux 7.2) and reproduced on RDO Mitaka as well
If this is from a distro please provide
$ [root@controller ~]# rpm -qa|grep nova
openstack-nova-console-14.0.0-1.el7.noarch
puppet-nova-9.4.0-1.el7.noarch
python-nova-14.0.0-1.el7.noarch
openstack-nova-novncproxy-14.0.0-1.el7.noarch
openstack-nova-conductor-14.0.0-1.el7.noarch
openstack-nova-api-14.0.0-1.el7.noarch
openstack-nova-common-14.0.0-1.el7.noarch
openstack-nova-scheduler-14.0.0-1.el7.noarch
openstack-nova-serialproxy-14.0.0-1.el7.noarch
python2-novaclient-6.0.0-1.el7.noarch
openstack-nova-cert-14.0.0-1.el7.noarch
2. Which hypervisor did you use?
KVM
details:
[root@compute4 nova]# rpm -qa|grep -Ei "kvm|qemu|libvirt"
libvirt-gobject-0.1.9-1.el7.x86_64
libvirt-gconfig-0.1.9-1.el7.x86_64
libvirt-daemon-1.2.17-13.0.1.el7.x86_64
qemu-kvm-common-1.5.3-105.el7.x86_64
qemu-img-1.5.3-105.el7.x86_64
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
libvirt-client-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-lxc-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-kvm-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-13.0.1.el7.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-config-network-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-13.0.1.el7.x86_64
qemu-kvm-1.5.3-105.el7.x86_64
libvirt-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-network-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-13.0.1.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.0.1.el7.x86_64
libvirt-glib-0.1.9-1.el7.x86_64
2. Which storage type did you use?
NFS
From a Sun ZFS or just from a server (both environment would have the same issue)
[root@compute4 nova]# mount | grep -i instance
172.31.254.254:/nova on /var/lib/nova/instances type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=172.31.0.114,local_lock=none,addr=172.31.254.254)
3. Which networking type did you use?
Neutron with OpenVSwitch
Logs & Configs
==============
From nova-compute.log:
2016-11-30 11:34:50.855 29776 INFO nova.compute.resource_tracker [req-dc5e7f75-5ae5-4929-a7b5-f4fdb95c171e - - - - -] Compute_service record updated for c00b05:c00b05
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Instance failed to spawn
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Traceback (most recent call last):
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2078, in _build_resources
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] yield resources
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1920, in _build_and_run_instance
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] block_device_info=block_device_info)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2571, in spawn
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] admin_pass=admin_password)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2975, in _create_image
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] fallback_from_host)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3075, in _create_and_inject_local_root
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] instance, size, fallback_from_host)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6537, in _try_fetch_image_cache
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] size=size)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 218, in cache
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] *args, **kwargs)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 565, in create_image
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] self.verify_base_size(base, size)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 265, in verify_base_size
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] base_size = self.get_disk_size(base)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 277, in get_disk_size
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] return disk.get_disk_size(name)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 148, in get_disk_size
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] return images.qemu_img_info(path).virtual_size
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 67, in qemu_img_info
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] raise exception.InvalidDiskInfo(reason=msg)
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] InvalidDiskInfo: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52 : Unexpected error while running command.
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Exit code: -9
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stdout: u''
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stderr: u''
2016-11-30 11:35:10.983 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]
2016-11-30 11:35:11.005 29776 WARNING nova.image.glance [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] No protocol specified in for api_server 'http://10.240.121.13:9292', please update [glance] api_servers with fully qualified url including scheme (http / https)
2016-11-30 11:35:11.016 29776 INFO nova.virt.libvirt.driver [-] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] During wait destroy, instance disappeared.
2016-11-30 11:35:11.045 29776 INFO nova.virt.libvirt.driver [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Deleting instance files /var/lib/nova/instances/1e1c2b5a-803a-4701-bc71-254699243f7d_del
2016-11-30 11:35:11.056 29776 INFO nova.virt.libvirt.driver [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Deletion of /var/lib/nova/instances/1e1c2b5a-803a-4701-bc71-254699243f7d_del complete
2016-11-30 11:35:11.194 29776 INFO nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Took 0.18 seconds to destroy the instance on the hypervisor.
2016-11-30 11:35:11.610 29776 INFO nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Took 0.42 seconds to deallocate network for instance.
2016-11-30 11:35:11.610 29776 WARNING nova.image.glance [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] No protocol specified in for api_server 'http://10.240.121.13:9292', please update [glance] api_servers with fully qualified url including scheme (http / https)
2016-11-30 11:35:11.789 29776 WARNING nova.image.glance [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] No protocol specified in for api_server 'http://10.240.121.13:9292', please update [glance] api_servers with fully qualified url including scheme (http / https)
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Build of instance 1e1c2b5a-803a-4701-bc71-254699243f7d aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52 : Unexpected error while running command.
Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52
Exit code: -9
Stdout: u''
Stderr: u''
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Traceback (most recent call last):
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1779, in _do_build_and_run_instance
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] filter_properties)
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1971, in _build_and_run_instance
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] reason=e.format_message())
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] BuildAbortException: Build of instance 1e1c2b5a-803a-4701-bc71-254699243f7d aborted: Disk info file is invalid: qemu-img failed to execute on /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52 : Unexpected error while running command.
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=2 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/4cfca5366c949a8588736b75803acd375751ca52
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Exit code: -9
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stdout: u''
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Stderr: u''
2016-11-30 11:35:11.793 29776 ERROR nova.compute.manager [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d]
2016-11-30 11:35:11.881 29776 INFO nova.compute.manager [req-ee0aafda-9ab6-4e8a-b7ad-669f84a5eaf8 7004c133bffa42858f0438c3866c5224 67d87660f58d4cc08c05227b7b668b2d - - -] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] Took 0.09 seconds to deallocate network for instance.
2016-11-30 11:35:12.177 29776 INFO nova.compute.manager [-] [instance: 1e1c2b5a-803a-4701-bc71-254699243f7d] During sync_power_state the instance has a pending task (spawning). Skip.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1646181/+subscriptions
References