yahoo-eng-team mailing list archive - Message #19067
[Bug 1356552] [NEW] Live migration: "Disk of instance is too large" when using a volume stored on NFS
Public bug reported:
When live-migrating an instance that has a Cinder volume (stored on NFS) attached, the operation fails if the volume is larger than the free disk space on the destination node. This should not happen, since the volume does not have to be migrated. Here is how to reproduce the bug on a cluster with one control node and two compute nodes, using the NFS backend of Cinder.
$ nova boot --flavor m1.tiny --image 173241e-babb-45c7-a35f-b9b62e8ced78 test_vm
...
$ nova volume-create --display-name test_volume 100
...
| id | 6b9e1d03-3f53-4454-add9-a8c32d82c7e6 |
...
$ nova volume-attach test_vm 6b9e1d03-3f53-4454-add9-a8c32d82c7e6 auto
...
$ nova show test_vm | grep OS-EXT-SRV-ATTR:host
| OS-EXT-SRV-ATTR:host | t1-cpunode0 |
$ nova service-list | grep nova-compute
| nova-compute | t1-cpunode0 | nova | enabled | up | 2014-08-13T19:14:40.000000 | - |
| nova-compute | t1-cpunode1 | nova | enabled | up | 2014-08-13T19:14:41.000000 | - |
Now, let's say I want to live-migrate test_vm to t1-cpunode1:
$ nova live-migration --block-migrate test_vm t1-cpunode1
ERROR: Migration pre-check error: Unable to migrate a0d9c991-7931-4710-8684-282b1df4cca6: Disk of instance is too large(available on destination host:46170898432 < need:108447924224) (HTTP 400) (Request-ID: req-b4f00867-df51-44be-8f97-577be385d536)
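(As a sanity check on the numbers: assuming the default m1.tiny flavor with its 1 GiB root disk, 1 GiB + the 100 GiB volume = 101 GiB = 108447924224 bytes, which is exactly the "need" value above. So the attached volume is clearly being counted toward the space required on the destination.)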
In nova/virt/libvirt/driver.py, _assert_dest_node_has_enough_disk() calls get_instance_disk_info(), which in turn calls _get_instance_disk_info(). In that method, volume devices are supposed to be skipped when computing the amount of space needed to migrate an instance:
...
if disk_type != 'file':
    LOG.debug('skipping %s since it looks like volume', path)
    continue

if target in volume_devices:
    LOG.debug('skipping disk %(path)s (%(target)s) as it is a '
              'volume', {'path': path, 'target': target})
    continue
...
But for some reason, neither of these skip conditions is ever triggered here.
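To make the failure mode concrete, here is a minimal standalone sketch of the same skip logic (the helper name and sample XML are mine, not Nova's). A type='file' disk can only be excluded by the second condition, and for that condition to also miss, volume_devices evidently does not contain 'vdb' in this code path, which suggests the pre-check may not be passing the block device mapping down:

from xml.etree import ElementTree

DOMAIN_XML = """
<domain>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/mnt/.../volume-6b9e1d03-...'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

def disks_counted_for_migration(xml, volume_devices):
    # Mimics the loop in _get_instance_disk_info(): collect the disks
    # whose size would be added to the space needed on the destination.
    counted = []
    for disk in ElementTree.fromstring(xml).findall('./devices/disk'):
        if disk.get('type') != 'file':
            continue  # skipped as a volume -- but NFS disks ARE type='file'
        target = disk.find('target').get('dev')
        if target in volume_devices:
            continue  # skipped as a volume -- only works if the caller
                      # supplied the attached volume devices
        counted.append(target)
    return counted

# With no volume device list, the 100 GiB NFS volume is counted:
print(disks_counted_for_migration(DOMAIN_XML, volume_devices=set()))    # ['vdb']
# With the device list, it is correctly skipped:
print(disks_counted_for_migration(DOMAIN_XML, volume_devices={'vdb'}))  # []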
If we SSH into the compute node where the instance is currently running, we can get more information about it:
$ virsh dumpxml 11
...
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/nova/mnt/84751739e625d0ea9609a65dd9c0a6f1/volume-6b9e1d03-3f53-4454-add9-a8c32d82c7e6'/>
  <target dev='vdb' bus='virtio'/>
  <serial>6b9e1d03-3f53-4454-add9-a8c32d82c7e6</serial>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
...
The disk type is "file", which would explain why this volume is not skipped by the first condition in the code snippet shown above. With the default (iSCSI) Cinder backend, we get something like this instead:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/disk/by-path/ip-192.168.200.250:3260-iscsi-iqn.2010-10.org.openstack:volume-47ecc6a6-8af9-4011-a53f-14a71d14f50b-lun-1'/>
  <target dev='vdb' bus='virtio'/>
  <serial>47ecc6a6-8af9-4011-a53f-14a71d14f50b</serial>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
I think the code in LibvirtNFSVolumeDriver.connect_volume() might be wrong: conf.source_type should perhaps be set to something other than "file" (and some other changes might be needed), but I must admit I'm not a libvirt expert.
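For reference, the suspect code path looks roughly like this (a paraphrased sketch of nova/virt/libvirt/volume.py from this era, not an exact copy; check your tree): the driver mounts the NFS export, builds a path to the volume file inside the mount point, and hands libvirt a source_type of 'file', which is exactly what produces the <disk type='file'> element shown above.

# Paraphrased sketch of LibvirtNFSVolumeDriver.connect_volume() -- not an
# exact copy of the Nova source; line-level details may differ.
class LibvirtNFSVolumeDriver(LibvirtBaseVolumeDriver):
    def connect_volume(self, connection_info, disk_info):
        conf = super(LibvirtNFSVolumeDriver, self).connect_volume(
            connection_info, disk_info)
        # Mount the NFS export, then point libvirt at a plain file
        # inside the mount point.
        path = self._ensure_mounted(connection_info['data']['export'])
        path = os.path.join(path, connection_info['data']['name'])
        conf.source_type = 'file'   # => <disk type='file'> in the guest XML
        conf.source_path = path
        return conf

On the other hand, if source_type = 'file' is in fact what libvirt expects for a file-backed disk (so that changing it is not an option), then the skip logic in _get_instance_disk_info() would instead need to recognize volumes by their target device from the block device mapping rather than by disk type.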
Any thoughts?
** Affects: nova
Importance: Undecided
Status: New
** Tags: libvirt
** Tags added: libvirt
--
https://bugs.launchpad.net/bugs/1356552