yahoo-eng-team mailing list archive - Message #19067
[Bug 1356552] [NEW] Live migration: "Disk of instance is too large" when using a volume stored on NFS
Public bug reported:
When live-migrating an instance that has a Cinder volume (stored on NFS) attached, the operation fails if the volume is larger than the free disk space on the destination node. This should not happen, since the volume does not have to be migrated. Here is how to reproduce the bug on a cluster with one control node and two compute nodes, using the NFS backend of Cinder.
$ nova boot --flavor m1.tiny --image 173241e-babb-45c7-a35f-b9b62e8ced78 test_vm
...
$ nova volume-create --display-name test_volume 100
...
| id | 6b9e1d03-3f53-4454-add9-a8c32d82c7e6 |
...
$ nova volume-attach test_vm 6b9e1d03-3f53-4454-add9-a8c32d82c7e6 auto
...
$ nova show test_vm | grep OS-EXT-SRV-ATTR:host
| OS-EXT-SRV-ATTR:host | t1-cpunode0 |
$ nova service-list | grep nova-compute
| nova-compute | t1-cpunode0 | nova | enabled | up | 2014-08-13T19:14:40.000000 | - |
| nova-compute | t1-cpunode1 | nova | enabled | up | 2014-08-13T19:14:41.000000 | - |
Now, let's say I want to live-migrate test_vm to t1-cpunode1:
$ nova live-migration --block-migrate test_vm t1-cpunode1
ERROR: Migration pre-check error: Unable to migrate a0d9c991-7931-4710-8684-282b1df4cca6: Disk of instance is too large(available on destination host:46170898432 < need:108447924224) (HTTP 400) (Request-ID: req-b4f00867-df51-44be-8f97-577be385d536)
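(As a sanity check on the numbers: assuming the default m1.tiny flavor with its 1 GiB root disk, 1 GiB + the 100 GiB volume = 101 GiB = 108447924224 bytes, which is exactly the "need" value above. So the attached volume is clearly being counted toward the space required on the destination.)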
In nova/virt/libvirt/driver.py, _assert_dest_node_has_enough_disk() calls get_instance_disk_info(), which in turn calls _get_instance_disk_info(). In that method, volume devices are supposed to be skipped when computing the amount of space needed to migrate an instance:
...
if disk_type != 'file':
    LOG.debug('skipping %s since it looks like volume', path)
    continue

if target in volume_devices:
    LOG.debug('skipping disk %(path)s (%(target)s) as it is a '
              'volume', {'path': path, 'target': target})
    continue
...
But for some reason, neither of these skip conditions is ever triggered here.
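To make the failure mode concrete, here is a minimal standalone sketch of the same skip logic (the helper name and sample XML are mine, not Nova's). A type='file' disk can only be excluded by the second condition, and for that condition to also miss, volume_devices evidently does not contain 'vdb' in this code path, which suggests the pre-check may not be passing the block device mapping down:

from xml.etree import ElementTree

DOMAIN_XML = """
<domain>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/mnt/.../volume-6b9e1d03-...'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

def disks_counted_for_migration(xml, volume_devices):
    # Mimics the loop in _get_instance_disk_info(): collect the disks
    # whose size would be added to the space needed on the destination.
    counted = []
    for disk in ElementTree.fromstring(xml).findall('./devices/disk'):
        if disk.get('type') != 'file':
            continue  # skipped as a volume -- but NFS disks ARE type='file'
        target = disk.find('target').get('dev')
        if target in volume_devices:
            continue  # skipped as a volume -- only works if the caller
                      # supplied the attached volume devices
        counted.append(target)
    return counted

# With no volume device list, the 100 GiB NFS volume is counted:
print(disks_counted_for_migration(DOMAIN_XML, volume_devices=set()))    # ['vdb']
# With the device list, it is correctly skipped:
print(disks_counted_for_migration(DOMAIN_XML, volume_devices={'vdb'}))  # []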
If we SSH into the compute node where the instance is currently running, we can get more information about it:
$ virsh dumpxml 11
...
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/nova/mnt/84751739e625d0ea9609a65dd9c0a6f1/volume-6b9e1d03-3f53-4454-add9-a8c32d82c7e6'/>
  <target dev='vdb' bus='virtio'/>
  <serial>6b9e1d03-3f53-4454-add9-a8c32d82c7e6</serial>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
...
The disk type is "file", which would explain why this volume is not skipped by the first condition in the code snippet shown above. With the default (iSCSI) Cinder backend, we get something like this instead:
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/disk/by-path/ip-192.168.200.250:3260-iscsi-iqn.2010-10.org.openstack:volume-47ecc6a6-8af9-4011-a53f-14a71d14f50b-lun-1'/>
  <target dev='vdb' bus='virtio'/>
  <serial>47ecc6a6-8af9-4011-a53f-14a71d14f50b</serial>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
I think the code in LibvirtNFSVolumeDriver.connect_volume() might be wrong: conf.source_type should perhaps be set to something other than "file" (and some other changes might be needed), but I must admit I'm not a libvirt expert.
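For reference, the suspect code path looks roughly like this (a paraphrased sketch of nova/virt/libvirt/volume.py from this era, not an exact copy; check your tree): the driver mounts the NFS export, builds a path to the volume file inside the mount point, and hands libvirt a source_type of 'file', which is exactly what produces the <disk type='file'> element shown above.

# Paraphrased sketch of LibvirtNFSVolumeDriver.connect_volume() -- not an
# exact copy of the Nova source; line-level details may differ.
class LibvirtNFSVolumeDriver(LibvirtBaseVolumeDriver):
    def connect_volume(self, connection_info, disk_info):
        conf = super(LibvirtNFSVolumeDriver, self).connect_volume(
            connection_info, disk_info)
        # Mount the NFS export, then point libvirt at a plain file
        # inside the mount point.
        path = self._ensure_mounted(connection_info['data']['export'])
        path = os.path.join(path, connection_info['data']['name'])
        conf.source_type = 'file'   # => <disk type='file'> in the guest XML
        conf.source_path = path
        return conf

On the other hand, if source_type = 'file' is in fact what libvirt expects for a file-backed disk (so that changing it is not an option), then the skip logic in _get_instance_disk_info() would instead need to recognize volumes by their target device from the block device mapping rather than by disk type.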
Any thoughts?
** Affects: nova
Importance: Undecided
Status: New
** Tags: libvirt
** Tags added: libvirt
--
https://bugs.launchpad.net/bugs/1356552