yahoo-eng-team team mailing list archive

[Bug 1774249] [NEW] update_available_resource will raise DiskNotFound after resize but before confirm

 

Public bug reported:

Originally reported in RH Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1584315

Tested on OSP12 (Pike), but the problem still appears to be present on
master. It should only occur if nova-compute is configured to use local
file instance storage.

Create instance A on compute X

Resize instance A to compute Y
  Domain is powered off
  /var/lib/nova/instances/<uuid A> renamed to <uuid A>_resize on X
  Domain is *not* undefined

On compute X:
  update_available_resource runs as a periodic task
  First action is to update self
  rt calls driver.get_available_resource()
  ...calls _get_disk_over_committed_size_total
  ...iterates over all defined domains, including the ones whose disks we renamed
  ...fails because a referenced disk no longer exists
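
The sequence above can be reproduced outside nova with a small, purely
illustrative sketch. Nothing here is nova code: the DiskNotFound class,
the helper names, and the "instance-a" directory are stand-ins chosen
for the example, and the file-size lookup only approximates what
qemu-img info reports for a real disk.

```python
import os
import shutil
import tempfile


class DiskNotFound(Exception):
    """Stand-in for nova's exception of the same name (illustrative only)."""


def allocated_disk_size(path):
    # Like the call at the bottom of the traceback, a missing file is fatal.
    if not os.path.exists(path):
        raise DiskNotFound("No disk at %s" % path)
    return os.path.getsize(path)


def total_disk_size(instances_dir, defined_domains):
    # Walk every domain that is still defined, whether or not its disk moved.
    return sum(
        allocated_disk_size(os.path.join(instances_dir, uuid, "disk"))
        for uuid in defined_domains
    )


def simulate_resize_on_source():
    """Return (size before resize, whether the scan failed after resize)."""
    instances_dir = tempfile.mkdtemp()
    uuid = "instance-a"  # hypothetical instance directory name
    try:
        os.mkdir(os.path.join(instances_dir, uuid))
        with open(os.path.join(instances_dir, uuid, "disk"), "wb") as f:
            f.write(b"\0" * 1024)

        before = total_disk_size(instances_dir, [uuid])

        # Resize: the directory is renamed, but the domain stays defined.
        os.rename(os.path.join(instances_dir, uuid),
                  os.path.join(instances_dir, uuid + "_resize"))
        try:
            total_disk_size(instances_dir, [uuid])
            failed = False
        except DiskNotFound:
            failed = True
        return before, failed
    finally:
        shutil.rmtree(instances_dir)
```

Calling simulate_resize_on_source() shows the scan succeeding before the
rename and failing afterwards. Because the exception escapes the loop, a
single renamed disk aborts the whole scan, not just that one instance.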

Results in errors in nova-compute.log:

    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager [req-bd52371f-c6ec-4a83-9584-c00c5377acd8 - - - - -] Error updating resources for node compute-0.localdomain.: DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager Traceback (most recent call last):
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6695, in update_available_resource_for_node
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 641, in update_available_resource
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5892, in get_available_resource
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     disk_over_committed = self._get_disk_over_committed_size_total()
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7393, in _get_disk_over_committed_size_total
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     config, block_device_info)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7301, in _get_instance_disk_info_from_config
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     dk_size = disk_api.get_allocated_disk_size(path)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 156, in get_allocated_disk_size
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     return images.qemu_img_info(path).disk_size
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager     raise exception.DiskNotFound(location=path)
    2018-05-30 02:17:08.647 1 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/f3ed9015-3984-43f4-b4a5-c2898052b47d/disk

As a result, the resource tracker on compute X is no longer updated.
Many occurrences of this error can be found in the gate.

Note that change Icec2769bf42455853cbe686fb30fda73df791b25 nearly
mitigates this, but does not, because task_state is not set while the
instance is awaiting confirmation of the resize.
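
To make that gap concrete, here is a rough sketch of the two guards. It
is not taken from the referenced change: both function names are
hypothetical, the exact condition used by that change is assumed from
the description above, and the "resized" vm_state value for an instance
awaiting confirm is also an assumption.

```python
RESIZED = "resized"  # assumed vm_state of an instance awaiting confirm


def ignore_if_task_state_set(task_state, vm_state):
    # Assumed shape of the existing mitigation: a missing disk is only
    # tolerated while some task is in flight on the instance.
    return task_state is not None


def ignore_if_task_or_resized(task_state, vm_state):
    # Hypothetical extension: also tolerate a missing disk when the
    # instance has been resized and is waiting for confirm or revert.
    return task_state is not None or vm_state == RESIZED
```

With task_state None and vm_state "resized", the first guard returns
False, so DiskNotFound would still propagate; the second returns True.
Whether a check of this kind is the right fix is left open here.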

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1774249

Title:
  update_available_resource will raise DiskNotFound after resize but
  before confirm

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1774249/+subscriptions

