yahoo-eng-team team mailing list archive

[Bug 1627134] Re: libvirt driver stuck deleting online snapshot


https://review.openstack.org/#/c/375652/

** Changed in: nova
       Status: New => In Progress

** Changed in: nova
     Assignee: (unassigned) => Evgeny Antyshev (eantyshev)

** Tags added: libvirt

** Also affects: nova/newton
   Importance: Undecided
       Status: New

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova/newton
       Status: New => Confirmed

** Changed in: nova/newton
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1627134

Title:
  libvirt driver stuck deleting online snapshot

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) newton series:
  Confirmed

Bug description:
  There is a problem in the nova code in nova/virt/libvirt/driver.py:
              dev = guest.get_block_device(rebase_disk)
              if guest.is_active():
                  result = dev.rebase(rebase_base, relative=relative)
                  if result == 0:
                      LOG.debug('blockRebase started successfully',
                                instance=instance)

                  while dev.wait_for_job(abort_on_error=True):
                      LOG.debug('waiting for blockRebase job completion',
                                instance=instance)
                      time.sleep(0.5)

  The wait_for_job logic expects the libvirt block job to remain for some period in the 'cur == end' state with end != 0. In fact, at least with libvirt 1.3.3.2 and libvirt-python 1.2.17, we are not guaranteed to catch the job in that state before it disappears and the libvirt call returns an empty result, which get_job_info() represents as BlockDeviceJobInfo(job=0, bandwidth=0, cur=0, end=0).
  Such a result does not match the wait_for_job completion criteria (effective since I45ac06eae0b1949f746dae305469718649bfcf23 was merged), so the loop above never exits and the snapshot delete gets stuck.
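
  For illustration only, here is a minimal sketch of a more tolerant completion check; it is not the change proposed in the review above, and the wait_for_block_job_done helper and its assumed get_job_info() interface are hypothetical. The idea is to treat a job that libvirt no longer reports (empty/all-zero job info) as finished, rather than waiting forever for a 'cur == end, end != 0' sample that may never be observed:

    # Hypothetical sketch, not the actual nova fix.
    import time


    def wait_for_block_job_done(dev, timeout=300, interval=0.5):
        """Poll a guest block device until its active block job completes.

        `dev` is assumed to expose a get_job_info() method returning either
        None or an object with `job`, `cur` and `end` fields, similar to
        nova's BlockDeviceJobInfo; all-zero fields mean the job is gone.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            info = dev.get_job_info()
            if info is None or info.job == 0:
                # libvirt no longer reports the job: it already completed.
                return True
            if info.end != 0 and info.cur == info.end:
                # Caught the job in its completed state before it vanished.
                return True
            time.sleep(interval)
        return False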

  This bug started to occur in our third-party CI:
  http://openstack-3rd-party-storage-ci-logs.virtuozzo.com/28/314928/13/check/dsvm-tempest-kvm/5aae7aa

  n-cpu.log:
  2016-08-17 15:47:04.856 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] volume_snapshot_delete: delete_info: {u'type': u'qcow2', u'merge_target_file': None, u'file_to_merge': None, u'volume_id': u'3e64cef0-03e3-407e-b6c5-fac873a7c98a'} _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2054
  2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] found device at vda _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2098
  2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] disk: vda, base: None, bw: 0, relative: False _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2171
  2016-08-17 15:47:04.868 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] blockRebase started successfully _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2178
  2016-08-17 15:47:04.889 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
  2016-08-17 15:47:05.396 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
  2016-08-17 15:47:05.951 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
  2016-08-17 15:47:06.456 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
  2016-08-17 15:47:06.968 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
  2016-08-17 15:47:07.594 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182

  BTW, I didn't find any tests in the gate that check this.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1627134/+subscriptions
