yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #57000
[Bug 1627134] [NEW] libvirt driver stuck deleting online snapshot
Public bug reported:
There is a problem in nova code in nova/virt/libvirt/driver.py:
dev = guest.get_block_device(rebase_disk)
if guest.is_active():
result = dev.rebase(rebase_base, relative=relative)
if result == 0:
LOG.debug('blockRebase started successfully',
instance=instance)
while dev.wait_for_job(abort_on_error=True):
LOG.debug('waiting for blockRebase job completion',
instance=instance)
time.sleep(0.5)
It expects that libvirt block job stays for some period in 'cur == end' state, with end != 0 (wait_for_job logic). But in fact, at least for libvirt 1.3.3.2 and libvirt-python-1.2.17, we are not guaranteed to catch a job in such a state, before it dissapears and libvirt call returns empty result. Which is represented in get_job_info() by BlockDeviceJobInfo(job=0,bandwidth=0,cur=0,end=0).
Such result doesn't match wait_for_job finish criteria (effective since I45ac06eae0b1949f746dae305469718649bfcf23 is merged).
This bug started to occur in our third-party CI:
http://openstack-3rd-party-storage-ci-logs.virtuozzo.com/28/314928/13/check/dsvm-tempest-kvm/5aae7aa
n-cpu.log:
2016-08-17 15:47:04.856 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] volume_snapshot_delete: delete_info: {u'type': u'qcow2', u'merge_target_file': None, u'file_to_merge': None, u'volume_id': u'3e64cef0-03e3-407e-b6c5-fac873a7c98a'} _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2054
2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] found device at vda _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2098
2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] disk: vda, base: None, bw: 0, relative: False _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2171
2016-08-17 15:47:04.868 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] blockRebase started successfully _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2178
2016-08-17 15:47:04.889 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:05.396 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:05.951 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:06.456 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:06.968 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:07.594 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
BTW, I didn't found tests checking this in gate.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1627134
Title:
libvirt driver stuck deleting online snapshot
Status in OpenStack Compute (nova):
New
Bug description:
There is a problem in nova code in nova/virt/libvirt/driver.py:
dev = guest.get_block_device(rebase_disk)
if guest.is_active():
result = dev.rebase(rebase_base, relative=relative)
if result == 0:
LOG.debug('blockRebase started successfully',
instance=instance)
while dev.wait_for_job(abort_on_error=True):
LOG.debug('waiting for blockRebase job completion',
instance=instance)
time.sleep(0.5)
It expects that libvirt block job stays for some period in 'cur == end' state, with end != 0 (wait_for_job logic). But in fact, at least for libvirt 1.3.3.2 and libvirt-python-1.2.17, we are not guaranteed to catch a job in such a state, before it dissapears and libvirt call returns empty result. Which is represented in get_job_info() by BlockDeviceJobInfo(job=0,bandwidth=0,cur=0,end=0).
Such result doesn't match wait_for_job finish criteria (effective since I45ac06eae0b1949f746dae305469718649bfcf23 is merged).
This bug started to occur in our third-party CI:
http://openstack-3rd-party-storage-ci-logs.virtuozzo.com/28/314928/13/check/dsvm-tempest-kvm/5aae7aa
n-cpu.log:
2016-08-17 15:47:04.856 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] volume_snapshot_delete: delete_info: {u'type': u'qcow2', u'merge_target_file': None, u'file_to_merge': None, u'volume_id': u'3e64cef0-03e3-407e-b6c5-fac873a7c98a'} _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2054
2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] found device at vda _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2098
2016-08-17 15:47:04.864 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] disk: vda, base: None, bw: 0, relative: False _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2171
2016-08-17 15:47:04.868 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] blockRebase started successfully _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2178
2016-08-17 15:47:04.889 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:05.396 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:05.951 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:06.456 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:06.968 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
2016-08-17 15:47:07.594 42835 DEBUG nova.virt.libvirt.driver [req-81ae5279-0750-4745-839f-6d92f9ab3dc9 nova service] [instance: 018e566a-916b-4b76-9971-b4d4c12ea0b6] waiting for blockRebase job completion _volume_snapshot_delete /opt/stack/new/nova/nova/virt/libvirt/driver.py:2182
BTW, I didn't found tests checking this in gate.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1627134/+subscriptions
Follow ups