yahoo-eng-team team mailing list archive
Message #85825
[Bug 1923206] Re: libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device $device is already in the process of unplug
Reviewed: https://review.opendev.org/c/openstack/nova/+/785682
Committed: https://opendev.org/openstack/nova/commit/0a7d3794c6dc39976b4cbfe12b1688230ac895a8
Submitter: "Zuul (22348)"
Branch: master
commit 0a7d3794c6dc39976b4cbfe12b1688230ac895a8
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date: Fri Apr 9 15:37:23 2021 +0100
libvirt: Ignore device already in the process of unplug errors
At present QEMU will raise an error to libvirt when a device_del request
is made for a device that has already partially detached through a
previous request. This is outlined in more detail in the following
downstream Red Hat QEMU bug report:
Get libvirtError "Device XX is already in the process of unplug" [..]
https://bugzilla.redhat.com/show_bug.cgi?id=1878659
Within Nova we can actually ignore this error and allow our existing
retry logic to attempt the detach again after a short wait, hopefully
allowing the original request to complete and remove the device from
the domain.
This change does this and should result in one of the following
device_del requests raising a VIR_ERR_DEVICE_MISSING error from libvirt.
_try_detach_device should then translate that libvirt error into a
DeviceNotFound exception which is itself then ignored by all
detach_device_with_retry callers and taken to mean that the device has
detached successfully.
Closes-Bug: #1923206
Change-Id: I0e068043d8267ab91535413d950a3e154c2234f7
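For illustration, a minimal Python sketch of the error handling the commit
describes, written directly against the libvirt-python bindings; the
DeviceNotFound class and try_detach_device helper here are stand-ins for
Nova's own code in nova/virt/libvirt/guest.py, not the exact change merged
above:

import libvirt

# Error text QEMU returns for a repeat device_del (see the bug above).
UNPLUG_IN_PROGRESS = 'already in the process of unplug'

class DeviceNotFound(Exception):
    """Stand-in for the Nova exception meaning the device is already gone."""

def try_detach_device(domain, device_xml):
    """Issue a live detach and classify the libvirt errors described above."""
    try:
        domain.detachDeviceFlags(device_xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)
    except libvirt.libvirtError as exc:
        code = exc.get_error_code()
        if code == libvirt.VIR_ERR_DEVICE_MISSING:
            # libvirt reports the device is no longer in the domain; callers
            # treat this as a successful detach.
            raise DeviceNotFound()
        if (code == libvirt.VIR_ERR_INTERNAL_ERROR and
                UNPLUG_IN_PROGRESS in (exc.get_error_message() or '')):
            # The original unplug is still in flight; ignore the error and
            # let the retry loop check again after a short wait.
            return
        raise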
** Changed in: nova
Status: In Progress => Fix Released
** Bug watch added: Red Hat Bugzilla #1878659
https://bugzilla.redhat.com/show_bug.cgi?id=1878659
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1923206
Title:
libvirt.libvirtError: internal error: unable to execute QEMU command
'device_del': Device $device is already in the process of unplug
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
This was initially reported downstream against QEMU in the following bug:
Get libvirtError "Device XX is already in the process of unplug" when detach device in OSP env
https://bugzilla.redhat.com/show_bug.cgi?id=1878659
I first saw the error crop up while testing q35 in TripleO in the
following job:
https://c6b36562677324bf8249-804f3f4695b3063292bbb3235f424ae0.ssl.cf1.rackcdn.com/785027/5/check/tripleo-ci-centos-8-standalone/6860050/logs/undercloud/var/log/containers/nova/nova-compute.log
2021-04-09 11:09:53.702 8 DEBUG nova.virt.libvirt.guest [req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 d4d933c7b10c462c8141820b0e70822b - default default] Attempting initial detach for device vdb detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:455
[..]
2021-04-09 11:09:58.721 8 DEBUG nova.virt.libvirt.guest [req-4d0b64d5-a2cf-4a6e-a2f7-f6cc7ced4df1 7e2b737ed8f04b3ca819841a41be66c1 d4d933c7b10c462c8141820b0e70822b - default default] Start retrying detach until device vdb is gone. detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:471
[..]
2021-04-09 11:09:58.729 8 ERROR oslo.service.loopingcall libvirt.libvirtError: internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug
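The log lines above show Nova's initial detach attempt followed by the retry
loop. A minimal sketch of that pattern, reusing the hypothetical
try_detach_device/DeviceNotFound helpers from the earlier sketch and with
illustrative interval and attempt values rather than Nova's actual ones:

import time

def detach_device_with_retry(domain, device_xml, alias,
                             interval=5, max_attempts=8):
    try:
        # Initial detach request; QEMU starts the unplug asynchronously.
        try_detach_device(domain, device_xml)
    except DeviceNotFound:
        return
    for _ in range(max_attempts):
        time.sleep(interval)
        try:
            # Re-issue device_del until libvirt reports the device missing.
            try_detach_device(domain, device_xml)
        except DeviceNotFound:
            return  # the device has left the domain; detach succeeded
    raise RuntimeError('device %s failed to detach in time' % alias)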
Steps to reproduce
==================
Unclear at present; it looks like a genuine QEMU bug that causes it to fail when a repeat request to device_del a device comes in, instead of ignoring the request as would previously happen. I've asked for clarification in the downstream QEMU bug.
Expected result
===============
Repeat calls to device_del are ignored by QEMU, or the failure, if raised, is ignored by Nova.
Actual result
=============
Repeat calls to device_del lead to an error being raised to Nova via libvirt, causing the detach to fail in Nova even though it still completes asynchronously within QEMU.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
master
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
libvirt + QEMU/KVM
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
See above.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1923206/+subscriptions