Message #79688
[Bug 1840912] [NEW] libvirt calls aren't reliably using tpool.Proxy
Public bug reported:
A customer is hitting an issue with symptoms identical to bug 1045152
(from 2012). Specifically, we are frequently seeing the compute host
being marked down. From log correlation, we can see that when this
occurs the relevant compute is always in the middle of executing
LibvirtDriver._get_disk_over_committed_size_total(). The reason for this
appears to be a long-running libvirt call which is not using
tpool.Proxy, and therefore blocks all other greenthreads during
execution. We do not yet know why the libvirt call is slow, but we have
identified the reason it is not using tpool.Proxy.
Because nova-compute runs under eventlet, we proxy libvirt calls at the
point where we create the libvirt connection in libvirt.Host._connect:
    return tpool.proxy_call(
        (libvirt.virDomain, libvirt.virConnect),
        libvirt.openAuth, uri, auth, flags)
This means: run libvirt.openAuth(uri, auth, flags) in a native thread.
If the returned object is a libvirt.virDomain or libvirt.virConnect,
wrap the returned object in a tpool.Proxy with the same autowrap rules.
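The autowrap rule described above can be illustrated with a small
stand-alone model. This is only a sketch: ToyProxy, toy_proxy_call,
FakeConnect, FakeDomain and fake_open_auth are hypothetical stand-ins
written for this example, not eventlet or libvirt code, and a
ThreadPoolExecutor stands in for tpool's native threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Worker threads standing in for tpool's native thread pool (assumption).
_pool = ThreadPoolExecutor(max_workers=4)


class ToyProxy:
    """Hypothetical stand-in for tpool.Proxy: forwards method calls to a worker thread."""

    def __init__(self, obj, autowrap=()):
        self._obj = obj
        self._autowrap = autowrap

    def __getattr__(self, name):
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr

        def call(*args, **kwargs):
            # Every method call goes back through the same autowrap rule.
            return toy_proxy_call(self._autowrap, attr, *args, **kwargs)

        return call


def toy_proxy_call(autowrap, func, *args, **kwargs):
    """Run func in a worker thread; wrap the result if its type is in autowrap."""
    rv = _pool.submit(func, *args, **kwargs).result()
    if isinstance(rv, autowrap):
        rv = ToyProxy(rv, autowrap)
    return rv


class FakeDomain:
    """Stand-in for libvirt.virDomain."""

    def name(self):
        # Report which thread actually executed the call.
        return threading.current_thread().name


class FakeConnect:
    """Stand-in for libvirt.virConnect."""

    def lookupByName(self, name):
        return FakeDomain()


def fake_open_auth(uri):
    """Stand-in for libvirt.openAuth."""
    return FakeConnect()


conn = toy_proxy_call((FakeDomain, FakeConnect), fake_open_auth, "qemu:///system")
dom = conn.lookupByName("instance-1")
```

In this model both conn and dom end up wrapped, because each returned
object is a direct instance of a class in the autowrap tuple, so
dom.name() executes in a worker thread rather than the calling thread.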
There are two problems with this. Firstly, the autowrap list is
incomplete. At the very least we need to add libvirt.virNodeDevice,
libvirt.virSecret, and libvirt.virNWFilter to this list, as we use all
of these objects in Nova. Currently none of our interactions with these
objects use the tpool proxy.
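One possible way to avoid maintaining the autowrap list by hand is to
collect every vir* class that the module defines. The sketch below
assumes that approach; get_proxy_classes and the fake_libvirt module are
hypothetical names used only for illustration, so that the example runs
without libvirt installed.

```python
import inspect
import types


def get_proxy_classes(module):
    """Return a tuple of all classes in `module` whose name starts with 'vir'."""
    return tuple(
        cls
        for name, cls in inspect.getmembers(module, inspect.isclass)
        if name.startswith("vir")
    )


# Build a fake module with a few empty classes named after libvirt's.
fake_libvirt = types.ModuleType("fake_libvirt")
for _name in ("virConnect", "virDomain", "virNodeDevice",
              "virSecret", "virNWFilter"):
    setattr(fake_libvirt, _name, type(_name, (), {}))

# An exception class that should NOT be picked up by the 'vir' prefix filter.
fake_libvirt.libvirtError = type("libvirtError", (Exception,), {})

autowrap = get_proxy_classes(fake_libvirt)
```

The resulting tuple could then be passed as the autowrap argument in
place of the two hard-coded classes, assuming every vir* class is safe
to proxy.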
Secondly, and this is the specific root cause of this bug, tpool's
autowrap check doesn't understand lists:
https://github.com/eventlet/eventlet/blob/ca8dd0748a1985a409e9a9a517690f46e05cae99/eventlet/tpool.py#L149
In LibvirtDriver._get_disk_over_committed_size_total() we get a list of
running libvirt domains with libvirt.Host.list_instance_domains, which
calls virConnect.listAllDomains(). listAllDomains() returns a *list* of
virDomain, which the above code in tpool doesn't match because it checks
only the type of the returned object itself. Consequently, none of the
subsequent virDomain calls use the tpool proxy, so any slow call starves
all other greenthreads.
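The list problem can be reproduced with a few lines of plain Python.
This is a simplified model of the type check, not eventlet's code:
Wrapped stands in for tpool.Proxy, FakeDomain for libvirt.virDomain, and
autowrap_result_with_lists is a hypothetical variant showing one way the
elements of a returned list could be wrapped.

```python
class FakeDomain:
    """Stand-in for libvirt.virDomain."""


class Wrapped:
    """Stand-in for tpool.Proxy; it only records the object it wraps."""

    def __init__(self, obj):
        self.obj = obj


def autowrap_result(rv, autowrap):
    """Model of the check described above: wrap only direct instances."""
    return Wrapped(rv) if isinstance(rv, autowrap) else rv


def autowrap_result_with_lists(rv, autowrap):
    """Hypothetical variant that also wraps matching elements of a list."""
    if isinstance(rv, list):
        return [autowrap_result(item, autowrap) for item in rv]
    return autowrap_result(rv, autowrap)


autowrap = (FakeDomain,)
domains = [FakeDomain(), FakeDomain()]  # what a listAllDomains()-style call returns

# A list is not an instance of FakeDomain, so it passes through untouched
# and its elements stay unwrapped.
plain = autowrap_result(domains, autowrap)

# Checking the elements individually wraps each domain.
fixed = autowrap_result_with_lists(domains, autowrap)
```

Here plain is the original list of bare FakeDomain objects, while every
element of fixed is wrapped. Whether such handling belongs in the caller
or in tpool itself is a design choice this sketch does not settle.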
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1840912