
yahoo-eng-team team mailing list archive

[Bug 1840912] Re: libvirt calls aren't reliably using tpool.Proxy

 

Marking this as Fix Released. The fix went out in the Train release
(20.0.0), but because the commit message had a typo, the LP bug status
was not automatically updated.

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1840912

Title:
  libvirt calls aren't reliably using tpool.Proxy

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  A customer is hitting an issue with symptoms identical to bug 1045152
  (from 2012). Specifically, we are frequently seeing the compute host
  being marked down. From log correlation, we can see that when this
  occurs the relevant compute is always in the middle of executing
  LibvirtDriver._get_disk_over_committed_size_total(). The reason for
  this appears to be a long-running libvirt call which is not using
  tpool.Proxy, and therefore blocks all other greenthreads during
  execution. We do not yet know why the libvirt call is slow, but we
  have identified the reason it is not using tpool.Proxy.

  Because Nova uses eventlet, we proxy libvirt calls at the point where
  we create the libvirt connection in libvirt.Host._connect:

          return tpool.proxy_call(
              (libvirt.virDomain, libvirt.virConnect),
              libvirt.openAuth, uri, auth, flags)

  This means: run libvirt.openAuth(uri, auth, flags) in a native thread.
  If the returned object is a libvirt.virDomain or libvirt.virConnect,
  wrap the returned object in a tpool.Proxy with the same autowrap
  rules.
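
  The autowrap rule described above can be modelled with a toy sketch
  that uses only the standard library. This is not eventlet's
  implementation: SketchProxy, sketch_proxy_call, FakeDomain,
  FakeConnect and fake_open_auth are hypothetical names invented for
  illustration, and a plain thread pool stands in for tpool.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for eventlet's native thread pool (assumption: 4 workers).
_pool = ThreadPoolExecutor(max_workers=4)


class SketchProxy:
    """Toy model of a proxy: forwards method calls to a worker thread."""

    def __init__(self, obj, autowrap=()):
        self._obj = obj
        self._autowrap = tuple(autowrap)

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr

        def call(*args, **kwargs):
            return sketch_proxy_call(self._autowrap, attr, *args, **kwargs)

        return call


def sketch_proxy_call(autowrap, func, *args, **kwargs):
    """Run func in a worker thread; wrap the result if its type is listed."""
    # Here the caller simply blocks on the future. With eventlet, the
    # calling greenthread would yield to other greenthreads instead.
    result = _pool.submit(func, *args, **kwargs).result()
    if isinstance(result, tuple(autowrap)):
        return SketchProxy(result, autowrap)
    return result


class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain."""

    def name(self):
        return "instance-0001"


class FakeConnect:
    """Hypothetical stand-in for libvirt.virConnect."""

    def lookupByName(self, name):
        return FakeDomain()


def fake_open_auth(uri):
    """Hypothetical stand-in for libvirt.openAuth."""
    return FakeConnect()


conn = sketch_proxy_call((FakeDomain, FakeConnect), fake_open_auth, "qemu:///system")
dom = conn.lookupByName("instance-0001")  # result is wrapped as well
```

  In this model both conn and dom end up wrapped, because each returned
  object is an instance of a class in the autowrap tuple and the same
  tuple is passed on to the new proxy.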

  There are two problems with this. Firstly, the autowrap list is
  incomplete. At the very least we need to add libvirt.virNodeDevice,
  libvirt.virSecret, and libvirt.virNWFilter to this list, as we use all
  of these objects in Nova. Currently none of our interactions with
  these objects use the tpool proxy.

  Secondly, and this is the specific root cause of this bug, the
  autowrap check in tpool doesn't understand lists:

  https://github.com/eventlet/eventlet/blob/ca8dd0748a1985a409e9a9a517690f46e05cae99/eventlet/tpool.py#L149

  In LibvirtDriver._get_disk_over_committed_size_total() we get a list
  of running libvirt domains with libvirt.Host.list_instance_domains,
  which calls virConnect.listAllDomains(). listAllDomains() returns a
  *list* of virDomain, which the above code in tpool doesn't match.
  Consequently, none of the subsequent virDomain calls use the tpool
  proxy, which starves all other greenthreads.
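
  The list problem can be reproduced in isolation. The sketch below
  models the check as a single isinstance() test, as described above,
  and shows one possible way to cover list results by wrapping each
  element. All names (FakeDomain, Proxied, autowrap_result,
  autowrap_list) are hypothetical, and this is not necessarily how the
  fix in Nova was implemented.

```python
class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain."""


class Proxied:
    """Marks an object whose calls would be sent to a native thread."""

    def __init__(self, obj):
        self.obj = obj


def autowrap_result(result, autowrap):
    """Model of the check: wrap only if the result itself matches."""
    if isinstance(result, tuple(autowrap)):
        return Proxied(result)
    return result


def autowrap_list(results, autowrap):
    """Possible workaround: apply the same check to every list element."""
    return [autowrap_result(item, autowrap) for item in results]


# Model of what listAllDomains() hands back: a plain list of domains.
domains = [FakeDomain(), FakeDomain()]

# A list is not a FakeDomain, so the list and its elements stay unwrapped.
unwrapped = autowrap_result(domains, (FakeDomain,))

# Wrapping element by element restores the proxy for every domain.
wrapped = autowrap_list(domains, (FakeDomain,))
```

  Here unwrapped is the original list with bare FakeDomain objects,
  while every element of wrapped is a Proxied instance.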

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1840912/+subscriptions


