
yahoo-eng-team team mailing list archive

[Bug 1518430] Re: liberty: ~busy loop on epoll_wait being called with zero timeout

 

Reviewed:  https://review.openstack.org/386656
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=a6c193f3eba62cdcbfe04d0fa93e95352bcfb1c3
Submitter: Jenkins
Branch:    master

commit a6c193f3eba62cdcbfe04d0fa93e95352bcfb1c3
Author: John Eckersberg <jeckersb@xxxxxxxxxx>
Date:   Fri Oct 14 11:02:47 2016 -0400

    rabbit: Avoid busy loop on epoll_wait with heartbeat+eventlet
    
    Calling threading.Event.wait() when using eventlet results in a busy
    loop calling epoll_wait, because the Python 2.x
    threading.Condition.wait() implementation busy-waits by calling
    sleep() with very small values (0.0005..0.05s).  Because sleep() is
    monkey-patched by eventlet, this results in many very short timers
    being added to the eventlet hub, and forces eventlet to constantly
    call epoll_wait looking for new data unnecessarily.
    
    This uses a new Event from eventletutils which selects the
    event primitive depending on whether or not eventlet is being used.
    If it is, eventlet.event.Event is used instead of threading.Event.
    The eventlet.event.Event implementation does not suffer from the same
    busy-wait sleep problem.  If eventlet is not used, the previous
    behavior is retained.
    
    Change-Id: I5c211092d282e724d1c87ce4d06b6c44b592e764
    Depends-On: Id33c9f8c17102ba1fe24c12b053c336b6d265501
    Closes-bug: #1518430
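
The polling described in the commit message can be modeled without any real
sleeping. The sketch below is a simplified model, not the CPython source: the
commit message only gives the sleep range (0.0005..0.05s), and the doubling
backoff is an assumption based on how Python 2.7's threading.Condition.wait()
is generally understood to behave. The function name poll_delays_us is
hypothetical. Integer microseconds are used so the arithmetic stays exact.

```python
def poll_delays_us(timeout_us):
    """Return the sequence of sleep lengths (in microseconds) that a
    Python 2.x-style timed Condition.wait() would request before the
    timeout expires, assuming the lock is never released in the meantime.

    Assumed model: start at 500 us, double before each sleep, cap at
    50 000 us (0.05 s) and never sleep past the remaining time.
    """
    delays = []
    elapsed = 0
    delay = 500  # 0.0005 s, the lower bound named in the commit message
    while True:
        remaining = timeout_us - elapsed
        if remaining <= 0:
            break
        delay = min(delay * 2, remaining, 50_000)
        delays.append(delay)
        elapsed += delay  # simulated clock: no real sleep happens here
    return delays


if __name__ == "__main__":
    one_second = poll_delays_us(1_000_000)
    print(len(one_second), "sleeps for a 1 s wait")
    print("longest sleep (us):", max(one_second))
```

Under this model a single one-second wait issues 25 sleeps, none longer than
0.05 s. With sleep() monkey-patched by eventlet, each of those becomes a
short timer on the hub, which is consistent with the frequent near-zero
epoll_wait timeouts reported in the bug.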


** Changed in: oslo.messaging
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1518430

Title:
  liberty: ~busy loop on epoll_wait being called with zero timeout

Status in OpenStack Compute (nova):
  Invalid
Status in oslo.messaging:
  Fix Released
Status in nova package in Ubuntu:
  Invalid
Status in python-oslo.messaging package in Ubuntu:
  New

Bug description:
  Context: openstack juju/maas deploy using 1510 charms release
  on trusty, with:
    openstack-origin: "cloud:trusty-liberty"
    source: "cloud:trusty-updates/liberty"

  * Several OpenStack nova-* and neutron-* services, at least
  nova-compute, neutron-server, nova-conductor,
  neutron-openvswitch-agent, and neutron-vpn-agent,
  are almost busy-looping on epoll_wait() calls, most frequently
  with a zero timeout.
  - nova-compute (chosen because it is a single process) strace and ltrace captures:
    http://paste.ubuntu.com/13371248/ (ltrace, strace)

  For comparison, this is how it looks on a kilo deploy:
  - http://paste.ubuntu.com/13371635/

  * 'top' sample from a nova-cloud-controller unit from
     this completely idle stack:
    http://paste.ubuntu.com/13371809/

  FYI, this behavior does *not* appear on keystone, glance, cinder,
  or ceilometer-api.

  As this issue is present on several components, it likely comes
  from a common library (oslo concurrency?). The bug is filed against
  nova itself as a starting point for debugging.

  Note: The description in the following bug gives a good overview of
  the issue and points to a possible fix for oslo.messaging:
  https://bugs.launchpad.net/mos/+bug/1380220
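
The fix that landed in oslo.messaging picks the event primitive depending on
whether eventlet is in use. The sketch below illustrates that idea only; it
is not the eventletutils code. The names make_event and GreenEvent are
hypothetical, and the wrapper assumes eventlet's event.Event (send, ready,
wait), eventlet.Timeout and patcher.is_monkey_patched. When eventlet is not
installed or the thread module is not patched, it falls back to
threading.Event.

```python
import importlib.util
import threading


def thread_is_monkey_patched():
    """True only if eventlet is installed and has patched 'thread'."""
    if importlib.util.find_spec("eventlet") is None:
        return False
    from eventlet import patcher
    return patcher.is_monkey_patched("thread")


class GreenEvent:
    """Hypothetical wrapper giving eventlet's Event a threading-like API."""

    def __init__(self):
        from eventlet import event
        self._event = event.Event()

    def is_set(self):
        return self._event.ready()

    def set(self):
        # eventlet's Event may only be sent once, so guard repeat calls.
        if not self._event.ready():
            self._event.send(True)

    def wait(self, timeout=None):
        import eventlet
        # Timeout(seconds, False) cancels the block silently on expiry;
        # the waiter is parked on the hub instead of polling with sleeps.
        with eventlet.Timeout(timeout, False):
            self._event.wait()
        return self.is_set()


def make_event():
    """Return a green event under eventlet, threading.Event otherwise."""
    if thread_is_monkey_patched():
        return GreenEvent()
    return threading.Event()


if __name__ == "__main__":
    ev = make_event()
    ev.set()
    print(type(ev).__name__, ev.wait(0.1))
```

Callers only rely on set(), is_set() and wait(), so code such as a heartbeat
thread does not need to know which primitive it was given.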

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1518430/+subscriptions