yahoo-eng-team team mailing list archive

[Bug 1510234] Re: Heartbeats stop when time is changed

 

Reviewed:  https://review.openstack.org/503939
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=2c4574bfdcf9bd08efcb3c82becb26787a635338
Submitter: Zuul
Branch:    master

commit 2c4574bfdcf9bd08efcb3c82becb26787a635338
Author: Dinesh Bhor <dinesh.bhor@xxxxxxxxxxx>
Date:   Wed Sep 13 16:12:33 2017 +0530

    Make eventlet hub use a monotonic clock
    
    If system time is adjusted first forward and then backward while a
    masakari-engine service is running, the periodic tasks stop for as
    long as the system clock was moved backward.
    
    This was supposed to be fixed by the following patch to oslo.service
    https://review.openstack.org/#/c/286838/ , but the order of imports
    in unit tests and production code is different, so masakari services
    end up starting with the default eventlet hub, which does not use a
    monotonic clock and is therefore affected by changes of system time.
    
    Testing the change made in this patch is problematic because it
    depends on import order, and the failure does not reproduce in
    functional or unit tests (oslo_service is always imported before the
    eventlet hub is initialized, so it just does "the right thing").
    The alternative is to make an assertion when services start.
    
    Co-Authored-By: Roman Podoliaka <rpodolyaka@xxxxxxxxxxxx>
    Closes-Bug: #1510234
    Change-Id: I9d917b3151d9cdf7340a173b5baf98def63c76cd
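
The start-up assertion mentioned in the commit message can be sketched
roughly as follows. This is not the code from the patch: `FakeHub` is a
hypothetical stand-in so the example runs with only the standard library,
and the assumption is that the real hub object (obtained from eventlet in
a running service) exposes the clock it schedules timers with as a
`clock` attribute.

```python
import time


class FakeHub:
    """Hypothetical stand-in for an event-loop hub.

    It only stores the clock function used to schedule timers.
    """

    def __init__(self, clock=time.time):
        self.clock = clock


def assert_uses_monotonic_clock(hub):
    """Fail fast at service start-up if the hub is driven by wall-clock time."""
    if hub.clock is not time.monotonic:
        raise RuntimeError(
            "hub does not use a monotonic clock; periodic tasks may stall "
            "when the system time is adjusted"
        )


# A hub built with a monotonic clock passes the check silently.
assert_uses_monotonic_clock(FakeHub(clock=time.monotonic))
```

Checking at start-up rather than in a unit test sidesteps the import-order
problem described above, because the check runs against whatever hub the
service actually ended up with.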


** Changed in: masakari
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1510234

Title:
  Heartbeats stop when time is changed

Status in masakari:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in oslo.service:
  Fix Released

Bug description:
  Heartbeats stop working when you mess with the system time. If a
  monotonic clock were used, they would continue to work when the system
  time was changed.

  Steps to reproduce:

  1. List the nova services ('nova-manage service list'). Note that the
  'State' for each service is a happy face ':-)'.

  2. Move the time ahead (for example 2 hours in the future), and then
  list the nova services again. Note that heartbeats continue to work
  and use the future time (see 'Updated_At').

  3. Revert to the actual time, and list the nova services again.
  Note that all heartbeats stop, and have a 'State' of 'XXX'.

  4. The heartbeats will start again in 2 hours when the actual time
  catches up to the future time, or if you restart the services.

  5. You'll see a log message like the following when the heartbeats
  stop:

  2015-10-26 17:14:10.538 DEBUG nova.servicegroup.drivers.db
  [req-c41a2ad7-e5a5-4914-bdc8-6c1ca8b224c6 None None] Seems service is
  down. Last heartbeat was 2015-10-26 17:20:20. Elapsed time is
  -369.461679 from (pid=13994) is_up
  /opt/stack/nova/nova/servicegroup/drivers/db.py:80
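
The steps above can be imitated without nova in a small simulation. This
is a toy model, not nova or eventlet code: `HeartbeatTimer` and the two
fake clocks are hypothetical names, and the only assumption is that a
timer fires once its clock has reached a stored deadline and then re-arms
itself relative to that same clock.

```python
class HeartbeatTimer:
    """Fires when clock() has reached the stored deadline, then re-arms."""

    def __init__(self, clock, interval):
        self.clock = clock
        self.interval = interval
        self.deadline = clock() + interval
        self.beats = 0

    def poll(self):
        if self.clock() >= self.deadline:
            self.beats += 1
            self.deadline = self.clock() + self.interval


# Fake clocks: "wall" can be set by an administrator, "mono" only moves forward.
wall = {"t": 1000.0}
mono = {"t": 0.0}
wall_timer = HeartbeatTimer(lambda: wall["t"], interval=10.0)
mono_timer = HeartbeatTimer(lambda: mono["t"], interval=10.0)


def advance(seconds):
    """Let real time pass: both clocks move, then both timers are polled."""
    wall["t"] += seconds
    mono["t"] += seconds
    wall_timer.poll()
    mono_timer.poll()


advance(10)            # step 1: both timers beat normally
wall["t"] += 7200      # step 2: system time moved 2 hours ahead
advance(10)            # both still beat; the wall timer re-arms in the future
wall["t"] -= 7200      # step 3: system time reverted
for _ in range(3):
    advance(10)        # only the monotonic timer keeps beating

print(wall_timer.beats, mono_timer.beats)
```

In this model the wall-clock timer stays silent until the fake wall clock
catches up with the deadline it stored while the time was set ahead, which
mirrors step 4.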

  Here's example output demonstrating the issue:

      http://paste.openstack.org/show/477404/

  See bug #1450438 for more context:

      https://bugs.launchpad.net/oslo.service/+bug/1450438

  Long story short: the looping call uses the built-in time rather than
  a monotonic clock for sleeps.

  https://github.com/openstack/oslo.service/blob/3d79348dae4d36bcaf4e525153abf74ad4bd182a/oslo_service/loopingcall.py#L122
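
A minimal sketch of a fixed-interval loop that measures elapsed time with
a monotonic clock is shown here for illustration. It is not the
oslo.service implementation: `fixed_interval_loop` and its parameters are
hypothetical, and the clock and sleep functions are injectable only so the
behaviour can be checked without waiting.

```python
import time


def fixed_interval_loop(func, interval, iterations,
                        clock=time.monotonic, sleep=time.sleep):
    """Call func() `iterations` times, aiming for one call per `interval` seconds."""
    for _ in range(iterations):
        start = clock()
        func()
        # With a monotonic clock, elapsed can never be negative, so the
        # computed delay never exceeds `interval`.
        elapsed = clock() - start
        sleep(max(interval - elapsed, 0.0))


# Example run with a scripted clock and a recording sleep function.
recorded_sleeps = []
scripted_ticks = iter([0.0, 0.25, 1.0, 1.5])
fixed_interval_loop(lambda: None, 1.0, 2,
                    clock=lambda: next(scripted_ticks),
                    sleep=recorded_sleeps.append)
print(recorded_sleeps)
```

If `clock` were wall-clock time and the system time were set back during
`func()`, `elapsed` would come out negative and the loop would sleep for
longer than `interval`.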

  Oslo Service: version 0.11
  Nova: master (commit 2c3f9c339cae24576fefb66a91995d6612bb4ab2)

To manage notifications about this bug go to:
https://bugs.launchpad.net/masakari/+bug/1510234/+subscriptions

