
yahoo-eng-team team mailing list archive

[Bug 1510234] Re: Heartbeats stop when time is changed

 

Reviewed:  https://review.openstack.org/434327
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a7505ee648b22f5827117bcc1192b5654c29cf9c
Submitter: Jenkins
Branch:    master

commit a7505ee648b22f5827117bcc1192b5654c29cf9c
Author: Roman Podoliaka <rpodolyaka@xxxxxxxxxxxx>
Date:   Wed Feb 15 16:47:42 2017 +0200

    Make eventlet hub use a monotonic clock
    
    If the system time is adjusted first forward and then backward while
    a nova service (e.g. nova-compute) is running, there is a high
    probability that periodic tasks will stop for roughly the amount of
    time by which the system clock was moved backward.
    
    This was supposed to be fixed by the following patch to oslo.service:
    https://review.openstack.org/#/c/286838/ . However, the order of
    imports differs between unit tests and production code, so nova
    services end up starting with the default eventlet hub, which does
    not use a monotonic clock and is therefore affected by changes to
    the system time.
    
    Testing this is problematic, as it depends on the order of imports
    and is not reproduced in functional or unit tests (oslo_service is
    always imported before the eventlet hub is initialized, so it just
    does "the right thing"). The alternative is to make an assertion
    when services start.
    
    Closes-Bug: #1510234
    
    Change-Id: I110cf31ad2a0c74a0cf30ec08bd94d3a56727b39
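A startup assertion of the kind the commit message mentions might look like the following minimal sketch. It is not the code from this change: 'FakeHub' and 'assert_hub_uses_monotonic_clock' are hypothetical names, and the check assumes the hub exposes the clock it uses for timers as a 'clock' attribute (an assumption modelled on eventlet's hubs). The stand-in class lets it run without eventlet.

```python
import time


class FakeHub:
    """Hypothetical stand-in for an event-loop hub that schedules timers
    using whatever clock function it was constructed with."""

    def __init__(self, clock=time.time):
        self.clock = clock


def assert_hub_uses_monotonic_clock(hub):
    """Fail fast at service start if timers would follow the wall clock.

    Hypothetical helper: assumes the hub exposes its timer clock as
    'hub.clock'.
    """
    if hub.clock is not time.monotonic:
        raise RuntimeError(
            "event loop hub is not using a monotonic clock; periodic tasks "
            "will be affected by changes to the system time")


# Usage: a correctly configured hub passes, a wall-clock hub is rejected.
assert_hub_uses_monotonic_clock(FakeHub(clock=time.monotonic))
try:
    assert_hub_uses_monotonic_clock(FakeHub())
except RuntimeError as exc:
    print("startup check failed:", exc)
```

Failing at startup turns a silent, import-order-dependent misconfiguration into an immediate error, which is easier to notice than periodic tasks stalling hours later.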


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1510234

Title:
  Heartbeats stop when time is changed

Status in OpenStack Compute (nova):
  Fix Released
Status in oslo.service:
  Fix Released

Bug description:
  Heartbeats stop working when you mess with the system time. If a
  monotonic clock were used, they would continue to work when the system
  time was changed.

  Steps to reproduce:

  1. List the nova services ('nova-manage service list'). Note that the
  'State' for each service is a happy face ':-)'.

  2. Move the time ahead (for example 2 hours in the future), and then
  list the nova services again. Note that heartbeats continue to work
  and use the future time (see 'Updated_At').

  3. Revert to the actual time, and list the nova services again. Note
  that all heartbeats stop and every service has a 'State' of 'XXX'.

  4. The heartbeats resume either after 2 hours, once the actual time
  catches up with the future time, or when the services are restarted.

  5. You'll see a log message like the following when the heartbeats
  stop:

      2015-10-26 17:14:10.538 DEBUG nova.servicegroup.drivers.db [req-c41a2ad7-e5a5-4914-bdc8-6c1ca8b224c6 None None] Seems service is down. Last heartbeat was 2015-10-26 17:20:20. Elapsed time is -369.461679 from (pid=13994) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:80
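The negative elapsed time in step 5's log follows directly from the timestamps: the last heartbeat (17:20:20) was recorded while the clock was set ahead, so it lies in the future relative to the log entry (17:14:10.538). The sketch below is a hypothetical liveness check, not nova's servicegroup driver; it assumes a service is reported down when the absolute elapsed time exceeds a 'service_down_time' threshold, taken here as 60 seconds.

```python
from datetime import datetime

# Timestamps taken from the log message in step 5.
log_time = datetime(2015, 10, 26, 17, 14, 10, 538000)
last_heartbeat = datetime(2015, 10, 26, 17, 20, 20)

# Assumed threshold (modelled on nova's 'service_down_time' option).
SERVICE_DOWN_TIME = 60


def is_up(last_heartbeat, now, service_down_time=SERVICE_DOWN_TIME):
    """Hypothetical check: up only if the heartbeat is recent enough.

    Comparing the absolute value means a heartbeat stamped in the
    "future" (after the clock was moved back) also counts as stale.
    """
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= service_down_time, elapsed


up, elapsed = is_up(last_heartbeat, log_time)
print(up, round(elapsed, 3))  # False -369.462
```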

  Here's example output demonstrating the issue:

      http://paste.openstack.org/show/477404/
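The stall described in steps 2 to 4 can be reproduced with a toy timer model. This is a simplified, hypothetical simulation rather than nova or eventlet code: a periodic task re-arms itself at clock() + interval and fires once the clock reaches that deadline. The 2-hour jump and the 10-second interval are arbitrary illustration values.

```python
def simulate(use_wall_clock, interval=10, duration=7400, jump=7200,
             jump_forward_at=50, jump_back_at=100):
    """Return the real times (in seconds) at which a periodic task fires.

    Hypothetical model: real time advances one second per step; the wall
    clock equals real time plus an offset that jumps forward by 'jump'
    seconds at 'jump_forward_at' and back again at 'jump_back_at'.
    """
    fired = []
    deadline = None
    for real in range(duration):
        offset = jump if jump_forward_at <= real < jump_back_at else 0
        # The timer reads either the adjustable wall clock or a monotonic one.
        now = real + offset if use_wall_clock else real
        if deadline is None:
            deadline = now + interval
        elif now >= deadline:
            fired.append(real)
            deadline = now + interval
    return fired


def longest_gap(times):
    """Longest pause between consecutive firings, in real seconds."""
    return max(b - a for a, b in zip(times, times[1:]))


wall = simulate(use_wall_clock=True)
mono = simulate(use_wall_clock=False)
print(longest_gap(wall))  # 7210
print(longest_gap(mono))  # 10
```

With the wall clock, the task keeps firing after the forward jump but then pauses for 7210 real seconds, i.e. the 2-hour backward jump plus one interval; with a monotonic clock it fires every 10 seconds throughout.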

  See bug #1450438 for more context:

      https://bugs.launchpad.net/oslo.service/+bug/1450438

  Long story short: the looping call uses the built-in time rather than
  a monotonic clock for its sleeps.
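A fixed-interval loop that measures its work with a monotonic clock can be sketched as follows. This is a hypothetical illustration loosely modelled on a looping call, not oslo.service's implementation; 'run_fixed_interval' is an invented name, and the clock and sleep functions are injectable so the effect of a clock change can be shown without waiting.

```python
import time


def run_fixed_interval(task, interval, iterations,
                       clock=time.monotonic, sleep=time.sleep):
    """Hypothetical fixed-interval loop: run 'task', then sleep for
    whatever remains of 'interval'."""
    for _ in range(iterations):
        start = clock()
        task()
        elapsed = clock() - start
        # If the task overran the interval, start the next run immediately.
        delay = max(interval - elapsed, 0.0)
        sleep(delay)


# Simulated wall clock moved back 2 hours while the task runs:
# the measured elapsed time is negative, so the computed delay balloons.
wall_readings = iter([7200.0, 0.5])
delays = []
run_fixed_interval(lambda: None, interval=10, iterations=1,
                   clock=lambda: next(wall_readings), sleep=delays.append)
print(delays)  # [7209.5]

# Simulated monotonic readings only move forward, so the delay stays sane.
mono_readings = iter([100.0, 100.5])
delays = []
run_fixed_interval(lambda: None, interval=10, iterations=1,
                   clock=lambda: next(mono_readings), sleep=delays.append)
print(delays)  # [9.5]
```

Python documents time.monotonic() as unaffected by system clock updates, so with the default clock the measured elapsed time cannot go negative the way the simulated wall clock does here.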

  https://github.com/openstack/oslo.service/blob/3d79348dae4d36bcaf4e525153abf74ad4bd182a/oslo_service/loopingcall.py#L122

  Oslo Service: version 0.11
  Nova: master (commit 2c3f9c339cae24576fefb66a91995d6612bb4ab2)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1510234/+subscriptions

