← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1510234] Re: Heartbeats stop when time is changed

 

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Changed in: nova
     Assignee: Stephen Finucane (stephenfinucane) => Roman Podoliaka (rpodolyaka)

** Changed in: nova/pike
       Status: New => In Progress

** Changed in: nova/pike
     Assignee: (unassigned) => John Smith (wang-zengzhi)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1510234

Title:
  Heartbeats stop when time is changed

Status in masakari:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in oslo.service:
  Fix Released

Bug description:
  Heartbeats stop working when you mess with the system time. If a
  monotonic clock were used, they would continue to work when the system
  time was changed.

  Steps to reproduce:

  1. List the nova services ('nova-manage service list'). Note that the
  'State' for each services is a happy face ':-)'.

  2. Move the time ahead (for example 2 hours in the future), and then
  list the nova services again. Note that heartbeats continue to work
  and use the future time (see 'Updated_At').

  3. Revert back to the actual time, and list the nova services again.
  Note that all heartbeats stop, and have a 'State' of 'XXX'.

  4. The heartbeats will start again in 2 hours when the actual time
  catches up to the future time, or if you restart the services.

  5. You'll see a log message like the following when the heartbeats
  stop:

  2015-10-26 17:14:10.538 DEBUG nova.servicegroup.drivers.db [req-
  c41a2ad7-e5a5-4914-bdc8-6c1ca8b224c6 None None] Seems service is down.
  Last heartbeat was 2015-10-26 17:20:20. Elapsed time is -369.461679
  from (pid=13994) is_up
  /opt/stack/nova/nova/servicegroup/drivers/db.py:80

  Here's example output demonstrating the issue:

      http://paste.openstack.org/show/477404/

  See bug #1450438 for more context:

      https://bugs.launchpad.net/oslo.service/+bug/1450438

  Long story short: looping call is using the built-in time rather than
  a  monotonic clock for sleeps.

  https://github.com/openstack/oslo.service/blob/3d79348dae4d36bcaf4e525153abf74ad4bd182a/oslo_service/loopingcall.py#L122

  Oslo Service: version 0.11
  Nova: master (commit 2c3f9c339cae24576fefb66a91995d6612bb4ab2)

To manage notifications about this bug go to:
https://bugs.launchpad.net/masakari/+bug/1510234/+subscriptions


References