yahoo-eng-team team mailing list archive
Message #69157
[Bug 1510234] Re: Heartbeats stop when time is changed
Reviewed: https://review.openstack.org/503939
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=2c4574bfdcf9bd08efcb3c82becb26787a635338
Submitter: Zuul
Branch: master
commit 2c4574bfdcf9bd08efcb3c82becb26787a635338
Author: Dinesh Bhor <dinesh.bhor@xxxxxxxxxxx>
Date: Wed Sep 13 16:12:33 2017 +0530
Make eventlet hub use a monotonic clock
If the system time is adjusted first forward and then backward while a
masakari-engine service is running, the periodic tasks stop for a
period equal to the amount by which the system clock was moved backward.
This was supposed to be fixed by the following patch to oslo.service,
https://review.openstack.org/#/c/286838/, but the order of imports
in unit tests and production code is different, so masakari services
end up starting with the default eventlet hub, which does not use a
monotonic clock and is therefore affected by changes of system time.
Testing the change made in the patch is problematic, as the behaviour
depends on import order and is not reproduced in functional or
unit tests (oslo_service is always imported before the eventlet
hub is initialized, so it just does "the right thing").
The alternative is to make an assertion when services start.
Co-Authored-By: Roman Podoliaka <rpodolyaka@xxxxxxxxxxxx>
Closes-Bug: #1510234
Change-Id: I9d917b3151d9cdf7340a173b5baf98def63c76cd
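
A minimal sketch of the kind of start-up assertion the commit message
mentions, not the actual masakari code. It assumes the hub object exposes
its time source as a `clock` attribute; `StandInHub`,
`assert_hub_uses_monotonic_clock` and `hub_is_safe` are hypothetical names
used only for illustration, and the standard library's `time.monotonic`
stands in for whatever monotonic clock the real service uses.

```python
import time


class StandInHub:
    """Hypothetical stand-in for an event loop hub that exposes its clock."""

    def __init__(self, clock=time.time):
        # The default mimics a hub that reads the adjustable wall clock.
        self.clock = clock


def assert_hub_uses_monotonic_clock(hub):
    """Raise at service start if the hub would follow wall-clock changes."""
    if hub.clock is not time.monotonic:
        raise RuntimeError(
            "hub clock %r is not monotonic; periodic tasks may stall "
            "when the system time is changed" % (hub.clock,))


def hub_is_safe(hub):
    """Return True if the assertion passes, False if it raises."""
    try:
        assert_hub_uses_monotonic_clock(hub)
    except RuntimeError:
        return False
    return True
```

Failing fast at start-up makes a wrong import order visible immediately,
which is the point of asserting rather than relying on unit tests.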
** Changed in: masakari
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1510234
Title:
Heartbeats stop when time is changed
Status in masakari:
Fix Released
Status in OpenStack Compute (nova):
Fix Released
Status in oslo.service:
Fix Released
Bug description:
Heartbeats stop working when you mess with the system time. If a
monotonic clock were used, they would continue to work when the system
time was changed.
Steps to reproduce:
1. List the nova services ('nova-manage service list'). Note that the
'State' for each service is a happy face ':-)'.
2. Move the time ahead (for example 2 hours in the future), and then
list the nova services again. Note that heartbeats continue to work
and use the future time (see 'Updated_At').
3. Revert back to the actual time, and list the nova services again.
Note that all heartbeats stop, and have a 'State' of 'XXX'.
4. The heartbeats will start again in 2 hours when the actual time
catches up to the future time, or if you restart the services.
5. You'll see a log message like the following when the heartbeats
stop:
2015-10-26 17:14:10.538 DEBUG nova.servicegroup.drivers.db [req-c41a2ad7-e5a5-4914-bdc8-6c1ca8b224c6 None None] Seems service is down. Last heartbeat was 2015-10-26 17:20:20. Elapsed time is -369.461679 from (pid=13994) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:80
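
The negative elapsed time in that log message can be reproduced with a
little arithmetic. This is a sketch, not the nova driver code: it assumes
the liveness check compares the absolute elapsed time against a threshold,
that the threshold is 60 seconds, and that the log timestamp's microseconds
were 538321 (inferred from the reported elapsed value). The function names
are hypothetical.

```python
from datetime import datetime

# Assumed threshold in seconds; the real value is a deployment setting.
SERVICE_DOWN_TIME = 60


def elapsed_seconds(last_heartbeat, now):
    """Seconds since the last heartbeat; negative if it lies in the future."""
    return (now - last_heartbeat).total_seconds()


def is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Treat the service as up only if the heartbeat is close to `now`."""
    return abs(elapsed_seconds(last_heartbeat, now)) <= down_time


# Values taken from the log message; microseconds are an assumption.
last_heartbeat = datetime(2015, 10, 26, 17, 20, 20)
now = datetime(2015, 10, 26, 17, 14, 10, 538321)

elapsed = elapsed_seconds(last_heartbeat, now)
service_up = is_up(last_heartbeat, now)
```

Because the last heartbeat was written while the clock was set ahead, it is
roughly six minutes in the future relative to the reverted time, so the
service is reported as down.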
Here's example output demonstrating the issue:
http://paste.openstack.org/show/477404/
See bug #1450438 for more context:
https://bugs.launchpad.net/oslo.service/+bug/1450438
Long story short: looping call is using the built-in time rather than
a monotonic clock for sleeps.
https://github.com/openstack/oslo.service/blob/3d79348dae4d36bcaf4e525153abf74ad4bd182a/oslo_service/loopingcall.py#L122
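
To illustrate why the choice of clock matters for sleeps, here is a small
simulation. It is not the oslo.service or eventlet implementation; it
assumes only that a sleep is implemented as an absolute deadline computed
from some clock. `FakeClock`, `TimerQueue` and `simulate` are hypothetical
names.

```python
class FakeClock:
    """Hypothetical controllable clock used only for this illustration."""

    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class TimerQueue:
    """Minimal timer that stores an absolute deadline read from `clock`."""

    def __init__(self, clock):
        self.clock = clock
        self.deadline = None

    def schedule(self, delay):
        self.deadline = self.clock() + delay

    def seconds_until_due(self):
        return max(0.0, self.deadline - self.clock())


def simulate(step_forward, step_back, interval=10.0):
    """Return (wall, monotonic) seconds until the next heartbeat fires."""
    wall = FakeClock(1000.0)   # adjustable system time
    mono = FakeClock(0.0)      # unaffected by system time changes
    wall_timer = TimerQueue(wall)
    mono_timer = TimerQueue(mono)

    wall.advance(step_forward)      # operator moves the time ahead
    wall_timer.schedule(interval)   # next heartbeat is scheduled
    mono_timer.schedule(interval)
    wall.advance(-step_back)        # operator reverts the time

    return wall_timer.seconds_until_due(), mono_timer.seconds_until_due()
```

With a two-hour step forward and back, `simulate(7200, 7200)` gives a wait
of 7210 seconds for the wall-clock timer and 10 seconds for the monotonic
one, which matches the two-hour stall described in the steps to reproduce.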
Oslo Service: version 0.11
Nova: master (commit 2c3f9c339cae24576fefb66a91995d6612bb4ab2)