yahoo-eng-team team mailing list archive
Message #67309
[Bug 1510234] Re: Heartbeats stop when time is changed
Reviewed: https://review.openstack.org/434327
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a7505ee648b22f5827117bcc1192b5654c29cf9c
Submitter: Jenkins
Branch: master
commit a7505ee648b22f5827117bcc1192b5654c29cf9c
Author: Roman Podoliaka <rpodolyaka@xxxxxxxxxxxx>
Date: Wed Feb 15 16:47:42 2017 +0200
Make eventlet hub use a monotonic clock
If system time is adjusted first forward and then backward while a
nova service is running (e.g. nova-compute), then there is a high
probability that periodic tasks will stall for a period equal to the
size of the backward adjustment.
This was supposed to be fixed by an oslo.service patch
(https://review.openstack.org/#/c/286838/), but the order of imports
differs between unit tests and production code, so nova services
end up starting with the default eventlet hub, which does not use a
monotonic clock and is therefore affected by changes of system time.
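Why import order matters can be illustrated with a lazily created singleton. As we understand it (treat this as an assumption), the oslo.service patch points eventlet's `EVENTLET_HUB` environment variable at a hub factory that installs a monotonic clock, and eventlet consults that setting only when the hub is first created. The sketch below does not use eventlet; `SKETCH_HUB`, `get_hub`, `import_fixing_library`, and `start_service` are hypothetical stand-ins.

```python
import os
import time

_hub = None  # process-wide hub singleton, created lazily (hypothetical)


def get_hub() -> dict:
    """Create the hub on first use; the environment is read only at that moment."""
    global _hub
    if _hub is None:
        monotonic = os.environ.get("SKETCH_HUB") == "monotonic"
        _hub = {"clock": time.monotonic if monotonic else time.time}
    return _hub


def import_fixing_library() -> None:
    """Stand-in for the import-time side effect of the library that fixes the hub."""
    os.environ["SKETCH_HUB"] = "monotonic"


def start_service(fix_imported_first: bool):
    """Simulate one process start and return the clock the hub ended up with."""
    global _hub
    _hub = None
    os.environ.pop("SKETCH_HUB", None)
    if fix_imported_first:
        import_fixing_library()
        clock = get_hub()["clock"]
    else:
        clock = get_hub()["clock"]  # hub initialized too early...
        import_fixing_library()     # ...so the fix has no effect
    return clock


print(start_service(fix_imported_first=True) is time.monotonic)   # True
print(start_service(fix_imported_first=False) is time.monotonic)  # False
```

The same "fix" code runs in both cases; only the order decides which clock the hub keeps, which is why a test suite that happens to import things in the right order never sees the bug.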
Testing this is problematic, as it depends on the order of imports and
is not reproducible in functional or unit tests (oslo_service is always
imported before the eventlet hub is initialized, so it just does
"the right thing"). The alternative is to make an assertion when
services start.
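A startup assertion of this kind might look like the sketch below. It assumes the hub exposes the callable it schedules with as a `clock` attribute (hypothetical here; check the attribute name in the eventlet version in use), and `assert_hub_uses_monotonic_clock` and `FakeHub` are illustrative names, not nova's code.

```python
import time


def assert_hub_uses_monotonic_clock(hub) -> None:
    """Fail fast at service start if the hub's clock follows the wall clock.

    Assumes `hub.clock` is the callable used for scheduling timers
    (a hypothetical attribute name for this sketch).
    """
    clock = getattr(hub, "clock", None)
    if clock is not time.monotonic:
        raise RuntimeError(
            "event loop hub is not using a monotonic clock; periodic tasks "
            "will be affected by changes of system time"
        )


class FakeHub:
    """Minimal stand-in for an event loop hub (hypothetical)."""

    def __init__(self, clock) -> None:
        self.clock = clock


assert_hub_uses_monotonic_clock(FakeHub(time.monotonic))  # passes silently
try:
    assert_hub_uses_monotonic_clock(FakeHub(time.time))
except RuntimeError as exc:
    print("refusing to start:", exc)
```

The check compares identity rather than sampling the clock, since a few readings cannot prove that a clock will never jump backward.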
Closes-Bug: #1510234
Change-Id: I110cf31ad2a0c74a0cf30ec08bd94d3a56727b39
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1510234
Title:
Heartbeats stop when time is changed
Status in OpenStack Compute (nova):
Fix Released
Status in oslo.service:
Fix Released
Bug description:
Heartbeats stop working when the system time is changed. If a
monotonic clock were used, they would keep working regardless of
changes to the system time.
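A minimal simulation of the failure mode, assuming a hub-style scheduler that fires a timer once `clock() >= deadline` and schedules the next run as `clock() + interval`. `FakeClock`, `fires_at`, and the numbers are hypothetical; the 2-hour jump mirrors the reproduction steps.

```python
class FakeClock:
    """Manually driven clock standing in for time.time or time.monotonic."""

    def __init__(self, start: float) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now


def fires_at(clock: FakeClock, deadline: float) -> bool:
    """A hub-style scheduler runs a timer once clock() reaches its deadline."""
    return clock() >= deadline


INTERVAL = 10.0    # hypothetical periodic-task interval, in seconds
JUMP = 2 * 3600.0  # clock moved 2 hours ahead, as in the reproduction steps

wall = FakeClock(1000.0)  # follows the system time
mono = FakeClock(50.0)    # ignores changes of system time

# The system time is moved forward; only the wall clock jumps.
wall.now += JUMP

# The next run of a periodic task is scheduled as "now + INTERVAL".
wall_deadline = wall() + INTERVAL
mono_deadline = mono() + INTERVAL

# The system time is reverted, then 60 real seconds pass.
wall.now -= JUMP
wall.now += 60.0
mono.now += 60.0

print(fires_at(wall, wall_deadline))  # False: the task is stuck
print(fires_at(mono, mono_deadline))  # True
print(wall_deadline - wall())         # 7150.0 seconds still to wait
```

The wall-clock timer only fires once real time catches up with the jump, which matches the roughly 2-hour stall described in the report.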
Steps to reproduce:
1. List the nova services ('nova-manage service list'). Note that the
'State' for each service is a happy face ':-)'.
2. Move the time ahead (for example, 2 hours into the future), and then
list the nova services again. Note that heartbeats continue to work
and use the future time (see 'Updated_At').
3. Revert to the actual time, and list the nova services again.
Note that all heartbeats stop and show a 'State' of 'XXX'.
4. The heartbeats will start again in 2 hours when the actual time
catches up to the future time, or if you restart the services.
5. You'll see a log message like the following when the heartbeats
stop:
2015-10-26 17:14:10.538 DEBUG nova.servicegroup.drivers.db
[req-c41a2ad7-e5a5-4914-bdc8-6c1ca8b224c6 None None] Seems service is down.
Last heartbeat was 2015-10-26 17:20:20. Elapsed time is -369.461679
from (pid=13994) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:80
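The negative elapsed time in the log can be checked by hand. This is a minimal sketch, assuming the driver computes `now - last_heartbeat` and treats a service as up when the absolute value is within a `service_down_time` threshold (60 s here, which we believe is nova's default); both are assumptions, not read from db.py. The log reporting the service as down despite a negative elapsed time suggests such an absolute-value comparison. The log timestamp is truncated to milliseconds, so the result differs slightly from the logged -369.461679.

```python
from datetime import datetime

SERVICE_DOWN_TIME = 60  # seconds; assumed threshold (hypothetical constant)


def elapsed_seconds(last_heartbeat: datetime, now: datetime) -> float:
    """Seconds since the last heartbeat; negative if it lies 'in the future'."""
    return (now - last_heartbeat).total_seconds()


def is_up(last_heartbeat: datetime, now: datetime) -> bool:
    # Assumed check: up only if the heartbeat is recent in either direction.
    return abs(elapsed_seconds(last_heartbeat, now)) <= SERVICE_DOWN_TIME


# Timestamps taken from the log line (log time truncated to milliseconds).
now = datetime(2015, 10, 26, 17, 14, 10, 538000)
last_heartbeat = datetime(2015, 10, 26, 17, 20, 20)

print(elapsed_seconds(last_heartbeat, now))  # -369.462
print(is_up(last_heartbeat, now))            # False
```

Because the reporting loop itself has stalled, no newer heartbeat arrives to bring the elapsed time back inside the threshold.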
Here's example output demonstrating the issue:
http://paste.openstack.org/show/477404/
See bug #1450438 for more context:
https://bugs.launchpad.net/oslo.service/+bug/1450438
Long story short: the looping call uses the system (wall-clock) time
rather than a monotonic clock for its sleeps.
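The idea behind the fix, measuring elapsed time with a monotonic clock, can be sketched as a simple fixed-interval loop. This is not oslo.service's `loopingcall` implementation; `run_fixed_interval` and `FakeClock` are hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, List


def run_fixed_interval(
    func: Callable[[], None],
    interval: float,
    iterations: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Call func every `interval` seconds and return the idle times used."""
    idles = []
    for _ in range(iterations):
        start = clock()
        func()
        elapsed = clock() - start
        idle = max(0.0, interval - elapsed)  # never sleep a negative amount
        idles.append(idle)
        sleep(idle)
    return idles


class FakeClock:
    """Manually driven clock with a matching sleep() (hypothetical helper)."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# A monotonic-like clock: each 2-second task is followed by 8 s of idle.
mono = FakeClock()
print(run_fixed_interval(lambda: mono.sleep(2.0), 10.0, 3,
                         clock=mono, sleep=mono.sleep))  # [8.0, 8.0, 8.0]

# A wall clock set back 2 hours while the task runs: the loop idles ~2 hours.
wall = FakeClock()


def task_during_revert() -> None:
    wall.now += 2.0 - 7200.0  # task takes 2 s while the clock moves back 2 h


print(run_fixed_interval(task_during_revert, 10.0, 1,
                         clock=wall, sleep=wall.sleep))  # [7208.0]
```

With the wall clock, the backward jump makes `elapsed` hugely negative, so the computed idle time grows by the size of the jump; a monotonic clock never produces that input.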
https://github.com/openstack/oslo.service/blob/3d79348dae4d36bcaf4e525153abf74ad4bd182a/oslo_service/loopingcall.py#L122
Oslo Service: version 0.11
Nova: master (commit 2c3f9c339cae24576fefb66a91995d6612bb4ab2)