yahoo-eng-team team mailing list archive
Message #01394
[Bug 1101839] Re: Don't use the local compute time when syncing
Having accurate timing in distributed systems is really important, and
time skew can cause issues (e.g., the scheduler thinking a compute node
is dead). Even if I might be tempted to blame the deployer
for not properly managing ntpd, the problem is preventable by not
relying on each compute host's local clock where possible.
** Changed in: nova
Importance: Wishlist => High
** Changed in: nova
Status: Opinion => Triaged
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1101839
Title:
Don't use the local compute time when syncing
Status in OpenStack Compute (Nova):
Triaged
Bug description:
Right now there is a strong tendency to rely on NTP-synchronized clocks
for determining whether services are up or down, especially compute
nodes. This has been problematic because the implementation is fragile:
when NTP gets slightly out of sync on any compute node, that compute
node will no longer be usable. It seems simpler to let the database
decide what "time" is, using its own internal functions such as NOW(),
and not worry about time being in sync on the other nodes.
Examples of this:
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L502
(note the time is from the caller, not from the db)... and
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L276
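The idea in the description can be sketched roughly as follows. This is only an illustration, not nova's code: the `services` table, the function names and the 60-second threshold are all hypothetical, and SQLite's `strftime('%s', 'now')` stands in for a server-side `NOW()`. Because an in-memory SQLite database runs inside the calling process, it still reads the local clock here; with a real database server, the same statements would take the time from the server instead.

```python
import sqlite3
import time

SERVICE_DOWN_TIME = 60  # seconds; hypothetical threshold for this sketch


def make_db():
    """Create an in-memory database with a hypothetical services table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE services (host TEXT PRIMARY KEY, updated_at INTEGER)"
    )
    return conn


def heartbeat_caller_clock(conn, host, caller_now):
    """Fragile variant: the caller supplies the timestamp from its own clock."""
    conn.execute(
        "INSERT OR REPLACE INTO services (host, updated_at) VALUES (?, ?)",
        (host, int(caller_now)),
    )


def heartbeat_db_clock(conn, host):
    """The timestamp is computed inside the SQL statement, so the caller's
    clock never enters the stored value."""
    conn.execute(
        "INSERT OR REPLACE INTO services (host, updated_at) "
        "VALUES (?, CAST(strftime('%s', 'now') AS INTEGER))",
        (host,),
    )


def is_up(conn, host, max_age=SERVICE_DOWN_TIME):
    """Compare the stored heartbeat against the database's own 'now'."""
    row = conn.execute(
        "SELECT CAST(strftime('%s', 'now') AS INTEGER) - updated_at <= ? "
        "FROM services WHERE host = ?",
        (max_age, host),
    ).fetchone()
    return bool(row and row[0])


if __name__ == "__main__":
    conn = make_db()
    # A host whose clock runs two minutes slow reports a stale timestamp.
    heartbeat_caller_clock(conn, "compute-skewed", time.time() - 120)
    heartbeat_db_clock(conn, "compute-ok")
    print("compute-skewed up:", is_up(conn, "compute-skewed"))
    print("compute-ok up:", is_up(conn, "compute-ok"))
```

In the first variant, a host whose clock is two minutes slow looks dead as soon as it reports in. In the second, both the write and the liveness check use the same clock, so skew on the reporting host no longer matters.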