yahoo-eng-team team mailing list archive

[Bug 1101839] Re: Don't use the local compute time when syncing

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Devananda van der Veen <1101839@xxxxxxxxxxxxxxxxxx>
Date: Tue, 19 Mar 2013 19:02:18 -0000
Reply-to: Bug 1101839 <1101839@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Accurate timing matters in distributed systems, and clock skew can cause
real problems (e.g., the scheduler deciding that a compute node is dead).
Even if I might be tempted to blame the deployer for not properly managing
ntpd, the problem is preventable by not relying on each compute host's
local clock where possible.
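
To make the failure mode concrete, here is a minimal sketch of a
heartbeat liveness check. The names (SERVICE_DOWN_TIME, is_up) and the
60-second threshold are assumptions chosen for illustration, not
necessarily what nova does; the point is only that a heartbeat stamped by
a skewed local clock can look stale the moment it is written.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: a service is considered down if its last
# heartbeat is older than this.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(last_heartbeat, now):
    """Return True if the heartbeat is within the allowed window of 'now'."""
    return abs(now - last_heartbeat) <= SERVICE_DOWN_TIME


# The clock of the node that evaluates liveness (e.g. the scheduler host).
controller_now = datetime(2013, 3, 19, 19, 0, 0, tzinfo=timezone.utc)

# Assumed skew: the compute node's clock runs 90 seconds behind.
skew = timedelta(seconds=90)

# Heartbeat stamped with the compute node's own (skewed) clock.
heartbeat_local_clock = controller_now - skew

# Heartbeat stamped by a single shared clock (e.g. the database).
heartbeat_shared_clock = controller_now

skewed_result = is_up(heartbeat_local_clock, controller_now)
shared_result = is_up(heartbeat_shared_clock, controller_now)

print("local clock heartbeat looks up: ", skewed_result)
print("shared clock heartbeat looks up:", shared_result)
```

With the assumed 90-second skew, the node that just reported in is
already judged dead, even though nothing is wrong with it other than its
clock.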

** Changed in: nova
   Importance: Wishlist => High

** Changed in: nova
       Status: Opinion => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1101839

Title:
  Don't use the local compute time when syncing

Status in OpenStack Compute (Nova):
  Triaged

Bug description:
  Right now there is a strong tendency to rely on NTP for determining if
  services are up or down, especially compute nodes. This has been
  problematic since it is very fragile in its implementation (i.e., when
  NTP gets slightly out of sync on any compute node, that compute node
  is no longer usable). It seems simpler to let the database decide what
  "time" is, using its own internal functions such as NOW(), and not
  worry about time being in sync on the other nodes.

  Examples of this:

  https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L502
  (note the time is from the caller, not from the db)... and
  https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L276
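
A rough sketch of the idea, assuming a simplified "services" table
with hypothetical column names: the caller never supplies a timestamp, and
both the write and the liveness comparison use the database's clock. It
uses Python's sqlite3 module so it is self-contained, which means
CURRENT_TIMESTAMP and julianday('now') stand in for MySQL's NOW(). This is
not the nova schema or the nova code path linked above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE services (host TEXT PRIMARY KEY, updated_at TEXT)"
)


def report_heartbeat(conn, host):
    """Record a heartbeat; the database, not the caller, assigns the time."""
    conn.execute(
        "INSERT OR REPLACE INTO services (host, updated_at) "
        "VALUES (?, CURRENT_TIMESTAMP)",
        (host,),
    )


def is_up(conn, host, max_age_seconds=60):
    """Compare the stored heartbeat against the database's own 'now'."""
    row = conn.execute(
        "SELECT (julianday('now') - julianday(updated_at)) * 86400.0 <= ? "
        "FROM services WHERE host = ?",
        (max_age_seconds, host),
    ).fetchone()
    # No row means the host never reported in, so treat it as down.
    return bool(row and row[0])


report_heartbeat(conn, "compute-1")
print(is_up(conn, "compute-1"))
```

Because both sides of the comparison come from the same clock, skew on
the reporting host no longer affects the result. The trade-off is that
the check now depends on database-specific time functions.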

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1101839/+subscriptions