touch-packages team mailing list archive
Message #122811
[Bug 1125726] Re: boot-time race between /etc/network/if-up.d/ntpdate and "/etc/init.d/ntp start"
** Description changed:
- We're seeing a race between if-up.d/ntpdate and the ntp startup script.
+ [Impact]
+ * Hardware clocks are not stepped at boot, which can prevent NTP from ever
+ syncing the clock.
+ Incorrect clocks can cause serious issues in distributed systems.
- 1) if-up.d/ntpdate starts.
- 2) if-up.d/ntpdate acquires the lock "/var/lock/ntpdate-ifup".
- 3) if-up.d/ntpdate stops the ntp service [which isn't running anyway].
- 4) if-up.d/ntpdate starts running ntpdate, which binds UDP *.ntp
- 5) /etc/init.d/rc 2 executes "/etc/rc2.d/S20ntp start"
- 6) /etc/init.d/ntp acquires the lock "/var/lock/ntpdate".
- 7) /etc/init.d/ntp starts the ntp daemon.
- 8) The ntp daemon logs an error, complaining that it cannot bind UDP *.ntp.
- 9) if-up.d/ntpdate now starts the ntp service.
+ * Upstream originally added a lock file to eliminate a race between the ntp
+ service (which keeps the clock synchronized during normal operation) and
+ ntpdate (which is used to step the clock by large intervals at boot time).
+ That change had a flaw which introduced a deadlock. An Ubuntu patch was
+ applied which broke the locking mechanism entirely, reintroducing the race
+ condition.
- The result is a weird churn, though ntpd does end up running at the end.
+ * This change undoes the Ubuntu patch and fixes the deadlock by unlocking
+ before attempting to start the ntp service.
- Should these not be using the same lock file?
+ [Test Case]
+
+ * There are two bugs: the race and the deadlock. To reproduce the race more
+ consistently:
+ - add 'sleep 30' to '/etc/network/if-up.d/ntpdate' on the line preceding
+ '/usr/sbin/ntpdate-debian -s $OPTS 2>/dev/null || :', and comment out
+ 'invoke-rc.d --quiet $service stop >/dev/null 2>&1 || true'. This will
+ reproduce the case where the ntp service starts between the stop command
+ and the ntpdate command.
+ The result will be that the ntpdate command fails. There will be a
+ message in syslog like:
+ 'ntpdate[17660]: the NTP socket is in use, exiting'
+ - Reintroducing the lock brings back the deadlock issue. Both the ntpdate
+ if-up.d script and the ntp init script check the lock file, but the
+ ntpdate script attempted to start the ntp init script before unlocking
+ the lock. Moving the unlock before the init script invocation fixes
+ the deadlock. The original deadlock behavior is described here:
+ https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/246203
+
+ [Regression Potential]
+
+ * Low. Out-of-sync clocks could be changed by a large amount at boot time,
+ but only for machines with static IPs. The clock is only likely to be in this
+ state if the clock was very skewed at boot time, which is also unlikely
+ since NTP usually keeps the software clock in sync during operation and
+ the hardware clock is updated at shutdown.
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to ntp in Ubuntu.
https://bugs.launchpad.net/bugs/1125726
Title:
boot-time race between /etc/network/if-up.d/ntpdate and
"/etc/init.d/ntp start"
Status in ntp package in Ubuntu:
Fix Released
Status in ntp source package in Precise:
New
Status in ntp source package in Trusty:
New
Bug description:
[Impact]
* Hardware clocks are not stepped at boot, which can prevent NTP from ever
syncing the clock.
Incorrect clocks can cause serious issues in distributed systems.
* Upstream originally added a lock file to eliminate a race between the ntp
service (which keeps the clock synchronized during normal operation) and
ntpdate (which is used to step the clock by large intervals at boot time).
That change had a flaw which introduced a deadlock. An Ubuntu patch was
applied which broke the locking mechanism entirely, reintroducing the race
condition.
* This change undoes the Ubuntu patch and fixes the deadlock by unlocking
before attempting to start the ntp service.
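The lock ordering described above can be sketched in shell. This is a hypothetical illustration, not the actual Ubuntu patch: it swaps in `flock(1)` from util-linux for whatever locking the real `if-up.d/ntpdate` and `/etc/init.d/ntp` scripts use, and `hook`, `ntp_start`, `log` and the temporary lock file are stand-ins that only record events. The stand-in init script waits at most 2 seconds (`-w 2`) for the lock so that the deadlock shows up as a failure instead of a hang.

```shell
#!/bin/sh
# Hypothetical sketch of the lock ordering in the if-up.d hook.
# flock(1) from util-linux stands in for the real scripts' locking;
# the functions below only record events and never touch the real ntp.
set -eu

LOCK=$(mktemp)                  # hypothetical lock shared by both scripts
trap 'rm -f "$LOCK"' EXIT
LOG=""

log() { LOG="${LOG:+$LOG }$1"; }    # append an event name to LOG

# Stand-in for "/etc/init.d/ntp start": it takes the same lock first.
# A real init script would wait forever; -w 2 gives up after 2 seconds.
ntp_start() {
    if flock -w 2 "$LOCK" true; then log start; else log deadlock; fi
}

# Stand-in for /etc/network/if-up.d/ntpdate; $1 is "buggy" or "fixed".
hook() {
    LOG=""
    exec 9>"$LOCK"              # open the shared lock file on fd 9
    flock 9                     # serialize with the init script
    log stop                    # invoke-rc.d --quiet $service stop
    log ntpdate                 # /usr/sbin/ntpdate-debian -s $OPTS
    if [ "$1" = fixed ]; then
        flock -u 9              # the fix: release the lock first,
        ntp_start               # then start the ntp service
    else
        ntp_start               # lock still held: the start cannot proceed
        flock -u 9
    fi
    exec 9>&-                   # close fd 9
    echo "$LOG"
}
```

With this sketch, `hook fixed` should print `stop ntpdate start`, while `hook buggy` should print `stop ntpdate deadlock` after the 2-second timeout, because the stand-in init script is started while the hook still holds the lock it needs.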

[Test Case]
* There are two bugs: the race and the deadlock. To reproduce the race more
consistently:
- add 'sleep 30' to '/etc/network/if-up.d/ntpdate' on the line preceding
'/usr/sbin/ntpdate-debian -s $OPTS 2>/dev/null || :', and comment out
'invoke-rc.d --quiet $service stop >/dev/null 2>&1 || true'. This will
reproduce the case where the ntp service starts between the stop command
and the ntpdate command.
The result will be that the ntpdate command fails. There will be a
message in syslog like:
'ntpdate[17660]: the NTP socket is in use, exiting'
- Reintroducing the lock brings back the deadlock issue. Both the ntpdate
if-up.d script and the ntp init script check the lock file, but the
ntpdate script attempted to start the ntp init script before unlocking
the lock. Moving the unlock before the init script invocation fixes
the deadlock. The original deadlock behavior is described here:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/246203
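The race from the first test case can also be simulated without root. In this hypothetical sketch, an atomic `mkdir` of a marker directory stands in for binding UDP port 123 (whoever creates it first owns the "socket"), `flock(1)` from util-linux stands in for the shared lock, and `sleep 1` plays the role of the injected `sleep 30`; the stop step is omitted, as in the test case. `demo` and its variables are illustrative names, not part of the ntp package.

```shell
#!/bin/sh
# Hypothetical simulation of the boot-time race. An atomic mkdir of a marker
# directory stands in for binding UDP port 123, and flock(1) from util-linux
# stands in for the shared lock. Nothing here touches the real ntp service.
set -eu

# $1 is "unlocked" (broken locking) or "locked" (a lock shared by both scripts).
demo() {
    mode=$1
    dir=$(mktemp -d)
    sock="$dir/udp-123"             # exists while someone "owns" the socket
    lock="$dir/ntpdate.lock"        # hypothetical shared lock file

    # Stand-in for "/etc/init.d/ntp start": ntpd binds the socket and keeps it.
    ntp_start() {
        if [ "$mode" = locked ]; then
            flock "$lock" mkdir "$sock" 2>/dev/null || true
        else
            mkdir "$sock" 2>/dev/null || true
        fi
    }

    exec 9>"$lock"
    if [ "$mode" = locked ]; then flock 9; fi   # hook takes the lock first
    ntp_start &                     # init script runs concurrently
    sleep 1                         # widens the window, like 'sleep 30'

    # Stand-in for ntpdate: bind briefly, step the clock, release the socket.
    if mkdir "$sock" 2>/dev/null; then
        rmdir "$sock"
        result="ntpdate ok"
    else
        result="the NTP socket is in use"
    fi

    flock -u 9                      # release the lock before ntp may start
    exec 9>&-
    wait                            # let the stand-in ntpd finish
    rm -rf "$dir"
    echo "$mode: $result"
}
```

Here `demo unlocked` should print `unlocked: the NTP socket is in use`, mirroring the syslog message above, while `demo locked` should print `locked: ntpdate ok`, because the stand-in init script blocks on the lock until ntpdate has finished and the hook has released it.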

[Regression Potential]
* Low. Out-of-sync clocks could be changed by a large amount at boot time,
but only for machines with static IPs. The clock is only likely to be in this
state if the clock was very skewed at boot time, which is also unlikely
since NTP usually keeps the software clock in sync during operation and
the hardware clock is updated at shutdown.