group.of.nepali.translators team mailing list archive

[Bug 1706818] Re: mismatched file locking since 1:4.2.8p4+dfsg-3ubuntu1 causes race leaving ntp dead on reboot

To: group.of.nepali.translators@xxxxxxxxxxxxxxxxxxx
From: Launchpad Bug Tracker <1706818@xxxxxxxxxxxxxxxxxx>
Date: Thu, 07 Sep 2017 09:09:00 -0000
Reply-to: Bug 1706818 <1706818@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
This bug was fixed in the package ntp - 1:4.2.8p10+dfsg-5ubuntu2

---------------
ntp (1:4.2.8p10+dfsg-5ubuntu2) artful; urgency=medium

  * d/ntp-systemd-wrapper: protect systemd service startup from concurrent
    ntpdate processes the same way it was protected on sysv-init (LP: #1706818)

 -- Christian Ehrhardt <christian.ehrhardt@xxxxxxxxxxxxx>  Tue, 05 Sep 2017 15:09:08 +0200

** Changed in: ntp (Ubuntu Artful)
       Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1706818

Title:
  mismatched file locking since 1:4.2.8p4+dfsg-3ubuntu1 causes race
  leaving ntp dead on reboot

Status in ntp package in Ubuntu:
  Fix Released
Status in ntp source package in Xenial:
  Confirmed
Status in ntp source package in Zesty:
  Fix Released
Status in ntp source package in Artful:
  Fix Released
Status in ntp package in Debian:
  Fix Released

Bug description:
  [Impact]

   * The locks used by the ntpdate ifup hook and by the ntp service start do
     not match, so installing ntpdate can hamstring the start of ntp at
     boot.

   * The change backports what Debian added later and what we merged in
     Zesty. It does two things:
     1. it makes the lock paths actually match
     2. it drops the use of lockfile-progs, which was never a dependency,
        and uses flock directly.
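
  To illustrate the two points above, here is a minimal sketch of two
  scripts that wait on the same lock path using flock directly. The
  function names and the temporary lock file are assumptions made for
  the demo; they are not the paths or code of the packaged scripts.

```shell
# Sketch only: LOCKFILE is a throwaway demo path, not the packaged one.
LOCKFILE=$(mktemp)
LOG=$(mktemp)

# Stand-in for the ntpdate ifup hook: hold the lock while "ntpdate" runs.
hook_side() {
    (
        flock -x 9
        echo "hook: start" >>"$LOG"
        sleep 1                      # pretend ntpdate owns the NTP port
        echo "hook: done" >>"$LOG"
    ) 9>"$LOCKFILE"
}

# Stand-in for the service start: wait on the SAME path before starting.
service_side() {
    (
        flock -w 180 9
        echo "service: start" >>"$LOG"
    ) 9>"$LOCKFILE"
}

hook_side &
# Wait until the hook really holds the lock before starting the service.
until grep -q "hook: start" "$LOG"; do sleep 0.1; done
service_side
wait
cat "$LOG"   # the service line comes only after "hook: done"
```

  Because both sides open the same file, the service start is held back
  until the hook has released the lock.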
     
  [Test Case]

   * Prep
     - Take a Xenial VM (to avoid all the time set rejects in a container 
       from cluttering the view)
     - Install ntp
     - Check status of ntp
     - Reboot the VM
     - Check status of ntp
   # Until now all should be good
   * Break it
     - Install ntpdate
     - Reboot
     - Check status of ntp
       - It has (likely) failed with "blocked known address being busy"
     - This is somewhat of a race; if you can't reproduce it, adding extra
       network devices to your guest in libvirt increases the chance.
   * Fix it
     - Install the fix from proposed (or the PPA in c#14)
     - Reboot
     - ntp is now running correctly after reboot

  [Regression Potential]

   * It was locking before as well, just on a lock that was never
     contended, and potentially failing because the lockfile-progs calls
     were not available. Due to the change, the init of ntp can now take
     longer (until the ntpdate calls are out of the way).
   * As a fallback in case locking goes wrong in unexpected ways, the
     return code of the flock call (timeout 180s) is intentionally not
     checked. That way ntp still tries to initialize in those cases, and
     if it fails because an ntpdate is blocking the port, it didn't "lose"
     anything by being stalled.
     Therefore I'd consider the actual regression potential rather low
     and safe.
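
  The fallback behaviour can be sketched as follows. This is an
  illustration under assumptions, not the wrapper itself: the lock file
  is a temporary demo path, start_service is a hypothetical function,
  and the timeout is shortened from 180s to 1s so the demo finishes
  quickly.

```shell
# Sketch only: demo lock path and a 1s timeout instead of 180s.
LOCKFILE=$(mktemp)

# A "stuck" holder keeps the lock longer than the service is willing to wait.
( flock -x 9; sleep 3 ) 9>"$LOCKFILE" &
holder=$!

# Poll until the holder really owns the lock (flock -n fails once it is taken).
while flock -n "$LOCKFILE" true; do sleep 0.1; done

start_service() {
    (
        # The return code is deliberately ignored: on timeout we fall
        # through and still try to start the daemon.
        flock -w 1 9
        echo "service: start attempted"
    ) 9>"$LOCKFILE"
}

result=$(start_service)
echo "$result"
wait "$holder"
```

  Even though the lock could not be taken in time, the start is still
  attempted, so a misbehaving lock only delays ntp instead of preventing
  its start.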

  [Other Info]
   
   * This is kind of a bug-zombie, fixed in Zesty but resurrected in Debian 
     (and Ubuntu by our merge) due to the addition of a native systemd 
     service. Now that Dev is finally (again) good it is time to tackle the 
     Xenial SRU.

  ---

  ntpdate and ntp conflict on the NTP well-known-socket. If ntp and
  ntpdate 1:4.2.8p4+dfsg-3ubuntu5.5 are installed on Xenial, and there
  are 2 static interfaces configured, most often we find that ntpd is
  not running after a reboot.

  When the ntp service is started by systemd, ntp fails to bind the NTP
  socket because ntpdate is running in the background. ntp and ntpdate
  are meant to avoid this conflict with a lock file, but the locking
  mechanism was changed in ntpdate.if-up (from lockfile to flock) and
  not in ntp.init. Previously the file locking prevented ntp from trying
  to start while ntpdate was running. Not any more.

  Having multiple interfaces causes a much longer period of the socket
  being unavailable, because the 2 ntpdate processes will get serialized
  by the lock, while the ntp service is looking for a different lock, so
  it just plows right in.  Attempts by ntpdate.if-up to stop and start
  ntp seem to overlap, and when the final start is invoked, systemd seems
  to think ntp is already running, though it has failed.
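
  The mismatch described above can be reproduced in miniature. In this
  sketch (all names and paths are made up for the demo) two hooks
  serialize on one lock file while the service waits on a different one,
  so the service starts while the port is still "busy".

```shell
# Sketch only: both lock paths are temporary demo files.
HOOK_LOCK=$(mktemp)      # lock the two hooks agree on
SERVICE_LOCK=$(mktemp)   # different lock the service looks at
LOG=$(mktemp)

hook() {                 # $1 = interface name (hypothetical)
    (
        flock -x 9
        echo "hook $1: port busy" >>"$LOG"
        sleep 2          # pretend ntpdate owns the NTP port
        echo "hook $1: port free" >>"$LOG"
    ) 9>"$HOOK_LOCK"
}

service() {
    (
        flock -w 180 9   # never contended, so this returns at once
        echo "service: start" >>"$LOG"
    ) 9>"$SERVICE_LOCK"
}

hook eth0 &
hook eth1 &
# Start the service only once one hook is known to hold its lock.
until grep -q "port busy" "$LOG"; do sleep 0.1; done
service
wait
cat "$LOG"
```

  The service line lands right after the first "port busy" line, before
  either hook has finished, which is the situation in which the real ntp
  fails to bind its socket.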

  In 1:4.2.8p4+dfsg-3ubuntu1 the following change was made:
    debian/ntpdate.if-up: Drop lockfile mechanism as upstream is using flock now.
  This looks like it corresponds to rev 371 of debian/ntpdate.if-up from upstream.

  This change made the locking diverge between ntpdate.if-up and ntp.init.
  This was rectified in rev 451 of ntp.init, to use compatible locking, but
  that doesn't appear in the Ubuntu version.

  System Information:

   lsb_release -rd:
     Description:    Ubuntu 16.04.2 LTS
     Release:        16.04

   apt-cache policy ntpdate:

     ntpdate:
       Installed: 1:4.2.8p4+dfsg-3ubuntu5.5
       Candidate: 1:4.2.8p4+dfsg-3ubuntu5.5
       Version table:
      *** 1:4.2.8p4+dfsg-3ubuntu5.5 100
             100 /var/lib/dpkg/status

   apt-cache policy ntp:

     ntp:
       Installed: 1:4.2.8p4+dfsg-3ubuntu5.5
       Candidate: 1:4.2.8p4+dfsg-3ubuntu5.5
       Version table:
      *** 1:4.2.8p4+dfsg-3ubuntu5.5 100
             100 /var/lib/dpkg/status

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1706818/+subscriptions