group.of.nepali.translators team mailing list archive

Thread
Date
[Bug 1641875] Re: Scheduled events (e.g. LDAP connection retries) are affected by clock adjustments

To: group.of.nepali.translators@xxxxxxxxxxxxxxxxxxx
From: Launchpad Bug Tracker <1641875@xxxxxxxxxxxxxxxxxx>
Date: Mon, 06 Mar 2017 19:19:38 -0000
Reply-to: Bug 1641875 <1641875@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
This bug was fixed in the package sssd - 1.13.4-1ubuntu1.2

---------------
sssd (1.13.4-1ubuntu1.2) xenial; urgency=medium

  * d/p/restart_providers_on_timeshift.patch: Implement watchdog and
    use SIGUSR2 after watchdog detects time shift to execute pending
    scheduled tasks that could be stuck (LP: #1641875)

 -- Victor Tapia <victor.tapia@xxxxxxxxxxxxx>  Thu, 23 Feb 2017 11:46:31
+0100

** Changed in: sssd (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1641875

Title:
  Scheduled events (e.g. LDAP connection retries) are affected by clock
  adjustments

Status in sssd package in Ubuntu:
  Fix Released
Status in sssd source package in Trusty:
  Fix Committed
Status in sssd source package in Xenial:
  Fix Released
Status in sssd source package in Yakkety:
  Fix Released
Status in sssd source package in Zesty:
  Fix Released

Bug description:
  [Impact]

   * SSSD is affected by clock shifts because it's built on libtevent, that doesn't use monotonic clocks. 
   * After an event has been scheduled, if the time drifts to the past SSSD won't recover and the event will have to wait the shifted time to be executed.

  [Test Case]

   * Modify the /etc/hosts file to force a failed resolution and restart sssd.
   * Within the first ~1 minute retry window, change the date (date --set) to a previous time (e.g. 2h).
   * "Tail" the log file. The event will be executed at the original schedule, past those 2h.

  [Regression Potential]

   * None

  [Other Info]

   * Version 1.15 implements a watchdog that detects time shifts and resets itselft but doesn't reschedule the scheduled events.
   * There's a network monitor based on netlink that detects interface changes and restarts the providers, rescheduling the scheduled events.
   * Such restart can be triggered with SIGUSR2. Sending that signal after the watchdog restarts fixes this issue.
   * Upstream bug: https://fedorahosted.org/sssd/ticket/3285 

  [Original Description]

  When SSSD fails to connect to a provider (LDAP, for instance) it
  creates a timed event with tevent_add_timer() in order to retry in ~1
  min. Tevent relies on CLOCK_REALTIME, using absolute epoch time, so
  when the time changes (e.g. NTP sync) the scheduled event is affected.

  Reproducer:

  1. Modify the /etc/hosts file to force a failed resolution and restart
  sssd

  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ldap' in DNS
  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [resolv_gethostbyname_done] (0x0040): querying hosts database failed [5]: Input/output error
  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
  ...
  (Tue Oct 25 09:37:14 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

  2. Within the ~1 minute window, change the date (date --set) to a
  previous time (2 hour in my example). Note: if /etc/resolv.conf or a
  network interface is modified, SSSD providers will restart, but the
  scheduled retry will be kept.

  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [recreate_ares_channel] (0x0100): Destroying the old c-ares channel
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
  ...
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
  ...
  (Tue Oct 25 07:41:46 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

  3. Once it gets to the programmed date (2 hours later), it retries:

  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'name not resolved'
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_set_port_status] (0x0100): Marking port 389 of server 'ldap' as 'neutral'
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP'
  ...
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'ldap': Could not contact DNS servers
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [set_server_common_status] (0x0100): Marking server 'ldap' as 'not working'
  ...
  (Tue Oct 25 09:38:25 2016) [sssd[be[openstacklocal]]] [check_online_callback] (0x0100): Backend returned: (1, 0, <NULL>) [Provider is Offline]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1641875/+subscriptions