yahoo-eng-team team mailing list archive

[Bug 1997124] Re: Netplan/Systemd/Cloud-init/Dbus Race

 

The dbus race here arises because `netplan apply` runs `networkctl reconfigure`[1], fails to talk to dbus, and then restarts systemd-networkd[2] at a point when systemd-networkd may itself still be coming up and is in an indeterminate state.
 

[1] https://github.com/canonical/netplan/blob/main/netplan/cli/utils.py#L116
[2] https://github.com/canonical/netplan/blob/main/netplan/cli/commands/apply.py#L277

I'm guessing this restart from netplan apply is what triggers the
occasional failure case where not all network config (like IP addresses)
is applied in systemd-networkd. It doesn't happen every time, but it is
racy: systemd-networkd is mid-startup and we restart it again via
netplan apply.

After discussion with waldi (Bastian Blank) in Debian land about the systemd dependency chain, it seems my suggestion about adding dbus.socket to cloud-init.service would actually introduce an ordering cycle, because dbus.socket is After=sysinit.target, yet cloud-init.service is Before=sysinit.target.


So, trying to shoehorn cloud-init into the dependency chain After=dbus.socket is impossible for systemd to schedule.
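
To make the cycle concrete, here is a minimal sketch that models only the three ordering constraints mentioned above as a graph and looks for a cycle. It is not how systemd resolves ordering internally; the function names (`find_cycle`, `with_proposed_edge`) are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

# Each edge reads "key must start before every unit in its list".
# Only the constraints stated in this thread are modelled.
ORDERING: Dict[str, List[str]] = {
    # cloud-init.service is Before=sysinit.target
    "cloud-init.service": ["sysinit.target"],
    # dbus.socket is After=sysinit.target
    "sysinit.target": ["dbus.socket"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one ordering cycle as a list of units, or None if there is none."""
    visiting: List[str] = []  # units on the current depth-first path
    done = set()              # units already fully explored

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Reached a unit already on the current path: that is a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in list(graph):
        cycle = visit(node)
        if cycle:
            return cycle
    return None


def with_proposed_edge(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Copy the graph and add the proposed cloud-init.service After=dbus.socket."""
    proposed = {unit: list(after) for unit, after in graph.items()}
    proposed.setdefault("dbus.socket", []).append("cloud-init.service")
    return proposed


if __name__ == "__main__":
    print(find_cycle(ORDERING))
    print(find_cycle(with_proposed_edge(ORDERING)))
```

Without the proposed edge the graph has no cycle; with it, the path cloud-init.service, sysinit.target, dbus.socket leads back to cloud-init.service, which is the cycle systemd would refuse to schedule.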


Maybe we'd want one of the following instead:
 1. `netplan apply` provides an option to avoid falling back to `networkctl reconfigure` and to exit non-zero instead, so cloud-init can do something better or retry where necessary.
 2. `netplan apply` defers, or blocks and retries, until dbus.socket/service is ready, so that only callers of netplan apply are affected.
 3. cloud-init defers calling netplan apply on systemd-networkd environments until a later boot stage (cloud-config.service), which comes after sysinit.target and can therefore expect dbus.socket to be started at that point in boot.
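
As a rough illustration of the block/retry idea in option 2, the sketch below polls for the system bus socket before invoking the command. This is not netplan's or cloud-init's actual code. The socket path `/run/dbus/system_bus_socket` is assumed to be the default location, the helper names, the timeout values and the exit code 1 are all hypothetical, and the presence of the socket file is treated as a proxy for "dbus.socket has been started".

```python
import os
import subprocess
import time
from typing import Callable, Sequence

# Assumed default location of the system bus socket.
SYSTEM_BUS_SOCKET = "/run/dbus/system_bus_socket"


def wait_for_path(
    path: str,
    timeout: float = 30.0,
    interval: float = 0.5,
    exists: Callable[[str], bool] = os.path.exists,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `path` exists; return False if `timeout` seconds pass first."""
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def apply_when_bus_ready(
    cmd: Sequence[str] = ("netplan", "apply"),
    timeout: float = 30.0,
) -> int:
    """Run `cmd` once the bus socket exists; return its exit code.

    Returns 1 (a hypothetical choice) if the socket never appears, so the
    caller can decide whether to retry later or fall back.
    """
    if not wait_for_path(SYSTEM_BUS_SOCKET, timeout=timeout):
        return 1
    return subprocess.run(list(cmd), check=False).returncode
```

Note that a caller ordered Before=sysinit.target would simply block here until the timeout, since dbus.socket cannot start before it finishes; a wait like this only helps when the caller is not itself holding up sysinit.target.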


I'll add netplan to this bug to see if there are thoughts or counter-suggestions.

** Also affects: netplan
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1997124

Title:
  Netplan/Systemd/Cloud-init/Dbus Race

Status in cloud-init:
  In Progress
Status in netplan:
  New
Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Cloud-init is seeing intermittent failures while running `netplan
  apply`, which appears to be caused by a missing resource at the time
  of call.

  The symptom in cloud-init logs looks like:

  Running ['netplan', 'apply'] resulted in stderr output: Failed to
  connect system bus: No such file or directory

  I think that this error[1] is likely caused by cloud-init running
  netplan apply too early in the boot process (before dbus is active).

  Today I stumbled upon this error which was hit in MAAS[2]. We have
  also hit it intermittently during tests (we didn't have a reproducer).

  Realizing that this may not be a cloud-init error, but possibly a
  dependency bug between dbus and systemd, we decided to file this bug
  for broader visibility to other projects.

  I will follow up this initial report with some comments from our
  discussion earlier.

  [1] https://github.com/canonical/netplan/blob/main/src/dbus.c#L801
  [2] https://discourse.maas.io/t/latest-ubuntu-20-04-image-causing-netplan-error/5970

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1997124/+subscriptions


