yahoo-eng-team team mailing list archive

[Bug 1958280] [NEW] Networking failures after NIC reordering

 

Public bug reported:

We can reliably reproduce a case where a network configuration change
for an Ubuntu 20.04 VM results in networkd hanging on "pending"
interfaces. The interfaces are pending because the interface names from
the current boot conflict with those in
/etc/netplan/50-cloud-init.yaml from the previous boot.

Specifically, the netplan generator applies the previous configuration's
names prior to running cloud-init local. We'll see something like
`systemd-udevd[228]: eth0: Failed to process device, ignoring: File exists`.
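
To make the naming conflict concrete, here is a rough sketch that compares
the names pinned in the netplan file against the names the kernel assigned
on the current boot. It assumes the netplan file uses the usual
match/macaddress plus set-name layout; the helper names netplan_names and
report_conflicts are hypothetical and not part of cloud-init or netplan.

```shell
#!/bin/sh
# Print "MAC NAME" pairs that a netplan file pins via set-name.
# Assumption: each interface stanza lists "macaddress:" before "set-name:".
netplan_names() {
    awk '
        $1 == "macaddress:" { mac = tolower($2); gsub(/"/, "", mac) }
        $1 == "set-name:"   { if (mac != "") print mac, $2; mac = "" }
    ' "$1"
}

# Report every NIC whose current kernel name differs from the pinned name.
# Note: devices that share a MAC (for example SR-IOV VFs) will also match.
report_conflicts() {
    netplan_names "$1" | while read -r mac want; do
        for dev in /sys/class/net/*; do
            [ -r "$dev/address" ] || continue
            if [ "$(cat "$dev/address")" = "$mac" ] && [ "${dev##*/}" != "$want" ]; then
                echo "$mac: currently ${dev##*/}, netplan wants $want"
            fi
        done
    done
}
```

Running `report_conflicts /etc/netplan/50-cloud-init.yaml` on an affected
guest should list the NICs whose names were swapped; no output means the
stale file and the current boot agree.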

In one scenario, the data source is able to fetch updated network
configuration, and cloud-init updates the config & udev rules just
fine. However, networking stays offline ("pending") indefinitely. It
can be forced to resolve by executing
`sudo udevadm trigger --attr-match=subsystem=net`.
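
The workaround above can be wrapped in a small check, sketched here under
the assumption that `networkctl list --no-legend` prints five columns
(IDX LINK TYPE OPERATIONAL SETUP). The helper names pending_links and
retrigger_if_pending are hypothetical.

```shell
#!/bin/sh
# Print the names of links whose SETUP column reads "pending".
# Reads `networkctl list --no-legend` style output on stdin; the
# five-column layout is an assumption about the networkctl version in use.
pending_links() {
    awk '$5 == "pending" { print $2 }'
}

# Re-trigger udev events for the net subsystem only if a link is pending.
# The udevadm command is the one given in the report.
retrigger_if_pending() {
    pending=$(networkctl list --no-legend | pending_links)
    if [ -n "$pending" ]; then
        echo "pending links: $pending"
        sudo udevadm trigger --attr-match=subsystem=net
    fi
}
```

Calling `retrigger_if_pending` from the serial console is only a manual
recovery step; it does not address the stale names that caused the hang.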

Example: Create a VM on Azure with two NICs, re-order them, then
restart.

az vm create --name test-x1 --image Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest --nics test-nic-01 test-nic-02
az vm deallocate --name test-x1
az vm nic set --vm-name test-x1 --nics test-nic-02 test-nic-01
az vm start --name test-x1

After doing that, I am unable to log in via the serial console for 20
minutes, until cloud-init times out. In this case, Azure is trying to
report ready but cannot because system networking never came up. If we
remove
/lib/systemd/system/cloud-init-local.service.d/50-azure-clear-persistent-obj-pkl.conf,
cloud-init no longer hangs the boot, but networking still fails to
initialize for the guest.

The behavior on 18.04 is a bit different. There, the renaming of the
interfaces succeeds at early boot, which instead results in the Azure
data source failing the local phase because the fallback_interface is no
longer the primary NIC (the secondary NIC, eth1, was renamed to eth0 to
match the previous boot's config).

** Affects: cloud-init
     Importance: Undecided
         Status: New

** Attachment added: "Ubuntu 20.04 nic swap logs"
   https://bugs.launchpad.net/bugs/1958280/+attachment/5555206/+files/cloud-init-u20-swap-nics.tar.gz

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1958280

Title:
  Networking failures after NIC reordering

Status in cloud-init:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1958280/+subscriptions


