yahoo-eng-team team mailing list archive
Message #88093
[Bug 1958280] [NEW] Networking failures after NIC reordering
Public bug reported:
We can reliably reproduce a case where network configuration changes for
an Ubuntu 20.04 VM result in systemd-networkd hanging on "pending"
interfaces. The interfaces are pending because the names assigned during
the current boot conflict with those in /etc/netplan/50-cloud-init.yaml
from the previous boot.
Specifically, the netplan generator applies the previous configuration's
names before cloud-init's local stage runs. We see messages such as
`systemd-udevd[228]: eth0: Failed to process device, ignoring: File exists`.
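To check whether a stale netplan file would pin names that clash with the current boot, one option is to compare its MAC-to-name pins against the kernel's current names. The sketch below is hypothetical (the helper names are not part of cloud-init or netplan). It assumes the layout cloud-init usually writes, where `macaddress:` under `match:` precedes `set-name:` in each entry, and it matches lines rather than parsing YAML.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic sketch, not part of cloud-init or netplan.
# Assumes "macaddress:" precedes "set-name:" within each ethernets entry,
# as in typical cloud-init output; this is line matching, not a YAML parser.

# Print "<mac> <set-name>" for each pinned entry in a netplan YAML file.
netplan_pins() {
  awk '
    /^[ \t]*macaddress:/ { mac = tolower($2); gsub(/"/, "", mac) }
    /^[ \t]*set-name:/ {
      name = $2; gsub(/"/, "", name)
      if (mac != "") print mac, name
      mac = ""
    }
  ' "$1"
}

# Print "<mac> <name>" for each interface under a sysfs net directory.
current_links() {
  local root="${1:-/sys/class/net}" dev
  for dev in "$root"/*; do
    [ -r "$dev/address" ] || continue
    printf '%s %s\n' "$(tr '[:upper:]' '[:lower:]' < "$dev/address")" "${dev##*/}"
  done
}

# Report pinned MACs that currently belong to a differently named interface.
# If several interfaces share one MAC, only the last one listed is compared.
find_conflicts() {
  local yaml="$1" root="${2:-/sys/class/net}"
  awk 'FILENAME == ARGV[1] { cur[$1] = $2; next }
       ($1 in cur) && cur[$1] != $2 {
         printf "%s: netplan wants %s, kernel has %s\n", $1, $2, cur[$1]
       }' <(current_links "$root") <(netplan_pins "$yaml")
}
```

For example, `find_conflicts /etc/netplan/50-cloud-init.yaml` prints one line per pinned MAC whose current kernel name differs from its `set-name`; an empty result means no conflict was detected.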
In one scenario, the data source is able to fetch the updated network
configuration, and cloud-init updates the config and udev rules
correctly. However, networking stays offline ("pending") indefinitely.
It can be forced to resolve by running
`sudo udevadm trigger --attr-match=subsystem=net`.
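Until the root cause is fixed, the manual step above could be automated. The following is a hypothetical workaround sketch, not something shipped by cloud-init: it assumes the `networkctl list --no-legend` column order IDX, LINK, TYPE, OPERATIONAL, SETUP, and it reuses the exact `udevadm` command from this report. It must run as root.

```bash
#!/usr/bin/env bash
# Hypothetical workaround sketch, not shipped by cloud-init: if
# systemd-networkd leaves links in the "pending" setup state, re-run the
# udev trigger that this report found to unblock them. Needs root.

# Read `networkctl list --no-legend` output on stdin (assumed columns:
# IDX LINK TYPE OPERATIONAL SETUP) and print links whose SETUP is "pending".
pending_links() {
  awk '$5 == "pending" { print $2 }'
}

# Re-trigger udev for network devices if any link is still pending.
retrigger_if_pending() {
  local pending
  pending=$(networkctl list --no-legend | pending_links)
  if [ -n "$pending" ]; then
    echo "pending links: $pending" >&2
    udevadm trigger --attr-match=subsystem=net
  fi
}
```

Keeping the parsing in `pending_links` separate from the side effects makes it possible to check the state detection against captured `networkctl` output without touching a live system.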
Example: Create a VM on Azure with two NICs, re-order them, then
restart.
az vm create --name test-x1 --image Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest --nics test-nic-01 test-nic-02
az vm deallocate --name test-x1
az vm nic set --vm-name test-x1 --nics test-nic-02 test-nic-01
az vm start --name test-x1
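The reproduction steps can also be run as a single script. This is a sketch under assumptions: the report's commands omit a resource group, so `RG` and its default `test-rg` are hypothetical placeholders; both NICs are assumed to exist already in that group; and `az vm create` may need further options (for example authentication settings) in a given subscription. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch of the reproduction steps. RG is a hypothetical placeholder; the VM
# and NIC names come from the report, and both NICs must already exist.
set -euo pipefail

RG="${RG:-test-rg}"   # hypothetical resource group name
VM="test-x1"
IMAGE="Canonical:0001-com-ubuntu-server-focal:20_04-lts:latest"

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reproduce() {
  run az vm create --resource-group "$RG" --name "$VM" --image "$IMAGE" \
    --nics test-nic-01 test-nic-02
  # Swap the NIC order while the VM is deallocated, then boot it again.
  run az vm deallocate --resource-group "$RG" --name "$VM"
  run az vm nic set --resource-group "$RG" --vm-name "$VM" \
    --nics test-nic-02 test-nic-01
  run az vm start --resource-group "$RG" --name "$VM"
}
```

For example, `RG=my-group DRY_RUN=1 reproduce` lists the four commands for review before anything is created.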
After doing that, I am unable to log in via the serial console for 20
minutes, until cloud-init times out. In this case, cloud-init's Azure
data source is trying to report ready but cannot, because system
networking never came up. If we remove
/lib/systemd/system/cloud-init-local.service.d/50-azure-clear-persistent-obj-pkl.conf,
cloud-init no longer hangs the boot, but networking still fails to
initialize for the guest.
The behavior on 18.04 is a bit different. On 18.04, the renaming of the
interfaces succeeds at early boot, which instead causes the Azure data
source to fail the local phase because the fallback_interface is no
longer the primary NIC (the secondary NIC, eth1, was renamed to eth0 to
match the previous boot's config).
** Affects: cloud-init
Importance: Undecided
Status: New
** Attachment added: "Ubuntu 20.04 nic swap logs"
https://bugs.launchpad.net/bugs/1958280/+attachment/5555206/+files/cloud-init-u20-swap-nics.tar.gz
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1958280
Title:
Networking failures after NIC reordering
Status in cloud-init:
New