Message #84623
[Bug 1898997] Re: MAAS cannot deploy/boot if OVS bridge is configured on a single PXE NIC
This bug is believed to be fixed in cloud-init in version 20.4. If this
is still a problem for you, please add a comment and set the state back
to New.
Thank you.
** Changed in: cloud-init
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1898997
Title:
MAAS cannot deploy/boot if OVS bridge is configured on a single PXE
NIC
Status in cloud-init:
Fix Released
Status in netplan:
Fix Released
Status in netplan.io package in Ubuntu:
Fix Released
Status in netplan.io source package in Focal:
Fix Released
Status in netplan.io source package in Groovy:
Fix Released
Bug description:
Problem description:
If we try to deploy a single-NIC machine via MAAS, configuring an Open vSwitch bridge as the primary/PXE interface, the machine will install and boot Ubuntu 20.04 but it cannot finish the whole configuration (e.g. copying of SSH keys) and cannot be accessed/controlled via MAAS. It ends up in a "Failed" state.
This is because systemd-networkd-wait-online.service fails (for some
reason) before netplan can fully set up and configure the OVS bridge.
Because networking is broken, cloud-init cannot complete its final
stages, such as setting up SSH keys or signaling its state back to MAAS.
If we wait a little longer, the OVS bridge does come online and
networking works, but SSH is still not set up and the MAAS state remains
"Failed".
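To make the timing issue concrete, here is a minimal sketch that polls the kernel's view of an interface until it becomes usable and reports how long that took. It is only an illustration of "waiting a little longer", not what netplan or cloud-init do. The bridge name `br0`, the timeout values, and the choice to treat both `up` and `unknown` as usable (OVS internal ports may report `unknown`) are assumptions.

```python
import time
from pathlib import Path

# Assumption: states treated as "usable"; OVS internal ports may report "unknown".
READY_STATES = ("up", "unknown")


def read_operstate(iface):
    """Return the kernel's operational state for iface, or 'missing'."""
    path = Path("/sys/class/net") / iface / "operstate"
    try:
        return path.read_text().strip()
    except OSError:
        return "missing"


def wait_for_interface(iface, timeout=120.0, interval=1.0,
                       read_state=read_operstate,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until iface is usable; return seconds waited, or None on timeout."""
    start = clock()
    while True:
        if read_state(iface) in READY_STATES:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

On the broken machine, something like `wait_for_interface("br0")` (with the real bridge name substituted) would show whether the bridge appears only after systemd-networkd-wait-online.service has already given up.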
Steps to reproduce:
* Set up a (virtual) MAAS system, e.g. inside a LXD container using a KVM host, as described here:
https://discourse.maas.io/t/setting-up-a-flexible-virtual-maas-test-environment/142
* Install and set up the maas[-cli] snap from the 2.9/beta channel (instead of the deb/PPA from the discourse post)
* Configure netplan PPA+key for testing via "Settings" -> "Package repos":
https://launchpad.net/~slyon/+archive/ubuntu/ovs
* Prepare curtin preseed in /var/snap/maas/current/preseeds/curtin_userdata, inside the LXD container (so you can access the broken machine afterwards):
======================
#cloud-config
debconf_selections:
 maas: |
  {{for line in str(curtin_preseed).splitlines()}}
  {{line}}
  {{endfor}}
late_commands:
  maas: [wget, '--no-proxy', '{{node_disable_pxe_url}}', '--post-data', '{{node_disable_pxe_data}}', '-O', '/dev/null']
  90_create_user: ["curtin", "in-target", "--", "sh", "-c", "sudo useradd test -g 0 -G sudo"]
  92_set_user_password: ["curtin", "in-target", "--", "sh", "-c", "echo 'test:test' | sudo chpasswd"]
  94_cat: ["curtin", "in-target", "--", "sh", "-c", "cat /etc/passwd"]
  98_cloud_init: ["curtin", "in-target", "--", "apt-get", "-y", "install", "cloud-init"]
======================
* Compose a new virtual machine via MAAS' "KVM" menu, named e.g. "test1"
* Watch it being commissioned via MAAS' "Machines" menu
* Once it's ready, select your machine (e.g. "test1.maas") -> Network
* Select the single network interface (e.g. "ens4") -> Create bridge
* Choose "Bridge type: Open vSwitch (ovs)", select "Subnet" and "IP mode", then save.
* Deploy machine to Ubuntu 20.04 via "Take action" button
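For orientation, the bridge created in these steps corresponds roughly to a netplan configuration in which the bridge carries an `openvswitch` mapping. The sketch below builds such a structure and prints it as YAML. The bridge name `br0`, the DHCP settings, and both helper functions are assumptions made for illustration; the file actually rendered by MAAS and cloud-init on the deployed machine may differ.

```python
def ovs_bridge_config(nic="ens4", bridge="br0"):
    """Build a netplan-style dict: one NIC enslaved to one OVS bridge.

    Names and DHCP choices are hypothetical, not taken from the bug report.
    """
    return {
        "network": {
            "version": 2,
            "ethernets": {nic: {"dhcp4": False}},
            "bridges": {
                bridge: {
                    "interfaces": [nic],
                    "openvswitch": {},  # empty mapping marks the bridge as OVS
                    "dhcp4": True,
                }
            },
        }
    }


def to_yaml(node, indent=0):
    """Render a nested dict of scalars, flat lists and dicts as YAML lines."""
    pad = " " * indent
    lines = []
    for key, value in node.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        elif isinstance(value, dict):
            lines.append(f"{pad}{key}: {{}}")
        elif isinstance(value, list):
            lines.append(f"{pad}{key}: [{', '.join(map(str, value))}]")
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


print("\n".join(to_yaml(ovs_bridge_config())))
```

The single NIC is both the PXE interface and the only member of the bridge, so the machine has no other path to MAAS until the OVS bridge is up.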
The machine will install the OS and boot, but will end up in a
"Failed" state inside MAAS because the network/OVS bridge is not set up
correctly. MAAS/SSH has no control over it. You can access the
(broken) machine via serial console from the KVM host (i.e. the LXD
container) via "virsh console test1" using the "test:test"
credentials.
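Once logged in on the console, a few standard commands help confirm the state described above. The following is a minimal sketch that collects their output in one place. The selection of commands and the helper names are assumptions, not part of the original report, and `ovs-vsctl show` typically needs root (the "test" user from the preseed is in the sudo group).

```python
import subprocess

# Assumption: a small, illustrative set of checks; extend as needed.
DIAGNOSTIC_COMMANDS = {
    "wait_online": ["systemctl", "is-failed",
                    "systemd-networkd-wait-online.service"],
    "cloud_init": ["cloud-init", "status", "--long"],
    "ovs": ["ovs-vsctl", "show"],
}


def run_command(argv):
    """Run argv and return its combined stdout/stderr, or an error note."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run: {exc}"
    return (done.stdout + done.stderr).strip()


def collect_diagnostics(run=run_command):
    """Return a dict mapping each check name to the command's output."""
    return {name: run(argv) for name, argv in DIAGNOSTIC_COMMANDS.items()}
```

Calling `collect_diagnostics()` and printing each entry gives a quick snapshot that can be attached to the bug; the `run` parameter exists only so the function can be exercised without the real tools installed.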
=== SRU/Focal/netplan.io ===
[Impact]
This update contains bug-fixes and packaging improvements and we would like to make sure all of our supported customers have access to these improvements.
The notable ones are:
* Setup OVS early in network-pre.target to avoid delays (LP:
#1898997)
See the changelog entry below for a full list of changes and bugs.
[Test Case]
The following development and SRU process was followed:
https://wiki.ubuntu.com/NetplanUpdates
Netplan contains an extensive integration test suite that is run using
the SRU package for each release. This test suite's results are available here:
http://autopkgtest.ubuntu.com/packages/n/netplan.io
A successful run is required before the proposed netplan package
can be let into -updates.
The netplan team will be in charge of attaching the artifacts and console
output of the appropriate run to the bug. Netplan team members will not
mark ‘verification-done’ until this has happened.
[Regression Potential]
In order to mitigate the regression potential, the results of the
aforementioned integration tests are attached to this bug.
Focal:
https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_amd64.log
https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_arm64.log
https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_armhf.log
https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_ppc64el.log
https://git.launchpad.net/~slyon/+git/files/tree/LP1898997/focal_s390x.log
[Discussion]
To fully fix the MAAS/OVS problem, cloud-init needs to be updated as well. The fixes to netplan.io and cloud-init can be applied independently, though.
[Changelog]
- Setup OVS early in network-pre.target to avoid delays (LP: #1898997)
- Suggest openvswitch-switch runtime dependency
- Improve stability of autopkgtests