ubuntu-public-cloud team mailing list archive
Message #00020
[Bug 2057965] Re: google-startup-scripts runs before cloud-init finished network setup
Hello Catherine, or anyone else affected,
Accepted google-guest-agent into noble-proposed. The package will build
now and be available at
https://launchpad.net/ubuntu/+source/google-guest-agent/20240716.00-0ubuntu1~24.04.0
in a few hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will help us get this
update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-noble to verification-done-noble. If it does not fix
the bug for you, please add a comment stating that, and change the tag
to verification-failed-noble. In either case, without details of your
testing we will not be able to proceed.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!
N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
** Changed in: google-guest-agent (Ubuntu Noble)
Status: Confirmed => Fix Committed
** Tags added: verification-needed verification-needed-noble
** Changed in: google-guest-agent (Ubuntu Jammy)
Status: New => Fix Committed
** Tags added: verification-needed-jammy
--
You received this bug notification because you are a member of Ubuntu
Public Cloud, which is subscribed to google-guest-agent in Ubuntu.
https://bugs.launchpad.net/bugs/2057965
Title:
google-startup-scripts runs before cloud-init finished network setup
Status in google-guest-agent package in Ubuntu:
Fix Released
Status in google-guest-agent source package in Xenial:
New
Status in google-guest-agent source package in Bionic:
New
Status in google-guest-agent source package in Focal:
Fix Committed
Status in google-guest-agent source package in Jammy:
Fix Committed
Status in google-guest-agent source package in Mantic:
Won't Fix
Status in google-guest-agent source package in Noble:
Fix Committed
Bug description:
[ Impact ]
In certain situations (consistently with ubuntu-pro=31.2 and
cloud-init=23.4.4), cloud-config.service has not completed before
google-startup-scripts.service runs. This can cause startup scripts that
rely on apt to fail, as cloud-init is responsible for reconfiguring
sources.list to point at the GCE archives.
Since pro and cloud-init are backported to all older releases, this
bug will affect them too.
The change that results in this race condition is the removal of an
ordering condition between pro and cloud-init, so adding
`After=cloud-final.service` to google-startup-scripts.service should
ensure that the startup scripts are correctly run regardless of the
ordering (or lack thereof) between other services.
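For illustration only, the same ordering can be expressed locally as a
systemd drop-in. This is a minimal sketch, not the packaged fix: the
function name `write_ordering_dropin` and the drop-in file name
`10-after-cloud-final.conf` are assumptions made for this example, and
the directory argument exists so the function can be tried against a
scratch directory first.

```shell
#!/bin/bash
# Sketch: write a drop-in that orders google-startup-scripts.service
# after cloud-final.service. The function and file names are
# hypothetical; only the After= line comes from the bug description.
write_ordering_dropin() {
    # $1: systemd unit directory (defaults to the system-wide one)
    local systemd_dir="${1:-/etc/systemd/system}"
    local dropin_dir="${systemd_dir}/google-startup-scripts.service.d"

    mkdir -p "${dropin_dir}"
    printf '[Unit]\nAfter=cloud-final.service\n' \
        > "${dropin_dir}/10-after-cloud-final.conf"
}
```

Run as root with no argument, this writes under /etc/systemd/system;
a `systemctl daemon-reload` is then needed before systemd picks up the
new ordering.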
[ Test Plan ]
To reproduce:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Since this bug particularly affects first boot (once sources.list is
configured with the GCE mirrors on first boot it will remain correctly
configured), the best way to test that the fix works will be to create
an image with pro pinned at 31.2, cloud-init pinned at 23.4.4, and
google-guest-agent installed from -proposed. The test would be:
1. Create an instance with startup script as above
$ gcloud compute instances create startup-test --image [IMAGE_NAME] --image-project [IMAGE PROJECT] --metadata-from-file=startup-script=startup_script.sh
2. SSH into the instance and verify pro/cloud-init/google-guest-agent versions/source
> pro --version
31.2~[RELEASE]
> cloud-init --version
/usr/bin/cloud-init 23.4.4-0ubuntu0~[RELEASE]
> apt-cache policy google-guest-agent
[ensure from -proposed]
3. Verify startup script ran correctly after cloud-config.service.
> diff /tmp/startup-sources.list /etc/apt/sources.list
>
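The check in step 3 could be scripted roughly as follows. This is a
sketch under stated assumptions: `check_sources_snapshot` is a
hypothetical helper name, and its default paths are simply the two
files used by startup_script.sh above.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the copy taken by the startup
# script is identical to the current sources.list, i.e. cloud-init had
# already rewritten the file when the startup script ran.
check_sources_snapshot() {
    local snapshot="${1:-/tmp/startup-sources.list}"
    local current="${2:-/etc/apt/sources.list}"

    # diff exits 0 when the files match, non-zero when they differ or
    # when one of them cannot be read.
    if diff -u "${snapshot}" "${current}"; then
        echo "PASS: ${snapshot} matches ${current}"
    else
        echo "FAIL: ${snapshot} differs from (or could not be compared with) ${current}"
        return 1
    fi
}
```

Called without arguments on the test instance, a PASS corresponds to
the empty diff expected in step 3.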
[ Where problems could occur ]
Since this introduces a new ordering constraint, it will likely have
performance impacts (google-startup-scripts will run later). This
seems preferable to breaking a subset of startup scripts in some
situations; it is not uncommon to use startup scripts to install
packages so it's important for the mirrors to be correctly configured.
[ Other Info ]
Original bug report retained below.
New GCP dailies are failing startup-script tests, due to configuration
via cloud-init not being fully completed, apt sources for example,
when startup scripts are run. The failure can be reproduced as
follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Earlier images (such as ubuntu-2204-jammy-v20240307 in
ubuntu-os-cloud) do not show this behaviour. The difference is due to a
change in ubuntu-pro 31 (see
https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40
for details) that removes a systemd ordering on cloud-config.service.
A side effect of this change was the removal of cloud-config.service
(and ubuntu-advantage.service) from systemd's critical chain.
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
└─ubuntu-advantage.service @28.480s
└─cloud-config.service @27.372s +1.095s
└─snapd.seeded.service @20.048s +7.312s
└─snapd.service @12.469s +7.555s
└─basic.target @11.558s
└─sockets.target @11.540s
└─snap.lxd.daemon.unix.socket @24.376s
└─sysinit.target @10.825s
└─cloud-init.service @8.432s +2.267s
└─systemd-networkd-wait-online.service @6.467s +1.935s
└─systemd-networkd.service @6.347s +112ms
└─network-pre.target @6.328s
└─cloud-init-local.service @4.309s +2.006s
└─systemd-remount-fs.service @1.829s +68ms
└─systemd-fsck-root.service @1.587s +160ms
└─systemd-journald.socket @1.292s
└─system.slice @1.068s
└─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
└─chrony.service @30.240s +56ms
└─basic.target @13.364s
└─sockets.target @13.225s
└─snap.lxd.user-daemon.unix.socket @26.765s
└─sysinit.target @12.550s
└─cloud-init.service @7.933s +4.503s
└─systemd-networkd-wait-online.service @6.741s +1.171s
└─systemd-networkd.service @6.593s +124ms
└─network-pre.target @6.573s
└─cloud-init-local.service @4.478s +2.083s
└─systemd-remount-fs.service @1.717s +64ms
└─systemd-fsck-root.service @1.510s +95ms
└─systemd-journald.socket @1.193s
└─-.mount @974ms
└─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to
the google-startup-scripts.service file, which enforces the correct
ordering between google-startup-scripts and cloud-init.
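One way to confirm that such an ordering is in effect is to inspect the
unit's `After=` property, which `systemctl show -p After <unit>` prints
as a single `After=` line of space-separated unit names. The helper
below is a sketch only; the name `has_after_dependency` is an
assumption made for this example.

```shell
#!/bin/bash
# Hypothetical helper: check whether a unit name appears in the
# "After=..." line printed by `systemctl show -p After <unit>`.
has_after_dependency() {
    local after_line="$1"   # e.g. "After=basic.target cloud-config.service"
    local wanted="$2"       # e.g. "cloud-config.service"

    # Strip the "After=" prefix and pad with spaces so that only whole
    # unit names match.
    case " ${after_line#After=} " in
        *" ${wanted} "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On an instance it could be used as
`has_after_dependency "$(systemctl show -p After google-startup-scripts.service)" cloud-config.service`,
with exit status 0 meaning the ordering is present.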
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/google-guest-agent/+bug/2057965/+subscriptions