yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #74387
[Bug 1693361] Re: cloud-init sometimes fails on dpkg lock due to concurrent apt-daily.service execution
Nothing actionable here for apt, so I'll close this. We should consider
making frontend locking more flexible for scripts using apt, though, so
scripts can hold the lock all the time and drive apt.
** Changed in: apt (Ubuntu)
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1693361
Title:
cloud-init sometimes fails on dpkg lock due to concurrent apt-
daily.service execution
Status in APT:
Confirmed
Status in cloud-init:
Fix Released
Status in apt package in Ubuntu:
Invalid
Status in cloud-init package in Ubuntu:
Fix Released
Status in cloud-init source package in Xenial:
Fix Released
Status in cloud-init source package in Yakkety:
Won't Fix
Status in cloud-init source package in Zesty:
Fix Released
Status in cloud-init source package in Artful:
Fix Released
Bug description:
=== Begin SRU Template ===
[Impact]
A cloud-config that contains packages to install (see below) or
'package_upgrade' will run 'apt-get update'. That can sometimes fail as a
result of contention with the apt-daily.service that updates that information.
Cloud-config showing the problem is just like:
$ cat my.yaml
#cloud-config
packages: ['hello']
[Test Case]
lxc-proposed-snapshot is
https://git.launchpad.net/~smoser/cloud-init/+git/sru-info/tree/bin/lxc-proposed-snapshot
It publishes an image to lxd with proposed enabled and cloud-init upgraded.
a.) launch an instance with proposed version of cloud-init and some user-data.
This is platform independent. The test case demonstrates lxd.
$ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
"package_upgrade: true" > config.yaml
$ release=xenial
$ ref=proposed-$release
$ ./lxc-proposed-snapshot --proposed --publish $release $ref;
b.) start the instance
$ name=$release-1693361
$ lxc launch my-xenial "--config=user.user-data=$(cat config.yaml)
$ sleep 1
$ lxc exec $name -- tail -f /var/log/cloud-init.log /var/log/cloud-init-output.log
# watch this boot.
c.) Look for evidence of systemd failure
journalctl -o short-precise | grep -i break
journalctl -o short-precise | grep -i order
[Regression Potential]
Regression chance here is low. Its possible that ordering loops
could occur. When that does happen, journalctl will mention it. Unfortunately
in such cases systemd somewhat randomly picks a service to kil so behavior
is somewhat undefined.
[Other Info]
Upstream commit at
https://git.launchpad.net/cloud-init/commit/?id=11121fe4
=== End SRU Template ===
apt-daily is now a systemd service rather than being invoked by
cron.daily. If one builds a custom AMI it is possible that the apt-
daily.timer will fire during boot. This can fire at the same time
cloud-init is running and if cloud-init loses the race the invocation
of apt (e.g. use of "packages:" in the config) will fail.
There is a lot of discussion online about this change to apt-daily
(e.g. unattended upgrades happening during business hours, delaying
boot, etc.) and discussion of potential systemd changes regarding
timers firing during boot (c.f.
https://github.com/systemd/systemd/issues/5659).
While it would be better to solve this in apt itself, I suggest that
cloud-init be defensive when calling apt and implement some retry
mechanism.
Various instances of people running into this issue:
https://github.com/chef/bento/issues/609
https://clusterhq.atlassian.net/browse/FLOC-4486
https://github.com/boxcutter/ubuntu/issues/73
https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image
To manage notifications about this bug go to:
https://bugs.launchpad.net/apt/+bug/1693361/+subscriptions
References