yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #75953
[Bug 1803173] Re: cloud-init disables user on azure at second reboot
Hi,
I recreated the issue using the Ubuntu upstream kernel builds at
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/
After launching a 18.04 instance, and then installing those kernels
and rebooting (generic_4.20.0-999.201811252100) I saw the issue.
I noticed that cloud-init's /var/lib/cloud data was getting messed up,
which didn't make any sense. The problem that I noticed was
/var/lib/cloud/instance was a directory rather than a symlink.
It turns out that the problem was walinuxagent was deleting the
/var/lib/cloud during boot. somewhere before cloud-init modules that
was getting deleted and was wreaking havoc on cloud-init.
It looks like this is at least identified as not the best idea at:
https://github.com/Azure/WALinuxAgent/commit/f42d2e75617bb54
I verified that cloud-init was working properly by itself with:
systemctl disable walinuxagent
before the reboot into the new kernel. All was well.
On reboot, cloud-init still used the azure datasource and had a single
entry in /var/lib/cloud/instances/
So, I'm marking this 'Invalid' for cloud-init. The fix needs to be
to have walinuxagent stop deleting state from other programs.
** Also affects: walinuxagent (Ubuntu)
Importance: Undecided
Status: New
** Changed in: cloud-init
Status: Confirmed => Invalid
** Changed in: walinuxagent (Ubuntu)
Status: New => Confirmed
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1803173
Title:
cloud-init disables user on azure at second reboot
Status in cloud-init:
Invalid
Status in linux package in Ubuntu:
Triaged
Status in walinuxagent package in Ubuntu:
Confirmed
Bug description:
Hello,
Environment:
platform: Azure
arm image: Canonical UbuntuServer 16.04-DAILY-LTS latest
Steps:
Deploy VM with user/pass authentication
Install latest linux-next-upstream kernel (for example 4.19.0-4db9d11bcbef, where 4db9d11bcbef is the git tag from the linux-next latest tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/)
reboot (all good)
reboot again
cloud-init disables the username password authentication
I checked the cloud-init logs and found:
2018-11-01 16:45:28,566 - init.py[INFO]: User already exists, skipping.
2018-11-01 16:45:28,570 - util.py[DEBUG]: Running command ['passwd', '-l', ''] with allowed return codes [0] (shell=False, capture=True)
2018-11-01 16:45:28,793 - util.py[DEBUG]: Reading from /etc/sudoers (quiet=False)
2018-11-01 16:45:28,795 - util.py[DEBUG]: Read 781 bytes from /etc/sudoers
2018-11-01 16:45:28,796 - util.py[DEBUG]: Writing to /etc/sudoers.d/90-cloud-init-users - ab: [None] 51 bytes
2018-11-01 16:45:28,797 - handlers.py[DEBUG]: finish: init-network/config-users-groups: SUCCESS: config-users-groups ran successfully
This issue is very bad one, as it can render your vm inaccessible on Azure.
I think this problem is due to the new kernel installation.
Initial bug report:
https://github.com/Azure/WALinuxAgent/issues/1386
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1803173/+subscriptions
References