← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1803173] Re: cloud-init disables user on azure at second reboot

 

Hi,
I recreated the issue using the Ubuntu upstream kernel builds at
  http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

After launching a 18.04 instance, and then installing those kernels
and rebooting (generic_4.20.0-999.201811252100) I saw the issue.

I noticed that cloud-init's /var/lib/cloud data was getting messed up,
which didn't make any sense.   The problem that I noticed was 
/var/lib/cloud/instance was a directory rather than a symlink.
It turns out that the problem was walinuxagent was deleting the 
/var/lib/cloud during boot. somewhere before cloud-init modules that
was getting deleted and was wreaking havoc on cloud-init.

It looks like this is at least identified as not the best idea at:
 https://github.com/Azure/WALinuxAgent/commit/f42d2e75617bb54

I verified that cloud-init was working properly by itself with:
  systemctl disable walinuxagent
before the reboot into the new kernel.  All was well.
On reboot, cloud-init still used the azure datasource and had a single
entry in /var/lib/cloud/instances/

So, I'm marking this 'Invalid' for cloud-init.  The fix needs to be
to have walinuxagent stop deleting state from other programs.


** Also affects: walinuxagent (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: cloud-init
       Status: Confirmed => Invalid

** Changed in: walinuxagent (Ubuntu)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1803173

Title:
  cloud-init disables user on azure at second reboot

Status in cloud-init:
  Invalid
Status in linux package in Ubuntu:
  Triaged
Status in walinuxagent package in Ubuntu:
  Confirmed

Bug description:
  Hello,

  Environment:

  platform: Azure
  arm image: Canonical UbuntuServer 16.04-DAILY-LTS latest

  Steps:

  Deploy VM with user/pass authentication
  Install latest linux-next-upstream kernel (for example 4.19.0-4db9d11bcbef, where 4db9d11bcbef is the git tag from the linux-next latest tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/)
  reboot (all good)
  reboot again
  cloud-init disables the username password authentication
  I checked the cloud-init logs and found:

  2018-11-01 16:45:28,566 - init.py[INFO]: User already exists, skipping.
  2018-11-01 16:45:28,570 - util.py[DEBUG]: Running command ['passwd', '-l', ''] with allowed return codes [0] (shell=False, capture=True)
  2018-11-01 16:45:28,793 - util.py[DEBUG]: Reading from /etc/sudoers (quiet=False)
  2018-11-01 16:45:28,795 - util.py[DEBUG]: Read 781 bytes from /etc/sudoers
  2018-11-01 16:45:28,796 - util.py[DEBUG]: Writing to /etc/sudoers.d/90-cloud-init-users - ab: [None] 51 bytes
  2018-11-01 16:45:28,797 - handlers.py[DEBUG]: finish: init-network/config-users-groups: SUCCESS: config-users-groups ran successfully

  This issue is very bad one, as it can render your vm inaccessible on Azure.
  I think this problem is due to the new kernel installation.

  Initial bug report:
  https://github.com/Azure/WALinuxAgent/issues/1386

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1803173/+subscriptions


References