
yahoo-eng-team team mailing list archive

[Bug 1720160] Re: cloud-init wait for waagent on Azure CentOS 7.4 - no sshd start

 

WALinuxAgent-Bug: https://github.com/Azure/WALinuxAgent/issues/902

** Bug watch added: bugs.centos.org/ #13993
   https://bugs.centos.org/view.php?id=13993

** Also affects: cloud-init (CentOS) via
   https://bugs.centos.org/view.php?id=13993
   Importance: Unknown
       Status: Unknown

** Bug watch added: github.com/Azure/WALinuxAgent/issues #902
   https://github.com/Azure/WALinuxAgent/issues/902

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1720160

Title:
  cloud-init wait for waagent on Azure CentOS 7.4 - no sshd start

Status in cloud-init:
  New
Status in cloud-init package in CentOS:
  Unknown

Bug description:
  Hello,
  After updating a CentOS 7.3 VM on Azure to CentOS 7.4, you cannot connect via ssh: cloud-init tries to start waagent and the boot process hangs, so sshd is never started.

  We also installed a fresh CentOS 7.4 in the Azure cloud to provide a base
  image template for our company, and the same problem happens in that VM.

  #######
  # yum info cloud-init:
  Name        : cloud-init
  Arch        : x86_64
  Version     : 0.7.9
  Release     : 9.el7.centos.2
  Size        : 2.1 M
  Repo        : installed
  From repo   : base

  In CentOS 7.3 the cloud-init version is 0.7.5-10.el7.centos.1.
  waagent is package version 2.2.14-1.el7 in both CentOS versions, which waagent itself updates internally to 2.2.17.

  #######
  To debug the failure I had to install rlogin before the update:

  yum remove firewalld -y
  yum install rsh-server -y
  systemctl enable rsh.socket
  systemctl enable rlogin.socket
  systemctl enable rexec.socket
  echo "root:123" | chpasswd
  echo "+ root" > ~/.rlogin
  cat << EOF >> /etc/securetty
  rsh
  rexec
  rlogin
  EOF

  reboot

  yum update -y
  reboot

  #######
  To unblock the process I connected via rlogin and killed the waagent start:

  # ps -ef | grep "waagent\|cloud"
  root       993     1  0 14:52 ?        00:00:02 /usr/bin/python /usr/bin/cloud-init init
  root      1134   993  0 14:52 ?        00:00:00 /bin/systemctl start waagent.service
  root      1337  1222  0 15:56 pts/2    00:00:00 grep --color=auto waagent\|cloud

  # kill 1134

  cloud-init then continues, and on the next reboot sshd starts without any
  trouble.
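
The manual lookup above could be scripted roughly as follows. This is only a sketch: the function name find_waagent_start_pids is made up for illustration, and it assumes the `ps -ef` column layout shown in the listing above (PID in column 2, command starting in column 8).

```shell
# Read `ps -ef` output on stdin and print the PID of every
# "systemctl start waagent..." process (hypothetical helper, not part
# of cloud-init or WALinuxAgent).
find_waagent_start_pids() {
    # $8 is the command, $9 and $10 its first two arguments.
    awk '$8 ~ /systemctl$/ && $9 == "start" && $10 ~ /^waagent/ { print $2 }'
}

# Possible usage on the hung VM:
#   ps -ef | find_waagent_start_pids | xargs -r kill
```

Matching on the command columns rather than on the whole line keeps the awk and grep processes themselves out of the result.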

  #######
  To make the VM fail again, you can clear the config and reboot:
  yum remove cloud-init WALinuxAgent -y
  rm -f /etc/waagent.con*
  rm -fr /etc/cloud/
  rm -fr /var/lib/cloud/
  rm -fr /var/lib/waagent/
  rm -fr /var/log/waagent.lo*
  rm -fr /var/log/cloud-init*
  yum install cloud-init WALinuxAgent -y

  cp -a /etc/waagent.conf /etc/waagent.conf.rpmsave
  sed -i -e "s/Provisioning.Enabled.*/Provisioning.Enabled=n/g" /etc/waagent.conf
  sed -i -e "s/Provisioning.UseCloudInit.*/Provisioning.UseCloudInit=y/g" /etc/waagent.conf
  sed -i -e "s/Logs.Verbose.*/Logs.Verbose=y/g" /etc/waagent.conf

  cp -a /etc/cloud/cloud.cfg /etc/cloud/cloud.cfg.rpmsave
  cat << EOF >> /etc/cloud/cloud.cfg

  # From cloud-init docs
  datasource:
    Azure:
      agent_command: [service, waagent, start]

  debug:
    verbose: True

  EOF

  diff /etc/waagent.conf.rpmsave /etc/waagent.conf
  diff /etc/cloud/cloud.cfg.rpmsave /etc/cloud/cloud.cfg

  reboot
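
Before that final reboot, it may help to confirm that the three sed edits actually took effect. A minimal sketch, assuming the settings appear as KEY=value lines at the start of a line in /etc/waagent.conf; the function name show_provisioning_settings is hypothetical.

```shell
# Print the three waagent settings changed by the sed commands above.
# The file defaults to /etc/waagent.conf; pass another path to check a copy.
show_provisioning_settings() {
    grep -E '^(Provisioning\.Enabled|Provisioning\.UseCloudInit|Logs\.Verbose)=' \
        "${1:-/etc/waagent.conf}"
}

# Possible usage:
#   show_provisioning_settings
#   show_provisioning_settings /etc/waagent.conf.rpmsave
```

If a setting is missing from the output, the matching sed pattern did not find a line to rewrite in that file.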

  #######
  I don't know why the system hangs.
  Can you please review this?

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1720160/+subscriptions

