yahoo-eng-team team mailing list archive

[Bug 1794399] [NEW] cloud-init dhcp_discovery() crashes on preprovisioned RHEL 7.6 VM in Azure

 

Public bug reported:

Environment: Azure, creating a RHEL 7.6 VM from a pool of preprovisioned VMs.

In /usr/lib/python2.7/site-packages/cloudinit/net/dhcp.py,
dhcp_discovery() starts dhclient specifically so it will capture the
DHCP leases in dhcp.leases. The function copies the dhclient binary and
starts it with options naming unique lease and pid files. The function
then waits for both the lease and pid files to appear before using the
contents of the pid file to kill the dhclient instance.

There’s a behavior difference between the Ubuntu and RHEL versions of dhclient:
•	On Ubuntu, dhclient writes the DHCP lease response, forks/daemonizes, then writes the pid file with the daemonized process ID.
•	On RHEL, dhclient writes a pid file with the pre-daemon pid, writes the DHCP lease response, forks/daemonizes, then overwrites the pid file with the new (daemonized) pid.

On RHEL, there’s a race between dhcp_discovery() and dhclient:
1.	dhclient writes the pid file (containing the pre-daemon pid) and the lease file
2.	dhclient forks; the parent process exits
3.	dhcp_discovery() sees that both the pid file and the lease file exist
4.	dhcp_discovery() tries to kill the process named in the pid file, but that process (the parent) already exited in step 2
5.	The dhclient child daemonizes and overwrites the pid file with its own pid

When cloud-init runs on a preprovisioned RHEL 7.6 VM in Azure, dhcp.py
dhcp_discovery() throws an error when it tries to send SIGKILL to a
process that does not exist.
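
To illustrate the failure mode only: sending a signal to a pid that has
already exited raises OSError with errno ESRCH. The sketch below is not
cloud-init code; kill_if_running() is a hypothetical helper name. Note that
merely tolerating ESRCH would not fix the race, because the daemonized
dhclient child would be left running.

```python
import errno
import os
import signal


def kill_if_running(pid):
    """Send SIGKILL to pid; return False if no such process exists."""
    try:
        os.kill(pid, signal.SIGKILL)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # The process named by the (stale) pid has already exited.
            return False
        raise
    return True
```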

We have a patch that makes dhcp_discovery() wait until the pid in the
pid file represents a daemon process (parent pid is 1) before killing
the process. With this change, the issue is resolved.
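
A minimal sketch of that idea, not the actual patch: poll the pid file until
the pid it names belongs to a process whose parent pid is 1, and only then
hand the pid back for killing. The function names, the timeout and interval
defaults, and the use of /proc/<pid>/stat to look up the parent pid are all
assumptions made for illustration. The parent-is-pid-1 check also assumes the
daemon is reparented to init, which may not hold under a subreaper.

```python
import os
import time


def parse_ppid(stat_text):
    """Extract the parent pid from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    return int(fields[1])  # fields[0] is the state, fields[1] the ppid


def read_ppid(pid):
    """Return the parent pid of pid, or None if it no longer exists."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            return parse_ppid(f.read())
    except (IOError, OSError):
        return None


def wait_for_daemon_pid(pid_file, timeout=10.0, interval=0.1,
                        ppid_of=read_ppid):
    """Poll pid_file until it names a process whose parent is pid 1.

    Returns the daemon pid, or None if the timeout expires first.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with open(pid_file) as f:
                pid = int(f.read().strip())
        except (IOError, OSError, ValueError):
            pid = None  # file missing or only partially written
        if pid is not None and ppid_of(pid) == 1:
            return pid
        time.sleep(interval)
    return None
```

A caller would kill the returned pid only when the result is not None. The
ppid_of parameter exists so the lookup can be replaced in tests.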

** Affects: cloud-init
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1794399
