yahoo-eng-team team mailing list archive

[Bug 1353008] Re: MAAS Provider: LXC did not get DHCP address, stuck in "pending"

 

Removing this bug from juju-core, as the issue required a cloud-init fix.

** No longer affects: juju-core

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1353008

Title:
  MAAS Provider: LXC did not get DHCP address, stuck in "pending"

Status in Init scripts for use on cloud images:
  Fix Committed
Status in “cloud-init” package in Ubuntu:
  Fix Released
Status in “cloud-init” source package in Trusty:
  Fix Released

Bug description:
  === Begin SRU Information ===
  This bug causes lxc containers created by the ubuntu-cloud template (lxc-create -t ubuntu-cloud) to sometimes not obtain an IP address, and thus not correctly boot to completion.

  The bug is an assumption in cloud-init that /run is mounted before
  the cloud-init-local job runs.  The fix is simply to guarantee that
  ordering by modifying the job's upstart 'start on' condition.

  When booting with an initramfs, /run is mounted before /, so the
  race condition cannot occur.  The failure is therefore possible only
  in a non-initramfs boot (which is very unlikely) or in an lxc boot.
  Even in the lxc case it seems to occur very rarely, well under one
  percent of the time.

  [Test Case]
  A test case is written at [1] that launches many instances in an attempt to find the error by brute force.  However, I have not been able to make it fail.
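
The difficulty of reproducing the failure follows from its rarity. As a rough sketch, assume each container launch fails independently with the same probability (a simplification of a boot-time race; the 1% figure used in the note below is only the upper bound mentioned above, not a measured rate). The number of launches needed to see at least one failure with a given confidence is then:

```python
import math


def launches_needed(failure_rate, confidence):
    """Smallest n such that 1 - (1 - failure_rate)**n >= confidence.

    Assumes every launch fails independently with the same probability.
    """
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    # Solve (1 - failure_rate)**n <= 1 - confidence for n, then round up.
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))
```

Under these assumptions, launches_needed(0.01, 0.95) gives 299, so a brute-force run needs several hundred launches for a good chance of one failure at a 1% rate, and proportionally more at lower rates.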

  The original bug reporter has been running with the 'start on' change
  and has seen no errors since.

  We will ask the original bug reporter to apply the uploaded
  changes and run through their battery of tests.

  [Regression Potential]
  The possibility for regression here is in the second boot of an instance.  The following scenario is a change of behavior:
   * the user boots an instance with NoCloud or ConfigDrive in ds=local mode
   * the user changes /etc/network/interfaces in a way that would cause
     static-networking to not be emitted on subsequent boot
   * the user reboots
  Now, instead of a quick boot, the user may see cloud-init-nonet blocking while it waits for the network to come up.

  This would be an uncommon scenario, and the broken
  /etc/network/interfaces scenario is already one that causes timeouts on boot.

  --
  [1] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/cloud-init-test/view/head:/tests/lxc-test-new-instance

  === End  SRU Information ===

  Note that after I logged onto the system, it *did* have an IP address.

        0/lxc/3:
          agent-state: pending
          instance-id: juju-machine-0-lxc-3
          series: trusty
          hardware: arch=amd64

  cloud-init-output.log snip:

  Cloud-init v. 0.7.5 running 'init' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.29 seconds.
  ci-info: +++++++++++++++++++++++Net device info+++++++++++++++++++++++
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: | Device |  Up  |  Address  |    Mask   |     Hw-Address    |
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: |   lo   | True | 127.0.0.1 | 255.0.0.0 |         .         |
  ci-info: |  eth0  | True |     .     |     .     | 00:16:3e:34:aa:57 |
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  Cloud-init v. 0.7.5 running 'modules:config' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.99 seconds.
  Cloud-init v. 0.7.5 running 'modules:final' at Mon, 04 Aug 2014 23:57:14 +0000. Up 574.42 seconds.
  Cloud-init v. 0.7.5 finished at Mon, 04 Aug 2014 23:57:14 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 574.54 seconds

  syslog on the system, showing the DHCPACK one second later:

  root@juju-machine-0-lxc-3:/home/ubuntu# grep DHCP /var/log/syslog
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 255.255.255.255 port 67 (xid=0x1687c544)
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPOFFER of 10.96.3.173 from 10.96.0.10
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  Aug  5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
  Aug  5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  Aug  5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
  Aug  5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
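
The ordering of the two logs can be checked by comparing timestamps. The following is a small sketch, not part of cloud-init or juju: the helper name is made up, the weekday is dropped from the cloud-init timestamp, and the syslog timestamp (which carries no year or timezone) is assumed to be in UTC in the year supplied by the caller.

```python
from datetime import datetime, timezone


def seconds_after(cloud_init_stamp, syslog_stamp, year):
    """Return how many seconds the syslog event follows the cloud-init one.

    cloud_init_stamp: e.g. "04 Aug 2014 23:57:12 +0000" (weekday removed)
    syslog_stamp:     e.g. "Aug  4 23:57:13" (no year, no timezone)
    year:             supplied by the caller, since syslog omits it
    """
    init_time = datetime.strptime(cloud_init_stamp, "%d %b %Y %H:%M:%S %z")
    # Assumption: the container's syslog timestamps are in UTC.
    event = datetime.strptime(f"{year} {syslog_stamp}", "%Y %b %d %H:%M:%S")
    event = event.replace(tzinfo=timezone.utc)
    return (event - init_time).total_seconds()
```

Called with the 'init' timestamp from cloud-init-output.log and the first DHCPACK line above, it returns 1.0, matching the one-second gap: cloud-init printed the empty eth0 row just before the lease arrived.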

  In every case, it appears that cloud-init init-local failed very
  early, as is visible in the juju log /var/lib/juju/containers/<container>/console.log:

  Traceback (most recent call last):
    File "/usr/bin/cloud-init", line 618, in <module>
      sys.exit(main())
    File "/usr/bin/cloud-init", line 614, in main
      get_uptime=True, func=functor, args=(name, args))
    File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1875, in log_time
      ret = func(*args, **kwargs)
    File "/usr/bin/cloud-init", line 491, in status_wrapper
      force=True)
    File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1402, in sym_link
      os.symlink(source, link)
  OSError: [Errno 2] No such file or directory
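
The traceback ends with os.symlink() raising errno 2 (ENOENT). One way to get that error is for the directory that should contain the link to be missing, which would be consistent with /run not yet being in its final mounted state when the job ran. The sketch below reproduces the same error in a scratch directory; the directory and file names are made up for illustration and are not cloud-init's real paths. It is written for Python 3, where the exception is FileNotFoundError, a subclass of the OSError shown in the Python 2.7 traceback.

```python
import errno
import os
import tempfile


def symlink_errno_when_parent_missing():
    """Try to create a symlink inside a directory that does not exist."""
    with tempfile.TemporaryDirectory() as scratch:
        # Hypothetical stand-in for a link path whose parent is missing.
        link = os.path.join(scratch, "not-mounted-yet", "status-link")
        # The link target need not exist; only the link's parent matters.
        source = os.path.join(scratch, "status-file")
        try:
            os.symlink(source, link)
        except OSError as exc:
            return exc.errno
    return None
```

The function returns errno.ENOENT, the same error number reported in the traceback.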

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1353008/+subscriptions