yahoo-eng-team team mailing list archive
Message #22376
[Bug 1353008] Re: MAAS Provider: LXC did not get DHCP address, stuck in "pending"
This bug was fixed in the package cloud-init - 0.7.5-0ubuntu1.2
---------------
cloud-init (0.7.5-0ubuntu1.2) trusty-proposed; urgency=medium
  * d/patches/lp-1353008-cloud-init-local-needs-run.conf:
    backport change to cloud-init-local.conf to depend on /run being
    mounted (LP: #1353008)
 -- Scott Moser <smoser@xxxxxxxxxx>  Wed, 17 Sep 2014 09:15:54 -0400
** Changed in: cloud-init (Ubuntu Trusty)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1353008
Title:
MAAS Provider: LXC did not get DHCP address, stuck in "pending"
Status in Init scripts for use on cloud images:
Fix Committed
Status in juju-core:
Triaged
Status in “cloud-init” package in Ubuntu:
Fix Released
Status in “cloud-init” source package in Trusty:
Fix Released
Bug description:
=== Begin SRU Information ===
This bug causes lxc containers created by the ubuntu-cloud template (lxc-create -t ubuntu-cloud) to sometimes not obtain an IP address, and thus not correctly boot to completion.
The bug is an assumption in cloud-init that /run is mounted before
the cloud-init-local job runs. The fix is simply to guarantee that it
is, by modifying the job's upstart 'start on' condition.
When booting with an initramfs /run will be mounted before /, so the
race condition is not possible. Thus, the failure case is only either
in non-initramfs boot (which is very unlikely) or in lxc boot. The
lxc case seems only to occur very rarely, somewhere well under one
percent of the time.
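To illustrate the ordering requirement described above, here is a minimal Python 3 sketch of "do not proceed until the path is a mount point". It is not the actual fix, which lives in the upstart job's 'start on' stanza. The function name wait_for_mount and its parameters are hypothetical and exist only for this example.

```python
import os
import time


def wait_for_mount(path, timeout=5.0, interval=0.1, is_mount=os.path.ismount):
    """Poll until `path` is a mount point or `timeout` seconds elapse.

    Hypothetical helper for illustration only. Returns True if the path
    was seen as a mount point, False if the deadline passed first.
    `is_mount` is injectable so the loop can be exercised without
    touching real mounts.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_mount(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run wait_for_mount("/run") before writing anything under /run. Expressing the dependency in the init system, as the fix does, avoids polling entirely.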
[Test Case]
A test case is written at [1] that launches many instances in an attempt to find the error by brute force. However, I've not been able to make it fail.
The original bug reporter has been running with the 'start on' change
and has seen no errors since.
We will ask the original bug reporter to apply the uploaded
changes and run through their battery of tests.
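A back-of-the-envelope estimate shows why a brute-force test can easily miss a failure this rare. The Python 3 sketch below assumes each boot fails independently with a fixed probability p, which is a simplification and not something measured in this bug. The function name trials_needed is hypothetical.

```python
import math


def trials_needed(p_fail, confidence=0.95):
    """Smallest number of independent boots n such that the chance of
    seeing at least one failure, 1 - (1 - p_fail)**n, reaches `confidence`.

    Assumes independent trials with a constant failure probability.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# At a one percent failure rate, about 300 boots are needed for a
# 95% chance of observing even a single failure.
print(trials_needed(0.01))   # 299
```

Since the observed rate is described as well under one percent, the number of launches required grows accordingly. At one in a thousand, the same formula gives roughly three thousand boots.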
[Regression Potential]
The possibility for regression here is in the second boot of an instance. The following scenario is a change of behavior:
* the user boots an instance with NoCloud or ConfigDrive in ds=local mode
* the user changes /etc/network/interfaces in a way that would cause
  static-networking to not be emitted on subsequent boot
* the user reboots
Now, instead of a quick boot, the user may see cloud-init-nonet blocking on network coming up.
This would be an uncommon scenario, and a broken
/etc/network/interfaces is already a scenario that causes timeouts on boot.
--
[1] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/cloud-init-test/view/head:/tests/lxc-test-new-instance
=== End SRU Information ===
Note that after I logged onto the system, it *did* have an IP address.
0/lxc/3:
  agent-state: pending
  instance-id: juju-machine-0-lxc-3
  series: trusty
  hardware: arch=amd64
cloud-init-output.log snip:
Cloud-init v. 0.7.5 running 'init' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.29 seconds.
ci-info: +++++++++++++++++++++++Net device info+++++++++++++++++++++++
ci-info: +--------+------+-----------+-----------+-------------------+
ci-info: | Device |  Up  |  Address  |    Mask   |     Hw-Address    |
ci-info: +--------+------+-----------+-----------+-------------------+
ci-info: |   lo   | True | 127.0.0.1 | 255.0.0.0 |         .         |
ci-info: |  eth0  | True |     .     |     .     | 00:16:3e:34:aa:57 |
ci-info: +--------+------+-----------+-----------+-------------------+
ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Cloud-init v. 0.7.5 running 'modules:config' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.99 seconds.
Cloud-init v. 0.7.5 running 'modules:final' at Mon, 04 Aug 2014 23:57:14 +0000. Up 574.42 seconds.
Cloud-init v. 0.7.5 finished at Mon, 04 Aug 2014 23:57:14 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net]. Up 574.54 seconds
syslog on system, showing DHCPACK 1 second later:
root@juju-machine-0-lxc-3:/home/ubuntu# grep DHCP /var/log/syslog
Aug 4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 255.255.255.255 port 67 (xid=0x1687c544)
Aug 4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPOFFER of 10.96.3.173 from 10.96.0.10
Aug 4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
Aug 5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
Aug 5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
Aug 5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
Aug 5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
It appears that in every case cloud-init init-local failed very early,
as is visible in the juju logs at
/var/lib/juju/containers/<container>/console.log:
Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 618, in <module>
    sys.exit(main())
  File "/usr/bin/cloud-init", line 614, in main
    get_uptime=True, func=functor, args=(name, args))
  File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1875, in log_time
    ret = func(*args, **kwargs)
  File "/usr/bin/cloud-init", line 491, in status_wrapper
    force=True)
  File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1402, in sym_link
    os.symlink(source, link)
OSError: [Errno 2] No such file or directory
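The errno in the traceback above is what os.symlink raises when the directory that should hold the link does not exist, which is one plausible outcome if the link is created under /run before /run is mounted. The Python 3 sketch below (the traceback itself is from Python 2.7) reproduces that error class in a temporary directory. The function name and file names are hypothetical and are not taken from cloud-init.

```python
import os
import tempfile


def symlink_errno_in_missing_dir():
    """Try to create a symlink inside a directory that does not exist
    and return the resulting errno, or None if no error was raised."""
    with tempfile.TemporaryDirectory() as base:
        # The link target exists...
        source = os.path.join(base, "status.json")
        with open(source, "w"):
            pass
        # ...but the directory meant to contain the link was never
        # created, standing in for a location that is not there yet.
        link = os.path.join(base, "not-there-yet", "status.json")
        try:
            os.symlink(source, link)
        except OSError as exc:
            return exc.errno
    return None
```

On Linux this returns errno.ENOENT, matching the "[Errno 2] No such file or directory" reported by sym_link.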
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1353008/+subscriptions