touch-packages team mailing list archive

[Bug 1509747] Re: Intermittent lxc failures on wily

To: touch-packages@xxxxxxxxxxxxxxxxxxx
From: Martin Pitt <martin.pitt@xxxxxxxxxx>
Date: Wed, 28 Oct 2015 08:05:50 -0000
Reply-to: Bug 1509747 <1509747@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
I'm running wily LXC on wily (or now xenial) pretty much every day, so
this isn't very straightforward to reproduce. lxc's own autopkgtests
also do that, and they run in the cloud in a wily instance, pretty
similar to yours.

I tried a wily cloud image, created an LXC container, and started it,
which works as expected:

  nova boot --flavor m1.small --image ubuntu/ubuntu-wily-daily-amd64-server-20151026-disk1.img test1
  ssh [... into test1 instance ]
  sudo lxc-create -n w1 -t download -- -d ubuntu -r wily -a amd64
  sudo lxc-start -n w1 -F

Thanks for the strace; there's nothing unusual in it, so the reason why
it thinks it fails must be somewhere else.

But either way: you can set "ExecStart=/bin/false" in
systemd-update-utmp-runlevel.service to force the unit to fail, but
that's in no way fatal to the container; it boots, and you just have
this one failed unit. Nothing else depends on it; it only makes
"runlevel" print an updated value, so that legacy software becomes a
bit more compatible. So while that unit certainly should not fail
(this is a bug), it is almost surely *not* what breaks things for you,
so this is far from 'critical'.
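To try the forced failure described above without editing the packaged unit under /lib, one option is a systemd drop-in. This is a minimal sketch, assuming the standard /etc/systemd/system/<unit>.d/ drop-in layout; the function name force_utmp_runlevel_failure, the file name force-fail.conf and the optional root-prefix argument are hypothetical, added only so the snippet can be tried against a scratch directory. The empty "ExecStart=" line clears the original command first, because the unit is Type=oneshot and would otherwise run both commands in sequence.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that replaces the ExecStart of
# systemd-update-utmp-runlevel.service with /bin/false, so the unit
# always fails. $1 is an optional root prefix ("" means the live system).
force_utmp_runlevel_failure() {
    root="${1:-}"
    dir="$root/etc/systemd/system/systemd-update-utmp-runlevel.service.d"
    mkdir -p "$dir" || return 1
    # The empty ExecStart= resets the command list inherited from
    # /lib/systemd/system/systemd-update-utmp-runlevel.service.
    cat > "$dir/force-fail.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/bin/false
EOF
}
```

Run it as root inside the container, then "systemctl daemon-reload"; deleting the .d directory and reloading again undoes it. As described above, the container should still boot, with just this one unit marked as failed.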

I had a look at your user-data: this dance around a temporary systemd
unit to shut down the machine is rather complex. You can just have a
runcmd like

  - (while [ ! -e /var/lib/cloud/instance/boot-finished ]; do sleep 1; done; shutdown -P now) &
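The one-liner above polls forever, so if cloud-init never writes boot-finished (the very symptom in this bug) the loop never ends. A slightly more defensive variant, sketched under the assumption that giving up after a fixed time is acceptable for your setup, wraps the same polling loop in a function with a timeout. The name wait_for_file and the 600-second limit are hypothetical choices, not something cloud-init provides.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until file $1 exists,
# giving up after $2 seconds. Returns 0 if the file appeared, 1 on timeout.
wait_for_file() {
    path="$1"
    limit="$2"
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Same idea as the runcmd entry: run the wait in the background so
# cloud-init can finish (and write boot-finished) before shutdown is
# requested. Left commented out so this snippet is safe to source.
# (wait_for_file /var/lib/cloud/instance/boot-finished 600 && shutdown -P now) &
```

The function definition and the backgrounded call are assumed to live in the same shell script; on timeout nothing is shut down, which leaves the container up for inspection.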

It also looks wrong: your runcmd *immediately* starts the unit, which
(tries to) shut down the container while cloud-init is still running. My
hunch is that calling "shutdown" at this point will hang or fail because
the boot transaction is still running, so this smells like a deadlock,
which would explain why cloud-init never finishes.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1509747

Title:
  Intermittent lxc failures on wily

Status in juju-core:
  Invalid
Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Frequently, when creating an lxc container on wily (either through
  --to lxc:#, or using the local provider on wily), the template never
  stops and errors out here:

  [ 2300.885573] cloud-init[2758]: Cloud-init v. 0.7.7 running 'modules:final' at Sun, 25 Oct 2015 00:28:57 +0000. Up 182 seconds.
  [ 2300.886101] cloud-init[2758]: Cloud-init v. 0.7.7 finished at Sun, 25 Oct 2015 00:29:03 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 189 seconds
  [  OK  ] Started Execute cloud user/final scripts.
  [  OK  ] Reached target Multi-User System.
  [  OK  ] Reached target Graphical Interface.
           Starting Update UTMP about System Runlevel Changes...
  [  OK  ] Started /dev/initctl Compatibility Daemon.
  [FAILED] Failed to start Update UTMP about System Runlevel Changes.
  See 'systemctl status systemd-update-utmp-runlevel.service' for details.

  Attaching to the container and running the above command yields:

  ubuntu@cherylj-wily-local-lxc:~$ sudo lxc-attach --name juju-wily-lxc-template
  root@juju-wily-lxc-template:~# systemctl status systemd-update-utmp-runlevel.service
  ● systemd-update-utmp-runlevel.service - Update UTMP about System Runlevel Changes
     Loaded: loaded (/lib/systemd/system/systemd-update-utmp-runlevel.service; static; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2015-10-25 00:30:29 UTC; 2h 23min ago
       Docs: man:systemd-update-utmp.service(8)
             man:utmp(5)
    Process: 3963 ExecStart=/lib/systemd/systemd-update-utmp runlevel (code=exited, status=1/FAILURE)
   Main PID: 3963 (code=exited, status=1/FAILURE)

  Oct 25 00:29:46 juju-wily-lxc-template systemd[1]: Starting Update UTMP about System Runlevel Changes...
  Oct 25 00:30:29 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Main process exited, code=exited, status=1/FAILURE
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: Failed to start Update UTMP about System Runlevel Changes.
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Unit entered failed state.
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Failed with result 'exit-code'.

  
  I have seen this on ec2 and in canonistack.  The canonistack machine is available for further debugging.  Ping me for access.
