touch-packages team mailing list archive

[Bug 1509747] Re: Intermittent lxc failures on wily

To: touch-packages@xxxxxxxxxxxxxxxxxxx
From: Martin Pitt <martin.pitt@xxxxxxxxxx>
Date: Wed, 28 Oct 2015 08:16:33 -0000
Reply-to: Bug 1509747 <1509747@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

I tried that user-data (minus the extra packages: and
ssh_authorized_keys: as they are irrelevant) on the current wily cloud
image with QEMU, and I get:

[   13.799851] cloud-init[1010]: Cloud-init v. 0.7.7 running 'modules:config' at Wed, 28 Oct 2015 08:11:33 +0000. Up 13.12 seconds.
[   14.189822] cloud-init[1088]: + mkdir -p /var/lib/juju/init/juju-template-restart
[   14.192818] cloud-init[1088]: + cat
[   14.194669] cloud-init[1088]: + /bin/systemctl link /var/lib/juju/init/juju-template-restart/juju-template-restart.service
[   14.208282] cloud-init[1088]: Created symlink from /etc/systemd/system/juju-template-restart.service to /var/lib/juju/init/juju-template-restart/juju-template-restart.service.
[   14.271740] cloud-init[1088]: + /bin/systemctl daemon-reload
[   14.342206] cloud-init[1088]: + /bin/systemctl enable /var/lib/juju/init/juju-template-restart/juju-template-restart.service
[   14.348218] cloud-init[1088]: Created symlink from /etc/systemd/system/multi-user.target.wants/juju-template-restart.service to /var/lib/juju/init/juju-template-restart/juju-template-restart.service.
[   14.416085] cloud-init[1088]: + /bin/systemctl start juju-template-restart.service
Cloud-init 0.7.7 received SIGTERM, exiting...
  Filename: /usr/lib/python3.4/logging/__init__.py
  Function: handle
  Line number: 855
    Filename: /usr/lib/python3.4/logging/__init__.py
    Function: callHandlers
    Line number: 1486
      Filename: /usr/lib/python3.4/logging/__init__.py
      Function: handle
      Line number: 1424
[   16.204669] reboot: Power down

So that's a different timing/behaviour than yours, but it shows the
race condition with this juju-template-restart.service. Reopening the
juju task for that. Can you please try changing the runcmd to drop all
this and just use

   - (while [ ! -e /var/lib/cloud/instance/boot-finished ]; do sleep 1; done; shutdown -P now) &

?
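
For readability, the same wait loop can be sketched as a small shell
function. This is only an illustration of the idea above, not what juju
or autopkgtest actually ship: the function name wait_for_file and the
timeout argument are assumptions added here so that a boot which never
finishes does not leave the loop running forever.

```shell
#!/bin/sh
# wait_for_file PATH [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until PATH exists.
# Returns 0 as soon as the file appears, 1 if the timeout (an assumed
# default of 300 seconds, not from the original runcmd) elapses first.
wait_for_file() {
    marker=$1
    timeout=${2:-300}
    waited=0
    while [ ! -e "$marker" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In a runcmd entry this would presumably be used in the background, as
in the one-liner above:
(wait_for_file /var/lib/cloud/instance/boot-finished && shutdown -P now) &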

Note that more recent versions of cloud-init have better support for
this
(http://cloudinit.readthedocs.org/en/latest/topics/examples.html#reboot-poweroff-when-finished),
but I believe that's not yet available on trusty; hence in autopkgtest
I use the above while loop in the background instead of "power_state:".

** Changed in: juju-core
       Status: Invalid => Confirmed

** Summary changed:

- Intermittent lxc failures on wily
+ Intermittent lxc failures on wily, juju-template-restart.service is racy

** Summary changed:

- Intermittent lxc failures on wily, juju-template-restart.service is racy
+ Intermittent lxc failures on wily, juju-template-restart.service race condition

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1509747

Title:
  Intermittent lxc failures on wily, juju-template-restart.service race
  condition

Status in juju-core:
  Confirmed
Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Frequently, when creating an lxc container on wily (either through
  --to lxc:#, or using the local provider on wily), the template never
  stops and errors out here:

  [ 2300.885573] cloud-init[2758]: Cloud-init v. 0.7.7 running 'modules:final' at Sun, 25 Oct 2015 00:28:57 +0000. Up 182 seconds.
  [ 2300.886101] cloud-init[2758]: Cloud-init v. 0.7.7 finished at Sun, 25 Oct 2015 00:29:03 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 189 seconds
  [  OK  ] Started Execute cloud user/final scripts.
  [  OK  ] Reached target Multi-User System.
  [  OK  ] Reached target Graphical Interface.
           Starting Update UTMP about System Runlevel Changes...
  [  OK  ] Started /dev/initctl Compatibility Daemon.
  [FAILED] Failed to start Update UTMP about System Runlevel Changes.
  See 'systemctl status systemd-update-utmp-runlevel.service' for details.

  Attaching to the container and running the above command yields:

  ubuntu@cherylj-wily-local-lxc:~$ sudo lxc-attach --name juju-wily-lxc-template
  root@juju-wily-lxc-template:~# systemctl status systemd-update-utmp-runlevel.service
  ● systemd-update-utmp-runlevel.service - Update UTMP about System Runlevel Changes
     Loaded: loaded (/lib/systemd/system/systemd-update-utmp-runlevel.service; static; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2015-10-25 00:30:29 UTC; 2h 23min ago
       Docs: man:systemd-update-utmp.service(8)
             man:utmp(5)
    Process: 3963 ExecStart=/lib/systemd/systemd-update-utmp runlevel (code=exited, status=1/FAILURE)
   Main PID: 3963 (code=exited, status=1/FAILURE)

  Oct 25 00:29:46 juju-wily-lxc-template systemd[1]: Starting Update UTMP about System Runlevel Changes...
  Oct 25 00:30:29 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Main process exited, code=exited, status=1/FAILURE
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: Failed to start Update UTMP about System Runlevel Changes.
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Unit entered failed state.
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Failed with result 'exit-code'.

  
  I have seen this on ec2 and in canonistack.  The canonistack machine is available for further debugging.  Ping me for access.
