touch-packages team mailing list archive

Re: [Bug 1509747] Re: Intermittent lxc failures on wily, juju-template-restart.service race condition

To: touch-packages@xxxxxxxxxxxxxxxxxxx
From: Martin Pitt <martin.pitt@xxxxxxxxxx>
Date: Wed, 28 Oct 2015 23:49:14 -0000
Reply-to: Bug 1509747 <1509747@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Cheryl Jennings [2015-10-28 13:59 -0000]:
> 2 - However, I thought adding in After=cloud-config.target would ensure
> that it wouldn't start until after cloud-init completes?

Not really, I'm afraid. TL;DR: this doesn't work that way, is prone to
deadlocks, and rather hard to understand.

Long-winded explanation in case you care:

If some target (or other unit) pulls in a service via Wants/Requires
during boot, systemd has control over when it starts which unit, and
the After= will order the units accordingly. But if you manually start
that unit, *you* determine the startup time.

An After= on a target, without a matching Wants=/Requires=, is by and
large meaningless for manual start requests: If cloud-config.target is
not started, then nothing has activated it yet, and After= is a no-op
(as it's only ordering, not a dependency). OTOH, if the target is
already started, then the After= is immediately satisfied.

"systemctl start foo" actually *does* wait in the case that
foo.service has an After=bar.service, and bar.service is in state
"activating" (i.e. already scheduled to be started). But targets
don't have that state, they are just a synchronization point for boot
ordering and have no "activating" or Exec= themselves.

Note that if systemctl start *did* wait for cloud-config.target to
be started, you would have a deadlock: the runcmd would wait forever
for cloud-config.target (as only then could it start your unit), but
the target won't be started until your runcmds (and other cloud-init
stuff) finish -- but that can't happen because you wait.

So in summary: With a target, such a command can only ever be a no-op
or a deadlock. If you do need such a case, then you can run
"systemctl start --no-block" to just queue the start of a unit without
waiting for it to be started; this would break the deadlock in the
correct way.
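
A minimal sketch of that non-blocking start, as it might be called from
a cloud-init runcmd. The helper name queue_unit_start is hypothetical
and only there for illustration; --no-block is a standard systemctl
option, and the unit name in the usage line is taken from the bug title.

```shell
# Hypothetical helper: queue a start job for a unit and return at once,
# instead of waiting until the unit has actually finished starting.
queue_unit_start() {
    systemctl start --no-block "$1"
}
```

It would be called as "queue_unit_start juju-template-restart.service".
systemd then runs the queued job once the unit's After= ordering allows
it, while the caller (and with it the rest of cloud-init) carries on.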

> 3 - The recreate steps for this are:
> - Bootstrap local env on wily  (this won't create a container)

Oh, it won't? I thought it would already (similar to the machine-0 on
the cloud). For trusty I did get a "juju-trusty-lxc-template"
container which I thought was the result of bootstrap; but apparently
it just appeared too late and it actually was the result of "machine
add" then?

> - Deploy using the command: `juju deploy wily/ubuntu`.  It may take
> some time before you see the container started as juju will download the
> image before starting the template container.

Ah, this command works, thanks. I didn't reproduce the bug yet with my
first run, but I do see that temp reboot unit in action. To avoid
doubt, this happens while building juju-wily-lxc-template, *not* while
cloning it to ubuntu-local-machine-1, right?

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1509747

Title:
  Intermittent lxc failures on wily, juju-template-restart.service race
  condition

Status in juju-core:
  Confirmed
Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  Frequently, when creating an lxc container on wily (either through
  --to lxc:#, or using the local provider on wily), the template never
  stops and errors out here:

  [ 2300.885573] cloud-init[2758]: Cloud-init v. 0.7.7 running 'modules:final' at Sun, 25 Oct 2015 00:28:57 +0000. Up 182 seconds.
  [ 2300.886101] cloud-init[2758]: Cloud-init v. 0.7.7 finished at Sun, 25 Oct 2015 00:29:03 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 189 seconds
  [  OK  ] Started Execute cloud user/final scripts.
  [  OK  ] Reached target Multi-User System.
  [  OK  ] Reached target Graphical Interface.
           Starting Update UTMP about System Runlevel Changes...
  [  OK  ] Started /dev/initctl Compatibility Daemon.
  [FAILED] Failed to start Update UTMP about System Runlevel Changes.
  See 'systemctl status systemd-update-utmp-runlevel.service' for details.

  Attaching to the container and running the above command yields:

  ubuntu@cherylj-wily-local-lxc:~$ sudo lxc-attach --name juju-wily-lxc-template
  root@juju-wily-lxc-template:~# systemctl status systemd-update-utmp-runlevel.service
  ● systemd-update-utmp-runlevel.service - Update UTMP about System Runlevel Changes
     Loaded: loaded (/lib/systemd/system/systemd-update-utmp-runlevel.service; static; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2015-10-25 00:30:29 UTC; 2h 23min ago
       Docs: man:systemd-update-utmp.service(8)
             man:utmp(5)
    Process: 3963 ExecStart=/lib/systemd/systemd-update-utmp runlevel (code=exited, status=1/FAILURE)
   Main PID: 3963 (code=exited, status=1/FAILURE)

  Oct 25 00:29:46 juju-wily-lxc-template systemd[1]: Starting Update UTMP about System Runlevel Changes...
  Oct 25 00:30:29 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Main process exited, code=exited, status=1/FAILURE
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: Failed to start Update UTMP about System Runlevel Changes.
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Unit entered failed state.
  Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Failed with result 'exit-code'.

  
  I have seen this on ec2 and in canonistack.  The canonistack machine is available for further debugging.  Ping me for access.

To manage notifications about this bug go to:
https://bugs.launchpad.net/juju-core/+bug/1509747/+subscriptions