touch-packages team mailing list archive
Message #114235
Re: [Bug 1509747] Re: Intermittent lxc failures on wily, juju-template-restart.service race condition
Cheryl Jennings [2015-10-28 13:59 -0000]:
> 2 - However, I thought adding in After=cloud-config.target would ensure
> that it wouldn't start until after cloud-init completes?
Not really, I'm afraid. TL;DR: this doesn't work that way, is prone to
deadlocks, and is rather hard to understand.
Long-winded explanation in case you care:
If some target (or other unit) pulls in a service via Wants/Requires
during boot, systemd has control over when it starts which unit, and
the After= will order the units accordingly. But if you manually start
that unit, *you* determine the startup time.
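As a rough illustration of the boot case, the sketch below treats a boot transaction as a graph. Every unit pulled in via Wants=/Requires= is part of the transaction, and After= edges only decide the order among those units. This is a toy model built on Python's `graphlib`, not systemd's real job engine. The cloud-config.target edge is an assumption for illustration; the juju-template-restart.service edge mirrors the After=cloud-config.target from the question.

```python
from graphlib import TopologicalSorter

def boot_order(pulled_in, after):
    """Return a start order for the units in one (toy) boot transaction.

    pulled_in: set of units activated via Wants=/Requires=.
    after: mapping unit -> set of units it is ordered After=.
    After= edges to units outside the transaction are dropped, because
    After= only orders units; it never pulls a unit in by itself.
    """
    sorter = TopologicalSorter()
    for unit in pulled_in:
        deps = {d for d in after.get(unit, ()) if d in pulled_in}
        sorter.add(unit, *deps)
    return list(sorter.static_order())

# Assumed ordering edges, for illustration only.
AFTER = {
    "cloud-config.target": {"cloud-init.service"},
    "juju-template-restart.service": {"cloud-config.target"},
}
```

With all three units in the transaction the After= lines produce the chain cloud-init.service, cloud-config.target, juju-template-restart.service. If only juju-template-restart.service is in the transaction, its After= has nothing to order against and is simply ignored.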
The After= for a target without a Wants=/Requires= is by and large
meaningless with manual start requests: if cloud-config.target is not
started, then nothing has activated it yet, and After= is a no-op (as
it's only ordering, not a dependency). OTOH, if the target is already
started, then the After= is immediately satisfied.
"systemctl start foo" actually *does* wait in the case that
foo.service has an After=bar.service, and bar.service is in state
"activating" (i.e. already scheduled to be started). But targets
don't have that state, they are just a synchronization point for boot
ordering and have no "activating" or Exec= themselves.
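The three cases for a manual start can be summarized as a small lookup, sketched here in Python. The `State` enum and the outcome strings are hypothetical names for this illustration; they only encode the rules described above, not any systemd API.

```python
from enum import Enum

class State(Enum):
    INACTIVE = "inactive"      # nothing has activated the unit, no job queued
    ACTIVATING = "activating"  # a start job is already scheduled
    ACTIVE = "active"          # the unit is already started

def after_outcome(dep_state):
    """What After=<dep> means for a manual `systemctl start` of a unit."""
    if dep_state is State.INACTIVE:
        return "no-op"      # ordering only: nothing to order against
    if dep_state is State.ACTIVE:
        return "satisfied"  # the ordering is already fulfilled
    return "wait"           # the start waits for dep's scheduled job

def manual_start_waits(dep_states):
    """True if a manual start has to wait on any of its After= units."""
    return any(after_outcome(s) == "wait" for s in dep_states)
```

Per the explanation above, a target never sits in the "activating" state the way a service with Exec= does, so for After=cloud-config.target only the "no-op" and "satisfied" outcomes apply.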
Note that if systemctl start *would* wait for cloud-config.target to
be started, you would have a deadlock: the runcmd would wait forever
for cloud-config.target (as only then could it start your unit), but
the target won't be started until your runcmds (and other cloud-init
stuff) finish -- but that can't happen because you wait.
So in summary: with a target, such a command can only ever be a
no-op or a deadlock. If you do need this, then you can run
"systemctl start --no-block" to just queue the start of a unit without
waiting for it to be started; this would break the deadlock in the
correct way.
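One way to picture the deadlock and why --no-block breaks it is a wait-for graph, sketched below in Python under stated assumptions: it models the hypothetical case where the start *would* wait for cloud-config.target, and the node names "runcmd" and "unit start job" are labels invented for this sketch. A cycle in the graph means a deadlock; `graphlib` raises `CycleError` when it finds one.

```python
from graphlib import TopologicalSorter, CycleError

def wait_for_graph(no_block):
    """Who waits on whom (node -> set of nodes it waits for)."""
    graph = {
        # The unit is ordered After=cloud-config.target.
        "unit start job": {"cloud-config.target"},
        # The target is only reached once the runcmds have finished.
        "cloud-config.target": {"runcmd"},
        "runcmd": set(),
    }
    if not no_block:
        # A blocking `systemctl start` makes the runcmd wait for the job.
        graph["runcmd"].add("unit start job")
    return graph

def deadlocks(graph):
    """True if the wait-for graph contains a cycle."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False
```

With no_block=False the three nodes wait on each other in a circle. With no_block=True the runcmd returns right after queuing the job, so the only remaining order is runcmd, then cloud-config.target, then the unit's start job.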
> 3 - The recreate steps for this are:
> - Bootstrap local env on wily (this won't create a container)
Oh, it won't? I thought it would already (similar to the machine-0 on
the cloud). For trusty I did get a "juju-trusty-lxc-template"
container which I thought was the result of bootstrap; but apparently
it just appeared too late and it actually was the result of "machine
add" then?
> - Deploy using the command: `juju deploy wily/ubuntu`. It may take
> some time before you see the container started as juju will download the
> image before starting the template container.
Ah, this command works, thanks. I didn't reproduce the bug yet with my
first run, but I do see that temp reboot unit in action. To avoid
doubt, this happens while building juju-wily-lxc-template, *not* while
cloning it to ubuntu-local-machine-1, right?
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1509747
Title:
Intermittent lxc failures on wily, juju-template-restart.service race
condition
Status in juju-core:
Confirmed
Status in systemd package in Ubuntu:
Incomplete
Bug description:
Frequently, when creating an lxc container on wily (either through
--to lxc:#, or using the local provider on wily), the template never
stops and errors out here:
[ 2300.885573] cloud-init[2758]: Cloud-init v. 0.7.7 running 'modules:final' at Sun, 25 Oct 2015 00:28:57 +0000. Up 182 seconds.
[ 2300.886101] cloud-init[2758]: Cloud-init v. 0.7.7 finished at Sun, 25 Oct 2015 00:29:03 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net]. Up 189 seconds
[ OK ] Started Execute cloud user/final scripts.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Started /dev/initctl Compatibility Daemon.
[FAILED] Failed to start Update UTMP about System Runlevel Changes.
See 'systemctl status systemd-update-utmp-runlevel.service' for details.
Attaching to the container and running the above command yields:
ubuntu@cherylj-wily-local-lxc:~$ sudo lxc-attach --name juju-wily-lxc-template
root@juju-wily-lxc-template:~# systemctl status systemd-update-utmp-runlevel.service
● systemd-update-utmp-runlevel.service - Update UTMP about System Runlevel Changes
Loaded: loaded (/lib/systemd/system/systemd-update-utmp-runlevel.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2015-10-25 00:30:29 UTC; 2h 23min ago
Docs: man:systemd-update-utmp.service(8)
man:utmp(5)
Process: 3963 ExecStart=/lib/systemd/systemd-update-utmp runlevel (code=exited, status=1/FAILURE)
Main PID: 3963 (code=exited, status=1/FAILURE)
Oct 25 00:29:46 juju-wily-lxc-template systemd[1]: Starting Update UTMP about System Runlevel Changes...
Oct 25 00:30:29 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Main process exited, code=exited, status=1/FAILURE
Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: Failed to start Update UTMP about System Runlevel Changes.
Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Unit entered failed state.
Oct 25 00:30:30 juju-wily-lxc-template systemd[1]: systemd-update-utmp-runlevel.service: Failed with result 'exit-code'.
I have seen this on ec2 and in canonistack. The canonistack machine is available for further debugging. Ping me for access.
To manage notifications about this bug go to:
https://bugs.launchpad.net/juju-core/+bug/1509747/+subscriptions