yahoo-eng-team team mailing list archive
Message #82988
[Bug 1124384] Re: Configuration reload clears event that others jobs may be waiting on
** No longer affects: cloud-init
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1124384
Title:
Configuration reload clears event that others jobs may be waiting on
Status in cloud-init package in Ubuntu:
Fix Released
Status in upstart package in Ubuntu:
Fix Released
Status in cloud-init source package in Raring:
Won't Fix
Status in upstart source package in Raring:
Won't Fix
Status in cloud-init source package in Saucy:
Fix Released
Status in upstart source package in Saucy:
Fix Released
Bug description:
[Impact]
* The status of blocked events was not preserved when upstart
performed a stateful re-execution or configuration reload. As a
result, jobs with complex start/stop conditions (one or more "and"
clauses) that had at least one event emitted before the
re-exec/reload might not execute when the remaining conditions were
finally satisfied.
* This may prevent certain systems from functioning correctly and, in
cases such as cloud-init instances, may even cause a failure to
boot.
* The fix includes incrementing reference counts on blocked events
while job configuration is reloaded, and fully serialising all
upstart objects, including blocked events, during stateful
re-execution.
* Since previous versions of upstart do not serialise blocked events,
the upgrade needs special casing. On upgrade, upstart will not perform
stateful re-execution unless runlevel 2 has already been
reached. Instead, upstart will be re-executed at system shutdown. This
should allow upgrading upstart during early boot of cloud-init
instances. Note, however, that the old instance of upstart will still
be running as init and the running machine will still be affected by
the bug described here.
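The failure mode described above can be illustrated with a toy shell
model. This is only a sketch, not upstart code: the SEEN variable and
the functions emit, job_would_start, reload_buggy, reload_fixed and
run_scenario are hypothetical names invented for illustration. It
models one job with the condition "event1 and event2" and shows what
happens when the already-matched half of the condition is dropped on
reload.

```sh
#!/bin/sh
# Toy model (NOT upstart code): one job with "start on (event1 and event2)".
# SEEN holds the events already matched for the job's start condition.
SEEN=""

emit() {
    # Record the event once.
    case " $SEEN " in
        *" $1 "*) ;;
        *) SEEN="$SEEN $1" ;;
    esac
}

job_would_start() {
    # True only when both halves of the "and" condition have been matched.
    case " $SEEN " in *" event1 "*) ;; *) return 1 ;; esac
    case " $SEEN " in *" event2 "*) ;; *) return 1 ;; esac
    return 0
}

reload_buggy() { SEEN=""; }   # affected behaviour: matched state is dropped
reload_fixed() { :; }         # fixed behaviour: matched state survives

run_scenario() {
    # $1 names the reload function to use; prints "started" or "waiting".
    SEEN=""
    emit event1
    "$1"
    emit event2
    if job_would_start; then echo started; else echo waiting; fi
}

run_scenario reload_buggy   # prints "waiting"
run_scenario reload_fixed   # prints "started"
```

In the buggy scenario the job stays in "waiting" even though both
events were emitted, which mirrors the symptom reported for jobs with
"and" clauses.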
[Test Case]
* Create a sample job /etc/init/foo.conf similar to this:
start on (event1 and event2)
task
exec date
* Test reload configuration works correctly:
$ sudo status foo
foo stop/waiting
$ sudo initctl emit -n event1
$ sudo initctl reload-configuration
$ sudo initctl emit -n event2
$ sudo tail /var/log/upstart/foo.log
At the end one should see a timestamp appended in the foo.log.
* Test stateful re-exec works correctly:
$ sudo initctl emit -n event1
$ sudo telinit u
$ sudo initctl emit -n event2
$ sudo tail /var/log/upstart/foo.log
* Start an ubuntu-cloud image (in lxc or cloud) with apt-get update &
upgrade enabled, going from an upstart version without this fix
to one that includes it. Cloud-final should finish and create
/var/lib/cloud/instances/*/boot-finished. Please note that
this test should be performed in isolation from the dbus security
update, which does a partial stateful re-exec at the moment.
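The outcome of the cloud image test can be checked with a small helper
along these lines. This is a sketch under assumptions: the function
name boot_finished_present is hypothetical, and the optional directory
argument exists only so that the check can be pointed at a scratch
directory; the default path is the one named in the test case.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if any instance directory under $1
# (default /var/lib/cloud/instances) contains a boot-finished file.
boot_finished_present() {
    root="${1:-/var/lib/cloud/instances}"
    for f in "$root"/*/boot-finished; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

if boot_finished_present; then
    echo "boot-finished found"
else
    echo "boot-finished missing"
fi
```

A "missing" result after the upgrade test would suggest that
cloud-final did not complete.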
[Regression Potential]
* The bug fix introduced here is fairly large (approx. 1.5k line diff)
but comes with a comprehensive set of test suites to verify the two
bug fixes as well as all possible combinations of stateful
re-execution serialisation formats. The majority of the code changes
are for additional [de]serialisation, which follows the existing
well-tested code pattern. The changes to reference counting have been
carefully reviewed and tested by multiple developers.
* While the bug report indicates a severe problem, it was not noticed
until recently, as the system must be under heavy race conditions
to be affected by this bug. Systems that reach a stable state
with few or no blocked events left would not normally be
affected.
* Overall regression potential is deemed low.
[Original Bug Report]
Under bug 1080841 we made cloud-init invoke 'initctl
reload-configuration' after it wrote an upstart job. This was necessary
because inotify is not supported on all filesystems (overlayfs being
the one of most current interest).
This seems to be causing upstart some pain, and resulting in cloud-
final (and 'rc') not being run.
Easy user-data to reproduce the problem is:
#cloud-config-archive
- content: |
    #cloud-boothook
    #!/bin/sh
    touch /run/cloud-init-upstart-reload # hack, see trunk commit 783
- content: |
    #!/bin/sh
    echo "==== $(date -R): user-script run ===" | tee /run/user-script.log
- content: |
    #upstart-job
    description "a test upstart job"
    start on stopped rc RUNLEVEL=[2345]
    console output
    task
    script
      echo "==== $(date -R): upstart job run ===" | tee /run/upstart-job.log
    end script
You should (and do on quantal) end up with 2 files written to /run.
I've verified that the same behavior is true on quantal. If you
change cloud-init to notify upstart about a job immediately after it
writes it, then quantal's upstart gets confused also.
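To see which of the two files the user-data above was expected to write
are absent after boot, a check along these lines could be used. This is
a sketch: the function name missing_markers is hypothetical, and the
optional directory argument is an assumption added so the check can be
run against a scratch directory instead of /run.

```sh
#!/bin/sh
# Print the name of each expected marker file that does not exist.
# $1 is the directory to inspect (defaults to /run).
missing_markers() {
    dir="${1:-/run}"
    for name in user-script.log upstart-job.log; do
        [ -e "$dir/$name" ] || echo "$name"
    done
}

missing_markers
```

Empty output means both files were written; any name printed
identifies a step that did not run.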
Related bugs:
* bug 1080841: should reload configuration if an upstart job is added
* bug 1103881: cloud-final is never executed if upstart is upgraded during initialization of the image