yahoo-eng-team team mailing list archive

[Bug 1124384] Re: Configuration reload clears event that others jobs may be waiting on

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Rolf Leggewie <1124384@xxxxxxxxxxxxxxxxxx>
Date: Fri, 05 Dec 2014 06:08:49 -0000
Reply-to: Bug 1124384 <1124384@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
raring has seen the end of its life and is no longer receiving any
updates. Marking the raring task for this ticket as "Won't Fix".

** Changed in: cloud-init (Ubuntu Raring)
       Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1124384

Title:
  Configuration reload clears event that others jobs may be waiting on

Status in Init scripts for use on cloud images:
  Confirmed
Status in cloud-init package in Ubuntu:
  Fix Released
Status in upstart package in Ubuntu:
  Fix Released
Status in cloud-init source package in Raring:
  Won't Fix
Status in upstart source package in Raring:
  Won't Fix
Status in cloud-init source package in Saucy:
  Fix Released
Status in upstart source package in Saucy:
  Fix Released

Bug description:
  [Impact]

   * The status of blocked events was not preserved when upstart
     performed stateful re-execution or configuration reload. Thus jobs
     with complex start/stop conditions (one or more "and" clauses)
     that had at least one event emitted before the re-exec/reload might
     not execute when the remaining conditions were finally satisfied.

   * This may prevent certain systems from functioning correctly and,
     in cases such as cloud-init instances, may even cause a failure
     to boot.

   * The fix includes incrementing reference counts on blocked events
     whilst job configuration is reloaded, and fully serialising all
     upstart objects, including blocked events, during stateful
     re-execution.

   * Since previous versions of upstart do not serialise blocked
     events, the upgrade needs special casing. On upgrade, upstart will
     perform stateful re-execution unless runlevel 2 has already been
     reached. Instead, upstart will be re-executed at system shutdown.
     This should allow upgrading upstart during early boot of cloud-init
     instances. Note, however, that the old instance of upstart will
     still be running as init, and the running machine will still be
     affected by the bug described here.

  [Test Case]

   * Create a sample job /etc/init/foo.conf similar to this:

  start on (event1 and event2)
  task
  exec date

   * Test that configuration reload works correctly:

  $ sudo status foo
  foo stop/waiting
  $ sudo initctl emit -n event1
  $ sudo initctl reload-configuration
  $ sudo initctl emit -n event2
  $ sudo tail /var/log/upstart/foo.log

  At the end one should see a timestamp appended to foo.log.

   * Test that stateful re-exec works correctly:

  $ sudo initctl emit -n event1
  $ sudo telinit u
  $ sudo initctl emit -n event2
  $ sudo tail /var/log/upstart/foo.log

   * Start an ubuntu-cloud image (in lxc or cloud) with apt-get update &
     upgrade enabled, going from an upstart version without this fix to
     one that includes it. Cloud-final should finish, and boot-finished
     should exist at /var/lib/cloud/instances/*/boot-finished. Please
     note that this test should be performed in isolation from the dbus
     security update, which does partial stateful re-exec at the moment.
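
  The last check above can be scripted. The following is only a sketch:
  the function name check_boot_finished and its optional root-directory
  argument are hypothetical and exist purely for illustration; only the
  /var/lib/cloud/instances/*/boot-finished path comes from the test case.

```shell
#!/bin/sh
# Sketch: report whether a boot-finished marker exists for any instance.
# check_boot_finished and its optional root argument are hypothetical
# helpers; the default path is the one named in the test case.
check_boot_finished() {
    root="${1:-/var/lib/cloud}"
    for marker in "$root"/instances/*/boot-finished; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$marker" ]; then
            echo "boot finished: $marker"
            return 0
        fi
    done
    echo "no boot-finished marker under $root/instances" >&2
    return 1
}
```

  Running check_boot_finished with no argument inside the upgraded
  instance returns 0 once the marker exists and 1 otherwise, so it
  can be polled from a test harness.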

  [Regression Potential]

   * The bug fix introduced here is fairly large (approx 1.5k line diff)
     but comes with a comprehensive set of test suites to verify the two
     bug fixes as well as all possible combinations of stateful
     re-execution serialisation formats. The majority of the code
     changes are for additional [de]serialisation, which follows an
     existing, well-tested code pattern. The changes to reference
     counting have been carefully reviewed and tested by multiple
     developers.

   * While the bug report indicates a severe problem, it was not noticed
     until recently, as the system must be under heavy race conditions
     to become affected by this bug. Systems that reach a stable state,
     with few or no blocked events left, would not normally be
     affected.

   * Overall regression potential is deemed low.

  [Original Bug Report]

  Under bug 1080841 we made cloud-init invoke 'initctl
  reload-configuration' after it wrote an upstart job.  This was
  necessary because inotify is not supported on all filesystems
  (overlayfs being the one of most current interest).

  This seems to be causing upstart some pain, and resulting in cloud-
  final (and 'rc') not being run.

  Easy user-data to reproduce the problem is:

  #cloud-config-archive
  - content: |
     #cloud-boothook
     #!/bin/sh
     touch /run/cloud-init-upstart-reload  # hack, see trunk commit 783
  - content: |
     #!/bin/sh
     echo "==== $(date -R): user-script run ===" | tee /run/user-script.log
  - content: |
     #upstart-job
     description "a test upstart job"
     start on stopped rc RUNLEVEL=[2345]
     console output
     task
     script
     echo "==== $(date -R): upstart job run ===" | tee /run/upstart-job.log
     end script

  You should (and do on quantal) end up with 2 files written to /run.
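
  A quick way to confirm that outcome after boot is sketched below.
  The function name check_run_logs and its optional directory argument
  are hypothetical; the two file names are the ones written by the
  user-data above.

```shell
#!/bin/sh
# Sketch: verify that both scripts from the user-data left their logs.
# check_run_logs and its optional directory argument are hypothetical;
# the file names come from the tee commands in the user-data.
check_run_logs() {
    dir="${1:-/run}"
    missing=0
    for name in user-script.log upstart-job.log; do
        # -s is true only if the file exists and is non-empty.
        if [ -s "$dir/$name" ]; then
            echo "ok: $dir/$name"
        else
            echo "missing: $dir/$name"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when neither file is missing.
    [ "$missing" -eq 0 ]
}
```

  On an affected system one would expect upstart-job.log to be reported
  as missing, since the upstart job never runs.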

  I've verified that the same behavior is true on quantal.  If you
  change cloud-init to notify upstart about a job immediately after it
  writes it, then quantal's upstart gets confused also.

  Related bugs:
   * bug 1080841: should reload configuration if an upstart job is added
   * bug 1103881: cloud-final is never executed if upstart is upgraded during initialization of the image

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1124384/+subscriptions