← Back to team overview

touch-packages team mailing list archive

[Bug 1188642] Re: Init aborts with the message: init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io

 

Encounter same issue in 14.04

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1188642

Title:
  Init aborts with the message: init: log.c:813: Assertion failed in
  log_clear_unflushed: ! log->io

Status in Upstart:
  Fix Released
Status in upstart package in Ubuntu:
  Fix Released

Bug description:
  init (in Ubuntu 13.04) accidentally aborts with the following messages
  in the kernel log:

  [    4.981969] init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io
  [    4.991031] init: Caught abort, core dumped
  [    4.995571] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

  I've figured out that the probable cause is a race between the line 699 of the file init/log.c of the function log_read_watch() that sets the field
  log->remote_closed to 1 (the caller clears log->io) and the routine log_clear_unflushed() that causes the abort when log->remote_closed == 1 but log->io != NULL: log_clear_unflushed() may preempt log_read_watch() when
  log->remote_closed is set to 1 but before log->io is cleared for
  the same log.

  I applied the following patch to prove this assertion:

  @@ -264,6 +266,8 @@
       */
      log->io->error_handler = NULL;

  +			nih_fatal("log %p: clearing log->io\n", log);
  +
      nih_free (log->io);
      log->io = NULL;
     }
  @@ -384,6 +388,8 @@

    nih_free (err);

  +	nih_fatal("log %p: clearing log->io\n", log);
  +
    /* Ensure the NihIo is closed */
    nih_free (log->io);
    log->io = NULL;
  @@ -696,7 +702,11 @@
       *
       */
      if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
  +			{
  +				nih_fatal("log %p: setting log->remote_closed\n", log);
       log->remote_closed = 1;
  +			}

      close (log->fd);
      log->fd = -1;
  @@ -806,6 +816,8 @@
     nih_assert (log->open_errno);

     if (log->remote_closed) {
  +			nih_fatal("log %p: checking log->io\n", log);
  +
      /* Parent job has ended and unflushed data
       * exists.
       */

  I got the  following kernel log before the abort:

  [    4.635146] init: log 0xb882a660: clearing log->io
  [    4.640487] init: log 0xb88035e0: setting log->remote_closed
  [    4.654816] init: log 0xb882ce88: clearing log->io
  [    4.655263] init: log 0xb882a1d0: setting log->remote_closed
  [    4.706758] init: log 0xb8831370: setting log->remote_closed
  [    6.714543] init: log 0xb882a600: clearing log->io
  [    6.733596] init: log 0xb882c6a0: clearing log->io
  [    6.746124] init: log 0xb882cbd8: setting log->remote_closed
  [    6.765075] init: log 0xb882b1d0: clearing log->io
  [    6.787772] init: log 0xb8842b50: clearing log->io
  [    6.801163] init: log 0xb882cb78: clearing log->io
  [    6.810717] init: log 0xb882df58: clearing log->io
  [    6.864869] init: log 0xb883e418: clearing log->io
  [    6.871033] init: log 0xb882d2a8: clearing log->io
  [    6.899113] init: log 0xb882d268: clearing log->io
  [    6.927477] init: log 0xb884eb28: clearing log->io
  [    6.942432] init: log 0xb88519a8: clearing log->io
  [    6.997231] init: log 0xb8850c40: clearing log->io
  [    7.031685] init: log 0xb8850650: clearing log->io
  [    7.042562] init: log 0xb8856d10: clearing log->io
  [    7.042811] init: log 0xb8857cf8: clearing log->io
  [    7.063461] init: log 0xb885a850: setting log->remote_closed
  [    7.063996] init: log 0xb883e478: setting log->remote_closed
  [    7.095505] init: log 0xb8854960: setting log->remote_closed
  [    7.135581] init: log 0xb883e478: checking log->io
  [    7.135756] init: log.c:825: Assertion failed in log_clear_unflushed: ! log->io
  [    7.140537] init: Caught abort, core dumped
  [    7.143211] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

  So the assertion has been proved for the log at 0xb883e478 which has
  the field remote_closed updated but the field io dirty.

To manage notifications about this bug go to:
https://bugs.launchpad.net/upstart/+bug/1188642/+subscriptions