touch-packages team mailing list archive

[Bug 1514609] [NEW] Deserialising a job with the attribute "kill_timer" and "kill_process"="PROCESS_MAIN" results in abort

 

Public bug reported:

Upstart sometimes aborts on a stateful re-execution
triggered by "telinit u":

job.c:1977: Assertion failed in job_deserialise: job->kill_process
Caught abort, core dumped
init:job.c:1977: Assertion failed in job_deserialise: job->kill_process
[   69.668199] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

The attached file (sessions.json) is a salvaged dump of the Upstart state
that triggers the assertion failure; the problem evidently occurs while
processing the following piece:

[...]
          "name": "",
          "path": "\/com\/ubuntu\/Upstart\/jobs\/ureadahead\/_",
          "goal": "JOB_STOP",
          "state": "JOB_KILLED",
[...]
          "kill_timer": {
            "timeout": 180,
            "due": 245
          },
          "kill_process": "PROCESS_MAIN",
[...]

The issue was observed in the package ubuntu-1.12.1 (Ubuntu 14.04)
and is caused by the following code:

[init/job.c]

1954         json_kill_timer = json_object_object_get (json, "kill_timer");
1955 
1956         if (json_kill_timer) {
[...]
1973                 nih_local NihTimer *kill_timer = job_deserialise_kill_timer (json_kill_timer);
1974                 if (! kill_timer)
1975                         goto error;
1976 
1977                 nih_assert (job->kill_process);
1978                 job_process_set_kill_timer (job, job->kill_process,
1979                                             kill_timer->timeout);
1980                 job_process_adj_kill_timer (job, kill_timer->due);
1981         }

The assertion (job->kill_process) fails in the routine job_deserialise()
if the deserialised job has an associated kill timer and
the field kill_process == PROCESS_MAIN, presumably because
PROCESS_MAIN evaluates to 0 and the assertion treats it as false.
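
The failure mode described above can be sketched in isolation. This is
only an illustration: the enum below is assumed to mirror a subset of
Upstart's ProcessType (with PROCESS_INVALID = -1 and PROCESS_MAIN = 0),
and the two helper functions are hypothetical stand-ins for the
assertion condition, not Upstart code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors a subset of Upstart's ProcessType, where
 * PROCESS_INVALID is -1 and PROCESS_MAIN is 0. */
typedef enum {
	PROCESS_INVALID = -1,
	PROCESS_MAIN = 0,
	PROCESS_PRE_START,
	PROCESS_POST_START,
	PROCESS_PRE_STOP,
	PROCESS_POST_STOP
} ProcessType;

/* Hypothetical helper: the same test as nih_assert (job->kill_process),
 * which passes only for non-zero values. */
bool original_check (ProcessType kill_process)
{
	return kill_process != 0;
}

/* Hypothetical helper: an explicit comparison against the sentinel. */
bool proposed_check (ProcessType kill_process)
{
	return kill_process != PROCESS_INVALID;
}

/* Hypothetical self-check that can be called from main(). */
void check_kill_process_sketch (void)
{
	assert (! original_check (PROCESS_MAIN));    /* valid value rejected */
	assert (original_check (PROCESS_INVALID));   /* sentinel accepted */
	assert (proposed_check (PROCESS_MAIN));
	assert (! proposed_check (PROCESS_INVALID));
}
```

Under these assumptions the truthiness test is wrong in both directions:
it rejects the valid PROCESS_MAIN and lets PROCESS_INVALID through.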

It seems the issue might still affect the trunk as well:
neither job_process_kill() nor job_serialise() performs a similar check,
so if the Upstart state is serialised after job_process_kill() has run
but before the job kill timer fires, the resulting state representation
cannot be restored. At that point job->kill_timer is non-NULL and
job->kill_process is not PROCESS_INVALID, both as a result of the
job_process_set_kill_timer() operation, yet the assertion rejects
PROCESS_MAIN.

Probably the assertion in question should read

 (job->kill_process != PROCESS_INVALID)

if job_process_set_kill_timer() is assumed to operate correctly.
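
A toy model of the restore step with the suggested condition might look
as follows. Everything here is hypothetical (the Toy* types, the name
lookup, and the boolean return value used instead of nih_assert so the
sketch can run without aborting); it only illustrates that a state like
the one in the attached dump would pass the proposed check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the process type; values are assumptions. */
typedef enum {
	TOY_PROCESS_INVALID = -1,
	TOY_PROCESS_MAIN = 0,
	TOY_PROCESS_PRE_STOP
} ToyProcessType;

/* Hypothetical reduced job state: only the two fields involved. */
typedef struct {
	bool           has_kill_timer;  /* "kill_timer" present in the dump */
	ToyProcessType kill_process;    /* decoded "kill_process" value */
} ToyJobState;

/* Map a serialised name to the toy enum; unknown names are invalid. */
ToyProcessType toy_process_from_name (const char *name)
{
	if (strcmp (name, "PROCESS_MAIN") == 0)
		return TOY_PROCESS_MAIN;
	if (strcmp (name, "PROCESS_PRE_STOP") == 0)
		return TOY_PROCESS_PRE_STOP;
	return TOY_PROCESS_INVALID;
}

/* True if the kill timer could be re-armed under the proposed check. */
bool toy_can_restore_kill_timer (const ToyJobState *state)
{
	if (! state->has_kill_timer)
		return true;  /* nothing to re-arm */

	return state->kill_process != TOY_PROCESS_INVALID;
}
```

With has_kill_timer set and kill_process decoded from "PROCESS_MAIN",
toy_can_restore_kill_timer() returns true, whereas a truthiness test on
kill_process would reject the same state.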

Unfortunately the issue is extremely difficult to reproduce,
so additional diagnostics might be difficult to perform,
and adding them might mask the race that triggers the issue.

** Affects: upstart (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "Serialised Upstart state dump"
   https://bugs.launchpad.net/bugs/1514609/+attachment/4515781/+files/sessions.json

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1514609

Title:
  Deserialising a job with the attribute "kill_timer" and
  "kill_process"="PROCESS_MAIN" results in abort

Status in upstart package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/1514609/+subscriptions