touch-packages team mailing list archive
Message #116787
[Bug 1514609] [NEW] Deserialising a job with the attribute "kill_timer" and "kill_process"="PROCESS_MAIN" results in abort
Public bug reported:
Upstart sometimes aborts on a stateful re-execution
triggered by "telinit u":
job.c:1977: Assertion failed in job_deserialise: job->kill_process
Caught abort, core dumped
init:job.c:1977: Assertion failed in job_deserialise: job->kill_process
[ 69.668199] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600
The attached file (sessions.json) is a salvaged dump of the Upstart state
that triggers the assertion failure; the problem evidently occurs while
processing the following piece:
[...]
"name": "",
"path": "\/com\/ubuntu\/Upstart\/jobs\/ureadahead\/_",
"goal": "JOB_STOP",
"state": "JOB_KILLED",
[...]
"kill_timer": {
"timeout": 180,
"due": 245
},
"kill_process": "PROCESS_MAIN",
[...]
The issue was observed with upstart 1.12.1 (Ubuntu 14.04)
and is caused by the following code:
[init/job.c]
1954 json_kill_timer = json_object_object_get (json, "kill_timer");
1955
1956 if (json_kill_timer) {
[...]
1973 nih_local NihTimer *kill_timer = job_deserialise_kill_timer (json_kill_timer);
1974 if (! kill_timer)
1975 goto error;
1976
1977 nih_assert (job->kill_process);
1978 job_process_set_kill_timer (job, job->kill_process,
1979 kill_timer->timeout);
1980 job_process_adj_kill_timer (job, kill_timer->due);
1981 }
The assertion (job->kill_process) fails in job_deserialise() whenever
the deserialised job has an associated kill timer and
kill_process == PROCESS_MAIN. PROCESS_MAIN is the ProcessType value 0,
so the truth test fails even though PROCESS_MAIN is a valid process.
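A minimal model of the failing check, for illustration only. The
ProcessType enumeration below is a hypothetical mirror whose values are
assumed: PROCESS_MAIN = 0 is what the reported failure implies, and the
negative value of PROCESS_INVALID is a further assumption. The helper
assertion_holds() is made up and stands in for the expression that
nih_assert() tests at job.c:1977.

```c
#include <stdbool.h>

/* Hypothetical mirror of Upstart's ProcessType (values are assumed):
 * PROCESS_MAIN is the first real process slot, so it equals 0. */
typedef enum {
	PROCESS_INVALID = -2,
	PROCESS_ALL     = -1,
	PROCESS_MAIN    = 0,
	PROCESS_PRE_START,
	PROCESS_POST_START,
	PROCESS_PRE_STOP,
	PROCESS_POST_STOP
} ProcessType;

/* Hypothetical helper modelling nih_assert (job->kill_process):
 * a plain truth test, so the assertion holds only for nonzero values. */
static bool
assertion_holds (ProcessType kill_process)
{
	return kill_process ? true : false;
}
```

Under these assumptions assertion_holds (PROCESS_MAIN) returns false,
matching the abort in the log, while assertion_holds (PROCESS_PRE_STOP)
returns true. The truth test would also accept PROCESS_INVALID, since
any nonzero value passes.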
The issue probably still affects trunk as well: neither
job_process_kill() nor job_serialise() performs a similar check, so if
the Upstart state is serialised after job_process_kill() but before the
job's kill timer fires, the resulting state cannot be restored. In that
window job->kill_timer is non-NULL and job->kill_process holds the value
set by job_process_set_kill_timer(), which is not PROCESS_INVALID and
can be PROCESS_MAIN, as in the dump above.
The assertion in question should therefore probably read
(job->kill_process != PROCESS_INVALID),
assuming job_process_set_kill_timer() operates correctly.
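The proposed check can be sketched against a reduced model of the
serialised job fields. JobState, kill_process_valid(),
restore_kill_timer() and the enum values are all hypothetical,
illustrative assumptions rather than Upstart's own types or API. The
model returns false where job_deserialise() would abort or fail.

```c
#include <stdbool.h>

/* Hypothetical mirror of Upstart's ProcessType; values are assumed. */
typedef enum {
	PROCESS_INVALID = -2,
	PROCESS_ALL     = -1,
	PROCESS_MAIN    = 0,
	PROCESS_PRE_START,
	PROCESS_POST_START,
	PROCESS_PRE_STOP,
	PROCESS_POST_STOP
} ProcessType;

/* Hypothetical reduction of the kill-timer fields as they appear in the
 * serialised JSON state. */
typedef struct {
	bool        has_kill_timer; /* "kill_timer" object present */
	int         timeout;        /* "kill_timer"."timeout" */
	int         due;            /* "kill_timer"."due" */
	ProcessType kill_process;   /* "kill_process" */
} JobState;

/* Proposed precondition: a pending kill timer must name a real process
 * slot; PROCESS_MAIN (0) is a valid one. */
static bool
kill_process_valid (ProcessType kill_process)
{
	return kill_process != PROCESS_INVALID;
}

/* Hypothetical model of the kill-timer branch of job_deserialise():
 * returns false where the real code would abort or fail. */
static bool
restore_kill_timer (const JobState *saved)
{
	if (! saved->has_kill_timer)
		return true;            /* no timer to re-arm */

	if (! kill_process_valid (saved->kill_process))
		return false;           /* timer without a target process */

	/* The real code would now call job_process_set_kill_timer() and
	 * job_process_adj_kill_timer() with saved->timeout and saved->due. */
	return true;
}
```

With the values from the dump (timeout 180, due 245,
kill_process PROCESS_MAIN), restore_kill_timer() succeeds under the
proposed check, while a state that has a kill timer but
kill_process == PROCESS_INVALID is still rejected.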
Unfortunately, the issue is extremely difficult to reproduce, so
additional diagnostics are hard to obtain, and adding instrumentation
might change the timing enough to hide the race that triggers it.
** Affects: upstart (Ubuntu)
Importance: Undecided
Status: New
** Attachment added: "Serialised Upstart state dump"
https://bugs.launchpad.net/bugs/1514609/+attachment/4515781/+files/sessions.json
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1514609