sts-sponsors team mailing list archive

[Bug 1795658] [NEW] xenial systemd reports 'inactive' instead of 'failed' for service units that repeatedly failed to restart / failed permanently

 

You have been subscribed to a public bug by Mauricio Faria de Oliveira (mfo):

[Impact]

 * In case a service unit has repeatedly failed to restart, it should be
   reported as 'failed' permanently, but currently it's instead reported
   as 'inactive'.

 * System monitoring tools that evaluate the status of systemd service units
   and act upon it (for example: restart service, report permanent failure)
   are currently misled by information in 'systemctl status <unit>.service'.

 * System management tools based on such information may take wrong and/or
   sub-optimal actions in the managed systems regarding such service units.

 * This systemd patch [1] directly addresses this issue (see systemd GitHub
   PR #3166 [2]), and its code is still in effect in upstream systemd today
   without further fixes (the only later changes were to doc text and to the
   busname files, which have since been removed; neither altered this fix).
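
 * To illustrate the impact on monitoring tools, here is a minimal sketch of
   how such a tool might map a unit's state to an action. The function names
   and the action names (ok, wait, alert, ignore) are hypothetical and not
   taken from any real tool; only 'systemctl show -p ActiveState' is an
   actual systemd interface. The '--value' option is deliberately avoided
   on the assumption that systemd 229 in xenial does not provide it.

```shell
#!/bin/sh
# Hypothetical monitoring helper (illustration only, not part of the patch).

# Read the unit's ActiveState; output of "systemctl show" is "ActiveState=<state>".
unit_active_state() {
    systemctl show -p ActiveState "$1" | cut -d= -f2
}

# Map an ActiveState string to an (assumed) monitoring action.
action_for_state() {
    case "$1" in
        active|reloading)        echo ok ;;
        activating|deactivating) echo wait ;;
        failed)                  echo alert ;;
        inactive)                echo ignore ;;   # treated as "stopped on purpose"
        *)                       echo unknown ;;
    esac
}
```

   With the bug, 'action_for_state "$(unit_active_state fail-on-restart.service)"'
   would yield 'ignore' for a unit that has in fact failed permanently, so a
   tool written along these lines would never raise the alert.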

[Test Case]

 * This is copied from systemd PR #3166 [2].

 * This has also been tested by a customer with their system monitoring
   and management solution, to verify interoperability.

    $ cat <<EOF | sudo tee /etc/systemd/system/fail-on-restart.service
    [Service]
    ExecStart=/bin/false
    Restart=always
    EOF

    $ sudo systemctl daemon-reload
    $ sudo systemctl start fail-on-restart

    Before) "Active: inactive (dead)"

    $ systemctl status -n0 fail-on-restart
    fail-on-restart.service
       Loaded: loaded (/etc/systemd/system/fail-on-restart.service; static; vendor preset: enabled)
       Active: inactive (dead)

    After) "Active: failed (Result: start-limit-hit)"
 
    $ systemctl status -n0 fail-on-restart
    fail-on-restart.service
       Loaded: loaded (/etc/systemd/system/fail-on-restart.service; static; vendor preset: enabled)
       Active: failed (Result: start-limit-hit) since Sat 2018-09-29 11:01:34 UTC; 4s ago
      Process: 7066 ExecStart=/bin/false (code=exited, status=1/FAILURE)
     Main PID: 7066 (code=exited, status=1/FAILURE)
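
    The Before/After comparison above can also be checked by a script. The
    following is a minimal sketch, not part of the original test case; the
    function names are hypothetical, and it assumes the "Active:" line has
    the format shown in the outputs above.

```shell
#!/bin/sh
# Print the state word from the "Active:" line of "systemctl status" output (stdin).
active_field() {
    sed -n 's/^ *Active: *\([a-z-]*\).*/\1/p'
}

# Succeed if the unit's "Active:" state equals the expected one.
# "systemctl status" exits non-zero for inactive/failed units; only its
# output is used here, so that exit code is ignored.
check_unit_state() {
    unit=$1
    expected=$2
    observed=$(systemctl status -n0 "$unit" | active_field)
    [ "$observed" = "$expected" ]
}
```

    For example, 'check_unit_state fail-on-restart failed' is expected to
    succeed with the patched package and to fail with the unpatched one,
    once the restart attempts have stopped.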

[Regression Potential]

 * This code changes the point at which the check for the number of
   (re)start attempts is made, so regressions in (re)starting units are
   theoretically possible.

 * However, this code actually reverts a change that caused a regression,
   so it goes back to the code that was known to work correctly before.

 * It is also still in this form in upstream systemd today,
   without further fixes/changes (see comment in the Impact section).

[Other Info]

 * The test package was built in a Launchpad PPA [3] for all architectures,
   with dependencies from Proposed enabled (more up to date for SRU).

 * The testsuite (run at package build time; a failure fails the build)
   has results identical to those in the build log of the current
   xenial-updates package.

    ============================================================================
    Testsuite summary for systemd 229
    ============================================================================
    # TOTAL: 128
    # PASS:  109
    # SKIP:  19
    # XFAIL: 0
    # FAIL:  0
    # XPASS: 0
    # ERROR: 0
    ============================================================================

[Links]
 
[1] https://github.com/systemd/systemd/commit/072993504e3e4206ae1019f5461a0372f7d82ddf
[2] https://github.com/systemd/systemd/issues/3166
[3] https://launchpad.net/~mfo/+archive/ubuntu/sf199312

** Affects: systemd (Ubuntu)
     Importance: Undecided
     Assignee: Mauricio Faria de Oliveira (mfo)
         Status: In Progress

-- 
xenial systemd reports 'inactive' instead of 'failed' for service units that repeatedly failed to restart / failed permanently
https://bugs.launchpad.net/bugs/1795658
You received this bug notification because you are a member of STS Sponsors, which is subscribed to the bug report.