sts-sponsors team mailing list archive

[Bug 1876600] [NEW] cookie overruns can cause org.freedesktop.systemd1 dbus to hang

You have been subscribed to a public bug by Dan Streetman (ddstreet):

[Impact]
Long-running services can overflow the sd_bus->cookie counter, after which further communication with org.freedesktop.systemd1 stalls.

[Description]
systemd dbus messages include a "cookie" value that uniquely identifies them within their bus context. The value is stored in the message header and is taken from a per-bus counter that is incremented for each message sent through the same bus object. A service that runs for a long time and keeps communicating over dbus can overflow this counter, after which further messages to org.freedesktop.systemd1 fail. Such services can then become unresponsive, as they get stuck trying to communicate with invalid bus cookie values.

This issue has been fixed upstream by the commit below:
-  sd-bus: deal with cookie overruns (1f82f5bb4237)

$ git describe --contains 1f82f5bb4237
v242-rc1~228

$ rmadison systemd
 systemd | 229-4ubuntu4     | xenial          | source, ...
 systemd | 229-4ubuntu21.27 | xenial-security | source, ...
 systemd | 229-4ubuntu21.27 | xenial-updates  | source, ...
 systemd | 229-4ubuntu21.28 | xenial-proposed | source, ...
 systemd | 237-3ubuntu10    | bionic          | source, ...
 systemd | 237-3ubuntu10.38 | bionic-security | source, ...
 systemd | 237-3ubuntu10.39 | bionic-updates  | source, ...
 systemd | 237-3ubuntu10.40 | bionic-proposed | source, ... <----
 systemd | 242-7ubuntu3     | eoan            | source, ...

The fix first appeared in v242, so releases starting with Eoan (systemd 242) already have it.

[Test Case]
There doesn't seem to be an easy test case for this, as the cookie values start at zero and won't overflow until 1<<32 (about 4.3 billion) messages have been sent on the same bus. There have been reports from users hitting this on Kubernetes clusters that had been running continuously for long periods (around 5 months).
Using GDB, we can construct an artificial test case that forces the cookie overflow. The test case below performs the following steps:

1. Create a new system bus object through sd_bus_default_system()
2. Allocate and append a new method_call message to the bus
3. Send the message through sd_bus_call()
4. Handle the response message and free up the message objects

It's essentially the example code from the
sd_bus_message_new_method_call() manpage, with one modification: the call
is repeated continuously, to keep incrementing the bus cookie value. We
step in with GDB when the cookie reaches 0x10000 and set it to 0xffffff00,
which leaves only 255 values below the 32-bit limit, so the test program
fails shortly afterwards. An example test run on an impacted system:

ubuntu@bionic:~$ gcc -Wall test.c -o cookie -lsystemd -g
ubuntu@bionic:~$ gdb --batch --command=test.gdb --args ./cookie
Breakpoint 1 at 0xe61: file test.c, line 38.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
(16s) cookie: 0x00010000        reply-cookie: 0x00010000

Breakpoint 1, print_unit_path (bus=0x555555757290) at test.c:38
38              r = sd_bus_message_new_method_call(bus, &m,
$1 = 0x10000
$2 = 0xffffff00
Call failed: Operation not supported
Sleeping and retrying...
Call failed: Invalid argument
Assertion 'm->n_ref > 0' failed at ../src/libsystemd/sd-bus/bus-message.c:934, function sd_bus_message_unref(). Aborting.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=0x6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

To compile and debug the test case above, the libsystemd-dev and libsystemd0-dbgsym packages are required.
The source code for both test.c and test.gdb is attached to this LP bug.

[Regression Potential]
This fix changes the way cookie incrementation is handled. The number of available cookie values is reduced, since the patch uses a high-order bit to indicate whether the counter has overflowed. Potential issues could arise from two distinct messages sharing the same cookie value, or from cookie reuse not being handled properly. In practice, this shouldn't cause serious problems, as a dbus message would have to stay pending while the counter cycles through the entire 2^31 space for an overlap to occur. The patch is already present upstream and in later Ubuntu series, and has been validated and tested through the systemd test suite and autopkgtests.

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Affects: systemd (Ubuntu Xenial)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: In Progress

** Affects: systemd (Ubuntu Bionic)
     Importance: High
     Assignee: Heitor Alves de Siqueira (halves)
         Status: In Progress


** Tags: sts sts-sponsor-ddstreet
-- 
cookie overruns can cause org.freedesktop.systemd1 dbus to hang
https://bugs.launchpad.net/bugs/1876600
You received this bug notification because you are a member of STS Sponsors, which is subscribed to the bug report.