sts-sponsors team mailing list archive
Message #05028
[Bug 1999816] Re: Failure to get free disk space breaks "rabbitmqctl status" command
> So, testing for an :unknown value has no effect. I could not find the
> exact code that explains this difference.
Do you mean that even with the upstream 3.8.x patch, which uses "unknown",
"rabbitmqctl status" would still crash in the end?
Here is the current chain of calls as I see it, please correct me if I'm wrong:
Without the fix:
- app APP1 calls "rabbitmqctl status", to get status
- "rabbitmqctl status" triggers some events that eventually call the `df` tool, which hangs or fails; rabbitmqctl gets something that is not a number and crashes
- APP1 notices that rabbitmqctl failed, reports that in some way, or even crashes itself
With the fix ("unknown"):
- app APP1 calls "rabbitmqctl status", to get status
- rabbitmqctl goes all the way down to getting the df output, which fails in the same way and reports "unknown"; now, instead of crashing, rabbitmqctl just propagates that "unknown" value as the disk space
- APP1 gets status output, tries to check disk space, and now:
  - maybe it knows how to handle the fact that "unknown" is not a number, and behaves well
  - maybe it tries to parse "unknown" or "undefined" as a number, and crashes
  - maybe it tries to parse "unknown", but gets "undefined" instead, and crashes
I understand the fix for rabbitmqctl status not crashing, but that just
makes it propagate the value that originally made it crash, to its
caller (and there isn't really much else it can do). Do we know of any
APP1 like in the above example? Is that something that we could test? Or
should we wait and see if now something else (APP1) starts crashing, and
then fix that, and so on?
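To make the APP1 question concrete, here is a minimal sketch of how a
caller could guard against a non-numeric value. Everything in it is
hypothetical: APP1 is only the placeholder from the example above, and
the function name classify_disk_free, its arguments, and the threshold
are assumptions for illustration, not anything taken from
rabbitmq-server or from a known consumer. It takes the free-space value
as a plain string and does not assume any particular "rabbitmqctl
status" output format.

```bash
#!/bin/bash
# Hypothetical caller-side check (the "APP1" of the discussion above).
# $1: free disk space value as a string, however the caller obtained it
# $2: assumed low-space threshold in bytes, chosen by the caller
# Prints "ok", "low", or "unknown".
classify_disk_free() {
  local value="$1"
  local limit="$2"
  case "$value" in
    ''|*[!0-9]*)
      # Covers "unknown", "undefined", empty output, and anything else
      # that is not a plain non-negative integer.
      echo "unknown"
      ;;
    *)
      if [ "$value" -lt "$limit" ]; then
        echo "low"
      else
        echo "ok"
      fi
      ;;
  esac
}

classify_disk_free 52428800 1073741824   # prints "low"
classify_disk_free unknown 1073741824    # prints "unknown"
```

A caller that skips this kind of check and feeds the value straight
into arithmetic is the case that could start failing once "unknown" is
propagated instead of causing a crash.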
--
You received this bug notification because you are a member of SE SRU
("STS") Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1999816
Title:
Failure to get free disk space breaks "rabbitmqctl status" command
Status in rabbitmq-server package in Ubuntu:
Fix Released
Status in rabbitmq-server source package in Focal:
In Progress
Status in rabbitmq-server source package in Jammy:
In Progress
Status in rabbitmq-server source package in Kinetic:
In Progress
Bug description:
[Impact]
When the df command fails to get the free disk space for some reason
(for example, a timeout on a heavily loaded system), the result is a
hardcoded value of "unknown". As this is not a valid number, it
generates arithmetic errors when the "rabbitmqctl status" command is
run and tries to divide that value to convert it to another unit.
This has been fixed upstream here:
https://github.com/rabbitmq/rabbitmq-server/pull/4897
[Test Plan]
To force a timeout, the df command can be linked to another file that
just waits for a few minutes [detailed steps in comment #5]. For
example:
#!/bin/bash
sleep 5m
After the timeout occurs, the "rabbitmqctl status" command returns an
error with the unpatched version. With the patch, it shows all the
information and displays "unknown" in the free space line.
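As a rough illustration of the hanging-df idea, the sketch below
shadows df through PATH in a temporary directory instead of relinking
the system binary. This is an assumption made for the example and is
not the procedure from comment #5; whether rabbitmq-server resolves df
through PATH is not confirmed here. The helper name make_hanging_df and
the 1-second limit are hypothetical, and the GNU coreutils timeout
command stands in for whatever timeout the real caller applies.

```bash
#!/bin/bash
# Hypothetical helper: create a directory holding a fake `df` that hangs.
make_hanging_df() {
  local dir
  dir="$(mktemp -d)"
  # exec replaces the shell with sleep, so a kill signal reaches it directly
  printf '#!/bin/bash\nexec sleep 5m\n' > "$dir/df"
  chmod +x "$dir/df"
  echo "$dir"
}

stub_dir="$(make_hanging_df)"

# Run the stub with an assumed 1-second limit, only to keep the demo short.
# GNU timeout exits with status 124 when the command times out.
df_status=0
PATH="$stub_dir:$PATH" timeout 1 df / || df_status=$?
echo "df exit status: $df_status"

rm -rf "$stub_dir"
```

Because the stub lives only in a temporary directory, the real df is
left untouched and nothing needs to be restored afterwards.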
[Where problems could occur]
The patch only changes how the information is displayed; it should not
break anything in the core operations of the package.