group.of.nepali.translators team mailing list archive
Message #27143
[Bug 997217] Re: salsauthd maxes cpu
This bug was fixed in the package cyrus-sasl2 - 2.1.26.dfsg1-14ubuntu0.1
---------------
cyrus-sasl2 (2.1.26.dfsg1-14ubuntu0.1) xenial; urgency=medium
* d/p/dont_hang_when_imap_closes.patch: Don't hang when IMAP
server closes connection. (LP: #997217)
-- Andreas Hasenack <andreas@xxxxxxxxxxxxx> Wed, 24 Oct 2018 14:51:00 -0300
** Changed in: cyrus-sasl2 (Ubuntu Xenial)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/997217
Title:
salsauthd maxes cpu
Status in cyrus-sasl2 package in Ubuntu:
Fix Released
Status in cyrus-sasl2 source package in Precise:
Won't Fix
Status in cyrus-sasl2 source package in Trusty:
Invalid
Status in cyrus-sasl2 source package in Xenial:
Fix Released
Bug description:
[Impact]
The rimap authentication mechanism in saslauthd can hit a condition
where it will start spinning and using all available CPU. This
condition can be easily encountered when an authentication is
happening and the imap service is being restarted.
Furthermore, the saslauthd child process that picked up that
authentication request and that is spinning now won't be reaped nor
can it service further requests. If all children are left in this
state, the authentication service as a whole won't be working anymore.
[Test Case]
This test can be performed in an LXD container or a VM.
* install the needed packages. mail-stack-delivery is used to have an
imap server available on localhost that needs no further
configuration. Accept the defaults for all debconf prompts:
sudo apt update
sudo apt install sasl2-bin mail-stack-delivery
* set the password "ubuntu" for the ubuntu user
echo ubuntu:ubuntu | sudo chpasswd
* start saslauthd like this, with just one child:
sudo /usr/sbin/saslauthd -a rimap -O localhost -r -n 1
* restart dovecot
sudo service dovecot restart
* test saslauthd authentication:
$ sudo testsaslauthd -u ubuntu -p ubuntu
0: OK "Success."
* Now let's break it. In one terminal watch the output of top:
top
* in another terminal, run the following:
sudo testsaslauthd -u ubuntu -p ubuntu & sleep 1; sudo service dovecot stop
* observe in the "top" terminal that saslauthd is consuming a lot of
cpu. If that's not happening, try starting dovecot again and adjusting
the sleep value in the previous test command, but 1s was enough in all
my runs.
* start dovecot and repeat the authentication request. Since the only saslauthd child is now spinning, this will block:
sudo service dovecot start
$ sudo testsaslauthd -u ubuntu -p ubuntu
<blocks>
[Regression Potential]
This fix relies on read(2) returning zero bytes when the connection is dropped, and that is clearly documented in its manpage:
"On success, the number of bytes read is returned (zero indicates
end of file),"
The select manpage also documents such a case being a valid case to
indicate that a socket is ready to be read from, and that it won't
block:
"The file descriptors listed in readfds will be watched to see if
characters become available for reading (more precisely, to see if a
read will not block; in particular, a file descriptor is also
ready on end-of-file)"
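The two manpage statements quoted above can be checked with a small program. This is only a sketch: it uses a local AF_UNIX socketpair as a stand-in for the IMAP connection, and the function name eof_is_readable is made up for this illustration; it is not part of cyrus-sasl2.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical check, not taken from the patch.
 * Returns 1 if, after the peer closes its end, select() reports the
 * socket as readable and read() then returns 0 (end of file).
 * Returns 0 otherwise, or -1 if the socket pair could not be created.
 */
int eof_is_readable(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    assert(sv[0] < FD_SETSIZE);       /* precondition for FD_SET */

    close(sv[1]);                     /* simulate the server going away */

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(sv[0], &rfds);
    struct timeval timeout = { 1, 0 };    /* wait at most 1 second */

    int ready = select(sv[0] + 1, &rfds, NULL, NULL, &timeout);

    char buf[64];
    ssize_t n = (ready > 0) ? read(sv[0], buf, sizeof(buf)) : -1;

    close(sv[0]);
    return (ready == 1 && n == 0) ? 1 : 0;
}
```

On a Linux system this is expected to return 1: the closed connection counts as "ready", and the read that follows reports end of file instead of blocking.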
This patch is what was used upstream, and is also present in bionic.
I can't think of regressions specific to this change, other than the
usual risk of rebuilding a widely used library (sasl2) in an
environment that differs from the one in place when xenial was
released, i.e., different libraries available, maybe different
system-wide build options, etc.
[Other Info]
Trusty is still not accounting for read() returning zero being an end-of-file case, but the loop there has a counter and it eventually exits, not leading to a perpetual spin or high cpu usage (see comment #17 for a brief history on how this fix was dropped in the xenial package).
The fix is simple and could be applied there as well, if the SRU team
prefers.
[Original Description]
sasl2-bin version 2.1.24~rc1.dfsg1+cvs2011-05-23-4ubuntu contains a
bug that causes heavy cpu utilization, impacting normal operation of
one of our mail servers following an upgrade to Ubuntu 12.04.
We are running the daemon with the following options:
/usr/sbin/saslauthd -a rimap -O our.imap.server -r -m /var/spool/postfix/var/run/saslauthd -n 5
We noticed that users were unable to send mail and that the saslauthd
processes were using approximately 100% of each cpu core. An strace of
one of the runaway processes showed that it was stuck in the following
behaviour:
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
.....
with further inspection showing that the file descriptor in question
was a socket connected to our imap server in CLOSE_WAIT.
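The pattern in the trace can be reproduced in isolation. The sketch below is an illustration only: it uses an AF_UNIX socketpair whose peer is already closed (so there is no real TCP CLOSE_WAIT state involved), and it adds an iteration cap that the real loop does not have, so that the demonstration terminates. The name count_zero_reads is hypothetical.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demonstration, not saslauthd code.
 * Repeats select() with a zero timeout followed by read() on a socket
 * whose peer has closed, giving up after max_iters iterations.
 * Returns the number of iterations in which select() returned > 0 and
 * read() returned 0, or -1 if the socket pair could not be created.
 */
int count_zero_reads(int max_iters)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    assert(sv[0] < FD_SETSIZE);       /* precondition for FD_SET */

    close(sv[1]);                     /* peer closes the connection */

    int spins = 0;
    char buf[940];                    /* same read size as in the trace */
    while (spins < max_iters) {
        fd_set rfds;
        struct timeval timeout = { 0, 0 };
        FD_ZERO(&rfds);
        FD_SET(sv[0], &rfds);

        if (select(sv[0] + 1, &rfds, NULL, NULL, &timeout) <= 0)
            break;                    /* not ready, or select failed */
        if (read(sv[0], buf, sizeof(buf)) != 0)
            break;                    /* data or error: not the spin case */
        spins++;
    }

    close(sv[0]);
    return spins;
}
```

Calling count_zero_reads(1000) should come back with 1000, i.e. every single iteration sees a "ready" descriptor and a zero-byte read, which is the busy loop visible in the strace output.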
Browsing saslauthd/auth_rimap.c in the source package for sasl2-bin,
we came across the following code, repeated in two locations:
while( select (fds, &perm, NULL, NULL, &timeout ) >0 ) {
    if ( FD_ISSET(s, &perm) ) {
        ret = read(s, rbuf+rc, sizeof(rbuf)-rc);
        if ( ret<0 ) {
            rc = ret;
            break;
        } else {
            rc += ret;
        }
    }
}
It looks like this loop is expected to run until a read error is
encountered or the timeout of 1 second is reached. There is no test to
check that 0 bytes were read, indicating that the connection was
closed by the remote peer. Since select() will immediately return the
size of the set of the partially closed descriptor (1, which is >0),
and calls to read() will always yield 0 bytes, there's the potential
for execution to get stuck in this non-blocking loop, and I'm presuming
that that's what's happening here.
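For illustration, here is a minimal sketch of a read loop that treats a zero-byte read as end of file. It is not the upstream patch and not the code in auth_rimap.c; the helper names read_reply and read_reply_demo, the buffer sizes, and the sample reply string are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from socket s into buf until a read error,
 * a 1 second timeout, a full buffer, or end of file.
 * Returns the number of bytes read, or -1 on read error.
 */
ssize_t read_reply(int s, char *buf, size_t buflen)
{
    ssize_t rc = 0;
    assert(s >= 0 && s < FD_SETSIZE);     /* precondition for FD_SET */

    while ((size_t)rc < buflen) {
        fd_set perm;
        struct timeval timeout = { 1, 0 };
        FD_ZERO(&perm);
        FD_SET(s, &perm);

        if (select(s + 1, &perm, NULL, NULL, &timeout) <= 0)
            break;                        /* timeout or select error */

        ssize_t ret = read(s, buf + rc, buflen - (size_t)rc);
        if (ret < 0)
            return -1;                    /* read error */
        if (ret == 0)
            break;                        /* peer closed the connection */
        rc += ret;
    }
    return rc;
}

/* Demo: the peer writes a short reply and closes; returns bytes read. */
ssize_t read_reply_demo(void)
{
    int sv[2];
    char buf[128];
    const char *msg = "* OK ready\r\n";   /* made-up 12-byte reply */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (write(sv[1], msg, strlen(msg)) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]);                         /* peer goes away after replying */

    ssize_t n = read_reply(sv[0], buf, sizeof(buf));
    close(sv[0]);
    return n;
}
```

With the ret == 0 check in place, read_reply_demo() returns after collecting the 12 bytes the peer sent; without that check the loop would keep seeing a readable descriptor and zero-byte reads.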
We've not performed any further analysis to prove that this is really
what's happening, but if my intuition is correct then our IMAP server
(an nginx imap proxy) most likely closes the connection at an
unexpected time under as yet undetermined conditions.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cyrus-sasl2/+bug/997217/+subscriptions