linuxdcpp-team team mailing list archive
-
linuxdcpp-team team
-
Mailing list archive
-
Message #09032
[Bug 1909861] Re: FileReader is not thread safe on Linux
This patch is relatively aggressive in removing FileReader::readCached()
and all associated infrastructure entirely. I'm not yet certain that's
the best, but in the absence of enough ongoing development to support
something evidently flawed.
Since non-Windows doesn't support readDirect() to avoid polluting the
disk cache, this ensures hashing will probably evict disk cache on Linux
and other platforms. I'm not sure the benchmarks you've shown address
that side effect. The concern isn't slowing down AirDC++ or DC++, it's
slowing down everything else, as they have to repopulate disk cache.
To ameliorate this on Linux, one can use O_DIRECT
(https://linux.die.net/man/2/open says "Since Linux 2.4.10", which seems
sufficient). FreeBSD 4.x, released in 2000, introduced this flag, which
means DragonFly BSD would have inherited it.
https://man.netbsd.org/NetBSD-9.0/open.2 documents that NetBSD supports
it (with an alignment restriction)
https://stackoverflow.com/questions/55281353/use-of-undeclared-
identifier-o-direct suggests that this is trickier on macOS. Apparently
there's an F_NOCACHE flag which accomplishes a similar goal.
https://man.openbsd.org/amd64/open shows OpenBSD not supporting
O_DIRECT. It's not clear to me whether it supports any alternate
mechanism.
So for all the typical POSIXy systems except OpenBSD, the safer approach
looks like removing readMapped() and, in its place, using those per-OS
direct I/O hints. The attached patch doesn't do the latter part, but
only the former.
** Patch added: "rm_readmapped.patch"
https://bugs.launchpad.net/dcplusplus/+bug/1909861/+attachment/5448723/+files/rm_readmapped.patch
--
You received this bug notification because you are a member of
Dcplusplus-team, which is subscribed to DC++.
https://bugs.launchpad.net/bugs/1909861
Title:
FileReader is not thread safe on Linux
Status in DC++:
New
Bug description:
FileReader::readMapped currently modifies the global SIGBUS handler in
order to catch read errors:
https://sourceforge.net/p/dcplusplus/code/ci/default/tree/dcpp/FileReader.cpp#l289
Since the function can be called concurrently from different threads
(currently hashing/queue recheck/sfv check in DC++) and each of them
sets and resets the SIGBUS handler, there's a high risk that the
application will crash in case of read errors as they aren't being
handler properly.
More information about the caveats:
https://www.sublimetext.com/blog/articles/use-mmap-with-care
These issues are much more likely to happen with AirDC++ as it uses
multiple threads for hashing. Read errors caused rather nasty crashes
with corrupted stack traces for one user, but luckily he was able to
catch the SIGBUS signal with gdb.
I didn't even spend time in trying to figure out how to make the
mapped reads work properly, as based on my testing the basic
FileReader::readCached function is noticeably faster:
readMapped: 671 files (21.70 GiB) in 9 directories have been hashed in 4 minutes 21 seconds (84.87 MiB/s)
readCached: 671 files (21.70 GiB) in 9 directories have been hashed in 3 minutes 58 seconds (93.08 MiB/s)
FileReader::readMapped is now disabled in AirDC++, as I can't see any
benefits from using it. The included setjmp.h header is even causing
issues when using clang for compiling on Linux:
https://bugs.gentoo.org/731676
To manage notifications about this bug go to:
https://bugs.launchpad.net/dcplusplus/+bug/1909861/+subscriptions
References