linuxdcpp-team team mailing list archive
-
linuxdcpp-team team
-
Mailing list archive
-
Message #09040
[Bug 1909861] Re: FileReader is not thread safe on Linux
https://sourceforge.net/p/dcplusplus/code/ci/a484cbace55f671548b370135ac9fa42d5ddfc02/
implements this. As discussed in this bug/issue, it's mostly a no-op for
supported Win32-based DC++ platforms, but hopefully the buffer size
increase helps hashing speeds in general in a way consistent with your
benchmarks for OVERLAPPED. There's a plausible mechanism, certainly.
Do you have any idea why in your Ubuntu 20.04 setup, 4MB buffers
performed notably worse than 1MB buffers? It didn't seem to matter, as
regards that, whether you used POSIX_FADV_DONTNEED or O_DIRECT, so
presumably it's more a function of the buffer size itself.
The 256 kB vs 1MB/4MB benchmarks for macOS 11 are fairly dramatic. The
cached/F_NOCACHE distinction, at least, comes across quite clearly in
them. With 1MB, F_NOCACHE is at least a reasonable thing to do, even if
it's not optimal for AirDC++'s hashing speed itself.
The other use case that F_NOCACHE, O_DIRECT, POSIX_FADV_DONTNEED, etc
might hinder is multiple simultaneous uploads of the same file, if
AirDC++ uses the same FileReader member functions as for hashing. I'm
not sure how likely/common/worth optimizing for that scenario is, but
there's something to be said for just letting the OS find a decent
compromise, rather than attempt to address all these edge cases in
AirDC++ or DC++.
For example,
https://www.realworldtech.com/forum/?threadid=197081&curpostid=197969
from Linus Torvalds, discussing POSIX_FADV_DONTNEED, notes that "Linux
should already handle the case of "truly only touched once" case fairly
well. Those pages should never end up on the active list etc, and should
be cheap and easy to re-use. IOW, it's not typically one of the hard
cases for the VM." It's possible that it just doesn't need the hints.
If one does want to use POSIX_FADV_DONTNEED, it works on Linux, FreeBSD,
and NetBSD. It compiles, but is a no-op, on DragonFlyBSD:
https://github.com/DragonFlyBSD/DragonFlyBSD/blob/5a4a4dbaed373c7df0dbbd0638ce487111b28c4e/sys/sys/fcntl.h#L314-L324.
OpenBSD doesn't support it at all, that I can see:
https://github.com/openbsd/src/search?q=POSIX_FADV_DONTNEED. The moral
equivalent in practical usage that I've found on macOS is F_NOCACHE,
despite note being semantically identical.
So for DC++ purposes, for this bug, given that DC++ code isn't set up to
be easily tested with non-Win32 environments (which can include WINE),
and there'd need to be some additional per-OS build testing to do
anything fancier on the hinting side for POSIXy systems, just dropping
FileReader::readMapped() seemed the most prudent.
** Changed in: dcplusplus
Status: New => Fix Committed
--
You received this bug notification because you are a member of
Dcplusplus-team, which is subscribed to DC++.
https://bugs.launchpad.net/bugs/1909861
Title:
FileReader is not thread safe on Linux
Status in DC++:
Fix Committed
Bug description:
FileReader::readMapped currently modifies the global SIGBUS handler in
order to catch read errors:
https://sourceforge.net/p/dcplusplus/code/ci/default/tree/dcpp/FileReader.cpp#l289
Since the function can be called concurrently from different threads
(currently hashing/queue recheck/sfv check in DC++) and each of them
sets and resets the SIGBUS handler, there's a high risk that the
application will crash in case of read errors as they aren't being
handler properly.
More information about the caveats:
https://www.sublimetext.com/blog/articles/use-mmap-with-care
These issues are much more likely to happen with AirDC++ as it uses
multiple threads for hashing. Read errors caused rather nasty crashes
with corrupted stack traces for one user, but luckily he was able to
catch the SIGBUS signal with gdb.
I didn't even spend time in trying to figure out how to make the
mapped reads work properly, as based on my testing the basic
FileReader::readCached function is noticeably faster:
readMapped: 671 files (21.70 GiB) in 9 directories have been hashed in 4 minutes 21 seconds (84.87 MiB/s)
readCached: 671 files (21.70 GiB) in 9 directories have been hashed in 3 minutes 58 seconds (93.08 MiB/s)
FileReader::readMapped is now disabled in AirDC++, as I can't see any
benefits from using it. The included setjmp.h header is even causing
issues when using clang for compiling on Linux:
https://bugs.gentoo.org/731676
To manage notifications about this bug go to:
https://bugs.launchpad.net/dcplusplus/+bug/1909861/+subscriptions
References