← Back to team overview

linuxdcpp-team team mailing list archive

[Bug 1909861] Re: FileReader is not thread safe on Linux

 

https://sasha-f.medium.com/why-mmap-is-faster-than-system-calls-
24718e75ab37 profiles mmap()-based file reading versus what looks like
what DC++ calls readCached().

mmap() is significantly faster in those experiments, so while my
response here was to just remove the associated code, given DC++'s
target platforms and lack of testing on Linux, there's some potential
argument for getting mmap(), with all the multithreaded signaling
issues, to work in general. It doesn't compare
O_DIRECT/FILE_FLAG_NO_BUFFERING, which should avoid
copy_user_enhanced_fast_string, or its Windows equivalent, from
consuming most of the file-reading time.

So that's another dimension one might measure: CPU usage for reading
during hashing. It might not matter enough to decide things, since the
TTH probably overwhelms the file read per se, but it's an mmap() use
case.

-- 
You received this bug notification because you are a member of
Dcplusplus-team, which is subscribed to DC++.
https://bugs.launchpad.net/bugs/1909861

Title:
  FileReader is not thread safe on Linux

Status in DC++:
  Fix Committed

Bug description:
  FileReader::readMapped currently modifies the global SIGBUS handler in
  order to catch read errors:

  https://sourceforge.net/p/dcplusplus/code/ci/default/tree/dcpp/FileReader.cpp#l289

  Since the function can be called concurrently from different threads
  (currently hashing/queue recheck/sfv check in DC++) and each of them
  sets and resets the SIGBUS handler, there's a high risk that the
  application will crash in case of read errors as they aren't being
  handler properly.

  More information about the caveats:
  https://www.sublimetext.com/blog/articles/use-mmap-with-care

  These issues are much more likely to happen with AirDC++ as it uses
  multiple threads for hashing. Read errors caused rather nasty crashes
  with corrupted stack traces for one user, but luckily he was able to
  catch the SIGBUS signal with gdb.

  I didn't even spend time in trying to figure out how to make the
  mapped reads work properly, as based on my testing the basic
  FileReader::readCached function is noticeably faster:

  readMapped: 671 files (21.70 GiB) in 9 directories have been hashed in 4 minutes 21 seconds (84.87 MiB/s)
  readCached: 671 files (21.70 GiB) in 9 directories have been hashed in 3 minutes 58 seconds (93.08 MiB/s)

  FileReader::readMapped is now disabled in AirDC++, as I can't see any
  benefits from using it. The included setjmp.h header is even causing
  issues when using clang for compiling on Linux:
  https://bugs.gentoo.org/731676

To manage notifications about this bug go to:
https://bugs.launchpad.net/dcplusplus/+bug/1909861/+subscriptions


References