kicad-developers team mailing list archive

Re: RICHIO performance - 3 to 30 times slower than std::ifstream

 

What about wxFFileInputStream instead of wxFileInputStream?


On 17.02.2017 at 16:24, Wayne Stambaugh wrote:
On 2/16/2017 8:59 PM, John Beard wrote:
Hi Wayne,

I added some new profiles for the INPUTSTREAM_LINE_READER.

The results are very surprising to me. In debug and release mode,
using INPUTSTREAM_LINE_READER with a wxFileInputStream is around 200
times (:-O) slower than a straight std::ifstream, taking over two
seconds to read a 6.5MB short-lined file that std::ifstream can do in
<10ms.

I wonder if there's something I've missed here, as I can't believe
it's truly that slow.
Ouch!  I wonder what wxFileInputStream is doing that there is this much
of a performance hit.

I've pushed the benchmark to Launchpad for those who are interested:

https://code.launchpad.net/~john-j-beard/kicad/+git/kicad/+ref/io_benchmark
Thanks for testing this.  Now we know to use wxInputStream objects
cautiously.

As for your note about having a generic stream version, yes, that's
more flexible and we should aim for that, if we were to provide a
std::istream LINE_READER. I just did ifstream as a test to keep things
clear and ensure a sensible comparison.
If you would make this change, I would commit your patch.  I think it's
a good thing to have for performance testing purposes and using the file
stream reader where it makes sense.

As I said, the current performance is "OK", and if we want to limit
line lengths, we probably can't get that for free, anyway.

I understand the desire to not read infinite lines, but at least in my
tests, the std::ifstream method, which has no limit for that, can deal
with a 1GB file of a single line in about 300ms. Obviously it's all in
disk cache, and you have to pay the allocation for it when reading
into the buffer.

All the existing LINE_READER implementations explode with IO_ERROR on
that file since the line is too long for them.

Cheers,

John

On Fri, Feb 17, 2017 at 5:56 AM, Wayne Stambaugh <stambaughw@xxxxxxxxx> wrote:
John,

It would have been nice if you had benchmarked wxFileInputStream
as well.  There already is an INPUTSTREAM_LINE_READER object which takes
a pointer to wxInputStream object.  I'm curious how it stacks up against
the std::ifstream.  There are some interesting wxInputStream objects
that could prove useful.

I think ifstream wasn't used in case there are really long lines, which
can happen if you have text objects with lots of long multiple-line
strings in your files.  I'm ok with adding a LINE_READER that wraps
istream objects.  It's fairly trivial to change LINE_READER types.  It
might be a bit more flexible if you just provided an ISTREAM_LINE_READER
that takes any istream-derived object rather than writing a separate
LINE_READER for each istream derivative.

Cheers,

Wayne

On 2/16/2017 8:43 AM, John Beard wrote:
Hi,

I was trying to profile the eeschema slow library loads, and I got a
bit distracted by RICHIO's FILE_LINE_READER.

Internally, it uses a very tight loop of reading single chars at a
time from a file descriptor, which looks inefficient. I wrote a
benchmarker to compare RICHIO against std::ifstream and a new
LINE_READER implementation, backed by std::ifstream. operf confirms
that most of the time in RICHIO is burned in the ReadLine() function
itself.

The results were that RICHIO (in debug mode) is consistently 4-7 times
slower than using std::ifstream, when reading eeschema library text
files (so relatively short lines). Compiling the release version
improved RICHIO's speed more than std::ifstream's, but it is still
around 3 times slower than std::ifstream.

For files with 1k lines, the slowdown is about 30 times (!) in debug
and 14 times in release mode, so significantly worse. Few files read
line-wise by KiCad look like that, however.

Avoiding reconstructing the stream/LINE_READER each time doesn't have
much of an effect in any case.

Is there a particular reason why STL streams are not used in RICHIO?
The only thing I think the example ifstream implementation can't do is
catching overlong lines, but that's only used in one place: the VRML
parser, which hardcodes an 8MB limit. ifstream could do this, but not
with the simple getline function.

This performance doesn't appear to be a major bottleneck for me, but
it does seem a shame to throw away (charitably) two thirds of file
read speeds (and uncharitably, up to 97% in odd cases) if there is no
particular reason to do so.

As an aside, RICHIO appears to allocate twice as many times as
std::ifstream when reading the same data, for roughly the same
amount of memory in total.

Anyway, I thought I'd share this finding! Please find attached the
benchmark program, such as it is.

Cheers,

John



_______________________________________________
Mailing list: https://launchpad.net/~kicad-developers
Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~kicad-developers
More help   : https://help.launchpad.net/ListHelp