
kicad-developers team mailing list archive

Re: RICHIO performance - 3 to 30 times slower than std::ifstream

 

On 2/16/2017 8:59 PM, John Beard wrote:
> Hi Wayne,
> 
> I added some new profiles for the INPUTSTREAM_LINE_READER.
> 
> The results are very surprising to me. In debug and release mode,
> using INPUTSTREAM_LINE_READER with a wxFileInputStream is around 200
> times (:-O) slower than a straight std::ifstream, taking over two
> seconds to read a 6.5MB short-lined file that std::ifstream can do in
> <10ms.
> 
> I wonder if there's something I've missed here, as I can't believe
> it's truly that slow.

Ouch!  I wonder what wxFileInputStream is doing to cause this much of a
performance hit.
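
For anyone who wants to try this kind of comparison outside the KiCad
tree, here is a minimal sketch of a timing harness.  It is not John's
benchmark and it does not touch wxFileInputStream or RICHIO.  It only
times a per-character loop against std::getline on an in-memory stream.
All names (countLinesCharByChar, countLinesGetline, compareReaders,
makeSample, TIMINGS) are made up for illustration, and numbers from an
in-memory stream will not match numbers from real files.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Count lines one character at a time.  This only mimics the general shape
// of a per-character read loop; it is not KiCad's FILE_LINE_READER code.
std::size_t countLinesCharByChar( std::istream& aIn )
{
    std::size_t lines = 0;
    bool        pending = false;   // true while the current line has no '\n' yet
    char        c;

    while( aIn.get( c ) )
    {
        if( c == '\n' )
        {
            ++lines;
            pending = false;
        }
        else
        {
            pending = true;
        }
    }

    // A final line without a trailing newline still counts as a line.
    return pending ? lines + 1 : lines;
}

// Count lines with std::getline, which reads a whole line per call.
std::size_t countLinesGetline( std::istream& aIn )
{
    std::size_t lines = 0;
    std::string line;

    while( std::getline( aIn, line ) )
        ++lines;

    return lines;
}

// Run a callable once and return the elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsedMs( Fn&& aFn )
{
    auto start = std::chrono::steady_clock::now();
    aFn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>( stop - start ).count();
}

// Build aLineCount lines of aLineLength 'x' characters, each ending in '\n'.
std::string makeSample( std::size_t aLineCount, std::size_t aLineLength )
{
    std::string data;
    data.reserve( aLineCount * ( aLineLength + 1 ) );

    for( std::size_t i = 0; i < aLineCount; ++i )
    {
        data.append( aLineLength, 'x' );
        data.push_back( '\n' );
    }

    return data;
}

struct TIMINGS
{
    double      charByCharMs;
    double      getlineMs;
    std::size_t lines;
};

// Time both readers over the same data and check that they agree.
TIMINGS compareReaders( const std::string& aData )
{
    TIMINGS     t{};
    std::size_t a = 0, b = 0;

    t.charByCharMs = elapsedMs( [&] { std::istringstream in( aData ); a = countLinesCharByChar( in ); } );
    t.getlineMs    = elapsedMs( [&] { std::istringstream in( aData ); b = countLinesGetline( in ); } );

    assert( a == b );   // both readers must see the same number of lines
    t.lines = a;
    return t;
}
```

Calling compareReaders( makeSample( 100000, 40 ) ) and printing the two
timings gives a rough feel for the gap on short lines.  Swapping the
istringstream for a std::ifstream, or for a wx stream wrapper, would be
the next step towards a fairer test.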

> 
> I've pushed the benchmark to Launchpad for those who are interested:
> 
> https://code.launchpad.net/~john-j-beard/kicad/+git/kicad/+ref/io_benchmark

Thanks for testing this.  Now we know to use wxInputStream objects
cautiously.

> 
> As for your note about having a generic stream version, yes, that's
> more flexible and we should aim for that, if we were to provide a
> std::istream LINE_READER. I just did ifstream as a test to keep things
> clear and ensure a sensible comparison.

If you make this change, I will commit your patch.  I think it's a good
thing to have, both for performance testing and for using the file
stream reader where it makes sense.
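
To make the idea concrete, here is a rough sketch of a reader that
accepts any std::istream rather than only std::ifstream.  It is a
standalone class, not derived from the real LINE_READER, so the class
name and the ReadLine()/LineNumber() methods are placeholders rather
than the RICHIO interface.  It puts back the newline that std::getline
strips, on the assumption that callers expect it.  Drop that step if
they don't.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch: a line reader over any std::istream-derived object.
class ISTREAM_LINE_READER_SKETCH
{
public:
    explicit ISTREAM_LINE_READER_SKETCH( std::istream& aStream ) :
            m_stream( aStream ), m_lineNum( 0 )
    {
    }

    // Read the next line.  Returns a pointer to the internal buffer, valid
    // until the next call, or nullptr once the input is exhausted.
    const char* ReadLine()
    {
        if( !std::getline( m_stream, m_line ) )
        {
            m_line.clear();
            return nullptr;
        }

        // std::getline drops the delimiter.  If it did not hit end of file,
        // a '\n' was consumed, so put it back (assumed caller expectation).
        if( !m_stream.eof() )
            m_line.push_back( '\n' );

        ++m_lineNum;
        return m_line.c_str();
    }

    // Number of lines returned so far.
    std::size_t LineNumber() const { return m_lineNum; }

private:
    std::istream& m_stream;    // not owned; must outlive the reader
    std::string   m_line;      // storage for the current line
    std::size_t   m_lineNum;
};
```

The same class then works over a std::ifstream, a std::istringstream, or
any other istream derivative, which is the flexibility I was after.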

> 
> As I said, the current performance is "OK", and if we want to limit
> line lengths, we probably can't get that for free, anyway.
> 
> I understand the desire to not read infinite lines, but at least in my
> tests, the std::ifstream method, which has no such limit, can deal
> with a 1GB file of a single line in about 300ms. Obviously it's all in
> disk cache, and you have to pay for the allocation when reading
> into the buffer.
> 
> All the existing LINE_READERs explode with IO_ERROR on that file since
> the line is too long for them.
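
As a rough illustration of what a length limit could look like on top
of a std::istream, here is a sketch that reads through the stream
buffer and gives up once a line passes a caller-supplied cap.  The
function name readLineBounded is invented, and std::length_error stands
in for IO_ERROR.  It has not been benchmarked, so what the check costs
is an open question.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <streambuf>
#include <string>

// Hypothetical sketch: read one line (without its '\n') into aLine.
// Returns false when there is nothing left to read.
// Throws std::length_error if the line is longer than aMaxLen characters.
bool readLineBounded( std::istream& aIn, std::string& aLine, std::size_t aMaxLen )
{
    using traits = std::char_traits<char>;

    aLine.clear();

    std::streambuf* buf = aIn.rdbuf();

    if( !aIn.good() || buf == nullptr )
        return false;

    bool gotAny = false;   // true once at least one character was consumed

    for( ;; )
    {
        traits::int_type ch = buf->sbumpc();

        if( traits::eq_int_type( ch, traits::eof() ) )
        {
            // Remember end of input so the next call returns false at once.
            aIn.setstate( std::ios::eofbit );
            return gotAny;
        }

        gotAny = true;

        if( traits::to_char_type( ch ) == '\n' )
            return true;

        if( aLine.size() >= aMaxLen )
            throw std::length_error( "line exceeds maximum length" );

        aLine.push_back( traits::to_char_type( ch ) );
    }
}
```

With a cap of, say, 8MB this would reject the 1GB single-line file
without first growing the string to hold all of it, at the price of one
extra comparison per character.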
> 
> Cheers,
> 
> John
> 
> On Fri, Feb 17, 2017 at 5:56 AM, Wayne Stambaugh <stambaughw@xxxxxxxxx> wrote:
>> John,
>>
>> It would have been nice if you had benchmarked wxFileInputStream
>> as well.  There already is an INPUTSTREAM_LINE_READER object which takes
>> a pointer to wxInputStream object.  I'm curious how it stacks up against
>> the std::ifstream.  There are some interesting wxInputStream objects
>> that could prove useful.
>>
>> I think ifstream wasn't used in case there are really long lines, which
>> there can be if you have text objects with lots of long multi-line
>> strings in your files.  I'm OK with adding a LINE_READER that wraps
>> istream objects.  It's fairly trivial to change LINE_READER types.  It
>> might be a bit more flexible if you just provided an ISTREAM_LINE_READER
>> that takes any istream-derived object rather than writing a separate
>> LINE_READER for each istream derivative.
>>
>> Cheers,
>>
>> Wayne
>>
>> On 2/16/2017 8:43 AM, John Beard wrote:
>>> Hi,
>>>
>>> I was trying to profile the eeschema slow library loads, and I got a
>>> bit distracted by RICHIO's FILE_LINE_READER.
>>>
>>> Internally, it uses a very tight loop of reading single chars at a
>>> time from a file descriptor, which looks inefficient. I wrote a
>>> benchmarker to compare RICHIO against std::ifstream and a new
>>> LINE_READER implementation, backed by std::ifstream. operf confirms
>>> that most of the time in RICHIO is burned in the ReadLine() function
>>> itself.
>>>
>>> The results were that RICHIO (in debug mode) is consistently 4-7 times
>>> slower than using std::ifstream, when reading eeschema library text
>>> files (so relatively short lines). Compiling the release version
>>> improved RICHIO's speed more than std::ifstream's, but it is still
>>> around 3 times slower than std::ifstream.
>>>
>>> For files with 1k lines, the slowdown is about 30 times (!) in debug
>>> and 14 times in release mode, so significantly worse. Few files read
>>> line-wise by KiCad look like that, however.
>>>
>>> Avoiding reconstructing the stream/LINE_READER each time doesn't have
>>> much of an effect in any case.
>>>
>>> Is there a particular reason why STL streams are not used in RICHIO?
>>> The only thing I think the example ifstream implementation can't do is
>>> catching overlong lines, but that's only used in one place: the VRML
>>> parser, which hardcodes an 8MB limit. ifstream could do this, but not
>>> with the simple getline function.
>>>
>>> This performance doesn't appear to be a major bottleneck for me, but
>>> it does seem a shame to throw away (charitably) two thirds of file
>>> read speeds (and uncharitably, up to 97% in odd cases) if there is no
>>> particular reason to do so.
>>>
>>> As an aside, RICHIO appears to allocate twice as many times as
>>> std::ifstream when reading the same data, for roughly the same
>>> amount of memory in total.
>>>
>>> Anyway, I thought I'd share this finding! Please find attached the
>>> benchmark program, such as it is.
>>>
>>> Cheers,
>>>
>>> John
>>>
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~kicad-developers
>>> Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~kicad-developers
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>


