← Back to team overview

kicad-developers team mailing list archive

RICHIO performance - 3 to 30 times slower than std::ifstream

 

Hi,

I was trying to profile the eeschema slow library loads, and I got a
bit distracted by RICHIO's FILE_LINE_READER.

Internally, it uses a very tight loop of reading single chars at a
time from a file descriptor, which looks inefficient. I wrote a
benchmarker to compare RICHIO against std::ifstream and a new
LINE_READER implementation, backed by std::ifstream. operf confirms
that most of the time in RICHIO burned in the ReadLine() function
itself.

The results were that RICHIO (in debug mode) is consistently 4-7 times
slower than using std::ifstream, when reading eeschema library text
files (so relatively short lines). Compiling the release version
improved RICHIOs speed more than std::ifstream's, but it is still
around 3 times slower than std::ifstream.

For files with 1k lines, the slowdown is about 30 times (!) in debug
and 14 times in release mode, so significantly worse. Few files read
line-wise by Kicad look like that, however.

Avoiding reconstructing the stream/LINE_READER each time doesn't have
much of an effect in any case.

Is there a particular reason why STL streams are not used in RICHIO?
The only thing I think the example ifstream implementation can't do is
catching over long lines, but that's only used in one place: the VRML
parser, which hardcodes an 8MB limit. ifstream could do this, but not
with the simple getline function.

This performance doesn't appear to be a major bottleneck for me, but
it does seem a shame to throw away (charitably) two thirds of file
read speeds (and uncharitably, up to 97% in odd cases) if there is no
particular reason to do so.

As an aside, RICHIO appears to allocate twice as many times as
std::ifstream when reading the same data, for the roughly the same
amount of memory in total.

Anyway, I thought I'd share this finding! Please find attached the
benchmark program, such as it is.

Cheers,

John
From 9030890b9ad86ff2601a0488ef6e0003dfbb2e66 Mon Sep 17 00:00:00 2001
From: John Beard <john.j.beard@xxxxxxxxx>
Date: Wed, 15 Feb 2017 16:47:41 +0800
Subject: [PATCH] IO benchmark program: RICHIO vs std::fstream

Also vs fstream-backed LINE_READER.
---
 qa/CMakeLists.txt                |   2 +
 qa/io_benchmark/CMakeLists.txt   |  17 ++++
 qa/io_benchmark/fstream_io.cpp   |  77 +++++++++++++++
 qa/io_benchmark/fstream_io.h     |  55 +++++++++++
 qa/io_benchmark/io_benchmark.cpp | 199 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 350 insertions(+)
 create mode 100644 qa/io_benchmark/CMakeLists.txt
 create mode 100644 qa/io_benchmark/fstream_io.cpp
 create mode 100644 qa/io_benchmark/fstream_io.h
 create mode 100644 qa/io_benchmark/io_benchmark.cpp

diff --git a/qa/CMakeLists.txt b/qa/CMakeLists.txt
index bd579a597..a56f15f51 100644
--- a/qa/CMakeLists.txt
+++ b/qa/CMakeLists.txt
@@ -9,3 +9,5 @@ if( KICAD_SCRIPTING_MODULES )
         )
 
 endif()
+
+add_subdirectory( io_benchmark )
diff --git a/qa/io_benchmark/CMakeLists.txt b/qa/io_benchmark/CMakeLists.txt
new file mode 100644
index 000000000..fde89c9d2
--- /dev/null
+++ b/qa/io_benchmark/CMakeLists.txt
@@ -0,0 +1,17 @@
+
+include_directories( BEFORE ${INC_BEFORE} )
+
+set( IOBENCHMARK_SRCS
+    io_benchmark.cpp
+    fstream_io.cpp
+)
+
+add_executable( io_benchmark
+    ${IOBENCHMARK_SRCS}
+)
+
+target_link_libraries( io_benchmark
+    common
+    ${wxWidgets_LIBRARIES}
+)
+
diff --git a/qa/io_benchmark/fstream_io.cpp b/qa/io_benchmark/fstream_io.cpp
new file mode 100644
index 000000000..420f57348
--- /dev/null
+++ b/qa/io_benchmark/fstream_io.cpp
@@ -0,0 +1,77 @@
+/*
+ * This program source code file is part of KiCad, a free EDA CAD application.
+ *
+ * Copyright (C) 1992-2017 KiCad Developers, see AUTHORS.txt for contributors.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, you may find one here:
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.html
+ * or you may search the http://www.gnu.org website for the version 2 license,
+ * or you may write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA
+ */
+
+#include "fstream_io.h"
+
+#include <ios>
+
+FSTREAM_LINE_READER::FSTREAM_LINE_READER( const wxString& aFileName,
+            unsigned aStartingLineNumber,
+            unsigned aMaxLineLength ) throw( IO_ERROR ) :
+    LINE_READER( aMaxLineLength ),
+    m_stream( aFileName )
+{
+    line = &m_buffer[0];
+
+    if( !m_stream.is_open() )
+    {
+        wxString msg = wxString::Format(
+            _( "Unable to open filename '%s' for reading" ), aFileName.GetData() );
+        THROW_IO_ERROR( msg );
+    }
+
+    source  = aFileName;
+    lineNum = aStartingLineNumber;
+}
+
+FSTREAM_LINE_READER::~FSTREAM_LINE_READER()
+{
+    // this is only a view into a a string, it cant be deleted by the base
+    line = nullptr;
+}
+
+
+char* FSTREAM_LINE_READER::ReadLine() throw( IO_ERROR )
+{
+    if( getline( m_stream, m_buffer ) )
+    {
+        length = m_buffer.size();
+        line = (char*) m_buffer.data(); //ew why no const??
+    }
+    else
+    {
+        length = 0;
+    }
+
+    // lineNum is incremented even if there was no line read, because this
+    // leads to better error reporting when we hit an end of file.
+    ++lineNum;
+
+    return length ? line : nullptr;
+}
+
+void FSTREAM_LINE_READER::Rewind()
+{
+    m_stream.clear() ;
+    m_stream.seekg(0, std::ios::beg );
+}
diff --git a/qa/io_benchmark/fstream_io.h b/qa/io_benchmark/fstream_io.h
new file mode 100644
index 000000000..dbb5d7cab
--- /dev/null
+++ b/qa/io_benchmark/fstream_io.h
@@ -0,0 +1,55 @@
+/*
+ * This program source code file is part of KiCad, a free EDA CAD application.
+ *
+ * Copyright (C) 2017 KiCad Developers, see AUTHORS.txt for contributors.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, you may find one here:
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.html
+ * or you may search the http://www.gnu.org website for the version 2 license,
+ * or you may write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA
+ */
+
+#ifndef FSTREAM_LINE_READER_H
+#define FSTREAM_LINE_READER_H
+
+#include <richio.h>
+
+#include <fstream>
+
+class FSTREAM_LINE_READER : public LINE_READER
+{
+public:
+
+    FSTREAM_LINE_READER( const wxString& aFileName,
+            unsigned aStartingLineNumber = 0,
+            unsigned aMaxLineLength = LINE_READER_LINE_DEFAULT_MAX ) throw( IO_ERROR );
+
+    ~FSTREAM_LINE_READER();
+
+    char* ReadLine() throw( IO_ERROR ) override;
+
+    /**
+     * Function Rewind
+     * rewinds the file and resets the line number back to zero.  Line number
+     * will go to 1 on first ReadLine().
+     */
+    void Rewind();
+
+private:
+    std::string m_buffer;
+    std::ifstream m_stream;
+};
+
+#endif // FSTREAM_LINE_READER_H
diff --git a/qa/io_benchmark/io_benchmark.cpp b/qa/io_benchmark/io_benchmark.cpp
new file mode 100644
index 000000000..a0035ee92
--- /dev/null
+++ b/qa/io_benchmark/io_benchmark.cpp
@@ -0,0 +1,199 @@
+
+#include <wx/wx.h>
+#include <richio.h>
+
+#include <chrono>
+#include <ios>
+#include <functional>
+
+#include <fstream>
+
+#include "fstream_io.h"
+
+enum RET_CODES
+{
+    BAD_ARGS = 1,
+};
+
+using CLOCK = std::chrono::steady_clock;
+using TIME_PT = std::chrono::time_point<CLOCK>;
+
+static int getMs( const TIME_PT& start, const TIME_PT& end )
+{
+    using std::chrono::milliseconds;
+    using std::chrono::duration_cast;
+
+    milliseconds diff = duration_cast<milliseconds>( end - start );
+
+    return diff.count();
+}
+
+
+struct BENCH_REPORT
+{
+    int linesRead;
+    int charAcc;
+};
+
+using BENCH_FUNC = std::function<void(const wxString&, int, BENCH_REPORT&)>;
+
+struct BENCHMARK
+{
+    BENCH_FUNC func;
+    wxString name;
+};
+
+
+static void bench_fstream( const wxString& aFile, int aReps, BENCH_REPORT& report )
+{
+    std::string line;
+
+    for( int i = 0; i < aReps; ++i)
+    {
+        std::ifstream fstr( aFile );
+
+        while( getline( fstr, line ) )
+        {
+            report.linesRead++;
+            report.charAcc += line[0];
+        }
+
+        fstr.close();
+    }
+}
+
+static void bench_fstream_reuse( const wxString& aFile, int aReps, BENCH_REPORT& report )
+{
+    std::string line;
+    std::ifstream fstr( aFile );
+
+    for( int i = 0; i < aReps; ++i)
+    {
+        while( getline( fstr, line ) )
+        {
+            report.linesRead++;
+            report.charAcc += line[0];
+        }
+        fstr.clear() ;
+        fstr.seekg(0, std::ios::beg) ;
+    }
+
+    fstr.close();
+}
+
+static void bench_richio( const wxString& aFile, int aReps, BENCH_REPORT& report )
+{
+    for( int i = 0; i < aReps; ++i)
+    {
+        FILE_LINE_READER flr( aFile );
+        while( flr.ReadLine() )
+        {
+            report.linesRead++;
+            report.charAcc += flr.Line()[0];
+        }
+    }
+}
+
+static void bench_richio_reuse( const wxString& aFile, int aReps, BENCH_REPORT& report )
+{
+    FILE_LINE_READER flr( aFile );
+    for( int i = 0; i < aReps; ++i)
+    {
+        while( flr.ReadLine() )
+        {
+            report.linesRead++;
+            report.charAcc += flr.Line()[0];
+        }
+
+        flr.Rewind();
+    }
+}
+
+static void bench_fstream_richio( const wxString& aFile, int aReps, BENCH_REPORT& report )
+{
+    for( int i = 0; i < aReps; ++i)
+    {
+        FSTREAM_LINE_READER fstr( aFile );
+        while( fstr.ReadLine() )
+        {
+            report.linesRead++;
+            report.charAcc += fstr.Line()[0];
+        }
+    }
+}
+
+static void bench_fstream_richio_reuse( const wxString& aFile, int aReps, BENCH_REPORT& report )
+{
+    FSTREAM_LINE_READER fstr( aFile );
+    for( int i = 0; i < aReps; ++i)
+    {
+
+        while( fstr.ReadLine() )
+        {
+            report.linesRead++;
+            report.charAcc += fstr.Line()[0];
+        }
+
+        fstr.Rewind();
+    }
+}
+
+int main( int argc, char* argv[] )
+{
+    if (argc < 3)
+    {
+        wxLogDebug("Usage: %s <FILE> <REPS> [r/f/R/F]", argv[0] );
+        return BAD_ARGS;
+    }
+
+    wxString inFile( argv[1] );
+
+    long reps = 0;
+    wxString( argv[2] ).ToLong( &reps );
+
+    // get the benchmark to do, or 0 for all of them
+    char bench = 0;
+    if ( argc == 4 )
+        bench = argv[3][0];
+
+    wxLogDebug("IO Bench Mark Util");
+
+    wxLogDebug( "  Benchmark file: %s", inFile );
+    wxLogDebug( "  Repetitions: %d", (int) reps );
+
+    std::vector<BENCHMARK> benches;
+
+
+    if( !bench || bench == 'f' )
+        benches.push_back( { bench_fstream, "std::fstream" } );
+
+    if( !bench || bench == 'F' )
+        benches.push_back( { bench_fstream_reuse, "std::fstream, reused" } );
+
+    if ( !bench || bench == 'r' )
+        benches.push_back( { bench_richio, "RICHIO" } );
+
+    if ( !bench || bench == 'R' )
+        benches.push_back( { bench_richio_reuse, "RICHIO, reused" } );
+
+    if ( !bench || bench == 'n' )
+        benches.push_back( { bench_fstream_richio, "New fstream IO" } );
+
+    if ( !bench || bench == 'N' )
+        benches.push_back( { bench_fstream_richio_reuse, "New fstream IO, reused" } );
+
+
+    for( auto& bmark : benches )
+    {
+        BENCH_REPORT report = {};
+
+        TIME_PT start = CLOCK::now();
+        bmark.func( inFile, reps, report );
+        TIME_PT end = CLOCK::now();
+
+        wxLogDebug( "%25s %d lines, acc: %d in %d ms",
+                bmark.name, report.linesRead, report.charAcc, getMs( start, end ) );
+    }
+
+    return 0;
+}
-- 
2.11.0


Follow ups