maria-discuss team mailing list archive

Thread
Date
Re: InnoDB: slow insert (too many fsyncs ?)

To: Julien Muchembled <jm@xxxxxxxxxx>
From: MARK CALLAGHAN <mdcallag@xxxxxxxxx>
Date: Mon, 27 Apr 2015 09:09:56 -0700
Cc: Maria Discuss <maria-discuss@xxxxxxxxxxxxxxxxxxx>, Vincent Pelletier <vincent@xxxxxxxxxx>
In-reply-to: <553D2092.7050803@nexedi.com>
For the fsync rate, my memory is vague, but I think InnoDB respects the
write-ahead-log rule even with innodb_flush_method = nosync and this means
that before a dirty page is written back the log must be forced for any
changes to that page. If you run strace you can determine which files get
the fsync and if it is only the log then my memory is correct. I haven't
used innodb_flush_method = nosync for a very long time (maybe 6 years ago)
and when I did I hacked it (probably to not sync the redo log as suggested
above). I almost always use innodb_flush_method=O_DIRECT and then configure
enough background write threads.

For write amplification you are computing the rate as Innodb-write-rate /
InnoDB-read-rate. But I think your InnoDB-read-rate is the scans of the PK
index while the write rate includes IO for PK and secondary indexes. This
is still write-amp, but deserves an asterisk to explain that point.
Write-amp from the doublewrite buffer can be much cheaper than write-amp
from dirty page writeback where "cheaper" means fewer seeks.

If you want to make the load go faster than creating the secondary indexes
after the table has been loaded might help.

Another interesting question is whether to load in parallel or one table at
a time. Sometimes, loading one table at a time helps because you are more
likely to keep the working set in memory during the load. This can be very
important when secondary indexes are maintained during the load as you can
do too much random read - modify - write for secondary indexes, although
the InnoDB change buffer helps a bit with that. The choice here isn't all
parallel vs 1-at-a-time. Doing a few at a time gives some parallel benefit,
but also gives each table load a better chance at keeping secondary indexes
in memory during the load.

It might help to increase innodb_max_dirty_pages_pct to 90 (you used 75). I
don't have current experience with it but you might also want to increase
innodb_max_dirty_pages_pct_lwm. The goal is to not write back dirty pages
too soon.

Making your InnoDB log files as large as possible will also help defer
writing back dirty pages and reduce your page write back rate (and
write-amp) in a good way.
| innodb_log_file_size                                   | 524288000


On Sun, Apr 26, 2015 at 10:29 AM, Julien Muchembled <jm@xxxxxxxxxx> wrote:

> Hello,
>
> I am trying to optimize the insertion of 124 GB of data into InnoDB and it
> takes much more time than I expected.
> For the moment this is only a test before migrating prod, using a USB HDD.
> Prod has much faster storage but I prefer to check if there's anything
> wrong in my MariaDB setup.
>
> The source DB is read as less than 1MB/s with less than 20% CPU usage
> (<10% for mysqld and <10% for the client app doing the migration).
>
> I know that several unsafe settings can be used to import data but I won't
> be able to use them for the migration of prod. This is because the
> migration will be done transparently in background, the old db being
> accessed for not yet migrated data. For the same reason, indexes can't be
> disabled.
>
> Anyway, it was slow that for the moment, I only did a full test with
> unsafe settings:
> innodb_flush_log_at_trx_commit = 0
> innodb_flush_method = nosync
> innodb_doublewrite = 0
> sync_frm = 0
>
> It took exactly 19.8 hours to import all the data.
> See attached file for all parameters and detailed statistics.
> (Sorry for the 'show status' outputs, no idea what I expected to get when
> I tried 'flush status', whereas 21G were already processed. So 'show
> status.20941778944' is the output just before the flush, and 'show status'
> is after everything was finished)
>
> The schema of tables can be found here:
>
> http://git.erp5.org/gitweb/neoppod.git/blob/HEAD:/neo/storage/database/mysqldb.py?js=1#l142
>
> 1. fsync
>
> Not sure if I read 'show status' correctly, but I'd say it's actually 6
> fsyncs per second, and this matches what I see with strace.
> Why not 0 whereas I disabled them ?
>
> There's also less than 1 COMMIT per second, so even without disabling
> fsyncs I was surprised to see so many in the middle of transactions.
>
> 2. Write amplification
>
> During my first tests, I saw with iotop that mysqld wrote at 10MB/s. Which
> means a factor 10.
> With doublewrite disabled, it's rather a factor 5.
>
> These ratios are probably underestimated. For better interpretation of the
> statistics, I should add that:
> - source DB is uncompressed but the file format is very compact
> - most of the imported data ends up in the 'data.value' column, and most
> values are zlib-compressed, with also deduplication
> The resulting ibdata is only 70GB.
>
> Is this write amplification only caused by indexes & doublewrite ?
>
> Thanks,
> Julien
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>
>


-- 
Mark Callaghan
mdcallag@xxxxxxxxx
Follow ups

Re: InnoDB: slow insert (too many fsyncs ?)
From: Julien Muchembled, 2015-06-15
References

InnoDB: slow insert (too many fsyncs ?)
From: Julien Muchembled, 2015-04-26