← Back to team overview

maria-discuss team mailing list archive

Re: InnoDB: slow insert (too many fsyncs ?)

 

Hi,

You should try to enable innodb_file_per_table (and recreate the table), to avoid filling the ibdata file. Try also to increase the innodb_log_buffer_size to 16M (your Innodb_log_waits is increasing). How much memory do you have on your system ? The innodb_buffer_pool_size of 1G seems to be a little bit low. Try also to use the Barracuda file_format instead of Antelope : innodb_file_format = Barracuda & innodb_file_format_max = Barracuda. For such a big amount of data to import, you could also try to increase the innodb_log_file_size. On prod, depending on the speed of your disk (what is your disk / cpu setup on prod ?), you can also increase the innodb_io_capacity.

Depending on your usage, you could also benefit from switching to TokuDB (fast insert and much better compression)

HTH,
  Jocelyn Fournier

Le 26/04/2015 19:29, Julien Muchembled a écrit :
Hello,

I am trying to optimize the insertion of 124 GB of data into InnoDB and it takes much more time than I expected.
For the moment this is only a test before migrating prod, using a USB HDD. Prod has much faster storage but I prefer to check if there's anything wrong in my MariaDB setup.

The source DB is read as less than 1MB/s with less than 20% CPU usage (<10% for mysqld and <10% for the client app doing the migration).

I know that several unsafe settings can be used to import data but I won't be able to use them for the migration of prod. This is because the migration will be done transparently in background, the old db being accessed for not yet migrated data. For the same reason, indexes can't be disabled.

Anyway, it was slow that for the moment, I only did a full test with unsafe settings:
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = nosync
innodb_doublewrite = 0
sync_frm = 0

It took exactly 19.8 hours to import all the data.
See attached file for all parameters and detailed statistics.
(Sorry for the 'show status' outputs, no idea what I expected to get when I tried 'flush status', whereas 21G were already processed. So 'show status.20941778944' is the output just before the flush, and 'show status' is after everything was finished)

The schema of tables can be found here:
http://git.erp5.org/gitweb/neoppod.git/blob/HEAD:/neo/storage/database/mysqldb.py?js=1#l142

1. fsync

Not sure if I read 'show status' correctly, but I'd say it's actually 6 fsyncs per second, and this matches what I see with strace.
Why not 0 whereas I disabled them ?

There's also less than 1 COMMIT per second, so even without disabling fsyncs I was surprised to see so many in the middle of transactions.

2. Write amplification

During my first tests, I saw with iotop that mysqld wrote at 10MB/s. Which means a factor 10.
With doublewrite disabled, it's rather a factor 5.

These ratios are probably underestimated. For better interpretation of the statistics, I should add that:
- source DB is uncompressed but the file format is very compact
- most of the imported data ends up in the 'data.value' column, and most values are zlib-compressed, with also deduplication
The resulting ibdata is only 70GB.

Is this write amplification only caused by indexes & doublewrite ?

Thanks,
Julien


_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp


References