← Back to team overview

maria-discuss team mailing list archive

Re: Very slow insert with load data infile

 

Hi, Pierre!

Okay, there were quite a few replies already here.
I'll chime in, but only to confirm that there's no easy solution,
unfortunately :(

On Feb 10, Pierre wrote:
> Hello,
> 
> Mariadb is getting slower and slower when I'm inserting a massive
> amout of data. I'm trying to insert 166 507 066 rows (12go) using load
> data infile '' into an empty table. I splitted my file in 13 parts of
> the same size because it was too long to insert in one shot. When I
> inserted more than 100M rows, it starts to be extremely slow.
> 
> Here are the time for each load data infile :
> 
> LOAD DATA INFILE '/tmp/hash/xaa' : 2min 49sec
> LOAD DATA INFILE '/tmp/hash/xab' : 2min 49sec
> LOAD DATA INFILE '/tmp/hash/xac' : 3min 48sec
> LOAD DATA INFILE '/tmp/hash/xad' : 3min 48sec
> LOAD DATA INFILE '/tmp/hash/xae' : 3min 49sec
> LOAD DATA INFILE '/tmp/hash/xaf' : 5min 59sec
> LOAD DATA INFILE '/tmp/hash/xag' : 10min 50sec
> LOAD DATA INFILE '/tmp/hash/xah' : 20min 7sec
> LOAD DATA INFILE '/tmp/hash/xai' : 47min 7sec
> LOAD DATA INFILE '/tmp/hash/xaj' : 4 hours 1 min 9.34 sec ( ouch !)
> LOAD DATA INFILE '/tmp/hash/xak' : still running...
>
> Can someone explain me what is happening internaly and why it is so slow ?

Yes. That's because indexes are implemented as B-trees, and B-tree has a
cost of insertion of O(log N). As you insert more data, insertions
become slower. Furthermore, B-tree organizes data in pages. MariaDB
tries to keep all pages in memory, but as soon as index becomes larger
than keycache size, every key insertion means at least one disk read and
one write (assuming your key values are distributed normally and
randomly).

> Do you think if I just create an index (not a unique index) it will be
> faster ?

Not really. Inserting values into a B-tree one-by-one will always be
slow when your B-tree becomes big enough.

But you don't need to insert values into a B-tree one by one.  MyISAM
has a much faster (like, two orders of magnitude) way of building
b-trees by creating a tree all at once.

This is used when you insert data into the empty table. Or when you add
an index to the existing table. Or when you enable indexes. That is:

 1. LOAD DATA INFILE 'everything, not in chunks'
    (a convenient way would be to load from a pipe, and cat all your
    chunks into it).
 2. ALTER TABLE ... ADD KEY
 3. ALTER TABLE DISABLE KEYS. insert the data. ALTER TABLE ENABLE KEYS

But unfortunately, all this only works for non-unique keys, when MariaDB
doesn't need to expect a unique constraint error and won't need to
delete rows when creating indexes.  For unique indexes values will be
inserted into a b-tree one by one.

Now, there's another trick you can use. Create a table. Then disable all
indexes - not with ALTER TABLE ENABLE KEYS (which only disables
non-unique indexes) - but from a command-line with myisamchk:

  $ myisamchk --keys-used=0 /path/to/table.MYI

then insert your data. And REPAIR TABLE or REPAIR TABLE ... QUICK.
This will rebuild all indexes, the fast way.
The second variant (with QUICK) will not copy your (presumably, huge)
data - it will only rebuild indexes. But it cannot remove duplicates
either.

And also it's worth noting that MariaDB-10.0 rebuilds both UNIQUE
and non-UNIQUE indexes the fast way when you do ALTER TABLE ... ADD KEY.

Regards,
Sergei



Follow ups

References