maria-discuss team mailing list archive

Thread
Date

Re: Very slow insert with load data infile

To: Justin Swanhart <greenlion@xxxxxxxxx>
From: Pierre <pierz@xxxxxxxxxx>
Date: Wed, 12 Feb 2014 11:10:57 +0100
Cc: "maria-discuss@xxxxxxxxxxxxxxxxxxx" <maria-discuss@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <B256DCFF-6156-4B13-83C2-B53C8016587D@gmail.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

You don't undestand, I already did this. Read the thread since the beginning :

https://lists.launchpad.net/maria-discuss/msg01338.html <= load data in empty table with index

https://lists.launchpad.net/maria-discuss/msg01361.html <= Load the data in empty table THEN add theindex (what you are suggesting).


Le 12/02/2014 11:03, Justin Swanhart a écrit :

Load the data THEN add the index.  This will do the unique check once instead of on every insertion.

On bloom filter miss, select from the table and insert if it isn't there.  If this is multithreaded use innodb or tokudb and select .... for update, to prevent race.

Sent from my iPhone

On Feb 12, 2014, at 12:49 AM, Pierre <pierz@xxxxxxxxxx> wrote:

Ok I understand, but this doesn't resolve the initialization problem, I'm sure there is a bug, or something which can be improved a lot. I can't use the UNIQUE constraint when I have to much key which doesn't fit in RAM.

Because even If have memcache/bloom filter in front, I still need to create my initial dataset with the unique constraint,and even if my initial dataset is unique, I can't insert them in my table because of the bug explained before.

If mysql can do a sort so quickly (create index), it should be trivial to reuse this sorted data, eliminate the duplicate and create the UNIQUE index.


Le 12/02/2014 09:24, Justin Swanhart a écrit :

Hi,

This is not a bug, but how b tree indexes work.  For them to be efficient they must fit in ram.
   There are buffering mechanisms that can be used for secondary indexes in some cases, because the
write can be done without a read, but ONLY when the index is not unique.  It if it unique, then the
index dive is necessary and a btree traversal can take multiple random IO to get to the leaf node.

Faster IO can help, so you can look into flash storage.

Consider fronting your lookups with memcache (a hash index) or a bloom filter (there are many
implementations on the net) to reduce lookups.

--Justin


Le 12/02/2014 09:04, Pierre a écrit :

Using this technique I have the same Issue. It's now running for severals hours, I'm at a 40% and
looking at show full processlist, it's getting slower and slower. It will never finish.

I think there is a bug here.

Firstly, regardly the memory usage, It doesn't correctly use the buffer I did set, I think it's only
using the key_buffer_size. myisam_sort_buffer_size or bulk_insert_buffer_size are not used on this
task.

So what's happening ? When the RAM is full, mariadb is doing random access on the disk to sort and
filter Go of data ! That's why my cpu was only used a few percent, the process was in i/o wait most
of the time.

So what I am saying here is : mariadb can't crate UNIQUE keys/index if the rows doesn't fit in RAM.

However if I try to create a standard index (ie non unique), it works well and it's done in less
than an hour.

ALTER IGNORE TABLE mytable ADD INDEX (c1), ADD INDEX(c2);
Query OK, 349086532 rows affected (44 min 25.21 sec)
Records: 349086532 Duplicates: 0 Warnings: 0

In this second usage case, Maria is doing a good work by using the myisam_sort_buffer_size. I think
it's doing something like an external/merge sort, spliting the rows in part that fit in RAM, sorting
them, merging them and creating index.

It was 100% cpu most of the time, when It was not it was because mysql was loading the rows in RAM
from hard disk (and not doing a random access on the hard disk like in create unique index). So why
UNIQUE index is not behaving the same way ?

It's easy to reproduce the bug, just create a binary file of 2 or 3 x size of RAM, then load data
infile and try to create a UNIQUE index on it. It will never end.

Am 10.02.2014 13:45, schrieb Pierre:

Mariadb is getting slower and slower when I'm inserting a massive amout of data. I'm trying to

insert 166 507 066

rows (12go) using load data infile '' into an empty table. I splitted my file in 13 parts of the

same size because

it was too long to insert in one shot. When I inserted more than 100M rows, it starts to be

extremely slow.


he reason are the permamently updated keys
in general for large inserts on a new table

UNIQUE KEY `p1` (`p1`),
UNIQUE KEY `p2` (`p2`),

* remove keyes
* insert data
* add kyes

https://dev.mysql.com/doc/refman/5.5/en/insert-speed.html


_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp


_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Follow ups

Re: Very slow insert with load data infile
From: Reindl Harald, 2014-02-12

References

Very slow insert with load data infile
From: Pierre, 2014-02-10
Re: Very slow insert with load data infile
From: Reindl Harald, 2014-02-10
Re: Very slow insert with load data infile
From: Pierre, 2014-02-12
Re: Very slow insert with load data infile
From: Pierre, 2014-02-12
Re: Very slow insert with load data infile
From: Justin Swanhart, 2014-02-12