← Back to team overview

maria-discuss team mailing list archive

Re: Very slow insert with load data infile

 

Hi,

This is not a bug, but how b tree indexes work.  For them to be efficient
they must fit in ram.  There are buffering mechanisms that can be used for
secondary indexes in some cases, because the write can be done without a
read, but ONLY when the index is not unique.  It if it unique, then the
index dive is necessary and a btree traversal can take multiple random IO
to get to the leaf node.

Faster IO can help, so you can look into flash storage.

Consider fronting your lookups with memcache (a hash index) or a bloom
filter (there are many implementations on the net) to reduce lookups.

--Justin


On Wed, Feb 12, 2014 at 12:04 AM, Pierre <pierz@xxxxxxxxxx> wrote:

> Using this technique I have the same Issue. It's now running for severals
> hours, I'm at a 40% and looking at show full processlist, it's getting
> slower and slower. It will never finish.
>
> I think there is a bug here.
>
> Firstly, regardly the memory usage, It doesn't correctly use the buffer I
> did set, I think it's only using the key_buffer_size.
> myisam_sort_buffer_size or bulk_insert_buffer_size are not used on this
> task.
>
> So what's happening ? When the RAM is full, mariadb is doing random access
> on the disk to sort and filter Go of data ! That's why my cpu was only used
> a few percent, the process was in i/o wait most of the time.
>
> So what I am saying here is : mariadb can't crate UNIQUE keys/index if the
> rows doesn't fit in RAM.
>
> However if I try to create a standard index (ie non unique), it works well
> and it's done in less than an hour.
>
> ALTER IGNORE TABLE mytable ADD INDEX (c1), ADD INDEX(c2);
> Query OK, 349086532 rows affected (44 min 25.21 sec)
> Records: 349086532  Duplicates: 0  Warnings: 0
>
> In this second usage case, Maria is doing a good work by using the
> myisam_sort_buffer_size. I think it's doing something like an
> external/merge sort, spliting the rows in part that fit in RAM, sorting
> them, merging them and creating index.
>
> It was 100% cpu most of the time, when It was not it was because mysql was
> loading the rows in RAM from hard disk (and not doing a random access on
> the hard disk like in create unique index). So why UNIQUE index is not
> behaving the same way ?
>
> It's easy to reproduce the bug, just create a binary file of 2 or 3 x size
> of RAM, then load data infile and try to create a UNIQUE index on it. It
> will never end.
>
>
> >Am 10.02.2014 13:45, schrieb Pierre:
> >> Mariadb is getting slower and slower when I'm inserting a massive amout
> of data. I'm trying to insert 166 507 066
> >> rows (12go) using load data infile '' into an empty table. I splitted
> my file in 13 parts of the same size because
> >> it was too long to insert in one shot. When I inserted more than 100M
> rows, it starts to be extremely slow.
> >
> >he reason are the permamently updated keys
> >in general for large inserts on a new table
> >
> >UNIQUE KEY `p1` (`p1`),
> >UNIQUE KEY `p2` (`p2`),
> >
> >* remove keyes
> >* insert data
> >* add kyes
> >
> >https://dev.mysql.com/doc/refman/5.5/en/insert-speed.html
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>

References