← Back to team overview

maria-discuss team mailing list archive

Re: Very slow insert with load data infile

 

hi guys
one doubt... using btree have a O(log N), what about using tokudb?


2014-02-12 12:45 GMT-02:00 Sergei Golubchik <serg@xxxxxxxxxxx>:

> Hi, Pierre!
>
> Okay, there were quite a few replies already here.
> I'll chime in, but only to confirm that there's no easy solution,
> unfortunately :(
>
> On Feb 10, Pierre wrote:
> > Hello,
> >
> > Mariadb is getting slower and slower when I'm inserting a massive
> > amout of data. I'm trying to insert 166 507 066 rows (12go) using load
> > data infile '' into an empty table. I splitted my file in 13 parts of
> > the same size because it was too long to insert in one shot. When I
> > inserted more than 100M rows, it starts to be extremely slow.
> >
> > Here are the time for each load data infile :
> >
> > LOAD DATA INFILE '/tmp/hash/xaa' : 2min 49sec
> > LOAD DATA INFILE '/tmp/hash/xab' : 2min 49sec
> > LOAD DATA INFILE '/tmp/hash/xac' : 3min 48sec
> > LOAD DATA INFILE '/tmp/hash/xad' : 3min 48sec
> > LOAD DATA INFILE '/tmp/hash/xae' : 3min 49sec
> > LOAD DATA INFILE '/tmp/hash/xaf' : 5min 59sec
> > LOAD DATA INFILE '/tmp/hash/xag' : 10min 50sec
> > LOAD DATA INFILE '/tmp/hash/xah' : 20min 7sec
> > LOAD DATA INFILE '/tmp/hash/xai' : 47min 7sec
> > LOAD DATA INFILE '/tmp/hash/xaj' : 4 hours 1 min 9.34 sec ( ouch !)
> > LOAD DATA INFILE '/tmp/hash/xak' : still running...
> >
> > Can someone explain me what is happening internaly and why it is so slow
> ?
>
> Yes. That's because indexes are implemented as B-trees, and B-tree has a
> cost of insertion of O(log N). As you insert more data, insertions
> become slower. Furthermore, B-tree organizes data in pages. MariaDB
> tries to keep all pages in memory, but as soon as index becomes larger
> than keycache size, every key insertion means at least one disk read and
> one write (assuming your key values are distributed normally and
> randomly).
>
> > Do you think if I just create an index (not a unique index) it will be
> > faster ?
>
> Not really. Inserting values into a B-tree one-by-one will always be
> slow when your B-tree becomes big enough.
>
> But you don't need to insert values into a B-tree one by one.  MyISAM
> has a much faster (like, two orders of magnitude) way of building
> b-trees by creating a tree all at once.
>
> This is used when you insert data into the empty table. Or when you add
> an index to the existing table. Or when you enable indexes. That is:
>
>  1. LOAD DATA INFILE 'everything, not in chunks'
>     (a convenient way would be to load from a pipe, and cat all your
>     chunks into it).
>  2. ALTER TABLE ... ADD KEY
>  3. ALTER TABLE DISABLE KEYS. insert the data. ALTER TABLE ENABLE KEYS
>
> But unfortunately, all this only works for non-unique keys, when MariaDB
> doesn't need to expect a unique constraint error and won't need to
> delete rows when creating indexes.  For unique indexes values will be
> inserted into a b-tree one by one.
>
> Now, there's another trick you can use. Create a table. Then disable all
> indexes - not with ALTER TABLE ENABLE KEYS (which only disables
> non-unique indexes) - but from a command-line with myisamchk:
>
>   $ myisamchk --keys-used=0 /path/to/table.MYI
>
> then insert your data. And REPAIR TABLE or REPAIR TABLE ... QUICK.
> This will rebuild all indexes, the fast way.
> The second variant (with QUICK) will not copy your (presumably, huge)
> data - it will only rebuild indexes. But it cannot remove duplicates
> either.
>
> And also it's worth noting that MariaDB-10.0 rebuilds both UNIQUE
> and non-UNIQUE indexes the fast way when you do ALTER TABLE ... ADD KEY.
>
> Regards,
> Sergei
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>



-- 
Roberto Spadim
SPAEmpresarial
Eng. Automação e Controle

References