maria-discuss team mailing list archive
Mailing list archive
Very slow insert with load data infile
Mariadb is getting slower and slower when I'm inserting a massive amout of data. I'm trying to
insert 166 507 066 rows (12go) using load data infile '' into an empty table. I splitted my file in
13 parts of the same size because it was too long to insert in one shot. When I inserted more than
100M rows, it starts to be extremely slow.
Here are the time for each load data infile :
LOAD DATA INFILE '/tmp/hash/xaa' : 2min 49sec
LOAD DATA INFILE '/tmp/hash/xab' : 2min 49sec
LOAD DATA INFILE '/tmp/hash/xac' : 3min 48sec
LOAD DATA INFILE '/tmp/hash/xad' : 3min 48sec
LOAD DATA INFILE '/tmp/hash/xae' : 3min 49sec
LOAD DATA INFILE '/tmp/hash/xaf' : 5min 59sec
LOAD DATA INFILE '/tmp/hash/xag' : 10min 50sec
LOAD DATA INFILE '/tmp/hash/xah' : 20min 7sec
LOAD DATA INFILE '/tmp/hash/xai' : 47min 7sec
LOAD DATA INFILE '/tmp/hash/xaj' : 4 hours 1 min 9.34 sec ( ouch !)
LOAD DATA INFILE '/tmp/hash/xak' : still running...
Now the task is running but mariadb is only using 2% cpu and read to the disk at 2500 Kb/sec (which
is slow regarding the disk performance). There is a lot of disk space. If insertion speed continue
to slow down at this rate, it will take weeks to insert the 3 remainings files ! It's only 166M rows
so I think it's not normal. If I disable the "unique" constraint insertion speed is great but I need
to be sure there is no duplicate and I also need an index.
Can someone explain me what is happening internaly and why it is so slow ?
Do you think if I just create an index (not a unique index) it will be faster ? The problem is I
specially delegated this task of unique checking to mysql.
If no solution, do you know a database or a key/value store with better for performance for this use
case ? (create an index and remove duplicate ?) I can rearrange my table structure (at a cost of
more disk space usage) to match a key/value structure.
Here is my current table structure :
CREATE TABLE `wallets` (
`p1` varbinary(20) DEFAULT NULL,
`p2` varbinary(20) DEFAULT NULL,
`data` varbinary(32) DEFAULT NULL,
UNIQUE KEY `p1` (`p1`),
UNIQUE KEY `p2` (`p2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=FIXED;
Computer specs: 16go RAM and 4x3.1 ghz, 2x1To (with 90% unused disk space)
The configuration variables (I tweaked specially for this insertion job) :
key_buffer_size = 6G # MyISAM: used with insert to an empty table
bulk_insert_buffer_size = 6G # MyISAM/Aria: insert to a non empty table; default 8M
myisam_sort_buffer_size = 6G # MyISAM: used for sort but only when : "CREATE INDEX", "ALTER TABLE"
or "REPAIR TABLE"
read_buffer_size = 6G # MyISAM/Aria/MERGE: allocated for each table scan, for order by, nested query
caching, bulk insert into partitions
sort_buffer_size = 4M # ALL: allocated each time a session need to do a sort
myisam_max_sort_file_size = 128G # MyISAM: tmp file max size
I'm using mariadb 5.5.31.