← Back to team overview

maria-discuss team mailing list archive

Very slow insert with load data infile

 

Hello,

Mariadb is getting slower and slower when I'm inserting a massive amout of data. I'm trying to insert 166 507 066 rows (12go) using load data infile '' into an empty table. I splitted my file in 13 parts of the same size because it was too long to insert in one shot. When I inserted more than 100M rows, it starts to be extremely slow.

Here are the time for each load data infile :

LOAD DATA INFILE '/tmp/hash/xaa' : 2min 49sec
LOAD DATA INFILE '/tmp/hash/xab' : 2min 49sec
LOAD DATA INFILE '/tmp/hash/xac' : 3min 48sec
LOAD DATA INFILE '/tmp/hash/xad' : 3min 48sec
LOAD DATA INFILE '/tmp/hash/xae' : 3min 49sec
LOAD DATA INFILE '/tmp/hash/xaf' : 5min 59sec
LOAD DATA INFILE '/tmp/hash/xag' : 10min 50sec
LOAD DATA INFILE '/tmp/hash/xah' : 20min 7sec
LOAD DATA INFILE '/tmp/hash/xai' : 47min 7sec
LOAD DATA INFILE '/tmp/hash/xaj' : 4 hours 1 min 9.34 sec ( ouch !)
LOAD DATA INFILE '/tmp/hash/xak' : still running...

Now the task is running but mariadb is only using 2% cpu and read to the disk at 2500 Kb/sec (which is slow regarding the disk performance). There is a lot of disk space. If insertion speed continue to slow down at this rate, it will take weeks to insert the 3 remainings files ! It's only 166M rows so I think it's not normal. If I disable the "unique" constraint insertion speed is great but I need to be sure there is no duplicate and I also need an index.

Can someone explain me what is happening internaly and why it is so slow ?
Do you think if I just create an index (not a unique index) it will be faster ? The problem is I specially delegated this task of unique checking to mysql. If no solution, do you know a database or a key/value store with better for performance for this use case ? (create an index and remove duplicate ?) I can rearrange my table structure (at a cost of more disk space usage) to match a key/value structure.

Here is my current table structure :
CREATE TABLE `wallets` (
  `p1` varbinary(20) DEFAULT NULL,
  `p2` varbinary(20) DEFAULT NULL,
  `data` varbinary(32) DEFAULT NULL,
  UNIQUE KEY `p1` (`p1`),
  UNIQUE KEY `p2` (`p2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=FIXED;


Computer specs: 16go RAM and 4x3.1 ghz, 2x1To (with 90% unused disk space)

The configuration variables (I tweaked specially for this insertion job) :

key_buffer_size = 6G # MyISAM: used with insert to an empty table
bulk_insert_buffer_size = 6G # MyISAM/Aria: insert to a non empty table; default 8M
myisam_sort_buffer_size = 6G # MyISAM: used for sort but only when : "CREATE INDEX", "ALTER TABLE" or "REPAIR TABLE" read_buffer_size = 6G # MyISAM/Aria/MERGE: allocated for each table scan, for order by, nested query caching, bulk insert into partitions
sort_buffer_size = 4M # ALL: allocated each time a session need to do a sort
myisam_max_sort_file_size = 128G # MyISAM: tmp file max size

I'm using mariadb 5.5.31.

Best regards,


Follow ups