← Back to team overview

maria-developers team mailing list archive

Re: tmp-table-tests MariaDB 5.2.6 vs. 5.5.12 vs.5.1.57

 

Hi!

>>>>> "Sergei" == Sergei Golubchik <serg@xxxxxxxxxxxx> writes:

Sergei> Hi, erkan!
Sergei> On May 30, erkan yanar wrote:
>> 
>> Moin, 
>> I did some tmp-table tests. As it is told MariaDB is faster on
>> tmp-tables because of the Aria-Engine.

Sergei> I've tried that, but without mysqlslap. Your table structure, few times more rows,
Sergei> mysql-5.1, mariadb-5.2 with aria tmp tables, mariadb-5.2 with myisam tmp
Sergei> tables. I didn't try mysql-5.5. I used a query like

Sergei>   SELECT id, tea from t1 group by left(id,1) order by null;

Sergei> where "group by left(id,1)" makes sure the number of groups is small
Sergei> (only ten), so the result set is small and the query doesn't spend most
Sergei> of its time sending thousands of result rows. And "order by null"
Sergei> removes filesort from the query plan.

Sergei> I've got almost identical numbers (which is a bit scary, considering
Sergei> different setup, and different queries).

Sergei> MariaDB with myisam tmp tables performed similar to MySQL (1.2-1.5 sec)
Sergei> MariaDB with aria tables was slower (5.1-5.7 sec).

Sergei> I remember that few months ago Monty fixed a slowdown caused by Aria
Sergei> temp tables. May be this fix didn't get into 5.2? Or you've found a
Sergei> different problem? I'll let Monty comment on it - he was already
Sergei> debugging this problem once.

I have now analysed this in detail.

I used sergis test table:

CREATE TABLE `t1` (
  `id` int(11) DEFAULT NULL,
  `tea` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

I repeated then the following query until I got 2097152 rows.

insert t1 select rand()*2e8, repeat(rand(), rand()*64) from t1;

Then I issued the following queries.

SELECT id, tea from t1 group by left(id,1) order by null;
SELECT id, count(*), tea from t1 group by left(id,1) order by null

The results I got was (for both queries):

Aria(8K) row_format=page:    8.56 sec & 6.06
Aria(8K) row_format=dynamic: 3.19 & 13.32
Aria(1K) row_format=page:    6.40 & 4.88
Aria(1K) row_format=dynamic: 2.02 & 13.32
Aria(2K) row_format=page:    6.40 & 4.88
Aria(2K) row_format=dynamic: 2.26 & 13.51
MyISAM:                      2.17, 12.89

The number after Aria is the block size (with mysqld
--aria-block-size=).  The 'dynamic' test was done by patching mysqld
to use the dynamic page format instead of 'page' which is default.

Dynamic is what MYISAM is using;  If the page size would be the same
then Aria and MyISAM should perform identically.
>From the above you can see
- That page size affects the query time notable in this case. This is
  because there is more data to move back/from the page cache.  The
  bigger page cache however allows longer keys and fewer index levels
  so for bigger data sets the different should be smaller.
  It's possible to in the future optimize Aria to not copy pages from
  the page cache also for index writes and then this difference should
  disappear.
- Aria is 2x faster than MyISAM when you add 'count(*)' to the query.
  (More about this later).

I also tried the change the query to use left(id,2) to get more
groups. Then the result was:

Aria(8K) row_format=page:    10.21 & 7.96
Aria(8K) row_format=dynamic: 4.85 & 15.47
MyISAM:                      4.04 & 15.14

Here one can see that compared to dynamic, the better code in Aria
negates the page cache a bit for dynamic.  row_format=page is still
slower for the simpler query.

I then tried by using group on an integer:

SELECT id, tea from t1 group by id % 100 order by null;
SELECT id, count(*), tea from t1 group by id % 100 order by null;

Aria(8K) row_format=page:    9.83 & 10.23
Aria(8K) row_format=dynamic: 7.19 & 17.42
MyISAM:                      6.28 & 17.00

Results is still similar.

I examined the code and then found out why Aria was slower for the
simpler query.

MariaDB has an optimization that when you don't have any summary
functions in the query it will just write to the group temp table and
let duplicate key handling remove the duplicates. With the 'page'
format however this is not a good idea as this is optimized for the
case when there is few duplicates and it starts by writing the row,
then the keys. What happens is that as almost all rows are duplicates
we first insert the row, then notice the duplication and then delete
the row.

Fortunately it's not common that you have group by queries without a
sum function and that's why we have not noticed this before.

I have now fixed this in 5.2 so that if there is no summary function
we are using the dynamic row format for aria.
(It's not trivial to fix the way page format works by first writing
the row and this is not something I would recommended to done in a
stable release).

After this (one line change) the results are:

First queries LEFT(id,1) : 3.08 & 6.24
Second LEFT(id,2)):        4.87 & 8.20
Third (id % 10):           7.10 & 10.38

We are still a little bit slower than MyISAM for the simpler, not
common queries, (which can be fixed by running with a block size of 2K
if this would be critical for anyone) However for queries with sum
functions we are about 50 % faster than MyISAM.

Erkan, thanks for finding this issue!

Regards,
Monty


References