maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #10420
On constructing ref access from mis-matched charset comparisons
Hi Alexander,
I've got a question about mis-matched charset comparisons and ref access method.
== Short form ==
I know that VARCHAR comparisons over mis-matching charsets cannot be used for
constructing index lookups into the index over the "narrower" character set
column.
But is this a real limitation or just an optimizer deficiency?
If it is the latter, do people hit it in the real world sufficiently often to
warrant lifting it?
== Long form ==
=== Example dataset ===
create table t0 (
a varchar(32) character set utf8 collate utf8_general_ci
);
create table t1 (
a varchar(32) character set latin1 collate latin1_swedish_ci,
col1 varchar(32),
key(a)
);
create table t2 (
a varchar(32) character set utf8 collate utf8_general_ci,
col1 varchar(32),
key(a)
);
insert into t0 select a from ten;
insert into t1 select a,a from one_k;
insert into t2 select a,a from one_k;
=== Queries ===
MariaDB [test]> explain
-> select * from t0, t2 where t0.a=t2.a;
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 1 | SIMPLE | t2 | ref | a | a | 99 | test.t0.a | 1 | |
+------+-------------+-------+------+---------------+------+---------+-----------+------+-------------+
Ok, good.
MariaDB [test]> explain
-> select * from t0, t1 where t0.a=t1.a;
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| 1 | SIMPLE | t0 | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
2 rows in set (0.00 sec)
Oops, the index t1(a) is not used. FORMAT=JSON shows the reason:
"attached_condition": "(t0.a = convert(t1.a using utf8))"
Still it feels like the optimizer could
* Try to convert the value of t0.a into latin1.
* If it can't be represented in latin1, then we know that no row
in t1 has t1.a=t0.a.
* make a lookup on t1.a= convert(t0.a using latin1) , using t1.a's collation
rules.
Is this true [for some charsets] ?
If yes, is it worth doing?
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Follow ups