randgen team mailing list archive
-
randgen team
-
Mailing list archive
-
Message #00049
Case sensitivity in Transform validator's TRANSFORM_OUTCOME_DISTINCT check
Hi,
It is probably known to those of you using the Transformer validator
that some query transforms are not comparable to the original in every
case. One such case is the ExecuteAsUnion transforms when using
TRANSFORM_OUTCOME_DISTINCT against queries using varchar columns.
With this transform, results from the original query (e.g. SELECT
`col_varchar_10_latin1` FROM L) are compared to results of transformed
queries using UNION. For example,
( SELECT `col_varchar_10_latin1` FROM L )
UNION ALL ( SELECT `col_varchar_10_latin1` FROM L )
/* TRANSFORM_OUTCOME_SUPERSET */;
where the /* TRANFORM_OUTCOME_SUPERSET */ signals to the validator that
results are OK if results from the transformed query is a superset of
the original query. This is all well.
On the other hand, one other type of transformed UNION query is:
( SELECT `col_varchar_10_latin1` FROM L )
UNION ( SELECT `col_varchar_10_latin1` FROM L )
/* TRANSFORM_OUTCOME_DISTINCT */;
where the /* TRANFORM_OUTCOME_DISTINCT */ signals to the validator that
results are OK if results from the transformed query after removing
duplicates is equal to the results of the original query.
This works OK in most cases, but when dealing with text values in a
column with a case-insensitive collation (default in MySQL) the
validator reports false positives. This is because the validator's idea
of "DISTINCT" is not equal to the the database server's idea of
DISTINCT. The issue is visible with transforms such as "ExecuteAsUnion"
and "Distinct".
I see this for example when running the RQG against MySQL with
--validator=Transform, --grammar=conf/optimizer/range_access.yy and
--gendata=conf/optimizer/range_access.zz.
I am interested in your opinions on what is the best way to deal with
this, especially if you too are affected by this. For example:
a) Implement permanent support for case insensitive duplicate removal
(if possible) in the Transform validator's DISTINCT checking?
b) Automatically skip (return STATUS_WONT_HANDLE) this transform
altogether if columns in query are varchar or char?
c) Implement temporary hacks to disable e.g. the ExecuteAsUnion
transform in the affected test runs?
To make the issue more clear, consider the following MySQL table:
CREATE TABLE t9 (a VARCHAR(10), b VARCHAR(10) COLLATE latin1_bin);
-- Here, column a has default collation (|latin1_swedish_ci|), while
column b has case sensitive collation (latin1_bin).
-- Let's insert some data:
insert into t9 values ('hello', 'hello');
insert into t9 values ('v', 'v');
insert into t9 values ('i', 'i');
insert into t9 values ('I', 'I');
insert into t9 values ('m', 'm');
-- Mimicking the Transform validator, we execute first the original
query and then the transformed query:
mysql> SELECT * FROM t9;
+-------+-------+
| a | b |
+-------+-------+
| hello | hello |
| v | v |
| i | i |
| I | I |
| m | m |
+-------+-------+
5 rows in set (0.00 sec)
-- Using the column with default collation, we get:
mysql> (SELECT a FROM t9) UNION (SELECT a FROM t9);
+-------+
| a |
+-------+
| hello |
| v |
| i |
| m |
+-------+
4 rows in set (0.00 sec)
-- The validator thinks this is a bad result, because the value "I" is
missing.
-- However, the result is correct, since duplicate rows were removed
with case insensitivity in mind.
-- Using the column with the case sensitive collation:
mysql> (SELECT b FROM t9) UNION (SELECT b FROM t9);
+-------+
| b |
+-------+
| hello |
| v |
| i |
| I |
| m |
+-------+
5 rows in set (0.00 sec)
-- The validator agrees this is an OK result, since it is equal to the
original.
-- The result is correct, since this was a column with case sensitive
collation (no duplicates).
The problem here is that although the validator's isDistinct() check
tries to remove duplicates from the result before comparing, it does
this in a case sensitive manner ("i" is not the same as "I"). The UNION
operator of MySQL, however, regards "i" the same as "I" and removes one
of them from the result, as long as the column has default collation.
Let me know if you have suggestions or opinions.
--
John
Follow ups