← Back to team overview

randgen team mailing list archive

Re: Case sensitivity in Transform validator's TRANSFORM_OUTCOME_DISTINCT check

 

Hi,

I am sorry for this oversight . I did notice that the ExecuteAsUnion was producing failures, but I though they were due to some other bug about 0 v.s. NULL that is currently being fixed .

I think the best solution is to make isDistinct() case insensitive to make its behavior identical to the database one. This will fix both the ExecuteAsUnion and the Distinct validator. there is a lc() perl function that you can use.

Other databases or character sets may not match the behavior in MySQL, so if the RQG ever aquires extra "customers" for Transformer-based testing, we may need to revisit the issue.

Philip Stoev

----- Original Message ----- From: "John H Embretsen" <john.embretsen@xxxxxxxxxx>
To: <randgen@xxxxxxxxxxxxxxxxxxx>
Sent: Wednesday, September 22, 2010 11:38 AM
Subject: [Randgen] Case sensitivity in Transform validator's TRANSFORM_OUTCOME_DISTINCT check


Hi,


It is probably known to those of you using the Transformer validator
that some query transforms are not comparable to the original in every
case. One such case is the ExecuteAsUnion transforms when using
TRANSFORM_OUTCOME_DISTINCT against queries using varchar columns.

With this transform, results from the original query (e.g. SELECT
`col_varchar_10_latin1`  FROM L) are compared to results of transformed
queries using UNION. For example,

            (  SELECT `col_varchar_10_latin1`  FROM L    )
  UNION ALL (  SELECT `col_varchar_10_latin1`  FROM L    )
  /* TRANSFORM_OUTCOME_SUPERSET */;

where the /* TRANFORM_OUTCOME_SUPERSET */ signals to the validator that
results are OK if results from the transformed query is a superset of
the original query. This is all well.

On the other hand, one other type of transformed UNION query is:

        (  SELECT `col_varchar_10_latin1`  FROM L    )
  UNION (  SELECT `col_varchar_10_latin1`  FROM L    )
  /* TRANSFORM_OUTCOME_DISTINCT */;

where the /* TRANFORM_OUTCOME_DISTINCT */ signals to the validator that
results are OK if results from the transformed query after removing
duplicates is equal to the results of the original query.

This works OK in most cases, but when dealing with text values in a
column with a case-insensitive collation (default in MySQL) the
validator reports false positives. This is because the validator's idea
of "DISTINCT" is not equal to the the database server's idea of
DISTINCT. The issue is visible with transforms such as "ExecuteAsUnion"
and "Distinct".

I see this for example when running the RQG against MySQL with
--validator=Transform, --grammar=conf/optimizer/range_access.yy and
--gendata=conf/optimizer/range_access.zz.

I am interested in your opinions on what is the best way to deal with
this, especially if you too are affected by this. For example:

a) Implement permanent support for case insensitive duplicate removal
(if possible) in the Transform validator's DISTINCT checking?

b) Automatically skip (return STATUS_WONT_HANDLE) this transform
altogether if columns in query are varchar or char?

c) Implement temporary hacks to disable e.g. the ExecuteAsUnion
transform in the affected test runs?



To make the issue more clear, consider the following MySQL table:

CREATE TABLE t9 (a VARCHAR(10), b VARCHAR(10) COLLATE latin1_bin);

-- Here, column a has default collation (|latin1_swedish_ci|), while
column b has case sensitive collation (latin1_bin).
-- Let's insert some data:

insert into t9 values ('hello', 'hello');
insert into t9 values ('v', 'v');
insert into t9 values ('i', 'i');
insert into t9 values ('I', 'I');
insert into t9 values ('m', 'm');

-- Mimicking the Transform validator, we execute first the original
query and then the transformed query:

mysql> SELECT * FROM t9;
+-------+-------+
| a     | b     |
+-------+-------+
| hello | hello |
| v     | v     |
| i     | i     |
| I     | I     |
| m     | m     |
+-------+-------+
5 rows in set (0.00 sec)

-- Using the column with default collation, we get:

mysql> (SELECT a FROM t9) UNION (SELECT a FROM t9);
+-------+
| a     |
+-------+
| hello |
| v     |
| i     |
| m     |
+-------+
4 rows in set (0.00 sec)

-- The validator thinks this is a bad result, because the value "I" is
missing.
-- However, the result is correct, since duplicate rows were removed
with case insensitivity in mind.

-- Using the column with the case sensitive collation:

mysql> (SELECT b FROM t9) UNION (SELECT b FROM t9);
+-------+
| b     |
+-------+
| hello |
| v     |
| i     |
| I     |
| m     |
+-------+
5 rows in set (0.00 sec)

-- The validator agrees this is an OK result, since it is equal to the
original.
-- The result is correct, since this was a column with case sensitive
collation (no duplicates).


The problem here is that although the validator's isDistinct() check
tries to remove duplicates from the result before comparing, it does
this in a case sensitive manner ("i" is not the same as "I"). The UNION
operator of MySQL, however, regards "i" the same as "I" and removes one
of them from the result, as long as the column has default collation.

Let me know if you have suggestions or opinions.


--
John




--------------------------------------------------------------------------------


_______________________________________________
Mailing list: https://launchpad.net/~randgen
Post to     : randgen@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~randgen
More help   : https://help.launchpad.net/ListHelp





Follow ups

References