
randgen team mailing list archive

[Bug 785895] [NEW] Transformers are character set unaware

 

Public bug reported:

In transformers that use $return_type without also copying the
original character set definition, data comparisons can fail even
though the results are correct (false positives).

Example: a table has a utf8 varchar column. The transformer reads
$return_type and sees "varchar", but it is not aware of the utf8
character set. The transformer may then create a function, stored
procedure, table etc. and use this $return_type for variables within
it without specifying the original character set. As a result, the
transformed data can differ from the original data.
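
One possible direction for a fix is to carry the character set along
with the type name wherever a declaration is generated. The sketch
below is illustrative only: it is written in Python rather than
randgen's Perl, and the helper names qualified_type and
create_function_sql, as well as the list of character types, are
assumptions made for this example and not part of randgen.

```python
# Hypothetical helpers, not randgen code: keep the character set
# attached to a character type when generating SQL declarations.

CHARACTER_TYPES = ("char", "varchar", "tinytext", "text",
                   "mediumtext", "longtext")


def qualified_type(data_type, charset=None):
    """Return data_type, with CHARACTER SET appended when it applies."""
    # Strip any length suffix, e.g. "varchar(255)" -> "varchar".
    base = data_type.split("(")[0].strip().lower()
    if charset and base in CHARACTER_TYPES:
        return f"{data_type} CHARACTER SET {charset}"
    # Non-character types, or an unknown charset, are left untouched.
    return data_type


def create_function_sql(name, data_type, charset=None):
    """Build a pass-through stored function that preserves the charset."""
    full_type = qualified_type(data_type, charset)
    return (f"CREATE FUNCTION {name} (arg {full_type}) "
            f"RETURNS {full_type} DETERMINISTIC RETURN arg")


if __name__ == "__main__":
    print(create_function_sql("f1", "varchar(255)", "utf8"))
```

With the character set passed in, both the argument and the RETURNS
clause are declared as "varchar(255) CHARACTER SET utf8" instead of a
bare "varchar(255)" that falls back to a default character set.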

Note, for instance, that a varchar column (whose normal maximum length
is about 65K characters) can hold at most about 21-22K characters in
utf8, since utf8 uses up to 3 bytes per character:

mysql> CREATE TABLE t1 (txt1 VARCHAR(65000)) ENGINE=MyISAM;
Query OK, 0 rows affected (0.02 sec)

mysql> DROP TABLE t1;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t1 (txt1 VARCHAR(65000) CHARACTER SET utf8) ENGINE=MyISAM;
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> SHOW WARNINGS;
+-------+------+-----------------------------------------------+
| Level | Code | Message                                       |
+-------+------+-----------------------------------------------+
| Note  | 1246 | Converting column 'txt1' from VARCHAR to TEXT |
+-------+------+-----------------------------------------------+
1 row in set (0.00 sec)

mysql> DROP TABLE t1;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t1 (txt1 VARCHAR(20000) CHARACTER SET utf8) ENGINE=MyISAM;
Query OK, 0 rows affected (0.03 sec)
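
The limits seen in the session above can be approximated with a little
arithmetic. The sketch below is in Python for illustration and assumes
a table with a single varchar column, a 65,535-byte row size limit, a
2-byte length prefix and 1 byte for the NULL flag of a nullable column.
The function name max_varchar_chars is made up for this example.

```python
# Rough estimate of the largest VARCHAR(n) that fits in one row.
# Assumptions (labeled, not taken from the bug report): a single-column
# table, 65,535-byte row limit, 2-byte length prefix, 1 NULL-flag byte.

ROW_SIZE_LIMIT = 65535
LENGTH_PREFIX_BYTES = 2
NULL_FLAG_BYTES = 1


def max_varchar_chars(max_bytes_per_char, nullable=True):
    """Return the largest character count that fits under the row limit."""
    overhead = LENGTH_PREFIX_BYTES + (NULL_FLAG_BYTES if nullable else 0)
    return (ROW_SIZE_LIMIT - overhead) // max_bytes_per_char


if __name__ == "__main__":
    print(max_varchar_chars(1))  # single-byte character set
    print(max_varchar_chars(3))  # utf8, up to 3 bytes per character
```

Under these assumptions VARCHAR(65000) fits for a single-byte character
set but not for utf8, while VARCHAR(20000) fits in both cases, which
matches the results of the statements above.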

** Affects: randgen
     Importance: Medium
         Status: New

** Summary changed:

- Transformers are charater set unaware
+ Transformers are character set unaware

** Changed in: randgen
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Random
Query Generator Team, which is subscribed to Random Query Generator.
https://bugs.launchpad.net/bugs/785895

Title:
  Transformers are character set unaware

Status in SQL Generator for testing SQL servers (MySQL, JavaDB, PostgreSQL):
  New

