maria-developers team mailing list archive
Message #09096
Re: Please review MDEV-7055 MySQL#74664 - InnoDB: Failing assertion ...
Hi Sergei,
Thanks for your review. I'm sending a new version.
I changed DATE_FORMAT() to determine its result character set and
collation from args[1] (the "format" argument), instead of using
collation_connection.
I don't see anything special about DATE_FORMAT, and I think we simply
forgot to fix it as part of:
WL#2649: Number-to-string conversions
https://dev.mysql.com/worklog/task/?id=2649
The difference is very subtle and does not affect the most common case:
DATE_FORMAT(datetime_expr, 'format-string')
This change allowed me to reuse agg_arg_charsets_for_string_result(),
which is what the other string functions with string input do in
fix_length_and_dec(). It automatically makes DATE_FORMAT() work in a
consistent way when the "format" argument is a numeric or a datetime
expression.
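To illustrate what the ucs2 and utf32 result files further down expect: with a numeric or TIME format argument, the HEX() output is simply the ASCII text ("1000", "10:20:30") encoded in the connection character set. The helper below is a hypothetical sketch, not MariaDB code. It assumes ASCII-only input and models ucs2 and utf32 as fixed-width big-endian code units.

```cpp
#include <string>

// Hypothetical helper (not part of MariaDB): encode an ASCII-only string,
// such as the digits "1000" produced by a number-to-string conversion,
// using `width` bytes per character (1 = ASCII-compatible charsets,
// 2 = ucs2, 4 = utf32, big-endian), and return the bytes as upper-case
// hex, the way SQL HEX() prints them.
std::string ascii_to_hex_in_charset(const std::string &ascii, int width)
{
  static const char hexdig[]= "0123456789ABCDEF";
  std::string hex;
  for (unsigned char ch : ascii)
  {
    // An ASCII code point fits in the lowest byte, so the high bytes are 0.
    for (int i= 1; i < width; i++)
      hex+= "00";
    hex+= hexdig[ch >> 4];
    hex+= hexdig[ch & 0x0F];
  }
  return hex;
}
```

For example, ascii_to_hex_in_charset("1000", 2) gives 0031003000300030, the same string ctype_ucs.result expects for HEX(date_format('2001-01-01', 1000)).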
Please also see inline:
On 12/09/2015 02:36 PM, Sergei Golubchik wrote:
Hi, Alexander!
On Dec 08, Alexander Barkov wrote:
Hi Sergei,
Please review a patch for MDEV-7055.
Thanks.
diff --git a/sql/item_timefunc.cc b/sql/item_timefunc.cc
index 522004e..a2a6fff 100644
--- a/sql/item_timefunc.cc
+++ b/sql/item_timefunc.cc
@@ -447,6 +447,70 @@ static bool extract_date_time(DATE_TIME_FORMAT *format,
/**
+ A multi-byte safe helper class to read characters from a string.
+
+ QQ: Serg: which file to put this new class in?
+ It can be helpful for some other purposes
+ (not only here in item_timefunc.cc)
+ I remember you don't like such things in sql_string.h :)
Thoughts:
1. I'd call it something like a string (or character) iterator.
2. There's something like that in 5.6; you might want to see what API
they use.
3. It should rather be in C (inline, if possible, with a C++ wrapper, if
you'd like), so that we can put it in a service along with CHARSET_INFO.
4. It should rather be in CHARSET_INFO, so that you could have a fast
bytewise iterator for simple charsets.
And yes, 3 and 4 contradict each other, to a certain extent :)
That's because they're thoughts, not a complete how-to plan.
Discussed on IRC.
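One possible shape for points 3 and 4, sketched only to make the idea concrete. All names are hypothetical, and this is not the 5.6 API mentioned in point 2. The idea is a C-style struct with a per-charset `next` function pointer, a bytewise implementation for simple single-byte charsets, and a thin C++ wrapper. A multi-byte charset would install its own decoding `next` function in the same slot.

```cpp
#include <stddef.h>   // C header, so the struct part stays C-compatible

// Hypothetical C-style iterator state: only plain C constructs, so the same
// layout could be declared in a C header and exported through a service.
typedef struct char_iterator_st
{
  const unsigned char *ptr;   // current position
  const unsigned char *end;   // one past the last byte
  // Advances past one character and returns its octet length, or 0 at end.
  size_t (*next)(struct char_iterator_st *it);
} char_iterator;

// Fast path for simple single-byte charsets: one byte is one character,
// so no decoding is needed at all.
static size_t char_iterator_next_bytewise(char_iterator *it)
{
  if (it->ptr >= it->end)
    return 0;
  it->ptr++;
  return 1;
}

// Thin C++ wrapper over the C-style state.
class Char_iterator
{
  char_iterator m_it;
public:
  Char_iterator(const char *str, size_t length,
                size_t (*next_fn)(char_iterator *))
  {
    m_it.ptr= reinterpret_cast<const unsigned char *>(str);
    m_it.end= m_it.ptr + length;
    m_it.next= next_fn;
  }
  // Octet length of the character just passed, or 0 at end of string.
  size_t next() { return m_it.next(&m_it); }
  // Counts the characters remaining from the current position.
  size_t count()
  {
    size_t n= 0;
    while (next() > 0)
      n++;
    return n;
  }
};
```

With this layout, callers see the same interface for every charset, while a simple charset pays only a pointer increment per character.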
<skip>
@@ -457,21 +521,29 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
uint hours_i;
uint weekday;
ulong length;
- const char *ptr, *end;
+ my_wc_t wc;
+ int chlen;
+ Wchar_reader reader(str->charset(), format->format.str,
+ format->format.length);
Is str->charset() the character set of the format string?
It was not necessarily so in the first version, which used
collation_connection.
In the second version, which determines the result collation from
args[1], it is.
- end= (ptr= format->format.str) + format->format.length;
- for (; ptr != end ; ptr++)
+ for ( ; !reader.read(&wc, &chlen) ; )
You don't distinguish here between end of string and invalid character.
Is that ok?
As agreed on IRC, I have now changed it to convert bad bytes to question
marks, in line with MDEV-6566 (which we implemented in 10.0).
Note, I did not add warnings on bad bytes, to avoid special code in
DATE_FORMAT. Other functions, e.g. CONCAT('bad-byte-sequence'), do not
warn either.
My patch for "MDEV-6643 Improve performance of string processing in the
parser" will automatically add warnings into DATE_FORMAT(), together
with all other functions.
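A reduced model of the agreed behaviour, assuming utf8 input and supporting only `%Y`: every byte of a bad or incomplete sequence becomes one '?', both in plain text and right after '%'. The functions `utf8_mb_wc()`, `read_char()` and `mini_date_format()` are hypothetical stand-ins for mb_wc(), Wchar_reader and make_date_time(). The decoder handles 1 to 3 byte sequences only, like MariaDB's 3-byte utf8.

```cpp
#include <string>

// Hypothetical utf8 decoder (1 to 3 byte sequences), following the mb_wc()
// return convention: octet length (>0) for a valid character, 0 for an
// illegal sequence, -100-n if n bytes are needed but the string ends.
static int utf8_mb_wc(unsigned long *wc, const unsigned char *s,
                      const unsigned char *e)
{
  if (s >= e)
    return -101;
  unsigned char c= s[0];
  if (c < 0x80)
  {
    *wc= c;
    return 1;
  }
  if (c < 0xC2)                  // continuation byte or overlong lead byte
    return 0;
  if (c < 0xE0)
  {
    if (e - s < 2)
      return -102;               // incomplete 2-byte sequence
    if ((s[1] ^ 0x80) >= 0x40)
      return 0;
    *wc= ((unsigned long) (c & 0x1F) << 6) | (s[1] ^ 0x80);
    return 2;
  }
  if (c < 0xF0)
  {
    if (e - s < 3)
      return -103;               // incomplete 3-byte sequence
    if ((s[1] ^ 0x80) >= 0x40 || (s[2] ^ 0x80) >= 0x40 ||
        (c == 0xE0 && s[1] < 0xA0))
      return 0;
    *wc= ((unsigned long) (c & 0x0F) << 12) |
         ((unsigned long) (s[1] ^ 0x80) << 6) | (s[2] ^ 0x80);
    return 3;
  }
  return 0;                      // 0xF0..0xFF: not valid in 3-byte utf8
}

// Reads one character at *p (which must be before `end`) and advances *p.
// Returns the bytes to copy: the character itself, or "?" for one bad byte.
static std::string read_char(const unsigned char **p,
                             const unsigned char *end, unsigned long *wc)
{
  int len= utf8_mb_wc(wc, *p, end);
  if (len > 0)
  {
    std::string bytes(reinterpret_cast<const char *>(*p), len);
    *p+= len;
    return bytes;
  }
  *wc= '?';                      // bad or incomplete: skip one byte only
  (*p)++;
  return "?";
}

// Hypothetical, heavily reduced DATE_FORMAT(): only "%Y" is expanded;
// any other character after '%' is copied as is.
std::string mini_date_format(const std::string &fmt, const std::string &year)
{
  const unsigned char *p= reinterpret_cast<const unsigned char *>(fmt.data());
  const unsigned char *end= p + fmt.size();
  std::string out;
  while (p < end)
  {
    unsigned long wc;
    std::string bytes= read_char(&p, end, &wc);
    if (wc != '%' || p >= end)
    {
      out+= bytes;               // a regular character, or a trailing '%'
      continue;
    }
    bytes= read_char(&p, end, &wc);   // the character after '%'
    out+= (wc == 'Y') ? year : bytes;
  }
  return out;
}
```

Feeding it the test format bytes FF 7A 7A 7A 25 59 25 FF 7A 7A 7A C3 with year "2001" gives ?zzz2001?zzz?, which matches the expected result in ctype_latin1.result.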
diff --git a/sql/sql_string.cc b/sql/sql_string.cc
index 885f53a..21fba79 100644
--- a/sql/sql_string.cc
+++ b/sql/sql_string.cc
@@ -547,8 +556,28 @@ bool String::append_with_prefill(const char *s,uint32 arg_length,
t_length= full_length - arg_length;
if (t_length > 0)
{
- bfill(Ptr+str_length, t_length, fill_char);
- str_length=str_length + t_length;
+ if (charset()->mbminlen == 1)
+ {
+ /*
+ An ASCII string can be appended directly
+ to an ASCII-compatible string. This includes
+ multi-byte character sets, like utf8, sjis, etc.
+ */
But (mbminlen == 1) doesn't necessarily mean "ASCII-compatible";
remember the "filename" charset? I thought you had an "ASCII-compatible"
property somewhere in CHARSET_INFO.
In this particular case it was not important, because the argument
string can only consist of digits. But to be on the safe side, I
changed it to test the MY_CS_NONASCII flag instead.
Thanks!
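A sketch of the distinction discussed above, under simplifying assumptions. `Charset` is a hypothetical stand-in for CHARSET_INFO, its `nonascii` member plays the role of the MY_CS_NONASCII flag, and only latin1 and ucs2 encoders are provided. The fast path is chosen by ASCII compatibility rather than by mbminlen, so a charset like "filename" (single-byte minimum, but not ASCII-compatible) would take the slow path.

```cpp
#include <string>

// Hypothetical, reduced charset descriptor with only what this sketch needs.
struct Charset
{
  bool nonascii;   // like MY_CS_NONASCII: ASCII bytes do not map to themselves
  // Encodes one Unicode code point and appends its bytes to `to`.
  void (*wc_mb)(unsigned long wc, std::string &to);
};

static void wc_mb_latin1(unsigned long wc, std::string &to)
{
  to+= (char) (wc < 0x100 ? wc : '?');
}

static void wc_mb_ucs2(unsigned long wc, std::string &to)
{
  if (wc > 0xFFFF)
    wc= '?';
  to+= (char) (wc >> 8);          // big-endian: high byte first
  to+= (char) (wc & 0xFF);
}

// Left-pads the ASCII string `s` (e.g. the digits of a microsecond value)
// with `fill_char` up to `full_length` characters, appending to `to` in `cs`.
void append_with_prefill(std::string &to, const Charset &cs,
                         const std::string &s, size_t full_length,
                         char fill_char)
{
  size_t t_length= full_length > s.size() ? full_length - s.size() : 0;
  if (!cs.nonascii)
  {
    // ASCII-compatible: ASCII bytes can be copied directly.
    to.append(t_length, fill_char);
    to+= s;
    return;
  }
  // Not ASCII-compatible: every character has to be encoded individually.
  for (size_t i= 0; i < t_length; i++)
    cs.wc_mb((unsigned char) fill_char, to);
  for (unsigned char ch : s)
    cs.wc_mb(ch, to);
}
```

Padding "9" to six characters with '0' gives "000009" for latin1, and the twelve bytes 00 30 00 30 00 30 00 30 00 30 00 39 for ucs2, which is what the %f tests expect.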
<skip>
diff --git a/include/m_ctype.h b/include/m_ctype.h
index 001884a..ac6853f 100644
--- a/include/m_ctype.h
+++ b/include/m_ctype.h
@@ -110,6 +110,8 @@ extern MY_UNI_CTYPE my_uni_ctype[256];
#define MY_CS_TOOSMALL6 -106 /* Need at least 6 bytes: wc_mb and mb_wc */
/* A helper macros for "need at least n bytes" */
#define MY_CS_TOOSMALLN(n) (-100-(n))
+/* mb_wc() found a valid but unassigned character */
+#define MY_CS_MB_WC_UNASSIGNED(x) ((x) >= -6 && (x) < 0)
#define MY_SEQ_INTTAIL 1
#define MY_SEQ_SPACES 2
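The new MY_CS_MB_WC_UNASSIGNED() macro separates "valid but unassigned" results (small negative values down to -6, where minus the value is the octet length) from MY_CS_ILSEQ (0) and the MY_CS_TOOSMALLN() codes (-101 to -106). The enum and functions below are an illustrative classifier, not part of the patch. The constants are redefined locally from the m_ctype.h lines shown here and from the patch's own note that MY_CS_ILSEQ is 0.

```cpp
// Redefined locally so the sketch compiles on its own.
#define MY_CS_ILSEQ 0                       // illegal byte sequence
#define MY_CS_TOOSMALLN(n) (-100-(n))       // need at least n bytes
#define MY_CS_MB_WC_UNASSIGNED(x) ((x) >= -6 && (x) < 0)

// Hypothetical classification of a value returned by mb_wc().
enum class Mb_wc_kind { REGULAR, UNASSIGNED, ILLEGAL, TOOSMALL };

Mb_wc_kind classify_mb_wc(int rc)
{
  if (rc > 0)
    return Mb_wc_kind::REGULAR;       // valid character, rc = octet length
  if (MY_CS_MB_WC_UNASSIGNED(rc))
    return Mb_wc_kind::UNASSIGNED;    // valid but unassigned, -rc = octet length
  if (rc == MY_CS_ILSEQ)
    return Mb_wc_kind::ILLEGAL;
  return Mb_wc_kind::TOOSMALL;        // MY_CS_TOOSMALLN(1..6): truncated input
}

// Number of bytes to copy unchanged into the result,
// or 0 if the current byte is to be replaced with '?'.
int mb_wc_copy_length(int rc)
{
  switch (classify_mb_wc(rc)) {
  case Mb_wc_kind::REGULAR:
    return rc;
  case Mb_wc_kind::UNASSIGNED:
    return -rc;
  default:
    return 0;
  }
}
```

The gbk test in this patch expects the unassigned 0xA140 to be copied through unchanged, which corresponds to mb_wc_copy_length(-2) returning 2.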
diff --git a/mysql-test/include/ctype_date_format.inc b/mysql-test/include/ctype_date_format.inc
new file mode 100644
index 0000000..69d8f6e
--- /dev/null
+++ b/mysql-test/include/ctype_date_format.inc
@@ -0,0 +1,30 @@
+--echo #
+--echo # MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+--echo #
+SELECT HEX(date_format('0001-01-01', '%Y'));
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%f'));
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'));
+SELECT date_format('2001-01-01','%W rubb ish %w');
+SELECT date_format('2001-01-01','%W rubb ish %');
+
+# _latin1 'Ã' is latin1 0xC384, which is:
+# - U+00C3 LATIN CAPITAL LETTER A WITH TILDE, followed by
+# - U+201E DOUBLE LOW-9 QUOTATION MARK
+SELECT HEX(date_format('2001-01-01',_latin1'%YÃ'));
+SELECT HEX(date_format('2001-01-01',_utf8'%YÃЯ'));
+SELECT HEX(date_format('2001-01-01','%YÃЯ'));
+SELECT date_format('2001-01-01',_latin1'%YÃ');
+SELECT date_format('2001-01-01',_utf8'%YÃЯ');
+SELECT date_format('2001-01-01','%YÃЯ');
+
+# Testing non-string format
+SELECT HEX(date_format('2001-01-01', 1000));
+SELECT HEX(date_format('2001-01-01', TIME('10:20:30')));
+SELECT date_format('2001-01-01', 1000);
+SELECT date_format('2001-01-01', TIME('10:20:30'));
+
+CREATE TABLE t1 AS SELECT IF(0=0,'Y','N') AS a LIMIT 0;
+SHOW CREATE TABLE t1;
+INSERT INTO t1 VALUES (date_format('2001-01-01','%W'));
+SELECT * FROM t1;
+DROP TABLE IF EXISTS t1;
diff --git a/mysql-test/r/ctype_gbk.result b/mysql-test/r/ctype_gbk.result
index 9da3cf9..0c593bc 100644
--- a/mysql-test/r/ctype_gbk.result
+++ b/mysql-test/r/ctype_gbk.result
@@ -621,5 +621,14 @@ A8BD Å
A8BE Å
DROP TABLE t1;
#
+# MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+#
+SET NAMES gbk;
+CREATE TABLE t1 AS SELECT DATE_FORMAT('2001-01-01',_gbk 0xA1402557) AS a;
+SELECT HEX(a), CONVERT(a USING utf8) FROM t1;
+HEX(a) CONVERT(a USING utf8)
+A1404D6F6E646179 ?Monday
+DROP TABLE t1;
+#
# End of 5.5 tests
#
diff --git a/mysql-test/r/ctype_latin1.result b/mysql-test/r/ctype_latin1.result
index 11a9479..4e7a4f2 100644
--- a/mysql-test/r/ctype_latin1.result
+++ b/mysql-test/r/ctype_latin1.result
@@ -3448,6 +3448,87 @@ maketime(`a`,`a`,`a`)
00:00:00.000000
DROP TABLE t1;
SET sql_mode=default;
+SET NAMES latin1;
+#
+# MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+#
+SELECT HEX(date_format('0001-01-01', '%Y'));
+HEX(date_format('0001-01-01', '%Y'))
+30303031
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%f'))
+303030303039
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'))
+3030303120303030303039
+SELECT date_format('2001-01-01','%W rubb ish %w');
+date_format('2001-01-01','%W rubb ish %w')
+Monday rubb ish 1
+SELECT date_format('2001-01-01','%W rubb ish %');
+date_format('2001-01-01','%W rubb ish %')
+Monday rubb ish %
+SELECT HEX(date_format('2001-01-01',_latin1'%YÃ'));
+HEX(date_format('2001-01-01',_latin1'%YÃ'))
+32303031C384
+SELECT HEX(date_format('2001-01-01',_utf8'%YÃЯ'));
+HEX(date_format('2001-01-01',_utf8'%YÃЯ'))
+32303031C384D0AF
+SELECT HEX(date_format('2001-01-01','%YÃЯ'));
+HEX(date_format('2001-01-01','%YÃЯ'))
+32303031C384D0AF
+SELECT date_format('2001-01-01',_latin1'%YÃ');
+date_format('2001-01-01',_latin1'%YÃ')
+2001Ã
+SELECT date_format('2001-01-01',_utf8'%YÃЯ');
+date_format('2001-01-01',_utf8'%YÃЯ')
+2001Ä?
+SELECT date_format('2001-01-01','%YÃЯ');
+date_format('2001-01-01','%YÃЯ')
+2001ÃЯ
+SELECT HEX(date_format('2001-01-01', 1000));
+HEX(date_format('2001-01-01', 1000))
+31303030
+SELECT HEX(date_format('2001-01-01', TIME('10:20:30')));
+HEX(date_format('2001-01-01', TIME('10:20:30')))
+31303A32303A3330
+SELECT date_format('2001-01-01', 1000);
+date_format('2001-01-01', 1000)
+1000
+SELECT date_format('2001-01-01', TIME('10:20:30'));
+date_format('2001-01-01', TIME('10:20:30'))
+10:20:30
+CREATE TABLE t1 AS SELECT IF(0=0,'Y','N') AS a LIMIT 0;
+SHOW CREATE TABLE t1;
+Table Create Table
+t1 CREATE TABLE `t1` (
+ `a` varchar(1) NOT NULL DEFAULT ''
+) ENGINE=MyISAM DEFAULT CHARSET=latin1
+INSERT INTO t1 VALUES (date_format('2001-01-01','%W'));
+Warnings:
+Warning 1265 Data truncated for column 'a' at row 1
+SELECT * FROM t1;
+a
+M
+DROP TABLE IF EXISTS t1;
+SET NAMES utf8;
+SELECT HEX('ÿzzz%Y%ÿzzzÃ') AS format;
+format
+FF7A7A7A255925FF7A7A7AC3
+SELECT HEX(DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzÃ')) AS date;
+date
+3F7A7A7A323030313F7A7A7A3F
+SELECT DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzÃ') AS date;
+date
+?zzz2001?zzz?
+SELECT HEX('ÿzzz%Y%ÿzzzï½') AS format;
+format
+FF7A7A7A255925FF7A7A7AEFBD
+SELECT HEX(DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzï½')) AS date;
+date
+3F7A7A7A323030313F7A7A7A3F3F
+SELECT DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzï½') AS date;
+date
+?zzz2001?zzz??
#
# Bug#11764503 (Bug#57341) Query in EXPLAIN EXTENDED shows wrong characters
#
diff --git a/mysql-test/r/ctype_ucs.result b/mysql-test/r/ctype_ucs.result
index f9e9a69..4d3cf81 100644
--- a/mysql-test/r/ctype_ucs.result
+++ b/mysql-test/r/ctype_ucs.result
@@ -124,6 +124,68 @@ select 'a a' > 'a', 'a \0' < 'a';
select binary 'a a' > 'a', binary 'a \0' > 'a', binary 'a\0' > 'a';
binary 'a a' > 'a' binary 'a \0' > 'a' binary 'a\0' > 'a'
1 1 1
+SET NAMES utf8, character_set_connection=ucs2;
+#
+# MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+#
+SELECT HEX(date_format('0001-01-01', '%Y'));
+HEX(date_format('0001-01-01', '%Y'))
+0030003000300031
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%f'))
+003000300030003000300039
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'))
+00300030003000310020003000300030003000300039
+SELECT date_format('2001-01-01','%W rubb ish %w');
+date_format('2001-01-01','%W rubb ish %w')
+Monday rubb ish 1
+SELECT date_format('2001-01-01','%W rubb ish %');
+date_format('2001-01-01','%W rubb ish %')
+Monday rubb ish %
+SELECT HEX(date_format('2001-01-01',_latin1'%YÃ'));
+HEX(date_format('2001-01-01',_latin1'%YÃ'))
+32303031C384
+SELECT HEX(date_format('2001-01-01',_utf8'%YÃЯ'));
+HEX(date_format('2001-01-01',_utf8'%YÃЯ'))
+32303031C384D0AF
+SELECT HEX(date_format('2001-01-01','%YÃЯ'));
+HEX(date_format('2001-01-01','%YÃЯ'))
+003200300030003100C4042F
+SELECT date_format('2001-01-01',_latin1'%YÃ');
+date_format('2001-01-01',_latin1'%YÃ')
+2001Ãâ
+SELECT date_format('2001-01-01',_utf8'%YÃЯ');
+date_format('2001-01-01',_utf8'%YÃЯ')
+2001ÃЯ
+SELECT date_format('2001-01-01','%YÃЯ');
+date_format('2001-01-01','%YÃЯ')
+2001ÃЯ
+SELECT HEX(date_format('2001-01-01', 1000));
+HEX(date_format('2001-01-01', 1000))
+0031003000300030
+SELECT HEX(date_format('2001-01-01', TIME('10:20:30')));
+HEX(date_format('2001-01-01', TIME('10:20:30')))
+00310030003A00320030003A00330030
+SELECT date_format('2001-01-01', 1000);
+date_format('2001-01-01', 1000)
+1000
+SELECT date_format('2001-01-01', TIME('10:20:30'));
+date_format('2001-01-01', TIME('10:20:30'))
+10:20:30
+CREATE TABLE t1 AS SELECT IF(0=0,'Y','N') AS a LIMIT 0;
+SHOW CREATE TABLE t1;
+Table Create Table
+t1 CREATE TABLE `t1` (
+ `a` varchar(1) CHARACTER SET ucs2 NOT NULL DEFAULT ''
+) ENGINE=MyISAM DEFAULT CHARSET=latin1
+INSERT INTO t1 VALUES (date_format('2001-01-01','%W'));
+Warnings:
+Warning 1265 Data truncated for column 'a' at row 1
+SELECT * FROM t1;
+a
+M
+DROP TABLE IF EXISTS t1;
SET CHARACTER SET koi8r;
create table t1 (a varchar(2) character set ucs2 collate ucs2_bin, key(a));
insert into t1 values ('A'),('A'),('B'),('C'),('D'),('A\t');
diff --git a/mysql-test/r/ctype_ucs2_innodb.result b/mysql-test/r/ctype_ucs2_innodb.result
new file mode 100644
index 0000000..a5fe22b
--- /dev/null
+++ b/mysql-test/r/ctype_ucs2_innodb.result
@@ -0,0 +1,73 @@
+SET default_storage_engine=InnoDB;
+DROP TABLE IF EXISTS t1;
+#
+# Start of 5.5 tests
+#
+SET NAMES utf8, character_set_connection=ucs2;
+SELECT HEX('a'), HEX('a ');
+HEX('a') HEX('a ')
+0061 00610020
+#
+# MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+#
+SELECT HEX(date_format('0001-01-01', '%Y'));
+HEX(date_format('0001-01-01', '%Y'))
+0030003000300031
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%f'))
+003000300030003000300039
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'))
+00300030003000310020003000300030003000300039
+SELECT date_format('2001-01-01','%W rubb ish %w');
+date_format('2001-01-01','%W rubb ish %w')
+Monday rubb ish 1
+SELECT date_format('2001-01-01','%W rubb ish %');
+date_format('2001-01-01','%W rubb ish %')
+Monday rubb ish %
+SELECT HEX(date_format('2001-01-01',_latin1'%YÃ'));
+HEX(date_format('2001-01-01',_latin1'%YÃ'))
+32303031C384
+SELECT HEX(date_format('2001-01-01',_utf8'%YÃЯ'));
+HEX(date_format('2001-01-01',_utf8'%YÃЯ'))
+32303031C384D0AF
+SELECT HEX(date_format('2001-01-01','%YÃЯ'));
+HEX(date_format('2001-01-01','%YÃЯ'))
+003200300030003100C4042F
+SELECT date_format('2001-01-01',_latin1'%YÃ');
+date_format('2001-01-01',_latin1'%YÃ')
+2001Ãâ
+SELECT date_format('2001-01-01',_utf8'%YÃЯ');
+date_format('2001-01-01',_utf8'%YÃЯ')
+2001ÃЯ
+SELECT date_format('2001-01-01','%YÃЯ');
+date_format('2001-01-01','%YÃЯ')
+2001ÃЯ
+SELECT HEX(date_format('2001-01-01', 1000));
+HEX(date_format('2001-01-01', 1000))
+0031003000300030
+SELECT HEX(date_format('2001-01-01', TIME('10:20:30')));
+HEX(date_format('2001-01-01', TIME('10:20:30')))
+00310030003A00320030003A00330030
+SELECT date_format('2001-01-01', 1000);
+date_format('2001-01-01', 1000)
+1000
+SELECT date_format('2001-01-01', TIME('10:20:30'));
+date_format('2001-01-01', TIME('10:20:30'))
+10:20:30
+CREATE TABLE t1 AS SELECT IF(0=0,'Y','N') AS a LIMIT 0;
+SHOW CREATE TABLE t1;
+Table Create Table
+t1 CREATE TABLE `t1` (
+ `a` varchar(1) CHARACTER SET ucs2 NOT NULL DEFAULT ''
+) ENGINE=InnoDB DEFAULT CHARSET=latin1
+INSERT INTO t1 VALUES (date_format('2001-01-01','%W'));
+Warnings:
+Warning 1265 Data truncated for column 'a' at row 1
+SELECT * FROM t1;
+a
+M
+DROP TABLE IF EXISTS t1;
+#
+# End of 5.5 tests
+#
diff --git a/mysql-test/r/ctype_utf32.result b/mysql-test/r/ctype_utf32.result
index 1f316b7..bc3dfc7 100644
--- a/mysql-test/r/ctype_utf32.result
+++ b/mysql-test/r/ctype_utf32.result
@@ -29,6 +29,68 @@ select 'a a' > 'a', 'a \0' < 'a';
select binary 'a a' > 'a', binary 'a \0' > 'a', binary 'a\0' > 'a';
binary 'a a' > 'a' binary 'a \0' > 'a' binary 'a\0' > 'a'
1 1 1
+SET NAMES utf8, character_set_connection=utf32;
+#
+# MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+#
+SELECT HEX(date_format('0001-01-01', '%Y'));
+HEX(date_format('0001-01-01', '%Y'))
+00000030000000300000003000000031
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%f'))
+000000300000003000000030000000300000003000000039
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'))
+0000003000000030000000300000003100000020000000300000003000000030000000300000003000000039
+SELECT date_format('2001-01-01','%W rubb ish %w');
+date_format('2001-01-01','%W rubb ish %w')
+Monday rubb ish 1
+SELECT date_format('2001-01-01','%W rubb ish %');
+date_format('2001-01-01','%W rubb ish %')
+Monday rubb ish %
+SELECT HEX(date_format('2001-01-01',_latin1'%YÃ'));
+HEX(date_format('2001-01-01',_latin1'%YÃ'))
+32303031C384
+SELECT HEX(date_format('2001-01-01',_utf8'%YÃЯ'));
+HEX(date_format('2001-01-01',_utf8'%YÃЯ'))
+32303031C384D0AF
+SELECT HEX(date_format('2001-01-01','%YÃЯ'));
+HEX(date_format('2001-01-01','%YÃЯ'))
+00000032000000300000003000000031000000C40000042F
+SELECT date_format('2001-01-01',_latin1'%YÃ');
+date_format('2001-01-01',_latin1'%YÃ')
+2001Ãâ
+SELECT date_format('2001-01-01',_utf8'%YÃЯ');
+date_format('2001-01-01',_utf8'%YÃЯ')
+2001ÃЯ
+SELECT date_format('2001-01-01','%YÃЯ');
+date_format('2001-01-01','%YÃЯ')
+2001ÃЯ
+SELECT HEX(date_format('2001-01-01', 1000));
+HEX(date_format('2001-01-01', 1000))
+00000031000000300000003000000030
+SELECT HEX(date_format('2001-01-01', TIME('10:20:30')));
+HEX(date_format('2001-01-01', TIME('10:20:30')))
+00000031000000300000003A00000032000000300000003A0000003300000030
+SELECT date_format('2001-01-01', 1000);
+date_format('2001-01-01', 1000)
+1000
+SELECT date_format('2001-01-01', TIME('10:20:30'));
+date_format('2001-01-01', TIME('10:20:30'))
+10:20:30
+CREATE TABLE t1 AS SELECT IF(0=0,'Y','N') AS a LIMIT 0;
+SHOW CREATE TABLE t1;
+Table Create Table
+t1 CREATE TABLE `t1` (
+ `a` varchar(1) CHARACTER SET utf32 NOT NULL DEFAULT ''
+) ENGINE=MyISAM DEFAULT CHARSET=latin1
+INSERT INTO t1 VALUES (date_format('2001-01-01','%W'));
+Warnings:
+Warning 1265 Data truncated for column 'a' at row 1
+SELECT * FROM t1;
+a
+M
+DROP TABLE IF EXISTS t1;
select hex(_utf32 0x44);
hex(_utf32 0x44)
00000044
diff --git a/mysql-test/r/ctype_utf32_innodb.result b/mysql-test/r/ctype_utf32_innodb.result
new file mode 100644
index 0000000..a26c40f
--- /dev/null
+++ b/mysql-test/r/ctype_utf32_innodb.result
@@ -0,0 +1,73 @@
+SET default_storage_engine=InnoDB;
+DROP TABLE IF EXISTS t1;
+#
+# Start of 5.5 tests
+#
+SET NAMES utf8, character_set_connection=utf32;
+SELECT HEX('a'), HEX('a ');
+HEX('a') HEX('a ')
+00000061 0000006100000020
+#
+# MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+#
+SELECT HEX(date_format('0001-01-01', '%Y'));
+HEX(date_format('0001-01-01', '%Y'))
+00000030000000300000003000000031
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%f'))
+000000300000003000000030000000300000003000000039
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'))
+0000003000000030000000300000003100000020000000300000003000000030000000300000003000000039
+SELECT date_format('2001-01-01','%W rubb ish %w');
+date_format('2001-01-01','%W rubb ish %w')
+Monday rubb ish 1
+SELECT date_format('2001-01-01','%W rubb ish %');
+date_format('2001-01-01','%W rubb ish %')
+Monday rubb ish %
+SELECT HEX(date_format('2001-01-01',_latin1'%YÃ'));
+HEX(date_format('2001-01-01',_latin1'%YÃ'))
+32303031C384
+SELECT HEX(date_format('2001-01-01',_utf8'%YÃЯ'));
+HEX(date_format('2001-01-01',_utf8'%YÃЯ'))
+32303031C384D0AF
+SELECT HEX(date_format('2001-01-01','%YÃЯ'));
+HEX(date_format('2001-01-01','%YÃЯ'))
+00000032000000300000003000000031000000C40000042F
+SELECT date_format('2001-01-01',_latin1'%YÃ');
+date_format('2001-01-01',_latin1'%YÃ')
+2001Ãâ
+SELECT date_format('2001-01-01',_utf8'%YÃЯ');
+date_format('2001-01-01',_utf8'%YÃЯ')
+2001ÃЯ
+SELECT date_format('2001-01-01','%YÃЯ');
+date_format('2001-01-01','%YÃЯ')
+2001ÃЯ
+SELECT HEX(date_format('2001-01-01', 1000));
+HEX(date_format('2001-01-01', 1000))
+00000031000000300000003000000030
+SELECT HEX(date_format('2001-01-01', TIME('10:20:30')));
+HEX(date_format('2001-01-01', TIME('10:20:30')))
+00000031000000300000003A00000032000000300000003A0000003300000030
+SELECT date_format('2001-01-01', 1000);
+date_format('2001-01-01', 1000)
+1000
+SELECT date_format('2001-01-01', TIME('10:20:30'));
+date_format('2001-01-01', TIME('10:20:30'))
+10:20:30
+CREATE TABLE t1 AS SELECT IF(0=0,'Y','N') AS a LIMIT 0;
+SHOW CREATE TABLE t1;
+Table Create Table
+t1 CREATE TABLE `t1` (
+ `a` varchar(1) CHARACTER SET utf32 NOT NULL DEFAULT ''
+) ENGINE=InnoDB DEFAULT CHARSET=latin1
+INSERT INTO t1 VALUES (date_format('2001-01-01','%W'));
+Warnings:
+Warning 1265 Data truncated for column 'a' at row 1
+SELECT * FROM t1;
+a
+M
+DROP TABLE IF EXISTS t1;
+#
+# End of 5.5 tests
+#
diff --git a/mysql-test/r/ctype_utf8.result b/mysql-test/r/ctype_utf8.result
index 91dbe85..8abec4f 100644
--- a/mysql-test/r/ctype_utf8.result
+++ b/mysql-test/r/ctype_utf8.result
@@ -5085,6 +5085,68 @@ maketime(`a`,`a`,`a`)
00:00:00.000000
DROP TABLE t1;
SET sql_mode=default;
+SET NAMES utf8;
+#
+# MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+#
+SELECT HEX(date_format('0001-01-01', '%Y'));
+HEX(date_format('0001-01-01', '%Y'))
+30303031
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%f'))
+303030303039
+SELECT HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'));
+HEX(date_format('0001-01-01 10:20:30.000009', '%Y %f'))
+3030303120303030303039
+SELECT date_format('2001-01-01','%W rubb ish %w');
+date_format('2001-01-01','%W rubb ish %w')
+Monday rubb ish 1
+SELECT date_format('2001-01-01','%W rubb ish %');
+date_format('2001-01-01','%W rubb ish %')
+Monday rubb ish %
+SELECT HEX(date_format('2001-01-01',_latin1'%YÃ'));
+HEX(date_format('2001-01-01',_latin1'%YÃ'))
+32303031C384
+SELECT HEX(date_format('2001-01-01',_utf8'%YÃЯ'));
+HEX(date_format('2001-01-01',_utf8'%YÃЯ'))
+32303031C384D0AF
+SELECT HEX(date_format('2001-01-01','%YÃЯ'));
+HEX(date_format('2001-01-01','%YÃЯ'))
+32303031C384D0AF
+SELECT date_format('2001-01-01',_latin1'%YÃ');
+date_format('2001-01-01',_latin1'%YÃ')
+2001Ãâ
+SELECT date_format('2001-01-01',_utf8'%YÃЯ');
+date_format('2001-01-01',_utf8'%YÃЯ')
+2001ÃЯ
+SELECT date_format('2001-01-01','%YÃЯ');
+date_format('2001-01-01','%YÃЯ')
+2001ÃЯ
+SELECT HEX(date_format('2001-01-01', 1000));
+HEX(date_format('2001-01-01', 1000))
+31303030
+SELECT HEX(date_format('2001-01-01', TIME('10:20:30')));
+HEX(date_format('2001-01-01', TIME('10:20:30')))
+31303A32303A3330
+SELECT date_format('2001-01-01', 1000);
+date_format('2001-01-01', 1000)
+1000
+SELECT date_format('2001-01-01', TIME('10:20:30'));
+date_format('2001-01-01', TIME('10:20:30'))
+10:20:30
+CREATE TABLE t1 AS SELECT IF(0=0,'Y','N') AS a LIMIT 0;
+SHOW CREATE TABLE t1;
+Table Create Table
+t1 CREATE TABLE `t1` (
+ `a` varchar(1) CHARACTER SET utf8 NOT NULL DEFAULT ''
+) ENGINE=MyISAM DEFAULT CHARSET=latin1
+INSERT INTO t1 VALUES (date_format('2001-01-01','%W'));
+Warnings:
+Warning 1265 Data truncated for column 'a' at row 1
+SELECT * FROM t1;
+a
+M
+DROP TABLE IF EXISTS t1;
#
# Bug#57687 crash when reporting duplicate group_key error and utf8
# Make sure to modify this when Bug#58081 is fixed.
diff --git a/mysql-test/t/ctype_gbk.test b/mysql-test/t/ctype_gbk.test
index b9e25e9..3493cec 100644
--- a/mysql-test/t/ctype_gbk.test
+++ b/mysql-test/t/ctype_gbk.test
@@ -154,7 +154,19 @@ WHERE HEX(CAST(UPPER(a) AS CHAR CHARACTER SET utf8)) <>
DROP TABLE t1;
+--echo #
+--echo # MDEV-7055 MySQL#74664 - InnoDB: Failing assertion: len <= col->len || col->mtype == 5 || (col->len == 0 && col->mtype == 1) in file rem0rec.cc line 845
+--echo #
+
+# Testing format string 0xA140 + '%' + 'W'
+# 0xA140 is an unassigned character in gbk.
+# It should be preserved in the DATE_FORMAT output
+# (it should not be replaced with a question mark)
+SET NAMES gbk;
+CREATE TABLE t1 AS SELECT DATE_FORMAT('2001-01-01',_gbk 0xA1402557) AS a;
+SELECT HEX(a), CONVERT(a USING utf8) FROM t1;
+DROP TABLE t1;
--echo #
--echo # End of 5.5 tests
diff --git a/mysql-test/t/ctype_latin1.test b/mysql-test/t/ctype_latin1.test
index aa66c81..60f8626 100644
--- a/mysql-test/t/ctype_latin1.test
+++ b/mysql-test/t/ctype_latin1.test
@@ -148,6 +148,33 @@ SELECT '' LIKE '' ESCAPE EXPORT_SET(1, 1, 1, 1, '');
--source include/ctype_numconv.inc
+SET NAMES latin1;
+--source include/ctype_date_format.inc
+
+# Check how date_format() handles bad and incomplete byte sequences.
+# It intentionally passes a broken utf8 format string,
+# consisting of the following parts:
+# "{0xFF}" - an invalid utf8 byte, should be replaced with the question mark '?'
+# "zzz" - a good byte sequence, should get into the result as is
+# "%Y" - a good format sequence, should be replaced with the year value
+# "%{0xFF}" - percent followed by a bad byte, should be replaced with '?'
+# "zzz" - a good byte sequence, should get into the result as is
+# "{0xC3}" - an incomplete utf8 byte sequence, should be replaced with '?'
+# The test resides in ctype_latin1.test to avoid malformed utf8 data
+# in ctype_utf8.test. Note, the format string is a valid latin1 string.
+SET NAMES utf8;
+SELECT HEX('ÿzzz%Y%ÿzzzÃ') AS format;
+SELECT HEX(DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzÃ')) AS date;
+SELECT DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzÃ') AS date;
+
+# Now a similar test with 0xEFBD instead of 0xC3 at the end,
+# which is an incomplete 2-byte beginning of a 3-byte utf8 character,
+# e.g. 0xEFBDA6 (U+FF66 HALFWIDTH KATAKANA LETTER WO)
+# 0xEFBD should be replaced with two question marks in the result.
+SELECT HEX('ÿzzz%Y%ÿzzzï½') AS format;
+SELECT HEX(DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzï½')) AS date;
+SELECT DATE_FORMAT('2001-01-01', 'ÿzzz%Y%ÿzzzï½') AS date;
+
--echo #
--echo # Bug#11764503 (Bug#57341) Query in EXPLAIN EXTENDED shows wrong characters
--echo #
diff --git a/mysql-test/t/ctype_ucs.test b/mysql-test/t/ctype_ucs.test
index 7fd3768..d21d6ff 100644
--- a/mysql-test/t/ctype_ucs.test
+++ b/mysql-test/t/ctype_ucs.test
@@ -12,6 +12,9 @@ SET NAMES latin1;
SET character_set_connection=ucs2;
-- source include/endspace.inc
+SET NAMES utf8, character_set_connection=ucs2;
+-- source include/ctype_date_format.inc
+
SET CHARACTER SET koi8r;
#
diff --git a/mysql-test/t/ctype_ucs2_innodb.test b/mysql-test/t/ctype_ucs2_innodb.test
new file mode 100644
index 0000000..3a38ec9
--- /dev/null
+++ b/mysql-test/t/ctype_ucs2_innodb.test
@@ -0,0 +1,20 @@
+-- source include/have_innodb.inc
+-- source include/have_ucs2.inc
+
+SET default_storage_engine=InnoDB;
+
+--disable_warnings
+DROP TABLE IF EXISTS t1;
+--enable_warnings
+
+--echo #
+--echo # Start of 5.5 tests
+--echo #
+
+SET NAMES utf8, character_set_connection=ucs2;
+SELECT HEX('a'), HEX('a ');
+-- source include/ctype_date_format.inc
+
+--echo #
+--echo # End of 5.5 tests
+--echo #
diff --git a/mysql-test/t/ctype_utf32.test b/mysql-test/t/ctype_utf32.test
index 1be8925..4ff136f 100644
--- a/mysql-test/t/ctype_utf32.test
+++ b/mysql-test/t/ctype_utf32.test
@@ -15,6 +15,9 @@ SET character_set_connection=utf32;
select hex('a'), hex('a ');
-- source include/endspace.inc
+SET NAMES utf8, character_set_connection=utf32;
+-- source include/ctype_date_format.inc
+
#
# Check that incomplete utf32 characters in HEX notation
# are left-padded with zeros
diff --git a/mysql-test/t/ctype_utf32_innodb.test b/mysql-test/t/ctype_utf32_innodb.test
new file mode 100644
index 0000000..02b9230
--- /dev/null
+++ b/mysql-test/t/ctype_utf32_innodb.test
@@ -0,0 +1,20 @@
+-- source include/have_innodb.inc
+-- source include/have_utf32.inc
+
+SET default_storage_engine=InnoDB;
+
+--disable_warnings
+DROP TABLE IF EXISTS t1;
+--enable_warnings
+
+--echo #
+--echo # Start of 5.5 tests
+--echo #
+
+SET NAMES utf8, character_set_connection=utf32;
+SELECT HEX('a'), HEX('a ');
+-- source include/ctype_date_format.inc
+
+--echo #
+--echo # End of 5.5 tests
+--echo #
diff --git a/mysql-test/t/ctype_utf8.test b/mysql-test/t/ctype_utf8.test
index 8cd70e9..5ca3523 100644
--- a/mysql-test/t/ctype_utf8.test
+++ b/mysql-test/t/ctype_utf8.test
@@ -1556,6 +1556,9 @@ DROP TABLE t1, t2;
SET NAMES utf8;
--source include/ctype_numconv.inc
+SET NAMES utf8;
+--source include/ctype_date_format.inc
+
--echo #
--echo # Bug#57687 crash when reporting duplicate group_key error and utf8
--echo # Make sure to modify this when Bug#58081 is fixed.
diff --git a/sql/item_timefunc.cc b/sql/item_timefunc.cc
index 522004e..4eae1bf 100644
--- a/sql/item_timefunc.cc
+++ b/sql/item_timefunc.cc
@@ -447,6 +447,111 @@ static bool extract_date_time(DATE_TIME_FORMAT *format,
/**
+ A multi-byte safe helper class to read characters from a string.
+*/
+class Wchar_reader
+{
+ CHARSET_INFO *m_cs;
+ const char *m_ptr;
+ const char *m_end;
+ int m_chlen;
+
+public:
+ Wchar_reader(CHARSET_INFO *cs, const char *str, size_t length)
+ :m_cs(cs), m_ptr(str), m_end(str + length)
+ { }
+ /**
+ Append the character scanned during the last call of read()
+ to the target String.
+ Note: to->charset() and m_cs should be binary compatible
+ and require no character set conversion.
+ @return false - on success
+ @return true - on error
+ */
+ bool append_last_character_to_string(String *to)
+ {
+ if (m_chlen > 0)
+ {
+ // A regular or an unassigned character was found by read().
+ return to->append(m_ptr - m_chlen, m_chlen, &my_charset_bin);
+ }
+ else
+ {
+ // A bad or incomplete byte sequence was found by read().
+ return to->append("?", 1);
+ }
+ }
+ /**
+ Test if end-of-line has been reached.
+ */
+ bool eol() const { return m_ptr >= m_end; }
+ /**
+ Read a character and return its Unicode code point.
+
+ @param [OUT] wc - a pointer to a Unicode code point variable
+ @return false - on success
+ @return true - on error (end-of-line)
+
+ In case of end-of-line, true is returned. The value of *wc is not
+ initialized and should not be read by the caller.
+
+ Otherwise (if end-of-line has not been reached yet), read() returns false,
+ m_ptr is shifted forward and *wc and m_chlen are set as follows:
+
+ 1. If a regular character is found, *wc is set to its
+ Unicode code point, m_chlen is set to the octet length of the
+ character, and m_ptr is advanced by the octet length of the
+ character.
+
+ 2. If a valid but unassigned character is found, *wc is set
+ to '?', m_chlen is set to the octet length of the character,
+ and m_ptr is advanced by the octet length of the character.
+
+ 3. In case of an invalid (or incomplete) byte sequence, *wc is set
+ to '?', m_chlen is set to 0, and m_ptr is advanced by one byte.
+ Note, individual question marks are consequently returned for every
+ byte of an invalid or incomplete byte sequence.
+
+ This behaviour is in accord with a 10.0 task:
+ MDEV-6566 Different .. behaviour .. with and without .. conversion
+ */
+ bool read(my_wc_t *wc)
+ {
+ m_chlen= m_cs->cset->mb_wc(m_cs, wc, (uchar *) m_ptr, (uchar *) m_end);
+ if (m_chlen > 0)
+ {
+ m_ptr+= m_chlen;
+ DBUG_ASSERT(m_ptr <= m_end);
+ return false; // A normal character, *wc was set by mb_wc()
+ }
+ if (m_ptr >= m_end)
+ {
+ DBUG_ASSERT(m_chlen <= MY_CS_TOOSMALL);
+ return true; // End-of-line
+ }
+ if (MY_CS_MB_WC_UNASSIGNED(m_chlen))
+ {
+ m_chlen= -m_chlen;
+ *wc= '?';
+ m_ptr+= m_chlen;
+ DBUG_ASSERT(m_ptr <= m_end);
+ return false; // An unassigned (but a valid) character
+ }
+ // Here we have a bad or an incomplete byte sequence
+ *wc= '?';
+ m_ptr++; // Shift one byte forward
+ if (m_chlen == MY_CS_ILSEQ) // MY_CS_ILSEQ is defined as 0, so m_chlen==0
+ return false; // Bad byte sequence
+ // Here we can have a non-empty incomplete sequence only
+ DBUG_ASSERT(m_chlen <= MY_CS_TOOSMALL2);
+ DBUG_ASSERT(m_chlen >= MY_CS_TOOSMALL6);
+ m_chlen= 0;
+ return false; // Incomplete byte sequence
+ }
+};
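For illustration only, here is a self-contained sketch of the same read() contract, hard-wired to UTF-8. It is not the patch's Wchar_reader: the names Utf8ReaderSketch and decode_sketch() are hypothetical, decoding is done inline instead of through m_cs->cset->mb_wc(), and case 2 (valid but unassigned characters) is not modelled. It does show cases 1 and 3, including one '?' per byte of a truncated sequence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical UTF-8-only analogue of the Wchar_reader::read() contract.
// Decoding is hard-coded here instead of calling CHARSET_INFO's mb_wc(),
// and there is no "valid but unassigned" case (case 2).
class Utf8ReaderSketch
{
  const unsigned char *m_ptr;
  const unsigned char *m_end;
  int m_chlen;  // Octet length of the last character, 0 after a bad byte
public:
  Utf8ReaderSketch(const char *str, size_t length)
    : m_ptr(reinterpret_cast<const unsigned char *>(str)),
      m_end(reinterpret_cast<const unsigned char *>(str) + length),
      m_chlen(0)
  { }
  bool eol() const { return m_ptr >= m_end; }
  int last_char_length() const { return m_chlen; }

  // Returns true at end-of-line (*wc untouched); otherwise sets *wc,
  // advances m_ptr and returns false.
  bool read(char32_t *wc)
  {
    if (m_ptr >= m_end)
      return true;                                // End-of-line
    unsigned char c= m_ptr[0];
    int len= 0;
    char32_t cp= 0;
    if (c < 0x80)                    { len= 1; cp= c; }
    else if (c >= 0xC2 && c <= 0xDF) { len= 2; cp= c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len= 3; cp= c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len= 4; cp= c & 0x07; }
    if (len > m_end - m_ptr)
      len= 0;                                     // Incomplete sequence
    for (int i= 1; i < len; i++)
    {
      if ((m_ptr[i] & 0xC0) != 0x80)
      {
        len= 0;                                   // Bad continuation byte
        break;
      }
      cp= (cp << 6) | (m_ptr[i] & 0x3F);
    }
    // Reject overlong forms, surrogates and code points above U+10FFFF
    if ((len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
        (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)))
      len= 0;
    if (len > 0)
    {
      *wc= cp;                                    // Case 1: a regular character
      m_chlen= len;
      m_ptr+= len;
    }
    else
    {
      *wc= '?';                                   // Case 3: one '?' per bad byte
      m_chlen= 0;
      m_ptr++;
    }
    return false;
  }
};

// Decodes a whole UTF-8 string, as a caller looping on read() would.
std::u32string decode_sketch(const std::string &s)
{
  Utf8ReaderSketch reader(s.data(), s.size());
  std::u32string out;
  char32_t wc;
  while (!reader.read(&wc))
    out+= wc;
  return out;
}
```

For example, the truncated euro sign "\xE2\x82" decodes to two question marks, one per byte, while the complete sequence "\xE2\x82\xAC" decodes to U+20AC.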
+
+
+/**
Create a formated date/time value in a string.
*/
@@ -457,21 +562,31 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
uint hours_i;
uint weekday;
ulong length;
- const char *ptr, *end;
-
+ my_wc_t wc;
+ Wchar_reader reader(str->charset(), format->format.str,
+ format->format.length);
str->length(0);
if (l_time->neg)
str->append('-');
- end= (ptr= format->format.str) + format->format.length;
- for (; ptr != end ; ptr++)
+ while (!reader.read(&wc))
{
- if (*ptr != '%' || ptr+1 == end)
- str->append(*ptr);
+ if (wc != '%' || reader.eol())
+ {
+ // A regular character, or a trailing '%'
+ if (reader.append_last_character_to_string(str))
+ return true;
+ }
else
{
- switch (*++ptr) {
+ // A '%' sequence found, read the character after '%'
+ if (reader.read(&wc))
+ {
+ DBUG_ASSERT(0); // read() should not fail when eol() is false.
+ return true;
+ }
+ switch (wc) {
case 'M':
if (!l_time->month)
return 1;
@@ -617,7 +732,7 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
if (type == MYSQL_TIMESTAMP_TIME)
return 1;
length= (uint) (int10_to_str(calc_week(l_time,
- (*ptr) == 'U' ?
+ wc == 'U' ?
WEEK_FIRST_WEEKDAY : WEEK_MONDAY_FIRST,
&year),
intbuff, 10) - intbuff);
@@ -631,7 +746,7 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
if (type == MYSQL_TIMESTAMP_TIME)
return 1;
length= (uint) (int10_to_str(calc_week(l_time,
- ((*ptr) == 'V' ?
+ (wc == 'V' ?
(WEEK_YEAR | WEEK_FIRST_WEEKDAY) :
(WEEK_YEAR | WEEK_MONDAY_FIRST)),
&year),
@@ -646,7 +761,7 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
if (type == MYSQL_TIMESTAMP_TIME)
return 1;
(void) calc_week(l_time,
- ((*ptr) == 'X' ?
+ (wc == 'X' ?
WEEK_YEAR | WEEK_FIRST_WEEKDAY :
WEEK_YEAR | WEEK_MONDAY_FIRST),
&year);
@@ -664,8 +779,9 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
break;
default:
- str->append(*ptr);
- break;
+ if (reader.append_last_character_to_string(str))
+ return true;
+ break;
}
}
}
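The rewritten loop in make_date_time() recognizes '%' and the specifier after it as whole characters instead of single bytes. A reduced sketch of that control flow follows, assuming the format has already been decoded into a std::u32string; SimpleDate, append_number() and format_date_sketch() are hypothetical names, and only %Y, %m and %d are implemented. As in the patch, a trailing '%' is copied as a regular character, and the default branch copies the character that follows '%'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, reduced date value; not a server type.
struct SimpleDate { int year, month, day; };

// Appends a non-negative number, zero-padded to 'width' digits.
static void append_number(std::u32string *out, int value, size_t width)
{
  std::string digits= std::to_string(value);
  for (size_t i= digits.size(); i < width; i++)
    out->push_back(U'0');
  for (char ch : digits)
    out->push_back(static_cast<char32_t>(ch));
}

// Walks the (already decoded) format one character at a time.
std::u32string format_date_sketch(const std::u32string &format,
                                  const SimpleDate &d)
{
  std::u32string out;
  for (size_t i= 0; i < format.size(); i++)
  {
    char32_t wc= format[i];
    if (wc != U'%' || i + 1 == format.size())
    {
      out.push_back(wc);              // A regular character, or a trailing '%'
      continue;
    }
    wc= format[++i];                  // The character after '%'
    switch (wc) {
    case U'Y': append_number(&out, d.year, 4); break;
    case U'm': append_number(&out, d.month, 2); break;
    case U'd': append_number(&out, d.day, 2); break;
    default:   out.push_back(wc); break;  // e.g. "%%" or an unknown specifier
    }
  }
  return out;
}
```

With this control flow, format_date_sketch(U"%Y-%m-%d", {2015, 12, 9}) yields "2015-12-09", and a format such as "100%" keeps its trailing percent sign.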
@@ -1732,11 +1848,10 @@ void Item_func_date_format::fix_length_and_dec()
Item *arg1= args[1]->this_item();
decimals=0;
- CHARSET_INFO *cs= thd->variables.collation_connection;
- uint32 repertoire= arg1->collation.repertoire;
+ if (agg_arg_charsets_for_string_result(collation, &args[1], 1))
+ return;
if (!thd->variables.lc_time_names->is_ascii)
- repertoire|= MY_REPERTOIRE_EXTENDED;
- collation.set(cs, arg1->collation.derivation, repertoire);
+ collation.repertoire|= MY_REPERTOIRE_EXTENDED;
if (arg1->type() == STRING_ITEM)
{ // Optimize the normal case
fixed_length=1;
@@ -1782,16 +1897,22 @@ bool Item_func_date_format::eq(const Item *item, bool binary_cmp) const
uint Item_func_date_format::format_length(const String *format)
{
uint size=0;
- const char *ptr=format->ptr();
- const char *end=ptr+format->length();
+ my_wc_t wc;
+ Wchar_reader reader(format->charset(), format->ptr(), format->length());
- for (; ptr != end ; ptr++)
+ while (!reader.read(&wc))
{
- if (*ptr != '%' || ptr == end-1)
- size++;
+ if (wc != '%' || reader.eol())
+ size++; // A regular character, or a trailing '%'
else
{
- switch(*++ptr) {
+ // A '%' sequence found, scan the next character after '%'
+ if (reader.read(&wc))
+ {
+ DBUG_ASSERT(0); // read() should not fail when eol() is false.
+ break;
+ }
+ switch (wc) {
case 'M': /* month, textual */
case 'W': /* day (of the week), textual */
size += 64; /* large for UTF8 locale data */
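format_length() now counts characters rather than bytes. The sketch below shows the idea for a well-formed UTF-8 format string, where the byte 0x25 ('%') never occurs inside a multi-byte character, so continuation bytes can simply be skipped. Only the 64-character width for %M and %W comes from the patch; the width of 4 for %Y and the one-character default are simplifying assumptions, and format_length_sketch() is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: estimate the result length of a well-formed UTF-8
// format string in characters, not bytes. Only the 64-character width for
// %M and %W is taken from the patch; the %Y width and the 1-character
// default are simplifying assumptions.
unsigned format_length_sketch(const std::string &format)
{
  unsigned size= 0;
  for (size_t i= 0; i < format.size(); i++)
  {
    unsigned char c= static_cast<unsigned char>(format[i]);
    if ((c & 0xC0) == 0x80)
      continue;                     // Continuation byte: same character
    if (c != '%' || i + 1 == format.size())
    {
      size++;                       // A regular character, or a trailing '%'
      continue;
    }
    switch (format[++i]) {          // The (lead) byte after '%'
    case 'M':                       // month, textual
    case 'W':                       // day (of the week), textual
      size+= 64;
      break;
    case 'Y':                       // year, 4 digits (assumed width)
      size+= 4;
      break;
    default:                        // anything else: assumed 1 character
      size++;
      break;
    }
  }
  return size;
}
```

For the format "€%W" the sketch returns 65 (1 + 64), whereas a byte-wise count would give 67 because the euro sign occupies three bytes in UTF-8.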
diff --git a/sql/sql_string.cc b/sql/sql_string.cc
index 885f53a..62191d2 100644
--- a/sql/sql_string.cc
+++ b/sql/sql_string.cc
@@ -537,6 +537,15 @@ bool String::append(IO_CACHE* file, uint32 arg_length)
return FALSE;
}
+/**
+ Append an ASCII string, optionally preceded by a prefix of fill characters.
+ @param s - a pointer to an ASCII string
+ @param arg_length - length of the ASCII string
+ @param full_length - the desired character length of the piece
+ to be added
+ @param fill_char - the character used to build the prefix
+ when the desired full_length is greater than arg_length
+*/
bool String::append_with_prefill(const char *s,uint32 arg_length,
uint32 full_length, char fill_char)
{
@@ -547,8 +556,28 @@ bool String::append_with_prefill(const char *s,uint32 arg_length,
t_length= full_length - arg_length;
if (t_length > 0)
{
- bfill(Ptr+str_length, t_length, fill_char);
- str_length=str_length + t_length;
+ if (!(charset()->state & MY_CS_NONASCII))
+ {
+ /*
+ An ASCII string can be appended directly
+ to an ASCII-compatible string. This includes
+ multi-byte character sets, like utf8, sjis, etc.
+ */
+ bfill(Ptr+str_length, t_length, fill_char);
+ str_length=str_length + t_length;
+ }
+ else
+ {
+ /*
+ Conversion is needed to append ASCII characters to a string in an
+ ASCII-incompatible character set, such as ucs2, utf16, utf16le or utf32.
+ */
+ for (int i= 0; i < t_length; i++)
+ {
+ if (append(&fill_char, 1, &my_charset_latin1))
+ return true;
+ }
+ }
}
append(s, arg_length);
return FALSE;
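The new else branch exists because a raw fill byte is not a valid character in an ASCII-incompatible character set: in utf16le the ASCII digit '0' is the two-byte code unit 0x30 0x00. A minimal sketch of both paths follows, assuming utf16le as the only ASCII-incompatible target; append_with_prefill_sketch() and its ascii_compatible flag (standing in for !(charset()->state & MY_CS_NONASCII)) are hypothetical. The sketch encodes both the fill characters and the ASCII string, so the result is valid utf16le as a whole.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of the two prefill paths. 'ascii_compatible' stands in
// for !(charset()->state & MY_CS_NONASCII); the ASCII-incompatible target is
// assumed to be UTF-16LE, where an ASCII character is one 2-byte code unit.
void append_with_prefill_sketch(std::string *buf, bool ascii_compatible,
                                const std::string &ascii,
                                size_t full_length, char fill_char)
{
  size_t pad= full_length > ascii.size() ? full_length - ascii.size() : 0;
  for (size_t i= 0; i < pad + ascii.size(); i++)
  {
    char ch= i < pad ? fill_char : ascii[i - pad];
    buf->push_back(ch);             // The whole character if ASCII-compatible,
                                    // otherwise the UTF-16LE low byte
    if (!ascii_compatible)
      buf->push_back('\0');         // UTF-16LE high byte for U+0000..U+007F
  }
}
```

Padding "7" to two characters with '0' gives the bytes 30 37 in the ASCII-compatible case and 30 00 37 00 in the utf16le case.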