maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #09089
Re: Please review MDEV-7055 MySQL#74664 - InnoDB: Failing assertion ...
Hi, Alexander!
On Dec 08, Alexander Barkov wrote:
> Hi Sergei,
>
> Please review a patch for MDEV-7055.
>
> Thanks.
> diff --git a/sql/item_timefunc.cc b/sql/item_timefunc.cc
> index 522004e..a2a6fff 100644
> --- a/sql/item_timefunc.cc
> +++ b/sql/item_timefunc.cc
> @@ -447,6 +447,70 @@ static bool extract_date_time(DATE_TIME_FORMAT *format,
>
>
> /**
> + A multi-byte safe helper class to read characters from a string.
> +
> + QQ: Serg: which file to put this new class in?
> + It can be helpful for some other purposes
> + (not only here in item_timefunc.cc)
> + I remember you don't like such things in sql_string.h :)
Thoughts:
1. I'd call it, like, string (or character) iterator
2. There's something like that in 5.6, you might want to see what API
they use
3. Should rather be in C (inline, if possible, with a C++ wrapper, if
you'd like) so that we can put it in a service along with CHARSET_INFO
4. Should rather be in CHARSET_INFO so that you could have fast bytewise
iterator for simple charsets.
And yes, 3 and 4 contradict each other, to a certain extent :)
That's because they're thoughts, not a complete how-to plan.
> +*/
> +class Wchar_reader
> +{
> + CHARSET_INFO *m_cs;
> + const char *m_ptr;
> + const char *m_end;
> +public:
> + Wchar_reader(CHARSET_INFO *cs, const char *str, size_t length)
> + :m_cs(cs), m_ptr(str), m_end(str + length)
> + { }
> + bool eol() const { return m_ptr >= m_end; }
> + /**
> + Read a character, return its Unicode code point and octet length.
> +
> + @param [OUT] wc - a pointer to a Unicode code point variable
> + @param [OUT] chlen - a pointer to a character length variable
> + @return false - on success
> + @return true - on error (end of line, or a bad byte sequence)
> +
> + Converts negative lengths in the range -6..-1
> + (which mb_wc() returns for valid but unassigned characters)
> + to positive lengths 1..6, so the caller does not have
> + to care about unassigned characters. The caller will just see
> + such characters as "U+003F QUESTION MARK", but with length
> + not necessarily equal to 1.
> + */
> + bool read(my_wc_t *wc, int *chlen)
> + {
> + *chlen= m_cs->cset->mb_wc(m_cs, wc, (uchar *) m_ptr, (uchar *) m_end);
> + if (*chlen <= 0)
> + {
> + if (*chlen < -6 || *chlen == 0)
> + return true; // End of line, or a bad byte sequence
> + *chlen= -(*chlen); // An unassigned (but a valid) character found
> + *wc= '?'; // Initialize *wc to QUESTION MARK
> + }
> + m_ptr+= *chlen; // Shift the pointer to the next character.
> + return false;
> + }
> + /**
> + Read a character when its length is not important for the caller
> + @param [OUT] wc - a pointer to a Unicode code point variable
> + @return false - on succes
> + @return true - on error (end of line, or an invalid byte sequence)
> + */
> + bool read(my_wc_t *wc)
> + {
> + int chlen;
> + return read(wc, &chlen);
> + }
> + /**
> + Return a ponter to the next character in the queue.
> + */
> + const char *ptr() const { return m_ptr; }
> +};
> +
> +
> +/**
> Create a formated date/time value in a string.
> */
>
> @@ -457,21 +521,29 @@ static bool make_date_time(DATE_TIME_FORMAT *format, MYSQL_TIME *l_time,
> uint hours_i;
> uint weekday;
> ulong length;
> - const char *ptr, *end;
> + my_wc_t wc;
> + int chlen;
> + Wchar_reader reader(str->charset(), format->format.str,
> + format->format.length);
is str->charset() the character set of the format string?
> - end= (ptr= format->format.str) + format->format.length;
> - for (; ptr != end ; ptr++)
> + for ( ; !reader.read(&wc, &chlen) ; )
You don't distinguish here between end of string and invalid character.
Is that ok?
> diff --git a/sql/sql_string.cc b/sql/sql_string.cc
> index 885f53a..21fba79 100644
> --- a/sql/sql_string.cc
> +++ b/sql/sql_string.cc
> @@ -547,8 +556,28 @@ bool String::append_with_prefill(const char *s,uint32 arg_length,
> t_length= full_length - arg_length;
> if (t_length > 0)
> {
> - bfill(Ptr+str_length, t_length, fill_char);
> - str_length=str_length + t_length;
> + if (charset()->mbminlen == 1)
> + {
> + /*
> + An ASCII string can be appended directly
> + to an ASCII-compatible string. This includes
> + multi-byte character sets, like utf8, sjis, etc.
> + */
but (mbminlen == 1) doesn't necessarily mean "ASCII-compatible",
remember "filename" charset? I thought you've had an "ASCII-compatible"
property somewhere in the CHARSET_INFO.
> + bfill(Ptr+str_length, t_length, fill_char);
> + str_length=str_length + t_length;
> + }
> + else
> + {
> + /*
> + Needs conversion to append an ASCII string to ASCII-incompatible
> + character sets, such as ucs2, utf16, utf16le, utf32.
> + */
> + for (int i= 0; i < t_length; i++)
> + {
> + if (append(&fill_char, 1, &my_charset_latin1))
> + return true;
> + }
> + }
> }
> append(s, arg_length);
> return FALSE;
Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx
--
Vote for my Percona Live 2016 talks:
https://www.percona.com/live/data-performance-conference-2016/sessions/mariadb-connectors-fast-and-smart-new-protocol-optimizations#community-voting
https://www.percona.com/live/data-performance-conference-2016/sessions/mariadb-101-security-validation-authentication-encryption#community-voting
Follow ups
References