maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #10702
Re: String functions have their item->maybe_null=true unnecessarily, blocking optimizations
Hello Sergey,
On 05/15/2017 04:45 PM, Sergey Petrunia wrote:
> Hi Alexander,
>
> I was looking at
>
> https://jira.mariadb.org/browse/MDEV-11893
>
> and found this piece of code:
>
> bool Item_str_func::fix_fields(THD *thd, Item **ref)
> {
> bool res= Item_func::fix_fields(thd, ref);
> /*
> In Item_str_func::check_well_formed_result() we may set null_value
> flag on the same condition as in test() below.
> */
> maybe_null= maybe_null || thd->is_strict_mode();
> return res;
> }
>
> This has been introduced by this cset:
>
> https://github.com/MariaDB/server/commit/af22eb35e577ef17226faf662f2cffc4705bde26
>
> Which says:
>
> Add Item_str_func::fix_fields() implementation, and set maybe_null to
> TRUE if we are in the SQL mode that requires some functions to return
> null even if they normally do not.
>
> The patch has only one example of CHAR() function doing that.
>
> At the same time, not having NOT-NULL attribute disallows query opimizations
> (see MDEV-11893 for an example).
> I think we should have this
>
> maybe_null= maybe_null || thd->is_strict_mode();
>
> logic on a case-by-case basis only. I wanted to check with you - are there any
> other known string functions that, as the patch puts it "return null even if they
> normally do not"?
The idea behind this code is to return:
- A well-formed result, or NULL in strict mode.
- A well-formed result, or a fixed result in non-strict mode.
If val_str() sees that some bad byte sequences were fixed to question
marks, then it returns NULL in strict mode, or the fixed string in
non-strict mode.
There are more functions except CHAR() that can return NULL or not-NULL
depending on strict mode.
The most important is Item_func_conv_charset.
It's val_str() uses String_copier_for_item::copy_with_warn(),
whose result depends on strict mode.
The important thing about Item_func_conv_charset is that it can be
automatically generated during fix_fields() whenever a character set
conversion is needed. See Item::safe_charset_converter().
So it can appear even if it did not exists in the query typed by the user.
For example:
CREATE OR REPLACE TABLE t1 (a VARBINARY(1));
INSERT INTO t1 VALUES (0xFF);
SELECT CONCAT(_utf8 'a' COLLATE utf8_general_ci, a) FROM t1;
The SELECT query is automatically replaced to:
SELECT CONCAT(_utf8 'a' COLLATE utf8_general_ci, CONVERT(a USING utf8))
FROM t1;
For every Item_func_xxx it is possible to track down recursively during
fix_fields() if the result is going to be
well-formed for all possible argument values. In many cases it will
also depend on the current @@character_set_connection value.
I'm not sure how much efforts a precise solution will need.
But in the example in MDEV-11893, it's clear that the result is going to
be well-formed, because:
- A datetime field converted to string always returns a result with
ASCII repertoire, which is well-formed.
- A string literal consisting of ASCII characters is also well-formed
- A concatenation of two ASCII-repertoire results is always well-formed,
even if character set conversion is needed.
We could start with checking repertoire. If it's MY_REPERTOIRE_ASCII,
then the result should be well-formed. It will at least cover the cases
like MDEV-11893.
>
> BR
> Sergei
>
References