← Back to team overview

maria-developers team mailing list archive

Re: String functions have their item->maybe_null=true unnecessarily, blocking optimizations

 

Hello Sergey,


On 05/15/2017 04:45 PM, Sergey Petrunia wrote:
> Hi Alexander,
> 
> I was looking at 
> 
> https://jira.mariadb.org/browse/MDEV-11893
> 
> and found this piece of code:
> 
> bool Item_str_func::fix_fields(THD *thd, Item **ref)
> {
>   bool res= Item_func::fix_fields(thd, ref);
>   /*
>     In Item_str_func::check_well_formed_result() we may set null_value
>     flag on the same condition as in test() below.
>   */
>   maybe_null= maybe_null || thd->is_strict_mode();
>   return res;
> }
> 
> This has been introduced by this cset:
> 
> https://github.com/MariaDB/server/commit/af22eb35e577ef17226faf662f2cffc4705bde26
> 
> Which says:
> 
>   Add Item_str_func::fix_fields() implementation, and set maybe_null to
>   TRUE if we are in the SQL mode that requires some functions to return
>   null even if they normally do not.
> 
> The patch has only one example of CHAR() function doing that.
> 
> At the same time, not having NOT-NULL attribute disallows query opimizations
> (see MDEV-11893 for an example).
> I think we should have this 
> 
>   maybe_null= maybe_null || thd->is_strict_mode();
> 
> logic on a case-by-case basis only.  I wanted to check with you - are there any
> other known string functions that, as the patch puts it "return null even if they
> normally do not"? 

The idea behind this code is to return:
- A well-formed result, or NULL in strict mode.
- A well-formed result, or a fixed result in non-strict mode.

If val_str() sees that some bad byte sequences were fixed to question
marks, then it returns NULL in strict mode, or the fixed string in
non-strict mode.


There are more functions except CHAR() that can return NULL or not-NULL
depending on strict mode.

The most important is Item_func_conv_charset.
It's val_str() uses  String_copier_for_item::copy_with_warn(),
whose result depends on strict mode.


The important thing about Item_func_conv_charset is that it can be
automatically generated during fix_fields() whenever a character set
conversion is needed. See Item::safe_charset_converter().
So it can appear even if it did not exists in the query typed by the user.


For example:

CREATE OR REPLACE TABLE t1 (a VARBINARY(1));
INSERT INTO t1 VALUES (0xFF);
SELECT CONCAT(_utf8 'a' COLLATE utf8_general_ci, a) FROM t1;

The SELECT query is automatically replaced to:

SELECT CONCAT(_utf8 'a' COLLATE utf8_general_ci, CONVERT(a USING utf8))
FROM t1;



For every Item_func_xxx it is possible to track down recursively during
fix_fields() if the result is going to be
well-formed for all possible argument values. In many cases it will
also depend on the current @@character_set_connection value.
I'm not sure how much efforts a precise solution will need.

But in the example in MDEV-11893, it's clear that the result is going to
be well-formed, because:

- A datetime field converted to string always returns a result with
ASCII repertoire, which is well-formed.
- A string literal consisting of ASCII characters is also well-formed
- A concatenation of two ASCII-repertoire results is always well-formed,
  even if character set conversion is needed.

We could start with checking repertoire. If it's MY_REPERTOIRE_ASCII,
then the result should be well-formed. It will at least cover the cases
like MDEV-11893.


> 
> BR
>  Sergei
> 


References