maria-developers team mailing list archive

Thread
Date

Re: MDEV-13118 Wrong results with LOWER and UPPER and subquery

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Thu, 2 Nov 2017 17:26:47 +0400
Cc: maria-developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <20171024090954.GA2850@meddwl.fritz.box>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

Hello Sergei,

Thanks for your reviews. Please find my comments below.

On 10/24/2017 01:09 PM, Sergei Golubchik wrote:
> Hi, Alexander!
> 
> On Jul 22, Alexander Barkov wrote:
>> On 06/20/2017 12:45 PM, Alexander Barkov wrote:
>>> Hello Sergei,
>>>
>>>
>>> Please review a patch for MDEV-13118.
>>>
>>> Thanks!
>>>
> 
>> commit dc7352fa2bc21fd9b66b12ec33cdacf88b478157
>> Author: Alexander Barkov <bar@xxxxxxxxxxx>
>> Date:   Tue Jun 20 12:44:36 2017 +0400
>>
>>     MDEV-13118 Wrong results with LOWER and UPPER and subquery
>>     
>>     This problem is similar to MDEV-10306.
>>     
>>     1. Fixing Item_str_conv::val_str(String *str) to return the result in "str",
>>        and to use tmp_value only as a temporary buffer for args[0]->val_str().
>>        The new code version now guarantees that the result is always returned in
>>        "str". The trick with copy_if_not_alloced() is not used any more.
>>     
>>     2. Removing the optimization that *some* character sets used in casedn()
>>        and caseup(), which allowed (and required) to change the case in-place,
>>        overwriting the string passed as the "src" argument.
>>        Now all CHARSET_INFO's work in the same way:
>>        non of them change the source string in-place, all of them now convert
>>        case from the source string to the destination string, leaving
>>        the source string untouched.
> 
> I don't understand why it was needed. If the goal is to return the
> result in 'str' and never in 'tmp_value', it could've been achieved
> without doing 1. or 2., something like

There was a symmetry in the charset library in caseup() and casedn()
implementations for various charsets:

- charsets whose case conversion does not change octet length
implemented "inplace" conversion.
- charsets whose case conversion change octet length implemented
  "copying" conversion.

That was done specifically for Item_str_conv.

Now with Item_str_conv returning the value in "str" anyway:

- This optimization is not needed any more.
  There is no sense to do case conversion inplace and
  then immediately copy the modified string to another String
  (or the other way around: copy to a new String and then
   convert inplace).


- It's good to make the charset library symmetric for all charsets


> 
> diff --git a/sql/item_strfunc.cc b/sql/item_strfunc.cc
> index 89e55234482..f1837a4477b 100644
> --- a/sql/item_strfunc.cc
> +++ b/sql/item_strfunc.cc
> @@ -1560,30 +1560,28 @@ String *Item_str_conv::val_str(String *str)
>  {
>    DBUG_ASSERT(fixed == 1);
>    String *res;
> -  if (!(res=args[0]->val_str(str)))
> -  {
> -    null_value=1; /* purecov: inspected */
> -    return 0; /* purecov: inspected */
> -  }
> -  null_value=0;
>    if (multiply == 1)
>    {
>      uint len;
> -    res= copy_if_not_alloced(&tmp_value, res, res->length());
> +    if ((null_value= !(res= args[0]->val_str(str))))
> +      return 0;
> +    res= copy_if_not_alloced(str, res, res->length());
>      len= converter(collation.collation, (char*) res->ptr(), res->length(),
>                                          (char*) res->ptr(), res->length());
>      DBUG_ASSERT(len <= res->length());
>      res->length(len);
>    }
>    else
>    {
> +    if ((null_value= !(res= args[0]->val_str(&tmp_value))))
> +      return 0;
>      uint len= res->length() * multiply;
> -    tmp_value.alloc(len);
> -    tmp_value.set_charset(collation.collation);
> +    str->alloc(len);
> +    str->set_charset(collation.collation);
>      len= converter(collation.collation, (char*) res->ptr(), res->length(),
> -                                        (char*) tmp_value.ptr(), len);
> -    tmp_value.length(len);
> -    res= &tmp_value;
> +                                        (char*) str->ptr(), len);
> +    str->length(len);
> +    res= str;
>    }
>    return res;
>  }
> 
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
>

Follow ups

Re: MDEV-13118 Wrong results with LOWER and UPPER and subquery
From: Sergei Golubchik, 2018-07-12

References

MDEV-13118 Wrong results with LOWER and UPPER and subquery
From: Alexander Barkov, 2017-06-20
Re: MDEV-13118 Wrong results with LOWER and UPPER and subquery
From: Alexander Barkov, 2017-07-22
Re: MDEV-13118 Wrong results with LOWER and UPPER and subquery
From: Sergei Golubchik, 2017-10-24