← Back to team overview

maria-developers team mailing list archive

Re: bb-10.2-ext / rpad lpad with 2 args

 

Hello Jerome,


On 05/02/2017 11:53 AM, jerome brauge wrote:
> Hello Alexander,
> Thanks for the review.
> I have already moved code of text_literal into lex in my last patch (empty_strings.diff).
> I will change it to use your new method.

I have pushed a patch adding THD::make_string_literal() to bb-10.2-ext.
Please pull to the latest commit.


Instead of this code:

Item *param_3= new (thd->mem_root) Item_string(thd,
               param_1->collation.collation,
               " ", 1);
please use:

Item *param_3= thd->make_string_literal(" ", 1, MY_REPERTOIRE_ASCII);


After fixing this line, please create a pull request for this patch.
Don't forget to mention that you share the code under either BSD or MCA.

Please use this task in commit comments:
MDEV-12658 Make the third parameter to LPAD and RPAD optional
(I just created it)

Thank you very much!


> 
> Do you have news from Monty for the patch on stacktrace as notes ?
> 
> Best regards,
> Jérôme.
> 
>> -----Message d'origine-----
>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx]
>> Envoyé : mardi 2 mai 2017 09:39
>> À : jerome brauge
>> Cc : MariaDB Developers
>> Objet : Re: bb-10.2-ext / rpad lpad with 2 args
>>
>> Hi Jerome,
>>
>> Thanks for your contribution!
>> The patch generally looks good.
>>
>> This piece of code is not fully correct though:
>>
>> +    Item *param_3= new (thd->mem_root) Item_string(thd,
>> +
>> param_1->collation.collation
>> +                                                   " ", 1);
>>
>> There are two problems here:
>>
>> 1. param_1->collation.collation is not precise at this point.
>> It will become valid later, after param_1 calls its fix_fields().
>>
>> 2. In case is @@character_set_connections is UCS2, UTF16 OR UTF32 it will
>> not work fine, as space occupies two bytes (in UCS2 and UTF16) or four bytes
>> (in UTF32). We need to emulate the same behavior like when the user types
>> space as the last argument, so conversion from @@character_set_client to
>> @@character_set_connection is needed.
>>
>> To make this work correctly, we need to repeat this logic from sql_yacc.yy:
>>
>>
>>> text_literal:
>>>           TEXT_STRING
>>>           {
>>>             LEX_CSTRING tmp;
>>>             CHARSET_INFO *cs_con= thd->variables.collation_connection;
>>>             CHARSET_INFO *cs_cli= thd->variables.character_set_client;
>>>             uint repertoire= $1.repertoire(cs_cli);
>>>             if (thd->charset_is_collation_connection ||
>>>                 (repertoire == MY_REPERTOIRE_ASCII &&
>>>                  my_charset_is_ascii_based(cs_con)))
>>>               tmp= $1;
>>>             else
>>>             {
>>>               LEX_STRING to;
>>>               if (thd->convert_string(&to, cs_con, $1.str, $1.length, cs_cli))
>>>                 MYSQL_YYABORT;
>>>               tmp.str=    to.str;
>>>               tmp.length= to.length;
>>>             }
>>>             $$= new (thd->mem_root) Item_string(thd, tmp.str, tmp.length,
>>>                                                 cs_con,
>>>                                                 DERIVATION_COERCIBLE,
>>>                                                 repertoire);
>>>             if ($$ == NULL)
>>>               MYSQL_YYABORT;
>>>           }
>>
>>
>> I'm going to move this code from sql_yacc.yy to a new method in THD:
>>
>> Item_string *THD::make_item_string(const char *str, size_t length) const;
>>
>> so you can reuse it in your patch.
>>
>> I'm starting working on it right now and will tell you when it's ready.
>>
>> Thanks.
>>
>>
>>
>> On 04/26/2017 11:45 AM, jerome brauge wrote:
>>> Hi Alexander,
>>> This is the patch for rpad/lpad with 2 args (call of make_empty_string() in
>> val_str() is to prepare MDEV-10574).
>>>
>>> Regard,
>>> Jérôme.
>>>
>>> PS : I share this code under terms of MCA.
>>>
>>>
>>>> -----Message d'origine-----
>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : mercredi 26
>>>> avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet :
>>>> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
>>>>
>>>> Hello Jerome,
>>>>
>>>> Thanks for working on this!
>>>>
>>>> On 04/25/2017 11:42 PM, jerome brauge wrote:
>>>>> Hello Alexander,
>>>>> Some times ago, we have had the below discussion.
>>>>> I made a patch following my first idea (it's enough from our point of
>> view) :
>>>> replace empty string by null in literals and in make_empty_string.
>>>>> I think this patch can be complementary to your suggestion of adding
>>>>> an Item_func_eq_oracle (for data inserted in default sql_mode)
>>>>>
>>>>> This patch also contains following some functions  changes:
>>>>> - replace ('ab','a',null) returns 'a' instead null
>>>>> - trim / ltrim / rtrim returns null when input only contains spaces
>>>>> - lpad /rpad can accept 2 args (default padding char is space) and
>>>>> if length equals to 0, returns null -substring : if start index is
>>>>> equal to 0, act as if it's equals to 1 -add a function chr() as a
>>>>> synonym of
>>>>> char() (but accept only one arg) -add a function lengthb() as a
>>>>> synonym of lentgh() -change the semantic of length for oracle (where
>>>>> length() is the character length and not the byte length)
>>>>>
>>>>> What do you think about this patch ?
>>>>>
>>>>> (Perhaps I have to make 2 patchs, one for empty string, and one for
>>>>> functions changes with separate test file for each modified
>>>>> function)
>>>>
>>>> Right, can you please split the patch into pieces?
>>>> I suggest even more pieces than two. One patch for one function.
>>>> I'll create MDEVs, one MDEV per function and put examples how Oracle
>>>> works.
>>>>
>>>> Can we do it in the following order:
>>>>
>>>> 1. LPAD / RPAD - making the third parameter optional.
>>>>    This extension will be good for all sql_modes.
>>>>    I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have
>>>>    the third parameter to LPAD/RPAD optional.
>>>>
>>>> 2. CHR().
>>>>
>>>> 3. LENGTH() and LENGTHB().
>>>>
>>>>
>>>> Then we can continue with REPLACE, TRIM. SUBSTRING in no particular
>>>> order, one patch per function.
>>>>
>>>>
>>>> Some general notes:
>>>>
>>>>
>>>> 1. Unify *.yy files
>>>>
>>>> Please put sql_mode dependent code into LEX rather than *.yy files
>> directly.
>>>> We're going to unify the two files soon. See here for details:
>>>> MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
>>>>
>>>> So instead of doing:
>>>>
>>>> | SUBSTRING '(' expr ',' expr ',' expr ')'
>>>>  {
>>>>    $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7);
>>>>    if ($$ == NULL)
>>>>    MYSQL_YYABORT;
>>>>  }
>>>>
>>>>
>>>> Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
>>>>
>>>>
>>>> | SUBSTRING '(' expr ',' expr ',' expr ')'
>>>>  {
>>>>    if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7)))
>>>>      MYSQL_YYABORT;
>>>>  }
>>>>
>>>> where LEX::make_item_func_substr() is a new method, creating either
>>>> Item_func_substr or Item_func_substr_oracle depending on sql_mode.
>>>>
>>>> The same applies to all other functions, and to text_literal.
>>>>
>>>>
>>>> 2. Don't use Item::is_null() when you call val_str() anyway.
>>>>
>>>> There are two reasons that:
>>>> - Performance
>>>> is_null() involves the second val_str() call, which can be resource
>>>> consuming in case of complex functions and subselect.
>>>>
>>>> - Side effect
>>>> Expressions can have some side effects.
>>>> For example, in case if the argument is a stored function which
>>>> counts how many times it was called and stores the counter in a user
>>>> variable, or writes something to a table on every execution.
>>>> If we do is_null() followed by val_str(), the side effect will done two
>> times.
>>>> We need one time only.
>>>>
>>>> So instead of:
>>>>
>>>> String *Item_func_replace_oracle::val_str(String *str) {
>>>>   String * res;
>>>>   if (res = replace_value(str,
>>>>             args[1]->is_null() ? &tmp_emtpystr :
>>>> rgs[1]->val_str(&tmp_value2),
>>>>             args[2]->is_null() ? &tmp_emtpystr :
>>>> rgs[2]->val_str(&tmp_value3))
>>>>
>>>>     return (res->length() == 0) ? make_empty_result () : res;
>>>>   else
>>>>     return 0;
>>>> }
>>>>
>>>>
>>>> Please do something like this:
>>>>
>>>> String *Item_func_replace_oracle::val_str(String *str) {
>>>>   String * res;
>>>>   String *str1= args[1]->val_str(&tmp_value1);
>>>>   String *str2= args[2]->val_str(&tmp_value2);
>>>>   if (res= replace_value(str,
>>>>                          str1 ? str1 : &tmp_emptystr,
>>>>                          str2 ? str2 : &tmp_emptystr)
>>>>     return (res->length() == 0) ? make_empty_result () : res;
>>>>   else
>>>>     return 0;
>>>> }
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>>
>>>>> Best regards,
>>>>> Jérôme.
>>>>>
>>>>>> -----Message d'origine-----
>>>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : lundi 23
>>>>>> janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet :
>>>>>> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
>>>>>>
>>>>>> Hi Jerome,
>>>>>>
>>>>>> On 01/18/2017 01:22 PM, jerome brauge wrote:
>>>>>>> Hello Alexander,
>>>>>>> Sometime ago, when I ask you about plan for MDEV-10574, you
>> replied :
>>>>>>>
>>>>>>>> The current plan is to do these transformations:
>>>>>>>>
>>>>>>>> 1. Transform Insert
>>>>>>>> - insert values ("") -> insert values (null)
>>>>>>>>
>>>>>>>> 2. Transform Select
>>>>>>>>
>>>>>>>> - where v=x => (v <> "" and V=X)
>>>>>>>> - where v is null => (v="" or v is null)
>>>>>>>>
>>>>>>>> We didn't plan to change functions yet. Thanks for bringing this up.
>>>>>>>> We'll discuss this.
>>>>>>>
>>>>>>> I've done some tests just by changing :
>>>>>>> - insert an Item_null instead of an Item_string when $1.length==0
>>>>>>> in rule text_literal of sql_yacc_ora.yy
>>>>>>> - return null instead of an empty string in
>>>>>>> Item_str_func::make_empty_result
>>>>>>>
>>>>>>> My first tests seem promising.
>>>>>>>
>>>>>>> Of course this solution does not allow to "see" the records
>>>>>>> created with
>>>>>> empty strings as null values.
>>>>>>> I don't see the importance of being able to do this in a transparent
>> way.
>>>>>>> We can explicitly select these row by adding rtrim on these columns.
>>>>>>>
>>>>>>> If you are interesting, I can begin to write a test to evaluate
>>>>>>> the coverage of
>>>>>> this solution.
>>>>>>
>>>>>> I'm afraid this will fix the behavior only for literals.
>>>>>>
>>>>>> This script:
>>>>>>
>>>>>> DROP TABLE t1;
>>>>>> CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1
>>>> VALUES
>>>>>> ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM
>>>>>> t1 WHERE a=b;
>>>>>>
>>>>>> returns 0 for both SELECT queries in Oracle, and returns 1 for both
>>>>>> SELECT queries in MariaDB, Your approach will fix the first SELECT only.
>>>>>> The second will still return 1.
>>>>>>
>>>>>>
>>>>>> For now, I'm not sure how to do this correctly. Some ways possible:
>>>>>>
>>>>>> 1. We could add new classes, e.g. Item_func_eq_oracle as a pair for
>>>>>> Item_func_eq, which will handle both empty strings and NULLs inside
>>>>>> their
>>>>>> val_int() implementations.
>>>>>>
>>>>>> 2. Or there could be some after-processing on the created Item tree
>>>>>> which will replace all things like Item_func_eq to Item_cond_and
>>>>>> with Item_func_eq and Item_func_null_predicate passed as
>> arguments.
>>>>>>
>>>>>> Currently I'm inclined to #1, because we don't have a good
>>>>>> mechanism to clone items to implement #2 correctly. We'd need such
>>>>>> cloning to pass arguments both to Item_func_eq and
>>>>>> Item_func_null_predicate
>>>> correctly.
>>>>>> Some places in the code don't expect that the same arguments can be
>>>>>> passed to two different Item_func instances.
>>>>>>
>>>>>> On the other hand, implementing #1 will need more changes in the
>>>>>> optimizer.
>>>>>>
>>>>>> We were going to think on this later.
>>>>>>
>>>>>>>
>>>>>>> Best regard.
>>>>>>> Jérôme.
>>>>>>>


References