← Back to team overview

maria-developers team mailing list archive

Re: bb-10.2-ext / rpad lpad with 2 args

 

Hi Jerome,

Thanks for your contribution!
The patch generally looks good.

This piece of code is not fully correct though:

+    Item *param_3= new (thd->mem_root) Item_string(thd,
+
param_1->collation.collation
+                                                   " ", 1);

There are two problems here:

1. param_1->collation.collation is not precise at this point.
It will become valid later, after param_1 calls its fix_fields().

2. In case is @@character_set_connections is UCS2, UTF16 OR UTF32
it will not work fine, as space occupies two bytes (in UCS2 and UTF16)
or four bytes (in UTF32). We need to emulate the same behavior
like when the user types space as the last argument, so conversion
from @@character_set_client to @@character_set_connection is needed.

To make this work correctly, we need to repeat this logic from sql_yacc.yy:


> text_literal:
>           TEXT_STRING
>           {
>             LEX_CSTRING tmp;
>             CHARSET_INFO *cs_con= thd->variables.collation_connection;
>             CHARSET_INFO *cs_cli= thd->variables.character_set_client;
>             uint repertoire= $1.repertoire(cs_cli);
>             if (thd->charset_is_collation_connection ||
>                 (repertoire == MY_REPERTOIRE_ASCII &&
>                  my_charset_is_ascii_based(cs_con)))
>               tmp= $1;
>             else
>             {
>               LEX_STRING to;
>               if (thd->convert_string(&to, cs_con, $1.str, $1.length, cs_cli))
>                 MYSQL_YYABORT;
>               tmp.str=    to.str;
>               tmp.length= to.length;
>             }
>             $$= new (thd->mem_root) Item_string(thd, tmp.str, tmp.length,
>                                                 cs_con,
>                                                 DERIVATION_COERCIBLE,
>                                                 repertoire);
>             if ($$ == NULL)
>               MYSQL_YYABORT;
>           }


I'm going to move this code from sql_yacc.yy to a new method in THD:

Item_string *THD::make_item_string(const char *str, size_t length) const;

so you can reuse it in your patch.

I'm starting working on it right now and will tell you when it's ready.

Thanks.



On 04/26/2017 11:45 AM, jerome brauge wrote:
> Hi Alexander,
> This is the patch for rpad/lpad with 2 args (call of make_empty_string() in val_str() is to prepare MDEV-10574).
> 
> Regard,
> Jérôme.
> 
> PS : I share this code under terms of MCA.
> 
> 
>> -----Message d'origine-----
>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx]
>> Envoyé : mercredi 26 avril 2017 06:12
>> À : jerome brauge
>> Cc : MariaDB Developers
>> Objet : Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
>>
>> Hello Jerome,
>>
>> Thanks for working on this!
>>
>> On 04/25/2017 11:42 PM, jerome brauge wrote:
>>> Hello Alexander,
>>> Some times ago, we have had the below discussion.
>>> I made a patch following my first idea (it's enough from our point of view) :
>> replace empty string by null in literals and in make_empty_string.
>>> I think this patch can be complementary to your suggestion of adding
>>> an Item_func_eq_oracle (for data inserted in default sql_mode)
>>>
>>> This patch also contains following some functions  changes:
>>> - replace ('ab','a',null) returns 'a' instead null
>>> - trim / ltrim / rtrim returns null when input only contains spaces
>>> - lpad /rpad can accept 2 args (default padding char is space) and if
>>> length equals to 0, returns null -substring : if start index is equal
>>> to 0, act as if it's equals to 1 -add a function chr() as a synonym of
>>> char() (but accept only one arg) -add a function lengthb() as a
>>> synonym of lentgh() -change the semantic of length for oracle (where
>>> length() is the character length and not the byte length)
>>>
>>> What do you think about this patch ?
>>>
>>> (Perhaps I have to make 2 patchs, one for empty string, and one for
>>> functions changes with separate test file for each modified function)
>>
>> Right, can you please split the patch into pieces?
>> I suggest even more pieces than two. One patch for one function.
>> I'll create MDEVs, one MDEV per function and put examples how Oracle
>> works.
>>
>> Can we do it in the following order:
>>
>> 1. LPAD / RPAD - making the third parameter optional.
>>    This extension will be good for all sql_modes.
>>    I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have
>>    the third parameter to LPAD/RPAD optional.
>>
>> 2. CHR().
>>
>> 3. LENGTH() and LENGTHB().
>>
>>
>> Then we can continue with REPLACE, TRIM. SUBSTRING in no particular
>> order, one patch per function.
>>
>>
>> Some general notes:
>>
>>
>> 1. Unify *.yy files
>>
>> Please put sql_mode dependent code into LEX rather than *.yy files directly.
>> We're going to unify the two files soon. See here for details:
>> MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
>>
>> So instead of doing:
>>
>> | SUBSTRING '(' expr ',' expr ',' expr ')'
>>  {
>>    $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7);
>>    if ($$ == NULL)
>>    MYSQL_YYABORT;
>>  }
>>
>>
>> Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
>>
>>
>> | SUBSTRING '(' expr ',' expr ',' expr ')'
>>  {
>>    if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7)))
>>      MYSQL_YYABORT;
>>  }
>>
>> where LEX::make_item_func_substr() is a new method, creating either
>> Item_func_substr or Item_func_substr_oracle depending on sql_mode.
>>
>> The same applies to all other functions, and to text_literal.
>>
>>
>> 2. Don't use Item::is_null() when you call val_str() anyway.
>>
>> There are two reasons that:
>> - Performance
>> is_null() involves the second val_str() call, which can be resource consuming
>> in case of complex functions and subselect.
>>
>> - Side effect
>> Expressions can have some side effects.
>> For example, in case if the argument is a stored function which counts how
>> many times it was called and stores the counter in a user variable, or writes
>> something to a table on every execution.
>> If we do is_null() followed by val_str(), the side effect will done two times.
>> We need one time only.
>>
>> So instead of:
>>
>> String *Item_func_replace_oracle::val_str(String *str) {
>>   String * res;
>>   if (res = replace_value(str,
>>             args[1]->is_null() ? &tmp_emtpystr :
>> rgs[1]->val_str(&tmp_value2),
>>             args[2]->is_null() ? &tmp_emtpystr :
>> rgs[2]->val_str(&tmp_value3))
>>
>>     return (res->length() == 0) ? make_empty_result () : res;
>>   else
>>     return 0;
>> }
>>
>>
>> Please do something like this:
>>
>> String *Item_func_replace_oracle::val_str(String *str) {
>>   String * res;
>>   String *str1= args[1]->val_str(&tmp_value1);
>>   String *str2= args[2]->val_str(&tmp_value2);
>>   if (res= replace_value(str,
>>                          str1 ? str1 : &tmp_emptystr,
>>                          str2 ? str2 : &tmp_emptystr)
>>     return (res->length() == 0) ? make_empty_result () : res;
>>   else
>>     return 0;
>> }
>>
>>
>> Thanks!
>>
>>
>>
>>>
>>> Best regards,
>>> Jérôme.
>>>
>>>> -----Message d'origine-----
>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : lundi 23
>>>> janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet :
>>>> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
>>>>
>>>> Hi Jerome,
>>>>
>>>> On 01/18/2017 01:22 PM, jerome brauge wrote:
>>>>> Hello Alexander,
>>>>> Sometime ago, when I ask you about plan for MDEV-10574, you replied :
>>>>>
>>>>>> The current plan is to do these transformations:
>>>>>>
>>>>>> 1. Transform Insert
>>>>>> - insert values ("") -> insert values (null)
>>>>>>
>>>>>> 2. Transform Select
>>>>>>
>>>>>> - where v=x => (v <> "" and V=X)
>>>>>> - where v is null => (v="" or v is null)
>>>>>>
>>>>>> We didn't plan to change functions yet. Thanks for bringing this up.
>>>>>> We'll discuss this.
>>>>>
>>>>> I've done some tests just by changing :
>>>>> - insert an Item_null instead of an Item_string when $1.length==0 in
>>>>> rule text_literal of sql_yacc_ora.yy
>>>>> - return null instead of an empty string in
>>>>> Item_str_func::make_empty_result
>>>>>
>>>>> My first tests seem promising.
>>>>>
>>>>> Of course this solution does not allow to "see" the records created
>>>>> with
>>>> empty strings as null values.
>>>>> I don't see the importance of being able to do this in a transparent way.
>>>>> We can explicitly select these row by adding rtrim on these columns.
>>>>>
>>>>> If you are interesting, I can begin to write a test to evaluate the
>>>>> coverage of
>>>> this solution.
>>>>
>>>> I'm afraid this will fix the behavior only for literals.
>>>>
>>>> This script:
>>>>
>>>> DROP TABLE t1;
>>>> CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1
>> VALUES
>>>> ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM t1
>>>> WHERE a=b;
>>>>
>>>> returns 0 for both SELECT queries in Oracle, and returns 1 for both
>>>> SELECT queries in MariaDB, Your approach will fix the first SELECT only.
>>>> The second will still return 1.
>>>>
>>>>
>>>> For now, I'm not sure how to do this correctly. Some ways possible:
>>>>
>>>> 1. We could add new classes, e.g. Item_func_eq_oracle as a pair for
>>>> Item_func_eq, which will handle both empty strings and NULLs inside
>>>> their
>>>> val_int() implementations.
>>>>
>>>> 2. Or there could be some after-processing on the created Item tree
>>>> which will replace all things like Item_func_eq to Item_cond_and with
>>>> Item_func_eq and Item_func_null_predicate passed as arguments.
>>>>
>>>> Currently I'm inclined to #1, because we don't have a good mechanism
>>>> to clone items to implement #2 correctly. We'd need such cloning to
>>>> pass arguments both to Item_func_eq and Item_func_null_predicate
>> correctly.
>>>> Some places in the code don't expect that the same arguments can be
>>>> passed to two different Item_func instances.
>>>>
>>>> On the other hand, implementing #1 will need more changes in the
>>>> optimizer.
>>>>
>>>> We were going to think on this later.
>>>>
>>>>>
>>>>> Best regard.
>>>>> Jérôme.
>>>>>


Follow ups

References