← Back to team overview

maria-developers team mailing list archive

Re: bb-10.2-ext / rpad lpad with 2 args

 

Hello Alexander,
Thanks for the review.
I have already moved code of text_literal into lex in my last patch (empty_strings.diff).
I will change it to use your new method.

Do you have news from Monty for the patch on stacktrace as notes ?

Best regards,
Jérôme.

> -----Message d'origine-----
> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx]
> Envoyé : mardi 2 mai 2017 09:39
> À : jerome brauge
> Cc : MariaDB Developers
> Objet : Re: bb-10.2-ext / rpad lpad with 2 args
> 
> Hi Jerome,
> 
> Thanks for your contribution!
> The patch generally looks good.
> 
> This piece of code is not fully correct though:
> 
> +    Item *param_3= new (thd->mem_root) Item_string(thd,
> +
> param_1->collation.collation
> +                                                   " ", 1);
> 
> There are two problems here:
> 
> 1. param_1->collation.collation is not precise at this point.
> It will become valid later, after param_1 calls its fix_fields().
> 
> 2. In case is @@character_set_connections is UCS2, UTF16 OR UTF32 it will
> not work fine, as space occupies two bytes (in UCS2 and UTF16) or four bytes
> (in UTF32). We need to emulate the same behavior like when the user types
> space as the last argument, so conversion from @@character_set_client to
> @@character_set_connection is needed.
> 
> To make this work correctly, we need to repeat this logic from sql_yacc.yy:
> 
> 
> > text_literal:
> >           TEXT_STRING
> >           {
> >             LEX_CSTRING tmp;
> >             CHARSET_INFO *cs_con= thd->variables.collation_connection;
> >             CHARSET_INFO *cs_cli= thd->variables.character_set_client;
> >             uint repertoire= $1.repertoire(cs_cli);
> >             if (thd->charset_is_collation_connection ||
> >                 (repertoire == MY_REPERTOIRE_ASCII &&
> >                  my_charset_is_ascii_based(cs_con)))
> >               tmp= $1;
> >             else
> >             {
> >               LEX_STRING to;
> >               if (thd->convert_string(&to, cs_con, $1.str, $1.length, cs_cli))
> >                 MYSQL_YYABORT;
> >               tmp.str=    to.str;
> >               tmp.length= to.length;
> >             }
> >             $$= new (thd->mem_root) Item_string(thd, tmp.str, tmp.length,
> >                                                 cs_con,
> >                                                 DERIVATION_COERCIBLE,
> >                                                 repertoire);
> >             if ($$ == NULL)
> >               MYSQL_YYABORT;
> >           }
> 
> 
> I'm going to move this code from sql_yacc.yy to a new method in THD:
> 
> Item_string *THD::make_item_string(const char *str, size_t length) const;
> 
> so you can reuse it in your patch.
> 
> I'm starting working on it right now and will tell you when it's ready.
> 
> Thanks.
> 
> 
> 
> On 04/26/2017 11:45 AM, jerome brauge wrote:
> > Hi Alexander,
> > This is the patch for rpad/lpad with 2 args (call of make_empty_string() in
> val_str() is to prepare MDEV-10574).
> >
> > Regard,
> > Jérôme.
> >
> > PS : I share this code under terms of MCA.
> >
> >
> >> -----Message d'origine-----
> >> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : mercredi 26
> >> avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet :
> >> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
> >>
> >> Hello Jerome,
> >>
> >> Thanks for working on this!
> >>
> >> On 04/25/2017 11:42 PM, jerome brauge wrote:
> >>> Hello Alexander,
> >>> Some times ago, we have had the below discussion.
> >>> I made a patch following my first idea (it's enough from our point of
> view) :
> >> replace empty string by null in literals and in make_empty_string.
> >>> I think this patch can be complementary to your suggestion of adding
> >>> an Item_func_eq_oracle (for data inserted in default sql_mode)
> >>>
> >>> This patch also contains following some functions  changes:
> >>> - replace ('ab','a',null) returns 'a' instead null
> >>> - trim / ltrim / rtrim returns null when input only contains spaces
> >>> - lpad /rpad can accept 2 args (default padding char is space) and
> >>> if length equals to 0, returns null -substring : if start index is
> >>> equal to 0, act as if it's equals to 1 -add a function chr() as a
> >>> synonym of
> >>> char() (but accept only one arg) -add a function lengthb() as a
> >>> synonym of lentgh() -change the semantic of length for oracle (where
> >>> length() is the character length and not the byte length)
> >>>
> >>> What do you think about this patch ?
> >>>
> >>> (Perhaps I have to make 2 patchs, one for empty string, and one for
> >>> functions changes with separate test file for each modified
> >>> function)
> >>
> >> Right, can you please split the patch into pieces?
> >> I suggest even more pieces than two. One patch for one function.
> >> I'll create MDEVs, one MDEV per function and put examples how Oracle
> >> works.
> >>
> >> Can we do it in the following order:
> >>
> >> 1. LPAD / RPAD - making the third parameter optional.
> >>    This extension will be good for all sql_modes.
> >>    I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have
> >>    the third parameter to LPAD/RPAD optional.
> >>
> >> 2. CHR().
> >>
> >> 3. LENGTH() and LENGTHB().
> >>
> >>
> >> Then we can continue with REPLACE, TRIM. SUBSTRING in no particular
> >> order, one patch per function.
> >>
> >>
> >> Some general notes:
> >>
> >>
> >> 1. Unify *.yy files
> >>
> >> Please put sql_mode dependent code into LEX rather than *.yy files
> directly.
> >> We're going to unify the two files soon. See here for details:
> >> MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
> >>
> >> So instead of doing:
> >>
> >> | SUBSTRING '(' expr ',' expr ',' expr ')'
> >>  {
> >>    $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7);
> >>    if ($$ == NULL)
> >>    MYSQL_YYABORT;
> >>  }
> >>
> >>
> >> Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
> >>
> >>
> >> | SUBSTRING '(' expr ',' expr ',' expr ')'
> >>  {
> >>    if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7)))
> >>      MYSQL_YYABORT;
> >>  }
> >>
> >> where LEX::make_item_func_substr() is a new method, creating either
> >> Item_func_substr or Item_func_substr_oracle depending on sql_mode.
> >>
> >> The same applies to all other functions, and to text_literal.
> >>
> >>
> >> 2. Don't use Item::is_null() when you call val_str() anyway.
> >>
> >> There are two reasons that:
> >> - Performance
> >> is_null() involves the second val_str() call, which can be resource
> >> consuming in case of complex functions and subselect.
> >>
> >> - Side effect
> >> Expressions can have some side effects.
> >> For example, in case if the argument is a stored function which
> >> counts how many times it was called and stores the counter in a user
> >> variable, or writes something to a table on every execution.
> >> If we do is_null() followed by val_str(), the side effect will done two
> times.
> >> We need one time only.
> >>
> >> So instead of:
> >>
> >> String *Item_func_replace_oracle::val_str(String *str) {
> >>   String * res;
> >>   if (res = replace_value(str,
> >>             args[1]->is_null() ? &tmp_emtpystr :
> >> rgs[1]->val_str(&tmp_value2),
> >>             args[2]->is_null() ? &tmp_emtpystr :
> >> rgs[2]->val_str(&tmp_value3))
> >>
> >>     return (res->length() == 0) ? make_empty_result () : res;
> >>   else
> >>     return 0;
> >> }
> >>
> >>
> >> Please do something like this:
> >>
> >> String *Item_func_replace_oracle::val_str(String *str) {
> >>   String * res;
> >>   String *str1= args[1]->val_str(&tmp_value1);
> >>   String *str2= args[2]->val_str(&tmp_value2);
> >>   if (res= replace_value(str,
> >>                          str1 ? str1 : &tmp_emptystr,
> >>                          str2 ? str2 : &tmp_emptystr)
> >>     return (res->length() == 0) ? make_empty_result () : res;
> >>   else
> >>     return 0;
> >> }
> >>
> >>
> >> Thanks!
> >>
> >>
> >>
> >>>
> >>> Best regards,
> >>> Jérôme.
> >>>
> >>>> -----Message d'origine-----
> >>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : lundi 23
> >>>> janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet :
> >>>> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
> >>>>
> >>>> Hi Jerome,
> >>>>
> >>>> On 01/18/2017 01:22 PM, jerome brauge wrote:
> >>>>> Hello Alexander,
> >>>>> Sometime ago, when I ask you about plan for MDEV-10574, you
> replied :
> >>>>>
> >>>>>> The current plan is to do these transformations:
> >>>>>>
> >>>>>> 1. Transform Insert
> >>>>>> - insert values ("") -> insert values (null)
> >>>>>>
> >>>>>> 2. Transform Select
> >>>>>>
> >>>>>> - where v=x => (v <> "" and V=X)
> >>>>>> - where v is null => (v="" or v is null)
> >>>>>>
> >>>>>> We didn't plan to change functions yet. Thanks for bringing this up.
> >>>>>> We'll discuss this.
> >>>>>
> >>>>> I've done some tests just by changing :
> >>>>> - insert an Item_null instead of an Item_string when $1.length==0
> >>>>> in rule text_literal of sql_yacc_ora.yy
> >>>>> - return null instead of an empty string in
> >>>>> Item_str_func::make_empty_result
> >>>>>
> >>>>> My first tests seem promising.
> >>>>>
> >>>>> Of course this solution does not allow to "see" the records
> >>>>> created with
> >>>> empty strings as null values.
> >>>>> I don't see the importance of being able to do this in a transparent
> way.
> >>>>> We can explicitly select these row by adding rtrim on these columns.
> >>>>>
> >>>>> If you are interesting, I can begin to write a test to evaluate
> >>>>> the coverage of
> >>>> this solution.
> >>>>
> >>>> I'm afraid this will fix the behavior only for literals.
> >>>>
> >>>> This script:
> >>>>
> >>>> DROP TABLE t1;
> >>>> CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1
> >> VALUES
> >>>> ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*) FROM
> >>>> t1 WHERE a=b;
> >>>>
> >>>> returns 0 for both SELECT queries in Oracle, and returns 1 for both
> >>>> SELECT queries in MariaDB, Your approach will fix the first SELECT only.
> >>>> The second will still return 1.
> >>>>
> >>>>
> >>>> For now, I'm not sure how to do this correctly. Some ways possible:
> >>>>
> >>>> 1. We could add new classes, e.g. Item_func_eq_oracle as a pair for
> >>>> Item_func_eq, which will handle both empty strings and NULLs inside
> >>>> their
> >>>> val_int() implementations.
> >>>>
> >>>> 2. Or there could be some after-processing on the created Item tree
> >>>> which will replace all things like Item_func_eq to Item_cond_and
> >>>> with Item_func_eq and Item_func_null_predicate passed as
> arguments.
> >>>>
> >>>> Currently I'm inclined to #1, because we don't have a good
> >>>> mechanism to clone items to implement #2 correctly. We'd need such
> >>>> cloning to pass arguments both to Item_func_eq and
> >>>> Item_func_null_predicate
> >> correctly.
> >>>> Some places in the code don't expect that the same arguments can be
> >>>> passed to two different Item_func instances.
> >>>>
> >>>> On the other hand, implementing #1 will need more changes in the
> >>>> optimizer.
> >>>>
> >>>> We were going to think on this later.
> >>>>
> >>>>>
> >>>>> Best regard.
> >>>>> Jérôme.
> >>>>>


Follow ups

References