← Back to team overview

maria-developers team mailing list archive

Re: MDEV-10142 - bb-10.2-compatibility

 

Hi Alexander,

> -----Message d'origine-----
> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx]
> Envoyé : jeudi 4 mai 2017 09:00
> À : jerome brauge
> Cc : MariaDB Developers
> Objet : Re: MDEV-10142 - bb-10.2-compatibility
> 
>   Hi Jerome,
> 
> On 05/04/2017 10:28 AM, jerome brauge wrote:
> > Hello Alexander,
> > Just 2 points:
> >  - CHR accept only one value (not a list)
> 
> I think this is not a problem to support a list.
> 
> >  - NCHAR_CS is a keyword, you can't specify the character set (it's
> > value is specified at the database creation time)
> 
> Ah, I see. We don't have a configurable NCHAR yet.
> NCHAR always translated to utf8, for example:
> 
> CREATE TABLE t1 (a NCHAR(10)) ->
> CREATE TABLE t1 (a CHAR(10) CHARACTER SET utf8)
> 
> 
> There are two options then:
> 
> 1. Understand NCHAR_CS and translate it to utf8 too.
> This, in addition too CHR(..USING NCHAR_CS), will also make MariaDB
> understand this valid Oracle syntax:
> 
> DECLARE
>   a VARCHAR(10) CHARACTER SET NCHAR_CS;
> BEGIN
>   NULL;
> END;
> 
> Btw, do you know where I can find the full list of Oracle's grammar rules that
> can contain  NCHAR_CS?

Unfortunately no.
I only know:
	CHR(n USING NCHAR_CS), but there is an equivalent : NCHR
	TRANSLATE (char USING { CHAR_CS | NCHAR_CS } ) equivalent to TO_CHAR and TO_NCHAR

> 
> 2. Don't add the USING clause in CHR for now.
> 

It was my original choice ...
We don't use NVARCHAR. For multilingual database, we use AL32UTF8 as database character set and VARCHAR2(x CHAR) for all char datatype (by default VARCHAR(x) means x bytes, not x chars).
AL32UTF8 is Oracle's first recommended database character set.
>From my point of view, oracle nvarchar compatibility is not a priority.

> >
> > Regards,
> > Jérôme
> >
> >> -----Message d'origine-----
> >> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : jeudi 4 mai
> >> 2017 06:16 À : jerome brauge Cc : MariaDB Developers Objet : Re:
> >> MDEV-10142 - bb-10.2-compatibility
> >>
> >> Hello Jerome,
> >>
> >>
> >> On 04/27/2017 08:43 AM, jerome brauge wrote:
> >>> Hello Alexander,
> >>> This is the patch for CHR().
> >>> Function CHR(n) returns the character having the binary equivalent
> >>> to n in
> >> the database character set. Returns null on null input.
> >>
> >> Looking at Oracle's manual:
> >>
> >>
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions019.h
> >> tm
> >>
> >> I think that:
> >>
> >> 1. We should support the optional USING clause.
> >> CHR(x USING y) could be just translated to CHAR(x USING y).
> >>
> >> 2. When no USING is specified,
> >> CHR(x) could be translated to CHAR(x USING database_character_set)
> >>
> >> MariaDB's CHAR() uses &my_charset_bin as the default character set,
> >> which means VARBINARY result.
> >> Oracle's CHR() uses the database character set as the default character
> set.
> >>
> >>
> >> It seems that both can be done in sql_yacc_ora.yy, and no new class
> >> is needed.
> >>
> >>
> >> I suggest we do the following:
> >>
> >> 1. Add this into lex.h:
> >>
> >>   { "CHR",             SYM(CHR_SYM)},
> >>
> >> 2. Add this into sql_yacc.yy and sql_yacc_ora.yy:
> >>
> >> %token  CHR_SYM
> >>
> >> This is to make this keyword non-reserved.
> >>
> >>
> >> 3. Add CHR_SYM into sql_yacc.yy, into the keyword_sp rule.
> >>
> >> 4. Add CHR_SYM into sql_yacc_ora.yy, into the
> >> keyword_sp_not_data_type rule.
> >>
> >> This is to make this keyword non-reserved for sql_mode=ORACLE.
> >>
> >>
> >> 5. Add this into sql_yacc_ora.yy, into function_call_keyword
> >>
> >>           CHR_SYM '(' expr_list ')'
> >>           {
> >>             $$= new (thd->mem_root)
> >>                 Item_func_char(thd, *$3,
> >>                                thd->variables.collation_database);
> >>             if ($$ == NULL)
> >>               MYSQL_YYABORT;
> >>           }
> >>         | CHR_SYM '(' expr_list USING charset_name ')'
> >>           {
> >>             $$= new (thd->mem_root) Item_func_char(thd, *$3, $5);
> >>             if ($$ == NULL)
> >>               MYSQL_YYABORT;
> >>           }
> >>
> >>
> >>
> >> In the meanwhile I'll fix two bugs that I found reading the code in
> >> Item_func_char:
> >>
> >>
> >> MDEV-12680 Crash when CREATE TABLE t1 AS SELECT char(100 using
> >> filename)
> >> MDEV-12681 Wrong VIEW results for CHAR(0xDF USING latin1)
> >>
> >> Thanks!
> >>
> >>>
> >>> Regard,
> >>> Jérôme.
> >>>
> >>> PS : I share this code under terms of MCA.
> >>>
> >>>
> >>>> -----Message d'origine-----
> >>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : mercredi 26
> >>>> avril 2017 06:12 À : jerome brauge Cc : MariaDB Developers Objet :
> >>>> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
> >>>>
> >>>> Hello Jerome,
> >>>>
> >>>> Thanks for working on this!
> >>>>
> >>>> On 04/25/2017 11:42 PM, jerome brauge wrote:
> >>>>> Hello Alexander,
> >>>>> Some times ago, we have had the below discussion.
> >>>>> I made a patch following my first idea (it's enough from our point
> >>>>> of
> >> view) :
> >>>> replace empty string by null in literals and in make_empty_string.
> >>>>> I think this patch can be complementary to your suggestion of
> >>>>> adding an Item_func_eq_oracle (for data inserted in default
> >>>>> sql_mode)
> >>>>>
> >>>>> This patch also contains following some functions  changes:
> >>>>> - replace ('ab','a',null) returns 'a' instead null
> >>>>> - trim / ltrim / rtrim returns null when input only contains
> >>>>> spaces
> >>>>> - lpad /rpad can accept 2 args (default padding char is space) and
> >>>>> if length equals to 0, returns null -substring : if start index is
> >>>>> equal to 0, act as if it's equals to 1 -add a function chr() as a
> >>>>> synonym of
> >>>>> char() (but accept only one arg) -add a function lengthb() as a
> >>>>> synonym of lentgh() -change the semantic of length for oracle
> >>>>> (where
> >>>>> length() is the character length and not the byte length)
> >>>>>
> >>>>> What do you think about this patch ?
> >>>>>
> >>>>> (Perhaps I have to make 2 patchs, one for empty string, and one
> >>>>> for functions changes with separate test file for each modified
> >>>>> function)
> >>>>
> >>>> Right, can you please split the patch into pieces?
> >>>> I suggest even more pieces than two. One patch for one function.
> >>>> I'll create MDEVs, one MDEV per function and put examples how
> >>>> Oracle works.
> >>>>
> >>>> Can we do it in the following order:
> >>>>
> >>>> 1. LPAD / RPAD - making the third parameter optional.
> >>>>    This extension will be good for all sql_modes.
> >>>>    I checked IBM DB2, Oracle PostgreSQL, Firebird - they all have
> >>>>    the third parameter to LPAD/RPAD optional.
> >>>>
> >>>> 2. CHR().
> >>>>
> >>>> 3. LENGTH() and LENGTHB().
> >>>>
> >>>>
> >>>> Then we can continue with REPLACE, TRIM. SUBSTRING in no particular
> >>>> order, one patch per function.
> >>>>
> >>>>
> >>>> Some general notes:
> >>>>
> >>>>
> >>>> 1. Unify *.yy files
> >>>>
> >>>> Please put sql_mode dependent code into LEX rather than *.yy files
> >> directly.
> >>>> We're going to unify the two files soon. See here for details:
> >>>> MDEV-12518) Unify sql_yacc.yy and sql_yacc_ora.yy
> >>>>
> >>>> So instead of doing:
> >>>>
> >>>> | SUBSTRING '(' expr ',' expr ',' expr ')'
> >>>>  {
> >>>>    $$= new (thd->mem_root) Item_func_substr(thd, $3, $5, $7);
> >>>>    if ($$ == NULL)
> >>>>    MYSQL_YYABORT;
> >>>>  }
> >>>>
> >>>>
> >>>> Please do something like this in both sql_yacc.yy and sql_yacc_ora.yy:
> >>>>
> >>>>
> >>>> | SUBSTRING '(' expr ',' expr ',' expr ')'
> >>>>  {
> >>>>    if (!($$= Lex->make_item_func_substr(thd, $3, $5, $7)))
> >>>>      MYSQL_YYABORT;
> >>>>  }
> >>>>
> >>>> where LEX::make_item_func_substr() is a new method, creating either
> >>>> Item_func_substr or Item_func_substr_oracle depending on
> sql_mode.
> >>>>
> >>>> The same applies to all other functions, and to text_literal.
> >>>>
> >>>>
> >>>> 2. Don't use Item::is_null() when you call val_str() anyway.
> >>>>
> >>>> There are two reasons that:
> >>>> - Performance
> >>>> is_null() involves the second val_str() call, which can be resource
> >>>> consuming in case of complex functions and subselect.
> >>>>
> >>>> - Side effect
> >>>> Expressions can have some side effects.
> >>>> For example, in case if the argument is a stored function which
> >>>> counts how many times it was called and stores the counter in a
> >>>> user variable, or writes something to a table on every execution.
> >>>> If we do is_null() followed by val_str(), the side effect will done
> >>>> two
> >> times.
> >>>> We need one time only.
> >>>>
> >>>> So instead of:
> >>>>
> >>>> String *Item_func_replace_oracle::val_str(String *str) {
> >>>>   String * res;
> >>>>   if (res = replace_value(str,
> >>>>             args[1]->is_null() ? &tmp_emtpystr :
> >>>> rgs[1]->val_str(&tmp_value2),
> >>>>             args[2]->is_null() ? &tmp_emtpystr :
> >>>> rgs[2]->val_str(&tmp_value3))
> >>>>
> >>>>     return (res->length() == 0) ? make_empty_result () : res;
> >>>>   else
> >>>>     return 0;
> >>>> }
> >>>>
> >>>>
> >>>> Please do something like this:
> >>>>
> >>>> String *Item_func_replace_oracle::val_str(String *str) {
> >>>>   String * res;
> >>>>   String *str1= args[1]->val_str(&tmp_value1);
> >>>>   String *str2= args[2]->val_str(&tmp_value2);
> >>>>   if (res= replace_value(str,
> >>>>                          str1 ? str1 : &tmp_emptystr,
> >>>>                          str2 ? str2 : &tmp_emptystr)
> >>>>     return (res->length() == 0) ? make_empty_result () : res;
> >>>>   else
> >>>>     return 0;
> >>>> }
> >>>>
> >>>>
> >>>> Thanks!
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> Best regards,
> >>>>> Jérôme.
> >>>>>
> >>>>>> -----Message d'origine-----
> >>>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : lundi 23
> >>>>>> janvier 2017 12:49 À : jerome brauge Cc : MariaDB Developers Objet :
> >>>>>> Re: MDEV-10142 - bb-10.2-compatibility / MDEV-10574
> >>>>>>
> >>>>>> Hi Jerome,
> >>>>>>
> >>>>>> On 01/18/2017 01:22 PM, jerome brauge wrote:
> >>>>>>> Hello Alexander,
> >>>>>>> Sometime ago, when I ask you about plan for MDEV-10574, you
> >> replied :
> >>>>>>>
> >>>>>>>> The current plan is to do these transformations:
> >>>>>>>>
> >>>>>>>> 1. Transform Insert
> >>>>>>>> - insert values ("") -> insert values (null)
> >>>>>>>>
> >>>>>>>> 2. Transform Select
> >>>>>>>>
> >>>>>>>> - where v=x => (v <> "" and V=X)
> >>>>>>>> - where v is null => (v="" or v is null)
> >>>>>>>>
> >>>>>>>> We didn't plan to change functions yet. Thanks for bringing this
> up.
> >>>>>>>> We'll discuss this.
> >>>>>>>
> >>>>>>> I've done some tests just by changing :
> >>>>>>> - insert an Item_null instead of an Item_string when
> >>>>>>> $1.length==0 in rule text_literal of sql_yacc_ora.yy
> >>>>>>> - return null instead of an empty string in
> >>>>>>> Item_str_func::make_empty_result
> >>>>>>>
> >>>>>>> My first tests seem promising.
> >>>>>>>
> >>>>>>> Of course this solution does not allow to "see" the records
> >>>>>>> created with
> >>>>>> empty strings as null values.
> >>>>>>> I don't see the importance of being able to do this in a
> >>>>>>> transparent
> >> way.
> >>>>>>> We can explicitly select these row by adding rtrim on these
> columns.
> >>>>>>>
> >>>>>>> If you are interesting, I can begin to write a test to evaluate
> >>>>>>> the coverage of
> >>>>>> this solution.
> >>>>>>
> >>>>>> I'm afraid this will fix the behavior only for literals.
> >>>>>>
> >>>>>> This script:
> >>>>>>
> >>>>>> DROP TABLE t1;
> >>>>>> CREATE TABLE t1 (a VARCHAR(10), b VARCHAR(10)); INSERT INTO t1
> >>>> VALUES
> >>>>>> ('',''); SELECT COUNT(*) FROM t1 WHERE a=''; SELECT COUNT(*)
> FROM
> >>>>>> t1 WHERE a=b;
> >>>>>>
> >>>>>> returns 0 for both SELECT queries in Oracle, and returns 1 for
> >>>>>> both SELECT queries in MariaDB, Your approach will fix the first
> SELECT only.
> >>>>>> The second will still return 1.
> >>>>>>
> >>>>>>
> >>>>>> For now, I'm not sure how to do this correctly. Some ways possible:
> >>>>>>
> >>>>>> 1. We could add new classes, e.g. Item_func_eq_oracle as a pair
> >>>>>> for Item_func_eq, which will handle both empty strings and NULLs
> >>>>>> inside their
> >>>>>> val_int() implementations.
> >>>>>>
> >>>>>> 2. Or there could be some after-processing on the created Item
> >>>>>> tree which will replace all things like Item_func_eq to
> >>>>>> Item_cond_and with Item_func_eq and Item_func_null_predicate
> >>>>>> passed as
> >> arguments.
> >>>>>>
> >>>>>> Currently I'm inclined to #1, because we don't have a good
> >>>>>> mechanism to clone items to implement #2 correctly. We'd need
> >>>>>> such cloning to pass arguments both to Item_func_eq and
> >>>>>> Item_func_null_predicate
> >>>> correctly.
> >>>>>> Some places in the code don't expect that the same arguments can
> >>>>>> be passed to two different Item_func instances.
> >>>>>>
> >>>>>> On the other hand, implementing #1 will need more changes in the
> >>>>>> optimizer.
> >>>>>>
> >>>>>> We were going to think on this later.
> >>>>>>
> >>>>>>>
> >>>>>>> Best regard.
> >>>>>>> Jérôme.
> >>>>>>>


Follow ups

References