maria-developers team mailing list archive

Thread
Date

Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments

To: 'Alexander Barkov' <bar@xxxxxxxxxxx>
From: jerome brauge <j.brauge@xxxxxxxxxxx>
Date: Thu, 9 Nov 2017 08:03:12 +0000
Accept-language: fr-FR, en-US
Cc: "'MariaDB Developers \(maria-developers@xxxxxxxxxxxxxxxxxxx\)'" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <43f12e74-3134-1584-d4ea-dbe0dbd97dc2@mariadb.org>
Thread-index: AdNUfI8F7tLlcS6dTHuuNcTv1QOySgABd0UAAARYMhAAj7cg8ABArnyAADGcJeA=
Thread-topic: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments

Hello Alexander,

> -----Message d'origine-----
> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx]
> Envoyé : mardi 7 novembre 2017 16:34
> À : jerome brauge
> Cc : 'MariaDB Developers (maria-developers@xxxxxxxxxxxxxxxxxxx)'
> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
> SELECT..INTO assignments
> 
> Hi Jerome,
> 
> 
> On 11/06/2017 01:01 PM, jerome brauge wrote:
> > Hello Alexander,
> > We have checked all our stored procedure and the good news for us is that
> only 2 statements are concerned !
> > So, MDEV-13418 is not mandatory for us.
> >
> > However, can we consider to do a full parse just at compile time ?
> > It would allow to :
> >  - determining variables which must have a temporary pair and
> > modifying stored source code to handle them
> >  - issuing error or warning when these statements have syntax error or
> > call unknown functions ( like Oracle, SQLserver , UDB , .. )
> 
> I'm not sure I understood about full parser at compile time.
> Can you please give an example?
> 

Currently, the following function is compiled without any warning and run fine as long as b1 is not between 1 and 5.

delimiter /
CREATE or replace procedure p1(b1 integer)
BEGIN
  declare res INTEGER default 0;  
  if b1 = 0 then
    -- unknow variable
    set res=x;
  end if;
  if b1 = 1 then
    -- unknow function in set
    set res=lenngth('v');
  end if;
  if b1 = 2 then
    -- unknow variable
    select a into res;
  end if;
  if b1 = 3 then
    -- unknow table / column / function
   select 1 into res from unknowtable where unknowcolumn = unknowfunc();   
  end if;
  if b1 = 4 then
    -- unknow group by column 
    select 1 into res from dual group by 13;
  end if;
  if b1 = 5 then
    -- unknow function in test 
    if coalesc(res,'x') = 'y' then
      set res:='z';
    end if;
  end if;

END
/
call p1(-1)
/

My idea is to have different behavior during "create procedure" and load_routine.
Load_routine must be fast but "create procedure" can be slower and do more check (at least check existence of variables, tables, columns, functions and procedure)
It's not truly acceptable to throw this kind of error only when the code is really executed.

To do this, each select list expression have to be "parse" and so we can determine variables which must have a temporary pair.
Ex:
CREATE PROCEDURE p1(res OUT VARCHAR)
AS
  b1 INT:=10;
BEGIN
  SELECT  1,CAST(b1 AS VARCHAR(10)) INTO b1, res FROM dual;  
END;

Can be transform and store as
CREATE PROCEDURE p1(res OUT VARCHAR)
AS
  b1 INT:=10;
BEGIN
  DECLARE  _b1 INTEGER DEFAULT b1;
  SELECT  1,CAST(b1 AS VARCHAR(10)) INTO b1, res FROM dual;  
  SET b1=_b1;
END;

So, no additional works should be done during load_routine.


> >
> > Without this second point we are exposed to keystrokes errors and to find
> these mistakes only at runtime.
> > It's already very hard to have tests that cover most of the code, but here
> we must have 100% of coverage !
> > Fortunately for us, SQLServer and Oracle does the work for us :-)
> >
> > Other idea : maybe the behavior of "set" could be conditioned by another
> sql_mode.
> > Sybase ,  SQLServer, Mariadb : multiple "set" are evaluated as multiple
> single "set"
> > DB2 : work as select (each expression is evaluated before the
> > assignments are performed)
> >
> > We could have three sql_mode:
> > - SET_CONSISTENCY_UPDATE
> > - SET_CONSISTENCY_SELECT
> > - SET_CONSISTENCY_SET
> 
> What would SET_CONSISTENCY_SELECT stand for?
> Can you give an example?

I thought to:
- SET_CONSISTENCY_UPDATE for MDEV-13417 : UPDATE produces wrong values if an updated column is later used as an update source
- SET_CONSISTENCY_SELECT for MDEV-13418 : The order of evaluation of SELECT..INTO assignments
- SET_CONSISTENCY_SET  :  evaluate expression before any assignment in "grouped" SET 

> 
> Thanks!
> 
> 
> 
> >
> > Regards,
> > Jérôme.
> >
> >
> >> -----Message d'origine-----
> >> De : jerome brauge
> >> Envoyé : vendredi 3 novembre 2017 12:18 À : 'Alexander Barkov'
> >> Cc : MariaDB Developers (maria-developers@xxxxxxxxxxxxxxxxxxx)
> >> Objet : RE: MDEV-13418 Compatibility: The order of evaluation of
> >> SELECT..INTO assignments
> >>
> >> Hi Alexander,
> >> Thanks for this idea, but for us, cpu is a critical hotspot.
> >> I will evaluate how many statements in our application are affected.
> >> If this number is not to high, we modify them.
> >> In this case, we will need two sql_mode (one for MDEV-13417 and one
> >> for MDEV-13418).
> >>
> >> Best regards.
> >> Jérôme.
> >>
> >>
> >>> -----Message d'origine-----
> >>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : vendredi 3
> >>> novembre 2017 11:02 À : jerome brauge Cc : MariaDB Developers
> >>> (maria-developers@xxxxxxxxxxxxxxxxxxx)
> >>> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
> >>> SELECT..INTO assignments
> >>>
> >>> Hello Jerome,
> >>>
> >>>
> >>> On 11/03/2017 12:22 PM, jerome brauge wrote:
> >>>> Hello Alexander.
> >>>> I've begun to implement your proposal but now I'm not sure that
> >>>> it's a
> >>> better solution than mine.
> >>>> Let me explain .
> >>>> - first : number of temporary variables can be significant because
> >>>> we don't
> >>> know when they are really needed and their scope are local to the
> >>> statement.
> >>>>
> >>>>   declare b1 INTEGER;
> >>>>   declare res INTEGER;
> >>>> ...
> >>>>   if b1 = 0 then
> >>>>     select 1,b1+1 into b1, res from dual;
> >>>>   end if;
> >>>>   if b1 = 1 then
> >>>>     select 2,b1+2 into b1, res from dual;
> >>>>   end if;
> >>>>
> >>>> will be transform in :
> >>>>
> >>>>   declare b1 INTEGER;
> >>>>   declare res INTEGER;
> >>>> ...
> >>>>   if b1 = 0 then
> >>>>     declare _b1 INTEGER default res;
> >>>>     declare _res INTEGER default res;
> >>>>     select 1,b1+1 into _b1, _res from dual;
> >>>>     set b1=_b1;
> >>>>     set res=_res;
> >>>>     -- _res is not needed, but we don't know because the select
> >>>> statement is
> >>> not parsed
> >>>>   end if;
> >>>>   if b1 = 1 then
> >>>>     declare _b1 INTEGER default res;
> >>>>     declare _res INTEGER default res;
> >>>>     select 2,b1+2 into b1, res from dual;
> >>>>     set b1=_b1;
> >>>>     set res=_res;
> >>>>     -- same thing here, and we have declare two variables for each
> >>>> target
> >>> variables
> >>>>   end if;
> >>>>
> >>>> Perhaps we could
> >>>>  - declare these temporary variables only one time in the first
> >>>> frame of the stored procedure (may be tricky)
> >>>>  - parse columns of each select to know what variables are really
> >>>> assigned and reused  (heavy cost in cpu and time)
> >>>>
> >>>> - second : if we can't determine variables which must have a
> >>>> temporary
> >>> variable, number of sp_instr_set and sp_instr_set_var will be too
> >>> high and their cpu cost is not acceptable.
> >>>>
> >>>> My first solution has a fixed memory impact (and memory is not an
> >>>> issue
> >>> nowadays), and especially a very light cpu cost.
> >>>>
> >>>> What do you think about ?
> >>>
> >>> I agree that determining variables which must have a temporary pair
> >>> by full scanning the SELECT list expressions is probably not a good idea.
> >>>
> >>> Declaring these variables only one time is easy.
> >>> I earlier made about the same trick with cursor parameters.
> >>> See sp_pcontext::retrieve_field_definitions().
> >>>
> >>> The idea is that we can put such backup variables into a separate
> >>> child sp_pcontext frame, to make sure that only one backup variable
> >>> exists if the real variable appears multiple time as a SELECT INTO
> >>> target. Having a dedicated frame allows:
> >>> - to add variables in the middle of a BEGIN..END block, without
> >>> having to re- enumerate local variables of the same block.
> >>> - handle unique names
> >>>
> >>> See the comment in  sp_pcontext::retrieve_field_definitions() about
> >>> frame positions and run-time positions, and the CURSOR declaration
> >>> grammar in sql_yacc_ora.yy.
> >>>
> >>>
> >>> Every sp_pcontext frame should have one sp_pcontext frame for
> backup
> >>> variables, which should be put into m_children.
> >>>
> >>> sp_pcontext should probably have a new flag:
> >>>   bool m_is_backup;
> >>> So we can iterate through m_children and find the backup frame.
> >>>
> >>> Another option is to add:
> >>>
> >>>   int m_backup_child_context_index;
> >>>
> >>> which will store -1 if the current frame does not have a backup
> >>> child yet, or a non-negative value meaning the index in m_children.
> >>> So a new method could looks like this:
> >>>
> >>> sp_pcontext *get_backup_context()
> >>> {
> >>>   if (m_backup_child_contex_index < 0)
> >>>     return 0;
> >>>   return m_children(m_backup_child_context_index);
> >>> }
> >>>
> >>>
> >>>>
> >>>> Regard,
> >>>> Jérôme.
> >>>>
> >>>>
> >>>>> -----Message d'origine-----
> >>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : lundi 30
> >>>>> octobre 2017 10:02 À : jerome brauge Objet : Re: MDEV-14139
> >>>>> Anchored data types for variables
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 10/28/2017 07:29 PM, Alexander Barkov wrote:
> >>>>>>
> >>>>>> On 10/27/2017 10:27 PM, Alexander Barkov wrote:
> >>>>>>>   Hello Jerome,
> >>>>>>>
> >>>>>>> I have implemented "MDEV-14139 Anchored data types for
> >> variables".
> >>>>>>> and pushed it to bb-10.2-ext:
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> https://github.com/MariaDB/server/commit/5dd5253f7e50c21fa758e2eb58f
> >>>>> 3
> >>>>>>> aa9c9754e733
> >>>>>>>
> >>>>>>> So now it should be easier to implement consistent SET by
> >>>>>>> creating backup variables.
> >>>>>>>
> >>>>>>>
> >>>>>>> LEX::sp_variable_declarations_vartype_finalize() implements the
> >>>>>>> logic which copies data type from another variable.
> >>>>>>>
> >>>>>>>
> >>>>>>> The idea is that for all variables, which are assignment targets
> >>>>>>> in a SET or a SELECT INTO statement, we create a backup variable.
> >>>>>>>
> >>>>>>> It will involve these calls for every such variable:
> >>>>>>>
> >>>>>>> LEX::sp_variable_declarations_init(thd, 1);
> >>>>>>> sp_pcontext::add_variable(thd, backup_variable_name);
> >>>>>>> LEX::sp_variable_declarations_vartype_finalize(thd, 1,
> >>>>>>> orig_variable_name, def);
> >>>>>>>
> >>>>>>> where "def" is Item_splocal created for the original variable.
> >>>>>>
> >>>>>> Just an idea: instead of creating sp_instr_set, it's easier to
> >>>>>> introduce a new class sp_instr_set_var, to copy the value from
> >>>>>> one variable to another variable.
> >>>>>>
> >>>>>> This operation will not need neither Item, nor sp_lex_keeper. It
> >>>>>> will only need two offsets:
> >>>>>> - the source variable offset and
> >>>>>> - the target variable offset.
> >>>>>>
> >>>>>> Using these offsets, we can access to
> >>>>>> spcont->m_var_table->field[source] and
> >>>>>> spcont->spcont->m_var_table->field[target]
> >>>>>> and copy the value between them using Field::store_field().
> >>>>>>
> >>>>>> This won't however for the ROW variables at the moment, because
> >>>>>> ROW fields are stored in the Item_spvar_args::m_table member of
> >>>>>> Item_field_row.
> >>>>>>
> >>>>>> It seems we need a new class Field_row and move
> Virtual_tmp_table
> >>>>>> from Item_field_row to Field_row.
> >>>>>>
> >>>>>> I will try to do it.
> >>>>>
> >>>>>
> >>>>> I have implemented "MDEV-14212 Add Field_row for SP ROW
> variables"
> >>>>> and pushed to bb-10.2-ext.
> >>>>>
> >>>>> Also, added a comment:
> >>>>>
> >>>>> MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO
> >>>>> assignments
> >>>>>
> >>>>>
> >>>>> Now, when MDEV-14212 is done, these declarations:
> >>>>>
> >>>>>     DECLARE a_tmp TYPE OF a DEFAULT a;
> >>>>>     DECLARE b_tmp TYPE OF b DEFAULT b;
> >>>>>
> >>>>> and these assignments:
> >>>>>
> >>>>>     SET a=a_tmp;
> >>>>>     SET b=b_tmp;
> >>>>>
> >>>>> can use a new sp_instr_setvar instead of sp_inst_set.
> >>>>>
> >>>>> The new command sp_instr_setvar can do direct copying between
> two
> >>>>> spcont->m_var_table->field[XXX], without a need to create Item and
> >> LEX.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Greetings.
> >>>>>>>

Follow ups

Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: Alexander Barkov, 2017-11-09

References

MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-03
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: Alexander Barkov, 2017-11-03
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-03
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-06
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: Alexander Barkov, 2017-11-07