maria-developers team mailing list archive

Thread
Date
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments

To: 'Alexander Barkov' <bar@xxxxxxxxxxx>
From: jerome brauge <j.brauge@xxxxxxxxxxx>
Date: Thu, 9 Nov 2017 14:42:33 +0000
Accept-language: fr-FR, en-US
Cc: "'MariaDB Developers \(maria-developers@xxxxxxxxxxxxxxxxxxx\)'" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <72ae3716-1a34-5620-53f3-13e896b9f240@mariadb.org>
Thread-index: AdNUfI8F7tLlcS6dTHuuNcTv1QOySgABd0UAAARYMhAAj7cg8ABArnyAADGcJeAAJ4DwgAACaHpgAAV/jQAAAuTR0A==
Thread-topic: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
Hi Alexander,

> -----Message d'origine-----
> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx]
> Envoyé : jeudi 9 novembre 2017 14:52
> À : jerome brauge
> Cc : 'MariaDB Developers (maria-developers@xxxxxxxxxxxxxxxxxxx)'
> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
> SELECT..INTO assignments
> 
> 
> 
> On 11/09/2017 03:03 PM, jerome brauge wrote:
> >
> >
> >> -----Message d'origine-----
> >> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : jeudi 9
> >> novembre 2017 11:05 À : jerome brauge Cc : 'MariaDB Developers
> >> (maria-developers@xxxxxxxxxxxxxxxxxxx)'
> >> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
> >> SELECT..INTO assignments
> >>
> >> Hello Jerome,
> >>
> >>
> >> On 11/09/2017 12:03 PM, jerome brauge wrote:
> >>> Hello Alexander,
> >>>
> >>>> -----Message d'origine-----
> >>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : mardi 7
> >>>> novembre 2017 16:34 À : jerome brauge Cc : 'MariaDB Developers
> >>>> (maria-developers@xxxxxxxxxxxxxxxxxxx)'
> >>>> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
> >>>> SELECT..INTO assignments
> >>>>
> >>>> Hi Jerome,
> >>>>
> >>>>
> >>>> On 11/06/2017 01:01 PM, jerome brauge wrote:
> >>>>> Hello Alexander,
> >>>>> We have checked all our stored procedure and the good news for us
> >>>>> is that
> >>>> only 2 statements are concerned !
> >>>>> So, MDEV-13418 is not mandatory for us.
> >>>>>
> >>>>> However, can we consider to do a full parse just at compile time ?
> >>>>> It would allow to :
> >>>>>  - determining variables which must have a temporary pair and
> >>>>> modifying stored source code to handle them
> >>>>>  - issuing error or warning when these statements have syntax
> >>>>> error or call unknown functions ( like Oracle, SQLserver , UDB ,
> >>>>> .. )
> >>>>
> >>>> I'm not sure I understood about full parser at compile time.
> >>>> Can you please give an example?
> >>>>
> >>>
> >>> Currently, the following function is compiled without any warning
> >>> and run
> >> fine as long as b1 is not between 1 and 5.
> >>>
> >>> delimiter /
> >>> CREATE or replace procedure p1(b1 integer) BEGIN
> >>>   declare res INTEGER default 0;
> >>>   if b1 = 0 then
> >>>     -- unknow variable
> >>>     set res=x;
> >>>   end if;
> >>>   if b1 = 1 then
> >>>     -- unknow function in set
> >>>     set res=lenngth('v');
> >>>   end if;
> >>>   if b1 = 2 then
> >>>     -- unknow variable
> >>>     select a into res;
> >>>   end if;
> >>>   if b1 = 3 then
> >>>     -- unknow table / column / function
> >>>    select 1 into res from unknowtable where unknowcolumn =
> >> unknowfunc();
> >>>   end if;
> >>>   if b1 = 4 then
> >>>     -- unknow group by column
> >>>     select 1 into res from dual group by 13;
> >>>   end if;
> >>>   if b1 = 5 then
> >>>     -- unknow function in test
> >>>     if coalesc(res,'x') = 'y' then
> >>>       set res:='z';
> >>>     end if;
> >>>   end if;
> >>>
> >>> END
> >>> /
> >>> call p1(-1)
> >>> /
> >>>
> >>> My idea is to have different behavior during "create procedure" and
> >> load_routine.
> >>> Load_routine must be fast but "create procedure" can be slower and
> >>> do more check (at least check existence of variables, tables,
> >>> columns,
> >> functions and procedure) It's not truly acceptable to throw this kind
> >> of error only when the code is really executed.
> >>
> >> At CREATE time, we can only add a warning when an unknown procedure
> >> or function is called. Issuing errors is not possible, because one
> >> would not be able to create two routines mutually executing each
> >> other (now it is possible).
> >>
> >
> > I agree, a warning is fine for an undefined called function.
> > But is there any chance to have control on used variables and columns ?
> 
> In this example:
> 
> DELIMITER $$
> CREATE OR REPLACE PROCEDURE p1()
> BEGIN
>   DECLARE res INT DEFAULT 0;
>   SET res=x;
> END;
> $$
> DELIMITER ;
> 
> I think we should generate the error at CREATE time, in all sql_mode's.
> 
> Looks like a bug that we don't do it.

Fine.

> 
> For things like:
> 
>   SELECT 1 INTO var FROM unknowtable where unknowcolumn = 10;
> 
> We cannot return errors by default.
> This behavior has been in MySQL/MariaDB since the time when stored
> procedures appeared: tables and columns resolution is done at execute
> time.
> 
> But adding a new sql_mode to return errors in such cases should be possible.
> The solution should handle cases when CREATE PROCEDURE was done with a
> known column, but then the column was removed by a ALTER TABLE. So it
> seems both CREATE and load must do this.
> I'm not sure how it can affect performance.
> 

I'm not sure that we need handle these cases during load.
It's the responsibility of the developer to do an impact analysis when he modify objects  (tables or procedure/function).
An error at runtime is acceptable (Sybase and SQLserver works like this).

> 
> >
> >
> >> So issuing errors is possible at execution time only.
> >>
> >> When a routine is loaded, it could be checked for all called
> >> procedures and functions, and a warning or an error could be issued
> >> if some routine is missing (independently on conditions in control
> structures).
> >>
> >>
> >>>
> >>> To do this, each select list expression have to be "parse" and so we
> >>> can
> >> determine variables which must have a temporary pair.
> >>> Ex:
> >>> CREATE PROCEDURE p1(res OUT VARCHAR) AS
> >>>   b1 INT:=10;
> >>> BEGIN
> >>>   SELECT  1,CAST(b1 AS VARCHAR(10)) INTO b1, res FROM dual; END;
> >>>
> >>> Can be transform and store as
> >>> CREATE PROCEDURE p1(res OUT VARCHAR) AS
> >>>   b1 INT:=10;
> >>> BEGIN
> >>>   DECLARE  _b1 INTEGER DEFAULT b1;
> >>>   SELECT  1,CAST(b1 AS VARCHAR(10)) INTO b1, res FROM dual;
> >>>   SET b1=_b1;
> >>> END;
> >>>
> >>> So, no additional works should be done during load_routine.
> >>
> >> Now we always put the routine into mysql.proc AS IS, how the definer
> >> defined it.
> >>
> >> I'm not sure if we should rewrite CREATE statements...
> >>
> >>
> >>>>>
> >>>>> Without this second point we are exposed to keystrokes errors and
> >>>>> to find
> >>>> these mistakes only at runtime.
> >>>>> It's already very hard to have tests that cover most of the code,
> >>>>> but here
> >>>> we must have 100% of coverage !
> >>>>> Fortunately for us, SQLServer and Oracle does the work for us :-)
> >>>>>
> >>>>> Other idea : maybe the behavior of "set" could be conditioned by
> >>>>> another
> >>>> sql_mode.
> >>>>> Sybase ,  SQLServer, Mariadb : multiple "set" are evaluated as
> >>>>> multiple
> >>>> single "set"
> >>>>> DB2 : work as select (each expression is evaluated before the
> >>>>> assignments are performed)
> >>>>>
> >>>>> We could have three sql_mode:
> >>>>> - SET_CONSISTENCY_UPDATE
> >>>>> - SET_CONSISTENCY_SELECT
> >>>>> - SET_CONSISTENCY_SET
> >>>>
> >>>> What would SET_CONSISTENCY_SELECT stand for?
> >>>> Can you give an example?
> >>>
> >>> I thought to:
> >>> - SET_CONSISTENCY_UPDATE for MDEV-13417 : UPDATE produces
> wrong
> >> values
> >>> if an updated column is later used as an update source
> >>> - SET_CONSISTENCY_SELECT for MDEV-13418 : The order of evaluation of
> >>> SELECT..INTO assignments
> >>> - SET_CONSISTENCY_SET  :  evaluate expression before any assignment
> >>> in "grouped" SET
> >>
> >> Thanks!
> >>
> >> I thought that behavior of SET is directly related to behavior of
> >> SELECT INTO, as both save data into variables. Why have two different
> >> sql_mode options for SET and SELECT INTO?
> >>
> >> Btw, why not have just one option SET_CONSISTENCY, which will control
> >> all three (UPDATE, SELECT INTO, SET) ?
> >>
> >> I didn't understand why you propose separate flags.
> >> Can you clarify please?
> >>
> >
> > For UPDATE , there is no workaround, we must implement it for Oracle
> compatibility.
> >
> > For SELECT INTO : I've always  thinking that it's a bad practice to use and
> assign the same variable in one statement. When reading the code, we can't
> know what the developer meant to do.
> > It's not  hard to correct this kind of statement.
> > Introduce an overhead  for most of select statement (we can optimize it
> when select return only one value), is not acceptable for us and we don't
> want enable this feature.
> >
> > For SET, most of RDBMS (that I know) works as Mariadb :
> >  - Sybase and SQLServer
> >  - and implicitly ORACLE because it can only do single assignment at
> > one time Others like IBM DB2 have same behavior for multi set and select
> into.
> >
> > So it's not a required feature for Oracle compatibility.
> 
> We cannot waste sql_mode bits for small features. We'll run out of the
> second 32 bit slot quickly.

Well. Is it complex to add a new longlong to extend sql_mode? (perhaps for replication)
Do you think that we could commit a first step of SET_CONSISTENCY just for the UPDATE part ?


> 
> In this case, I'm inclined to have a single SET_CONSISTENCY bit for all three
> purposes (UPDATE, SELECT INTO, SET). We should just make sure that SELECT
> INTO (and SET) works with a good performance.
> 
> Multi-SET is less important than SELECT INTO, as multi-SET is neither
> standard, nor Oracle-compatible.
> 
> >
> >
> >>>
> >>>>
> >>>> Thanks!
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> Regards,
> >>>>> Jérôme.
> >>>>>
> >>>>>
> >>>>>> -----Message d'origine-----
> >>>>>> De : jerome brauge
> >>>>>> Envoyé : vendredi 3 novembre 2017 12:18 À : 'Alexander Barkov'
> >>>>>> Cc : MariaDB Developers (maria-developers@xxxxxxxxxxxxxxxxxxx)
> >>>>>> Objet : RE: MDEV-13418 Compatibility: The order of evaluation of
> >>>>>> SELECT..INTO assignments
> >>>>>>
> >>>>>> Hi Alexander,
> >>>>>> Thanks for this idea, but for us, cpu is a critical hotspot.
> >>>>>> I will evaluate how many statements in our application are affected.
> >>>>>> If this number is not to high, we modify them.
> >>>>>> In this case, we will need two sql_mode (one for MDEV-13417 and
> >>>>>> one for MDEV-13418).
> >>>>>>
> >>>>>> Best regards.
> >>>>>> Jérôme.
> >>>>>>
> >>>>>>
> >>>>>>> -----Message d'origine-----
> >>>>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : vendredi
> >>>>>>> 3 novembre 2017 11:02 À : jerome brauge Cc : MariaDB Developers
> >>>>>>> (maria-developers@xxxxxxxxxxxxxxxxxxx)
> >>>>>>> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
> >>>>>>> SELECT..INTO assignments
> >>>>>>>
> >>>>>>> Hello Jerome,
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/03/2017 12:22 PM, jerome brauge wrote:
> >>>>>>>> Hello Alexander.
> >>>>>>>> I've begun to implement your proposal but now I'm not sure that
> >>>>>>>> it's a
> >>>>>>> better solution than mine.
> >>>>>>>> Let me explain .
> >>>>>>>> - first : number of temporary variables can be significant
> >>>>>>>> because we don't
> >>>>>>> know when they are really needed and their scope are local to
> >>>>>>> the statement.
> >>>>>>>>
> >>>>>>>>   declare b1 INTEGER;
> >>>>>>>>   declare res INTEGER;
> >>>>>>>> ...
> >>>>>>>>   if b1 = 0 then
> >>>>>>>>     select 1,b1+1 into b1, res from dual;
> >>>>>>>>   end if;
> >>>>>>>>   if b1 = 1 then
> >>>>>>>>     select 2,b1+2 into b1, res from dual;
> >>>>>>>>   end if;
> >>>>>>>>
> >>>>>>>> will be transform in :
> >>>>>>>>
> >>>>>>>>   declare b1 INTEGER;
> >>>>>>>>   declare res INTEGER;
> >>>>>>>> ...
> >>>>>>>>   if b1 = 0 then
> >>>>>>>>     declare _b1 INTEGER default res;
> >>>>>>>>     declare _res INTEGER default res;
> >>>>>>>>     select 1,b1+1 into _b1, _res from dual;
> >>>>>>>>     set b1=_b1;
> >>>>>>>>     set res=_res;
> >>>>>>>>     -- _res is not needed, but we don't know because the select
> >>>>>>>> statement is
> >>>>>>> not parsed
> >>>>>>>>   end if;
> >>>>>>>>   if b1 = 1 then
> >>>>>>>>     declare _b1 INTEGER default res;
> >>>>>>>>     declare _res INTEGER default res;
> >>>>>>>>     select 2,b1+2 into b1, res from dual;
> >>>>>>>>     set b1=_b1;
> >>>>>>>>     set res=_res;
> >>>>>>>>     -- same thing here, and we have declare two variables for
> >>>>>>>> each target
> >>>>>>> variables
> >>>>>>>>   end if;
> >>>>>>>>
> >>>>>>>> Perhaps we could
> >>>>>>>>  - declare these temporary variables only one time in the first
> >>>>>>>> frame of the stored procedure (may be tricky)
> >>>>>>>>  - parse columns of each select to know what variables are
> >>>>>>>> really assigned and reused  (heavy cost in cpu and time)
> >>>>>>>>
> >>>>>>>> - second : if we can't determine variables which must have a
> >>>>>>>> temporary
> >>>>>>> variable, number of sp_instr_set and sp_instr_set_var will be
> >>>>>>> too high and their cpu cost is not acceptable.
> >>>>>>>>
> >>>>>>>> My first solution has a fixed memory impact (and memory is not
> >>>>>>>> an issue
> >>>>>>> nowadays), and especially a very light cpu cost.
> >>>>>>>>
> >>>>>>>> What do you think about ?
> >>>>>>>
> >>>>>>> I agree that determining variables which must have a temporary
> >>>>>>> pair by full scanning the SELECT list expressions is probably
> >>>>>>> not a good
> >> idea.
> >>>>>>>
> >>>>>>> Declaring these variables only one time is easy.
> >>>>>>> I earlier made about the same trick with cursor parameters.
> >>>>>>> See sp_pcontext::retrieve_field_definitions().
> >>>>>>>
> >>>>>>> The idea is that we can put such backup variables into a
> >>>>>>> separate child sp_pcontext frame, to make sure that only one
> >>>>>>> backup variable exists if the real variable appears multiple
> >>>>>>> time as a SELECT INTO target. Having a dedicated frame allows:
> >>>>>>> - to add variables in the middle of a BEGIN..END block, without
> >>>>>>> having to re- enumerate local variables of the same block.
> >>>>>>> - handle unique names
> >>>>>>>
> >>>>>>> See the comment in  sp_pcontext::retrieve_field_definitions()
> >>>>>>> about frame positions and run-time positions, and the CURSOR
> >>>>>>> declaration grammar in sql_yacc_ora.yy.
> >>>>>>>
> >>>>>>>
> >>>>>>> Every sp_pcontext frame should have one sp_pcontext frame for
> >>>> backup
> >>>>>>> variables, which should be put into m_children.
> >>>>>>>
> >>>>>>> sp_pcontext should probably have a new flag:
> >>>>>>>   bool m_is_backup;
> >>>>>>> So we can iterate through m_children and find the backup frame.
> >>>>>>>
> >>>>>>> Another option is to add:
> >>>>>>>
> >>>>>>>   int m_backup_child_context_index;
> >>>>>>>
> >>>>>>> which will store -1 if the current frame does not have a backup
> >>>>>>> child yet, or a non-negative value meaning the index in m_children.
> >>>>>>> So a new method could looks like this:
> >>>>>>>
> >>>>>>> sp_pcontext *get_backup_context() {
> >>>>>>>   if (m_backup_child_contex_index < 0)
> >>>>>>>     return 0;
> >>>>>>>   return m_children(m_backup_child_context_index);
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Regard,
> >>>>>>>> Jérôme.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Message d'origine-----
> >>>>>>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : lundi
> 30
> >>>>>>>>> octobre 2017 10:02 À : jerome brauge Objet : Re: MDEV-14139
> >>>>>>>>> Anchored data types for variables
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 10/28/2017 07:29 PM, Alexander Barkov wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 10/27/2017 10:27 PM, Alexander Barkov wrote:
> >>>>>>>>>>>   Hello Jerome,
> >>>>>>>>>>>
> >>>>>>>>>>> I have implemented "MDEV-14139 Anchored data types for
> >>>>>> variables".
> >>>>>>>>>>> and pushed it to bb-10.2-ext:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> https://github.com/MariaDB/server/commit/5dd5253f7e50c21fa758e2eb58f
> >>>>>>>>> 3
> >>>>>>>>>>> aa9c9754e733
> >>>>>>>>>>>
> >>>>>>>>>>> So now it should be easier to implement consistent SET by
> >>>>>>>>>>> creating backup variables.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> LEX::sp_variable_declarations_vartype_finalize() implements
> >>>>>>>>>>> the logic which copies data type from another variable.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The idea is that for all variables, which are assignment
> >>>>>>>>>>> targets in a SET or a SELECT INTO statement, we create a
> backup
> >> variable.
> >>>>>>>>>>>
> >>>>>>>>>>> It will involve these calls for every such variable:
> >>>>>>>>>>>
> >>>>>>>>>>> LEX::sp_variable_declarations_init(thd, 1);
> >>>>>>>>>>> sp_pcontext::add_variable(thd, backup_variable_name);
> >>>>>>>>>>> LEX::sp_variable_declarations_vartype_finalize(thd, 1,
> >>>>>>>>>>> orig_variable_name, def);
> >>>>>>>>>>>
> >>>>>>>>>>> where "def" is Item_splocal created for the original variable.
> >>>>>>>>>>
> >>>>>>>>>> Just an idea: instead of creating sp_instr_set, it's easier to
> >>>>>>>>>> introduce a new class sp_instr_set_var, to copy the value from
> >>>>>>>>>> one variable to another variable.
> >>>>>>>>>>
> >>>>>>>>>> This operation will not need neither Item, nor sp_lex_keeper.
> >>>>>>>>>> It will only need two offsets:
> >>>>>>>>>> - the source variable offset and
> >>>>>>>>>> - the target variable offset.
> >>>>>>>>>>
> >>>>>>>>>> Using these offsets, we can access to
> >>>>>>>>>> spcont->m_var_table->field[source] and
> >>>>>>>>>> spcont->spcont->m_var_table->field[target]
> >>>>>>>>>> and copy the value between them using Field::store_field().
> >>>>>>>>>>
> >>>>>>>>>> This won't however for the ROW variables at the moment,
> >> because
> >>>>>>>>>> ROW fields are stored in the Item_spvar_args::m_table
> member
> >> of
> >>>>>>>>>> Item_field_row.
> >>>>>>>>>>
> >>>>>>>>>> It seems we need a new class Field_row and move
> >>>> Virtual_tmp_table
> >>>>>>>>>> from Item_field_row to Field_row.
> >>>>>>>>>>
> >>>>>>>>>> I will try to do it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have implemented "MDEV-14212 Add Field_row for SP ROW
> >>>> variables"
> >>>>>>>>> and pushed to bb-10.2-ext.
> >>>>>>>>>
> >>>>>>>>> Also, added a comment:
> >>>>>>>>>
> >>>>>>>>> MDEV-13418 Compatibility: The order of evaluation of
> >>>>>>>>> SELECT..INTO assignments
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Now, when MDEV-14212 is done, these declarations:
> >>>>>>>>>
> >>>>>>>>>     DECLARE a_tmp TYPE OF a DEFAULT a;
> >>>>>>>>>     DECLARE b_tmp TYPE OF b DEFAULT b;
> >>>>>>>>>
> >>>>>>>>> and these assignments:
> >>>>>>>>>
> >>>>>>>>>     SET a=a_tmp;
> >>>>>>>>>     SET b=b_tmp;
> >>>>>>>>>
> >>>>>>>>> can use a new sp_instr_setvar instead of sp_inst_set.
> >>>>>>>>>
> >>>>>>>>> The new command sp_instr_setvar can do direct copying
> between
> >>>> two
> >>>>>>>>> spcont->m_var_table->field[XXX], without a need to create
> Item
> >>>>>>>>> spcont->and
> >>>>>> LEX.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Greetings.
> >>>>>>>>>>>
References

MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-03
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: Alexander Barkov, 2017-11-03
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-03
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-06
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: Alexander Barkov, 2017-11-07
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-09
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: Alexander Barkov, 2017-11-09
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: jerome brauge, 2017-11-09
Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments
From: Alexander Barkov, 2017-11-09