← Back to team overview

maria-developers team mailing list archive

Re: MDEV-13418 Compatibility: The order of evaluation of SELECT..INTO assignments

 


On 11/09/2017 03:03 PM, jerome brauge wrote:
> 
> 
>> -----Message d'origine-----
>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx]
>> Envoyé : jeudi 9 novembre 2017 11:05
>> À : jerome brauge
>> Cc : 'MariaDB Developers (maria-developers@xxxxxxxxxxxxxxxxxxx)'
>> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
>> SELECT..INTO assignments
>>
>> Hello Jerome,
>>
>>
>> On 11/09/2017 12:03 PM, jerome brauge wrote:
>>> Hello Alexander,
>>>
>>>> -----Message d'origine-----
>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : mardi 7
>>>> novembre 2017 16:34 À : jerome brauge Cc : 'MariaDB Developers
>>>> (maria-developers@xxxxxxxxxxxxxxxxxxx)'
>>>> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
>>>> SELECT..INTO assignments
>>>>
>>>> Hi Jerome,
>>>>
>>>>
>>>> On 11/06/2017 01:01 PM, jerome brauge wrote:
>>>>> Hello Alexander,
>>>>> We have checked all our stored procedure and the good news for us is
>>>>> that
>>>> only 2 statements are concerned !
>>>>> So, MDEV-13418 is not mandatory for us.
>>>>>
>>>>> However, can we consider to do a full parse just at compile time ?
>>>>> It would allow to :
>>>>>  - determining variables which must have a temporary pair and
>>>>> modifying stored source code to handle them
>>>>>  - issuing error or warning when these statements have syntax error
>>>>> or call unknown functions ( like Oracle, SQLserver , UDB , .. )
>>>>
>>>> I'm not sure I understood about full parser at compile time.
>>>> Can you please give an example?
>>>>
>>>
>>> Currently, the following function is compiled without any warning and run
>> fine as long as b1 is not between 1 and 5.
>>>
>>> delimiter /
>>> CREATE or replace procedure p1(b1 integer) BEGIN
>>>   declare res INTEGER default 0;
>>>   if b1 = 0 then
>>>     -- unknow variable
>>>     set res=x;
>>>   end if;
>>>   if b1 = 1 then
>>>     -- unknow function in set
>>>     set res=lenngth('v');
>>>   end if;
>>>   if b1 = 2 then
>>>     -- unknow variable
>>>     select a into res;
>>>   end if;
>>>   if b1 = 3 then
>>>     -- unknow table / column / function
>>>    select 1 into res from unknowtable where unknowcolumn =
>> unknowfunc();
>>>   end if;
>>>   if b1 = 4 then
>>>     -- unknow group by column
>>>     select 1 into res from dual group by 13;
>>>   end if;
>>>   if b1 = 5 then
>>>     -- unknow function in test
>>>     if coalesc(res,'x') = 'y' then
>>>       set res:='z';
>>>     end if;
>>>   end if;
>>>
>>> END
>>> /
>>> call p1(-1)
>>> /
>>>
>>> My idea is to have different behavior during "create procedure" and
>> load_routine.
>>> Load_routine must be fast but "create procedure" can be slower and do
>>> more check (at least check existence of variables, tables, columns,
>> functions and procedure) It's not truly acceptable to throw this kind of error
>> only when the code is really executed.
>>
>> At CREATE time, we can only add a warning when an unknown procedure or
>> function is called. Issuing errors is not possible, because one would not be
>> able to create two routines mutually executing each other (now it is
>> possible).
>>
> 
> I agree, a warning is fine for an undefined called function.
> But is there any chance to have control on used variables and columns ?

In this example:

DELIMITER $$
CREATE OR REPLACE PROCEDURE p1()
BEGIN
  DECLARE res INT DEFAULT 0;
  SET res=x;
END;
$$
DELIMITER ;

I think we should generate the error at CREATE time, in all sql_mode's.

Looks like a bug that we don't do it.


For things like:

  SELECT 1 INTO var FROM unknowtable where unknowcolumn = 10;

We cannot return errors by default.
This behavior has been in MySQL/MariaDB since the time when
stored procedures appeared: tables and columns resolution
is done at execute time.

But adding a new sql_mode to return errors in such cases should be
possible. The solution should handle cases when CREATE PROCEDURE
was done with a known column, but then the column was removed
by a ALTER TABLE. So it seems both CREATE and load must do this.
I'm not sure how it can affect performance.


> 
> 
>> So issuing errors is possible at execution time only.
>>
>> When a routine is loaded, it could be checked for all called procedures and
>> functions, and a warning or an error could be issued if some routine is missing
>> (independently on conditions in control structures).
>>
>>
>>>
>>> To do this, each select list expression have to be "parse" and so we can
>> determine variables which must have a temporary pair.
>>> Ex:
>>> CREATE PROCEDURE p1(res OUT VARCHAR)
>>> AS
>>>   b1 INT:=10;
>>> BEGIN
>>>   SELECT  1,CAST(b1 AS VARCHAR(10)) INTO b1, res FROM dual; END;
>>>
>>> Can be transform and store as
>>> CREATE PROCEDURE p1(res OUT VARCHAR)
>>> AS
>>>   b1 INT:=10;
>>> BEGIN
>>>   DECLARE  _b1 INTEGER DEFAULT b1;
>>>   SELECT  1,CAST(b1 AS VARCHAR(10)) INTO b1, res FROM dual;
>>>   SET b1=_b1;
>>> END;
>>>
>>> So, no additional works should be done during load_routine.
>>
>> Now we always put the routine into mysql.proc AS IS, how the definer
>> defined it.
>>
>> I'm not sure if we should rewrite CREATE statements...
>>
>>
>>>>>
>>>>> Without this second point we are exposed to keystrokes errors and to
>>>>> find
>>>> these mistakes only at runtime.
>>>>> It's already very hard to have tests that cover most of the code,
>>>>> but here
>>>> we must have 100% of coverage !
>>>>> Fortunately for us, SQLServer and Oracle does the work for us :-)
>>>>>
>>>>> Other idea : maybe the behavior of "set" could be conditioned by
>>>>> another
>>>> sql_mode.
>>>>> Sybase ,  SQLServer, Mariadb : multiple "set" are evaluated as
>>>>> multiple
>>>> single "set"
>>>>> DB2 : work as select (each expression is evaluated before the
>>>>> assignments are performed)
>>>>>
>>>>> We could have three sql_mode:
>>>>> - SET_CONSISTENCY_UPDATE
>>>>> - SET_CONSISTENCY_SELECT
>>>>> - SET_CONSISTENCY_SET
>>>>
>>>> What would SET_CONSISTENCY_SELECT stand for?
>>>> Can you give an example?
>>>
>>> I thought to:
>>> - SET_CONSISTENCY_UPDATE for MDEV-13417 : UPDATE produces wrong
>> values
>>> if an updated column is later used as an update source
>>> - SET_CONSISTENCY_SELECT for MDEV-13418 : The order of evaluation of
>>> SELECT..INTO assignments
>>> - SET_CONSISTENCY_SET  :  evaluate expression before any assignment in
>>> "grouped" SET
>>
>> Thanks!
>>
>> I thought that behavior of SET is directly related to behavior of SELECT INTO,
>> as both save data into variables. Why have two different sql_mode options
>> for SET and SELECT INTO?
>>
>> Btw, why not have just one option SET_CONSISTENCY, which will control all
>> three (UPDATE, SELECT INTO, SET) ?
>>
>> I didn't understand why you propose separate flags.
>> Can you clarify please?
>>
> 
> For UPDATE , there is no workaround, we must implement it for Oracle compatibility.
> 
> For SELECT INTO : I've always  thinking that it's a bad practice to use and assign the same variable in one statement. When reading the code, we can't know what the developer meant to do.
> It's not  hard to correct this kind of statement.
> Introduce an overhead  for most of select statement (we can optimize it when select return only one value), is not acceptable for us and we don't want enable this feature.
> 
> For SET, most of RDBMS (that I know) works as Mariadb :
>  - Sybase and SQLServer 
>  - and implicitly ORACLE because it can only do single assignment at one time
> Others like IBM DB2 have same behavior for multi set and select into.
> 
> So it's not a required feature for Oracle compatibility.

We cannot waste sql_mode bits for small features. We'll run out of the
second 32 bit slot quickly.

In this case, I'm inclined to have a single SET_CONSISTENCY bit
for all three purposes (UPDATE, SELECT INTO, SET). We should just
make sure that SELECT INTO (and SET) works with a good performance.

Multi-SET is less important than SELECT INTO, as multi-SET is
neither standard, nor Oracle-compatible.

> 
> 
>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Jérôme.
>>>>>
>>>>>
>>>>>> -----Message d'origine-----
>>>>>> De : jerome brauge
>>>>>> Envoyé : vendredi 3 novembre 2017 12:18 À : 'Alexander Barkov'
>>>>>> Cc : MariaDB Developers (maria-developers@xxxxxxxxxxxxxxxxxxx)
>>>>>> Objet : RE: MDEV-13418 Compatibility: The order of evaluation of
>>>>>> SELECT..INTO assignments
>>>>>>
>>>>>> Hi Alexander,
>>>>>> Thanks for this idea, but for us, cpu is a critical hotspot.
>>>>>> I will evaluate how many statements in our application are affected.
>>>>>> If this number is not to high, we modify them.
>>>>>> In this case, we will need two sql_mode (one for MDEV-13417 and one
>>>>>> for MDEV-13418).
>>>>>>
>>>>>> Best regards.
>>>>>> Jérôme.
>>>>>>
>>>>>>
>>>>>>> -----Message d'origine-----
>>>>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : vendredi 3
>>>>>>> novembre 2017 11:02 À : jerome brauge Cc : MariaDB Developers
>>>>>>> (maria-developers@xxxxxxxxxxxxxxxxxxx)
>>>>>>> Objet : Re: MDEV-13418 Compatibility: The order of evaluation of
>>>>>>> SELECT..INTO assignments
>>>>>>>
>>>>>>> Hello Jerome,
>>>>>>>
>>>>>>>
>>>>>>> On 11/03/2017 12:22 PM, jerome brauge wrote:
>>>>>>>> Hello Alexander.
>>>>>>>> I've begun to implement your proposal but now I'm not sure that
>>>>>>>> it's a
>>>>>>> better solution than mine.
>>>>>>>> Let me explain .
>>>>>>>> - first : number of temporary variables can be significant
>>>>>>>> because we don't
>>>>>>> know when they are really needed and their scope are local to the
>>>>>>> statement.
>>>>>>>>
>>>>>>>>   declare b1 INTEGER;
>>>>>>>>   declare res INTEGER;
>>>>>>>> ...
>>>>>>>>   if b1 = 0 then
>>>>>>>>     select 1,b1+1 into b1, res from dual;
>>>>>>>>   end if;
>>>>>>>>   if b1 = 1 then
>>>>>>>>     select 2,b1+2 into b1, res from dual;
>>>>>>>>   end if;
>>>>>>>>
>>>>>>>> will be transform in :
>>>>>>>>
>>>>>>>>   declare b1 INTEGER;
>>>>>>>>   declare res INTEGER;
>>>>>>>> ...
>>>>>>>>   if b1 = 0 then
>>>>>>>>     declare _b1 INTEGER default res;
>>>>>>>>     declare _res INTEGER default res;
>>>>>>>>     select 1,b1+1 into _b1, _res from dual;
>>>>>>>>     set b1=_b1;
>>>>>>>>     set res=_res;
>>>>>>>>     -- _res is not needed, but we don't know because the select
>>>>>>>> statement is
>>>>>>> not parsed
>>>>>>>>   end if;
>>>>>>>>   if b1 = 1 then
>>>>>>>>     declare _b1 INTEGER default res;
>>>>>>>>     declare _res INTEGER default res;
>>>>>>>>     select 2,b1+2 into b1, res from dual;
>>>>>>>>     set b1=_b1;
>>>>>>>>     set res=_res;
>>>>>>>>     -- same thing here, and we have declare two variables for
>>>>>>>> each target
>>>>>>> variables
>>>>>>>>   end if;
>>>>>>>>
>>>>>>>> Perhaps we could
>>>>>>>>  - declare these temporary variables only one time in the first
>>>>>>>> frame of the stored procedure (may be tricky)
>>>>>>>>  - parse columns of each select to know what variables are really
>>>>>>>> assigned and reused  (heavy cost in cpu and time)
>>>>>>>>
>>>>>>>> - second : if we can't determine variables which must have a
>>>>>>>> temporary
>>>>>>> variable, number of sp_instr_set and sp_instr_set_var will be too
>>>>>>> high and their cpu cost is not acceptable.
>>>>>>>>
>>>>>>>> My first solution has a fixed memory impact (and memory is not an
>>>>>>>> issue
>>>>>>> nowadays), and especially a very light cpu cost.
>>>>>>>>
>>>>>>>> What do you think about ?
>>>>>>>
>>>>>>> I agree that determining variables which must have a temporary
>>>>>>> pair by full scanning the SELECT list expressions is probably not a good
>> idea.
>>>>>>>
>>>>>>> Declaring these variables only one time is easy.
>>>>>>> I earlier made about the same trick with cursor parameters.
>>>>>>> See sp_pcontext::retrieve_field_definitions().
>>>>>>>
>>>>>>> The idea is that we can put such backup variables into a separate
>>>>>>> child sp_pcontext frame, to make sure that only one backup
>>>>>>> variable exists if the real variable appears multiple time as a
>>>>>>> SELECT INTO target. Having a dedicated frame allows:
>>>>>>> - to add variables in the middle of a BEGIN..END block, without
>>>>>>> having to re- enumerate local variables of the same block.
>>>>>>> - handle unique names
>>>>>>>
>>>>>>> See the comment in  sp_pcontext::retrieve_field_definitions()
>>>>>>> about frame positions and run-time positions, and the CURSOR
>>>>>>> declaration grammar in sql_yacc_ora.yy.
>>>>>>>
>>>>>>>
>>>>>>> Every sp_pcontext frame should have one sp_pcontext frame for
>>>> backup
>>>>>>> variables, which should be put into m_children.
>>>>>>>
>>>>>>> sp_pcontext should probably have a new flag:
>>>>>>>   bool m_is_backup;
>>>>>>> So we can iterate through m_children and find the backup frame.
>>>>>>>
>>>>>>> Another option is to add:
>>>>>>>
>>>>>>>   int m_backup_child_context_index;
>>>>>>>
>>>>>>> which will store -1 if the current frame does not have a backup
>>>>>>> child yet, or a non-negative value meaning the index in m_children.
>>>>>>> So a new method could looks like this:
>>>>>>>
>>>>>>> sp_pcontext *get_backup_context()
>>>>>>> {
>>>>>>>   if (m_backup_child_contex_index < 0)
>>>>>>>     return 0;
>>>>>>>   return m_children(m_backup_child_context_index);
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Regard,
>>>>>>>> Jérôme.
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Message d'origine-----
>>>>>>>>> De : Alexander Barkov [mailto:bar@xxxxxxxxxxx] Envoyé : lundi 30
>>>>>>>>> octobre 2017 10:02 À : jerome brauge Objet : Re: MDEV-14139
>>>>>>>>> Anchored data types for variables
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/28/2017 07:29 PM, Alexander Barkov wrote:
>>>>>>>>>>
>>>>>>>>>> On 10/27/2017 10:27 PM, Alexander Barkov wrote:
>>>>>>>>>>>   Hello Jerome,
>>>>>>>>>>>
>>>>>>>>>>> I have implemented "MDEV-14139 Anchored data types for
>>>>>> variables".
>>>>>>>>>>> and pushed it to bb-10.2-ext:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>> https://github.com/MariaDB/server/commit/5dd5253f7e50c21fa758e2eb58f
>>>>>>>>> 3
>>>>>>>>>>> aa9c9754e733
>>>>>>>>>>>
>>>>>>>>>>> So now it should be easier to implement consistent SET by
>>>>>>>>>>> creating backup variables.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> LEX::sp_variable_declarations_vartype_finalize() implements
>>>>>>>>>>> the logic which copies data type from another variable.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The idea is that for all variables, which are assignment
>>>>>>>>>>> targets in a SET or a SELECT INTO statement, we create a backup
>> variable.
>>>>>>>>>>>
>>>>>>>>>>> It will involve these calls for every such variable:
>>>>>>>>>>>
>>>>>>>>>>> LEX::sp_variable_declarations_init(thd, 1);
>>>>>>>>>>> sp_pcontext::add_variable(thd, backup_variable_name);
>>>>>>>>>>> LEX::sp_variable_declarations_vartype_finalize(thd, 1,
>>>>>>>>>>> orig_variable_name, def);
>>>>>>>>>>>
>>>>>>>>>>> where "def" is Item_splocal created for the original variable.
>>>>>>>>>>
>>>>>>>>>> Just an idea: instead of creating sp_instr_set, it's easier to
>>>>>>>>>> introduce a new class sp_instr_set_var, to copy the value from
>>>>>>>>>> one variable to another variable.
>>>>>>>>>>
>>>>>>>>>> This operation will not need neither Item, nor sp_lex_keeper.
>>>>>>>>>> It will only need two offsets:
>>>>>>>>>> - the source variable offset and
>>>>>>>>>> - the target variable offset.
>>>>>>>>>>
>>>>>>>>>> Using these offsets, we can access to
>>>>>>>>>> spcont->m_var_table->field[source] and
>>>>>>>>>> spcont->spcont->m_var_table->field[target]
>>>>>>>>>> and copy the value between them using Field::store_field().
>>>>>>>>>>
>>>>>>>>>> This won't however for the ROW variables at the moment,
>> because
>>>>>>>>>> ROW fields are stored in the Item_spvar_args::m_table member
>> of
>>>>>>>>>> Item_field_row.
>>>>>>>>>>
>>>>>>>>>> It seems we need a new class Field_row and move
>>>> Virtual_tmp_table
>>>>>>>>>> from Item_field_row to Field_row.
>>>>>>>>>>
>>>>>>>>>> I will try to do it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have implemented "MDEV-14212 Add Field_row for SP ROW
>>>> variables"
>>>>>>>>> and pushed to bb-10.2-ext.
>>>>>>>>>
>>>>>>>>> Also, added a comment:
>>>>>>>>>
>>>>>>>>> MDEV-13418 Compatibility: The order of evaluation of
>>>>>>>>> SELECT..INTO assignments
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now, when MDEV-14212 is done, these declarations:
>>>>>>>>>
>>>>>>>>>     DECLARE a_tmp TYPE OF a DEFAULT a;
>>>>>>>>>     DECLARE b_tmp TYPE OF b DEFAULT b;
>>>>>>>>>
>>>>>>>>> and these assignments:
>>>>>>>>>
>>>>>>>>>     SET a=a_tmp;
>>>>>>>>>     SET b=b_tmp;
>>>>>>>>>
>>>>>>>>> can use a new sp_instr_setvar instead of sp_inst_set.
>>>>>>>>>
>>>>>>>>> The new command sp_instr_setvar can do direct copying between
>>>> two
>>>>>>>>> spcont->m_var_table->field[XXX], without a need to create Item
>>>>>>>>> spcont->and
>>>>>> LEX.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Greetings.
>>>>>>>>>>>


Follow ups

References