maria-developers team mailing list archive

Thread
Date

Re: d67c3f88883: MDEV-27744 InnoDB: Failing assertion: !cursor->index->is_committed() in row0ins.cc (from row_ins_sec_index_entry_by_modify) | Assertion `0' failed in row_upd_sec_index_entry (debug) | Corruption

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Thu, 4 May 2023 16:20:04 +0400
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <ZEmpa+Ws3lp2LCmX@pweza>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

  Hello Sergei,

Thanks for your review.

Please find a new version here:

https://github.com/MariaDB/server/commits/bb-10.4-bar-MDEV-27744-v2

I moved pure refacoting into separate patches,
they are in the same branch:

MDEV-31184 Remove parser tokens DECODE_MARIADB_SYM and DECODE_ORACLE_SYM
MDEV-31187 Add class Sql_mode_save_for_frm_handling

So now the main patch is much smaller.


On 4/27/23 2:44 AM, Sergei Golubchik wrote:

Hi, Alexander,

A question: if the frm will *always* be created in the canonical,
non-ORACLE mode and always parsed in the same mode, will we even need
these schema-qualified native functions?

okay, the schema-qualified trick will help to avoid creating
REGEX_REPLACE_ORACLE and so on. But is it needed to fix MDEV-27744?


Unfortunately I did not check how much of the patch can be removed
to fix only MDEV-27744.

My intent was to do everything at the same time:
1. Fix MDEV-27744

2. Prepare infrustructure to avoid adding mew XXX_ORACLE functions

3. Make the code symmetric for oracle_schema and mariadb_schema for
   simplicity of the patch.

4. Make it possible to call MariaDB functions from inside Oracle stored
   procedures

5. Make SHOW CREATE TABLE and SHOW CREATE VIEW display MariaDB versions
   of sql_mode-dependent functions with the mariadb_schema prefix
   if the current sql_mode is ORACLE. I think this is needed to
   avoid confusion for the user.
   But after that if we display MariaDB functions using mariadb_schema,
   we also need to parse mariadb_schema.


Now that we have a customer bug "MDEV-29095 REGEXP_REPLACE ...",
I think it's better not to minimize the patch to MDEV-27744.
This symmetric solution is more reliable for my opinion.


More comments below

On Apr 27, Alexander Barkov wrote:

revision-id: d67c3f88883 (mariadb-10.3.33-61-gd67c3f88883)
parent(s): 7355f7b1f5c
author: Alexander Barkov
committer: Alexander Barkov
timestamp: 2022-04-22 15:35:16 +0400
message:


<cut>

diff --git a/sql/unireg.cc b/sql/unireg.cc
index efbb5e773b4..3b37944afa5 100644
--- a/sql/unireg.cc
+++ b/sql/unireg.cc
@@ -652,6 +649,7 @@ static bool pack_expression(String *buf, Virtual_column_info *vcol,
  static bool pack_vcols(String *buf, List<Create_field> &create_fields,
                               List<Virtual_column_info> *check_constraint_list)
  {
+  Sql_mode_save_for_frm_handling sql_mode_save(current_thd);


well, it was done above pack_vcols(), because the caller had access to
the thd. Why not to keep it that way?


I added a THD* parameter to pack_vcols.

I think when we have a code line this:

  change_sql_mode();
  call_just_one_function();
  restore_sql_mode();

then change/restore should go inside call_just_one_function().
This looks more readable for me. But I won't insist.
Please suggest.

    List_iterator<Create_field> it(create_fields);
    Create_field *field;

diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc

index f41414f8ae9..0f5c7519d07 100644
--- a/sql/item_cmpfunc.cc
+++ b/sql/item_cmpfunc.cc
@@ -3292,9 +3292,24 @@ void Item_func_case_simple::print(String *str, enum_query_type query_type)
  }

+Item_func_decode_oracle *

+Item_func_decode_oracle::create(THD *thd, const LEX_CSTRING &name,
+                                List<Item> *item_list)
+{
+  if (unlikely(!item_list || item_list->elements < 3))
+  {
+    wrong_param_count_error(oracle_schema_ref.name(), name);
+    return NULL;
+  }
+  return new (thd->mem_root) Item_func_decode_oracle(thd, *item_list);
+}


why? items are normally created in the Create_xxx::create* methods,
defined in item_create.cc. Why did you delegate that to static
methods in Item_func_xxx ?



I thought it would be more future proof.
It would allow to use very short templates for Create_func_xxx
by passing Item_func_xxx as a template parameter.

But you're right, this is a separate issue.
I removed these Item_func_xxx::create() helper methods for now.

diff --git a/sql/mysqld.h b/sql/mysqld.h
index d40e1d170d0..54069c3cd5a 100644
--- a/sql/mysqld.h
+++ b/sql/mysqld.h
@@ -741,12 +741,25 @@ enum enum_query_type
                          QT_ITEM_SUBSELECT_ID_ONLY,

QT_SHOW_SELECT_NUMBER= (1<<10),

+
+  //QT_ITEM_IDENT_DISABLE_DB_TABLE_NAMES= (1 <<11), -- this is taken in 10.5
+
+  /// Print sql_mode-dependent functions with the schema qualifier
+  /// even if the currently implied (by sql_mode) schema is equal to
+  /// to the function schema, e.g. mariadb_schema.concat('a').
+  QT_ITEM_FUNC_FORCE_SCHEMA_NAME= (1 << 12),
+
+  /// Print for FRM file. Focus on parse-back.
+  /// e.g. VIEW expressions and virtual column expressions
+  QT_FOR_FRM= (1 << 13),
+
    /// This is used for EXPLAIN EXTENDED extra warnings / Be more detailed
    /// Be more detailed than QT_EXPLAIN.
    /// Perhaps we should eventually include QT_ITEM_IDENT_SKIP_CURRENT_DATABASE
    /// here, as it would give better readable results
    QT_EXPLAIN_EXTENDED=  QT_TO_SYSTEM_CHARSET|
-                        QT_SHOW_SELECT_NUMBER,
+                        QT_SHOW_SELECT_NUMBER|
+                        QT_ITEM_FUNC_FORCE_SCHEMA_NAME,


don't do this. let's try to keep the behavior *exactly as before*
for people who don't switch between sql_modes all the time

on the second thought. why do you need QT_ITEM_FUNC_FORCE_SCHEMA_NAME at all?
Besides EXPLAIN EXTENDED (where we'd better not use it), it's only
used in generating view's sql for I_S. It's not supposed to be used for
generating views, so why would it need QT_ITEM_FUNC_FORCE_SCHEMA_NAME?
May be it can be deleted altogether?


Fixed. I removed QT_ITEM_FUNC_FORCE_SCHEMA_NAME.
Now the patch is only adding QT_FOR_FRM.

// If an expression is constant, print the expression, not the value

    // it evaluates to. Should be used for error messages, so that they
diff --git a/sql/sql_class.h b/sql/sql_class.h
index 8268ebe0bcd..6b988938f0b 100644
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@@ -6589,6 +6589,50 @@ class Sql_mode_save
    sql_mode_t old_mode; // SQL mode saved at construction time.
  };

+

+/*
+  Save the current sql_mode. Switch off sql_mode flags which can prevent
+  normal parsing of VIEWs, expressions in generated columns.
+  Restore the old sql_mode on destructor.
+*/
+class Sql_mode_save_for_frm_handling: public Sql_mode_save
+{
+public:
+  Sql_mode_save_for_frm_handling(THD *thd)
+   :Sql_mode_save(thd)
+  {
+    /*
+      - MODE_REAL_AS_FLOAT            affect only CREATE TABLE parsing
+      + MODE_PIPES_AS_CONCAT          affect expression parsing
+      + MODE_ANSI_QUOTES              affect expression parsing
+      + MODE_IGNORE_SPACE             affect expression parsing
+      - MODE_IGNORE_BAD_TABLE_OPTIONS affect only CREATE/ALTER TABLE parsing
+      * MODE_ONLY_FULL_GROUP_BY       affect execution
+      * MODE_NO_UNSIGNED_SUBTRACTION  affect execution
+      - MODE_NO_DIR_IN_CREATE         affect table creation only
+      - MODE_POSTGRESQL               compounded from other modes
+      - MODE_ORACLE                   affects Item creation (e.g for CONCAT)


should be '+' for MODE_ORACLE, not '-', right?


Fixed.

diff --git a/sql/sql_schema.h b/sql/sql_schema.h
index 7c8f284d526..36914e36990 100644
--- a/sql/sql_schema.h
+++ b/sql/sql_schema.h
@@ -33,6 +36,34 @@ class Schema
    {
      return src;
    }
+
+
+  /**
+    Find the native function builder associated with a given function name.
+    @param thd The current thread
+    @param name The native function name
+    @return The native function builder associated with the name, or NULL
+  */
+  virtual Create_func *find_native_function_builder(THD *thd,
+                                                    const LEX_CSTRING &name)
+                                                    const;
+  /**
+    Find a native function builder, return an error if not found,
+    build an Item otherwise.
+  */
+  Item *make_item_func_call_native(THD *thd,
+                                   const Lex_ident_sys &name,
+                                   List<Item> *args) const;
+
+  // Builders for native SQL function with a special syntax in sql_yacc.yy
+  virtual Item *make_item_func_substr(THD *thd,
+                                      const Lex_substring_spec_st &spec) const;
+
+  virtual Item *make_item_func_trim(THD *thd, const Lex_trim_st &spec) const;
+  virtual Item *make_item_func_replace(THD *thd,
+                                       Item *subj,
+                                       Item *find,
+                                       Item *replace) const;


Why no Lex_replace_spec_st ?


The intent for adding Lex_substring_spec_st and Lex_trim_spec_st
was to move the grammar into separate rules:

  substring_operands:
  trim_operands:


REPLACE operands on the other hand have a very simple notation:

| REPLACE '(' expr ',' expr ',' expr ')'


I didn't find useful to add Lex_replace_spec_st.

    /*
      For now we have *hard-coded* compatibility schemas:
        schema_mariadb, schema_oracle, schema_maxdb.
@@ -66,5 +97,6 @@ class Schema

extern Schema mariadb_schema;

+extern const Schema &oracle_schema_ref;


??? why did you declare mariadb_schema extern, but oracle_schema_ref
only via a const ref?


The internal structure of class Schema_oracle is visible only
inside sql_schema.cc. I did not want to expose it to sql_schema.h.

It looks like compatibility modes like ORACLE can in the future
very easily go into a new type of plugins - compatibility plugins,
and oracle_schema_ref plays a role of the entry point of such a plugin.

So I made it this way intentionally, to avoid Schema_oracle specific
methods or members to be visible outside.

#endif // SQL_SCHEMA_H_INCLUDED

diff --git a/sql/item_create.h b/sql/item_create.h
index 894e9777b8d..3fadaecb090 100644
--- a/sql/item_create.h
+++ b/sql/item_create.h
@@ -215,9 +205,48 @@ struct Native_func_registry
    Create_func *builder;
  };

+

+class Native_functions_hash: public HASH
+{
+public:
+  Native_functions_hash()
+  {
+    bzero(this, sizeof(*this));
+  }
+  ~Native_functions_hash()
+  {
+    /*
+      No automatic free.
+      The the upper level code should call cleanup() explicitly.


why is that?


As discussed on slack - instances of this class
are global variables and they destruct at the server
shutdown time only. At this point mysys can already
be deinitialized.

  Item *create_func_dyncol_create(THD *thd, List<DYNCALL_CREATE_DEF> &list);
diff --git a/sql/sql_schema.cc b/sql/sql_schema.cc
index 0bf4a63c2f8..e2993f51c6d 100644
--- a/sql/sql_schema.cc
+++ b/sql/sql_schema.cc
@@ -78,3 +89,86 @@ Schema *Schema::find_implied(THD *thd)
      return &maxdb_schema;
    return &mariadb_schema;
  }
+
+
+Create_func *
+Schema::find_native_function_builder(THD *thd, const LEX_CSTRING &name) const
+{
+  return native_functions_hash.find(thd, name);
+}
+
+
+Create_func *
+Schema_oracle::find_native_function_builder(THD *thd,
+                                            const LEX_CSTRING &name) const
+{
+  Create_func *create= native_functions_hash_oracle_overrides.find(thd, name);
+  if (create)
+    return create;
+  return native_functions_hash.find(thd, name);


may be populate native_functions_hash_oracle will all names to
do only one lookup here? this will be used often in oracle mode


Done.

+}
+
+
+Item *Schema::make_item_func_call_native(THD *thd,
+                                         const Lex_ident_sys &name,
+                                         List<Item> *args) const
+{
+  Create_func *builder= find_native_function_builder(thd, name);
+  if (builder)
+    return builder->create_func(thd, &name, args);
+  my_error(ER_FUNCTION_NOT_DEFINED, MYF(0), name.str);
+  return NULL;
+}
+
+
+Item *Schema::make_item_func_trim(THD *thd, const Lex_trim_st &spec) const
+{
+  return spec.make_item_func_trim_std(thd);


why does trim is constructed via helper in spec,
but substr and replace are not?


SUBSTR and TRIM are constructed using helpers:


  virtual Item *make_item_func_substr(THD *thd,

const Lex_substring_spec_st&spec) const;

virtual Item *make_item_func_trim(THD *thd, const Lex_trim_st &spec)const;



REPLACE is not - because it does not have a helper (explained above).

<cut>

diff --git a/sql/item_func.h b/sql/item_func.h
index 754b1cd1eb2..52bf7f661d1 100644
--- a/sql/item_func.h
+++ b/sql/item_func.h

+  void print_sql_mode_dependent_name(String *to, enum_query_type query_type,
+                                     const Schema &schema,
+                                     const char *function_name) const
+  {
+    static const LEX_CSTRING oracle= {STRING_WITH_LEN("oracle")};


can it be oracle_schema.name() instead?


Unfortunately, it can not.
oracle_schema.name() is "oracle_schema", it's not "oracle".


Btw, this code does not exists any more.
I changed a few Item_func_xxx_oracle classes to do it like this:

  void print(String *str, enum_query_type query_type)
  {
    if (query_type & QT_FOR_FRM)
      str->append(STRING_WITH_LEN("lpad_oracle"));
    else
      print_sql_mode_qualified_name(str, query_type);
    print_args_parenthesized(str, query_type);
  }

Oh, by the way, it's better to add a comment here

    if (query_type & QT_FOR_FRM)
    {
      // 10.3 FRM downgrade compatibility
      str->append(STRING_WITH_LEN("lpad_oracle"));
    }


I find this way more clear to follow.
This way we can see which functions support
backward FRM compatibility, and which functions do not.
Item_func_regexp_replace_oracle will not support.


But there is now a similar code in here:

void Item_func_trim::print(String *str, enum_query_type query_type)
{
  LEX_CSTRING suffix= {STRING_WITH_LEN("_oracle")};

Now it handles both sql_mode=DEFAULT and sql_mode=ORACLE.

Should I split out Oracle-specific code into a new
overriding methods Item_func_trim_oracle::print()?
So it will pass "trim_oracle" as a parameter to a shared
function, while Item_func_trim::print() won't have any references
to "oracle" in the body.

diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
index 3770488f3f4..e0bafc75a36 100644
--- a/sql/sql_yacc.yy
+++ b/sql/sql_yacc.yy
@@ -11235,6 +11236,34 @@ function_call_generic:
              if (unlikely(!($$= Lex->make_item_func_call_generic(thd, &$1, &$3, &$5, $7))))
                MYSQL_YYABORT;
            }
+        | ident_cli '.' REPLACE '(' expr ',' expr ',' expr ')'
+          {
+            if (unlikely(!($$= Lex->make_item_func_replace(thd, $1, $3,
+                                                           $5, $7, $9))))
+              MYSQL_YYABORT;
+          }
+        | ident_cli '.' SUBSTRING '(' substring_operands ')'
+          {
+            if (unlikely(!($$= Lex->make_item_func_substr(thd, $1, $3, $5))))
+              MYSQL_YYABORT;
+          }
+        | ident_cli '.' TRIM '(' trim_operands ')'
+          {
+            if (unlikely(!($$= $5.make_item_func_trim(thd, $1, $3))))
+              MYSQL_YYABORT;
+          }
+          /*
+            We don't add a qualified syntax for TRIM_ORACLE here,
+            as this syntax is not absolutely required:
+              SELECT mariadb_schema.TRIM_ORACLE(..);
+            What absolutely required is only:
+              SELECT mariadb_schema.TRIM(..);
+            Adding a qualified syntax for TRIM_ORACLE would be tricky because
+            it is a non-reserved keyword. To avoid new shift/reduce conflicts
+            it would require grammar changes, like introducing a new rule
+            ident_step2_cli (which would include everything that ident_cli
+            includes but TRIM_ORACLE).


Did you add a qualified syntax for any other xxx_ORACLE functions?

Functions defined through "const Native_func_registry func_array[]"support qualifiers automatically. All this work:



select mariadb_schema.decode_oracle(1,1,'11');
select mariadb_schema.substr_oracle('abcde',2,3);
select mariadb_schema.replace_oracle('abc','a','b');

select oracle_schema.decode_oracle(1,1,'11');
select oracle_schema.substr_oracle('abcde',2,3);
select oracle_schema.replace_oracle('abc','a','b');


TRIM_ORACLE is different - it's a keyword defined in lex.h.

It does not support qualifiers automatically.
And I don't think it needs to.

+          */
          ;

fulltext_options:

diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc
index 7f1b362d91d..6b0a351f76e 100644
--- a/sql/sql_lex.cc
+++ b/sql/sql_lex.cc
@@ -2008,7 +2023,59 @@ bool Lex_input_stream::get_7bit_or_8bit_ident(THD *thd, uchar *last_char)
  }

+/*

+  Resolve special SQL functions that have a qualified syntax in sql_yacc.yy.
+  These functions are not listed in the native function registry
+  because of a special syntax, or a reserved keyword:
+
+    mariadb_schema.SUBSTRING('a' FROM 1 FOR 2)   -- Special syntax
+    mariadb_schema.TRIM(BOTH ' ' FROM 'a')       -- Special syntax
+    mariadb_schema.REPLACE('a','b','c')          -- Verb keyword
+*/
+
+int Lex_input_stream::find_keyword_qualified_special_func(Lex_ident_cli_st *str,
+                                                          uint length) const
+{
+  /*
+    There are many other special functions, see the following grammar rules:
+      function_call_keyword
+      function_call_nonkeyword
+    Here we resolve only those that have a qualified syntax to handle
+    different behavior in different @@sql_mode settings.
+
+    Other special functions do not work in qualified context:
+      SELECT mariadb_schema.year(now()); -- Function year is not defined
+      SELECT mariadb_schema.now();       -- Function now is not defined
+
+    We don't resolve TRIM_ORACLE here, because it does not have
+    a qualified syntax yet. Search for "trim_operands" in sql_yacc.yy
+    to find more comments.
+  */
+  static LEX_CSTRING funcs[]=
+  {
+    {STRING_WITH_LEN("SUBSTRING")},
+    {STRING_WITH_LEN("SUBSTR")},
+    {STRING_WITH_LEN("TRIM")},
+    {STRING_WITH_LEN("REPLACE")}
+  };
+
+  int tokval= find_keyword(str, length, true);
+  if (!tokval)
+    return 0;
+  for (size_t i= 0; i < array_elements(funcs); i++)
+  {
+    CHARSET_INFO *cs= system_charset_info;
+    if (!cs->coll->strnncollsp(cs,


may be strcasecmp? or with cs=latin1?
you know the repertoire is ascii, so you can cut corners here.

+                               (const uchar *) m_tok_start, length,
+                               (const uchar *) funcs[i].str, funcs[i].length))
+      return tokval;
+  }


Fixed to additionally check length equality to
avoid non-ASCII letters matching.

Note, strcasecmp() cannot be used here
because one side it not a 0-terminated string.

+  return 0;
+}
+
+
-int Lex_input_stream::scan_ident_sysvar(THD *thd, Lex_ident_cli_st *str)
+int Lex_input_stream::scan_ident_common(THD *thd, Lex_ident_cli_st *str,
+                                        Ident_mode mode)
  {
    uchar last_char;
    uint length;
@@ -8035,30 +8133,49 @@ bool LEX::add_grant_command(THD *thd, enum_sql_command sql_command_arg,
  }

+const Schema *

+LEX::find_func_schema_by_name_or_error(const Lex_ident_sys &schema,
+                                       const Lex_ident_sys &func)
+{
+  Schema *res= Schema::find_by_name(schema);
+  if (res)
+    return res;
+  Database_qualified_name qname(schema, func);
+  my_error(ER_FUNCTION_NOT_DEFINED, MYF(0), ErrConvDQName(&qname).ptr());


eh, not quite. I can define a function with the same name as a native
function in any db. E.g.

   create test.replace(a int) returns int

and it'll be callable as test.replace(1)
So ER_FUNCTION_NOT_DEFINED is wrong here.

But note that even though test.substr() is possible,
it should not allow substr() special syntax.


Good catch, I overlooked this.

From a glance, a workaround could be to use test.`substr`().
But it's better to provide backward compatibility.

I don't have a quick idea how to fix this properly.
If you have, please let me know.

<cut>

Follow ups

Re: d67c3f88883: MDEV-27744 InnoDB: Failing assertion: !cursor->index->is_committed() in row0ins.cc (from row_ins_sec_index_entry_by_modify) | Assertion `0' failed in row_upd_sec_index_entry (debug) | Corruption
From: Alexander Barkov, 2023-05-11

References

Re: d67c3f88883: MDEV-27744 InnoDB: Failing assertion: !cursor->index->is_committed() in row0ins.cc (from row_ins_sec_index_entry_by_modify) | Assertion `0' failed in row_upd_sec_index_entry (debug) | Corruption
From: Sergei Golubchik, 2023-04-26