← Back to team overview

maria-developers team mailing list archive

Re: 6768e3a2830: MDEV-22166 MIGRATE PARTITION: move out partition into a table

 

Hi Sergei!

The updated branch is cabd0530 bb-10.7-midenok-MDEV-22166

On Wed, Aug 25, 2021 at 9:33 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
>
> Hi, Aleksey!
>
> It's great that you made it atomic. Frankly, I didn't think of it, MDEV
> defines the feature as being exactly equivalent to the 3-command
> sequence, so non-atomic. But, of course, making it atomic makes perfect
> sense.
>
> See other comments below.
>
> On Aug 23, Aleksey Midenkov wrote:
> > commit 6768e3a2830
> > Author: Aleksey Midenkov <midenok@xxxxxxxxx>
> > Date:   Thu Jul 29 22:54:12 2021 +0300
> >
> > diff --git a/sql/lex.h b/sql/lex.h
> > index cbf9d9d51b2..55eefc39a7e 100644
> > --- a/sql/lex.h
> > +++ b/sql/lex.h
> > @@ -240,6 +240,7 @@ SYMBOL symbols[] = {
> >    { "EXPLAIN",         SYM(DESCRIBE)},
> >    { "EXTENDED",        SYM(EXTENDED_SYM)},
> >    { "EXTENT_SIZE",     SYM(EXTENT_SIZE_SYM)},
> > +  { "EXTRACT",         SYM(EXTRACT_SYM)},
>
> So, you did EXTRACT after all, not MIGRATE?

I did like the original description said but I agree that this should be
changed to something better. Moreover EXTRACT parsing is now
incorrect, see

git show 3b693cd3d9 mysql-test/main/parser.result

> ADD/EXTRACT aren't opposites in SQL.
>
> It would be rather consistent to have
>
>     ALTER ... ADD PARTITION (...) FROM table_name
>
> and
>
>     ALTER ... DROP PARTITION ... TO table_name
>
> except that DROP...TO looks kind of strange, and ADD...FROM
> does not convey the meaning that the table disappears.
>
> The second issue also applies to IMPORT/EXPORT.
>
> Another option, also kind of strange, but also consistent and
> makes it clear that the table/partition disappears:
>
>     ALTER ... MOVE TABLE table_name TO PARTITION (...)
>     ALTER ... MOVE PARTITION ... TO TABLE table_name
>
> or the same with CHANGE instead of MOVE to reuse an existing keyword.

There is no MOVE symbol and we should not add a new one because of the
same compatibility reason: it will not be possible to create table
"move" (as well as to work with existing one I suspect).

That's why we found MIGRATE as a good candidate And the reasons to
have some special keyword, not just "ADD PARTITION ... FROM ..." are:

1. it is not clear whether the source table stays or not (that was the
discussion in MDEV-22165);
2. for keyword like MIGRATE it is clear how it works in both
directions and no need to remember or guess the opposite keyword;
3. (and most important) is scalability: we may want to expand the
syntax for copying the table instead of moving it so instead of
MIGRATE we put COPY and that's it: same formula, clear how it works,
easy to interchange between MIGRATE and COPY.

I've already thought about MOVE as a better keyword but sadly enough
it is not reserved. So how about the proposed syntax with MIGRATE?

>
> I'm not saying anything of the above is perfect, and better variants are
> very much welcome. But pretty much anything is better than ADD/EXTRACT.
>
> >    { "FALSE",           SYM(FALSE_SYM)},
> >    { "FAST",            SYM(FAST_SYM)},
> >    { "FAULTS",  SYM(FAULTS_SYM)},
> > diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> > index dc2eb4425f0..a41ea7d3e4a 100644
> > --- a/sql/sql_yacc.yy
> > +++ b/sql/sql_yacc.yy
> > @@ -7605,6 +7605,25 @@ alter_commands:
> >              if (Lex->stmt_alter_table_exchange_partition($6))
> >                MYSQL_YYABORT;
> >            }
> > +        | EXTRACT_SYM PARTITION_SYM alt_part_name_item
> > +          AS TABLE_SYM table_ident have_partitioning
> > +          {
> > +            LEX *lex= Lex;
> > +            lex->first_select_lex()->db= $6->db;
> > +            if (lex->first_select_lex()->db.str == NULL &&
> > +                lex->copy_db_to(&lex->first_select_lex()->db))
> > +              MYSQL_YYABORT;
> > +            if (unlikely(check_table_name($6->table.str,$6->table.length,
> > +                                          FALSE)) ||
> > +                ($6->db.str && unlikely(check_db_name((LEX_STRING*) &$6->db))))
> > +              my_yyabort_error((ER_WRONG_TABLE_NAME, MYF(0), $6->table.str));
>
> good point, LEX::stmt_alter_table_exchange_partition() doesn't seem to do it
>
> > +            lex->name= $6->table;
> > +
> > +            lex->m_sql_cmd= new (thd->mem_root) Sql_cmd_alter_table();
> > +            if (unlikely(lex->m_sql_cmd == NULL))
> > +              MYSQL_YYABORT;
> > +            lex->alter_info.partition_flags|= ALTER_PARTITION_EXTRACT;
>
> why wouldn't you combine the common code from here and EXCHANGE PARTITION?
> less code duplication and as a bonus EXCHANGE will get proper
> table/db name checks.

Updated.

>
> > +          }
> >          ;
> >
> >  remove_partitioning:
> > diff --git a/sql/sql_alter.h b/sql/sql_alter.h
> > index a0f89c28d2a..6aa65d232ca 100644
> > --- a/sql/sql_alter.h
> > +++ b/sql/sql_alter.h
> > @@ -345,6 +348,8 @@ class Alter_table_ctx
> >    /** Indicates that we are altering temporary table */
> >    bool tmp_table;
> >
> > +  DDL_LOG_STATE *ddl_log_state;
>
> What do you need this for?

Cleaned up this.

>
> > +
> >  private:
> >    char new_filename[FN_REFLEN + 1];
> >    char new_alias_buff[NAME_LEN + 1];
> > diff --git a/sql/sql_table.cc b/sql/sql_table.cc
> > index 779d125af70..033d77a827d 100644
> > --- a/sql/sql_table.cc
> > +++ b/sql/sql_table.cc
> > @@ -10016,7 +10146,7 @@ bool mysql_alter_table(THD *thd, const LEX_CSTRING *new_db,
> >      }
> >
> >      // In-place execution of ALTER TABLE for partitioning.
> > -    DBUG_RETURN(fast_alter_partition_table(thd, table, alter_info,
> > +    DBUG_RETURN(fast_alter_partition_table(thd, table, alter_info, &alter_ctx,
> >                                             create_info, table_list,
> >                                             &alter_ctx.db,
> >                                             &alter_ctx.table_name));
>
> you no longer need to pass alter_ctx.db and alter_ctx.table_name,
> if you pass the whole alter_ctx down.

Cleaned up this.

>
> > diff --git a/include/my_dbug.h b/include/my_dbug.h
> > index e25bfcf28a7..cf969ce5750 100644
> > --- a/include/my_dbug.h
> > +++ b/include/my_dbug.h
> > @@ -104,6 +104,8 @@ extern int (*dbug_sanity)(void);
> >          (_db_keyword_(0,(keyword), 0) ? (a1) : (a2))
> >  #define DBUG_EVALUATE_IF(keyword,a1,a2) \
> >          (_db_keyword_(0,(keyword), 1) ? (a1) : (a2))
> > +#define DBUG_TRUE_IF(keyword) DBUG_EVALUATE_IF(keyword, 1, 0)
> > +#define DBUG_FALSE_IF(keyword) DBUG_EVALUATE_IF(keyword, 0, 1)
>
> hmm. DBUG_FALSE_IF is not used anywhere, DBUG_TRUE_IF is used once as
>

(1):
>   if (DBUG_TRUE_IF("error_extract_partition_00") ||
>       unlikely(error= file->ha_rename_table(from_name, to_name)))
>
> and this is typically written as
>

(2):
>   if (DBUG_EVALUATE_IF("error_extract_partition_00", 1,
>         error= file->ha_rename_table(from_name, to_name)))
>
> so I see no need for either DBUG_TRUE_IF or DBUG_FALSE_IF,
> please, remove both

Please, (1) looks much better! Considering that there are no complex
expressions for a1 in the code but only 1 or 0, we should use
DBUG_TRUE_IF/DBUG_FALSE_IF instead. I can change all entries of
DBUG_EVALUATE_IF for that. This is supposed to be an independent refactoring.

>
> >  #define DBUG_PRINT(keyword,arglist) \
> >          do if (_db_pargs_(__LINE__,keyword)) _db_doprnt_ arglist; while(0)
> >
> > diff --git a/sql/sql_partition.cc b/sql/sql_partition.cc
> > index 85d880cbbb4..4172a61812e 100644
> > --- a/sql/sql_partition.cc
> > +++ b/sql/sql_partition.cc
> > @@ -5432,7 +5432,7 @@ uint prep_alter_part_table(THD *thd, TABLE *table, Alter_info *alter_info,
> >        } while (++part_count < tab_part_info->num_parts);
> >        if (num_parts_found != num_parts_dropped)
> >        {
> > -        my_error(ER_DROP_PARTITION_NON_EXISTENT, MYF(0), "DROP");
> > +        my_error(ER_DROP_PARTITION_NON_EXISTENT, MYF(0), cmd);
>
> Not part of your patch, I know, but the error message here is
>
>    "Error in list of partitions to %-.64s"
>
> shouldn't it say "Partition does not exist" or something?

Why does DROP use such wording then? I don't think here should be different
error code, but the code name itself should be something like
ER_PARTITION_NOT_EXISTS.


>
> >          goto err;
> >        }
> >        if (table->file->is_fk_defined_on_table_or_index(MAX_KEY))
> > @@ -7268,6 +7420,48 @@ uint fast_alter_partition_table(THD *thd, TABLE *table,
> >      if (alter_partition_lock_handling(lpt))
> >        goto err;
> >    }
> > +  else if (alter_info->partition_flags & ALTER_PARTITION_EXTRACT)
> > +  {
> > +    if (mysql_write_frm(lpt, WFRM_WRITE_EXTRACTED) ||
> > +        write_log_drop_shadow_frm(lpt) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_1") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_1") ||
> > +        mysql_write_frm(lpt, WFRM_WRITE_SHADOW) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_2") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_2") ||
> > +        wait_while_table_is_used(thd, table, HA_EXTRA_NOT_USED) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_3") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_3") ||
> > +        write_log_extract_partition(lpt) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_4") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_4") ||
> > +        alter_close_table(lpt) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_5") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_5") ||
> > +        alter_partition_extract(lpt) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_7") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_7") ||
> > +        (frm_install= TRUE, FALSE) ||
> > +        mysql_write_frm(lpt, WFRM_INSTALL_SHADOW|WFRM_BACKUP_ORIGINAL) ||
> > +        log_partition_alter_to_ddl_log(lpt) ||
> > +        (frm_install= FALSE, FALSE) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_8") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_8") ||
> > +        (ddl_log_complete(lpt->part_info), false) ||
>
> as far as I understand this ^^^ should be done *after* binlogging.
> I suspect that if you crash here, blnlog won't get this ALTER TABLE.

Right, I fixed that by reordering calls and using XID.

>
> > +        ((!thd->lex->no_write_to_binlog) &&
> > +         (write_bin_log(thd, FALSE,
> > +                        thd->query(), thd->query_length()), FALSE)) ||
>
> if you crash here, what will roll forward on recovery and
> drop the backup? The log is already closed here, isn't it?
>
> I don't understand why you need a backup. Why not to install shadow
> after the binlog was written?

If we install shadow after binlog was written and before
ddl_log_complete() and it crashes after binlog was written and before
ddl_log_complete() the chain will be closed via XID and we are left
with (or without) original frm and without installed frm. Let's
imagine we do not close chain via XID, then it is replayed back and we
are left with master-slave inconsistency. There are more examples of
how it can crash and how we can try to recover, but the simplest and
most reliable implementation is via WFRM_BACKUP_ORIGINAL. If you want
to discuss more, let's do that in chat.

>
> > +        mysql_write_frm(lpt, WFRM_DROP_BACKUP) ||
> > +        ERROR_INJECT_CRASH("crash_extract_partition_9") ||
> > +        ERROR_INJECT_ERROR("fail_extract_partition_9"))
> > +    {
> > +      (void) ddl_log_revert(thd, lpt->part_info);
> > +      handle_alter_part_error(lpt, true, true, frm_install);
> > +      goto err;
> > +    }
> > +    if (alter_partition_lock_handling(lpt))
> > +      goto err;
> > +  }
> >    else if ((alter_info->partition_flags & ALTER_PARTITION_ADD) &&
> >             (part_info->part_type == RANGE_PARTITION ||
> >              part_info->part_type == LIST_PARTITION))
> > diff --git a/mysql-test/include/lcase_names.inc b/mysql-test/include/lcase_names.inc
> > new file mode 100644
> > index 00000000000..e69de29bb2d
>
> better put here some comment, like in other dummy *.inc files
> see e.g. platform.inc or protocol.inc

Done.

>
> > diff --git a/mysql-test/suite/atomic/alter_partition,aria.rdiff b/mysql-test/suite/atomic/alter_partition,aria.rdiff
> > new file mode 100644
> > index 00000000000..20c0510c5ed
> > --- /dev/null
> > +++ b/mysql-test/suite/atomic/alter_partition,aria.rdiff
> > @@ -0,0 +1,219 @@
> > +--- alter_partition.result
> > ++++ alter_partition,aria.reject
> > +@@ -1,4 +1,4 @@
> > +-set @@default_storage_engine=MyISAM;
> > ++set @@default_storage_engine=Aria;
> > + # Crash recovery
> > + create or replace procedure prepare_table()
> > + begin
> > +@@ -13,12 +13,12 @@
> > + end $
> > + # QUERY: ALTER TABLE t1 EXTRACT PARTITION p1 AS TABLE tp1
> > + # CRASH POINT: ddl_log_create_before_create_frm
> > +-t1#P#p0.MYD
> > +-t1#P#p0.MYI
> > +-t1#P#p1.MYD
>
> may be better `replace_result MAI MYI MAD MYD` ?
> if that's the only difference. To avoid big rdiff files.

Done.

>
> > +-t1#P#p1.MYI
> > +-t1#P#pn.MYD
> > +-t1#P#pn.MYI
> > ++t1#P#p0.MAD
> > ++t1#P#p0.MAI
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and security@xxxxxxxxxxx



--
All the best,

Aleksey Midenkov
@midenok


Follow ups

References