← Back to team overview

maria-developers team mailing list archive

Re: Lag Free ALTER Table for Slave

 

GitHub branch is still in development phase , I will update it as soon as I
implement more features , although r_ttt test in rpl suite can be runned.

On Wed, Jan 22, 2020, 5:59 PM Sachin Setiya <sachin.setiya@xxxxxxxxxxx>
wrote:

> Hi Everyone!
>
> This mail will describe usage and implementation of Lag Free Alter. And
> this
> will also answer the question raised by Kristian and Simon J Mudd(in 2016)
>
> Desc:- This will split Alter into 2 different commits. START ALTER and
> COMMIT
> /ROLLBACK ALTER , Start Alter will be written in binlog as soon as we get
> the
> locks for the table, alter will proceeds as usual and at the time of
> writing
> binlog if alter is successful we will write COMMIT Alter other wise
> ROLLBACK
> Alter.
>
> Usage:- Using this feature is quite simple.
>   1. On master you have to turn on `BINLOG_SPLIT_ALTER` dynamic variable.
>   2. Slave must be using parallel replication.
>
> Advance Usage:-
> So alter is divided like this.
> 1. START identifier Actual_alter_stmt
> 2. COMMIT/ROLLBACK identifier Actual_alter_stmt. OR
> 2. COMMIT/ROLLBACK identifier ALTER
>
> identifier is thread_id.
>
> Questions by Simon Mudd.
> >>* this behaviour should be configurable?
> Yes.
> >>   - new global variable on the master to allow injection of this
> changed event stream?
> Right , `BINLOG_SPLIT_ALTER`
> >>   - a new session variable set in the session where the command is
> triggered ?
> Right , `BINLOG_SPLIT_ALTER`
> >>   - on slave a setting to indicate how many INPLACE  ALTER TABLEs can
> run at once?
> No setting so far , but I am thinking of maximum no of CONCURRENT ALTER =
> slave_parallel_threads
> >>* how does a DBA monitor what’s going on?
> >> - the progress and number of active DDL statements
> So as we there is 2 part, So progress will go like this
> 1. Executing of start alter (this will take most of time)
> 2. Waiting for commit/rollback Signal
> 3. Commit/ Rollback Alter.
> Number of active ALTER , these will create new threads so DBA can know
> using this
> or I am thinking of adding variable in SHOW SLAVE INFO. which will
> show active DDL.
> >> - please consider adding counters/metrics for:
> >>   * number of “asynchronous DDLs”  in progress / completed successfully
> / failed or rolled back
> Okay, We can have counter for these metrics. If i get time to implement
> this.
> >>   * sizes of the DDL changes made so far
> Not sure if we need this.
> >>   * number of threads busy in this state (maybe can be implied from
> SHOW PROCESSLIST or equivalent but nicer to have an easy to query metric)
> Show processlist will show the busy threads.
> Sample Output when concurrent alter is running on slave.
> slave_parallel_threads= 10
> Every 1.0s: mysql -uroot -S
> /home/sachin/alter/build/mysql-test/var/tmp/mysqld.2.sock -e show
> processlist
> sachin-sp52: Wed Jan 22 16:56:00 2020
>
> Id      User    Host    db      Command Time    State   Info    Progress
> 1       system user             NULL    Daemon  NULL    InnoDB purge
> coordinator        NULL    0.000
> 3       system user             NULL    Daemon  NULL    InnoDB purge
> worker     NULL    0.000
> 2       system user             NULL    Daemon  NULL    InnoDB purge
> worker     NULL    0.000
> 4       system user             NULL    Daemon  NULL    InnoDB purge
> worker     NULL    0.000
> 5       system user             NULL    Daemon  NULL    InnoDB
> shutdown handler NULL    0.000
> 10      root    localhost:44478 test    Sleep   6               NULL
> 0.000
> 11      root    localhost:44480 test    Sleep   7               NULL
> 0.000
> 15      root    localhost:44492 test    Sleep   6               NULL
> 0.000
> 16      root    localhost:44494 test    Sleep   6               NULL
> 0.000
> 17      system user             NULL    Slave_IO        6
> Waiting for master to send event        NULL    0.000
> 19      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 22      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 20      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 23      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 24      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 25      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 26      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 21      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 27      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 28      system user             NULL    Slave_worker    6
> Waiting for work from SQL thread        NULL    0.000
> 18      system user             NULL    Slave_SQL       6       Slave
> has read all relay log; waiting for the slave I/O thread to update it
>    NULL    0.000
> 29      system user             test    Slave_worker    6       NULL
>  /*!100001 START 19 alter table t3 add column c int, force,
> algorithm=inplace */ 0.000
> 30      system user             test    Slave_worker    6       NULL
>  /*!100001 START 17 alter table t1 add column c int, force,
> algorithm=inplace */ 0.000
> 31      system user             test    Slave_worker    6       NULL
>  /*!100001 START 18 alter table t2 add column c int, force,
> algorithm=inplace */ 0.000
> 32      system user             test    Slave_worker    6       NULL
>  /*!100001 START 20 alter table t4 add column c int, force,
> algorithm=inplace */ 0.000
> 33      system user             test    Slave_worker    6       NULL
>  /*!100001 START 21 alter table t5 add column c int, force,
> algorithm=inplace */ 0.000
> 34      root    localhost       NULL    Query   0       Init    show
> processlist        0.000
>
> >>* what happens if the commit/rollback ddl event never arrives? (the
> statement on the master could be skipped for one of several reasons)
> So here are 2 options
> If slave is still running DBA can manually run
> COMMIT/ROLLBACK identifier alter_stmt/ALTER , where identifier is same as
> START ALTER identifier.
> so this will either commit or rollback alter.
> If slave is not running.
> Since slave is not running all context is lost , so this will take as much
> as
> time a normal alter.
> >> - I assume that users on the slave see the old structure all the time
> until it completes
> Right, we work on copy of alter for myisam and for other case innodb will
> take
> care of it.
> >> - would this have a storage or performance penalty on the slave if the
> commit/rollback DDL event never arrives?
> Not performance but yes storage and ram penalty.
> >> - can a “commit” / “rollback” be manually triggered by the DBA in such
> circumstances?
> yes.
> >>* what happens if the server crashes while this process (or these
> processes) are ongoing when the server starts up again?
> So if we crash in START ALTER no issue , we can run it again , since it is
> transactional(we are working on copy so ..)
> If we crash after start alter and we receive COMMIT alter then it will be
> treated as normal alter.
> ROLLBACK ALTER will be instant.
> If we crash in COMMIT ALTER this will be same as crash in normal ALTER
> table ,
> so DBA has to deal with it.
>
> Architecture:-
> Lets look at sample alter perf data for innodb and myisam
> Myisam
> +   96.72%     0.00%  mysqld   mysqld               [.] mysql_parse
> +   96.72%     0.00%  mysqld   mysqld               [.]
> mysql_execute_command
> +   96.72%     0.00%  mysqld   mysqld               [.]
> Sql_cmd_alter_table::execute
> -   96.72%     0.00%  mysqld   mysqld               [.] mysql_alter_table
>    - mysql_alter_table
>       - 96.63% copy_data_between_tables
>          + 79.64% handler::ha_write_row
>          + 6.24% TABLE::update_default_fields
>          + 5.71% READ_RECORD::read_record
>          + 1.87% do_copy_null
>            1.05% Field::do_field_int
>          + 0.95% _mcount
> InnoDB
> +   41.27%     0.00%  mysqld   mysqld               [.] mysql_parse
> +   41.27%     0.00%  mysqld   mysqld               [.]
> mysql_execute_command
> +   41.27%     0.00%  mysqld   mysqld               [.]
> Sql_cmd_alter_table::execute
> -   41.27%     0.00%  mysqld   mysqld               [.] mysql_alter_table
>    - mysql_alter_table
>       - 41.27% mysql_inplace_alter_table
>          - 41.15% handler::ha_inplace_alter_table
>               ha_innobase::inplace_alter_table
>             - row_merge_build_indexes
>                - 41.00% row_merge_read_clustered_index
>                   + 22.88% row_merge_insert_index_tuples
>                   + 5.89% row_build_w_add_vcol
>                   + 5.20% BtrBulk::finish
>                   + 3.20% row_merge_buf_add
>                   + 0.87% page_cur_move_to_next
>                     0.62% rec_get_offsets_func
> So as we can see most of the work is done by copy_data_between_tables/
> mysql_inplace_alter_table.
> Master Side:-
> There is not much change into master side.
> So after locking table and before executing these functions we write
> START _id_ alter_stmt into binlog.
> And at the time of write_bin_log we write
> COMMIT _id_ alter_stmt into binlog.
> So alter statement will be divided into 2 , hence 2 gtid. START_ALTER will
> have special flag FL_START_ALTER_E1(4), this will be used on slave side to
> create new worker thread for start alter processing.
>
> Slave Side:-
> This require parallel replication on slave side.
> So in do_event when we get a gtid_event with FL_START_ALTER_E1,
> ::choose_thread
> will call rpl_parallel_add_extra_worker, which will create a new worker
> thread
> and this will do start alter processing. So the gtid_log_event and next
> Query_log_event will be scheduled to this new thread. And this thread will
> be
> exited as soon as alter is finished.
> On slave side START ALTER will binlog after successfully getting MDL lock
> and
> thd lock. Untill this point we are executed as DDL(so new GCO), but after
> getting locks we will call finish_event_group. So that new events can
> proceed in
> parallel.
> Then it will continue to execute code in mysql_alter_table until
> we reach a non transactional part(like renaming table/dropping table in
> myisam)
> So before executing NON-Transactional part it will wait signal from other
> worker
> thread to either abort or proceed forward. We will add a entry into mi->
> start_alter_list, with thread_id(of master) as a key.
>
> COMMIT/ROLLBACK Alter is treated as a normal query_log_event so it will be
> assigned normal worker. This command will take SQLCOM_COMMIT_PREVIOUS path
> in
> mysql_execute_command. We will simple search for thread_id into
> start_alter_list
> and change status from ::WAITING to ::COMMIT_ALTER and signal the wait
> condition
> if we dont find the thread_id we will wait on wait condition, So
> simple it is just
> consumer producer between start alter and commit/rollback.
>
> Questions by Kristian:-
> >Can you explain what you mean by "true LOCK=NONE"? It is not clear from
> your
> >description. I think you mean something that will allow to run an ALTER
> >TABLE on the slave in parallel with the same statement on the master?
> Yes
> >There will be a number of things to consider with this. Previously, all
> >transactions in the binlog have been atomic; this introduces related
> events
> >that can be arbitrarily separated in the binlog.
> >For example, suppose the slave stops in the middle between BEGIN_DDL_EVENT
> >and COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. What is then the slave's binlog
> >position / GTID position?
> Now we are using 2 gtid for ALTER , So it wont be issue.
> >Hopefully, the exiting pool of slave worker threads (currently used for
> >parallel replication) can be used for this as well? Doesn't seem right to
> >introduce a new kind of thread pool...
> We are using same(global) thread pool but I am creating new threads for
> START
> ALTER. Otherwise we will have a deadlock , Suppose if we have 5 concurrent
> START
> ALTER and just 5 slave-worker-threads , all worker will be waiting for
> COMMIT/ROLLBACK, but we cant execute commit rollback because we dont have
> any
> free worker. There can be more case when we have concurrent DML too ,
> so creating
> a new thread was the safest option.
> >Won't you need some mechanism to prevent following events to access the
> >table before ALTER TABLE in a worker thread can acquire the metadata lock?
> Right, So Untill Locking START ALTER is executed as DDL so no following
> event
> will be executed before locking.
> And one more thing regarding locks , So there will we only one thread will
> be
> doing whole work of alter, (we can just assume that there is some
> sleep on slave)
> so execution on master and slave will be equivalent.
> >There are some nasty issues to be aware of with potential deadlocks
> related
> >to FLUSH TABLES WITH GLOBAL READ LOCK and such, when multiple threads are
> >trying to coordinate waiting and locking; there were already some quite
> hard
> >issues with this for parallel replication.
> I have to test for it.
>
> Code branch bb-10.5-olter
> Jira (Mdev-11675(https://jira.mariadb.org/browse/MDEV-11675))
> Regards
> Sachin
>
>
> --
> Regards
> Sachin Setiya
> Software Engineer at  MariaDB
>

References