maria-developers team mailing list archive

Parallel replication MDEV-4506

To: maria-developers@xxxxxxxxxxxxxxxxxxx
From: Michael Widenius <monty@xxxxxxxxxxxx>
Date: Mon, 14 Oct 2013 20:54:48 +0300
Cc: knielsen@xxxxxxxxxxxxxxx
Reply-to: monty@xxxxxxxxxxxx

Hi!

Just a quick status update on parallel replication in MariaDB 10.0.

Yesterday I pushed changes to the 10.0-knielsen tree that make many
of the replication variables thread safe. Today I will push a patch
that should fix the rest of the variables.

Here is the changelog entry for this:

Fixes for parallel slave:
- Made the slave's temporary tables safe for multi-threaded slaves by adding a mutex around save_temporary_tables usage.
- rli->save_temporary_tables is the active list of all used temporary tables
- This is copied to THD->temporary_tables when temporary tables are opened and updated when temporary tables are closed
- Added THD->lock_temporary_tables() and THD->unlock_temporary_tables() to simplify this.
- Relay_log_info->sql_thd renamed to Relay_log_info->sql_driver_thd to avoid incorrect usage in merged code.
- Added is_part_of_group() to mark events that are part of the next event. This replaces setting IN_STMT when events are executed.
- Added is_begin(), is_commit() and is_rollback() functions to Query_log_event to simplify code.
- If slave_skip_counter is set, run in single-threaded mode. This simplifies the code for skipping events.
- Updating the state of the relay log (IN_STMT and IN_TRANSACTION) is moved into one single function: update_state_of_relay_log().
  We can't use OPTION_BEGIN to check for the state anymore, as the sql_driver and sql execution threads may be different.
  Clear IN_STMT and IN_TRANSACTION in init_relay_log_pos() and Relay_log_info::cleanup_context() to ensure the flags don't survive slave restarts.
  is_in_group() is now independent of the state of the executed transaction.
- Reset thd->transaction.all.modified_non_trans_table() if we set it for single-table row events.
  This is mainly to keep the flag consistent with its documentation.
- Changed slave_open_temp_tables to uint32 to be able to use atomic operations on it.
- Relay_log_info::sleep_lock -> rpl_group_info::sleep_lock
- Relay_log_info::sleep_cond -> rpl_group_info::sleep_cond
- Changed some functions to take rpl_group_info instead of Relay_log_info to make them multi-slave safe and to simplify usage:
  - do_shall_skip()
  - continue_group()
  - sql_slave_killed()
  - next_event()
- Simplified the arguments to io_slave_killed(), check_io_slave_killed() and sql_slave_killed(); there is no reason to supply THD, as it is part of the given structure.
- Removed set_thd_in_use_temporary_tables(), as in_use is now set on usage.
- Added information to thd_proc_info() about which thread is waiting for the slave mutex to exit.
- open_table() now reuses code from find_temporary_table().

Other things:
- More DBUG statements
- Fixed rpl_incident.test so that it can be run with --debug
- More comments
- Disabled the unused function rpl_connect_master()

The TODO for parallel replication is documented at the top of rpl_parallel.cc.

Here is the comment:
-------

- Error handling. If we fail in one of multiple parallel executions, we
need to make a best effort to complete prior transactions and roll back
following transactions, so the slave binlog position will be correct.
Also all the retry logic for temporary errors like deadlocks.

- Stopping the slave needs to handle stopping all parallel executions. And
the logic in sql_slave_killed() that waits for current event group to
complete needs to be extended appropriately...

- Audit the use of Relay_log_info::data_lock. Make sure it is held
correctly in all needed places also when using parallel replication.

- We need some user-configurable limit on how far ahead the SQL thread will
fetch and queue events for parallel execution (otherwise if slave gets
behind we will fill up memory with pending malloc()'ed events).

- Fix update of relay-log.info and master.info. In non-GTID replication,
they must be serialised to preserve correctness. In GTID replication, we
should not update them at all except at slave thread stop.

- All the waits (e.g. in struct wait_for_commit and in
rpl_parallel_thread_pool::get_thread()) need to be killable. And on kill,
everything needs to be correctly rolled back and stopped in all threads,
to ensure a consistent slave replication state.

- Handle the case of a partial event group. This occurs when the master
crashes in the middle of writing the event group to the binlog. The
slave rolls back the transaction; parallel execution needs to be able
to deal with this wrt. commit_orderer and such.

- We should notice if the master doesn't support GTID, and then run in
single threaded mode against that master. This is needed to be able to
support multi-master replication with old and new masters.

- Retry of failed transactions is not yet implemented for the parallel case.
----------

What this means is:

In theory things should work, as long as there are no problems in the
binary or relay log and we don't fill up memory with too many events.

We will merge this branch to 10.0-base and then to 10.0 within the
next few days and start testing it. We also plan to release a beta
of 10.0 with this code ASAP, as 10.0 is now feature complete.

During the beta phase we will fix the above outstanding issues.
Kristian will start working on the error handling this week. After
that we have to look at the memory consumption (i.e., not give the sql
execution threads more work than they can handle).

I will shortly update the following article with information about how
to use parallel replication.
https://mariadb.com/kb/en/parallel-replication/

The task itself is documented at:
https://mariadb.atlassian.net/browse/MDEV-4506

Regards,
Monty