maria-developers team mailing list archive

Re: e5fc78f84e3: MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker

 

Hello Sergei,

Good Morning.

Thanks for the review comments. Please find my replies inline.

On 21/03/21 8:21 pm, Sergei Golubchik wrote:
Hi, Sujatha!

Could you split this patch, please?


Sure.


1. Just add replication_applier_status_by_worker table. With
    CHANNEL_NAME column and (perhaps, not sure if it applies) with
    WORKER_ID column. Without extra columns and backup.

The reason for not including CHANNEL_NAME and WORKER_ID is that multi-source parallel replication works a bit differently in MySQL.

Please find the following information.

==> https://dev.mysql.com/doc/refman/5.7/en/replication-channels.html

        A multi-source replica can also be set up as a multi-threaded replica, by setting the slave_parallel_workers system variable to a value greater than 0. When you do this on a multi-source replica, each channel on the replica has the specified number of applier threads, plus a coordinator thread to manage them. You cannot configure the number of applier threads for individual channels.

In MySQL, worker threads are dedicated to a particular channel.

In MariaDB "The pool of replication worker threads is shared among all multi-source master connections, and among all replication domains that can replicate in parallel using out-of-order".

Because of this I didn't include CHANNEL_NAME. In MariaDB, Slave_worker threads don't have a WORKER_ID; they only have a 'thread_id'.


2. Add extra columns.


Ack.


3. Add backup.

With these three commits you'll have exactly the same diff as now, just
split (and with CHANNEL_NAME column).

But really, I wonder whether this backup functionality is needed at all?
In MySQL there is persistent information about these workers, it doesn't
go away when they're stopped, it's stored persistently in a table,
indexed by worker_id. If we don't have anything persistent like that and
all workers completely disappear into oblivion, then, may be,
replication_applier_status_by_worker should not show anything when they
aren't running?


MySQL doesn't persist the worker information in a table. When workers are stopped
due to an error or STOP SLAVE, the worker information is copied (backed up) and retained
until the next START SLAVE. Please find the following snippets.

File Name:  sql/rpl_rli_pdb.cc
=======
static void slave_stop_workers(Relay_log_info *rli, bool *mts_inited) {
....

    /*
      Make copies for reporting through the performance schema tables.
      This is preserved until the next START SLAVE.
    */
    Slave_worker *worker_copy = new Slave_worker(
        nullptr,
#ifdef HAVE_PSI_INTERFACE
        &key_relay_log_info_run_lock, &key_relay_log_info_data_lock,
        &key_relay_log_info_sleep_lock, &key_relay_log_info_thd_lock,
        &key_relay_log_info_data_cond, &key_relay_log_info_start_cond,
        &key_relay_log_info_stop_cond, &key_relay_log_info_sleep_cond,
#endif
        w->id, rli->get_channel());
    worker_copy->copy_values_for_PFS(w->id, w->running_status, w->info_thd,
                                     w->last_error(),
                                     w->get_gtid_monitoring_info());
    rli->workers_copy_pfs.push_back(worker_copy);
  }


/*
  This function is used to make a copy of the worker object before we
  destroy it while STOP SLAVE. This new object is then used to report the
  worker status until next START SLAVE following which the new worker objetcs
  will be used.
*/
void Slave_worker::copy_values_for_PFS(ulong worker_id,
                                       en_running_state thd_running_status,
                                       THD *worker_thd, const Error &last_error,
                                       Gtid_monitoring_info *monitoring_info) {
  id = worker_id;
  running_status = thd_running_status;
  info_thd = worker_thd;
  m_last_error = last_error;
  monitoring_info->copy_info_to(get_gtid_monitoring_info());
}


Please provide your suggestion.


Thank you

S.Sujatha


If you agree, then you'll only need two commits.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx

On Mar 21, Sujatha wrote:
revision-id: e5fc78f84e3 (mariadb-10.5.2-303-ge5fc78f84e3)
parent(s): 8b8969929d7
author: Sujatha <sujatha.sivakumar@xxxxxxxxxxx>
committer: Sujatha <sujatha.sivakumar@xxxxxxxxxxx>
timestamp: 2020-11-27 12:59:42 +0530
message:

MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker

Fix:
===
Iterate through rpl_parallel_thread_pool and display slave worker thread
specific information as part of 'replication_applier_status_by_worker' table.

---------------------------------------------------------------------------------
|Column Name:           |        Description:                                   |
|-------------------------------------------------------------------------------|
|                       |                                                       |
|THREAD_ID              | Thread_Id as displayed in 'performance_schema.threads'|
|                       | table for thread with name                            |
|                       | 'thread/sql/rpl_parallel_thread'                      |
|                       |                                                       |
|                       | THREAD_ID will be NULL when worker threads are stopped|
|                       | due to error/force stop                               |
|                       |                                                       |
|SERVICE_STATE          | Thread is running or not                              |
|                       |                                                       |
|LAST_SEEN_TRANSACTION  | Last GTID executed by worker                          |
|                       |                                                       |
|LAST_ERROR_NUMBER      | Last Error that occurred on a particular worker       |
|                       |                                                       |
|LAST_ERROR_MESSAGE     | Last error specific message                           |
|                       |                                                       |
|LAST_ERROR_TIMESTAMP   | Time stamp of last error                              |
|                       |                                                       |
|WORKER_IDLE_TIME       | Total idle time in seconds that the worker thread has |
|                       | spent waiting for work from SQL thread                |
|                       |                                                       |
|LAST_TRANS_RETRY_COUNT | Total number of retries attempted by last transaction |
---------------------------------------------------------------------------------


If STOP SLAVE is executed, the worker threads are gone, so querying the
table at this stage returns no rows. To address this, when worker threads
are about to stop, due to an error or a forced stop, create a backup pool
and preserve the data needed to populate the performance schema table.
Clear the backup pool upon slave start.
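
The backup idea described above can be sketched roughly as follows. This is
only an illustration under simplified assumptions, not the code from the
patch: WorkerRow, Worker, WorkerPool and all of their members are
hypothetical names, and the real rpl_parallel_thread_pool carries far more
state. It shows the three steps: copy the reportable fields when workers
stop, serve table rows from that copy while nothing is running, and discard
the copy on the next start.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One row of the status table. All names here are hypothetical.
struct WorkerRow {
  bool thread_id_is_null;     // true once the worker thread is gone
  unsigned long thread_id;    // meaningful only if thread_id_is_null is false
  bool running;               // corresponds to SERVICE_STATE
  std::string last_seen_transaction;
  unsigned last_error_number;
  std::string last_error_message;
};

// Live per-worker state, much smaller than the real structure.
struct Worker {
  unsigned long thread_id;
  std::string last_seen_transaction;
  unsigned last_error_number;
  std::string last_error_message;
};

class WorkerPool {
 public:
  // Slave start: drop the backup, then create fresh workers.
  void start(std::size_t count) {
    backup_.clear();
    live_.clear();
    for (std::size_t i = 0; i < count; ++i) {
      live_.push_back(Worker{next_thread_id_++, "", 0, ""});
    }
  }

  // Stop (error or forced): copy what the table needs, then drop the workers.
  void stop() {
    for (const Worker &w : live_) {
      backup_.push_back(WorkerRow{true, 0, false, w.last_seen_transaction,
                                  w.last_error_number, w.last_error_message});
    }
    live_.clear();
  }

  // Rows for the table: live workers if any exist, otherwise the backup.
  std::vector<WorkerRow> rows() const {
    if (live_.empty()) return backup_;
    std::vector<WorkerRow> out;
    for (const Worker &w : live_) {
      out.push_back(WorkerRow{false, w.thread_id, true,
                              w.last_seen_transaction, w.last_error_number,
                              w.last_error_message});
    }
    return out;
  }

  // Access to a live worker; throws std::out_of_range if i is invalid.
  Worker &worker(std::size_t i) { return live_.at(i); }

 private:
  std::vector<Worker> live_;
  std::vector<WorkerRow> backup_;
  unsigned long next_thread_id_ = 1;
};
```

In this sketch a stopped worker is reported with a NULL THREAD_ID and a
non-running state while keeping its last seen transaction and last error,
which matches the column descriptions in the table above.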

