
maria-developers team mailing list archive

Re: [Commits] 2f4a0c5be2c: Fix accumulation of old rows in mysql.gtid_slave_pos


andrei.elkin@xxxxxxxxxx writes:

> Why don't we defer the current eager/optimistic discarding of old sub-id
> records in rpl_slave_state::record_gtid()

>   mysql_mutex_lock(&LOCK_slave_state);

>   if ((elist= elem->grab_list()) != NULL)
>   {
>     /* Delete any old stuff, but keep around the most recent one. */

>   mysql_mutex_unlock(&LOCK_slave_state);

> till we know the transaction will be committed? E.g.:
>
>  Xid_log_event::do_apply_event
>    rpl_slave_state::update_state_hash
>      rpl_slave_state::update

We need to have those records removed from the state immediately, otherwise
the next record_gtid() (from another parallel replication worker) will try
to grab and delete the same records, leading to extra redundant delete calls
and loss of efficiency. We cannot hold the lock on LOCK_slave_state until we
have committed, of course, as that would severely reduce parallelism.
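
As an illustration of that first point, here is a tiny self-contained C++
sketch (made-up names, std::mutex instead of the real LOCK_slave_state and
rpl_slave_state structures) of why grab_list() must detach the whole pending
list atomically under the lock: the first worker to call it owns everything,
and a concurrent worker simply finds an empty list instead of racing to
delete the same rows.

  #include <cstdint>
  #include <cstdio>
  #include <list>
  #include <mutex>

  /* Hypothetical stand-in for the per-domain element in the slave state
     hash; illustration only, not the actual rpl_slave_state code. */
  struct gtid_element
  {
    std::list<uint64_t> old_sub_ids;    // sub_ids whose rows may be deleted

    /* Detach the whole pending list; caller must hold the state lock. */
    std::list<uint64_t> grab_list()
    {
      std::list<uint64_t> taken;
      taken.swap(old_sub_ids);          // old_sub_ids is now empty
      return taken;
    }
  };

  static std::mutex LOCK_slave_state_demo;   // models LOCK_slave_state

  /* What each worker's record_gtid() would do (illustrative only). */
  static std::list<uint64_t> take_pending_deletes(gtid_element &elem)
  {
    std::lock_guard<std::mutex> guard(LOCK_slave_state_demo);
    /* Whichever worker gets here first takes the entire list; a second,
       concurrent worker sees an empty list and issues no DELETEs at all. */
    return elem.grab_list();
  }

  int main()
  {
    gtid_element elem;
    elem.old_sub_ids= {10, 11, 12};
    printf("first grab: %zu rows\n", take_pending_deletes(elem).size());   // 3
    printf("second grab: %zu rows\n", take_pending_deletes(elem).size());  // 0
    return 0;
  }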

And we cannot defer the deletion of records until rpl_slave_state::update(),
as that would require starting another transaction, which is again
inefficient.

However, I do like the idea of deferring the handling of deletions from
record_gtid() to rpl_slave_state::update(). I have never been entirely happy
about the overhead of one row deletion in every replicated transaction.

Maybe the deletion of old rows (and update of gtid state) could be done in
the replication background thread instead (handle_slave_background())?
In rpl_slave_state::update(), we will increment a counter. When this counter
reaches 100 (or some configurable value), we will signal the background
thread to grab the list and delete all of it in one go.
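
To make that concrete, here is a rough, self-contained sketch of the
counter-and-signal mechanism I have in mind. It is purely illustrative: all
the names (pending_gtid_deletes, note_old_row, background_delete_loop,
gtid_cleanup_batch_size) are made up, and it uses standard C++ threads and
condition variables where the real patch would use LOCK_slave_state and
handle_slave_background(). The size of the pending list doubles as the
counter; once it reaches the batch size, the background thread is woken and
drains the whole list in one pass.

  #include <condition_variable>
  #include <cstdint>
  #include <list>
  #include <mutex>
  #include <thread>

  /* All names below are made up for illustration; not MariaDB code. */
  static const size_t gtid_cleanup_batch_size= 100;  // the configurable value

  struct pending_gtid_deletes
  {
    std::mutex lock;                    // stands in for LOCK_slave_state
    std::condition_variable cond;
    std::list<uint64_t> old_sub_ids;    // rows waiting to be deleted; its
                                        // size acts as the counter
    bool shutdown= false;
  };

  /* Called from (the equivalent of) rpl_slave_state::update() at commit. */
  static void note_old_row(pending_gtid_deletes &p, uint64_t sub_id)
  {
    std::lock_guard<std::mutex> guard(p.lock);
    p.old_sub_ids.push_back(sub_id);
    if (p.old_sub_ids.size() >= gtid_cleanup_batch_size)
      p.cond.notify_one();              // wake the background thread
  }

  /* Loop run by the background thread (handle_slave_background() analogue). */
  static void background_delete_loop(pending_gtid_deletes &p)
  {
    for (;;)
    {
      std::list<uint64_t> batch;
      {
        std::unique_lock<std::mutex> guard(p.lock);
        p.cond.wait(guard, [&p] {
          return p.shutdown ||
                 p.old_sub_ids.size() >= gtid_cleanup_batch_size;
        });
        if (p.shutdown && p.old_sub_ids.empty())
          return;
        batch.swap(p.old_sub_ids);      // grab the whole list in one go
      }
      /* The real implementation would here run a single DELETE covering
         every sub_id in `batch`, in one transaction, instead of one row
         deletion per replicated transaction. */
      (void)batch;
    }
  }

  int main()
  {
    pending_gtid_deletes p;
    std::thread bg([&p] { background_delete_loop(p); });
    for (uint64_t sub_id= 1; sub_id <= 250; sub_id++)
      note_old_row(p, sub_id);          // queue a few batches' worth of rows
    {
      std::lock_guard<std::mutex> guard(p.lock);
      p.shutdown= true;                 // flush the remainder and stop
    }
    p.cond.notify_one();
    bg.join();
    return 0;
  }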

This way we batch up more row deletions and avoid the DELETE overhead in the
time-critical record_gtid(), which is probably more efficient overall. We do
introduce some extra transactions, but that can be minimised by adjusting the
batch size (e.g. with a batch size of 100, only one extra background
transaction per 100 replicated transactions, i.e. just 1% more).

I can try to create a patch to do this.

Maybe moving deletions from record_gtid() into the background thread is too
big a change for 10.1/10.2? But we could use the current patch for 10.1,
and do the more complicated patch only in 10.3 (or 10.4?).

What do you think?

 - Kristian.

