maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #11454
Re: [Commits] 2f4a0c5be2c: Fix accumulation of old rows in mysql.gtid_slave_pos
Kristian, salute.
> Hi Andrei!
>
> Can you review this patch?
Thanks for taking care of this critical and complicated issue!
I studied the patch to have understand its idea of basically
implementing a roll-back to the slave gtid state as the slave
transaction participant resource.
This would be really a way to go unless you will dismiss the following
tip.
Why won't we defer the current eager/optimistic old sub-id records
discard in rpl_slave_state::record_gtid()
mysql_mutex_lock(&LOCK_slave_state);
if ((elem= get_element(gtid->domain_id)) == NULL)
{
mysql_mutex_unlock(&LOCK_slave_state);
my_error(ER_OUT_OF_RESOURCES, MYF(0));
err= 1;
goto end;
}
if ((elist= elem->grab_list()) != NULL)
{
/* Delete any old stuff, but keep around the most recent one. */
uint64 best_sub_id= elist->sub_id;
list_element **best_ptr_ptr= &elist;
cur= elist;
while ((next= cur->next))
{
if (next->sub_id > best_sub_id)
{
best_sub_id= next->sub_id;
best_ptr_ptr= &cur->next;
}
cur= next;
}
/*
Delete the highest sub_id element from the old list, and put it back as
the single-element new list.
*/
cur= *best_ptr_ptr;
*best_ptr_ptr= cur->next;
cur->next= NULL;
elem->list= cur;
}
mysql_mutex_unlock(&LOCK_slave_state);
till we know the transaction will be committed. E.g:
Xid_log_event::do_apply_event
rpl_slave_state::update_state_hash
rpl_slave_state::update
the stack bottom looks appropriate place to me, but I may miss
some background or negative use cases.
Doing it this way makes the slave gtid state to be updated right after
gtid_slave_pos insert+delete records are committed, while currently it's
in the opposite order. That is neither one provides atomicity of
updating the two objects. Should not really matter, right?
Cheers,
Andrei
>
> This fixes a rather serious bug in parallel replication that was reported on
> maria-discuss@ here: https://lists.launchpad.net/maria-discuss/msg05253.html
>
> The user ended up with millions of rows in mysql.gtid_slave_pos, which is
> quite bad.
>
> I think this should go in 10.1. The bug is probably also present in 10.0,
> but it will be much harder to trigger since 10.0 does not have optimistic
> parallel replication. So leaving it unfixed in 10.0 seems ok.
>
> I plan to manually merge it to 10.2 and 10.3 as well, since the merge to
> 10.3 is not entirely trivial, due to interaction with
> @@gtid_pos_auto_engines.
>
> Github branch here:
>
> https://github.com/knielsen/server/commits/gtid_table_garbage_rows
>
> - Kristian.
>
> Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:
>
>> revision-id: 2f4a0c5be2c5d5153c4253a49ba8820ab333a9a0 (mariadb-10.1.35-71-g2f4a0c5be2c)
>> parent(s): 1fc5a6f30c3a9c047dcf9a36b00026d98f286f6b
>> author: Kristian Nielsen
>> committer: Kristian Nielsen
>> timestamp: 2018-10-07 18:59:52 +0200
>> message:
>>
>> Fix accumulation of old rows in mysql.gtid_slave_pos
>>
>> This would happen especially in optimistic parallel replication, where there
>> is a good chance that a transaction will be rolled back (due to conflicts)
>> after it has executed record_gtid(). If the transaction did any deletions of
>> old rows as part of record_gtid(), those deletions will be undone as well.
>> And the code did not properly ensure that the deletions would be re-tried.
>>
>> This patch makes record_gtid() remember the list of deletions done as part
>> of a transaction. Then in rpl_slave_state::update() when the changes have
>> been committed, we discard the list. However, in case of error and rollback,
>> in cleanup_context() we will instead put the list back into
>> rpl_global_gtid_slave_state so that the deletions will be re-tried later.
>>
>> Probably fixes part of the cause of MDEV-12147 as well.
>>
>> Signed-off-by: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
>>
>> ---
>> .../suite/rpl/r/rpl_parallel_optimistic.result | 6 ++
>> .../suite/rpl/t/rpl_parallel_optimistic.test | 17 +++++
>> sql/log_event.cc | 6 +-
>> sql/rpl_gtid.cc | 64 ++++++++++++------
>> sql/rpl_gtid.h | 2 +-
>> sql/rpl_rli.cc | 79 ++++++++++++++++++++++
>> sql/rpl_rli.h | 11 +++
>> 7 files changed, 161 insertions(+), 24 deletions(-)
>>
>> diff --git a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
>> index 3cd4f8231bf..99bd8562ffe 100644
>> --- a/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
>> +++ b/mysql-test/suite/rpl/r/rpl_parallel_optimistic.result
>> @@ -571,4 +571,10 @@ SET GLOBAL slave_parallel_mode=@old_parallel_mode;
>> SET GLOBAL slave_parallel_threads=@old_parallel_threads;
>> include/start_slave.inc
>> DROP TABLE t1, t2, t3;
>> +include/save_master_gtid.inc
>> +include/sync_with_master_gtid.inc
>> +Check that no more than the expected last two GTIDs are in mysql.gtid_slave_pos
>> +select count(*) from mysql.gtid_slave_pos order by domain_id, sub_id;
>> +count(*)
>> +2
>> include/rpl_end.inc
>> diff --git a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
>> index 9f6669279db..3867a3fdf3a 100644
>> --- a/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
>> +++ b/mysql-test/suite/rpl/t/rpl_parallel_optimistic.test
>> @@ -549,5 +549,22 @@ SET GLOBAL slave_parallel_threads=@old_parallel_threads;
>>
>> --connection server_1
>> DROP TABLE t1, t2, t3;
>> +--source include/save_master_gtid.inc
>> +
>> +--connection server_2
>> +--source include/sync_with_master_gtid.inc
>> +# Check for left-over rows in table mysql.gtid_slave_pos (MDEV-12147).
>> +#
>> +# There was a bug when a transaction got a conflict and was rolled back. It
>> +# might have also handled deletion of some old rows, and these deletions would
>> +# then also be rolled back. And since the deletes were never re-tried, old no
>> +# longer needed rows would accumulate in the table without limit.
>> +#
>> +# The earlier part of this test file have plenty of transactions being rolled
>> +# back. But the last DROP TABLE statement runs on its own and should never
>> +# conflict, thus at this point the mysql.gtid_slave_pos table should be clean.
>> +--echo Check that no more than the expected last two GTIDs are in mysql.gtid_slave_pos
>> +select count(*) from mysql.gtid_slave_pos order by domain_id, sub_id;
>>
>> +--connection server_1
>> --source include/rpl_end.inc
>> diff --git a/sql/log_event.cc b/sql/log_event.cc
>> index e1912ad4620..e07b7002398 100644
>> --- a/sql/log_event.cc
>> +++ b/sql/log_event.cc
>> @@ -4429,7 +4429,7 @@ int Query_log_event::do_apply_event(rpl_group_info *rgi,
>>
>> gtid= rgi->current_gtid;
>> if (rpl_global_gtid_slave_state->record_gtid(thd, >id, sub_id,
>> - true, false))
>> + rgi, false))
>> {
>> int errcode= thd->get_stmt_da()->sql_errno();
>> if (!is_parallel_retry_error(rgi, errcode))
>> @@ -7132,7 +7132,7 @@ Gtid_list_log_event::do_apply_event(rpl_group_info *rgi)
>> {
>> if ((ret= rpl_global_gtid_slave_state->record_gtid(thd, &list[i],
>> sub_id_list[i],
>> - false, false)))
>> + NULL, false)))
>> return ret;
>> rpl_global_gtid_slave_state->update_state_hash(sub_id_list[i], &list[i],
>> NULL);
>> @@ -7639,7 +7639,7 @@ int Xid_log_event::do_apply_event(rpl_group_info *rgi)
>> rgi->gtid_pending= false;
>>
>> gtid= rgi->current_gtid;
>> - err= rpl_global_gtid_slave_state->record_gtid(thd, >id, sub_id, true,
>> + err= rpl_global_gtid_slave_state->record_gtid(thd, >id, sub_id, rgi,
>> false);
>> if (err)
>> {
>> diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
>> index 7b1acf17ef5..94944b5b3e5 100644
>> --- a/sql/rpl_gtid.cc
>> +++ b/sql/rpl_gtid.cc
>> @@ -77,7 +77,7 @@ rpl_slave_state::record_and_update_gtid(THD *thd, rpl_group_info *rgi)
>> rgi->gtid_pending= false;
>> if (rgi->gtid_ignore_duplicate_state!=rpl_group_info::GTID_DUPLICATE_IGNORE)
>> {
>> - if (record_gtid(thd, &rgi->current_gtid, sub_id, false, false))
>> + if (record_gtid(thd, &rgi->current_gtid, sub_id, NULL, false))
>> DBUG_RETURN(1);
>> update_state_hash(sub_id, &rgi->current_gtid, rgi);
>> }
>> @@ -328,6 +328,8 @@ rpl_slave_state::update(uint32 domain_id, uint32 server_id, uint64 sub_id,
>> }
>> }
>> rgi->gtid_ignore_duplicate_state= rpl_group_info::GTID_DUPLICATE_NULL;
>> +
>> + rgi->pending_gtid_deletes_clear();
>> }
>>
>> if (!(list_elem= (list_element *)my_malloc(sizeof(*list_elem), MYF(MY_WME))))
>> @@ -377,15 +379,24 @@ int
>> rpl_slave_state::put_back_list(uint32 domain_id, list_element *list)
>> {
>> element *e;
>> + int err= 0;
>> +
>> + mysql_mutex_lock(&LOCK_slave_state);
>> if (!(e= (element *)my_hash_search(&hash, (const uchar *)&domain_id, 0)))
>> - return 1;
>> + {
>> + err= 1;
>> + goto end;
>> + }
>> while (list)
>> {
>> list_element *next= list->next;
>> e->add(list);
>> list= next;
>> }
>> - return 0;
>> +
>> +end:
>> + mysql_mutex_unlock(&LOCK_slave_state);
>> + return err;
>> }
>>
>>
>> @@ -468,12 +479,12 @@ gtid_check_rpl_slave_state_table(TABLE *table)
>> /*
>> Write a gtid to the replication slave state table.
>>
>> - Do it as part of the transaction, to get slave crash safety, or as a separate
>> - transaction if !in_transaction (eg. MyISAM or DDL).
>> -
>> gtid The global transaction id for this event group.
>> sub_id Value allocated within the sub_id when the event group was
>> read (sub_id must be consistent with commit order in master binlog).
>> + rgi rpl_group_info context, if we are recording the gtid transactionally
>> + as part of replicating a transactional event. NULL if called from
>> + outside of a replicated transaction.
>>
>> Note that caller must later ensure that the new gtid and sub_id is inserted
>> into the appropriate HASH element with rpl_slave_state.add(), so that it can
>> @@ -481,13 +492,13 @@ gtid_check_rpl_slave_state_table(TABLE *table)
>> */
>> int
>> rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>> - bool in_transaction, bool in_statement)
>> + rpl_group_info *rgi, bool in_statement)
>> {
>> TABLE_LIST tlist;
>> int err= 0;
>> bool table_opened= false;
>> TABLE *table;
>> - list_element *elist= 0, *next;
>> + list_element *elist= 0, *cur, *next;
>> element *elem;
>> ulonglong thd_saved_option= thd->variables.option_bits;
>> Query_tables_list lex_backup;
>> @@ -558,7 +569,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>> thd->wsrep_ignore_table= true;
>> #endif
>>
>> - if (!in_transaction)
>> + if (!rgi)
>> {
>> DBUG_PRINT("info", ("resetting OPTION_BEGIN"));
>> thd->variables.option_bits&=
>> @@ -601,9 +612,9 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>> if ((elist= elem->grab_list()) != NULL)
>> {
>> /* Delete any old stuff, but keep around the most recent one. */
>> - list_element *cur= elist;
>> - uint64 best_sub_id= cur->sub_id;
>> + uint64 best_sub_id= elist->sub_id;
>> list_element **best_ptr_ptr= &elist;
>> + cur= elist;
>> while ((next= cur->next))
>> {
>> if (next->sub_id > best_sub_id)
>> @@ -636,7 +647,8 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>> table->file->print_error(err, MYF(0));
>> goto end;
>> }
>> - while (elist)
>> + cur = elist;
>> + while (cur)
>> {
>> uchar key_buffer[4+8];
>>
>> @@ -646,9 +658,9 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>> /* `break' does not work inside DBUG_EXECUTE_IF */
>> goto dbug_break; });
>>
>> - next= elist->next;
>> + next= cur->next;
>>
>> - table->field[1]->store(elist->sub_id, true);
>> + table->field[1]->store(cur->sub_id, true);
>> /* domain_id is already set in table->record[0] from write_row() above. */
>> key_copy(key_buffer, table->record[0], &table->key_info[0], 0, false);
>> if (table->file->ha_index_read_map(table->record[1], key_buffer,
>> @@ -662,8 +674,7 @@ rpl_slave_state::record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>> not want to endlessly error on the same element in case of table
>> corruption or such.
>> */
>> - my_free(elist);
>> - elist= next;
>> + cur= next;
>> if (err)
>> break;
>> }
>> @@ -686,18 +697,31 @@ IF_DBUG(dbug_break:, )
>> */
>> if (elist)
>> {
>> - mysql_mutex_lock(&LOCK_slave_state);
>> put_back_list(gtid->domain_id, elist);
>> - mysql_mutex_unlock(&LOCK_slave_state);
>> + elist = 0;
>> }
>>
>> ha_rollback_trans(thd, FALSE);
>> }
>> close_thread_tables(thd);
>> - if (in_transaction)
>> + if (rgi)
>> + {
>> thd->mdl_context.release_statement_locks();
>> + /*
>> + Save the list of old gtid entries we deleted. If this transaction
>> + fails later for some reason and is rolled back, the deletion of those
>> + entries will be rolled back as well, and we will need to put them back
>> + on the to-be-deleted list so we can re-do the deletion. Otherwise
>> + redundant rows in mysql.gtid_slave_pos may accumulate if transactions
>> + are rolled back and retried after record_gtid().
>> + */
>> + rgi->pending_gtid_deletes_save(gtid->domain_id, elist);
>> + }
>> else
>> + {
>> thd->mdl_context.release_transactional_locks();
>> + rpl_group_info::pending_gtid_deletes_free(elist);
>> + }
>> }
>> thd->lex->restore_backup_query_tables_list(&lex_backup);
>> thd->variables.option_bits= thd_saved_option;
>> @@ -1080,7 +1104,7 @@ rpl_slave_state::load(THD *thd, char *state_from_master, size_t len,
>>
>> if (gtid_parser_helper(&state_from_master, end, >id) ||
>> !(sub_id= next_sub_id(gtid.domain_id)) ||
>> - record_gtid(thd, >id, sub_id, false, in_statement) ||
>> + record_gtid(thd, >id, sub_id, NULL, in_statement) ||
>> update(gtid.domain_id, gtid.server_id, sub_id, gtid.seq_no, NULL))
>> return 1;
>> if (state_from_master == end)
>> diff --git a/sql/rpl_gtid.h b/sql/rpl_gtid.h
>> index 79d566bddbf..7bd639b768f 100644
>> --- a/sql/rpl_gtid.h
>> +++ b/sql/rpl_gtid.h
>> @@ -182,7 +182,7 @@ struct rpl_slave_state
>> uint64 seq_no, rpl_group_info *rgi);
>> int truncate_state_table(THD *thd);
>> int record_gtid(THD *thd, const rpl_gtid *gtid, uint64 sub_id,
>> - bool in_transaction, bool in_statement);
>> + rpl_group_info *rgi, bool in_statement);
>> uint64 next_sub_id(uint32 domain_id);
>> int iterate(int (*cb)(rpl_gtid *, void *), void *data,
>> rpl_gtid *extra_gtids, uint32 num_extra,
>> diff --git a/sql/rpl_rli.cc b/sql/rpl_rli.cc
>> index 64a1b535307..b35130c1505 100644
>> --- a/sql/rpl_rli.cc
>> +++ b/sql/rpl_rli.cc
>> @@ -1680,6 +1680,7 @@ rpl_group_info::reinit(Relay_log_info *rli)
>> long_find_row_note_printed= false;
>> did_mark_start_commit= false;
>> gtid_ev_flags2= 0;
>> + pending_gtid_delete_list= NULL;
>> last_master_timestamp = 0;
>> gtid_ignore_duplicate_state= GTID_DUPLICATE_NULL;
>> speculation= SPECULATE_NO;
>> @@ -1804,6 +1805,12 @@ void rpl_group_info::cleanup_context(THD *thd, bool error)
>> erroneously update the GTID position.
>> */
>> gtid_pending= false;
>> +
>> + /*
>> + Rollback will have undone any deletions of old rows we might have made
>> + in mysql.gtid_slave_pos. Put those rows back on the list to be deleted.
>> + */
>> + pending_gtid_deletes_put_back();
>> }
>> m_table_map.clear_tables();
>> slave_close_thread_tables(thd);
>> @@ -2027,6 +2034,78 @@ rpl_group_info::unmark_start_commit()
>> }
>>
>>
>> +/*
>> + When record_gtid() has deleted any old rows from the table
>> + mysql.gtid_slave_pos as part of a replicated transaction, save the list of
>> + rows deleted here.
>> +
>> + If later the transaction fails (eg. optimistic parallel replication), the
>> + deletes will be undone when the transaction is rolled back. Then we can
>> + put back the list of rows into the rpl_global_gtid_slave_state, so that
>> + we can re-do the deletes and avoid accumulating old rows in the table.
>> +*/
>> +void
>> +rpl_group_info::pending_gtid_deletes_save(uint32 domain_id,
>> + rpl_slave_state::list_element *list)
>> +{
>> + /*
>> + We should never get to a state where we try to save a new pending list of
>> + gtid deletes while we still have an old one. But make sure we handle it
>> + anyway just in case, so we avoid leaving stray entries in the
>> + mysql.gtid_slave_pos table.
>> + */
>> + DBUG_ASSERT(!pending_gtid_delete_list);
>> + if (unlikely(pending_gtid_delete_list))
>> + pending_gtid_deletes_put_back();
>> +
>> + pending_gtid_delete_list= list;
>> + pending_gtid_delete_list_domain= domain_id;
>> +}
>> +
>> +
>> +/*
>> + Take the list recorded by pending_gtid_deletes_save() and put it back into
>> + rpl_global_gtid_slave_state. This is needed if deletion of the rows was
>> + rolled back due to transaction failure.
>> +*/
>> +void
>> +rpl_group_info::pending_gtid_deletes_put_back()
>> +{
>> + if (pending_gtid_delete_list)
>> + {
>> + rpl_global_gtid_slave_state->put_back_list(pending_gtid_delete_list_domain,
>> + pending_gtid_delete_list);
>> + pending_gtid_delete_list= NULL;
>> + }
>> +}
>> +
>> +
>> +/*
>> + Free the list recorded by pending_gtid_deletes_save(). Done when the deletes
>> + in the list have been permanently committed.
>> +*/
>> +void
>> +rpl_group_info::pending_gtid_deletes_clear()
>> +{
>> + pending_gtid_deletes_free(pending_gtid_delete_list);
>> + pending_gtid_delete_list= NULL;
>> +}
>> +
>> +
>> +void
>> +rpl_group_info::pending_gtid_deletes_free(rpl_slave_state::list_element *list)
>> +{
>> + rpl_slave_state::list_element *next;
>> +
>> + while (list)
>> + {
>> + next= list->next;
>> + my_free(list);
>> + list= next;
>> + }
>> +}
>> +
>> +
>> rpl_sql_thread_info::rpl_sql_thread_info(Rpl_filter *filter)
>> : rpl_filter(filter)
>> {
>> diff --git a/sql/rpl_rli.h b/sql/rpl_rli.h
>> index 74d5b6fe416..b40a34a54e6 100644
>> --- a/sql/rpl_rli.h
>> +++ b/sql/rpl_rli.h
>> @@ -676,6 +676,11 @@ struct rpl_group_info
>> /* Needs room for "Gtid D-S-N\x00". */
>> char gtid_info_buf[5+10+1+10+1+20+1];
>>
>> + /* List of not yet committed deletions in mysql.gtid_slave_pos. */
>> + rpl_slave_state::list_element *pending_gtid_delete_list;
>> + /* Domain associated with pending_gtid_delete_list. */
>> + uint32 pending_gtid_delete_list_domain;
>> +
>> /*
>> The timestamp, from the master, of the commit event.
>> Used to do delayed update of rli->last_master_timestamp, for getting
>> @@ -817,6 +822,12 @@ struct rpl_group_info
>> char *gtid_info();
>> void unmark_start_commit();
>>
>> + static void pending_gtid_deletes_free(rpl_slave_state::list_element *list);
>> + void pending_gtid_deletes_save(uint32 domain_id,
>> + rpl_slave_state::list_element *list);
>> + void pending_gtid_deletes_put_back();
>> + void pending_gtid_deletes_clear();
>> +
>> time_t get_row_stmt_start_timestamp()
>> {
>> return row_stmt_start_timestamp;
>> _______________________________________________
>> commits mailing list
>> commits@xxxxxxxxxxx
>> https://lists.askmonty.org/cgi-bin/mailman/listinfo/commits
Follow ups
References