← Back to team overview

maria-developers team mailing list archive

Re: 880e92a48ba: MDEV-27760 event may non stop replicate in circular semisync setup

 

Hi, Andrei,

On Feb 28, Andrei wrote:
> revision-id: 880e92a48ba (mariadb-10.6.6-16-g880e92a48ba)
> parent(s): 4030a9fb2eb
> author: Andrei
> committer: Andrei
> timestamp: 2022-02-21 14:58:13 +0200
> message:
> 
> MDEV-27760 event may non stop replicate in circular semisync setup
> 
> diff --git a/sql/slave.cc b/sql/slave.cc
> index c0eef02ca7a..a1f4f4f0081 100644
> --- a/sql/slave.cc
> +++ b/sql/slave.cc
> @@ -6186,13 +6186,13 @@ static int queue_event(Master_info* mi, const uchar *buf, ulong event_len)
>    bool is_rows_event= false;
>    /*
>      The flag has replicate_same_server_id semantics and is raised to accept
> -    a same-server-id event on the semisync slave, for both the gtid and legacy
> -    connection modes.
> -    Such events can appear as result of this server recovery so the event
> -    was created there and replicated elsewhere right before the crash. At recovery
> -    it could be evicted from the server's binlog.
> -  */
> -  bool do_accept_own_server_id= false;
> +    a same-server-id event group by the gtid strict mode semisync slave.
> +    Own server-id events can appear as result of this server crash-recovery:
> +    the transaction was created on this server then being master, got replicated
> +    elsewhere right before the crash before commit;
> +    finally at recovery the transaction gets evicted from the server's binlog.
> + */
> +  static bool do_accept_own_server_id= false;

No, sorry, it cannot be static. If you need to preserve it between
events, you can keep it, for example, in Master_info.

yes, I remember, that's where it was and I asked why it's in
Master_info. You could've answered, that you need to preserve the value
between queue_event invocations - I didn't realize that at the time.

So, the question now is, why do you need to preserve the value of
do_accept_own_server_id between queue_event invocations?

>    /*
>      FD_q must have been prepared for the first R_a event
>      inside get_master_version_and_clock()
> @@ -6281,6 +6281,8 @@ static int queue_event(Master_info* mi, const uchar *buf, ulong event_len)
>                      dbug_rows_event_count = 0;
>                    };);
>  #endif
> +  s_id= uint4korr(buf + SERVER_ID_OFFSET);
> +
>    mysql_mutex_lock(&mi->data_lock);
>  
>    switch (buf[EVENT_TYPE_OFFSET]) {
> @@ -6722,6 +6724,11 @@ static int queue_event(Master_info* mi, const uchar *buf, ulong event_len)
>  
>      ++mi->events_queued_since_last_gtid;
>      inc_pos= event_len;
> +
> +    do_accept_own_server_id=
> +      (s_id == global_system_variables.server_id && rpl_semi_sync_slave_enabled
> +       && mi->using_gtid != Master_info::USE_GTID_NO && opt_gtid_strict_mode) ?
> +      true : false;

this is a ridiculous pattern

   bool var = (bool expr) ? true : false;

don't do that.

>    }
>    break;
>    /*
> @@ -6909,7 +6916,6 @@ static int queue_event(Master_info* mi, const uchar *buf, ulong event_len)
>    */
>  
>    mysql_mutex_lock(log_lock);
> -  s_id= uint4korr(buf + SERVER_ID_OFFSET);
>    /*
>      Write the event to the relay log, unless we reconnected in the middle
>      of an event group and now need to skip the initial part of the group that
> @@ -6955,7 +6961,7 @@ static int queue_event(Master_info* mi, const uchar *buf, ulong event_len)
>    else
>    if ((s_id == global_system_variables.server_id &&
>         !(mi->rli.replicate_same_server_id ||
> -         (do_accept_own_server_id= rpl_semi_sync_slave_enabled))) ||
> +         do_accept_own_server_id)) ||
>        event_that_should_be_ignored(buf) ||
>        /*
>          the following conjunction deals with IGNORE_SERVER_IDS, if set
> 
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx