
maria-developers team mailing list archive

Re: Mdev-10715 -- Galera: Replicate MariaDB GTID to other nodes in the cluster

 

Sachin Setiya <sachin.setiya@xxxxxxxxxxx> writes:

> <knielsen> sachin_setiya_7: so maybe the problem is - that a node
> broadcasts its write set before the commit order has been determined?
>
> I do not think this is the problem. Galera enforces the commit order.
> Yes, it broadcasts the write set in the prepare phase, but it also
> guarantees that the T1->T2 order will be maintained on all N
> participating nodes.

Ok, good.

> <knielsen> sachin_setiya_7: how is the galera internal transaction id
> allocated and broadcast?
>
> I am assuming here that we are talking about the GTID sequence number.
>
> Suppose our initial seqno is S. At this point all N nodes have the same
> sequence number.
>
> Some transaction T is executed at node Ni. Ni broadcasts the writeset
> with its current sequence number S.
>
> Every node Nj (including Ni) receives this message and checks some
> conditions, such as whether it is a “totally ordered action”. If so, Nj
> updates its sequence number to S + 1.

Right. I think "totally ordered action" is something like DDL, which needs
special handling to ensure same commit order over the entire cluster.

I am wondering if there is a slightly different method used for normal DML
transactions, or if it is the same.

In any case, I am guessing it must work much the same way, because after the
prepare phase the transaction will be written into the binlog, and at that
point it cannot be rolled back, so it should have passed certification and
gotten its Galera transaction ID.

> Blueprint of the task: we can do something like the Galera GTID. We will
> take the initial sequence number from the server and add one more
> variable, named s_sequence_no, to gcs_group_t, which will be incremented
> on each node. We also have to create a GTID event and append it to the
> message received at Nj, so that at a later stage wsrep_apply_cb() can
> take care of the GTID.

It sounds reasonable. I still don't fully understand how the Galera
transaction id is generated, but my guess is that doing something similar
for the MariaDB GTID sequence number should be the right way.
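The allocation scheme discussed above can be modelled in a few lines. This is a minimal Python sketch, not Galera code: it assumes the group communication layer delivers every writeset to every node (including the originator) in the same total order, and that each node keeps a counter standing in for the proposed s_sequence_no in gcs_group_t, seeded from the server's current GTID sequence number. The Node class, deliver() and total_order_broadcast() are hypothetical names, and using a single GTID domain with the originating node's server_id is an assumption of the model. Because every node increments its counters deterministically on delivery, all nodes assign the same Galera seqno and the same MariaDB GTID to each transaction without further coordination.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one node's replicated group state."""
    name: str
    galera_seqno: int   # Galera global seqno, identical on all nodes
    s_sequence_no: int  # proposed MariaDB GTID counter, seeded from the server
    domain_id: int = 0  # assumption: one GTID domain for the whole cluster
    applied: list = field(default_factory=list)

    def deliver(self, origin_server_id: int, writeset: str) -> None:
        # Runs on every node, for every writeset, in the agreed total order.
        self.galera_seqno += 1
        self.s_sequence_no += 1
        # Build a GTID "event" that travels with the writeset, so the apply
        # step (wsrep_apply_cb() in the proposal) can write it to the binlog.
        gtid = f"{self.domain_id}-{origin_server_id}-{self.s_sequence_no}"
        self.applied.append((self.galera_seqno, gtid, writeset))


def total_order_broadcast(nodes, writesets):
    """Deliver (origin_server_id, writeset) pairs to all nodes in one order."""
    for origin_server_id, writeset in writesets:
        for node in nodes:
            node.deliver(origin_server_id, writeset)


# Three nodes whose GTID sequence is at 100 (hypothetical starting point).
nodes = [Node(f"N{i}", galera_seqno=0, s_sequence_no=100) for i in (1, 2, 3)]
total_order_broadcast(nodes, [(1, "T1"), (3, "T2"), (2, "T3")])
for node in nodes:
    print(node.name, node.applied)
```

Each node prints the same list, e.g. T2 originating on server 3 gets GTID 0-3-102 everywhere.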

But one more thing is needed, which is to ensure that the transactions will
be written into the binary log in the same order as the GTID sequence
numbers were assigned. This is necessary for GTID to work. The slave only
stores one GTID position for each replication domain. So the sequence
numbers must be in the right order in the master binlog, otherwise the slave
cannot determine the starting position correctly.
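To see why the binlog order matters, here is a simplified Python model; it is an illustration, not MariaDB's actual position lookup. It assumes gap-free sequence numbers starting at 1 in a single domain, and treats the slave's single per-domain position as a claim that every sequence number up to and including it has been applied. The helper names are hypothetical.

```python
def applied_prefix(binlog, k):
    """Seqnos a slave has actually applied after the first k binlog events."""
    return set(binlog[:k])


def implied_by_position(binlog, k):
    """What one per-domain position claims: all seqnos up to the last applied."""
    if k == 0:
        return set()
    return set(range(1, binlog[k - 1] + 1))


def position_is_unambiguous(binlog):
    """True if, wherever the slave stops, its position matches reality."""
    return all(applied_prefix(binlog, k) == implied_by_position(binlog, k)
               for k in range(len(binlog) + 1))


print(position_is_unambiguous([1, 2, 3, 4]))  # binlog in seqno order
print(position_is_unambiguous([1, 3, 2, 4]))  # 3 written before 2
```

With the second binlog, a slave that stops after 0-1-3 has position 3 but has not applied 2. In this model, if it then reconnects to another node whose binlog has 2 before 3, replication resumes after 3 and transaction 2 is silently skipped.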

The original way I imagined this would be done is that Galera would take
over the transaction coordinator role and implement the
TC_LOG::log_and_order() virtual method. This was not done, though, and I
imagine it would be a somewhat large task.

One possible simpler alternative is to use the wait_for_prior_commit
mechanism, similar to how parallel replication does it. The idea would be
that if Galera commits transactions T1, T2, T3, ... in order, then for each
T_i Galera would call wait_for_commit::register_wait_for_prior_commit(T_(i-1)).
This will make TC_LOG_BINLOG::log_and_order() write the transactions to the
binlog in the correct order. See the comments on struct wait_for_commit in
sql/sql_class.h.
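The ordering that register_wait_for_prior_commit() provides can be illustrated with a small Python threading model. This is only a sketch of the idea, assuming each transaction T_i registers a wait on T_(i-1) before committing. The WaitForCommit class and commit_in_order() are hypothetical; the method names loosely echo struct wait_for_commit in sql/sql_class.h, but the real C++ interface differs.

```python
import random
import threading
import time


class WaitForCommit:
    """Python model of the wait_for_commit idea: one object per transaction.
    Method names loosely mirror struct wait_for_commit; this is not its API."""

    def __init__(self) -> None:
        self._committed = threading.Event()
        self._prior = None  # the WaitForCommit of T_(i-1), if registered

    def register_wait_for_prior_commit(self, prior: "WaitForCommit") -> None:
        self._prior = prior

    def wait_for_prior_commit(self) -> None:
        # Block until the registered prior transaction has committed.
        if self._prior is not None:
            self._prior._committed.wait()

    def wakeup_subsequent_commits(self) -> None:
        # Signal any transaction that registered a wait on this one.
        self._committed.set()


def commit_in_order(n: int) -> list:
    """Commit T_1..T_n from separate threads; return the binlog order."""
    binlog = []
    lock = threading.Lock()
    waiters = [WaitForCommit() for _ in range(n)]
    for i in range(1, n):
        waiters[i].register_wait_for_prior_commit(waiters[i - 1])

    def commit(seqno: int, wfc: WaitForCommit) -> None:
        time.sleep(random.uniform(0, 0.01))  # threads reach commit in any order
        wfc.wait_for_prior_commit()          # wait for T_(i-1) to be binlogged
        with lock:
            binlog.append(seqno)             # "write" T_i to the binlog
        wfc.wakeup_subsequent_commits()

    threads = [threading.Thread(target=commit, args=(i + 1, waiters[i]))
               for i in range(n)]
    for t in reversed(threads):  # start later transactions first
        t.start()
    for t in threads:
        t.join()
    return binlog


print(commit_in_order(8))  # prints [1, 2, 3, 4, 5, 6, 7, 8]
```

Even though later transactions are started first and reach the commit point in random order, each one blocks until its predecessor has been written, so the binlog order always matches the commit order Galera chose. Compared with taking over TC_LOG::log_and_order(), this only requires Galera to know the predecessor of each transaction.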

Another thing that needs to be handled is what happens to transactions that
enter Galera through normal replication from another cluster, using MariaDB
parallel replication. In this case, the commit order and GTID sequence
numbers have already been decided by the replication master. Some way will be
needed to force Galera to use the same commit order that replication has; it
cannot invent its own order or GTIDs. I am not sure if that is even
possible. So maybe it will be necessary to forbid the use of parallel
replication with GTID against a Galera cluster?

 - Kristian.

