← Back to team overview

maria-developers team mailing list archive

Re: [Maria-discuss] Pipeline-stype slave parallel applier


Actually the optimistic statement scheduling is restricted
to the Write_log_events/INSERT-Query_log_event types.
Unlike inter-transaction conflicts which can be cured through
proper victimization and retry this time it's the intra- case
where retrying won't help when the 2nd event is separated from the parent.

And that means the parent and the child must be in the same
modified branch.
Only Delete vs Delete and Write vs Write are pairwise non-conflicting
in RBR.

I've briefly explained the idea of the table-level pessimistic
dependency approximation solely by slave in the Row format case.

Similarly to cover the Statement format 
we would have to make master to collecting involved tables, including
triggered ones (mysql does so for databases names).
Another approach is to switch to the conservative mode when
the master group contains at least one Query-log-event. This fact
can be passed with the first Gtid-log-event of the group (master
binlogging change required).

So in your example

>>>   BEGIN;  /* T1 */
>>>   INSERT INTO t1 VALUES  (...);  /* S1 */
>>>   COMMIT;
>>>   BEGIN;  /* T2 */
>>>   UPDATE t1 SET b=10 WHERE a=1;  /* S2 */
>>>   UPDATE t1 SET b=20 WHERE a=1;  /* S3 */
>>>   COMMIT;

the updates are doomed to schedule in the same branch.



>>> So the idea is to schedule these two transactions (T1 T2) to three different
>>> worker threads, right? Each thread gets one of (S1 S2 S3).
>>> But we must not commit S3 before S2, obviously, or we end up with incorrect
>>> result. So how will this dependency be handled?
>> This is dependency tracking specifics.
>> In RBR case (the whole group is row-format) S2 -> S3 can be only
>> pessimistically estimated right now via reading table name from Table_map.
>> In SBR case that you meant Query-log-events would have to carry table names.
>> We could refine the estimate with help of enriched logging on master to collect
>> PK values into the event.
>> So having learned S2->S3 at assigning in the first approximation I
>> suggest we schedule T2 as one transaction.
> Naturally the first approximation admits unfairness when the input is
> dependent. Yet we can safe that what is statistically corner cases.
> Either (though it's not binary after all)
>   with running optimistically as you've suggested, or
>   with reinforcing master logging to track *intra*-transaction
>   dependecies. Practically it's to record a set of 'db.table.record_id'
>   arrays, an array per 'db.table'.
>   Each array will be sorted per 'record_id' as part of a replication
>   optimizer to build up
>   non-dependant branches as equal as possible sizes.
>   Also just a number
>   of noticed dependecies could be checked by slave to its execution to
>   conservative mode.
>>> I was assuming that this would be using the same mechanism as optimistic
>>> parallel replication - in fact, I thought the point was that this is
>>> precisely a clever generalisation of optimistic parallel replication,
>> You have a very nice point and the dumb pessimistic method is refined by
>> optimistic execution of *one* statement (whose cost to rollback is
>> insignificant)!
>> [Why not it be the only option...]
>> More to that, somewhat similarly to waiting for prior trx to commit now
>> we would wait for prior statements if they (may) depend. When an earlier ordered
>> statement finds a locked row it should eliminate low-ranked current
>> owner (could take the owner to reach the wait-for-prior-stmt
>> commit). Otherwise it waits.
> This obviously requires cross-group event enumeration and awareness of
> potential dependency on an earlier scheduled event (of a smaller sequence number).