maria-developers team mailing list archive
Re: comments to the plugin.h
Sergei Golubchik <serg@xxxxxxxxxxx> writes:
> As you suggested on irc, it would make sense to make a smaller
> innodb/xtradb only fix in 10.0 and a more engine-friendly, with the new
> api, in 10.1
> Hmm, okay... When you put it this way, it does sound simpler.
> All right, let's keep thd_report_wait_for() :)
So then, the plan seems to be:
1. I remove the new calls from include/plugin.h, instead place them somewhere
not part of a public API (maybe just "extern" declarations inside
InnoDB/XtraDB, and whatever is needed to make it work correctly for
ha_innodb.dll on Windows).
2. I try to remove the kill-in-background, instead do it directly in the
thread doing thd_report_wait_for() (I think that should be possible).
3. I apply the other review comments that you sent in another mail.
4. I file a Jira task for 10.1 about a general solution, with a good API and
other ideas collected so far.
Other than this, the patch will be much the same as what I had initially.
Is this ok with you? Or did I miss something?
>> It's not the expense that worries me. The problem is that some of
>> the following transactions may not be possible to roll back.
> Ah, yes, indeed. We could still
> 1. rollback regardless and possibly break replication in this case.
> saying that a transactional engine will work without modifications in
> most cases, but not when it's mixed with non-trans updates
> 2. as discussed, have a flag to mark non-trans-updates transactions and
> don't run them in parallel at all. then a transactional engine will
> work without modifications.
> but that's for 10.1, if we do innodb-only fix in 10.0, it means we
> aren't concerned with other engines there.
Seems like a reasonable solution. I share your concerns about the current
approach, and some of these ideas look like they could solve most of the
issues better, but they are better suited to the next major release.
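To make idea 2 above concrete, here is a minimal sketch of how such a flag could keep non-transactional updates out of parallel execution. Everything in it is hypothetical: the Txn class, the non_transactional field and plan_batches() are illustration-only names, not anything in the server, and the sketch deliberately ignores group commit boundaries.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    non_transactional: bool = False  # hypothetical flag set when the transaction is logged


def plan_batches(txns):
    """Split transactions (in binlog order) into batches.

    Transactions inside one batch may be applied in parallel.
    A flagged transaction always gets a batch of its own, so it
    never runs concurrently with anything else.
    """
    batches, current = [], []
    for t in txns:
        if t.non_transactional:
            if current:
                batches.append(current)
                current = []
            batches.append([t])  # runs alone, never in parallel
        else:
            current.append(t)
    if current:
        batches.append(current)
    return batches
```

With T1, a flagged T2, T3 and T4 in binlog order, this yields the batches [T1], [T2], [T3, T4]: nothing that might need a rollback ever overlaps the transaction that cannot be rolled back.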
> How can T2 run in parallel with T1 if they're from different groups?
T2 can run in parallel with the commit step of T1, but not with any events
of T1 prior to commit.
In more detail:
Suppose we have 4 transactions in two group commits: (T1, T2) followed by
(T3, T4).
We will schedule T1, T2, T3, and T4 in parallel, each on their own thread
(assuming @@slave_parallel_threads >= 4).
T1 and T2 are in the same group commit, so they are allowed to start
immediately. However, T3 and T4 are in a different group commit, so they are
not ready to start - they might conflict with T1 or T2. So they wait.
Suppose T2 reaches its COMMIT (or XID) event first. It calls
mark_start_commit(), but at this point the call has no further effect. T2 has
commit order after T1, so it goes to wait for T1 in wait_for_prior_commit().
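The commit-order wait can be pictured with a small threaded sketch. It is only an illustration of the ordering rule, assuming one event per transaction; the CommitOrder class and run_demo() are made-up names, and the real wait_for_prior_commit() in the server is of course not implemented like this.

```python
import threading


class CommitOrder:
    """Toy model: each transaction commits only after its predecessor has."""

    def __init__(self, names):
        self.done = {n: threading.Event() for n in names}
        # Predecessor in commit order (None for the first transaction).
        self.prev = {n: (names[i - 1] if i else None) for i, n in enumerate(names)}
        self.log = []
        self.lock = threading.Lock()

    def wait_for_prior_commit(self, name):
        prev = self.prev[name]
        if prev is not None:
            self.done[prev].wait()  # block until the earlier transaction committed

    def commit(self, name):
        self.wait_for_prior_commit(name)
        with self.lock:
            self.log.append(name)
        self.done[name].set()  # wake up whoever is waiting on us


def run_demo():
    order = CommitOrder(["T1", "T2"])
    t2 = threading.Thread(target=order.commit, args=("T2",))
    t2.start()  # T2 reaches its commit first and has to wait for T1
    t1 = threading.Thread(target=order.commit, args=("T1",))
    t1.start()
    t1.join()
    t2.join()
    return order.log
```

Even though T2 asks to commit first, the log always comes out as T1 followed by T2.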
Now T1 reaches its COMMIT/XID event, and calls mark_start_commit(). Now both
T1 and T2 have completed all their modifications, and are ready to
commit. This means that we can now start running T3 and T4. T3 or T4 might
have conflicting rows with T1 or T2, but T1 and T2 have already done all their
modifications, so it's ok. If there is a conflict, T3 and T4 will just
wait. If not, T3 and T4 can run in parallel with the commit steps of T1 and
T2.
Suppose both T3 and T4 have time to reach their COMMIT/XID event before T1 has
time to complete commit. Then T1 can find T2, T3, and T4 all queued up for
group commit. And T1 can do a single group commit for all four of them,
sharing the fsync overhead among 4 transactions.
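The whole walkthrough can be replayed with a small, single-threaded simulation. This is a sketch under simplifying assumptions, not the server's code: the Scheduler class, can_start(), group_commit() and the fsync counter are invented for illustration, and only the name mark_start_commit() is borrowed from the description above.

```python
class Scheduler:
    """Toy model of the start gate and the shared group commit."""

    def __init__(self, groups):
        # groups: transaction names per master group commit, in commit order.
        self.groups = groups
        self.group_of = {t: g for g, names in enumerate(groups) for t in names}
        self.order = [t for names in groups for t in names]
        self.started_commit = set()  # have called mark_start_commit()
        self.queued = set()          # waiting to be committed
        self.committed = []
        self.fsyncs = 0

    def can_start(self, txn):
        """A transaction may start once the whole previous group has begun committing."""
        g = self.group_of[txn]
        if g == 0:
            return True
        return all(t in self.started_commit for t in self.groups[g - 1])

    def mark_start_commit(self, txn):
        """The transaction reached its COMMIT/XID event."""
        assert self.can_start(txn), txn + " was not allowed to run yet"
        self.started_commit.add(txn)
        self.queued.add(txn)

    def group_commit(self):
        """Commit, in order, every queued transaction with no gap before it; one fsync."""
        batch = []
        for t in self.order[len(self.committed):]:
            if t not in self.queued:
                break  # commit order: nothing may overtake a missing predecessor
            batch.append(t)
        if batch:
            self.fsyncs += 1
            self.committed.extend(batch)
            self.queued.difference_update(batch)
        return batch


def demo():
    s = Scheduler([["T1", "T2"], ["T3", "T4"]])
    s.mark_start_commit("T2")          # T2 finishes its work first
    blocked = not s.can_start("T3")    # T1 is still modifying rows
    early = s.group_commit()           # T2 cannot commit ahead of T1
    s.mark_start_commit("T1")          # now T3 and T4 are released
    s.mark_start_commit("T3")
    s.mark_start_commit("T4")
    return blocked, early, s.group_commit(), s.fsyncs
```

Here demo() shows T3 still blocked after only T2 has reached its commit, an empty first commit attempt, and then a single batch of all four transactions sharing one fsync.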
This way, we get more opportunity for parallelism. This optimisation (starting
T3/T4 at _start_ of T1/T2 commit, rather than after) is particularly effective
when commit is expensive, e.g. with --sync-binlog=1 and
--innodb-flush-log-at-trx-commit=1. It makes effective use of group commit
possible. It also improves parallelism on slaves deeper down in the
hierarchy, using --binlog-commit-wait-count. Without this, the group commit
parallelism from a slave would always be less than (or equal to) that on the