← Back to team overview

maria-developers team mailing list archive

Re: [server] Mdev 7802 binlog groupcommit stats (#30)

 

Daniel Black <notifications@xxxxxxxxxx> writes:

Thanks for your patch. I agree that these kinds of status information could be
useful. But I have some parts that are not clear, and I want you to think
about the questions below and answer with your thoughts. This is to make sure
you understand yourself the issue in detail; to help me understand it; and in
some cases maybe some changes should be made.

> The following global status variables where added:
> * binlog_group_commit_reason_count
> * binlog_group_commit_reason_usec

So, these names seem a bit odd. Neither "count" nor "usec" makes much sense as
a reason for something to happen. "_timeout" would be a better reason that
"_usec", but I am not sure of a better name for "_count". The good thing about
these names of course is that they match the corresponding
--binlog-commit-wait-* option.

What do you think of something like

  binlog_commit_trigger_count
  binlog_commit_trigger_timeout

instead ?

> * binlog_group_commit_reason_transaction
> * binlog_group_commit_reason_immediate

> binlog_group_commit_reason_transaction is a result of ordered
> transaction that need to occur in the same order on the slave and can't
> be parallelised.
>
> binlog_group_commit_reason_immediate is caused to prevent stalls with
> row locks as described in log.cc:binlog_report_wait_for. This immediate
> count is also counted a second time in binlog_group_commit_reason_transaction.

Please re-read that explanation, and consider: do you think an average
user/DBA will fully understand what these status variables mean? Honestly, I
am wondering if you can even yourself explain exactly what the difference is
between the two :-).

Can you give two simple examples of transactions, one example of which will
increment the _transaction status variable and one of which will increment the
_immediate?

Do you think the distinction between the two is useful for the user?

If not, maybe just have a single status variable, like

  binlog_commit_trigger_lock_wait

for example?

> Overall binlog_group_commits = binlog_group_commit_reason_count +
> binlog_group_commit_reason_usec + binlog_group_commit_reason_transaction

Is this also true if --binlog-commit-wait-count=0? If so, what will the values
of the different status variables be in this case?

Now, to the patches.

> +   {
> +     if (++count >= opt_binlog_commit_wait_count)
> +     {
> +       group_commit_reason_count++;
> +       return;
> +     }
> +     if (unlikely(e->thd->has_waiter))
> +     {
> +       group_commit_reason_transaction++;
>         return;
> +     }
> +   }

So this increments the global status variables directly, I think?
How does the thread locking work? Is it possible for a reader to see a corrupt
value (eg. one word of the old value and one word of the new) on 32-bit
platform?

How is this handled for other similar status variables? It would not be good
if we had to take some LOCK_status or something inside the critical group
commit code.

With respect to the test cases: Can you please comment yourself on the
different places where you added output of the status variables, and explain
why this output will always be the same, no matter how threads are scheduled
on the machine running the test? (This is the critical part of most tests
related to replication; because of the inherently multi-threaded nature of
such tests, a lot of care is needed to make sure the test will not produce
different output on different test runs depending on the speed of the host or
how threads are scheduled on a loaded machine).

 - Kristian.
 


Follow ups