← Back to team overview

maria-developers team mailing list archive

Re: parallel replication monitoring

 

"useful to have in I_S or P_S".  I vote for I_S. P_S may be disabled by
admin due to its memory and cpu overhead.

-- Peter
-- Webyog


On Fri, Nov 15, 2013 at 11:20 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx
> wrote:

> Giuseppe Maxia <g.maxia@xxxxxxxxx> writes:
>
> > * which database they are running is important even if the parallel
> > replication is not split by schema. As a DBA, I need to know at a glance
> > where the action is occurring. SHOW PROCESSLIST gives me that
> information,
> > but it doesn't tell me the GTID, which is the info that gives me the
> pulse
> > of the replication progress. What bothers me is that I can get either the
> > GTID list or where the action is happening. Having both in one place
> should
> > not be difficult (a I_S or P_S view, for example).
>
> Right, I think I see.
>
> So it could be useful to have in I_S or P_S a table that gives for each
> worker
> stuff like:
>
>  - The GTID of the event group currently executing, or NULL if idle
>  - Status (executing, waiting for prior transaction before starting,
> waiting
>    for prior transaction before committing, stuff like that)
>  - The current database (USE xxx)
>  - Currently executing query
>  - Whether this worker was scheduled in parallel with something else, and
> if
>    so, why that was possible (group commit id or replication domain id)
>  - Total number of events and event groups executed by worker thread
>  - Possibly time spent idle, time spent executing, and time spent waiting
> for
>    prior transactions to commit (if such times can be obtained without too
>    high performance overhead).
>
> That is a nice input, thanks. Something like this would definitely be nice
> to
> have. I've put this into Jira:
>
>     https://mariadb.atlassian.net/browse/MDEV-5296
>
> On top of this, I think we could also add some statistics for the SQL
> thread. Like, how often did it have to wait for a worker to become free to
> schedule a potentially parallel transaction (might indicate a too-low
> --slave-parallel-threads). And how many transactions could / could not be
> scheduled in parallel (could indicate the need to tune the master to
> provide
> more parallelism in the binlog).
>
> What would be really nice is to have two numbers in SHOW SLAVE STATUS. One
> is
> the wall-clock time since START SLAVE. The other is the total time spent by
> workers on executing events for this master connection (excluding waiting
> for
> other replication threads). The ratio between these two numbers would
> immediately give an indication of how effective parallel replication is at
> utilising the machine, same as the cpu% numbers in the `top` Linux utility.
>
> > I see in mysql.gtid_slave_pos that the number of rows grows and shrinks
> > while the workers are replicating, but this does not give me any useful
> > information on the effectiveness of the operations
>
> Agree, that is mostly useless information. All except the rows with maximal
> sub_id are justa garbage rows, the deletion of which has been delayed to
> avoid
> lock contention between transactions.
>
> > I hope to see better integration and more tools in the future. As it is
> now,
> > parallel replication seems to be a very powerful engine with brake and
> > steering wheel but with a blackened out dashboard. Makes for some
> > adventurous driving!
>
> Hehe, a very nice and succinct description ;-)
>
> Thanks,
>
>  - Kristian.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp
>

References