maria-developers team mailing list archive

Thread
Date

Re: parallel replication monitoring

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Giuseppe Maxia <g.maxia@xxxxxxxxx>
Date: Fri, 15 Nov 2013 11:31:06 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87habe55l9.fsf@frigg.knielsen-hq.org>

Kristian,
Thanks for taking action with this issue. Another nice touch would be having the default value of slave_parallel_threads=1 instead of 0, and doing the right thing, as explained in this article:
http://datacharmer.blogspot.it/2013/11/parallel-replication-off-by-one.html

Cheers

Giuseppe

On November 15, 2013 at 11:20:36 , Kristian Nielsen (knielsen@xxxxxxxxxxxxxxx) wrote:

Giuseppe Maxia <g.maxia@xxxxxxxxx> writes:  

> * which database they are running is important even if the parallel  
> replication is not split by schema. As a DBA, I need to know at a glance  
> where the action is occurring. SHOW PROCESSLIST gives me that information,  
> but it doesn't tell me the GTID, which is the info that gives me the pulse  
> of the replication progress. What bothers me is that I can get either the  
> GTID list or where the action is happening. Having both in one place should  
> not be difficult (a I_S or P_S view, for example).  

Right, I think I see.  

So it could be useful to have in I_S or P_S a table that gives for each worker  
stuff like:  

- The GTID of the event group currently executing, or NULL if idle  
- Status (executing, waiting for prior transaction before starting, waiting  
for prior transaction before committing, stuff like that)  
- The current database (USE xxx)  
- Currently executing query  
- Whether this worker was scheduled in parallel with something else, and if  
so, why that was possible (group commit id or replication domain id)  
- Total number of events and event groups executed by worker thread  
- Possibly time spent idle, time spent executing, and time spent waiting for  
prior transactions to commit (if such times can be obtained without too  
high performance overhead).  

That is a nice input, thanks. Something like this would definitely be nice to  
have. I've put this into Jira:  

https://mariadb.atlassian.net/browse/MDEV-5296  

On top of this, I think we could also add some statistics for the SQL  
thread. Like, how often did it have to wait for a worker to become free to  
schedule a potentially parallel transaction (might indicate a too-low  
--slave-parallel-threads). And how many transactions could / could not be  
scheduled in parallel (could indicate the need to tune the master to provide  
more parallelism in the binlog).  

What would be really nice is to have two numbers in SHOW SLAVE STATUS. One is  
the wall-clock time since START SLAVE. The other is the total time spent by  
workers on executing events for this master connection (excluding waiting for  
other replication threads). The ratio between these two numbers would  
immediately give an indication of how effective parallel replication is at  
utilising the machine, same as the cpu% numbers in the `top` Linux utility.  

> I see in mysql.gtid_slave_pos that the number of rows grows and shrinks  
> while the workers are replicating, but this does not give me any useful  
> information on the effectiveness of the operations  

Agree, that is mostly useless information. All except the rows with maximal  
sub_id are justa garbage rows, the deletion of which has been delayed to avoid  
lock contention between transactions.  

> I hope to see better integration and more tools in the future. As it is now,  
> parallel replication seems to be a very powerful engine with brake and  
> steering wheel but with a blackened out dashboard. Makes for some  
> adventurous driving!  

Hehe, a very nice and succinct description ;-)  

Thanks,  

- Kristian.

Follow ups

Re: parallel replication monitoring
From: Jean Weisbuch, 2013-11-15

References

parallel replication monitoring
From: Giuseppe Maxia, 2013-11-14
Re: parallel replication monitoring
From: Kristian Nielsen, 2013-11-15
Re: parallel replication monitoring
From: Giuseppe Maxia, 2013-11-15
Re: parallel replication monitoring
From: Kristian Nielsen, 2013-11-15