maria-developers team mailing list archive

Re: prospective GSOC 2017 student [MDEV-7502]

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Stephane Varoqui <stephane@xxxxxxxxxxx>
Date: Sun, 19 Mar 2017 19:39:49 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20170319175335.GA3599@meddwl.fritz.box>


> On 19 Mar 2017, at 18:53, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> 
> Hi, ibrar!
> 
> On Mar 19, ibrar arshad wrote:
>> Hi,
>> 
>> My name is Ibrar Arshad and I am interested in working on the task of
>> automatic slave provisioning (ticket: MDEV-7502
>> <https://jira.mariadb.org/browse/MDEV-7502>) during GSOC 2017. I have read
>> the summary on the ticket, have a fair understanding of the
>> problem, and am working towards ironing out the implementation details.
>> As I understand it, the use case is that we want the slave to automatically
>> replicate the data from the master once it is pointed at the master
> 
> Yes.
> 
>> and we want to do it in such a way that the binlog events from the
>> current master position and the old data chunks are relayed to
>> the slave in parallel.
> 
> Not necessarily. There could be other approaches too.
> 
> Maybe even bulk-loading the data would be faster than sending data in
> chunks and applying events in parallel. Or maybe not.
> 
>> I have a few questions related to the proposal:
>> 
>>   1. After reading a few pages on replication, my understanding is
>>   that after "CHANGE MASTER TO" and "START SLAVE", the master starts
>>   sending binlog events from its current position to the slave, which
>>   the slave starts applying. The usual replication approach is to get the
>>   current binlog position on the master, back up all the data up to this
>>   position on the master and restore it on the slave, point the slave to
>>   this position (or GTID) via "CHANGE MASTER TO", and run "START SLAVE" to
>>   start replicating binlog events from the master. But for MDEV-7502, we
>>   want the normal events and old data chunks to be transmitted in parallel.
> 
> The main thing we want for MDEV-7502 is to avoid the "back up all
> the data... restore on the slave" step.
> 
>>   The ticket summary mentions using separate domain_ids to send the
>>   new and old data in parallel. Is there currently a way to do this?
>>   How can the domain ID be used here? Can we currently point the
>>   slave to two different binlog positions on a single master and
>>   expect the master to send events from both positions? Or will this
>>   require some sort of new process/thread implementation on the master?
> 
> No, it won't. I haven't actually tried to connect twice from a slave to
> the same master, but I suspect it'll either work or can be fixed to work
> rather easily.
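On the domain_id idea: a MariaDB GTID has the form domain_id-server_id-seq_no, and a replication position keeps one GTID per domain, so changes in one domain do not move the position of another. The sketch below models only that bookkeeping. Putting the copied old data in domain 1 and the live events in domain 0 is an assumption for illustration, not something the ticket specifies, and the function names are hypothetical.

```python
from typing import Dict, Tuple

GtidPos = Dict[int, Tuple[int, int]]  # domain_id -> (server_id, seq_no)


def parse_gtid_pos(text: str) -> GtidPos:
    """Parse a comma-separated position such as '0-1-100,1-1-20'."""
    pos: GtidPos = {}
    for gtid in filter(None, (part.strip() for part in text.split(","))):
        domain, server, seq = (int(x) for x in gtid.split("-"))
        if domain in pos:
            raise ValueError(f"duplicate domain {domain} in {text!r}")
        pos[domain] = (server, seq)
    return pos


def format_gtid_pos(pos: GtidPos) -> str:
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(pos.items()))


def record_applied(pos: GtidPos, gtid: str) -> GtidPos:
    """Return a new position with gtid recorded for its own domain only."""
    domain, server, seq = (int(x) for x in gtid.split("-"))
    updated = dict(pos)
    updated[domain] = (server, seq)
    return updated


if __name__ == "__main__":
    pos = parse_gtid_pos("0-1-100")      # live events so far (domain 0)
    pos = record_applied(pos, "1-1-1")   # first chunk of old data (domain 1)
    pos = record_applied(pos, "0-1-101") # next live event (domain 0)
    print(format_gtid_pos(pos))
```

Because each domain keeps its own entry, the two streams can be applied independently (and in parallel) without one overwriting the other's progress.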
> 
>>   2. There are at least two other approaches mentioned in the
>>   ticket's comments section. It doesn't seem that a single
>>   approach has been finalized. This project doesn't seem to have a
>>   mentor yet to provide guidance, so which approach should an
>>   applicant pursue?
> 
> Yes, the project suggests a few different approaches. You can discuss them
> in your proposal and suggest the one you think is best.
> There will be a mentor, don't worry. It just hasn't been formally assigned
> yet.
> 
>> I would like to discuss the project approaches and implementation
>> in more detail before submitting a proposal, so could somebody please
>> answer my questions and suggest pointers to project-specific
>> material I can go through to get a deeper understanding? Thanks.
> 
> Hmm..
> 
> For example, I've mentioned above that it's not clear whether sending
> all the data first and bulk-loading it will be faster or slower than
> interleaving data and RBR binlog events.
> 
> You can test it. Get a big table dump (not huge, but something that
> takes a noticeable amount of time to load). Then get a bunch of
> single-row updates and deletes.
> And try 1) loading the dump, then doing the updates, and 2) doing the
> updates in parallel with loading the dump. Just take care to enable at
> least the primary key, and make sure that in both approaches you get the
> same table content at the end.
> That's a simple test, no coding involved, but it'll give some
> understanding as to which approach is faster on the slave side.
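The timing itself has to be measured against a real server, but the "same table content at the end" requirement can be reasoned about with a small model first. In this sketch, which is an assumption and not the MDEV-7502 design, the dump is loaded in primary-key order and an update or delete whose row has not been loaded yet is held back until that row arrives, so every row still sees its changes in the original order. The workload generator and all function names are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Tuple

Row = Dict[str, int]
Table = Dict[int, Row]        # primary key -> row
Event = Tuple[str, int, int]  # (kind, pk, new_val); kind is "update" or "delete"
Dump = List[Tuple[int, Row]]  # (pk, row) pairs, sorted by pk


def apply_event(table: Table, event: Event) -> None:
    """Apply one single-row change; changes to missing rows are no-ops."""
    kind, pk, new_val = event
    if pk not in table:
        return
    if kind == "update":
        table[pk]["val"] = new_val
    else:
        del table[pk]


def load_then_apply(dump: Dump, events: List[Event]) -> Table:
    """Approach 1: bulk-load the whole dump, then apply all changes."""
    table: Table = {pk: dict(row) for pk, row in dump}
    for event in events:
        apply_event(table, event)
    return table


def interleaved(dump: Dump, events: List[Event], chunk_size: int,
                events_per_chunk: int) -> Table:
    """Approach 2: load the dump in chunks, with changes arriving in between."""
    if any(a[0] >= b[0] for a, b in zip(dump, dump[1:])):
        raise ValueError("dump must be sorted by primary key")
    table: Table = {}
    pending: Dict[int, List[Event]] = {}  # pk -> changes deferred until load
    incoming = iter(events)
    for start in range(0, len(dump), chunk_size):
        chunk = dump[start:start + chunk_size]
        for pk, row in chunk:
            table[pk] = dict(row)
            for event in pending.pop(pk, []):  # replay deferred changes in order
                apply_event(table, event)
        loaded_upto = chunk[-1][0]
        # A batch of changes arrives before the next chunk is loaded.
        for event in itertools.islice(incoming, events_per_chunk):
            if event[1] <= loaded_upto:
                apply_event(table, event)
            else:
                pending.setdefault(event[1], []).append(event)
    # Dump finished: apply the rest directly. Anything left in `pending`
    # targets rows absent from the dump, so it would be a no-op anyway.
    for event in incoming:
        apply_event(table, event)
    return table


def make_workload(n_rows: int, n_events: int,
                  seed: int = 42) -> Tuple[Dump, List[Event]]:
    """Hypothetical workload: rows (pk, {"val": pk}) plus random changes."""
    rng = random.Random(seed)
    dump: Dump = [(pk, {"val": pk}) for pk in range(n_rows)]
    events: List[Event] = [
        ("delete" if rng.random() < 0.1 else "update",
         rng.randrange(n_rows), rng.randrange(1_000_000))
        for _ in range(n_events)
    ]
    return dump, events


if __name__ == "__main__":
    dump, events = make_workload(10_000, 2_000)
    expected = load_then_apply(dump, events)
    actual = interleaved(dump, events, chunk_size=500, events_per_chunk=100)
    print("same content:", expected == actual)
```

The deferral step is the important design choice: simply applying a change to a row that has not been loaded yet would turn it into a no-op, and the later chunk would bring back the stale row.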

I would strongly suggest having a look at https://github.com/maxbube/mydumper
before implementing anything: it contains an interesting collection of issues that have already been fixed.

/svar

Stéphane Varoqui, Senior Consultant
Phone: +33 695-926-401, skype: svaroqui
http://www.mariadb.com



> 
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp
References

prospective GSOC 2017 student [MDEV-7502]
From: ibrar arshad, 2017-03-19
Re: prospective GSOC 2017 student [MDEV-7502]
From: Sergei Golubchik, 2017-03-19