maria-developers team mailing list archive
Message #10525
Re: prospective GSOC 2017 student [MDEV-7502]
> On 19 March 2017, at 18:53, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
>
> Hi, ibrar!
>
> On Mar 19, ibrar arshad wrote:
>> Hi,
>>
>> My name is Ibrar Arshad and I am interested in working on the task of
>> automatic slave provisioning (ticket: MDEV-7502
>> <https://jira.mariadb.org/browse/MDEV-7502>) during GSOC 2017. I have read
>> the summary on the ticket and have achieved a fair understanding of the
>> problem and I am working towards ironing out the implementation details.
>> The use-case as I understand is that we want the slave to auto-replicate
>> the data from master once pointed to the master
>
> Yes.
>
>> and we want to do it in such a manner that the binlog events from
>> current master position as well as the old data chunks are relayed to
>> the slave in a parallel fashion.
>
> Not necessarily. There could be other approaches too.
>
> Maybe even bulk-loading the data would be faster than sending data in
> chunks and applying events in parallel. Or maybe not.
>
>> I have a few questions related to the proposal:
>>
>> 1. After reading a few pages on replication, my understanding is
>> that after "CHANGE MASTER TO" and "START SLAVE", master starts
>> sending binlog events from its current position to the slave which
>> slave starts applying. The usual replication approach is to get the
>> current binlog position on master, backup all the data till this
>> position from master to slave, point slave to this position (or
>> GTID) via "CHANGE MASTER TO", and START SLAVE to start replicating
>> bin events from master. But for MDEV-7502, we want the normal
>> events and old data chunks to be transmitted in parallel.
>
> The main thing we want for MDEV-7502 is to avoid the step of "backup all
> the data... restore on the slave".
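For reference, the conventional sequence described in the question (restore a backup, set the position, then start replicating) could be sketched as below. This is only a minimal illustration in Python that builds the MariaDB statement strings; the function name, host, user and GTID value are hypothetical placeholders, and the backup/restore step itself (the part MDEV-7502 wants to avoid) is not shown.

```python
def provisioning_statements(master_host, master_user, gtid_pos):
    """Build the SQL a freshly restored slave would run (sketch only).

    All arguments are hypothetical placeholders; gtid_pos is the GTID
    position at which the backup was taken on the master.
    """
    for value in (master_host, master_user, gtid_pos):
        # Keep the sketch simple: refuse values that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in {value!r}")
    return [
        # Tell the slave which GTID the restored data corresponds to.
        f"SET GLOBAL gtid_slave_pos = '{gtid_pos}'",
        # Point the slave at the master and replicate from that GTID.
        "CHANGE MASTER TO "
        f"MASTER_HOST = '{master_host}', "
        f"MASTER_USER = '{master_user}', "
        "MASTER_USE_GTID = slave_pos",
        "START SLAVE",
    ]


if __name__ == "__main__":
    for stmt in provisioning_statements("master.example", "repl", "0-1-100"):
        print(stmt + ";")
```

Credentials (MASTER_PASSWORD and so on) are left out on purpose; a real setup would need them.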
>
>> The ticket summary mentions using separate domain_ids to send the
>> new and old data in parallel, does there exist a way to do so
>> currently? How can domain id be used here? Can we currently point
>> the slave to 2 different bin positions on a single master and
>> expect the master to send events from both positions? Or will this
>> require some sort of new process/thread implementation on master to
>> do so?
>
> No, this won't. I didn't actually try to connect twice from a slave to
> the same master, but I suspect it'll either work or can be fixed to work
> rather easily.
>
>> 2. There are at least two other approaches mentioned in the
>> ticket's comments section. It doesn't seem that a single
>> approach has been finalized. This project doesn't seem to have a
>> mentor yet to provide guidance so which approach should an
>> applicant pursue further?
>
> Yes, the project suggests a few different approaches. You can discuss them
> in your proposal and suggest the one you think is the best.
> There will be a mentor, don't worry. It just wasn't formally assigned
> yet.
>
>> I would like to discuss the project approaches and implementation
>> further in detail before submitting a proposal so can somebody please
>> answer my queries and further suggest pointers to this project
>> specific material which I can go through to get a deeper
>> understanding? Thanks.
>
> Hmm..
>
> For example, I've mentioned above that it's not clear whether sending
> all data first and bulk-loading them will be faster or slower than
> interleaving data and RBR binlog events.
>
> You can test it. Get a big table dump (not huge, but something that
> takes a noticeable amount of time to load). Then get a bunch of
> single-row updates and deletes.
> And try 1) load the dump, then do the updates; 2) do the updates in
> parallel with the dump. Just take care to enable at least the primary key,
> and make sure that in both approaches you get the same table content at the end.
> That's a simple test, no coding involved, but it'll give some
> understanding as to what approach is faster on the slave side.
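The shape of the test described above could be sketched roughly as follows. This is a stand-in only: it uses Python's built-in sqlite3 with an in-memory table instead of a MariaDB slave, generates a fake dump instead of a real one, and interleaves chunks and changes in a single thread rather than running them truly in parallel. All names and sizes are hypothetical, and timings from it say nothing about MariaDB; the point is the harness and the check that both approaches end with the same table content.

```python
import random
import sqlite3
import time


def make_workload(n_rows=20000, n_changes=2000, seed=1):
    """Fake 'dump' rows (id, val) plus single-row updates/deletes by id."""
    rng = random.Random(seed)
    dump = [(i, rng.randrange(1000)) for i in range(n_rows)]
    changes = []
    for _ in range(n_changes):
        pk = rng.randrange(n_rows)
        if rng.random() < 0.2:
            changes.append(("delete", pk, None))
        else:
            changes.append(("update", pk, rng.randrange(1000)))
    return dump, changes


def new_table():
    conn = sqlite3.connect(":memory:")
    # The primary key is enabled from the start, as the test requires.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
    return conn


def apply_change(conn, change):
    kind, pk, val = change
    if kind == "update":
        conn.execute("UPDATE t SET val = ? WHERE id = ?", (val, pk))
    else:
        conn.execute("DELETE FROM t WHERE id = ?", (pk,))


def contents(conn):
    return conn.execute("SELECT id, val FROM t ORDER BY id").fetchall()


def load_then_change(dump, changes):
    """Approach 1: bulk-load the whole dump, then apply every change."""
    conn = new_table()
    conn.executemany("INSERT INTO t VALUES (?, ?)", dump)
    for change in changes:
        apply_change(conn, change)
    conn.commit()
    return contents(conn)


def interleaved(dump, changes, chunk=1000):
    """Approach 2: load in chunks; after each chunk apply the pending
    changes whose target row has already been loaded."""
    conn = new_table()
    pending = list(changes)
    for start in range(0, len(dump), chunk):
        conn.executemany("INSERT INTO t VALUES (?, ?)", dump[start:start + chunk])
        loaded_up_to = start + chunk  # ids below this value are loaded
        still_pending = []
        for change in pending:
            if change[1] < loaded_up_to:
                apply_change(conn, change)
            else:
                still_pending.append(change)
        pending = still_pending
    conn.commit()
    return contents(conn)


def timed(fn, *args):
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


if __name__ == "__main__":
    dump, changes = make_workload()
    seq, t_seq = timed(load_then_change, dump, changes)
    mix, t_mix = timed(interleaved, dump, changes)
    assert seq == mix, "both approaches must give the same table content"
    print(f"load then change: {t_seq:.3f}s, interleaved: {t_mix:.3f}s")
```

Against a real server the same two functions would be replaced by loading the actual dump and replaying the actual statements, with the final comparison done the same way.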
I would strongly suggest having a look at https://github.com/maxbube/mydumper
before implementing anything; it already contains fixes for an interesting collection of issues.
/svar
Stéphane Varoqui, Senior Consultant
Phone: +33 695-926-401, skype: svaroqui
http://www.mariadb.com
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help : https://help.launchpad.net/ListHelp