maria-developers team mailing list archive

Re: [GSoC] MDEV-7502 Automatic provisioning of slave

[Replying to both Aalekh and Martin at the same time.]

Martin Kaluznik <martin.kaluznik@xxxxxxxxx> writes:

> I am student interested in GSoC project Automatic provisioning of
> slave ( https://mariadb.atlassian.net/browse/MDEV-7502 ). I have read
> current code and put together my idea of how it could be implemented.

Cool! This would be a very useful thing to have for many users. I feel there
is currently no straightforward, simple way to do a non-blocking copy of an
existing MariaDB server (for the purpose of slave provisioning or backup),
and this feature would provide just that.

Design details for this task will need to be decided during the process, as
this is a new idea where not everything is thought through yet. So in
addition to coding skills, this will require a good understanding of how to
design production-quality systems software, and of what it means to make a
feature user-friendly for highly skilled operations engineers.

> Slave starts a regular sql thread and something really similar to the
> current io thread (maybe the best way to begin will be a copy of
> handle_slave_io and related functions; it may still end up with some
> duplicate code, which at the end can be moved to new common
> functions).

Yes. We might even be able to use the same io thread, though it seems likely
that there will be some flag or similar to denote the provisioning mode,
which might be special-cased in certain parts of the code.

> Slave will receive the gtids of the current master as a response to a
> copy-modified version of COM_BINLOG_DUMP; they will mark the position from
> which binlogs will be sent to the slave.

Yes, I agree. The slave will start the replication from whatever point the
master is at (@@gtid_binlog_pos on the master), so the master will need to
send this starting point to the slave. (IIRC, the master already sends this
position, but it's a good point that in this mode, the starting point is
determined by the master, not by the slave as is normally the case).
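
For reference, the corresponding manual workflow with today's syntax looks
like this (the '0-1-100' value and the host name are placeholders):

  -- On the master: the position the dump would start from.
  SELECT @@GLOBAL.gtid_binlog_pos;

  -- On the slave, in the normal (non-provisioning) case, it is the slave
  -- that chooses the starting point:
  SET GLOBAL gtid_slave_pos = '0-1-100';  -- value copied from the master
  CHANGE MASTER TO master_host='master.example.com',
                   master_use_gtid=slave_pos;
  START SLAVE;

In provisioning mode it would instead be the first step, master to slave,
that communicates the starting point.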

> Master proceeds with building a list of what needs to be sent to the
> slave. Here the exact implementation will depend on whether the master
> itself will block any DDL queries during the provisioning process or not.
> If it does (probably it won't – it's not necessary and it would just slow
> it down), we have a guarantee that the internal containers with
> databases/tables won't change, and they can just be iterated through
> safely (not true for procedure/trigger containers?). In the other case, it
> will probably be required to prefetch the list of databases/tables to copy
> to the slave (it can be done lazily with tables/triggers/procedures/…,
> holding only the list of tables/… for the database which is currently
> being processed). In the second case, the slave will have to detect the
> DDL event and interrupt the connection.

Yes, that sounds right; it's something that I did not think much about in the
design as currently written.

I think it's reasonable for this project that the master will not block DDL,
and that the slave will instead detect DDL events and interrupt the
connection, so that the list of tables etc. to send can be prefetched.
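
A rough sketch of the prefetch, expressed in SQL for clarity (internally the
server would iterate its own data structures rather than query
information_schema, and which system schemas to skip is itself a design
decision):

  -- Snapshot the list of objects to transfer up-front; a DDL statement
  -- arriving later invalidates the snapshot and aborts the provisioning.
  SELECT table_schema, table_name
    FROM information_schema.tables
   WHERE table_schema NOT IN
         ('information_schema', 'performance_schema', 'mysql');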

It would be useful to have the ability to block DDL (but not DML) on the
master for the duration of the transfer. Such a DDL lock would be useful for
other cases as well, but it can be done as a separate project, and this
project will be quite useful even without it.
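
For comparison, the closest mechanism that exists today blocks DML as well,
which is exactly what this feature is meant to avoid:

  FLUSH TABLES WITH READ LOCK;  -- blocks DDL, but also all writes: too heavy
  -- ... copy the data ...
  UNLOCK TABLES;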

> The range scans on primary key mentioned in the description - I assume it
> would look like
> SELECT * FROM tbl WHERE pk > @lastChunkKey ORDER BY pk LIMIT @chunkSize;
> just done more efficiently with internal functions, and the result
> will be converted to a Rows_log_event which will be sent to the slave.

Yes.
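
Spelled out, the chunking loop would be morally equivalent to the following
(integer primary key assumed, and the chunk size is purely illustrative):

  -- First chunk:
  SELECT * FROM tbl ORDER BY pk LIMIT 1000;
  -- Binlog those rows, remember the last pk seen (say 1000), then:
  SELECT * FROM tbl WHERE pk > 1000 ORDER BY pk LIMIT 1000;
  -- ...and so on, until a chunk comes back with fewer than 1000 rows.

For a composite primary key, the condition becomes a row-constructor
comparison, WHERE (a, b) > (@lastA, @lastB).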

> To the restrictions I would add at least one more: provisioning cannot be
> continued if the master is restarted during the process (or can we actually
> store information about the provisioning progress in the database?).

Agree, that is a reasonable restriction.

> Another possible problem could be slave reconnect. I found the
> kill_zombie_dump_threads function; in this case, the new thread could take
> the information about provisioning progress from the old thread and, with
> the help of fake gtids bundled with chunks, somehow figure out which chunk
> was last successfully received by the slave before the disconnect. It
> would require additional memory to store a few (or more?) most recently
> sent chunks and wouldn't guarantee success.
>
> A great way to solve this would be to be able to map a gtid sequence
> number to an exact chunk of data, but I can’t see how it can be done
> without assuming something about the maximum amount of rows in a table /
> number of tables / triggers / …

Right, that is a good point.

I think it would be acceptable in a first version to fail if the slave
disconnects during the dump. This could then be improved if time permits, or
as a separate project.

I think using GTIDs to track the progress of the dump is a very good idea. If
each chunk of data is sent to the slave marked with a GTID, the same way as
in normal replication, then the slave will have in mysql.gtid_slave_pos a
position that is consistent with how far the dump has progressed on the slave
side. And the master could then remember in-memory the table name and
primary-key position corresponding to, say, the last 100 GTIDs sent. If the
slave reconnects, it can resume from the GTID sent by the slave (or fail if
the slave was too far behind). But it will also require a fair amount of work
on the master side for a new dump thread to be able to take over from the old
zombie dump thread. So an interesting idea, but probably for a future
project.
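
(The slave-side half of this is essentially free, since the SQL thread
already records the applied position crash-safely, in the same transaction as
the event itself:

  SELECT @@GLOBAL.gtid_slave_pos;
  SELECT domain_id, sub_id, server_id, seq_no FROM mysql.gtid_slave_pos;

The missing piece is only the master-side GTID-to-chunk map.)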

In addition to the above, we also need to think through exactly how the dump
of table data can be interleaved with other transactions that modify the
master and are binlogged. My idea was that such dumps from range scans would
be binlogged as INSERTs (Write_rows_log_event) and could be replicated the
same way. We need to make sure that the correct data will be transferred
regardless of the ordering between the range scans of the dump and other
transactions.
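
A concrete (purely illustrative) scenario of the kind that must stay correct:

  -- Master, time t1: the dump thread binlogs all rows with pk <= 1000 of
  --                  tbl as a Write_rows_log_event chunk.
  -- Master, time t2: a client executes, and binlogs:
  UPDATE tbl SET x = 5 WHERE pk = 500;
  -- If the chunk is binlogged before the UPDATE, the slave applies the old
  -- row first and then the UPDATE: correct. If the UPDATE is binlogged
  -- first, the chunk's copy of row 500 must already contain x = 5, or the
  -- slave ends up with stale data.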

Maybe the range scans should take a read lock on the rows scanned, to ensure
that the slave will process the scanned rows _before_ any updates done
afterwards on the master.
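
In SQL terms, something like the following (whether the internal range scan
would really go through this exact path is an open design question):

  BEGIN;
  SELECT * FROM tbl WHERE pk > 1000 ORDER BY pk LIMIT 1000
    LOCK IN SHARE MODE;  -- shared locks on the scanned rows
  -- ... binlog the chunk as a Write_rows_log_event ...
  COMMIT;  -- locks released; conflicting updates binlog after the chunk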

The slave SQL thread will also need to ignore errors on normal replication
events caused by a row (or a table) not existing; in such cases, the correct
data will instead be received in a chunk dumped later.
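
The closest existing knob is the slave-skip-errors option, shown here only to
illustrate the error classes involved; a real implementation should scope the
ignoring to the provisioning phase rather than apply it globally:

  # my.cnf on the slave (illustrative only):
  [mysqld]
  slave-skip-errors = 1032,1146  # 1032: can't find record, 1146: no such table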

This needs to be thought through properly to be sure that correctness is
guaranteed in all cases.

> Thanks for feedback, Martin Kaluznik ( martin.kaluznik [at] gmail.com ).
> I can also be found on IRC under the nick Farnham.

Ok cool, I'm knielsen on FreeNode #maria.



> I am Aalekh Nigam, 3rd year Undergraduate Student at JIIT, Noida.

> Can you please guide me on how to proceed with this project, to assure my
> participation in this GSOC? For now I am assuming the project to be

I think the above captures the essence of the idea well.

I would suggest looking at the mysqldump program (client/mysqldump.cc in the
source tree). The code uses SQL on the client side, so it is not directly
applicable, but it should give an idea of what is required to take a full
copy of everything on a server.
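
For instance, the full-copy case this feature automates corresponds roughly
to:

  mysqldump --all-databases --routines --triggers --events > full_copy.sql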

I would also suggest reading up on how Global Transaction IDs (GTID) work in
MariaDB, here are some links:

  https://mariadb.com/kb/en/mariadb/global-transaction-id/
  http://kristiannielsen.livejournal.com/16826.html
  http://kristiannielsen.livejournal.com/17008.html
  http://kristiannielsen.livejournal.com/17238.html

The code on the master side that receives a connection from a slave is in
sql_repl.cc starting from mysql_binlog_send().

The code on the slave side is in sql/slave.cc, starting from the IO thread
(handle_slave_io()) and the SQL thread (handle_slave_sql()).

Do feel free to ask me more questions about this.

> something like this:
> https://technet.microsoft.com/en-us/library/cc645937(v=sql.105).aspx

Hm, not really, I think. That seems to be a mechanism to expose to external
clients sufficient information for them to implement their own type of
row-based replication, while the idea in this task is to do everything inside
the server code, using the existing infrastructure for MariaDB row-based
replication.

> Is there any task/bug I need to get working on ?

Hm, I don't have anything in mind at the moment, unfortunately. Maybe you can
find something by searching the bug tracker? Otherwise, the pointers to the
code I gave above can hopefully be a reasonable starting point.

 - Kristian.

