maria-developers team mailing list archive

Re: MaxScale as a binlog server

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Mark Riddoch <mark.riddoch@xxxxxxxxxx>
Date: Tue, 18 Mar 2014 08:25:44 +0000
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <871ty0ue72.fsf@frigg.knielsen-hq.org>
Thanks Kristian, that gives me some useful pointers and some things to think about and discuss with them when we meet tomorrow. I agree that limiting it to a subset of scenarios would be good, especially as we are only looking at proving a concept at the moment.

I will keep you informed of what we find out from them.

Regards
Mark

On 18 Mar 2014, at 08:02, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> wrote:

> Mark Riddoch <mark.riddoch@xxxxxxxxxx> writes:
> 
>> we have been approached by a user with a suggestion for a project that we
>> might do with them. They have a replication environment with a large fan out
>> (100 slaves per master) and are looking at doing something to cut down the
>> load on the master. The thought they had was to use MaxScale to effectively
>> cache the bin log files, so MaxScale would act as a single Slave to the
>> Master and would itself have a number of Slaves that read the bin log from
>> MaxScale. My initial response was why not just use an instance of MariaDB as
>> the intermediate node and connect slaves to one of those instances, in a
>> tree structure. They had a few reasons why they did not want to do this;
> 
>> Looking in more detail, I think it is feasible: we essentially write a router
>> module for MaxScale that acts as a slave, caching the binlog locally on the
>> MaxScale node. We then have another router, or maybe even the same one, that
>> relays that binlog to the real slaves. We are looking at doing something for
>> MariaDB 10, since we would want to utilise GTIDs to get the best failover
>> semantics. I am assuming there must be some differences between the
>> replication stream that I can find documented for MySQL and what MariaDB 10
>> does? Is this stuff anywhere on the Knowledge Base? I tried looking but
> 
> Right, so I agree with what Serg said: it may be worth going back to the
> original problem to see whether a better solution is possible. However, I
> will try to answer the technical aspect here.
> 
> I suppose caching and routing binlogs outside of the server should be
> possible, but it might be more complex than you would think at first.
> 
> I am not aware of any comprehensive documentation for this. However, it should
> be relatively easy to see from the code. Basically, it is the function
> mysql_binlog_send() in sql/sql_repl.cc that handles sending binlog data to a
> slave. There are good comments in there describing many of the trickier
> points.
> 
> You need to be aware that there is more logic in sending binlog data to a
> slave than just streaming raw binlog files. Maybe you can implement a subset
> and document any limitations.
> 
> The old-style replication (not using GTID) is the simplest, but it still does
> a bit of extra work, such as sending some additional events
> (FORMAT_DESCRIPTION_EVENT and a fake ROTATE_EVENT).
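To make the non-GTID case concrete, here is a minimal Python sketch of the start of a dump as a hypothetical router might produce it. The `Event` class, the string event types and `old_style_dump()` are invented for illustration; the assumption that a FORMAT_DESCRIPTION_EVENT resent when resuming mid-file carries log_pos 0 (so the slave does not treat it as a position update) should be checked against mysql_binlog_send().

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

BINLOG_HEADER_SIZE = 4  # binlog files start with a 4-byte magic header


@dataclass(frozen=True)
class Event:
    type: str      # e.g. "ROTATE", "FORMAT_DESCRIPTION", "QUERY" (illustrative)
    log_pos: int   # end offset of the event in its binlog file (0 = not from the file)
    payload: str = ""


def old_style_dump(file_name: str, start_pos: int, format_desc: Event,
                   events: Iterable[Event]) -> Iterator[Event]:
    """Yield the events a hypothetical router would send for a non-GTID dump."""
    # A fake ROTATE_EVENT first, telling the slave which file/position it reads.
    yield Event("ROTATE", 0, f"{file_name}:{start_pos}")
    # Assumption: when resuming mid-file, resend the file's FORMAT_DESCRIPTION_EVENT
    # so the slave can decode what follows, with log_pos 0 so it is not
    # treated as a position update.
    if start_pos > BINLOG_HEADER_SIZE:
        yield Event("FORMAT_DESCRIPTION", 0, format_desc.payload)
    # Then the real events from the requested position onwards (when starting
    # at offset 4 this includes the file's own FORMAT_DESCRIPTION_EVENT).
    for ev in events:
        if ev.log_pos > start_pos:
            yield ev
```

Resuming at offset 200 of a file therefore yields a fake rotate, the resent format description, and only the events that end after offset 200.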
> 
> There is also the handling of old slaves. A slave sends a value in
> @mariadb_slave_capability, and the master will rewrite or remove any events
> that the slave does not understand, as appropriate.
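The capability handling could look roughly like the following Python sketch. The numeric levels are assumptions modelled on the MARIA_SLAVE_CAPABILITY_* constants in the server source, and the specific rewrites (a GTID event becoming a plain BEGIN for a pre-GTID slave, other unknown events becoming a dummy) are a simplified reading, not a complete list of what the master does.

```python
# Assumed capability levels, modelled on the MARIA_SLAVE_CAPABILITY_* constants
# in the server source (verify the real values before relying on them).
CAP_ANNOTATE = 1
CAP_GTID = 4


def adapt_event_for_slave(event_type: str, capability: int) -> str:
    """Return the event type to send to a slave that reported `capability`.

    Simplified and hypothetical: events an old slave cannot parse are replaced
    rather than dropped (assumed here to keep its position tracking intact).
    """
    if capability < CAP_GTID:
        if event_type == "GTID":
            return "QUERY_BEGIN"  # a pre-GTID slave sees a plain BEGIN instead
        if event_type in ("GTID_LIST", "BINLOG_CHECKPOINT"):
            return "DUMMY"
    if capability < CAP_ANNOTATE and event_type == "ANNOTATE_ROWS":
        return "DUMMY"
    return event_type
```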
> 
> And there is the @@skip_replication flag; when the slave sets this, the master
> removes events that were logged with @@skip_replication set (to reduce
> bandwidth needs).
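A minimal Python sketch of that filter, assuming each cached event is available as a (header flags, payload) pair. The flag value 0x8000 for LOG_EVENT_SKIP_REPLICATION_F is an assumption to be checked against log_event.h, and `filter_skip_replication()` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple

# Assumed value of the header flag set on events logged with @@skip_replication=1.
LOG_EVENT_SKIP_REPLICATION_F = 0x8000


def filter_skip_replication(events: Iterable[Tuple[int, str]],
                            slave_skip_replication: bool) -> Iterator[str]:
    """events: (header_flags, payload) pairs. Drops flagged events only if the
    slave asked the master to filter them."""
    for flags, payload in events:
        if slave_skip_replication and flags & LOG_EVENT_SKIP_REPLICATION_F:
            continue  # the slave has said it does not want marked events
        yield payload
```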
> 
> The slave also does several SQL queries to obtain additional information. For
> example, the slave does a SELECT binlog_gtid_pos(file, offset) to obtain the
> GTID position associated with a given old-format replication position, so that
> it knows how to switch to GTID mode later. Maybe those queries could be
> handled by relaying them back to a real master server.
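One way a router could split the slave's queries is sketched below in Python: user-variable assignments such as @slave_connect_state are recorded locally, because the router serves the dump itself, and everything else (for example SELECT binlog_gtid_pos(...)) is relayed to the real master. The regular expression handles only one simple assignment per statement, and `route_slave_query()` and its return strings are hypothetical.

```python
import re

# Simplified: one assignment per statement, value either quoted or a bare token.
_SET_USER_VAR = re.compile(
    r"^\s*SET\s+@(\w+)\s*=\s*(?:'([^']*)'|(\S+))\s*$", re.IGNORECASE)


def route_slave_query(sql: str, session: dict) -> str:
    """Decide what a hypothetical binlog router does with a query from a slave."""
    m = _SET_USER_VAR.match(sql)
    if m:
        # e.g. @slave_connect_state, @mariadb_slave_capability: the router
        # must remember these itself, since it serves the binlog dump.
        value = m.group(2) if m.group(2) is not None else m.group(3)
        session[m.group(1).lower()] = value
        return "handled_locally"
    # Anything else, such as SELECT binlog_gtid_pos(...), goes to the real master.
    return "forward_to_master"
```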
> 
> The GTID mode is significantly more complex, because a GTID position consists
> of multiple streams, one per replication domain. The slave sends
> @slave_connect_state, @slave_gtid_strict_mode, @slave_gtid_ignore_duplicates,
> and optionally @slave_until_gtid values. The router would need to parse these
> values and act accordingly.
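Parsing those values is the easy part. A GTID position is a comma-separated list of domain-server-seqno triples with at most one GTID per domain, so a Python sketch (with a hypothetical `parse_gtid_pos()` helper) might be:

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_pos(text: str) -> Dict[int, Gtid]:
    """Parse a value such as '0-1-100,1-2-5' into {domain_id: Gtid}."""
    state: Dict[int, Gtid] = {}
    for item in (s.strip() for s in text.split(",")):
        if not item:
            continue  # tolerate empty items, e.g. an empty position string
        # Raises ValueError if the item is not exactly three integers.
        domain, server, seq = (int(x) for x in item.split("-"))
        if domain in state:
            raise ValueError(f"more than one GTID for domain {domain}")
        state[domain] = Gtid(domain, server, seq)
    return state
```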
> 
> The binlog sender needs to keep track of each domain as the binlog files are
> scanned. Events in a domain must be skipped until the point specified in
> @slave_connect_state is reached. Then events are sent until the point in
> @slave_until_gtid is reached, at which point further events must again be
> skipped.
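The per-domain scan described above can be sketched in Python as follows, assuming the cached binlog is available as an ordered sequence of (GTID, transaction) pairs. Note that it searches for the exact start GTID rather than comparing sequence numbers. In this simplified sketch, domains absent from the connect state are sent from the beginning and domains absent from the UNTIL position are unbounded; `filter_by_domain()` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def filter_by_domain(txns: Iterable[Tuple[Gtid, str]],
                     connect_state: Dict[int, Gtid],
                     until: Dict[int, Gtid]) -> Iterator[str]:
    """Yield the payloads of transactions the slave should receive."""
    searching = set(connect_state)  # domains still looking for their start GTID
    stopped = set()                 # domains that have passed their UNTIL GTID
    for gtid, payload in txns:
        domain = gtid[0]
        if domain in searching:
            if gtid == connect_state[domain]:
                searching.discard(domain)  # found: send from the next one on
            continue                       # the slave already has this one
        if domain in stopped:
            continue
        yield payload
        if until.get(domain) == gtid:
            stopped.add(domain)            # skip everything after this point
```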
> 
> The master also sends extra GTID_LIST events, which contain the state of the
> binlog at the point of the event; the slave needs this to correctly handle
> START SLAVE UNTIL master_gtid_pos=XXX and MASTER_GTID_WAIT().
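To emit GTID_LIST events, a router would have to track the binlog state as it writes its cache. Below is a Python sketch of a deliberately simplified model that keeps only the latest GTID per domain. This is a simplification of the server's own binlog state, and `BinlogState` is a hypothetical name.

```python
from typing import Dict, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


class BinlogState:
    """Tracks the last GTID seen per replication domain while scanning a binlog."""

    def __init__(self) -> None:
        self._last: Dict[int, Gtid] = {}

    def update(self, gtid: Gtid) -> None:
        self._last[gtid[0]] = gtid

    def as_gtid_list(self) -> str:
        """Render the state as 'domain-server-seq' items, ordered by domain."""
        return ",".join(f"{d}-{s}-{n}" for d, s, n in sorted(self._last.values()))
```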
> 
> A lot of effort was put into error handling, to support all reasonable (and
> even many unreasonable) uses of replication, while still giving an error in
> cases where things are obviously wrong (to prevent silently doing the wrong
> thing or hanging).
> 
> For example, if the slave tries to connect at a GTID that the master does not
> have, this is normally an error. This is important, because that GTID might be
> some transaction that was executed manually on the slave by mistake; if it
> only exists on the slave, then the master would endlessly skip events looking
> for the GTID that never shows up, and replication would silently
> hang.
> 
> However, there are special cases where the slave _is_ allowed to connect at a
> GTID that does not exist on the master. For example, one might have a slave
> with --log-slave-updates=0. If this slave is then promoted to be the new
> master, its binlogs will not contain its current position, but slaves are
> still allowed to connect to it at the point where it became the new master
> (though not at any earlier point). Another special case is when the requested
> GTID (but none of the GTIDs following it) has been purged from the binary
> logs, as can easily happen if a replication domain is unused for a long time.
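The connect-time check and its two exceptions can be summarised in a Python sketch. All inputs are hypothetical: `gtids_in_binlog` is the set of GTIDs still present in the cached binlogs, `purged_up_to` maps each domain to the last GTID recorded before the oldest remaining binlog (for example from its GTID_LIST), and `promoted_at` maps each domain to the position at which this server became master.

```python
from typing import Dict, Set, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def may_connect_at(requested: Gtid,
                   gtids_in_binlog: Set[Gtid],
                   purged_up_to: Dict[int, Gtid],
                   promoted_at: Dict[int, Gtid]) -> bool:
    """Return True if a slave may start replicating after `requested`."""
    domain = requested[0]
    if requested in gtids_in_binlog:
        return True  # normal case: the GTID is present in the binlog
    if promoted_at.get(domain) == requested:
        return True  # promoted slave without log-slave-updates: its position at promotion
    if purged_up_to.get(domain) == requested:
        return True  # only this GTID was purged; everything after it is still present
    return False     # unknown GTID: report an error instead of skipping forever
```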
> 
> A _full_ implementation of a router for sending binlog to slaves would be some
> effort, but it might be reasonable to implement some subset. I suppose the
> first step would be for you to go through the code in mysql_binlog_send() and
> the functions it calls, and decide on which parts to implement. I will be
> happy to answer any questions that might pop up on the way, of course.
> 
>> 2. They wanted to maintain the same group commit groups as the original
>> master, so as to benefit from the parallel replication.
> 
> Note that this may no longer apply to MariaDB 10.0.9. The issue is that an
> intermediate slave that is itself a master may produce smaller group commits
> than the original master, which reduces the possibility for a third-level
> slave to efficiently do parallel replication. But in 10.0.9, it should be
> possible to configure the intermediate slave with --binlog-commit-wait-* to
> get better group commit.
> 
>> 3. They think they can get better failover semantics because they will have
>> n (maybe 5 or 10) MaxScales and they will all have the same bin log, so
>> failover between them will be easier.
> 
> I am curious how they will ensure that all of them have the same binlog in
> case of a crash. But maybe you just mean that each one's binlog will be a
> subset of the largest one.
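If "a subset of the largest one" is the intended guarantee, failover amounts to checking that every cache is a prefix of the longest and then choosing that one. Below is a Python sketch of the check, assuming each (hypothetical) MaxScale node can report the ordered list of GTIDs in its binlog cache. `pick_failover_source()` is an invented helper.

```python
from typing import Dict, List


def pick_failover_source(caches: Dict[str, List[str]]) -> str:
    """caches maps a node name to the GTIDs in its binlog cache, in order.
    Returns the node holding the longest binlog, after checking that every
    other cache is a prefix of it."""
    best = max(caches, key=lambda name: len(caches[name]))
    longest = caches[best]
    for name, gtids in caches.items():
        if longest[:len(gtids)] != gtids:
            raise ValueError(f"binlog cache on {name} has diverged from {best}")
    return best
```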
> 
> Hope this helps,
> 
> - Kristian.
References

Re: MaxScale as a binlog server
From: Kristian Nielsen, 2014-03-18