maria-developers team mailing list archive

Re: Documentation about GTID

To: Pavel Ivanov <pivanof@xxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Fri, 03 May 2013 16:45:25 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CAAG=WUsmzYE-WWDSmKWj64=GPamen_a267P82ovm2xwf807RYw@mail.gmail.com> (Pavel Ivanov's message of "Fri, 3 May 2013 07:22:29 -0700")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> Well, I'd argue that accumulating lots of cruft does matter. Let's say
> it accumulated 10,000 different GTIDs. It will significantly slow down
> the slave connection initialization and it will blow up binlog size.
> The first effect could be especially dangerous because if e.g. 3
> slaves are connected to master simultaneously master could fully
> consume 3 CPUs for a prolonged period of time which may affect ability
> to respond to client queries.
>
> Overall I guess I don't quite like this design decision. It basically
> means that in the proper configuration special care should be taken to
> make sure that server_id numbers don't get retired forever but get
> reused instead...

I do not think it will be a problem. Even with 10000 master failovers (that's
10 times per day, every day for 3 years!), it's just 150kB in each binlog
file, plus a single iteration over it during slave connect.
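
As a rough check of those figures, here is a back-of-envelope sketch in Python.
The 16-byte entry size (4-byte domain_id, 4-byte server_id, 8-byte seq_no) is
an assumption made for illustration, not a number stated in this message; it
lands in the same range as the estimate above.

```python
# Back-of-envelope size of a GTID list holding one entry per retired server_id.
# The field widths below are assumed for illustration only.
FAILOVERS_PER_DAY = 10
DAYS = 3 * 365
BYTES_PER_GTID = 4 + 4 + 8  # domain_id + server_id + seq_no (assumed widths)


def gtid_list_size_bytes(num_gtids, bytes_per_gtid=BYTES_PER_GTID):
    """Approximate payload size of a list with num_gtids entries."""
    return num_gtids * bytes_per_gtid


# 10 failovers per day for 3 years is on the order of 10000 failovers.
failovers = FAILOVERS_PER_DAY * DAYS

# Size of the list if 10000 distinct GTIDs have accumulated.
size_10k = gtid_list_size_bytes(10_000)
```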

But if it turns out that I am wrong and it is in fact a problem, we can easily
add something later that prunes this information. A given (domain_id,
server_id, seq_no) GTID is only needed as long as there is any slave that
might request to start replicating from this position. After that, it can be
safely omitted in subsequent Gtid_list_log_events.

For example, each time we rotate the binlog, we can check for any GTID in
Gtid_list_log_event that was the same in the Gtid_list_log_event of the first
unpurged binlog file, and omit any such in the next Gtid_list_log_event
output. But there is no reason to spend time on that until we know it is a
problem.
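
The pruning idea above could be sketched roughly as follows. This is an
illustration only, not server code: the function name prune_gtid_list and the
dict representation (mapping (domain_id, server_id) to seq_no) are
hypothetical, chosen to keep the example short.

```python
def prune_gtid_list(current, oldest_unpurged):
    """Return the entries to write in the next GTID list at binlog rotation.

    Both arguments map (domain_id, server_id) -> seq_no. An entry whose
    seq_no is identical in the current list and in the list of the first
    unpurged binlog has not advanced in any binlog still on disk, so it is
    omitted from the result.
    """
    return {
        key: seq_no
        for key, seq_no in current.items()
        if oldest_unpurged.get(key) != seq_no
    }


# Hypothetical state: server_id 1 was retired long ago, 2 and 3 are recent.
oldest = {(0, 1): 100, (0, 2): 250}
current = {(0, 1): 100, (0, 2): 400, (0, 3): 7}
pruned = prune_gtid_list(current, oldest)
```

Here only the entry for (0, 1) is dropped; entries that advanced or first
appeared within the unpurged binlogs are kept.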

Maybe the deeper issue is that you would prefer a design where the sequence
number is assumed unique by itself (or (domain_id, sequence_number) if using
multi-source), and the code is allowed to silently break if the user violates
this assumption?

This makes a lot of things simpler, of course. I did think a lot about this,
and in the end I decided against it, because of numerous cases where this
could make things harder for users who are perhaps not intimately familiar with
how GTID works. I am still hopeful that the current design can give the best
of both worlds: something that "just works" in most cases for most users, and
still provides what is needed for advanced users that can be expected to know
what they are doing.

 - Kristian.
