maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #07802
Re: [Commits] Rev 3879: MDEV-6593 : domain_id based replication filters in lp:~maria-captains/maria/maria-10.0-galera
Hi Kristian!
Thank you for the review remarks. I have committed a patch addressing your
comments.
http://lists.askmonty.org/pipermail/commits/2014-October/006775.html
See inline for my response on specific comments.
On Mon, Sep 15, 2014 at 9:25 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:
> Nirbhay Choubey <nirbhay@xxxxxxxxxx> writes:
>
> > ------------------------------------------------------------
> > revno: 3879
> > revision-id: nirbhay@xxxxxxxxxx-20140912181622-s33e7r0vtgpju95q
> > parent: nirbhay@xxxxxxxxxx-20140814224304-rtkeal7p9lk06li1
> > committer: Nirbhay Choubey <nirbhay@xxxxxxxxxx>
> > branch nick: maria-10.0-galera_6593
> > timestamp: Fri 2014-09-12 14:16:22 -0400
> > message:
> > MDEV-6593 : domain_id based replication filters
> >
> > Added support for IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS.
>
> Here is my review.
>
> Some general comments, followed by some detailed comments in-line in the
> patch:
>
> 1. Did you prepare any documentation? You could either add it to the
> knowledgebase directly, or write up a text file and ask Daniel to add it in
> relevant places.
>
[NC] I haven't but I will prepare it and pass it down to Daniel.
>
> 2. Did you run the full test suite? I would have expected at least some
> tests
> in the multi_source suite to need modification. The chance touches enough
> things that you should test it in a feature tree in Buildbot to ensure
> that it
> is green before pushing it to a main tree.
>
[NC] You are right. I did that now and updated the result files.
> 3. I think some extra test cases are needed:
>
> - You should test (and document) what happens when the same domain_id is
> in
> both the DO and the IGNORE list. Is it actually ever sensible to have
> both
> DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS ? If not, I think it should be an
> error
> to try.
>
[NC] You are right, setting both DO_ and IGNORE_ at the same time would not
make any sense.
But I actually followed the pattern of the existing replication, where both
_do_ and _ignore_
filters are allowed to hold values at the same time and _do_ is preferred
in case both are set.
> - You should also add a test using non-GTID mode.
>
[NC] Done.
>
> - You should also add a test where parallel replication is used
> (--slave-parallel-threads > 0).
>
[NC] Done.
>
> - Add a test for the following scenario when using GTID mode: Some domain
> id
> is ignored, some events are received from the master and ignored. Then
> the
> slave is stopped, and the ignore rule is removed. What happens when the
> slave is restarted and reconnects? I think the previously ignored
> domain id
> events will now be re-fetched and replicated. I guess that's ok, but it
> should be documented. Also test what happens when binlog files on the
> master have been rotated and purged, so that the events are no longer
> available on the master (an error should occur).
>
[NC] As the filtering takes place in the IO thread. I have added some test
cases covering
various scenarios including ones where IO thread crashes while reading
COMMIT/XID event.
In this scenario if the filtering is ON the GTID is added to the ignored
list (ign_gtids) on GTID
event, so when the crash happens before COMMIT/XID event, that particular
GTID gets ignored
after the slave reconnects.
> - Add a similar test for when not using GTID mode, the last event
> replicated
> by the slave is ignored, the slave is stopped, the ignore rule removed
> and
> the slave restarted; will the last event now be re-fetched or not (I
> think
> not).
>
[NC] Done.
>
> - Add a test that checks that seconds_behind_master is properly updated to
> zero even when the last event is ignored due to
> DO_DOMAIN_IDS/IGNORE_DOMAIN_IDS. It can perhaps be a bit tricky to test,
> but I think you can use this existing test as a basis:
> mysql-test/suite/rpl/t/rpl_parallel2.test
>
[NC] Done.
>
> - It would also be good to check that the code works when used on an old
> master (which does not send GTID events). You can do this by creating a
> binlog file with a 5.5 server and storing it in mysql-test/std_data/,
> and
> then installing it in a master; there should be several other test cases
> that already do this.
>
[NC] Done.
>
> 4. Also see detailed comments for some possible problems with the
> implementation. The most serious is probably to ensure that events are not
> skipped after the end of the group, we need a couple of tests for this, see
> below. Also, there is a small memory leak, see below.
>
> > +DO_DOMAIN_IDS (before) : 1
> > +IGNORE_DOMAIN_IDS (before) : 2
> > +CHANGE MASTER TO DO_DOMAIN_IDS=(4,4,5,1,7,7,7,1,1,2,6,8,1,4,5,5,9,3),
> MASTER_USE_GTID=current_pos;
>
> So if a domain_id is specified multiple times, the redundant ones will be
> ignored.
>
[NC] Right. Also, there was a bug in the current IGNORE_SERVER_IDS
implementation. Its fixed now.
> It would seem preferable to give an error... but as far as I can see the
> behaviour is the same for IGNORE_SERVER_IDS, so better to be consistent as
> you
> did in the code. Just be sure to add this to the documentation.
>
[NC] Ok.
>
>
> > === added file 'mysql-test/suite/rpl/t/rpl_domain_id_filter.test'
> > --- a/mysql-test/suite/rpl/t/rpl_domain_id_filter.test 1970-01-01
> 00:00:00 +0000
> > +++ b/mysql-test/suite/rpl/t/rpl_domain_id_filter.test 2014-09-12
> 18:16:22 +0000
>
> > +SET @gtid_domain_id_saved = @@session.gtid_domain_id;
>
> This is not needed, nor do you use it anywhere, you can remove it.
>
[NC] was a leftover, removed.
>
> > +##### Case 3: When both DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS are
> non-empty
> > +
> > +source include/stop_slave.inc;
> > +let $do_domain_ids_before= query_get_value(SHOW SLAVE STATUS,
> Replicate_Do_Domain_Ids, 1);
> > +let $ignore_domain_ids_before= query_get_value(SHOW SLAVE STATUS,
> Replicate_Ignore_Domain_Ids, 1);
> > +--echo DO_DOMAIN_IDS (before) : $do_domain_ids_before
> > +--echo IGNORE_DOMAIN_IDS (before) : $ignore_domain_ids_before
> > +
> > +# Replicate events belonging to "domain_id 1". (DO_DOMAIN_IDS should
> win!)
> > +CHANGE MASTER TO DO_DOMAIN_IDS=(1), IGNORE_DOMAIN_IDS=(2),
> MASTER_USE_GTID=current_pos;
>
> Did you intend here to test overlapping ids in DO_DOMAIN_IDS and
> IGNORE_DOMAIN_IDS (due to the comment "DO_DOMAIN_IDS should win") ?
>
[NC] No, its about the DO_ and IGNORE_ both being non-empty. In that case
DO_ should be preferred.
> But if I understand correctly, IGNORE_DOMAIN_IDS will be ignored if
> DO_DOMAIN_IDS is non-empty.
[NC] Right.
> So I think it should be an error to try to set
> both - do you agree?
>
[NC] Following the current pattern, I think it should DO_ should be
preferred. But I can change it fail with error
if you would insist.
>
> > +--source include/rpl_end.inc
>
> Please add additional testcases, as suggested at the start of review.
>
>
> > === modified file 'sql/rpl_mi.cc'
> > --- a/sql/rpl_mi.cc 2014-02-19 10:05:15 +0000
> > +++ b/sql/rpl_mi.cc 2014-09-12 18:16:22 +0000
> > @@ -513,6 +513,25 @@
> > else
> > mi->using_gtid= Master_info::USE_GTID_NO;
> > }
> > +
> > + // DO_DOMAIN_IDS
> > + if (0 == strncmp(buf, STRING_WITH_LEN("do_domain_ids=")) &&
> > + mi->domain_id_filter.init_ids(buf +
> sizeof("do_domain_ids"),
> > +
> Domain_id_filter::DO_DOMAIN_IDS))
> > + {
> > + sql_print_error("Failed to initialize master info
> do_domain_ids");
> > + goto errwithmsg;
> > + }
> > +
> > + // IGNORE_DOMAIN_IDS
> > + if (0 == strncmp(buf, STRING_WITH_LEN("ignore_domain_ids="))
> &&
> > + mi->domain_id_filter.init_ids(buf +
> sizeof("ignore_domain_ids"),
> > +
> Domain_id_filter::IGNORE_DOMAIN_IDS))
> > + {
> > + sql_print_error("Failed to initialize master info "
> > + "ignore_domain_ids");
> > + goto errwithmsg;
> > + }
>
> So here you use the same style as adjacent code (that I wrote). However, I
> have since learned that the style (0 == strncmp(...)) is not used in the
> server. So please change this, as well as my existing code, to use either
> strncmp(...) == 0 or !strncmp(), as you prefer.
>
[NC] Done.
>
> Another thing from my old code is the sizeof("do_domain_ids"), which highly
> confused me for a bit before I realised that while the '=' is not included
> in
> the sizeof(), the terminating '\0' _is_ included. So since even I myself is
> confused by this code, better clarify it ;-) Perhaps it would be clearer if
> changed to (sizeof("do_domain_ids=")-1) (both in my old code and in your
> new
> code).
>
> (Normally it's good to follow the style of existing code as you did, I just
> noticed this in the review and realised that my own "existing code" would
> benefit from these small improvements).
>
[NC] I did a bit of refactoring. Let me know if it looks ok.
>
> > @@ -634,6 +653,16 @@
> > }
> > }
> >
> > + char* do_domain_ids_buf=
> > + mi->domain_id_filter.as_string(Domain_id_filter::DO_DOMAIN_IDS);
> > + if (do_domain_ids_buf == NULL)
> > + DBUG_RETURN(1);
> > +
> > + char* ignore_domain_ids_buf=
> > + mi->domain_id_filter.as_string(Domain_id_filter::IGNORE_DOMAIN_IDS);
> > + if (ignore_domain_ids_buf == NULL)
> > + DBUG_RETURN(1);
>
> You have a memory leak here, do_domain_ids_buf will not be freed if
> ignore_domain_ids_buf fails due to out-of-memory.
>
[NC] Fixed.
>
> > @@ -1348,4 +1382,218 @@
> > DBUG_RETURN(result);
> > }
> >
> > +/* ctor */
> > +Domain_id_filter::Domain_id_filter() : m_filter(false)
> > +{
> > + for (int i= DO_DOMAIN_IDS; i <= IGNORE_DOMAIN_IDS; i ++)
> > + {
> > + my_init_dynamic_array(&m_domain_ids[i], sizeof(uint32), 16, 16,
> MYF(0));
> > + }
> > +}
> > +
> > +/* dtor */
> > +Domain_id_filter::~Domain_id_filter()
>
> I don't think comments like /* ctor */ and /* dtor */ are useful, better
> just
> omit them.
>
[NC] Done.
>
>
> > +/**
> > + Update the domain id list of the given type with elements from the
> new list.
> > +
> > + @param new_ids [IN] new list of ids
> > + @param type [IN] domain id list type to update
> > + @param changed [IN] has the domain id list changed in
> the
> > + last CHANGE MASTER command?
> > +
> > + @retval void
> > +*/
> > +void Domain_id_filter::update_ids(DYNAMIC_ARRAY *new_ids,
> enum_list_type type,
> > + bool changed)
> > +{
> > + DYNAMIC_ARRAY *ids= &m_domain_ids[type];
> > +
> > + if (changed == true)
> > + reset_dynamic(ids);
> > +
> > + /* bsearch requires an ordered list. */
> > + sort_dynamic(new_ids, change_master_domain_id_cmp);
> > +
> > + for (uint i= 0; i < new_ids->elements; i++)
> > + {
> > + uint32 domain_id;
> > + get_dynamic(new_ids, (void *) &domain_id, i);
> > +
> > + if (bsearch((const uint32 *) &domain_id, ids->buffer, ids->elements,
> > + sizeof(uint32), change_master_domain_id_cmp) == NULL)
> > + {
> > + insert_dynamic(ids, (uint32*) &domain_id);
> > + }
> > + }
> > + return;
> > +}
>
> Ok, so I understand that this code is modelled after the existing code for
> IGNORED_SERVER_IDS. But I found it rather confusing, because it looks like
> there is a bug: if changed==false, then the code does not ensure that the
> resulting array will still be sorted.
>
> I guess the bug is not possible to trigger, because the list of stuff to
> add
> will only be non-empty if changed==true? But that is rather confusing.
> Please
> simplify the logic; maybe you only need to call this function when
> changed==true, so the parameter can be omitted and the array always reset?
>
[NC] Right, in case of domain_ids the list should only be reset/updated when
it has changed. I have remove the extra parameter.
> Also, consider if you can rewrite the existing code for IGNORE_SERVER_IDS
> so
> that the same common code can be used as for IGNORE_DOMAIN_IDS and
> DO_DOMAIN_IDS, that would simplify the code and reduce risk of bugs or
> inconsistencies.
>
[NC] Done.
>
>
> > +/**
> > + Initialize the given domain id list with the numbers from the
> specified
> > + buffer.
> > +
> > + @param buf [IN] buffer
> > + @param type [IN] domain id list type
> > +
> > + @retval false Success
> > + true Error
> > +*/
> > +bool Domain_id_filter::init_ids(char *buf, enum_list_type type)
> > +{
> > + char *token, *last;
> > + uint num_items;
> > + DYNAMIC_ARRAY *ids= &m_domain_ids[type];
> > +
> > + token= strtok_r(buf, " ", &last);
> > + if (token == NULL)
> > + {
> > + return true; /* error */
> > + }
> > +
> > + num_items= atoi(token);
> > + for (uint i= 0; i < num_items; i++)
> > + {
> > + token= strtok_r(NULL, " ", &last);
> > + if (token == NULL)
> > + {
> > + return true; /* error */
> > + }
> > + else
> > + {
> > + uint32 val= atoi(token);
> > + insert_dynamic(ids, (uchar *) &val);
> > + }
> > + }
> > + return false;
> > +}
>
> Again, did you consider using the same code for this and for
> IGNORE_SERVER_IDS? (I think you did, because you added a comment to
> init_dynarray_intvar_from_file ;-). Maybe consider it again; it is fine to
> rewrite/remove the existing code for IGNORE_SERVER_IDS and replace it with
> your new code. Again, it would reduce risk for bugs or inconsistencies.
>
>
[NC] True, done.
>
> > + /* getter */
> > + bool get_filter() { return m_filter; }
> > +
> > + /* setter */
> > + void set_filter(uint32 domain_id);
>
> These are not really getter and setters, I think, at least I find those
> names
> confusing.
>
> Maybe something like do_filter(uint32 domain_id) and is_group_filtered() ?
> And expand the comments, so that they explain that these functions
> respectively check a GTID for needing to be filtered, and return whether
> the
> current group is being filtered.
>
[NC] Done.
>
> Also, personally I dislike setter/getter, and would prefer just to access
> the
> m_filter as a public member. But I suppose that's style and up to you.
>
[NC] I have kept it private for now to avoid accidental modification.
>
> > + /*
> > + Initialize the given domain id list with the numbers from the
> specified
> > + buffer.
> > +
> > + @param buf [IN] buffer
> > + @param type [IN] domain id list type
> > +
> > + @retval false Success
> > + true Error
> > + */
> > + bool init_ids(char *buf, enum_list_type type);
>
> Clarify in the comment that the list is initialized from a string of
> space-separated numbers, the first of which is the number of following
> entries.
>
[NC] Done.
>
>
> > @@ -182,6 +279,7 @@
> > bool flush_relay_log_cache,
> > bool need_lock_relay_log);
> > int change_master_server_id_cmp(ulong *id1, ulong *id2);
> > +int change_master_domain_id_cmp(uint32 *id1, uint32 *id2);
>
> I think change_master_domain_id_cmp() is only used in rpl_mi.cc, so should
> be
> static there, and not refered here in rpl_mi.h
>
[NC] Done. Also renamed it to change_master_ids_cmp to be used by
IGNORE_SERVER_IDS impl as well.
>
>
> > === modified file 'sql/slave.cc'
> > --- a/sql/slave.cc 2014-08-12 03:55:41 +0000
> > +++ b/sql/slave.cc 2014-09-12 18:16:22 +0000
> > @@ -1259,6 +1259,7 @@
> > going to ignore (to not log them in the relay log).
> > Items being read are supposed to be decimal output of values of a
> > type shorter or equal of @c long and separated by the single space.
> > + It also used to restore DO_DOMAIN_IDS & IGNORE_DOMAIN_IDS lists.
>
> This comment suggests you wanted to share code, then later changed your
> mind. If you end up keeping the code separate, you still need to delete
> this
> comment. (And there is a typo: "It also" -> "It is also", but if you
> remove it
> anyway...).
>
>
[NC] I have modified the patch to use this function now.
>
> > @@ -5601,6 +5610,10 @@
> > mi->last_queued_gtid= event_gtid;
> > mi->last_queued_gtid_standalone=
> > (gtid_flag & Gtid_log_event::FL_STANDALONE) != 0;
> > +
> > + /* Should filter all the subsequent events in the current GTID
> group? */
> > + mi->domain_id_filter.set_filter(event_gtid.domain_id);
>
> So, here we will set m_filter to true, if we want to skip the following
> group.
>
> But I don't see any code that will set it to false at the end of the
> group?
[NC] With the previous logic m_filter was set/reset only at GTID event. I
have now added the logic
to reset this flag on COMMIT/XID events too.
> That seems a problem, as we could then end up wrongly skipping all kinds of
> other events, such as Gtid_list, Incident, ...
>
[NC] Right. I have added the following events types in the exception list :
FORMAT_DESCRIPTION_EVENT, ROTATE_EVENT, GTID_LIST_EVENT,
HEARTBEAT_LOG_EVENT and INCIDENT_EVENT.
> Also note that it is possible for a GTID group to not be terminated by
> COMMIT/XID. This can happen at least if the master crashes in the middle of
> writing a group; the slave will receive only the first part of it,
> followed by
> a master restart Format_description event. That case needs to be handled,
> and
> you should add a test case for it.
>
[NC] Added a test to cover this scenario.
>
> Also it is possible for eg. a master downgrade to introduce following event
> groups that do not start with GTID - seems these would also be wrongly
> skipped? So we also need a test case for this...
>
[NC] Right, its a valid case, but I couldn't find a way to repeat this
scenario in mtr. Suggestions?
>
> Hope this helps,
>
Thanks!
-- Nirbhay
>
> - Kristian.
>
Follow ups
References