← Back to team overview

maria-developers team mailing list archive

Re: [Commits] Rev 3879: MDEV-6593 : domain_id based replication filters in lp:~maria-captains/maria/maria-10.0-galera

 

Hi Kristian!

Thank you for the review remarks. I have committed a patch addressing your
comments.

http://lists.askmonty.org/pipermail/commits/2014-October/006775.html

See inline for my response on specific comments.

On Mon, Sep 15, 2014 at 9:25 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:

> Nirbhay Choubey <nirbhay@xxxxxxxxxx> writes:
>
> > ------------------------------------------------------------
> > revno: 3879
> > revision-id: nirbhay@xxxxxxxxxx-20140912181622-s33e7r0vtgpju95q
> > parent: nirbhay@xxxxxxxxxx-20140814224304-rtkeal7p9lk06li1
> > committer: Nirbhay Choubey <nirbhay@xxxxxxxxxx>
> > branch nick: maria-10.0-galera_6593
> > timestamp: Fri 2014-09-12 14:16:22 -0400
> > message:
> >   MDEV-6593 : domain_id based replication filters
> >
> >   Added support for IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS.
>
> Here is my review.
>
> Some general comments, followed by some detailed comments in-line in the
> patch:
>
> 1. Did you prepare any documentation? You could either add it to the
> knowledgebase directly, or write up a text file and ask Daniel to add it in
> relevant places.
>

[NC] I haven't but I will prepare it and pass it down to Daniel.


>
> 2. Did you run the full test suite? I would have expected at least some
> tests
> in the multi_source suite to need modification. The chance touches enough
> things that you should test it in a feature tree in Buildbot to ensure
> that it
> is green before pushing it to a main tree.
>

[NC] You are right. I did that now and updated the result files.


> 3. I think some extra test cases are needed:
>
>  - You should test (and document) what happens when the same domain_id is
> in
>    both the DO and the IGNORE list. Is it actually ever sensible to have
> both
>    DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS ? If not, I think it should be an
> error
>    to try.
>

[NC] You are right, setting both DO_ and IGNORE_ at the same time would not
make any sense.
But I actually followed the pattern of the existing replication, where both
_do_ and _ignore_
filters are allowed to hold values at the same time and _do_ is preferred
in case both are set.


>  - You should also add a test using non-GTID mode.
>

[NC] Done.


>
>  - You should also add a test where parallel replication is used
>    (--slave-parallel-threads > 0).
>

[NC] Done.


>
>  - Add a test for the following scenario when using GTID mode: Some domain
> id
>    is ignored, some events are received from the master and ignored. Then
> the
>    slave is stopped, and the ignore rule is removed. What happens when the
>    slave is restarted and reconnects? I think the previously ignored
> domain id
>    events will now be re-fetched and replicated. I guess that's ok, but it
>    should be documented. Also test what happens when binlog files on the
>    master have been rotated and purged, so that the events are no longer
>    available on the master (an error should occur).
>

[NC] As the filtering takes place in the IO thread. I have added some test
cases covering
various scenarios including ones where IO thread crashes while reading
COMMIT/XID event.
In this scenario if the filtering is ON the GTID is added to the ignored
list (ign_gtids) on GTID
event, so when the crash happens before COMMIT/XID event, that particular
GTID gets ignored
after the slave reconnects.


>  - Add a similar test for when not using GTID mode, the last event
> replicated
>    by the slave is ignored, the slave is stopped, the ignore rule removed
> and
>    the slave restarted; will the last event now be re-fetched or not (I
> think
>    not).
>

[NC] Done.


>
>  - Add a test that checks that seconds_behind_master is properly updated to
>    zero even when the last event is ignored due to
>    DO_DOMAIN_IDS/IGNORE_DOMAIN_IDS. It can perhaps be a bit tricky to test,
>    but I think you can use this existing test as a basis:
>    mysql-test/suite/rpl/t/rpl_parallel2.test
>

[NC] Done.


>
>  - It would also be good to check that the code works when used on an old
>    master (which does not send GTID events). You can do this by creating a
>    binlog file with a 5.5 server and storing it in mysql-test/std_data/,
> and
>    then installing it in a master; there should be several other test cases
>    that already do this.
>

[NC] Done.


>
> 4. Also see detailed comments for some possible problems with the
> implementation. The most serious is probably to ensure that events are not
> skipped after the end of the group, we need a couple of tests for this, see
> below. Also, there is a small memory leak, see below.
>
> > +DO_DOMAIN_IDS (before)     : 1
> > +IGNORE_DOMAIN_IDS (before) : 2
> > +CHANGE MASTER TO DO_DOMAIN_IDS=(4,4,5,1,7,7,7,1,1,2,6,8,1,4,5,5,9,3),
> MASTER_USE_GTID=current_pos;
>
> So if a domain_id is specified multiple times, the redundant ones will be
> ignored.
>

[NC] Right. Also, there was a bug in the current IGNORE_SERVER_IDS
implementation. Its fixed now.


> It would seem preferable to give an error... but as far as I can see the
> behaviour is the same for IGNORE_SERVER_IDS, so better to be consistent as
> you
> did in the code. Just be sure to add this to the documentation.
>

[NC] Ok.


>
>
> > === added file 'mysql-test/suite/rpl/t/rpl_domain_id_filter.test'
> > --- a/mysql-test/suite/rpl/t/rpl_domain_id_filter.test        1970-01-01
> 00:00:00 +0000
> > +++ b/mysql-test/suite/rpl/t/rpl_domain_id_filter.test        2014-09-12
> 18:16:22 +0000
>
> > +SET @gtid_domain_id_saved = @@session.gtid_domain_id;
>
> This is not needed, nor do you use it anywhere, you can remove it.
>

[NC] was a leftover, removed.


>
> > +##### Case 3: When both DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS are
> non-empty
> > +
> > +source include/stop_slave.inc;
> > +let $do_domain_ids_before= query_get_value(SHOW SLAVE STATUS,
> Replicate_Do_Domain_Ids, 1);
> > +let $ignore_domain_ids_before= query_get_value(SHOW SLAVE STATUS,
> Replicate_Ignore_Domain_Ids, 1);
> > +--echo DO_DOMAIN_IDS (before)     : $do_domain_ids_before
> > +--echo IGNORE_DOMAIN_IDS (before) : $ignore_domain_ids_before
> > +
> > +# Replicate events belonging to "domain_id 1". (DO_DOMAIN_IDS should
> win!)
> > +CHANGE MASTER TO DO_DOMAIN_IDS=(1), IGNORE_DOMAIN_IDS=(2),
> MASTER_USE_GTID=current_pos;
>
> Did you intend here to test overlapping ids in DO_DOMAIN_IDS and
> IGNORE_DOMAIN_IDS (due to the comment "DO_DOMAIN_IDS should win") ?
>

[NC] No, its about the DO_ and IGNORE_ both being non-empty. In that case
DO_ should be preferred.


> But if I understand correctly, IGNORE_DOMAIN_IDS will be ignored if
> DO_DOMAIN_IDS is non-empty.


[NC] Right.


> So I think it should be an error to try to set
> both - do you agree?
>

[NC] Following the current pattern, I think it should DO_ should be
preferred. But I can change it fail with error
if you would insist.


>
> > +--source include/rpl_end.inc
>
> Please add additional testcases, as suggested at the start of review.
>
>
> > === modified file 'sql/rpl_mi.cc'
> > --- a/sql/rpl_mi.cc   2014-02-19 10:05:15 +0000
> > +++ b/sql/rpl_mi.cc   2014-09-12 18:16:22 +0000
> > @@ -513,6 +513,25 @@
> >              else
> >                mi->using_gtid= Master_info::USE_GTID_NO;
> >            }
> > +
> > +          // DO_DOMAIN_IDS
> > +          if (0 == strncmp(buf, STRING_WITH_LEN("do_domain_ids=")) &&
> > +              mi->domain_id_filter.init_ids(buf +
> sizeof("do_domain_ids"),
> > +
> Domain_id_filter::DO_DOMAIN_IDS))
> > +          {
> > +            sql_print_error("Failed to initialize master info
> do_domain_ids");
> > +            goto errwithmsg;
> > +          }
> > +
> > +          // IGNORE_DOMAIN_IDS
> > +          if (0 == strncmp(buf, STRING_WITH_LEN("ignore_domain_ids="))
> &&
> > +              mi->domain_id_filter.init_ids(buf +
> sizeof("ignore_domain_ids"),
> > +
> Domain_id_filter::IGNORE_DOMAIN_IDS))
> > +          {
> > +            sql_print_error("Failed to initialize master info "
> > +                            "ignore_domain_ids");
> > +            goto errwithmsg;
> > +          }
>
> So here you use the same style as adjacent code (that I wrote). However, I
> have since learned that the style (0 == strncmp(...)) is not used in the
> server. So please change this, as well as my existing code, to use either
> strncmp(...) == 0 or !strncmp(), as you prefer.
>

[NC] Done.


>
> Another thing from my old code is the sizeof("do_domain_ids"), which highly
> confused me for a bit before I realised that while the '=' is not included
> in
> the sizeof(), the terminating '\0' _is_ included. So since even I myself is
> confused by this code, better clarify it ;-) Perhaps it would be clearer if
> changed to (sizeof("do_domain_ids=")-1) (both in my old code and in your
> new
> code).
>
> (Normally it's good to follow the style of existing code as you did, I just
> noticed this in the review and realised that my own "existing code" would
> benefit from these small improvements).
>

[NC] I did a bit of refactoring. Let me know if it looks ok.


>
> > @@ -634,6 +653,16 @@
> >      }
> >    }
> >
> > +  char* do_domain_ids_buf=
> > +    mi->domain_id_filter.as_string(Domain_id_filter::DO_DOMAIN_IDS);
> > +  if (do_domain_ids_buf == NULL)
> > +    DBUG_RETURN(1);
> > +
> > +  char* ignore_domain_ids_buf=
> > +    mi->domain_id_filter.as_string(Domain_id_filter::IGNORE_DOMAIN_IDS);
> > +  if (ignore_domain_ids_buf == NULL)
> > +    DBUG_RETURN(1);
>
> You have a memory leak here, do_domain_ids_buf will not be freed if
> ignore_domain_ids_buf fails due to out-of-memory.
>

[NC] Fixed.


>
> > @@ -1348,4 +1382,218 @@
> >    DBUG_RETURN(result);
> >  }
> >
> > +/* ctor */
> > +Domain_id_filter::Domain_id_filter() : m_filter(false)
> > +{
> > +  for (int i= DO_DOMAIN_IDS; i <= IGNORE_DOMAIN_IDS; i ++)
> > +  {
> > +    my_init_dynamic_array(&m_domain_ids[i], sizeof(uint32), 16, 16,
> MYF(0));
> > +  }
> > +}
> > +
> > +/* dtor */
> > +Domain_id_filter::~Domain_id_filter()
>
> I don't think comments like /* ctor */ and /* dtor */ are useful, better
> just
> omit them.
>

[NC] Done.


>
>
> > +/**
> > +  Update the domain id list of the given type with elements from the
> new list.
> > +
> > +  @param new_ids [IN]               new list of ids
> > +  @param type    [IN]               domain id list type to update
> > +  @param changed [IN]               has the domain id list changed in
> the
> > +                                    last CHANGE MASTER command?
> > +
> > +  @retval                           void
> > +*/
> > +void Domain_id_filter::update_ids(DYNAMIC_ARRAY *new_ids,
> enum_list_type type,
> > +                                  bool changed)
> > +{
> > +  DYNAMIC_ARRAY *ids= &m_domain_ids[type];
> > +
> > +  if (changed == true)
> > +    reset_dynamic(ids);
> > +
> > +  /* bsearch requires an ordered list. */
> > +  sort_dynamic(new_ids, change_master_domain_id_cmp);
> > +
> > +  for (uint i= 0; i < new_ids->elements; i++)
> > +  {
> > +    uint32 domain_id;
> > +    get_dynamic(new_ids, (void *) &domain_id, i);
> > +
> > +    if (bsearch((const uint32 *) &domain_id, ids->buffer, ids->elements,
> > +                sizeof(uint32), change_master_domain_id_cmp) == NULL)
> > +    {
> > +      insert_dynamic(ids, (uint32*) &domain_id);
> > +    }
> > +  }
> > +  return;
> > +}
>
> Ok, so I understand that this code is modelled after the existing code for
> IGNORED_SERVER_IDS. But I found it rather confusing, because it looks like
> there is a bug: if changed==false, then the code does not ensure that the
> resulting array will still be sorted.
>
> I guess the bug is not possible to trigger, because the list of stuff to
> add
> will only be non-empty if changed==true? But that is rather confusing.
> Please
> simplify the logic; maybe you only need to call this function when
> changed==true, so the parameter can be omitted and the array always reset?
>

[NC] Right, in case of domain_ids the list should only be reset/updated when
it has changed. I have remove the extra parameter.


> Also, consider if you can rewrite the existing code for IGNORE_SERVER_IDS
> so
> that the same common code can be used as for IGNORE_DOMAIN_IDS and
> DO_DOMAIN_IDS, that would simplify the code and reduce risk of bugs or
> inconsistencies.
>

[NC] Done.


>
>
> > +/**
> > +  Initialize the given domain id list with the numbers from the
> specified
> > +  buffer.
> > +
> > +  @param buf  [IN]                  buffer
> > +  @param type [IN]                  domain id list type
> > +
> > +  @retval false                     Success
> > +          true                      Error
> > +*/
> > +bool Domain_id_filter::init_ids(char *buf, enum_list_type type)
> > +{
> > +  char *token, *last;
> > +  uint num_items;
> > +  DYNAMIC_ARRAY *ids= &m_domain_ids[type];
> > +
> > +  token= strtok_r(buf, " ", &last);
> > +  if (token == NULL)
> > +  {
> > +    return true;                                /* error */
> > +  }
> > +
> > +  num_items= atoi(token);
> > +  for (uint i= 0; i < num_items; i++)
> > +  {
> > +    token= strtok_r(NULL, " ", &last);
> > +    if (token == NULL)
> > +    {
> > +      return true;                              /* error */
> > +    }
> > +    else
> > +    {
> > +      uint32 val= atoi(token);
> > +      insert_dynamic(ids, (uchar *) &val);
> > +    }
> > +  }
> > +  return false;
> > +}
>
> Again, did you consider using the same code for this and for
> IGNORE_SERVER_IDS? (I think you did, because you added a comment to
> init_dynarray_intvar_from_file ;-). Maybe consider it again; it is fine to
> rewrite/remove the existing code for IGNORE_SERVER_IDS and replace it with
> your new code. Again, it would reduce risk for bugs or inconsistencies.
>
>
[NC] True, done.


>
> > +  /* getter */
> > +  bool get_filter() { return m_filter; }
> > +
> > +  /* setter */
> > +  void set_filter(uint32 domain_id);
>
> These are not really getter and setters, I think, at least I find those
> names
> confusing.
>
> Maybe something like do_filter(uint32 domain_id) and is_group_filtered() ?
> And expand the comments, so that they explain that these functions
> respectively check a GTID for needing to be filtered, and return whether
> the
> current group is being filtered.
>

[NC] Done.


>
> Also, personally I dislike setter/getter, and would prefer just to access
> the
> m_filter as a public member. But I suppose that's style and up to you.
>

[NC] I have kept it private for now to avoid accidental modification.


>
> > +  /*
> > +    Initialize the given domain id list with the numbers from the
> specified
> > +    buffer.
> > +
> > +    @param buf  [IN]                  buffer
> > +    @param type [IN]                  domain id list type
> > +
> > +    @retval false                     Success
> > +            true                      Error
> > +  */
> > +  bool init_ids(char *buf, enum_list_type type);
>
> Clarify in the comment that the list is initialized from a string of
> space-separated numbers, the first of which is the number of following
> entries.
>

[NC] Done.


>
>
> > @@ -182,6 +279,7 @@
> >                        bool flush_relay_log_cache,
> >                        bool need_lock_relay_log);
> >  int change_master_server_id_cmp(ulong *id1, ulong *id2);
> > +int change_master_domain_id_cmp(uint32 *id1, uint32 *id2);
>
> I think change_master_domain_id_cmp() is only used in rpl_mi.cc, so should
> be
> static there, and not refered here in rpl_mi.h
>

[NC] Done. Also renamed it to change_master_ids_cmp to be used by
IGNORE_SERVER_IDS impl as well.


>
>
> > === modified file 'sql/slave.cc'
> > --- a/sql/slave.cc    2014-08-12 03:55:41 +0000
> > +++ b/sql/slave.cc    2014-09-12 18:16:22 +0000
> > @@ -1259,6 +1259,7 @@
> >     going to ignore (to not log them in the relay log).
> >     Items being read are supposed to be decimal output of values of a
> >     type shorter or equal of @c long and separated by the single space.
> > +   It also used to restore DO_DOMAIN_IDS & IGNORE_DOMAIN_IDS lists.
>
> This comment suggests you wanted to share code, then later changed your
> mind. If you end up keeping the code separate, you still need to delete
> this
> comment. (And there is a typo: "It also" -> "It is also", but if you
> remove it
> anyway...).
>
>
[NC] I have modified the patch to use this function now.


>
> > @@ -5601,6 +5610,10 @@
> >      mi->last_queued_gtid= event_gtid;
> >      mi->last_queued_gtid_standalone=
> >        (gtid_flag & Gtid_log_event::FL_STANDALONE) != 0;
> > +
> > +    /* Should filter all the subsequent events in the current GTID
> group? */
> > +    mi->domain_id_filter.set_filter(event_gtid.domain_id);
>
> So, here we will set m_filter to true, if we want to skip the following
> group.
>
> But I don't see any code that will set it to false at the end of the
> group?


[NC] With the previous logic m_filter was set/reset only at GTID event. I
have now added the logic
to reset this flag on COMMIT/XID events too.


> That seems a problem, as we could then end up wrongly skipping all kinds of
> other events, such as Gtid_list, Incident, ...
>

[NC] Right. I have added the following events types in the exception list :
FORMAT_DESCRIPTION_EVENT, ROTATE_EVENT, GTID_LIST_EVENT,
HEARTBEAT_LOG_EVENT and INCIDENT_EVENT.


> Also note that it is possible for a GTID group to not be terminated by
> COMMIT/XID. This can happen at least if the master crashes in the middle of
> writing a group; the slave will receive only the first part of it,
> followed by
> a master restart Format_description event. That case needs to be handled,
> and
> you should add a test case for it.
>

[NC] Added a test to cover this scenario.


>
> Also it is possible for eg. a master downgrade to introduce following event
> groups that do not start with GTID - seems these would also be wrongly
> skipped? So we also need a test case for this...
>

[NC] Right, its a valid case, but I couldn't find a way to repeat this
scenario in mtr. Suggestions?


>
> Hope this helps,
>

Thanks!

-- Nirbhay


>
>  - Kristian.
>

Follow ups

References