maria-developers team mailing list archive

Thread
Date
Re: Updated (by Guest): Add a mysqlbinlog option to change the used database (36)

To: Alexi1952 <Alexi1952@xxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Mon, 21 Sep 2009 13:07:26 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20631253296666@webmail45.yandex.ru> (Alexi's message of "Fri\, 18 Sep 2009 21\:57\:46 +0400")
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)
Alexi1952 <Alexi1952@xxxxxxxxx> writes:

> PS. I don't know company rules, so being currently a "pre-member" of Maria (ha! I even don't know how the company is called)
> I didn't send this reply to "maria-developers@xxxxxxxxxxxxxxxxxxx". If that's not right, I will do it.

You are welcome to use maria-developers@xxxxxxxxxxxxxxxxxxx for anything
related to development of MariaDB. We often have people also outside of Monty
Program that provide insightful comments on patches or discussions that catch
their interest. I've Cc:ed the list for now.

(Are you a member of maria-developers@ ? If not, you should be, apply on
https://launchpad.net/~maria-developers (or just let me know your Launchpad
login) and I will approve you.)

>
> 18.09.09, 17:29, "Kristian Nielsen" <knielsen@xxxxxxxxxxxxxxx>:
>
>> Hi Alexi,
>> Thanks for writing up the low-level design. I read it through, and have a
>> couple of comments:
>> > 1.3. In mysqlbinlog.cc:
>> >
>> > - Add { "rewrite-db", OPT_REWRITE_DB, ...} record to my_long_options:
>> > - Add Rpl_filter object to mysqlbinlog.cc
>> >
>> >       Rpl_filter* binlog_filter;
>> Sharing code with similar replication options inside mysqld is a noble
>> goal. However, in this case I think it is a case of "the cure is worse than
>> the disease".
>> The Rpl_filter class has _so_ much mysqld server internals that we do not want
>> to mix into a client application. That is also why you need to do all these
>> modifications in sql_list, rpl_filter, etc.
>> So I think it is wrong to use the Rpl_filter class in mysqlbinlog.
>> To share code between the two, I think the better method is to move out the
>> needed functionality (add_db_rewrite() and get_rewrite_db()) in a separate
>> class, and have both the Rpl_filter class and mysqlbinlog use that.
>> Alternatively, if the shared functionality is really small (as it appears it
>> might be), just duplicating the functionality may be better.
>
> *******************************************************************************
> Funny: in my first version I wrote my own simple list-class with add() and get()
> functions (what is really needed here) and was "scarified" by SPetrunia for why
> didn't I use Rpl_filter. :) His idea was that mysqlbinlog options should be

Right, sharing the code is best, hence the idea to extract common
functionality in a separate class.

In particular, I do not like this method of Rpl_filter:

  bool tables_ok(const char* db, TABLE_LIST* tables);

TABLE_LIST is deep deep into server internals, that is why I didn't like
pulling Rpl_filter as it is now into mysqlbinlog.

But actually Rpl_filter::tables_ok() seems to be the only problem of this kind
in Rpl_filter. So probably we just need to move this single method out into a
separate class (or existing class or static function, didn't check which would
be most appropriate). That method feels misplaced in that class to me.

So an Rpl_filter class without tables_ok() I see no problem with including in
mysqlbinlog.  That would seem to me much cleaner, and should be simple, what
do you think?

> processed in the same manner as for replication.
>
> I had two reasons for using the very Rpl_filter:
>
> 1. It already contains add_db_rewrite() and get_rewrite_db() functions which
>    are exactly what is needed.
>    
> 2. I had in my mind WL40 ("Add a mysqlbinlog option to filter updates to
>    certain tables") for which also I saw needed function in Rpl_filter.

Yes, I agree that these are good reasons.

> But frankly speaking, I looked through Rpl_filter code not-deeply - just to
> be sure that two mentioned function do what exactly I need and to get an
> impression that other functions looks like appropriate for options mentioned
> in WL40. I need to examine this more closely to take a final decision and/or
> to continue discussing with you on this point. Nevertheless, just few notes:
>
> Note 1. In any case, I like the idea of a "separate class".
>         (But see the "objection" in Note 2 which may be applied to rpl_filter
>         as well).
>
> Note 2. Please note that, essentially, modifications touches only sql_list - 
>         not Rpl_filter. As I noticed there several "generally used" classes
>         (lists is just one example) which are bound to the server context only
>         because of using the sql_alloc() function in new-operator(s). This
>         function returns MEM_ROOT pointer attached to the current thread and
>         because of that is "server-dependent". But why not - with the help of
>         just two-three #ifdef's - to make this classes server-independent?
>         Why not to allow sql_list to be used outside server context especially
>         in view of that sql_list essentially (i.e. functionally) is not server
>         dependent?
>         
>         Surely, I can foresee at least one reasonable objection: because these
>         classes strictly belong to the server "internals" and are not supposed
>         to be used outside. That's OK. But they can be used outside INDIRECTLY.
>         Thus starting to work on "embedded parser" (currently, my work is only
>         in embryo :) I came acrross several places where sql_alloc() is used in
>         the same "not-essentially-server-dependent" manner. So in my opinion,
>         making lists and similar classes applicable for both MYSQL_SERVER and
>         MYSQL_CLIENT contexts is quite reasonble in general.

Yes. This is the other problem to be solved with getting Rpl_filter into
mysqlbinlog (apart from the tables_ok() problem discussed above).

And I agree that it _does_ make sense to have String, IList, HASH,
DYNAMIC_ARRAY, etc. available in client code (and these seem to use sql_alloc,
right?).

So it's 'just' a matter of doing this in a good way. Maybe that is better seen
in a full patch, and maybe we should ask someone (like Monty or maybe Sergei
Golubchik) who has a better overview of how memory management works in the
client and server code. But for now, a couple of comments:

1. Maybe some of this is already done? I would imagine at least some of the
primitive data structures would be used in client code.

2. One of the crucial points is to control the lifetime of allocations. In the
server there is extensive use of memroots with different lifetimes. Probably
a good way would be to support memroots in client applications (don't know
if/how much is already available).

3. I would like to avoid sprinkling #ifdef around the code, it should be
possible to do in a cleaner way. #ifdef introduce complex dependencies when
working on code, and there are already too much of that for
mysqlbinlog. Though if the existing client code already depends on some
#ifdef-magic we need to use it I guess.

In particular related to the changes you propose:

> - In rpl_filter.cc:
> 
>       Rpl_filter::Rpl_filter() :
>         ...
>       {
>       #ifdef MYSQL_CLIENT
>         init_alloc_root(&sql_list_client_mem_root, ...);
>       #endif
>       ...
>       }
> 
>      Rpl_filter::~Rpl_filter() 
>      { ...
>      #ifdef MYSQL_CLIENT
>        free_root(&sql_list_client_mem_root, ...);
>      #endif
>      }

This is particular feels wrong, Rpl_filter should not need to maintain a
global mem_root like sql_list_client_mem_root. What if I want to use two
instances of Rpl_filter?

> - In sql_list.cc/h, Sql_alloc::new(size_t) and Sql_alloc::new[](size_t)
>   uses sql_alloc() which is THD dependent. These are to be modified
>   as follows:
> 
>       #ifdef MYSQL_CLIENT
>         extern MEM_ROOT sql_list_client_mem_root; // defined in sql_list.cc
>       #endif
> 
>       class Sql_alloc
>       { ...
>         static void *operator new(size_t size) throw ()
>         {
>       #ifndef MYSQL_CLIENT
>           return sql_alloc(size);
>       #else
>           return alloc_root(&sql_list_client_mem_root, size);
>       #endif
>         }
>         static void *operator new[](size_t size) throw ()
>         {
>       #ifndef MYSQL_CLIENT
>           return sql_alloc(size);
>       #else
>           return alloc_root(&sql_list_client_mem_root, size); 
>       #endif
>         }
>       ...
>       }
> 

Can't we instead of this just define an alternate implementation of
sql_alloc() in mysqlbinlog?

Then mysqlbinlog could initialize a memroot which is used in its version of
sql_alloc(). And it seems we would not need any of the above #ifdef's?

In fact if you look at the end of client/mysql.cc it seems to do exactly this.
(Except it does not use memroot, just malloc(). Not sure what is best in our
case).

What do you think?

> ************************************************************************************
> This is OK: I wrote my code closely going along the code of the corresponding
> Table_map_log_event constructor (just in case, I copied full text of the rewrite_db
> at the end of this letter).
> ************************************************************************************

Ok, great!

 - Kristian.
Follow ups

Re: Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
From: Alexi1952, 2009-09-24
References

Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
From: worklog-noreply, 2009-09-16
Re: Updated (by Guest): Add a mysqlbinlog option to change the used database (36)
From: Kristian Nielsen, 2009-09-18