
maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication

 

Hi, Kristian!

> >> API, eliminating lots of class definitions and accessor functions.
> >> Though arguably it wouldn't really simplify the API, as the
> >> complexity would just be in understanding the THD class.
> >>
> >> For now, the API is proposed without exposing the THD class.
> >> (Similar encapsulation could be added in actual implementation to
> >> also not expose TABLE and similar classes).
> >
> > completely agree
> 
> Ok, so some follow up questions:
> 
> 1. Do I understand correctly that you agree that the API should also
> encapsulate TABLE and similar classes? These _are_ exposed to storage
> engines as far as I can see.

I think it's ok to use TABLE and Field as storage engines are using
them. It would be good to encapsulate them, of course, but I'd say
there's no need to try to do it at all costs.
 
> 2. If TABLE and so on should be encapsulated, there will be the issue
> of having iterators to run over columns, etc. Do we already have
> standard classes for this that could be used? Or should I do this
> modelled using the iterators of the Standard C++ library, for example?

We have List and an iterator over it.  Alternatively, you can return an
array and let the caller iterate it any way it wants.
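
To illustrate the array alternative, here is a minimal sketch. It
assumes a wrapper class that hides TABLE; rpl_table, rpl_column,
get_columns() and join_column_names() are hypothetical names made up
for this example, not existing server code.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column descriptor; stands in for whatever the API would
// expose instead of Field.
struct rpl_column
{
  std::string name;
};

// Hypothetical wrapper that hides TABLE from the API user.
class rpl_table
{
public:
  explicit rpl_table(std::vector<rpl_column> cols)
    : columns_(std::move(cols)) {}

  // Returns a pointer to the first column and stores the number of
  // columns in *count. The caller iterates any way it wants.
  const rpl_column *get_columns(std::size_t *count) const
  {
    *count= columns_.size();
    return columns_.data();
  }

private:
  std::vector<rpl_column> columns_;
};

// Example caller: a plain indexed loop, no iterator class needed.
std::string join_column_names(const rpl_table &table)
{
  std::size_t count= 0;
  const rpl_column *cols= table.get_columns(&count);
  std::string out;
  for (std::size_t i= 0; i < count; i++)
  {
    if (i > 0)
      out+= ",";
    out+= cols[i].name;
  }
  return out;
}
```

The trade-off is that the array must stay valid for as long as the
caller uses it, which the wrapper would have to guarantee.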
 
> (I would like to make the new API fit in as well as possible with the
> existing MySQL/MariaDB code, which you know much better).
> 
> >> A consumer is implemented as a virtual class (interface). There is
> >> one virtual function for every event that can be received. A
> >> consumer would derive from
> >
> > Hm. This part I don't understand.
> > How would that work? A consumer wants to see a uniform stream of
> > events, perhaps for sending them to a slave. Why would you need
> > different consumers and different methods for different events?
> >
> > I'd just have one method, receive_event(rpl_event_base *)
> 
> Ok, so do I understand you correctly that class rpl_event_base would
> have a type field, and the consumer could then down-cast to the
> appropriate specific event class based on the type?
> 
>   receive_event(const rpl_event_base *generic_event)
>   {
>     switch (generic_event->type)
>     {
>       case rpl_event_base::RPL_EVENT_STATEMENT_QUERY:
>       {
>         /* Braces give each case its own scope for the declaration. */
>         const rpl_event_statement_query *ev=
>           static_cast<const rpl_event_statement_query *>(generic_event);
>         do_stuff(ev->get_query_string(), ...);
>         break;
>       }
>       case rpl_event_base::RPL_EVENT_ROW_UPDATE:
>       {
>         const rpl_event_row_update *ev=
>           static_cast<const rpl_event_row_update *>(generic_event);
>         do_stuff(ev->get_after_image(), ...);
>         break;
>       }
>       ...
>     }
>   }
> 
> I have always disliked having such a type field and downcasting. So I
> tried to make an API where it was not needed. Like this:
> 
>   class my_event_consumer
>   {
>   public:
>     int stmt_query(const rpl_event_statement_query *ev)
>     {
>       do_stuff(ev->get_query_string(), ...);
>       return 0;
>     }
>     int row_update(const rpl_event_row_update *ev)
>     {
>       do_stuff(ev->get_after_image(), ...);
>       return 0;
>     }
>     ...
>   };

Okay, now I see what you mean.
I don't like downcasting either.

On the other hand, I don't want to force plugins that work on an event
as a whole to implement methods for every particular type of an event.

It may be possible to do both: virtual methods for every event type, as
you proposed, but not abstract. The default implementation of each one
calls the generic receive_event(). A plugin can then either implement a
family of receive_event* methods or just the single generic one.

But if the above wouldn't work and we'll have to choose, I'd prefer a
simpler interface with one generic receive_event().
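
A minimal sketch of the combined approach, to show that it can be done
without a type field. The event classes are simplified stand-ins, and
rpl_event_consumer, forwarding_consumer, query_logger and name() are
hypothetical names for this example only. It assumes the generator
knows the concrete event type and calls the matching typed method.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the event classes discussed in this thread.
struct rpl_event_base
{
  virtual ~rpl_event_base() = default;
  virtual const char *name() const = 0;
};

struct rpl_event_statement_query : rpl_event_base
{
  std::string query;
  explicit rpl_event_statement_query(std::string q) : query(std::move(q)) {}
  const char *name() const override { return "statement_query"; }
};

struct rpl_event_row_update : rpl_event_base
{
  const char *name() const override { return "row_update"; }
};

// Hypothetical consumer base class: typed methods are virtual but not
// abstract, and by default they forward to the generic receive_event().
class rpl_event_consumer
{
public:
  virtual ~rpl_event_consumer() = default;

  // Generic entry point; the default ignores the event.
  virtual int receive_event(const rpl_event_base *) { return 0; }

  virtual int stmt_query(const rpl_event_statement_query *ev)
  { return receive_event(ev); }
  virtual int row_update(const rpl_event_row_update *ev)
  { return receive_event(ev); }
};

// A plugin that handles every event uniformly: overrides only the
// generic method.
class forwarding_consumer : public rpl_event_consumer
{
public:
  std::vector<std::string> seen;
  int receive_event(const rpl_event_base *ev) override
  {
    seen.push_back(ev->name());
    return 0;
  }
};

// A plugin interested in one event type: overrides only that method,
// with no downcast.
class query_logger : public rpl_event_consumer
{
public:
  std::vector<std::string> queries;
  int stmt_query(const rpl_event_statement_query *ev) override
  {
    queries.push_back(ev->query);
    return 0;
  }
};
```

With this layout, calling stmt_query() and row_update() on a
forwarding_consumer ends up in its receive_event() both times, while a
query_logger sees only the query and silently ignores the row update.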
 
> >> One generator can be stacked on top of another. This means that a
> >> generator on top (for example row-based events) will handle some
> >> events itself (eg. non-deterministic update in mixed-mode
> >> binlogging).  Other events that it does not want to or cannot
> >> handle (for example deterministic delete or DDL) will be deferred to
> >> the generator below (for example statement-based events).
> >
> > There's a problem with this idea. Say, Event B is nested in Event A:
> >
> >    ... ... |<-    Event A ... .. .. ->| .. .. ..
> >    *  *  *  *  * |<-   Event B  ->| *  *  *  *
> >
> > This is fine. But what about
> >
> >    ... ... |<-    Event A ... ->| .. .. ..
> >    *  *  *  *  * |<-    Event B   ->| *  *  *  *
> >
> > In the latter case no event is nested in the other, and no level can
> > simply defer to the other.
> >
> > I don't know a solution for this, I'm just hoping the above
> > situation is impossible. At least, I could not find an example of
> > "overlapping" events.
> 
> Another way of thinking about this is that we have one layer above
> handling (or not handling) an event that can be generated below.
...
> So one case where this becomes a problem is if we have a multi-table
> update where one table is PBXT and another is not, and we are using
> PBXT engine-level replication on top of statement-based replication.
> In this case, one half of the statement-based event is handled by the
> layer above, but the other is not. So we cannot deal with this
> situation.

On the contrary, this is quite easy. Even a CREATE ... SELECT is a mix
of statement-based and row-based. A simple solution would be to
replicate it completely statement-based, that is, to discard the
row-based part of the event. We can do that because the statement-level
description of the event is sufficient: the row-based event is
completely nested within the statement-based one (other, more complex
solutions are possible too).

I was describing a case where events overlap, but neither one is
completely nested within the other. I know of no solution for this
case, but I hope it is never possible in practice.
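
To make the distinction precise, here is a small sketch that treats
each event as a half-open interval [begin, end) on some logical
timeline. This interval model, and the names event_span, span_relation
and classify(), are assumptions made for illustration only; nothing in
the proposed API defines them.

```cpp
// Hypothetical extent of an event: the half-open interval [begin, end).
struct event_span
{
  long begin;
  long end;
};

enum class span_relation { DISJOINT, NESTED, OVERLAPPING };

// Classify how two event extents relate to each other.
span_relation classify(const event_span &a, const event_span &b)
{
  // No common point at all.
  if (a.end <= b.begin || b.end <= a.begin)
    return span_relation::DISJOINT;

  // One lies completely inside the other: a layer can defer to the other.
  bool a_in_b= b.begin <= a.begin && a.end <= b.end;
  bool b_in_a= a.begin <= b.begin && b.end <= a.end;
  if (a_in_b || b_in_a)
    return span_relation::NESTED;

  // They share a part, but neither contains the other: the problem case.
  return span_relation::OVERLAPPING;
}
```

In these terms, the first diagram (Event B inside Event A) is NESTED,
and the second one is OVERLAPPING.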
 
Regards,
Sergei


