maria-developers team mailing list archive

Thread
Date
Re: 256753e8ae8: Clean up and speed up interfaces for binary row logging

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Michael Widenius <michael.widenius@xxxxxxxxx>
Date: Sun, 1 Mar 2020 13:35:43 +0200
Cc: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>, Michael Widenius <monty@xxxxxxxxxxx>
In-reply-to: <20200228200100.GA27157@pweza.fritz.box>
Hi!

On Fri, Feb 28, 2020 at 10:00 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
...
> > diff --git a/mysql-test/suite/rpl/r/create_or_replace_mix.result b/mysql-test/suite/rpl/r/create_or_replace_mix.result
> > index 661278aa7ef..6c83d27eef9 100644
> > --- a/mysql-test/suite/rpl/r/create_or_replace_mix.result
> > +++ b/mysql-test/suite/rpl/r/create_or_replace_mix.result
> > @@ -223,26 +226,12 @@ Log_name        Pos     Event_type      Server_id       End_log_pos     Info
> >  slave-bin.000001     #       Gtid    #       #       GTID #-#-#
> >  slave-bin.000001     #       Query   #       #       use `test`; create table t1 (a int)
> >  slave-bin.000001     #       Gtid    #       #       BEGIN GTID #-#-#
> > -slave-bin.000001     #       Annotate_rows   #       #       insert into t1 values (0),(1),(2)
> > -slave-bin.000001     #       Table_map       #       #       table_id: # (test.t1)
> > -slave-bin.000001     #       Write_rows_v1   #       #       table_id: # flags: STMT_END_F
> > +slave-bin.000001     #       Query   #       #       use `test`; insert into t1 values (0),(1),(2)
>
> why is it STATEMENT now?

" - In the original code, the master could come into a state where row
      logging is enforced for all future events if statement could be used.
      This is now partly fixed."

The test is run in mix mode.  Insert are normally run in statement
mode for simple tables like the above.
The old code assumed that if you have created a temporary table, then
all future statements until the
temporary table is dropped will always be in binary logging mode.  I
have now fixed so that only
statements that are actually using the temporary table will use binary logging.


<cut>

> > diff --git a/mysql-test/suite/rpl/t/rpl_foreign_key.test b/mysql-test/suite/rpl/t/rpl_foreign_key.test
> > new file mode 100644
> > index 00000000000..50be97af24d
> > --- /dev/null
> > +++ b/mysql-test/suite/rpl/t/rpl_foreign_key.test
> > @@ -0,0 +1,18 @@
> > +--source include/have_innodb.inc
> > +--source include/have_binlog_format_row.inc
> > +
> > +CREATE TABLE t1 (
> > +    id INT,
> > +    k INT,
> > +    c CHAR(8),
> > +    KEY (k),
> > +    PRIMARY KEY (id),
> > +    FOREIGN KEY (id) REFERENCES t1 (k)
> > +) ENGINE=InnoDB;
> > +LOCK TABLES t1 WRITE;
> > +SET SESSION FOREIGN_KEY_CHECKS= OFF;
> > +SET AUTOCOMMIT=OFF;
> > +INSERT INTO t1 VALUES (1,1,'foo');
> > +DROP TABLE t1;
> > +SET SESSION FOREIGN_KEY_CHECKS= ON;
> > +SET AUTOCOMMIT=ON;
>
> What kind of replication test is it? You don't check binlog events, you
> don't compare master and slave, you don't even run anything on the slave
> to check whether your staments were replicated correctly.

The test case is from mdev-21617.  It caused a crash in one version of the code
as such. I agree that it would be better to check the result, but as
this was never
a problem I didn't think about adding it.

> In fact, you don't have any slave, you have not included
> master-slave.inc, you only have binlog, so this test should be in the
> binlog suite, not in the rpl suite - it's a binlog test, not replication
> test.

> And even in the binlog suite it would make sense to see what's actually
> in a binlog. Just as a test that it's ok.
I have now moved it to binlog and check the binary log, even if it's
not really needed
for the MDEV.

> > diff --git a/sql/ha_sequence.cc b/sql/ha_sequence.cc
> > index 6cb9937ebb4..71da208d775 100644
> > --- a/sql/ha_sequence.cc
> > +++ b/sql/ha_sequence.cc
> > @@ -202,7 +202,11 @@ int ha_sequence::write_row(const uchar *buf)
> >    DBUG_ENTER("ha_sequence::write_row");
> >    DBUG_ASSERT(table->record[0] == buf);
> >
> > -  row_already_logged= 0;
> > +  /*
> > +    Log to binary log even if this function has been called before
> > +    (The function ends by setting row_logging to 0)
> > +  */
> > +  row_logging= row_logging_init;
>
> this is a sequence-specific hack, so you should define row_logging_init
> in ha_sequence class, not in the base handler class

This would force me to make prepare_for_row_logging() and ha_reset()
virtual or add extra virtual
functions that would have to be called for all handlers in both of the
above cases.
The overhead is much bigger than having two set of the above variable
in the two above functions.

> > index 1e3f987b4e5..4096ae8b90f 100644
> > --- a/sql/handler.cc
> > +++ b/sql/handler.cc
> > @@ -6224,32 +6225,37 @@ bool ha_show_status(THD *thd, handlerton *db_type, enum ha_stat_type stat)
> >      1  Row needs to be logged
> >  */
> >
> > -bool handler::check_table_binlog_row_based(bool binlog_row)
> > +bool handler::check_table_binlog_row_based()
> >  {
> > -  if (table->versioned(VERS_TRX_ID))
> > -    return false;
> > -  if (unlikely((table->in_use->variables.sql_log_bin_off)))
> > -    return 0;                            /* Called by partitioning engine */
> >  #ifdef WITH_WSREP
> > -  if (!table->in_use->variables.sql_log_bin &&
> > -      wsrep_thd_is_applying(table->in_use))
> > -    return 0;      /* wsrep patch sets sql_log_bin to silence binlogging
> > -                      from high priority threads */
> >  #endif /* WITH_WSREP */
>
> That's an empty #ifdef :)

Strange that I didn't notice that. Fixed!

> > @@ -6769,13 +6718,17 @@ int handler::ha_write_row(const uchar *buf)
> >                        { error= write_row(buf); })
> >
> >    MYSQL_INSERT_ROW_DONE(error);
> > -  if (likely(!error) && !row_already_logged)
> > +  if (likely(!error))
> >    {
> >      rows_changed++;
> > -    error= binlog_log_row(table, 0, buf, log_func);
> > +    if (row_logging)
> > +    {
> > +      Log_func *log_func= Write_rows_log_event::binlog_row_logging_function;
> > +      error= binlog_log_row(table, 0, buf, log_func);
> > +    }
> >  #ifdef WITH_WSREP
> > -    if (table_share->tmp_table == NO_TMP_TABLE &&
> > -        WSREP(ha_thd()) && (error= wsrep_after_row(ha_thd())))
> > +    if (WSREP_NNULL(ha_thd()) && table_share->tmp_table == NO_TMP_TABLE &&
>
> why did you swap tests? NO_TMP_TABLE check is cheaper
> (same in update and delete)

Actually, it's more expensive as it's almost almost true while
WSREP_NNULL is only true
if galera is used. In other words, the new test favors normal server,
not galera which I think
is right.

> > +++ b/sql/sql_table.cc
> > @@ -10506,10 +10506,10 @@ do_continue:;
> >          No additional logging of query is needed
> >        */
> >        binlog_done= 1;
> > +      DBUG_ASSERT(new_table->file->row_logging);
> >        new_table->mark_columns_needed_for_insert();
> >        thd->binlog_start_trans_and_stmt();
> > -      binlog_write_table_map(thd, new_table,
> > -                             thd->variables.binlog_annotate_row_events);
> > +      thd->binlog_write_table_map(new_table, 1);
>
> does it mean you force annotations for ALTER TABLE even if they were
> configured off?

No. This means just that he binlog_write_table_map is the first call and
it should consider writing an annotate event. The function
bool THD::binlog_write_annotated_row(Log_event_writer *writer) will
check that variables.binlog_annotate_row_events is true (in addition
to other things)

> And why would ALTER TABLE generate row events anyway?

ALTER TABLE s3_table engine=InnoDB.

As the s3_table is not available on slaves, it has to row logged.
Same should happen if one converts a clustrix table to InnoDB.
We did discuss this a couple of times...

Thanks for the review.
All fixed and also rebased on latest 10.5 is now pushed into bb-10.5-monty

Regards,
Monty
References

Re: 256753e8ae8: Clean up and speed up interfaces for binary row logging
From: Sergei Golubchik, 2020-02-28