Re: MWL#116: Efficient group commit, ready for code review

 

Hi!

>>>>> "Kristian" == Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:

Kristian> Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:
>> Please find attached the patch for code review of MWL#116: Efficient group
>> commit.


> === added file 'mysql-test/suite/binlog/t/binlog_ioerr.test'
> --- mysql-test/suite/binlog/t/binlog_ioerr.test	1970-01-01 00:00:00 +0000
> +++ mysql-test/suite/binlog/t/binlog_ioerr.test	2010-10-01 08:16:44 +0000

<cut>

> +# Actually the output from this currently shows a bug.
> +# The injected IO error leaves partially written transactions in the binlog in
> +# the form of stray "BEGIN" events.
> +# These should disappear from the output if binlog error handling is improved.

Do you have a worklog number for the above? If yes, please add it to
the source too.

<cut>

> --- mysql-test/t/group_commit_binlog_pos.test	1970-01-01 00:00:00 +0000
> +++ mysql-test/t/group_commit_binlog_pos.test	2010-10-28 19:01:32 +0000
<cut>

> +# Check that the binlog position reported by InnoDB is the correct one
> +# for the end of the second transaction (as can be checked with
> +# mysqlbinlog).
> +let $MYSQLD_DATADIR= `SELECT @@datadir`;
> +--exec grep 'InnoDB: Last MySQL binlog file position' $MYSQLD_DATADIR/../../log/mysqld.1.err | tail -1

Is there another way to get the position without using grep?
(grep probably doesn't work on Windows.)

As we already have regexp in mysqltest, it should be trivial to do our
own simple version of grep there.
Let's see if we can get someone to implement the grep.
Another option would be to store the startup position in some user
variables (assuming this would be useful for anyone else).
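
For reference, a portable sketch of the grep using a mysqltest perl block
(not tested; it assumes the error log is reachable as
$MYSQLTEST_VARDIR/log/mysqld.1.err) could look roughly like this:

  --perl
  # Print the last error log line mentioning the InnoDB binlog position.
  open my $fh, '<', "$ENV{MYSQLTEST_VARDIR}/log/mysqld.1.err" or die "open: $!";
  my $last;
  while (<$fh>) { $last= $_ if /InnoDB: Last MySQL binlog file position/; }
  print $last if defined $last;
  EOF

That would avoid depending on an external grep binary.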

> --- sql/handler.cc	2010-10-20 10:58:43 +0000

> @@ -1075,7 +1077,7 @@ ha_check_and_coalesce_trx_read_only(THD 
>  */
>  int ha_commit_trans(THD *thd, bool all)
>  {
>    /*
>  #ifdef USING_TRANSACTIONS
> -  if (ha_info)
> +  if (!ha_info)
>    {
> +    /* Free resources and perform other cleanup even for 'empty' transactions. */
> +    if (is_real_trans)
> +      thd->transaction.cleanup();
> +    DBUG_RETURN(0);
> +  }

Thanks for cleaning up the indentation levels !
(It's always better to handle the 'if-not-active-then-exit' cases
first and not have them in the else part)

<cut>

> +    /*
> +      Sic: we know that prepare() is not NULL since otherwise
> +      trans->no_2pc would have been set.
> +    */
> +    if ((err= ht->prepare(ht, thd, all)))
> +      my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> +    status_var_increment(thd->status_var.ha_prepare_count);
> +
> +    if (err)
> +      goto err;
> +

The above is better written as:

  status_var_increment(thd->status_var.ha_prepare_count);
  if ((err= ht->prepare(ht, thd, all)))
  {
    my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
    goto err;
  }

> +    if (ht->prepare_ordered)
> +      need_prepare_ordered= TRUE;
> +    if (ht->commit_ordered)
> +      need_commit_ordered= TRUE;

Another way to do this would be:

  need_prepare_ordered|= ht->prepare_ordered;
  need_commit_ordered|= ht->commit_ordered;

Less code and faster...

> +  }
> +  DBUG_EXECUTE_IF("crash_commit_after_prepare", DBUG_ABORT(););

Should be DBUG_SUICIDE() after you merge with latest 5.1.

<cut>

> +  DBUG_EXECUTE_IF("crash_commit_before_unlog", DBUG_ABORT(););
> +  tc_log->unlog(cookie, xid);

Latest 5.1 code tests the return value of unlog(), so you need to check
this after the merge.

> +
> +  DBUG_EXECUTE_IF("crash_commit_after", DBUG_ABORT(););
> +  goto end;
> +
> +  /* Come here if error and we need to rollback. */
> +err:
> +  if (!error)
> +    error= 1;

Add comment:
       error= 1;        /* transaction was rolled back */

I would also remove the test of whether error is already set, as we are
doing a rollback here and in this case the only allowed result is 1.

<cut>

> @@ -1839,7 +1876,13 @@ int ha_start_consistent_snapshot(THD *th
>  {
>    bool warn= true;
>  
> +  /*
> +    Holding the LOCK_commit_ordered mutex ensures that for any transaction
> +    we either see it committed in all engines, or in none.
> +  */

A slightly better comment is:

/*
  Holding the LOCK_commit_ordered mutex ensures that we get the same
  snapshot for all engines (including the binary log).  This allows us
  among other things to do backups with
  START TRANSACTION WITH CONSISTENT SNAPSHOT and
  have a consistent binlog position.
*/

> +  pthread_mutex_lock(&LOCK_commit_ordered);
>    plugin_foreach(thd, snapshot_handlerton, MYSQL_STORAGE_ENGINE_PLUGIN, &warn);
> +  pthread_mutex_unlock(&LOCK_commit_ordered);
>  
>    /*
>      Same idea as when one wants to CREATE TABLE in one engine which does not
> 


> === modified file 'sql/handler.h'
> --- sql/handler.h	2010-10-20 10:58:43 +0000
> +++ sql/handler.h	2010-11-01 14:15:03 +0000
> @@ -659,9 +659,100 @@ struct handlerton
>       NOTE 'all' is also false in auto-commit mode where 'end of statement'
>       and 'real commit' mean the same event.
>     */

<cut>

> +     Not that like prepare(), commit_ordered() is only called when 2-phase
> +     commit takes place. Ie. when no binary log and only a single engine
> +     participates in a transaction, one commit() is called, no
> +     commit_orderd(). So engines must be prepared for this.

commit_orderd -> commit_ordered()

> +
> +     The calls to commit_ordered() in multiple parallel transactions is
> +     guaranteed to happen in the same order in every participating
> +     handler. This can be used to ensure the same commit order among multiple
> +     handlers (eg. in table handler and binlog). So if transaction T1 calls
> +     into commit_ordered() of handler A before T2, then T1 will also call
> +     commit_ordered() of handler B before T2.
> +
> +     Engines that implement this method should during this call make the
> +     transaction visible to other transactions, thereby making the order of
> +     transaction commits be defined by the order of commit_ordered() calls.
> +
> +     The intension is that commit_ordered() should do the minimal amount of

intension -> intention

<cut>

> +     When 2-phase commit is not used (eg. only one engine (and no binlog) in
> +     transaction), prepare() is not called and in such cases prepare_ordered()
> +     also is not called.

->
    When 2-phase commit is not used (eg. only one engine (and no binlog) in
    transaction), neither prepare() nor prepare_ordered() is called.

> === modified file 'sql/log.cc'
> --- sql/log.cc	2010-10-19 13:58:35 +0000
> +++ sql/log.cc	2010-11-01 14:15:03 +0000
> @@ -38,6 +38,7 @@
>  #endif
>  
>  #include <mysql/plugin.h>
> +#include "debug_sync.h"
>  
>  /* max size of the log message */
>  #define MAX_LOG_BUFFER_SIZE 1024
> @@ -154,7 +155,7 @@ class binlog_trx_data {
>  public:
>    binlog_trx_data()
>      : at_least_one_stmt_committed(0), incident(FALSE), m_pending(0),
> -    before_stmt_pos(MY_OFF_T_UNDEF)
> +    before_stmt_pos(MY_OFF_T_UNDEF), commit_bin_log_file_pos(0), using_xa(0)

Change using_xa(0) -> using_xa(FALSE)

(Just to be consistent with:

> @@ -208,11 +209,13 @@ public:
...
>      before_stmt_pos= MY_OFF_T_UNDEF;
>      incident= FALSE;
>      trans_log.end_of_file= max_binlog_cache_size;
> +    using_xa= FALSE;
> +    commit_bin_log_file_pos= 0;
>      DBUG_ASSERT(empty());
>    }

> @@ -1394,103 +1408,131 @@ static int binlog_close_connection(handl
>  }
>  
>  /*
> -  End a transaction.
> +  End a transaction, writing events to the binary log.
>  
>    SYNOPSIS
> -    binlog_end_trans()
> +    binlog_flush_trx_cache()
>  
>      thd      The thread whose transaction should be ended
>      trx_data Pointer to the transaction data to use
> -    end_ev   The end event to use, or NULL
> -    all      True if the entire transaction should be ended, false if
> -             only the statement transaction should be ended.
> +    end_ev   The end event to use (COMMIT, ROLLBACK, or commit XID)
>  
<cut>

> -    If 'end_ev' is NULL, the transaction is a rollback of only
> -    transactional tables, so the transaction cache will be truncated
> -    to either just before the last opened statement transaction (if
> -    'all' is false), or reset completely (if 'all' is true).
> +    This can be to commit a transaction, with a COMMIT query event or an XA
> +    commit XID event. But it can also be to rollback a transaction with a
> +    ROLLBACK query event, used for rolling back transactions which also
> +    contain updates to non-transactional tables.
>   */

Please add back documentation for 'all' as this is still a parameter.
Looking at the code, end_ev can still be NULL.  It would be good to
document this case too (or never call this with a NULL end_ev).  See the
comment about this later in the code...

<cut>

> +static LEX_STRING const write_error_msg=
> +    { C_STRING_WITH_LEN("error writing to the binary log") };
> +

Please move to the start of the file (so we have all variables
declared in one place).

>  
> +#ifndef DBUG_OFF
> +static ulong opt_binlog_dbug_fsync_sleep= 0;
> +#endif

Also move the above to the start of the file.

>  bool MYSQL_BIN_LOG::flush_and_sync()
>  {
>    int err=0, fd=log_file.file;
> @@ -3956,6 +4003,11 @@ bool MYSQL_BIN_LOG::flush_and_sync()
>    {
>      sync_binlog_counter= 0;
>      err=my_sync(fd, MYF(MY_WME));
> +#ifndef DBUG_OFF
> +    ulong usec_sleep= opt_binlog_dbug_fsync_sleep;
> +    if (usec_sleep > 0)
> +      my_sleep(usec_sleep);
> +#endif

Why not simply do:
    if (opt_binlog_dbug_fsync_sleep > 0)
      my_sleep(opt_binlog_dbug_fsync_sleep);

As this is only for debugging, the above is safe.

> @@ -4335,8 +4379,6 @@ bool MYSQL_BIN_LOG::write(Log_event *eve
>    if (thd->binlog_flush_pending_rows_event(end_stmt))
>      DBUG_RETURN(error);
>  
> -  pthread_mutex_lock(&LOCK_log);
> -

OK to remove, but...

>  #endif /* USING_TRANSACTIONS */
>      DBUG_PRINT("info",("event type: %d",event_info->get_type_code()));
> +    if (file == &log_file)
> +      pthread_mutex_lock(&LOCK_log);


Here we must also check that is_open() is still true, as we did the
earlier test without holding the LOCK_log mutex.
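
Something along these lines (just a sketch; the exact handling of the
closed-binlog case needs to be checked):

    if (file == &log_file)
    {
      pthread_mutex_lock(&LOCK_log);
      if (unlikely(!is_open()))
      {
        pthread_mutex_unlock(&LOCK_log);
        DBUG_RETURN(0);                 /* Binlog was closed under our feet */
      }
    }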

<cut>

> +int MYSQL_BIN_LOG::write_cache(IO_CACHE *cache)
>  {
> -  Mutex_sentry sentry(lock_log ? &LOCK_log : NULL);
> -

Please also remove the class Mutex_sentry, as it's not used anymore.

It would however be good to add here:

   safe_mutex_assert_owner(&LOCK_log);

> @@ -4778,26 +4812,22 @@ int query_error_code(THD *thd, bool not_
>    return error;
>  }
>  
> -bool MYSQL_BIN_LOG::write_incident(THD *thd, bool lock)
> +bool MYSQL_BIN_LOG::write_incident(THD *thd)
>  {
>    uint error= 0;
>    DBUG_ENTER("MYSQL_BIN_LOG::write_incident");

add also here:

>    Incident incident= INCIDENT_LOST_EVENTS;
>    Incident_log_event ev(thd, incident, write_error_msg);
> -  if (lock)
> -    pthread_mutex_lock(&LOCK_log);
> +
> +  pthread_mutex_lock(&LOCK_log);

You need to add a test here that is_open() is true.

<cut>

> +bool
> +MYSQL_BIN_LOG::write_transaction_to_binlog_events(group_commit_entry *entry)
> +{

<cut>

> +
> +  if (!entry->error)
> +    return 0;

Add likely() around the above test to make it clear why this case is not
handled in the following switch.
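
I.e. something like:

  if (likely(!entry->error))
    return 0;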

> +
> +  switch (entry->error)
> +  {
...

<cut>

> +MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
> +{

<cut>

> +
> +      /* Skip log_xid for transactions without xid, marked by NULL end_event. */
> +      if (!current->end_event)
> +        continue;

We talked about ignoring these.

>        /*
> +        We only bother to write to the binary log if there is anything
> +        to write.
>        */
> +      if (my_b_tell(cache) > 0)
> +      {

Can't we ignore transactions with an empty cache at the upper level?
I can't come up with any reason why we put these in the queue.

> +        current->error= write_transaction(current);
> +        if (current->error)
> +          current->commit_errno= errno;

You could, if you want, change to use:

       if ((current->error= write_transaction(current)))
        current->commit_errno= errno;

(I find this much faster to read)

> +        write_count++;
> +      }
>  
> +      trx_data->commit_bin_log_file_pos=
> +        log_file.pos_in_file + (log_file.write_pos - log_file.write_buffer);

Please add a macro in my_sys:

  #define my_b_write_tell(info) \
    ((info)->pos_in_file + ((info)->write_pos - (info)->write_buffer))

and use this instead.

<cut>

> +  current= queue;
> +  while (current != NULL)
>    {
> +    DEBUG_SYNC(leader->thd, "commit_loop_entry_commit_ordered");
> +    ++num_commits;
> +    if (current->trx_data->using_xa && !current->error)
> +      run_commit_ordered(current->thd, current->all);
> +
> +    /*
> +      Careful not to access current->next after waking up the other thread! As
> +      it may change immediately after wakeup.
> +    */
> +    group_commit_entry *next= current->next;

Please add the variable declaration at the start of the loop.

<cut>

> +int
> +MYSQL_BIN_LOG::write_transaction(group_commit_entry *entry)
> +{
> +  binlog_trx_data *trx_data= entry->trx_data;
> +  IO_CACHE *cache= &trx_data->trans_log;
> +  /*
> +    Log "BEGIN" at the beginning of every transaction.  Here, a transaction is
> +    either a BEGIN..COMMIT block or a single statement in autocommit mode. The
> +    event was constructed in write_transaction_to_binlog(), in the thread
> +    running the transaction.
> +
> +    Now this Query_log_event has artificial log_pos 0. It must be
> +    adjusted to reflect the real position in the log. Not doing it
> +    would confuse the slave: it would prevent this one from
> +    knowing where he is in the master's binlog, which would result
> +    in wrong positions being shown to the user, MASTER_POS_WAIT
> +    undue waiting etc.
> +  */

The old comment above doesn't make sense anymore in this place in the
code.  Please remove it or move it to a better place.
(I tried to find a better place, but couldn't find one.)

> +  if (entry->begin_event->write(&log_file))
> +    return ER_ERROR_ON_WRITE;
> +
> +  DBUG_EXECUTE_IF("crash_before_writing_xid",
> +                  {
> +                    if ((write_cache(cache)))
> +                      DBUG_PRINT("info", ("error writing binlog cache"));
> +                    else
> +                      flush_and_sync();
> +
> +                    DBUG_PRINT("info", ("crashing before writing xid"));
> +                    abort();
> +                  });
> +
> +  if (write_cache(cache))
> +    return ER_ERROR_ON_WRITE;
> +
> +  if (entry->end_event->write(&log_file))
> +    return ER_ERROR_ON_WRITE;
> +
> +  if (entry->incident_event && entry->incident_event->write(&log_file))
> +    return ER_ERROR_ON_WRITE;
> +
> +  if (cache->error)				// Error on read
> +    return ER_ERROR_ON_READ;

It would be better to test this right after write_cache().
There is no point in writing any more if that fails...
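
I.e. roughly this ordering:

  if (write_cache(cache))
    return ER_ERROR_ON_WRITE;
  if (cache->error)                             // Error on read
    return ER_ERROR_ON_READ;

  if (entry->end_event->write(&log_file))
    return ER_ERROR_ON_WRITE;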


> +static my_bool mutexes_inited;
> +pthread_mutex_t LOCK_prepare_ordered;
> +pthread_mutex_t LOCK_commit_ordered;

Move to start of file.

> +
> +void
> +TC_init()
> +{
> +  my_pthread_mutex_init(&LOCK_prepare_ordered, MY_MUTEX_INIT_SLOW,
> +                        "LOCK_prepare_ordered", MYF(0));
> +  my_pthread_mutex_init(&LOCK_commit_ordered, MY_MUTEX_INIT_SLOW,
> +                        "LOCK_commit_ordered", MYF(0));
> +  mutexes_inited= TRUE;
> +}
> +
> +void
> +TC_destroy()
> +{
> +  if (mutexes_inited)
> +  {
> +    pthread_mutex_destroy(&LOCK_prepare_ordered);
> +    pthread_mutex_destroy(&LOCK_commit_ordered);
> +    mutexes_inited= FALSE;
> +  }
> +}
> +
> +void
> +TC_LOG::run_prepare_ordered(THD *thd, bool all)
> +{
> +  Ha_trx_info *ha_info=
> +    all ? thd->transaction.all.ha_list : thd->transaction.stmt.ha_list;
> +
> +  for (; ha_info; ha_info= ha_info->next())
> +  {
> +    handlerton *ht= ha_info->ht();
> +    if (!ht->prepare_ordered)
> +      continue;
> +    safe_mutex_assert_owner(&LOCK_prepare_ordered);

Move the safe_mutex_assert_owner() to the start of the function.  It's
enough to test it once, and the mutex should be held in all cases.

> +    ht->prepare_ordered(ht, thd, all);
> +  }
> +}
> +

Add empty line here (two empty lines between functions)

> +void
> +TC_LOG::run_commit_ordered(THD *thd, bool all)
> +{
> +  Ha_trx_info *ha_info=
> +    all ? thd->transaction.all.ha_list : thd->transaction.stmt.ha_list;
> +
> +  for (; ha_info; ha_info= ha_info->next())
> +  {
> +    handlerton *ht= ha_info->ht();
> +    if (!ht->commit_ordered)
> +      continue;
> +    safe_mutex_assert_owner(&LOCK_commit_ordered);

Move to start of function

> +    ht->commit_ordered(ht, thd, all);
> +    DEBUG_SYNC(thd, "commit_after_run_commit_ordered");
> +  }
> +}
> +
Add empty line here (two empty lines between functions)

> +int TC_LOG_MMAP::log_and_order(THD *thd, my_xid xid, bool all,
> +                               bool need_prepare_ordered,
> +                               bool need_commit_ordered)
> +{
> +  int cookie;
> +  struct commit_entry entry;
> +  bool is_group_commit_leader;
> +  LINT_INIT(is_group_commit_leader);
> +
> +  if (need_prepare_ordered)
> +  {
> +    pthread_mutex_lock(&LOCK_prepare_ordered);
> +    run_prepare_ordered(thd, all);
> +    if (need_commit_ordered)
> +    {
> +      /*
> +        Must put us in queue so we can run_commit_ordered() in same sequence
> +        as we did run_prepare_ordered().
> +      */
> +      thd->clear_wakeup_ready();
> +      entry.thd= thd;
> +      commit_entry *previous_queue= commit_ordered_queue;
> +      entry.next= previous_queue;
> +      commit_ordered_queue= &entry;
> +      is_group_commit_leader= (previous_queue == NULL);
> +    }
> +    pthread_mutex_unlock(&LOCK_prepare_ordered);
> +  }
> +
> +  if (xid)
> +    cookie= log_one_transaction(xid);
> +  else
> +    cookie= 0;

It's easier and faster to do:

  cookie= 0;
  if (xid)
    cookie= log_one_transaction(xid);

> @@ -5965,30 +6374,66 @@ void TC_LOG_BINLOG::close()
>    pthread_cond_destroy (&COND_prep_xids);
>  }
>  
> +/*
> +  Do a binlog log_xid() for a group of transactions, linked through
> +  thd->next_commit_ordered.
>  */
> -int TC_LOG_BINLOG::log_xid(THD *thd, my_xid xid)
> +int
> +TC_LOG_BINLOG::log_and_order(THD *thd, my_xid xid, bool all,
> +                             bool need_prepare_ordered __attribute__((unused)),
> +                             bool need_commit_ordered __attribute__((unused)))
>  {
> -  DBUG_ENTER("TC_LOG_BINLOG::log");
> -  Xid_log_event xle(thd, xid);
> -  binlog_trx_data *trx_data=
> +  int err;
> +  DBUG_ENTER("TC_LOG_BINLOG::log_and_order");
> +
> +  binlog_trx_data *const trx_data=
>      (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
> -  /*
> -    We always commit the entire transaction when writing an XID. Also
> -    note that the return value is inverted.
> -   */
> -  DBUG_RETURN(!binlog_end_trans(thd, trx_data, &xle, TRUE));
> +
> +  trx_data->using_xa= TRUE;
> +  if (xid)

The above looks strange.
How can we have XA without an xid?
If this is possible, please add a comment before setting 'using_xa'.

If it's not possible for xid to be 0, then add an assert.
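
E.g. if xid can never be 0 here, a simple

  DBUG_ASSERT(xid != 0);

just before the if () would document that.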

> +  {
> +    Xid_log_event xid_event(thd, xid);
> +    err= binlog_flush_trx_cache(thd, trx_data, &xid_event, all);
> +  }
> +  else
> +    err= binlog_flush_trx_cache(thd, trx_data, NULL, all);

Shouldn't the above be:
binlog_truncate_trx_cache()

At least, all other places where we previously called
binlog_flush_trx_cache(thd, trx_data, NULL, all) were replaced with
binlog_truncate_trx_cache().

If the call is right, this should be explained in the function comment
for binlog_flush_trx_cache().

> +
> +  DBUG_RETURN(!err);
>  }
>  
<cut>

> +/*
> +  Once an XID is committed, it is safe to rotate the binary log, as it can no
> +  longer be needed during crash recovery.
> +
> +  This function is called to mark an XID this way. It needs to decrease the
> +  count of pending XIDs, and signal the log rotator thread when it reaches zero.
> +*/
> +void
> +TC_LOG_BINLOG::mark_xid_done()
>  {
> +  DBUG_ENTER("TC_LOG_BINLOG::mark_xid_done");
>    pthread_mutex_lock(&LOCK_prep_xids);
>    DBUG_ASSERT(prepared_xids > 0);
>    if (--prepared_xids == 0) {
>      pthread_cond_signal(&COND_prep_xids);
>    }
>    pthread_mutex_unlock(&LOCK_prep_xids);
>    DBUG_VOID_RETURN;

An even faster way to do this is:

   bool signal;
   pthread_mutex_lock(&LOCK_prep_xids);
   DBUG_ASSERT(prepared_xids > 0);
   signal= !--prepared_xids;
   pthread_mutex_unlock(&LOCK_prep_xids);
   if (signal)
     pthread_cond_signal(&COND_prep_xids);
   DBUG_VOID_RETURN;

This is faster as it will not wake up the thread and have it sleep
again on the mutex...


> +void TC_LOG_BINLOG::unlog(ulong cookie, my_xid xid)
> +{
> +  DBUG_ENTER("TC_LOG_BINLOG::unlog");
> +  if (xid)
> +    mark_xid_done();
> +  rotate_and_purge(0);     // as ::write_transaction_to_binlog() did not rotate

Note for when doing the merge: the newest 5.1 code returns the value of
rotate_and_purge().

> +static ulonglong binlog_status_var_num_commits;
> +static ulonglong binlog_status_var_num_group_commits;

Move variables to start of file

> +
> +
> +/*
> +  Copy out current values of status variables, for SHOW STATUS or
> +  information_schema.global_status.
> +
> +  This is called only under LOCK_status, so we can fill in a static array.
> +*/
> +void
> +TC_LOG_BINLOG::set_status_variables()
> +{
> +  ulonglong num_commits, num_group_commits;
> +
> +  pthread_mutex_lock(&LOCK_commit_ordered);
> +  num_commits= this->num_commits;
> +  num_group_commits= this->num_group_commits;
> +  pthread_mutex_unlock(&LOCK_commit_ordered);
> +
> +  binlog_status_var_num_commits= num_commits;
> +  binlog_status_var_num_group_commits= num_group_commits;

Why not set the above variables directly?
(Having to store the values in local variables over the pthread_mutex_unlock()
call has a higher overhead than setting the global variables directly.)
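
I.e. something like:

  pthread_mutex_lock(&LOCK_commit_ordered);
  binlog_status_var_num_commits= this->num_commits;
  binlog_status_var_num_group_commits= this->num_group_commits;
  pthread_mutex_unlock(&LOCK_commit_ordered);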

<cut>

> === modified file 'sql/sql_class.cc'
> --- sql/sql_class.cc	2010-08-27 14:12:44 +0000
> +++ sql/sql_class.cc	2010-10-28 09:35:46 +0000

> +void
> +THD::wait_for_wakeup_ready()
> +{
> +  pthread_mutex_lock(&LOCK_wakeup_ready);
> +  while (!wakeup_ready)
> +    pthread_cond_wait(&COND_wakeup_ready, &LOCK_wakeup_ready);
> +  pthread_mutex_unlock(&LOCK_wakeup_ready);
> +}
> +
> +void
> +THD::signal_wakeup_ready()
> +{
> +  pthread_mutex_lock(&LOCK_wakeup_ready);
> +  wakeup_ready= true;
> +  pthread_cond_signal(&COND_wakeup_ready);
> +  pthread_mutex_unlock(&LOCK_wakeup_ready);

A faster version is:

> +  pthread_mutex_lock(&LOCK_wakeup_ready);
> +  wakeup_ready= true;
> +  pthread_mutex_unlock(&LOCK_wakeup_ready);
> +  pthread_cond_signal(&COND_wakeup_ready);

You need the lock around wakeup_ready to ensure that
wait_for_wakeup_ready() doesn't miss the setting of the variable.

> === modified file 'storage/xtradb/handler/ha_innodb.cc'
> --- storage/xtradb/handler/ha_innodb.cc	2010-10-19 15:03:26 +0000

<cut>


> @@ -11619,11 +11643,6 @@ static MYSQL_SYSVAR_ENUM(adaptive_checkp
>    "Enable/Disable flushing along modified age. (none, reflex, [estimate])",
>    NULL, innodb_adaptive_checkpoint_update, 2, &adaptive_checkpoint_typelib);
>  
> -static MYSQL_SYSVAR_ULONG(enable_unsafe_group_commit, srv_enable_unsafe_group_commit,
> -  PLUGIN_VAR_RQCMDARG,
> -  "Enable/Disable unsafe group commit when support_xa=OFF and use with binlog or other XA storage engine.",
> -  NULL, NULL, 0, 0, 1, 0);
> -

We would probably need to keep the variable available (doing nothing) and
mark it deprecated, just to ensure that MariaDB starts with a my.cnf file
made for MySQL.
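
A sketch of what that could look like (assuming a dummy variable is kept for
the value; the exact deprecation mechanism is up to you):

  static MYSQL_SYSVAR_ULONG(enable_unsafe_group_commit,
    srv_enable_unsafe_group_commit, PLUGIN_VAR_RQCMDARG,
    "Deprecated. Has no effect; kept so that old my.cnf files still work.",
    NULL, NULL, 0, 0, 1, 0);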

<cut>

Regards,
Monty


