maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04989
Re: MDEV-532: Async InnoDB commit checkpoint
Hi, Kristian!
Please, find the review below!
I had only few minor comments.
On Sep 14, Kristian Nielsen wrote:
> At http://bazaar.launchpad.net/~maria-captains/maria/10.0
>
> ------------------------------------------------------------
> revno: 3435
> revision-id: knielsen@xxxxxxxxxxxxxxx-20120914124453-zsap6hjclq3vrb6n
> parent: knielsen@xxxxxxxxxxxxxxx-20120913123129-kaujy4cw0jc9o08k
> committer: knielsen@xxxxxxxxxxxxxxx
> branch nick: work-10.0-mdev225-181-232
> timestamp: Fri 2012-09-14 14:44:53 +0200
> message:
> MDEV-532: Async InnoDB commit checkpoint.
>
> Make the commit checkpoint inside InnoDB be asynchroneous.
> Implement a background thread in binlog to do the writing and flushing of
> binlog checkpoint events to disk.
> === modified file 'sql/log.cc'
> --- a/sql/log.cc 2012-09-13 12:31:29 +0000
> +++ b/sql/log.cc 2012-09-14 12:44:53 +0000
> @@ -106,6 +107,14 @@ static SHOW_VAR binlog_status_vars_detai
> {NullS, NullS, SHOW_LONG}
> };
>
> +/* Variables for the binlog background thread. */
Please add to the comment: ", protected by the
LOCK_binlog_thread mutex"
> +static bool binlog_thread_started= false;
> +static bool binlog_background_thread_stop= false;
I'd prefer if both variables would start from "binlog_thread" or both
from "binlog_background_thread". Either way, but the same for both
variables.
Also, the mutex is LOCK_binlog_thread. The function is
start_binlog_background_thread(). The condition is COND_binlog_thread.
The queue is binlog_background_thread_queue.
Could you somehow decide whether the official name for it is a "binlog
background thread" or simply a "binlog thread" and use it consistently
everywhere? It's also important for the configuration or status
variables, if any, for the warnings, status or error messages, for the
documentation pages, etc.
> +static MYSQL_BIN_LOG::xid_count_per_binlog *
> + binlog_background_thread_queue= NULL;
> +
> +static bool start_binlog_background_thread();
> +
>
> /**
> purge logs, master and slave sides both, related error code
> @@ -8116,9 +8145,128 @@ int TC_LOG_BINLOG::unlog(ulong cookie, m
> void
> TC_LOG_BINLOG::commit_checkpoint_notify(void *cookie)
> {
> - mark_xid_done(((xid_count_per_binlog *)cookie)->binlog_id, true);
> + xid_count_per_binlog *entry= static_cast<xid_count_per_binlog *>(cookie);
> + mysql_mutex_lock(&LOCK_binlog_thread);
> + entry->next_in_queue= binlog_background_thread_queue;
> + binlog_background_thread_queue= entry;
> + mysql_cond_signal(&COND_binlog_thread);
> + mysql_mutex_unlock(&LOCK_binlog_thread);
> }
>
> +/*
> + Binlog service thread.
> +
> + This thread is used to log binlog checkpoints in the background, rather than
> + in the context of random storage engine threads that happen to call
> + commit_checkpoint_notify_ha() and may not like the delays while syncing
> + binlog to disk or may not be setup with all my_thread_init() and other
> + necessary stuff.
> +
> + In the future, this thread could also be used to do log rotation in the
> + background, which could elimiate all stalls around binlog rotations.
> +*/
> +pthread_handler_t
> +binlog_background_thread(void *arg __attribute__((unused)))
> +{
> + bool stop;
> + MYSQL_BIN_LOG::xid_count_per_binlog *queue, *next;
> + THD *thd;
> +
> + my_thread_init();
> + thd= new THD;
> + thd->system_thread= SYSTEM_THREAD_BINLOG_BACKGROUND;
> + my_pthread_setspecific_ptr(THR_THD, thd);
^^^ the normal way of setting up a THD, would be to call
thd->store_globals(). Why did that not work for you and you had to
resort to the manual my_pthread_setspecific_ptr() ?
> + mysql_mutex_lock(&LOCK_thread_count);
> + thd->thread_id= thread_id++;
> + mysql_mutex_unlock(&LOCK_thread_count);
> +
> + for (;;)
> + {
> + /*
> + Wait until there is something in the queue to process, or we are asked
> + to shut down.
> + */
> + thd_proc_info(thd, "Waiting for background binlog tasks");
In 10.0 you need to use THD_STAGE_INFO()
(here and below - everywhere instead of thd_proc_info)
> + mysql_mutex_lock(&mysql_bin_log.LOCK_binlog_thread);
> + for (;;)
> + {
> + stop= binlog_background_thread_stop;
> + queue= binlog_background_thread_queue;
> + if (stop || queue)
> + break;
> + mysql_cond_wait(&mysql_bin_log.COND_binlog_thread,
> + &mysql_bin_log.LOCK_binlog_thread);
> + }
> + /* Grab the queue, if any. */
> + binlog_background_thread_queue= NULL;
> + mysql_mutex_unlock(&mysql_bin_log.LOCK_binlog_thread);
...
> === modified file 'storage/innobase/handler/ha_innodb.cc'
> --- a/storage/innobase/handler/ha_innodb.cc 2012-09-13 12:31:29 +0000
> +++ b/storage/innobase/handler/ha_innodb.cc 2012-09-14 12:44:53 +0000
> @@ -3008,6 +3015,16 @@ innobase_rollback_trx(
> DBUG_RETURN(convert_error_code_to_mysql(error, 0, NULL));
> }
>
> +
> +struct pending_checkpoint {
> + struct pending_checkpoint *next;
> + handlerton *hton;
> + void *cookie;
> + ib_uint64_t lsn;
> +};
> +static struct pending_checkpoint *pending_checkpoint_list;
> +static struct pending_checkpoint *pending_checkpoint_list_end;
> +
> /*****************************************************************//**
> Handle a commit checkpoint request from server layer.
> We simply flush the redo log immediately and do the notify call.*/
The comment seems to be wrong now, doesn't it?
> @@ -3017,8 +3034,113 @@ innobase_checkpoint_request(
> handlerton *hton,
> void *cookie)
> {
> - log_buffer_flush_to_disk();
> - commit_checkpoint_notify_ha(hton, cookie);
> + ib_uint64_t lsn;
> + ib_uint64_t flush_lsn;
> + struct pending_checkpoint * entry;
> +
> + /* Do the allocation outside of lock to reduce contention. The normal
> + case is that not everything is flushed, so we will need to enqueue. */
> + entry = static_cast<struct pending_checkpoint *>
> + (my_malloc(sizeof(*entry), MYF(MY_WME)));
> + if (!entry) {
> + sql_print_error("Failed to allocate %u bytes."
> + " Commit checkpoint will be skipped.",
> + static_cast<unsigned>(sizeof(*entry)));
> + return;
> + }
> +
> + entry->next = NULL;
> + entry->hton = hton;
why do you need to remember the handlerton here? It's always
innodb_hton_ptr, isn't it?
Why do you even pass hton down as an argument, to be able to reuse one
checkpoint request function for many engines?
> + entry->cookie = cookie;
> +
> + mysql_mutex_lock(&pending_checkpoint_mutex);
> + lsn = log_get_lsn();
> + flush_lsn = log_get_flush_lsn();
> + if (lsn > flush_lsn) {
> + /* Put the request in queue.
> + When the log gets flushed past the lsn, we will remove the
> + entry from the queue and notify the upper layer. */
> + entry->lsn = lsn;
> + if (pending_checkpoint_list_end) {
> + pending_checkpoint_list_end->next = entry;
please add an assert here that
pending_checkpoint_list_end->lsn < entry->lsn
> + } else {
> + pending_checkpoint_list = entry;
> + }
> + pending_checkpoint_list_end = entry;
> + entry = NULL;
> + }
> + mysql_mutex_unlock(&pending_checkpoint_mutex);
> +
> + if (entry) {
> + /* We are already flushed. Notify the checkpoint immediately. */
> + commit_checkpoint_notify_ha(entry->hton, entry->cookie);
> + my_free(entry);
I'm a bit worried, whether you're going to do a lot of small
allocations. Perhaps it'd be better to use an allocator of a fixed-size
objects, with a pool. To avoid going to the heap too often. The same
applies to the new code log.cc.
Lock-free allocator can do that. May be we have other fixed-size
allocators with a pool too.
> + }
> +}
Regards,
Sergei
Follow ups
References