maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12653
Re: c4de76aeff8: MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
Hi, Aleksey!
On Apr 07, Aleksey Midenkov wrote:
> > > +
> > > +--echo # INSERT, INSERT .. SELECT don't increment partitions
> >
> > it's not really "increment", better say "don't auto-create"
>
> Actually I like "increment" more. "Auto-create" overcomplicates phrases:
>
> --echo # Increment from 3 to 5
> --echo # Increment from 3 to 6, manual names, LOCK TABLES
> --echo # Multiple increments in single command
>
> Besides "increment" is correct because PARTITIONS number is incremented.
Sure.
"Increment the number of partitions", this is fine.
"Auto create partitions" is also fine.
"Increment partitions" is meaningless.
> > > +insert into t1 values (1);
> > > +
> > > +create or replace table t2 (y int);
> > > +insert into t2 values (2);
> > > +
> > > +insert into t1 select * from t2;
> > > +insert into t2 select * from t1;
> >
> > why do you need a t2 table in this test?
>
> t1 is not incremented in case of
>
> insert into t2 select * from t1;
add a comment please
> > > +--echo # Here t2 is incremented too, but not updated
> > > +set timestamp= unix_timestamp('2000-01-01 03:00:00');
> > > +update t1, t2 set t1.x= 0 where t1.x< t2.y;
> > > +--replace_result $default_engine DEFAULT_ENGINE
> > > +show create table t1;
> > > +# Multiupdate_prelocking_strategy::handle_end() is processed after table open.
> > > +# For PS it is possible to skip unneeded auto-creation because the above happens at
> > > +# prepare stage and auto-creation is done at execute stage.
> > > +--replace_result $default_engine DEFAULT_ENGINE 'PARTITIONS 4' 'PARTITIONS ok' 'PARTITIONS 5' 'PARTITIONS ok'
> >
> > eh... I don't think this is really "ok".
> > As far as I remember, Multiupdate_prelocking_strategy knows what tables
> > should be opened for writing and what for reading. Why would a new partition
> > be created for t2?
>
> It knows this after tables are opened. Look at handle_end(),
> specifically mysql_handle_derived(), handle_derived(),
> setup_fields_with_no_wrap() and check_fields(). I believe all these
> calls are required to get proper get_table_map().
>
> To get this working properly there must be 2-staged open tables,
> something like PS does.
You mean, a new partition is created for t2 in normal mode and not
created in --ps?
What if you'll also add `&& table_list->updating` to your condition in
TABLE::vers_need_hist_part() ?
> > > diff --git a/mysql-test/suite/versioning/r/partition.result b/mysql-test/suite/versioning/r/partition.result
> > > index 2a277b1c2ea..8369b40d98c 100644
> > > --- a/mysql-test/suite/versioning/r/partition.result
> > > +++ b/mysql-test/suite/versioning/r/partition.result
> > > @@ -1236,27 +1270,752 @@ t1 CREATE TABLE `t1` (
> > ...
> > > +set timestamp= unix_timestamp('2000-01-01 03:00:00');
> > > +affected rows: 0
> > > +lock tables t1 write;
> > > +affected rows: 0
> > > +execute s;
> > > +affected rows: 1
> > > +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 0
> > > +execute s;
> > > +affected rows: 1
> > > +info: Rows matched: 1 Changed: 1 Inserted: 1 Warnings: 0
> > > +show create table t1;
> > > +Table Create Table
> > > +t1 CREATE TABLE `t1` (
> > > + `x` int(11) DEFAULT NULL
> > > +) ENGINE=DEFAULT_ENGINE DEFAULT CHARSET=latin1 WITH SYSTEM VERSIONING
> > > + PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR STARTS TIMESTAMP'2000-01-01 00:00:00' AUTO
> > > +PARTITIONS 6
> >
> > why did it add two partitions here?
>
> Because of this condition
>
> /*
> When hist_part is almost full LOCK TABLES my overflow the partition as we
> can't add new partitions under LOCK TABLES. Reserve one more for this case.
> */
> if (auto_hist && create_count < 2 &&
> thd->lex->sql_command == SQLCOM_LOCK_TABLES &&
> vers_info->hist_part->id + 1 == vers_info->now_part->id)
> create_count++;
Hmm. I thought the logic is - normally for UPDATE/DELETE you create a
new history partition when you need to write into it. And under LOCK
TABLES you create the next partition, just in case, even if the current
partition is still fine. Which kind of makes sense, at least until
we'll fix that MDEV about creating partitions under LOCK TABLES.
But this doesn't mean LOCK TABLES should create two partitions, I think.
If UPDATE/DELETE needs to write into a new partition - you create it,
LOCK TABLES or not. But I'd rather say that if it's just created, then
it's not quite "almost full" (it's more like "almost empty", even
"completely empty"). So there's no need to create a second partition at
the same time.
> > > +
> > > --source include/rpl_end.inc
> > > diff --git a/sql/ha_partition.h b/sql/ha_partition.h
> > > index 60a2d7f6762..7ade4419c22 100644
> > > --- a/sql/ha_partition.h
> > > +++ b/sql/ha_partition.h
> > > @@ -1610,7 +1610,7 @@ class ha_partition :public handler
> > > for (; part_id < part_id_end; ++part_id)
> > > {
> > > handler *file= m_file[part_id];
> > > - DBUG_ASSERT(bitmap_is_set(&(m_part_info->read_partitions), part_id));
> > > + bitmap_set_bit(&(m_part_info->read_partitions), part_id);
> >
> > Where was it set before your patch?
>
> #1 in partition_info::set_partition_bitmaps () at ../src/sql/partition_info.cc:274
> #2 in ha_partition::change_partitions_to_open () at ../src/sql/ha_partition.cc:8644
> #3 in set_partitions_as_used () at ../src/sql/sql_base.cc:1562
> #4 in open_table () at ../src/sql/sql_base.cc:1990
> #5 in open_and_process_table () at ../src/sql/sql_base.cc:3801
> #6 in open_tables () at ../src/sql/sql_base.cc:4275
okay. Why is it not happening now?
> > after such an overflow a table becomes essentially corrupted,
> > rows are in a wrong partition. So new history partitions cannot be added
> > anymore without reorganizing the last partition.
> >
> > How is it handled now?
> > (it's just a question, not part of the review, as it's not something you
> > were changing in your patch)
You've seemed to miss this question.
> > > +bool TABLE::vers_need_hist_part(const THD *thd, const TABLE_LIST *table_list) const
> > > +{
> > > +#ifdef WITH_PARTITION_STORAGE_ENGINE
> > > + if (part_info && part_info->part_type == VERSIONING_PARTITION &&
> > > + !table_list->vers_conditions.delete_history &&
> > > + !thd->stmt_arena->is_stmt_prepare() &&
> > > + table_list->lock_type >= TL_WRITE_ALLOW_WRITE &&
> > > + table_list->mdl_request.type >= MDL_SHARED_WRITE &&
> > > + table_list->mdl_request.type < MDL_EXCLUSIVE)
> > > + {
> > > + switch (thd->lex->sql_command)
> > > + {
> >
> > SQLCOM_LOAD with DUP_REPLACE?
>
> Added.
With a test, I hope :)
> > > + if (thd->rgi_slave && thd->rgi_slave->current_event && thd->lex->sql_command == SQLCOM_END)
> >
> > I wonder, would it be possible, instead of introducing rgi_slave->current_event,
> > just make row events set thd->lex->sql_command appropriately?
> > For example, currently row events increment thd->status_var.com_stat[]
> > each event for its own SQLCOM_xxx, it won't be needed if they'll just set
> > thd->lex->sql_command.
>
> Do you think this is a quick refactoring? I had the same idea when I
> did this. And probably I tried to do that quickly already.
> Added TODO comment.
Right, may be not quick enough for now :(
> > > @@ -2032,6 +2083,20 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
> > > tc_add_table(thd, table);
> > > }
> > >
> > > + if (!ot_ctx->vers_create_count &&
> >
> > what does this condition mean? When is ot_ctx->vers_create_count > 0 ?
>
> When we already requested backoff action.
Could you describe an example?
I thought if there are two tables in a list and both need to add a new
partition (like in a multi-update), then backoff will happen on the
first one, so open_table() won't reach the second with
ot_ctx->vers_create_count > 0.
> > > + table->vers_need_hist_part(thd, table_list))
> > > + {
> > > + ot_ctx->vers_create_count= table->part_info->vers_set_hist_part(thd, true);
> > > + if (ot_ctx->vers_create_count)
> > > + {
> > > + MYSQL_UNBIND_TABLE(table->file);
> > > + tc_release_table(table);
> > > + ot_ctx->request_backoff_action(Open_table_context::OT_ADD_HISTORY_PARTITION,
> > > + table_list);
> > > + DBUG_RETURN(TRUE);
> > > + }
> > > + }
> > > +
> > > if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) &&
> > > table->s->table_category < TABLE_CATEGORY_INFORMATION)
> > > {
> > > @@ -3172,10 +3239,18 @@ Open_table_context::recover_from_failed_open()
> > > break;
> > > case OT_DISCOVER:
> > > case OT_REPAIR:
> > > - if ((result= lock_table_names(m_thd, m_thd->lex->create_info,
> > > - m_failed_table, NULL,
> > > - get_timeout(), 0)))
> > > + case OT_ADD_HISTORY_PARTITION:
> > > + result= lock_table_names(m_thd, m_thd->lex->create_info, m_failed_table,
> > > + NULL, get_timeout(), 0);
> > > + if (result)
> > > + {
> > > + if (m_action == OT_ADD_HISTORY_PARTITION)
> > > + {
> > > + m_thd->clear_error();
> >
> > why?
>
> Because of MDEV-23642.
I see. I generally prefer to avoid thd->clear_error() because of its non
discriminative nature.
It might be ok here, but then, please add an assert before
lock_table_names:
DBUG_ASSERT(!m_thd->get_stmt_da()->is_set());
> >
> > > + result= false;
> > > + }
> > > @@ -7281,7 +7291,7 @@ uint fast_alter_partition_table(THD *thd, TABLE *table,
> > > write_log_add_change_partition(lpt) ||
> > > ERROR_INJECT_CRASH("crash_add_partition_4") ||
> > > ERROR_INJECT_ERROR("fail_add_partition_4") ||
> > > - mysql_change_partitions(lpt) ||
> > > + mysql_change_partitions(lpt, false) ||
> >
> > this way you skip trans_commit_stmt/trans_commit_implicit for
> > ALTER TABLE ... ADD RANGE/LIST partition.
> > is it always ok?
> >
> > A safer alternative would've been
> >
> > mysql_change_partitions(lpt, !(alter_info->partition_flags & ALTER_PARTITION_AUTO_HIST))
> >
> > but it's more complex, so I'd prefer your variant, if we can be sure that it
> > doesn't break anything.
>
> The condition is
>
> if (copy_data && mysql_trans_commit_alter_copy_data(thd))
>
> Data is not needed to be copied in ADD RANGE/LIST partition, is it? We
> can be sure only from testing period until GA.
well, we can never be sure after any amount of testing.
You're right, the data, indeed, won't be copied in ADD RANGE/LIST
partition. But perhaps InnoDB can still open a transaction there? As far
as I understand a transaction is started on
ha_innobase::external_lock(). Will it happen here? Or start_stmt() under
LOCK TABLES. Is that possible?
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx
Follow ups
References