← Back to team overview

maria-developers team mailing list archive

Re: 38a888da0f1: MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT

 

Sergei,

On Wed, Jun 2, 2021 at 12:06 AM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
>
> Hi, Aleksey!
>
> On Jun 01, Aleksey Midenkov wrote:
> > > > +# Don't auto-create new partition on DELETE HISTORY:
> > > > +set timestamp= unix_timestamp('2000-01-01 00:00:00');
> > > > +create or replace table t (a int) with system versioning
> > > > +partition by system_time interval 1 hour auto;
> > > > +set timestamp= unix_timestamp('2000-01-01 10:00:00');
> > >
> > > perhaps, make it only 02:00:00 to avoid any chance that
> > > a partition won't be created because of max-auto-create limit.
> > > it's possible we put the limit at 10, but 2 is very unlikely.
> > >
> > > or does max-auto-create limit always result in a warning?
> >
> > What is max-auto-create limit and how do we make it 10? Did you mean
> > that condition:
> >
> >           if (*create_count == MAX_PARTITIONS - 2)
> >           {
> >             my_error(ER_TOO_MANY_PARTITIONS_ERROR, MYF(0));
> >             return true;
> >           }
>
> Yes, that's what I meant. Ok, there will be an error, good.
>
> > > > diff --git a/mysql-test/suite/versioning/r/partition.result b/mysql-test/suite/versioning/r/partition.result
> > > > --- a/mysql-test/suite/versioning/r/partition.result
> > > > +++ b/mysql-test/suite/versioning/r/partition.result
> > > > @@ -1236,27 +1270,826 @@ t1   CREATE TABLE `t1` (
> ...
> > > > +Warnings:
> > > > +Warning      4114    Versioned table `test`.`t1`: last HISTORY partition (`p0`) is out of INTERVAL, need more HISTORY partitions
> > > > +select * from t1 for system_time as of '2000-01-01 01:00:00';
> > > > +x
> > > > +select * from t1 partition (p0) order by x;
> > > > +x
> > > > +440
> > > > +441
> > > > +alter table t1 add partition partitions 3;
> > > > +select * from t1 for system_time as of '2000-01-01 01:00:00';
> > > > +x
> > > > +441
> > > > +select * from t1 partition (p0) order by x;
> > > > +x
> > > > +440
> > >
> > > would be good to have EXPLAINs after the overflow and after the manual fix.
> > > to show how ALTER changes what partition will be pruned
>
> What about this? Agree/disagree?

Agree.

>
> > > > diff --git a/mysql-test/suite/versioning/t/partition.test b/mysql-test/suite/versioning/t/partition.test
> > > > --- a/mysql-test/suite/versioning/t/partition.test
> > > > +++ b/mysql-test/suite/versioning/t/partition.test
> > > > @@ -1068,13 +1078,456 @@ alter table t1 add partition partitions 2;
> > > >  --replace_result $default_engine DEFAULT_ENGINE
> > > >  show create table t1;
> > > >  alter table t1 add partition partitions 8;
> > > > +drop table t1;
> > > > +
> > > > +--echo #
> > > > +--echo # End of 10.5 tests
> > > > +--echo #
> > > > +
> > > > +--echo #
> > > > +--echo # MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT
> > > > +--echo #
> > > > +create or replace table t1 (x int) with system versioning
> > > > +partition by system_time limit 1 auto;
> > > > +--replace_result $default_engine DEFAULT_ENGINE
> > > > +show create table t1;
> > > > +
> > >
> > > in the previous version of this patch you had clarifying comments here, like
> > >
> > > --echo # INSERT, INSERT .. SELECT don't auto-create partitions
> > >
> > > or
> > >
> > > --echo # Number of partitions goes from 3 to 5
> > >
> > > I kind of miss them now, they were helpful
> >
> > Yes, they were helpful as is. I removed them like I promised in the
> > previous thread. Though, Sergei, you are free to add your own comments
> > in the form you like. I'm really sorry, but nobody won't tell me what
> > words to use unless I am not able to be clear and bear the common
> > sense.
>
> okay
>
> > > > diff --git a/sql/ha_partition.h b/sql/ha_partition.h
> > > > --- a/sql/ha_partition.h
> > > > +++ b/sql/ha_partition.h
> > > > @@ -1607,7 +1607,7 @@ class ha_partition :public handler
> > > >      for (; part_id < part_id_end; ++part_id)
> > > >      {
> > > >        handler *file= m_file[part_id];
> > > > -      DBUG_ASSERT(bitmap_is_set(&(m_part_info->read_partitions), part_id));
> > > > +      bitmap_set_bit(&(m_part_info->read_partitions), part_id);
> > >
> > > would be good to have a comment before the method that it's
> > > only used in vers_set_hist_part(), if (vers_info->limit).
> > >
> > > because if it's a generic method to return the number of rows in a partition
> > > (incl. subpartitions) then it's totally not clear why it modifies
> > > read_partitions bitmap.
> > >
> > > or may be you'd want to rename the method to make it look more
> > > specific for LIMIT partitions in vers_set_hist_part() ?
> >
> > Added a comment if it will matter to anyone. No, I prefer generic and
> > easy names instead of long and capacious ones. I believe the latter
> > ones are non-productive.
>
> I also prefer generic names, but only if the method is generic.
>
> I feel that modifying the read_partitions bitmap as a side effect
> makes this function rather specialized.

Generally there are a lot of complications and exceptions everywhere
in the world. That is the nature of life. Or that is just a limitation
of us to not see the higher level of order in the observed entropy. If
we always mention and signify that, we'll spend too much energy in
vain. That is called tediousness. But I made a good enough comment for
the whole function, so I hope that will satisfy both of us.

>
> > > > diff --git a/sql/lock.cc b/sql/lock.cc
> > > > --- a/sql/lock.cc
> > > > +++ b/sql/lock.cc
> > > > @@ -662,16 +662,28 @@ MYSQL_LOCK *mysql_lock_merge(MYSQL_LOCK *a,MYSQL_LOCK *b)
> > > >    DBUG_PRINT("enter", ("a->lock_count: %u  b->lock_count: %u",
> > > >                         a->lock_count, b->lock_count));
> > > >
> > > > -  if (!(sql_lock= (MYSQL_LOCK*)
> > > > -     my_malloc(key_memory_MYSQL_LOCK, sizeof(*sql_lock) +
> > > > -               sizeof(THR_LOCK_DATA*)*((a->lock_count+b->lock_count)*2) +
> > > > -               sizeof(TABLE*)*(a->table_count+b->table_count),MYF(MY_WME))))
> > > > -    DBUG_RETURN(0);                          // Fatal error
> > > > +  const size_t lock_size= sizeof(*sql_lock) +
> > > > +    sizeof(THR_LOCK_DATA *) * ((a->lock_count + b->lock_count) * 2) +
> > > > +    sizeof(TABLE *) * (a->table_count + b->table_count);
> > > > +  if (thd)
> > > > +  {
> > > > +    sql_lock= (MYSQL_LOCK *) thd->alloc(lock_size);
> > > > +    if (!sql_lock)
> > > > +      DBUG_RETURN(0);
> > > > +    sql_lock->flags= GET_LOCK_ON_THD;
> > >
> > > why? the commit comment for MDEV-23639 doesn't mention it.
> > >
> > > generally, I think it's a good idea and, likely, should be used more than
> > > just in recover_from_failed_open(). But why right now and in this commit?
> > > Does this commit (22158b3c) rely on locks being on THD memroot? How so?
> >
> > Yes, it is possible to have locks on THD at that moment. Sorry, I was
> > too tired to investigate the exact cause nor to have interest in that.
> > And I don't really feel that misc refactoring is something important
> > to mention. All I can say: I am not happy with the two allocation
> > methods for locks. They should always be allocated from mem_root.
> > Locked_tables_list has its own dedicated one AFAIR.
>
> This doesn't answer my question. Was it part of the MDEV-23639 bug fix?
> It was part of the MDEV-23639 commit.

Sorry, I meant the moment when MDEV-23639 calls mysql_lock_merge().
Yes, that is the part of the bug fix.

>
> > > > diff --git a/sql/share/errmsg-utf8.txt b/sql/share/errmsg-utf8.txt
> > > > --- a/sql/share/errmsg-utf8.txt
> > > > +++ b/sql/share/errmsg-utf8.txt
> > > > @@ -7985,3 +7985,5 @@ ER_JSON_TABLE_MULTIPLE_MATCHES
> > > >          eng "Can't store multiple matches of the path in the column '%s' of JSON_TABLE '%s'."
> > > >  ER_WITH_TIES_NEEDS_ORDER
> > > >          eng "FETCH ... WITH TIES requires ORDER BY clause to be present"
> > > > +ER_VERS_HIST_PART_FAILED
> > > > +        eng "Versioned table %`s.%`s: adding HISTORY partition failed"
> > >
> > > Can you come up with a test for this error?
> >
> > Why? There are 5 cases, for which of them? I'll try to explain: the
> > code in these calls is vanilla and should be already tested. So, why?
>
> I believe I didn't see "adding HISTORY partition failed" error message
> in any of your tests, that's why I asked.
>
> If it's already tested, then, of course, there's no need to test it
> again. I just asked to have it tested _at least once_

Well, I meant the lower level is tested by other tests via other error
codes. But after rethinking I agree this error code should be tested
at least once. I'll try to find some way to test this without DBUG
package.

>
> > > > +  /*
> > > > +      NOTE: The semantics of vers_set_hist_part() is double: even when we
> > >
> > > twofold
> >
> > Dual, double or twofold. What is the difference between the synonyms?
>
> https://diffsense.com/diff/double/twofold
>
> here it's used as an adjective. The page above explains:
>
>   Twofold: having two parts.
>   Examples: "a twofold nature; a twofold sense; a twofold argument"

This site says:

When used as adjectives, ... twofold means double.

Do you really believe in that butter butterish? :) I mean do we need
to discuss all sorts of butter? That level of white noise should be
ignored.

>
> > > > +      don't need auto-create, we need to update part_info->hist_part.
> > > > +  */
> ...
> > > > +    else
> > > > +    {
> > > > +      /*
> > > > +          NOTE: this may repeat multiple times until creating thread acquires
> > > > +          MDL_EXCLUSIVE. Since auto-creation is rare operation this is acceptable.
> > > > +          We could suspend this thread on cond-var but we must first exit
> > > > +          MDL_SHARED_WRITE first and we cannot store cond-var into TABLE_SHARE
> > > > +          because it is already released and there is no guarantee that it will
> > > > +          be same instance if we acquire it again.
> > > > +      */
> > > > +      table_list->vers_skip_create= false;
> > > > +      ot_ctx->vers_create_count= 0;
> > > > +      action= Open_table_context::OT_REOPEN_TABLES;
> > > > +      table_arg= NULL;
> > > > +    }
> > >
> > > I'm afraid I don't understand. All this business with vers_skip_create and
> > > vers_skip_auto_create, it wasn't in the previous version of the patch. So, I
> > > believe it was a fix for one of the MDEV bugs reported and fixed meanwhile.
> > >
> > > What MDEV was it?
> > > Could you explain what was the issue and what was the fix, please?
> >
> > No, that was the multi-threaded case which worked good for me, but
> > suddenly I discovered it fails on some buildbot.
>
> Could you elaborate on what the problem was?
> Two threads trying to add the partition in parallel? I'd expect
> MDL_EXCLUSIVE to prevent that.

MDL_EXCLUSIVE prevents parallel execution of
repair_from_failed_open(), but not sequential. So it can add several
partitions instead of 1, one after another.

>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and security@xxxxxxxxxxx



-- 
All the best,

Aleksey Midenkov
@midenok


Follow ups

References