maria-developers team mailing list archive

Thread
Date
Re: 5365db70cb8: Improve update handler (long unique keys on blobs)

To: Michael Widenius <michael.widenius@xxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxx>
Date: Mon, 16 Mar 2020 20:38:55 +0100
Cc: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAEiQ=_uLBGpPmzw9XTwG2zkACnVndwrOJXiq+TiwwPkQZUzs4g@mail.gmail.com>
User-agent: Mutt/1.10.1 (2018-07-13)
Hi, Michael!

On Mar 16, Michael Widenius wrote:
> 
> > > +#
> > > +# MDEV-21819 Assertion `inited == NONE || update_handler != this'
> > > +# failed in handler::ha_write_row
> > > +#
> > > +CREATE OR REPLACE TABLE t1 (a INT, b BLOB, s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(b)) ENGINE=MyISAM PARTITION BY HASH(a) PARTITIONS 2;
> > > +INSERT INTO t1 VALUES (1,'foo','2022-01-01', '2025-01-01');
> > > +DELETE FROM t1 FOR PORTION OF app FROM '2023-01-01' TO '2024-01-01';
> > > +ERROR 23000: Duplicate entry 'foo' for key 'b'
> > > +DROP TABLE t1;
> > >  set @@GLOBAL.max_allowed_packet= @allowed_packet;
> > > diff --git a/mysql-test/suite/versioning/r/long_unique.result b/mysql-test/suite/versioning/r/long_unique.result
> > > new file mode 100644
> > > index 00000000000..da07bc66e22
> > > --- /dev/null
> > > +++ b/mysql-test/suite/versioning/r/long_unique.result
> > > @@ -0,0 +1,8 @@
> > > +#
> > > +# Assertion `inited == NONE || update_handler != this' failed in
> > > +# handler::ha_write_row
> > > +#
> > > +CREATE TABLE t1 (f VARCHAR(4096), s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(f)) ENGINE=MyISAM;
> > > +INSERT INTO t1 VALUES ('foo', '2023-08-30', '2025-07-09'),('bar', '2021-01-01', '2021-12-31');
> > > +DELETE FROM t1 FOR PORTION OF app FROM '2023-08-29' TO '2025-07-01';
> > > +DROP TABLE t1;
> >
> > you've had almost the same test (MDEV-21819) in main/long_unique.test
> >
> > it's a bit strange to have to very similar tests in different suites
> 
> Agree. However I didn't want to do too much cleanup work
> (Already done way more than my share :( )
> 
> However, I noticed that there was different code path's for versioning when it
> comes to unique handling so I had to add the test to many places.
> 
> > I suggest to move MDEV-21819 test from main/long_unique.test to this file.
> Don't know what you mean with 'this file'. The versioning/long_unique file
> only contains the test relevant for versioning, there is no reason to have
> any more tests in there.
> 
> > and then move this whole file from versioning suite to the period suite.
> As this particular test crashes with just versioning, it should be there,
> not in period.
> Feel free to do the above cleanup later if you want. However, I think
> that main unique testing should be in the main test suite, not in period.

I mean, you have tests, both with PERIOD, but one in the main suite, the
other in the versioning suite. While they both should be in the period
suite. Here, your test from the main/long_unique.test:

+CREATE OR REPLACE TABLE t1 (a INT, b BLOB, s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(b)) ENGINE=MyISAM PARTITION BY HASH(a) PARTITIONS 2;
+INSERT INTO t1 VALUES (1,'foo','2022-01-01', '2025-01-01');
+--error ER_DUP_ENTRY
+DELETE FROM t1 FOR PORTION OF app FROM '2023-01-01' TO '2024-01-01';
+DROP TABLE t1;

Your test from the suite/versioning/t/long_unique.test:

+CREATE TABLE t1 (f VARCHAR(4096), s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(f)) ENGINE=MyISAM;
+INSERT INTO t1 VALUES ('foo', '2023-08-30', '2025-07-09'),('bar', '2021-01-01', '2021-12-31');
+DELETE FROM t1 FOR PORTION OF app FROM '2023-08-29' TO '2025-07-01';
+DROP TABLE t1;

> > > diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
> > > index 73191907aac..c7e61864366 100644
> > > --- a/sql/ha_partition.cc
> > > +++ b/sql/ha_partition.cc
> > > @@ -4340,6 +4339,10 @@ int ha_partition::write_row(const uchar * buf)
> > >    }
> > >    m_last_part= part_id;
> > >    DBUG_PRINT("info", ("Insert in partition %u", part_id));
> > > +  if (table->s->long_unique_table &&
> > > +      m_file[part_id]->update_handler == m_file[part_id] && inited == RND)
> > > +    m_file[part_id]->prepare_for_insert(0);
> >
> > This looks quite incomprehensible. If the table has a long unique key
> > and the update_handler is the handler itself you _prepare for insert_ ???
> 
> This is basically the original code we had in handler::write_row.
> We have to have this in this place as we don't want to call
> handler->prepare_for_insert() for all partitions, only those
> that where really used.

I mean not the condition, but the "prepare_for_insert" method name that
you've introduced, it didn't exist in the original code.

If the name would've been, say, "prepare_update_handler" or
"prepare_long_unique_handler" then the code would've looked fine and I
wouldn't have asked.

> > if it's an insert it should've been prepared earlier, like, normally for
> > an insert, and it don't have to do anything with long uniques.
> 
> See above. The above is to avoid creating new handlers when we don't need
> them!  If you have 100 partitions, you really don't want to create 100 cloned
> table handlers.

Of course. I just mean the method name. If it is "preparing for insert"
as the name suggests, one could've expected it to be done earlier. I'm
saying, the name should not suggest it's a preparation for an insert,
call it, for example, "prepare_update_handler"

> > What you actually do here (as I've found out looking firther in the patch)
> > you prepare update_handler. That is "prepare_for_insert" is VERY confusing
> > here, it sounds like it prepares for a normal insert and should be
> > called at the beginning of mysql_insert(). And it's nothing like that.
> >
> > It would remove half of the questions if you'd call it
> > prepare_update_handler() or, better, prepare_auxiliary_handler() (because
> > it'll be used not only for long uniques) or something like that.
> 
> The idea with prepare_for_insert is that we can add special handler code
> that we want to do once in case of insert. We didn't have that before and
> I can see many use of that.
> 
> In any case, I will add a comment to the above partition code to clarify
> what we are doing:
> 
> "We have to call prepare_for_insert() if we have an update handler
> in the underlying table (to clone the handler). This is because for
> INSERT's prepare_for_insert() is only called for the main table,
> not for all partitions. This is to reduce the huge overhead of cloning
> a possible not needed handler if there are many partitions."

I don't think this is needed. The confusing part was not the condition,
but the method name. If you rename the method, everything else will be
fine.

> > Still, some questions are left:
> >
> > 1. why partitioning needs special code for that?
> 
> The comment explains it all

Not quite. The long unique is only done for the main table, not for
partitions. Individual partitions should never have update_handler
initialized in the first place.

> > 2. when a handler itself is its update_handler AND inited==RND?
> >    the handler itself can be the update_handler on INSERT, but then inited is NONE
> >    Is it UPDATE with system versioning?
> 
> No, this is needed for normal updates, just like in the old code.

For normal updates this->update_handler != this.
For inserts inited!=RND.
I'm asking when update_handler==this && inited==RND.

> > > +
> > >    start_part_bulk_insert(thd, part_id);
> > >
> > >    tmp_disable_binlog(thd); /* Do not replicate the low-level changes. */
> > > diff --git a/sql/handler.cc b/sql/handler.cc
> > > index 0700b1571e5..1e3f987b4e5 100644
> > > --- a/sql/handler.cc
> > > +++ b/sql/handler.cc
> > > @@ -2648,6 +2649,39 @@ handler *handler::clone(const char *name, MEM_ROOT *mem_root)
> > >    return NULL;
> > >  }
> > >
> > > +/**
> > > +  @brief clone of current handler.
> >
> > you generally don't have to write @brief in these cases. Just
> 
> I usually don't write @brief.  Strange, I must have copied the code from
> somewhere.
> Ok, the @brief came from the original code that I copied from table.cc
> and I didn't remove it.
> 
> I have now removed it. Someone else should fix all the other places
> where @brief is used wrongly!

It's not really wrong, just redundant. So, I'd say, does not justify
global removal from everywhere, but better not to do it in the new code.
Not a big deal if you use it, though.

> <cut>
> > > +bool handler::clone_handler_for_update()
> > > +{
> > > +  handler *tmp;
> > > +  DBUG_ASSERT(table->s->long_unique_table);
> > > +
> > > +  if (update_handler != this)
> > > +    return 0;                                   // Already done
> > > +  if (!(tmp= clone(table->s->normalized_path.str, table->in_use->mem_root)))
> > > +    return 1;
> > > +  update_handler= tmp;
> > > +  update_handler->ha_external_lock(table->in_use, F_RDLCK);
> >
> > Looks strange. Why F_RDLCK if it's an *update* handler?
> > This definitely needs a comment, why you're using read lock for updates.
> 
> Because the update handler is ONLY used to check if there is an existing
> row. We never use it for doing changes.
> Note that this is old code. Didn't you catch this in your original review
> before it was pushed ?

Okay. It doesn't matter because it's renamed in the Nikita's patch
anyway. (but only a few days ago, after I wrote that review).

> > > @@ -6481,6 +6515,8 @@ int handler::ha_reset()
> > >    DBUG_ASSERT(inited == NONE);
> > >    /* reset the bitmaps to point to defaults */
> > >    table->default_column_bitmaps();
> > > +  if (update_handler != this)
> > > +    delete_update_handler();
> >
> > you already have an if() inside the delete_update_handler().
> > Either remove it from here or from there.
> 
> We don't check the handler in all calls.
> I just added the check to ha_reset() as this is a very common call
> and wanted to avoid an extra call.
> Keeping current code.

it's not virtual. And in the same file. And small. The compiler will
inline it, you don't need to bother.

> > > +++ b/sql/sql_delete.cc
> > > @@ -752,6 +752,10 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds,
> > >            && !table->versioned()
> > >            && table->file->has_transactions();
> > >
> > > +  if (table->versioned(VERS_TIMESTAMP) ||
> > > +      (table_list->has_period() && !portion_of_time_through_update))
> > > +    table->file->prepare_for_insert(1);
> >
> > Ok, now I think that your idea of the default update_handler=this
> > is rather fragile. If the default is NULL, than we can put in many places
> >
> >    DBUG_ASSERT(update_handler != NULL);
> >
> > to make sure it was initialized when needed. Now we cannot do it.
> 
> The current is faster as we can always rely on the value.
> The code will work in most cases even if prepare_for_insert()
> is not set.
> 
> For an assert, on can always do (trough an inline handler function)
> DBUG_ASSERT(update_handler != table->file)
> 
> Exactly same thing.

Not at all, usually update_handler==table->file for inserts. It does not
mean update_handler is not initialized. NULL was not initialized before
your patch, and now one cannot detect whether update_handler was
initialized or not.

> > And there are many places in the code and non-trivial conditions when
> > the update_handler has to be initialized and these places and conditions
> > will change in the future. An assert would be helpful here.
> 
> Nothing stops us from having an assert with current code.
> The only different from my code is that we don't get a crash
> if we forget an assert and wrongly call update_handler.

see above. An assert is now impossible. If we wrongly call
update_handler it might work, but more often than not it'll produce
incorrect table content. So, no crash, but a hard to detect incorrect
data corruption.

> > > +
> > >    THD_STAGE_INFO(thd, stage_updating);
> > >    while (likely(!(error=info.read_record())) && likely(!thd->killed) &&
> > >           likely(!thd->is_error()))
> > > --- a/storage/myisam/ha_myisam.cc
> > > +++ b/storage/myisam/ha_myisam.cc
> > > @@ -1740,7 +1740,7 @@ void ha_myisam::start_bulk_insert(ha_rows rows, uint flags)
> > >      enable_indexes() failed, thus it's essential that indexes are
> > >      disabled ONLY for an empty table.
> > >    */
> > > -  if (file->state->records == 0 && can_enable_indexes &&
> > > +  if (file->state->records == 0 && can_enable_indexes && !has_long_unique() &&
> > >        (!rows || rows >= MI_MIN_ROWS_TO_DISABLE_INDEXES))
> >
> > why do you disable bulk inserts for long uniques?
> 
> Because they are incompatible.
> Didn't you read my excellent commit message?
> " Disable write cache in MyISAM and Aria when using update_handler as
> if cache is used, the row will not be inserted until end of statement
> and update_handler would not find conflicting rows"
> 
> Does that explain things?

No, sorry, it doesn't. I did read your comment before writing the question.

Here you didn't disable _write cache_. You disabled switching off
indexes. Your comment doesn't apply to that, because no matter whether
indexes are disabled or not, rows are still inserted at once.

> > >    {
> > >      if (file->open_flag & HA_OPEN_INTERNAL_TABLE)

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx
Follow ups

Re: 5365db70cb8: Improve update handler (long unique keys on blobs)
From: Michael Widenius, 2020-03-16
References

Re: 5365db70cb8: Improve update handler (long unique keys on blobs)
From: Sergei Golubchik, 2020-03-01
Re: 5365db70cb8: Improve update handler (long unique keys on blobs)
From: Michael Widenius, 2020-03-16