maria-developers team mailing list archive

Thread
Date
Re: 5365db70cb8: Improve update handler (long unique keys on blobs)

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Michael Widenius <michael.widenius@xxxxxxxxx>
Date: Mon, 16 Mar 2020 23:01:01 +0200
Cc: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <20200316193855.GA8144@pweza.fritz.box>
Hi!

<cut>

> I mean, you have tests, both with PERIOD, but one in the main suite, the
> other in the versioning suite. While they both should be in the period
> suite. Here, your test from the main/long_unique.test:

> +CREATE OR REPLACE TABLE t1 (a INT, b BLOB, s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(b)) ENGINE=MyISAM PARTITION BY HASH(a) PARTITIONS 2;
> +INSERT INTO t1 VALUES (1,'foo','2022-01-01', '2025-01-01');
> +--error ER_DUP_ENTRY
> +DELETE FROM t1 FOR PORTION OF app FROM '2023-01-01' TO '2024-01-01';
> +DROP TABLE t1;
>
> Your test from the suite/versioning/t/long_unique.test:
>
> +CREATE TABLE t1 (f VARCHAR(4096), s DATE, e DATE, PERIOD FOR app(s,e), UNIQUE(f)) ENGINE=MyISAM;
> +INSERT INTO t1 VALUES ('foo', '2023-08-30', '2025-07-09'),('bar', '2021-01-01', '2021-12-31');
> +DELETE FROM t1 FOR PORTION OF app FROM '2023-08-29' TO '2025-07-01';
> +DROP TABLE t1;

The above are actual different, indendent tests not related to each other.
But I agree that they are both related to period.

I will move the PERIOD test from
main/long_unique.test
and
suite/versioning/t/long_unique.test
to
suite/period/t/long_unique.test

Sorry, I got confused about the suggestion with moving whole test files.


> > > > diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
> > > > index 73191907aac..c7e61864366 100644
> > > > --- a/sql/ha_partition.cc
> > > > +++ b/sql/ha_partition.cc
> > > > @@ -4340,6 +4339,10 @@ int ha_partition::write_row(const uchar * buf)
> > > >    }
> > > >    m_last_part= part_id;
> > > >    DBUG_PRINT("info", ("Insert in partition %u", part_id));
> > > > +  if (table->s->long_unique_table &&
> > > > +      m_file[part_id]->update_handler == m_file[part_id] && inited == RND)
> > > > +    m_file[part_id]->prepare_for_insert(0);
> > >
> > > This looks quite incomprehensible. If the table has a long unique key
> > > and the update_handler is the handler itself you _prepare for insert_ ???
> >
> > This is basically the original code we had in handler::write_row.
> > We have to have this in this place as we don't want to call
> > handler->prepare_for_insert() for all partitions, only those
> > that where really used.
>
> I mean not the condition, but the "prepare_for_insert" method name that
> you've introduced, it didn't exist in the original code.
>
> If the name would've been, say, "prepare_update_handler" or
> "prepare_long_unique_handler" then the code would've looked fine and I
> wouldn't have asked.
>
> > > if it's an insert it should've been prepared earlier, like, normally for
> > > an insert, and it don't have to do anything with long uniques.
> >
> > See above. The above is to avoid creating new handlers when we don't need
> > them!  If you have 100 partitions, you really don't want to create 100 cloned
> > table handlers.
>
> Of course. I just mean the method name. If it is "preparing for insert"
> as the name suggests, one could've expected it to be done earlier. I'm
> saying, the name should not suggest it's a preparation for an insert,
> call it, for example, "prepare_update_handler"

See my comments. It's not only intended for update handler, so I prefer
to keep the current name as that will be more future proof.

Also, it should only be called when one does insert's, not because
there exists an update handler.  It's perfectly safe to call before
any insert usage. The current code just optimized when it's called.

>
> > > What you actually do here (as I've found out looking firther in the patch)
> > > you prepare update_handler. That is "prepare_for_insert" is VERY confusing
> > > here, it sounds like it prepares for a normal insert and should be
> > > called at the beginning of mysql_insert(). And it's nothing like that.

We are calling it as start of mysql_insert(), however we have an if around
it as a small optimization.  We can remove the if you want, things will
work as before if we d.

> > > It would remove half of the questions if you'd call it
> > > prepare_update_handler() or, better, prepare_auxiliary_handler() (because
> > > it'll be used not only for long uniques) or something like that.

As one ONLY have to call it for insert and it's safe to call if one has
update handler or not, I don't think any of the above names are any better.

> > The idea with prepare_for_insert is that we can add special handler code
> > that we want to do once in case of insert. We didn't have that before and
> > I can see many use of that.
> >
> > In any case, I will add a comment to the above partition code to clarify
> > what we are doing:
> >
> > "We have to call prepare_for_insert() if we have an update handler
> > in the underlying table (to clone the handler). This is because for
> > INSERT's prepare_for_insert() is only called for the main table,
> > not for all partitions. This is to reduce the huge overhead of cloning
> > a possible not needed handler if there are many partitions."
>
> I don't think this is needed. The confusing part was not the condition,
> but the method name. If you rename the method, everything else will be
> fine.

I think it will be worse with a rename. I had originally plans to add more
things to prepare_for_insert(), but this is for a future commit.

> > > Still, some questions are left:
> > >
> > > 1. why partitioning needs special code for that?
> >
> > The comment explains it all
>
> Not quite. The long unique is only done for the main table, not for
> partitions. Individual partitions should never have update_handler
> initialized in the first place.

It must have as we call ha_write_row() with the partition handler and
this must do with the unique handler if the row exists or not.


> > > 2. when a handler itself is its update_handler AND inited==RND?
> > >    the handler itself can be the update_handler on INSERT, but then inited is NONE
> > >    Is it UPDATE with system versioning?
> >
> > No, this is needed for normal updates, just like in the old code.
>
> For normal updates this->update_handler != this.
> For inserts inited!=RND.
> I'm asking when update_handler==this && inited==RND.

This is the code from the original code before any of my changes.
I assumed you had already asked that as part of the orignal review.

In case of insert with DUP_ERROR we call ha_rnd_init_with_error().
I thought it was possible also with update, but that could be mainly
when versioning is involved.

> > > >    start_part_bulk_insert(thd, part_id);
> > > >
> > > >    tmp_disable_binlog(thd); /* Do not replicate the low-level changes. */
> > > > diff --git a/sql/handler.cc b/sql/handler.cc
> > > > index 0700b1571e5..1e3f987b4e5 100644
> > > > --- a/sql/handler.cc
> > > > +++ b/sql/handler.cc
> > > > @@ -2648,6 +2649,39 @@ handler *handler::clone(const char *name, MEM_ROOT *mem_root)
> > > >    return NULL;
> > > >  }
> > > >
> > > > +/**
> > > > +  @brief clone of current handler.
> > >
> > > you generally don't have to write @brief in these cases. Just
> >
> > I usually don't write @brief.  Strange, I must have copied the code from
> > somewhere.
> > Ok, the @brief came from the original code that I copied from table.cc
> > and I didn't remove it.
> >
> > I have now removed it. Someone else should fix all the other places
> > where @brief is used wrongly!
>
> It's not really wrong, just redundant. So, I'd say, does not justify
> global removal from everywhere, but better not to do it in the new code.
> Not a big deal if you use it, though.

I don't like it. I only got @brief becaues I copied two functions with
function comments between files.  When doing that, I prefer to do as
little changes as possible to make it easier for people to find
the new functions.

>
> > <cut>
> > > > +bool handler::clone_handler_for_update()
> > > > +{
> > > > +  handler *tmp;
> > > > +  DBUG_ASSERT(table->s->long_unique_table);
> > > > +
> > > > +  if (update_handler != this)
> > > > +    return 0;                                   // Already done
> > > > +  if (!(tmp= clone(table->s->normalized_path.str, table->in_use->mem_root)))
> > > > +    return 1;
> > > > +  update_handler= tmp;
> > > > +  update_handler->ha_external_lock(table->in_use, F_RDLCK);
> > >
> > > Looks strange. Why F_RDLCK if it's an *update* handler?
> > > This definitely needs a comment, why you're using read lock for updates.
> >
> > Because the update handler is ONLY used to check if there is an existing
> > row. We never use it for doing changes.
> > Note that this is old code. Didn't you catch this in your original review
> > before it was pushed ?
>
> Okay. It doesn't matter because it's renamed in the Nikita's patch
> anyway. (but only a few days ago, after I wrote that review).

I hope that Nikita will base his new work on my code. It would be better that
he wait until I have pushed before doing new work.
>
> > > > @@ -6481,6 +6515,8 @@ int handler::ha_reset()
> > > >    DBUG_ASSERT(inited == NONE);
> > > >    /* reset the bitmaps to point to defaults */
> > > >    table->default_column_bitmaps();
> > > > +  if (update_handler != this)
> > > > +    delete_update_handler();
> > >
> > > you already have an if() inside the delete_update_handler().
> > > Either remove it from here or from there.
> >
> > We don't check the handler in all calls.
> > I just added the check to ha_reset() as this is a very common call
> > and wanted to avoid an extra call.
> > Keeping current code.
>
> it's not virtual. And in the same file. And small. The compiler will
> inline it, you don't need to bother.

The compiler will also remove it. I think it's better to make things
explicite for anything that is in commonly used functions.


> > > > +++ b/sql/sql_delete.cc
> > > > @@ -752,6 +752,10 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds,
> > > >            && !table->versioned()
> > > >            && table->file->has_transactions();
> > > >
> > > > +  if (table->versioned(VERS_TIMESTAMP) ||
> > > > +      (table_list->has_period() && !portion_of_time_through_update))
> > > > +    table->file->prepare_for_insert(1);
> > >
> > > Ok, now I think that your idea of the default update_handler=this
> > > is rather fragile. If the default is NULL, than we can put in many places
> > >
> > >    DBUG_ASSERT(update_handler != NULL);
> > >
> > > to make sure it was initialized when needed. Now we cannot do it.
> >
> > The current is faster as we can always rely on the value.
> > The code will work in most cases even if prepare_for_insert()
> > is not set.
> >
> > For an assert, on can always do (trough an inline handler function)
> > DBUG_ASSERT(update_handler != table->file)
> >
> > Exactly same thing.
>
> Not at all, usually update_handler==table->file for inserts. It does not
> mean update_handler is not initialized. NULL was not initialized before
> your patch, and now one cannot detect whether update_handler was
> initialized or not.

Initialized means that it changes value. In old code it was only
updated if there was an update handler, so asserts would not work 'as
such'. Also it was only set after the first insert, so it was hard
to have any asserts over the place.

It still really trivial to add an assert that checks if we should need
a clone and if the update handler was set.  It's actually easier to do
now as the update handler will be set BEFORE the ha_write_row call.


> > > And there are many places in the code and non-trivial conditions when
> > > the update_handler has to be initialized and these places and conditions
> > > will change in the future. An assert would be helpful here.
> >
> > Nothing stops us from having an assert with current code.
> > The only different from my code is that we don't get a crash
> > if we forget an assert and wrongly call update_handler.
>
> see above. An assert is now impossible. If we wrongly call
> update_handler it might work, but more often than not it'll produce
> incorrect table content. So, no crash, but a hard to detect incorrect
> data corruption.

If ha_reset() is called, then things will work.  If it's not called,
most things will crash.  This means that it's not likely that this
will go wrong without having it detected.

> > > > +
> > > >    THD_STAGE_INFO(thd, stage_updating);
> > > >    while (likely(!(error=info.read_record())) && likely(!thd->killed) &&
> > > >           likely(!thd->is_error()))
> > > > --- a/storage/myisam/ha_myisam.cc
> > > > +++ b/storage/myisam/ha_myisam.cc
> > > > @@ -1740,7 +1740,7 @@ void ha_myisam::start_bulk_insert(ha_rows rows, uint flags)
> > > >      enable_indexes() failed, thus it's essential that indexes are
> > > >      disabled ONLY for an empty table.
> > > >    */
> > > > -  if (file->state->records == 0 && can_enable_indexes &&
> > > > +  if (file->state->records == 0 && can_enable_indexes && !has_long_unique() &&
> > > >        (!rows || rows >= MI_MIN_ROWS_TO_DISABLE_INDEXES))
> > >
> > > why do you disable bulk inserts for long uniques?
> >
> > Because they are incompatible.
> > Didn't you read my excellent commit message?
> > " Disable write cache in MyISAM and Aria when using update_handler as
> > if cache is used, the row will not be inserted until end of statement
> > and update_handler would not find conflicting rows"
> >
> > Does that explain things?
>
> No, sorry, it doesn't. I did read your comment before writing the question.
>
> Here you didn't disable _write cache_. You disabled switching off
> indexes. Your comment doesn't apply to that, because no matter whether
> indexes are disabled or not, rows are still inserted at once.

Sorry, you right the comment is wrong. I orignally thought we have to
switch of the row cache, but we didn't need to as we do flush the cache
for any key read.

I update the comment to the following:

- Disable write cache in MyISAM and Aria when using update_handler as
  if cache is used, the long unique index will not be updated on insert
  because the index is not marked with HA_NOSAME

A better version for future updates to long unique would be to mark
the long unique index with a special flag (HA_EXTERNAL_NOSAME) so
that the engine knows that this can't be disabled.

Regards,
Monty
References

Re: 5365db70cb8: Improve update handler (long unique keys on blobs)
From: Sergei Golubchik, 2020-03-01
Re: 5365db70cb8: Improve update handler (long unique keys on blobs)
From: Michael Widenius, 2020-03-16
Re: 5365db70cb8: Improve update handler (long unique keys on blobs)
From: Sergei Golubchik, 2020-03-16