maria-developers team mailing list archive

Thread
Date

Re: 6ac19d09c66: MDEV-16978 Application-time periods: WITHOUT OVERLAPS

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Nikita Malyavin <nikitamalyavin@xxxxxxxxx>
Date: Tue, 3 Dec 2019 22:37:38 +1000
Cc: "MariaDB Developers \(maria-developers@xxxxxxxxxxxxxxxxxxx\)" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <20191203115226.GA20701@pweza.fritz.box>

Hello!

On Tue, 3 Dec 2019 at 21:52, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Nikita!
>
> On Nov 26, Nikita Malyavin wrote:
> > Hello, Sergei!
> > I have updated the code, please see
> >
> >    - bb-10.5-MDEV-16978-without-overlaps
> >    <
> https://github.com/MariaDB/server/compare/bb-10.5-MDEV-16978-without-overlaps
> >
> >
> >   branch.
> >
> > Among the changes I, having noticed that innodb combination is missing,
> > have added it, and some merging down from my INSERT/REPLACE patch was
> > required to make it run successfully.
> >
> > Below are the rest comments:
> >
> > On Tue, 19 Nov 2019 at 15:45, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> >
> > >
> > > > diff --git a/sql/handler.cc b/sql/handler.cc
> > > > index 94cffd69b75..560f6316602 100644
> > > > --- a/sql/handler.cc
> > > > +++ b/sql/handler.cc
> > > > @@ -6913,6 +6928,130 @@ void handler::set_lock_type(enum
> thr_lock_type lock)
> > > >    table->reginfo.lock_type= lock;
> > > >  }
> > > >
> > > > +int handler::ha_check_overlaps(const uchar *old_data, const uchar*
> new_data)
> > > > +{
> > > > +  DBUG_ASSERT(new_data);
> > > > +  if (!table_share->period.unique_keys)
> > > > +    return 0;
> > > > +  if (table->versioned() && !table->vers_end_field()->is_max())
> > > > +    return 0;
> > > > +
> > > > +  bool is_update= old_data != NULL;
> > > > +  if (!check_overlaps_buffer)
> > > > +    check_overlaps_buffer=
> (uchar*)alloc_root(&table_share->mem_root,
> > > > + table_share->max_unique_length
> > > > +                                              +
> table_share->reclength);
> > > > +  auto *record_buffer= check_overlaps_buffer +
> table_share->max_unique_length;
> > > > +  auto *handler= this;
> > > > +  if (handler->inited != NONE)
> > > > +  {
> > > > +    if (!check_overlaps_handler)
> > > > +    {
> > > > +      check_overlaps_handler=
> clone(table_share->normalized_path.str,
> > > > +                                    &table_share->mem_root);
> > >
> > > wouldn't it be simpler and cheaper to use HA_EXTRA_REMEMBER_POS and
> > > HA_EXTRA_RESTORE_POS, as we've discussed?
> > >
> > Can it stack (like, what'd be if HA_EXTRA_REMEMBER_POS is called several
> > times)?
> > I guess it can't, and somebody on upper layers could have already used
> it.
> > This is for example done in TABLE::delete_row(). I see that you've
> removed
> > it in latest 10.4+, but anyway, I think it's better to avoid remembering
> on
> > lower layers
>
> strictly speaking, cloning is rather a high-leveloperation, not for lower
> layers.
>
> but I'm not as much interested in layer purity, as in the simpler and
> faster code. I thought that HA_EXTRA_REMEMBER_POS would be it - simpler
> and faster - but let me look and your new patch first, if I'll still
> think that I comment there again :)
>
> > > > +      int error= -1;
> > > > +      if (check_overlaps_handler != NULL)
> > > > +        error=
> check_overlaps_handler->ha_external_lock(table->in_use, F_RDLCK);
> > > > +      if (error)
> > > > +        return error;
> > > > +    }
> > > > +    handler= check_overlaps_handler;
> > > > +  }
> > >
> > > why "if (handler->inited != NONE)" ?
> > > What happens if it is NONE?
> > >
> > then `this` is used
>
> right, but when handler->inited is NONE? How can that happen?
>
> When you just insert the record, for example. Neither rnd, nor index can
be inited.


> > > But it can be done better. You search for the key with the same value
> > > and a period start <= than the period start of new row. And then you
> > > have to check two rows for overlaps. If you'll search for a key with
> the
> > > period start <= than the _period end_ of the new row, then you'll only
> > > need to check one row (may be one more, if updating).
> > >
> >  It can't work just that way: to handle case when period start = _period
> > end_ of the new row, you should write even more checks, and then move
> > cursor left.
>
> No, I don't see that. Can you show an example where you'd need even more
> checks?
>
> Ok,
suppose you have to rows in table with periods:
(b, c),
(c, d).

Now you insert (a, c). ha_index_read_map will look for period_start <= c,
so it will return (c, d). These rows do not overlap, but we yet do not know
if (b, c) is there. So we need to go to the previous row.

Now the algorithm looks same complex: read key, make some checks, got to
the prev record.


> > > Also, why do you store both period ends in the index? It seems like
> > > it'd be enough to store only one end - only the start or only the
> > > end. Both ends help if you use keyreads, but you don't. On the
> > > second thought, perhaps, you should use keyreads, there's no need to
> > > fetch the whole row for this overlap check. On the yet another
> > > thought it might not work with HA_EXTRA_REMEMBER_POS.
> >
> > I think keyreads can work with HA_EXTRA_REMEMBER_POS. Moreover, I
> > agree that using keyreads can be much more efficient. However, all of
> > my latest code, namely FKs and REPLACE/IODKU, are not based on
> > keyreads, and rewriting it now will require significant effort. Let's
> > make it a separate task, maybe?
>
> Why would it require significant effort? As far as I understand in your
> current code you only need to enable keyreads and that's all.
>
> it'll return in different format, isn't it? You can't rely on
key_part->field->ptr_in_record after that.

> > > +      /* In case of update it could appear that the nearest
> neighbour is
> > > > +       * a record we are updating. It means, that there are no
> overlaps
> > > > +       * from this side. */
> > > > +      if (is_update && memcmp(old_data + table->s->null_bytes,
> > > > +                              record_buffer + table->s->null_bytes,
> > > > +                              table->s->reclength -
> table->s->null_bytes) == 0)
> > > > +      {
> > > > +        continue;
> > > > +      }
> > >
> > > I'd rather compare row positions here.
> > What do you mean by that?
>
> two rows are the same, if their "positions" are equal, not if their
> column values are equal. Also positions are much shorter to compare.
>
> after ha_index_read_map or ha_index_next you do
>
>   handler->position(record_buffer)
>
> and then you have a "position" stored in handler->ref, and it has the
> length of handler->ref_length bytes. For MyISAM it's usually the file
> offset, for InnoDB - PK value.
>
> For UPDATE you can, I suppose, call this->position(old_data) to get the
> position.
>
> It actually returns the ref for the last fetched row. the argument passed
is not even used😐 . Only innodb uses it, and only for the primary key case.

> > > +      if (thd->work_part_info)
> > > > +      {
> > > > +        my_error(ER_PERIOD_WITHOUT_OVERLAPS_PARTITIONED, MYF(0));
> > >
> > > why?
> > >
> > Unfortunately partitions do not support searching upper/lower bounds
> (i.e.
> > KEY_OR_PREV, KEY_OR_NEXT). But I think it is doable. For now it just
> > doesn't work.
>
> please add a comment about it here.
>
>  added

Regards,
> Sergei
> VP of MariaDB Server Engineering
> and security@xxxxxxxxxxx
>


-- 
Yours truly,
Nikita Malyavin

Follow ups

Re: 6ac19d09c66: MDEV-16978 Application-time periods: WITHOUT OVERLAPS
From: Sergei Golubchik, 2019-12-03

References

Re: 6ac19d09c66: MDEV-16978 Application-time periods: WITHOUT OVERLAPS
From: Sergei Golubchik, 2019-11-19
Re: 6ac19d09c66: MDEV-16978 Application-time periods: WITHOUT OVERLAPS
From: Nikita Malyavin, 2019-11-26
Re: 6ac19d09c66: MDEV-16978 Application-time periods: WITHOUT OVERLAPS
From: Sergei Golubchik, 2019-12-03