maria-developers team mailing list archive

Thread
Date

Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()

To: Aleksey Midenkov <midenok@xxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxx>
Date: Thu, 3 Jun 2021 12:58:20 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CAF8BazDC6s4i-G881=YNGyP7B66iF4gRXNu0pzHJ9SssVwxytA@mail.gmail.com>

Hi, Aleksey!

On Jan 26, Aleksey Midenkov wrote:
> On Tue, Dec 22, 2020 at 7:58 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> > On Nov 03, Aleksey Midenkov wrote:
> >
> > And please, pretty please, don't use non-constant references as
> > function arguments.
> 
> Why? With pointers we can pass NULL and we don't need NULL there, do we?

Because this is the policy. If you convince Monty that non-constant
references are fine - feel free to use them.

> > > On Mon, Nov 2, 2020 at 11:33 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> > > >
> > > > > diff --git a/sql/field.cc b/sql/field.cc
> > > > > index bdaaecc20269..0fd40c979d2c 100644
> > > > > --- a/sql/field.cc
> > > > > +++ b/sql/field.cc
> > > > > @@ -8310,6 +8310,7 @@ int Field_blob::store(const char *from,uint length,CHARSET_INFO *cs)
> > > > >    copy_length= copier.well_formed_copy(field_charset,
> > > > >                                         (char*) value.ptr(), new_length,
> > > > >                                         cs, from, length);
> > > > > +  value.length(copy_length);
> > > >
> > > > good! Could this be a distinct bug with its own test case?
> > >
> > > Do we need to spend time on that bureaucracy? There are real problems to
> > > solve... Tickets drain time from the management people and generally from
> > > anyone who sees them.
> >
> > I didn't mean opening an MDEV. Only writing a test case and extracting
> > it into a separate commit.
> >
> > If this is a genuine bug fix and you don't create a test case - how can
> > you be sure the fix will stay? It may disappear in a merge or a rebase,
> > manually or automatically. Or in a refactoring that "looks fine, all
> > tests pass". If you can create a test case - please, do.
> 
> It is tested by federated.federated_partition test case.

Do you mean, that if I remove this line, federated.federated_partition
test will fail? If yes - good, this is the test I meant, no need to
create another one.

> Why do we have to extract this side-fix to a separate commit?

Because later when someone looks at the code it helps to be able to
understand what a commit is actually fixing.

> > > > > diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
> > > > > --- a/sql/ha_partition.cc
> > > > > +++ b/sql/ha_partition.cc
> > > > > @@ -5125,14 +5128,14 @@ bool ha_partition::init_record_priority_queue()
> > > > >    uint alloc_len;
> > > > >    uint used_parts= bitmap_bits_set(&m_part_info->read_partitions);
> > > > >    /* Allocate record buffer for each used partition. */
> > > > > -  m_priority_queue_rec_len= m_rec_length + PARTITION_BYTES_IN_POS;
> > > > > +  m_priority_queue_rec_len= m_rec_length + ORDERED_REC_OFFSET;
> > > > >    if (!m_using_extended_keys)
> > > > >        m_priority_queue_rec_len += m_file[0]->ref_length;
> > > > >    alloc_len= used_parts * m_priority_queue_rec_len;
> > > > >    /* Allocate a key for temporary use when setting up the scan. */
> > > > >    alloc_len+= table_share->max_key_length;
> > > > >
> > > > > -  if (!(m_ordered_rec_buffer= (uchar*)my_malloc(alloc_len, MYF(MY_WME))))
> > > > > +  if (!(m_ordered_rec_buffer= (uchar*) alloc_root(&m_ordered_root, alloc_len)))
> > > >
> > > > I don't see why you need a memroot here. memroot is needed when
> > > > you allocate later at difeerent points in time some initially
> > > > unpredictable amount of memory that has the same lifetime and
> > > > needs to be freed all at once.
> > >
> One should keep in mind this is an artificial usage of the prio queue.
> In diversified real-world usage the share of such kind SELECT is say
> 0.1%. But even if it is 1% we divide our stats by 100. So, do you
> really care about 0.0015%?

A right tool for a job. If you need to allocate few buffers at the same
time - use multi_alloc, if you do many allocations over time and want to
fee them at once - use memroot.

> > > > > @@ -6178,7 +6203,11 @@ int ha_partition::handle_ordered_index_scan(uchar *buf, bool reverse_order)
> > > > >        */
> > > > >        error= file->read_range_first(m_start_key.key? &m_start_key: NULL,
> > > > >                                      end_range, eq_range, TRUE);
> > > > > -      memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> > > > > +      if (!error)
> > > > > +      {
> > > > > +        memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> > > > > +      }
> > > >
> > > > did you have a bug because of that? something didn't work?
> > >
> > > No, I just saw worthless memcpy and avoided it.
> >
> > You're right that memcopy is unnecessary in case of an error.
> > But most of the time there is no error. You made an exceptional code path
> > faster by making a common code path slower. I think it's generally
> > better to do the opposite, making the common, no-error code path
> > faster, fewer conditionals, in particular. At the cost of making the error
> > path slower, if needed.
> 
> I'm usually aware of such things in my decisions. The slowdown is
> close to 0, it is not even 0.0001%, it is much lower. OTOH memcpy()
> slowdown is much higher and this is not a fully exceptional code path:
> HA_ERR_KEY_NOT_FOUND continues the loop.

This is a wrong code pattern, you should optimize for the normal code
path, not for errors. Errors are exceptions, they're much more rare than
non-errors.

> > > > > @@ -6310,6 +6349,43 @@ int ha_partition::handle_ordered_index_scan_key_not_found()
> > > > > +      if (cached)
> > > > > +      {
> > > > > +        cached->swap(s.blob);
> > > > > +        s.set_read_value= set_read_value;
> > > >
> > > > is it indeed possible here for a value to be either in `value` or in
> > > > `read_value` ?
> > > >
> > > > When happens what?
> > >
> > > Yes, it uses `value` for plain fields and `read_value` for virtual fields.
> >
> > Oh, I see.
> >
> > This read_value was introduced in ea1b25046c81db8fdf to solve the
> > case of UPDATE, where a row is moved with store_record(table,
> > record[1]) and blob pointers aren't updated. This is very similar to
> > what you're fixing, perhaps read_value should go away and both bugs
> > should have a one unified solution that allows to "park" a record
> > somewhere out of record[0] and restore later, all with proper blob
> > handling?
> 
> Do you think this is necessary now? I believe there are a lot of
> *much* more important tasks.

Uhm, yes? This would be a notable code simplification, and it won't take
long, a low-hanging fruit.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx

Follow ups

Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Aleksey Midenkov, 2021-06-10

References

Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Sergei Golubchik, 2020-11-02
Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Aleksey Midenkov, 2020-11-03
Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Sergei Golubchik, 2020-12-22
Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Aleksey Midenkov, 2021-01-25