maria-developers team mailing list archive

Thread
Date
Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Aleksey Midenkov <midenok@xxxxxxxxx>
Date: Tue, 1 Jun 2021 22:25:56 +0300
Cc: "MariaDB Developers \(maria-developers@xxxxxxxxxxxxxxxxxxx\)" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAF8BazDC6s4i-G881=YNGyP7B66iF4gRXNu0pzHJ9SssVwxytA@mail.gmail.com>
Hi Sergei!

Users could benefit from this fix of MDEV-18734 since the last
changes. Do you really think the open questions left are so important
for this bug to be still not pushed?

On Tue, Jan 26, 2021 at 1:51 AM Aleksey Midenkov <midenok@xxxxxxxxx> wrote:
>
> Hi Sergei,
>
> On Tue, Dec 22, 2020 at 7:58 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> >
> > Hi, Aleksey!
> >
> > On Nov 03, Aleksey Midenkov wrote:
> > >
> > > the updated patches are here:
> > >
> > > https://github.com/MariaDB/server/commits/5f6bcd6cd0145513974b0dfc5b2cefba28b72f1a
> >
> > I started writing my comments there, but realized that it'd be clearer
> > if I reply here, more context. That is, see below.
> >
> > One comment about the new test case, you have 16K lines in the result
> > file, are all 16K necessary? Can you do something like
> >
> >    select x, left(b,10), left(v,10)
> >
> > ?
>
> Yes, these shorter result cases reproduce as well. Updated.
>
> >
> > And please, pretty please, don't use non-constant references as function
> > arguments.
>
> Why? With pointers we can pass NULL and we don't need NULL there, do we?
>
> >
> > > On Mon, Nov 2, 2020 at 11:33 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> > > >
> > > > > diff --git a/sql/field.cc b/sql/field.cc
> > > > > index bdaaecc20269..0fd40c979d2c 100644
> > > > > --- a/sql/field.cc
> > > > > +++ b/sql/field.cc
> > > > > @@ -8310,6 +8310,7 @@ int Field_blob::store(const char *from,uint length,CHARSET_INFO *cs)
> > > > >    copy_length= copier.well_formed_copy(field_charset,
> > > > >                                         (char*) value.ptr(), new_length,
> > > > >                                         cs, from, length);
> > > > > +  value.length(copy_length);
> > > >
> > > > good! Could this be a distinct bug with its own test case?
> > >
> > > Do we need to spend time on that bureaucracy? There are real problems to
> > > solve... Tickets drain time from the management people and generally from
> > > anyone who sees them.
> >
> > I didn't mean opening an MDEV. Only writing a test case and extracting
> > it into a separate commit.
> >
> > If this is a genuine bug fix and you don't create a test case - how can
> > you be sure the fix will stay? It may disappear in a merge or a rebase,
> > manually or automatically. Or in a refactoring that "looks fine, all
> > tests pass". If you can create a test case - please, do.
>
> It is tested by federated.federated_partition test case. Why do we
> have to extract this side-fix to a separate commit? Looks anyway as a
> bureaucracy to me. The main bug is critical, the test case tests both
> bugs nicely. If the partitioning code is refactored, the test case
> still tests that side-fix.
>
> >
> > > > >    Field_blob::store_length(copy_length);
> > > > >    bmove(ptr+packlength,(uchar*) &tmp,sizeof(char*));
> > > > >
> > > > > diff --git a/sql/ha_partition.cc b/sql/ha_partition.cc
> > > > > index 9d82df0235c7..7c658082d397 100644
> > > > > --- a/sql/ha_partition.cc
> > > > > +++ b/sql/ha_partition.cc
> > > > > @@ -5125,14 +5128,14 @@ bool ha_partition::init_record_priority_queue()
> > > > >    uint alloc_len;
> > > > >    uint used_parts= bitmap_bits_set(&m_part_info->read_partitions);
> > > > >    /* Allocate record buffer for each used partition. */
> > > > > -  m_priority_queue_rec_len= m_rec_length + PARTITION_BYTES_IN_POS;
> > > > > +  m_priority_queue_rec_len= m_rec_length + ORDERED_REC_OFFSET;
> > > > >    if (!m_using_extended_keys)
> > > > >        m_priority_queue_rec_len += m_file[0]->ref_length;
> > > > >    alloc_len= used_parts * m_priority_queue_rec_len;
> > > > >    /* Allocate a key for temporary use when setting up the scan. */
> > > > >    alloc_len+= table_share->max_key_length;
> > > > >
> > > > > -  if (!(m_ordered_rec_buffer= (uchar*)my_malloc(alloc_len, MYF(MY_WME))))
> > > > > +  if (!(m_ordered_rec_buffer= (uchar*) alloc_root(&m_ordered_root, alloc_len)))
> > > >
> > > > I don't see why you need a memroot here. memroot is needed when you
> > > > allocate later at difeerent points in time some initially unpredictable
> > > > amount of memory that has the same lifetime and needs to be
> > > > freed all at once.
> > >
> > > I know... This is just another convenient way of allocation with presumably
> > > minimized number of mallocs (probably pre_alloc_size should be tweaked to a
> > > better value).
> >
> > memroot wasn't created for that. There are approaches with much less
> > overhead
>
> Using memory pools like that is a nice design pattern for non-critical
> sections. At least it is nicer than manual pointer management. I
> believe I've seen the similar pattern in InnoDB with its memory pool
> mem_heap_t even without preallocation! It would be nice to have a
> special class that preallocates and manages pointers, but I do not see
> the reason to develop that for this specific use case.
>
> I don't believe in *much* overhead: mem_root is a simplest algorithm,
> a couple of ifs and one loop. Initialization call
> init_record_priority_queue() is not a critical section, the major
> execution is mostly busy by a data copy in
> handle_ordered_index_scan(), handle_ordered_next().
>
> Let's see gprof stats collected for 100k SELECTs attached in stats.txt
>
> init_record_priority_queue() takes 0.3%; handle_ordered_index_scan()
> takes 2.3%; handle_ordered_next() takes 12.3%.
>
> Let's see unpatched stats for the same 100k SELECTs in stats_vanilla.txt
>
> init_record_priority_queue() takes 0.1%; handle_ordered_index_scan()
> takes 2.2%; handle_ordered_next() takes 11.4%.
>
> My version of init_record_priority_queue() lost 0.2%. From details we
> see that mem_root took 6 out of 13 which is 0.15%.
>
> One should keep in mind this is an artificial usage of the prio queue.
> In diversified real-world usage the share of such kind SELECT is say
> 0.1%. But even if it is 1% we divide our stats by 100. So, do you
> really care about 0.0015%?
>
> >
> > > >
> > > > But here it seems that you only allocate once. You could just make
> > > >
> > > >    m_priority_queue_rec_len = m_rec_length + PARTITION_BYTES_IN_POS +
> > > >               table->s->blob_fields * sizeof(Ordered_blob_storage)
> > >
> > > I don't wnat this data serialized. It is inconvenient to read the dump of
> > > the record. Why we should make it harder to code, read the code and debug
> > > while we can make it easier? If it is only for guaranteed 1 malloc, I don't
> > > think it's worth it.
> >
> > What do you mean "serialized"?
>
> A sort of memory organization when data is placed into a single
> contiguous buffer.
>
> > What do you mean "read the dump"?
>
> x command in GDB.
>
> >
> > Note that you can also use my_multi_malloc, that allocates many buffers
> > together and hides address arithmetcs from the caller. Like
> >
> >   my_multi_malloc(MYF(MY_WME),
> >      &m_ordered_rec_buffer, m_rec_length + PARTITION_BYTES_IN_POS,
> >      &blob_storage, table->s->blob_fields * sizeof(Ordered_blob_storage),
> >      NULL)
> >
> > (but with error checking). And then, perhaps, with placement new[]
> > you could construct all Ordered_blob_storage objects in one line.
>
> Sorry, I don't understand how my version is much worse.
>
> >
> > > > > @@ -6178,7 +6203,11 @@ int ha_partition::handle_ordered_index_scan(uchar *buf, bool reverse_order)
> > > > >        */
> > > > >        error= file->read_range_first(m_start_key.key? &m_start_key: NULL,
> > > > >                                      end_range, eq_range, TRUE);
> > > > > -      memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> > > > > +      if (!error)
> > > > > +      {
> > > > > +        memcpy(rec_buf_ptr, table->record[0], m_rec_length);
> > > > > +      }
> > > >
> > > > did you have a bug because of that? something didn't work?
> > >
> > > No, I just saw worthless memcpy and avoided it.
> >
> > You're right that memcopy is unnecessary in case of an error.
> > But most of the time there is no error. You made an exceptional code path
> > faster by making a common code path slower. I think it's generally
> > better to do the opposite, making the common, no-error code path
> > faster, fewer conditionals, in particular. At the cost of making the error
> > path slower, if needed.
>
> I'm usually aware of such things in my decisions. The slowdown is
> close to 0, it is not even 0.0001%, it is much lower. OTOH memcpy()
> slowdown is much higher and this is not a fully exceptional code path:
> HA_ERR_KEY_NOT_FOUND continues the loop.
>
> >
> > > > > @@ -6310,6 +6349,43 @@ int ha_partition::handle_ordered_index_scan_key_not_found()
> > > > > +      if (cached)
> > > > > +      {
> > > > > +        cached->swap(s.blob);
> > > > > +        s.set_read_value= set_read_value;
> > > >
> > > > is it indeed possible here for a value to be either in `value` or in
> > > > `read_value` ?
> > > >
> > > > When happens what?
> > >
> > > Yes, it uses `value` for plain fields and `read_value` for virtual fields.
> >
> > Oh, I see.
> >
> > This read_value was introduced in ea1b25046c81db8fdf to solve the case
> > of UPDATE, where a row is moved with store_record(table, record[1]) and
> > blob pointers aren't updated. This is very similar to what you're
> > fixing, perhaps read_value should go away and both bugs should have a
> > one unified solution that allows to "park" a record somewhere out of
> > record[0] and restore later, all with proper blob handling?
>
> Do you think this is necessary now? I believe there are a lot of
> *much* more important tasks.
>
> >
> > > >
> > > > > +      }
> > > > > +    }
> > > > > +  }
> > > > > +  table->move_fields(table->field, table->record[0], rec_buf);
> > > > > +}
> >
> > Regards,
> > Sergei
> > VP of MariaDB Server Engineering
> > and security@xxxxxxxxxxx
>
>
> --
> All the best,
>
> Aleksey Midenkov
> @midenok



-- 
All the best,

Aleksey Midenkov
@midenok
Follow ups

Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Sergei Golubchik, 2021-06-03
References

Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Sergei Golubchik, 2020-11-02
Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Aleksey Midenkov, 2020-11-03
Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Sergei Golubchik, 2020-12-22
Re: 2f9bbf392072: MDEV-18734 optimization by taking ownership via String::swap()
From: Aleksey Midenkov, 2021-01-25