maria-developers team mailing list archive

Thread
Date
Re: f46382ba93c: Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Michael Widenius <michael.widenius@xxxxxxxxx>
Date: Wed, 31 Mar 2021 21:45:40 +0300
Cc: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAEiQ=_u+5eP4aoHg6BKK5c3s8x3BVJb4erAZx5YkpZWTqaHVHw@mail.gmail.com>
On Wed, Mar 31, 2021 at 9:45 PM Michael Widenius
<michael.widenius@xxxxxxxxx> wrote:
>
> Cut
> > > Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
> > >
> > > The proble was that hen using String::alloc() to allocate a string,
> >
> > "hen" ? after a minute of staring I realized that you probably meant "when".
> >
> > (also, "problem")
> >
> > > the String ensures that there is space for an extra NULL byte in the
> > > buffer and if not, reallocates the string. This is a problem with the
> > > String::set_int() that calls alloc(21), which forces
> > > an extra malloc/free to happen.
> > >
> > > - We do not anymore re-allocate String if alloc() is called with the
> > >   Allocated_length. This reduces number of malloc() allocations,
> > >   especially one big re-allocation in Protocol::send_result_Set_metadata()
> > >   for almost every query that produced a result to the connnected client.
> > > - Avoid extra mallocs when using LONGLONG_BUFFER_SIZE
> > >   This can now be done as alloc() doesn't increase buffers if new length is
> > >   not bigger than old one.
> > > - c_ptr() is redesigned to be safer (but a bit longer) than before.
> > > - Remove wrong usage of c_ptr_quick()
> > >   c_ptr_quick() where used in many cases to get the pointer to the used
> >
> > s/where/was/
> >
> > >   buffer, even when it didn't need to be \0 terminated. In this case
> > >   ptr() is a better substitute.
> > >   Another problem with c_ptr_quick() is that it didn't guarantee that
> > >   the string would be \0 terminated.
> > > - item_val_str(), an API function not used currently by the server,
> > >   now always returns a null terminated string (before it didn't always
> > >   do that).
> > > - Ensure that all String allocations uses STRING_PSI_MEMORY_KEY. The old
> > >   mixed usage of performance keys caused assert's when String buffers
> > >   where shrunk.
> > > - Binary_string::shrink() is simplifed
>
> Will fix the typos. Thanks!
>
> > > diff --git a/sql/item.cc b/sql/item.cc
> > > index f6f3e2720fa..a8a43c6266a 100644
> > > --- a/sql/item.cc
> > > +++ b/sql/item.cc
> > > @@ -4703,7 +4703,7 @@ Item *Item_param::value_clone_item(THD *thd)
> > >      return 0; // Should create Item_decimal. See MDEV-11361.
> > >    case STRING_RESULT:
> > >      return new (mem_root) Item_string(thd, name,
> > > -                                      Lex_cstring(value.m_string.c_ptr_quick(),
> > > +                                      Lex_cstring(value.m_string.ptr(),
> >
> > Hmm, you said that LEX_CSTRING::str should always be \0-terminated.
> > If that's the case, then c_ptr() would be correct here, not ptr()
> >
> > >                                                    value.m_string.length()),
> > >                                        value.m_string.charset(),
> > >                                        collation.derivation,
>
> Yes, LEX_CSTRING *should* always be \0 terminated, but as we sometimes
> build these
> up I do not want to depend on that for generic usage. For usage to
> printf() than we
> should trust that the developer knows what he is doing.
>
> However Item_string() will only use the exact length and never call
> printf, so here this is safe.
>
> > > diff --git a/sql/partition_info.cc b/sql/partition_info.cc
> <cut>
>
> > > -  DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s",
> > > +  DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s",
> > >                        part_def->part_id, part_def->is_subpart,
> > > -                      part_name));
> > > +                      length, part_name));
> >
> > These two changes contradict each other. Either part_name must be
> > \0-terminated, and then you don't need to specify a length in printf.
> > Or it is not \0-terminated and you must specify a length.
>
> part_name is not null terminated and that is why I had to add .* and length.
> What is the contradiction ?
>
> > > diff --git a/sql/sql_string.cc b/sql/sql_string.cc
> > > index f2a0f55aec8..95a57017c53 100644
> > > --- a/sql/sql_string.cc
> > > +++ b/sql/sql_string.cc
> > > @@ -118,9 +121,14 @@ bool Binary_string::realloc_raw(size_t alloc_length)
> > >    return FALSE;
> > >  }
> > >
> > > +
> > >  bool String::set_int(longlong num, bool unsigned_flag, CHARSET_INFO *cs)
> > >  {
> > > -  uint l=20*cs->mbmaxlen+1;
> > > +  /*
> > > +    This allocates a few bytes extra in the unlikely case that cs->mb_maxlen
> > > +    > 1, but we can live with that
> >
> > dunno, it seems that utf8 is the *likely* case and it's getting more and
> > more likely with the time,
>
> Not for integers, as we often use binary strings for these.
>
> > > @@ -1254,21 +1262,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs)
> > >  // Shrink the buffer, but only if it is allocated on the heap.
> > >  void Binary_string::shrink(size_t arg_length)
> > >  {
> > > -    if (!is_alloced())
> > > -        return;
> > > -    if (ALIGN_SIZE(arg_length + 1) < Alloced_length)
> > > +  if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length)
> > > +  {
> > > +    char *new_ptr;
> > > +    /* my_realloc() can't fail as new buffer is less than the original one */
> > > +    if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
> >
> > you don't need an if() because, as you wrote yourself "my_realloc()
> > can't fail" here.
>
> Yes, this should be true, but one never knows with allocation libraries.
> There is implementation of realloc that makes a copy and deletes the old one.
> However I am not sure what happens if there is no place to make a copy.
> Better safe than sorry.
>
> >      {
> > > diff --git a/sql/sql_string.h b/sql/sql_string.h
> > > index b3eca118b63..ba0cff68fb4 100644
> > > --- a/sql/sql_string.h
> > > +++ b/sql/sql_string.h
> > > @@ -600,25 +600,34 @@ class Binary_string: public Static_binary_string
> > >
> > >    inline char *c_ptr()
> > >    {
> > > -    DBUG_ASSERT(!alloced || !Ptr || !Alloced_length ||
> > > -                (Alloced_length >= (str_length + 1)));
> > > -
> > > -    if (!Ptr || Ptr[str_length])              // Should be safe
> > > -      (void) realloc(str_length);
> > > +    if (unlikely(!Ptr))
> > > +      return (char*) "";
> > > +    /*
> > > +      Here we assume that any buffer used to initalize String has
> > > +      an end \0 or have at least an accessable character at end.
> > > +      This is to handle the case of String("Hello",5) efficently.
> > > +    */
> > > +    if (unlikely(!alloced && !Ptr[str_length]))
> > > +      return Ptr;
> >
> > No, this is not good. Note the difference between
> >
> >   String a("Hello", 5)
> >
> > and
> >
> >   char hello[5];
> >   String a(buf, 5);
> >
> > Your assumption should only apply to the first case, not to the second.
> > In  the first case alloced=Alloced_length=0, in the second case only
> > alloced=0 and Alloced_length=5. So in the if() above you need to look
> > at Alloced_length:
> >
> >   if (!Alloced_length && !Ptr[str_length])
> >     return Ptr;
>
> Good point.> > +    if (str_length < Alloced_length)
> > > +    {
> > > +      Ptr[str_length]=0;
> > > +      return Ptr;
> > > +    }
> > > +    (void) realloc(str_length+1);               /* This will add end \0 */
> > >      return Ptr;
> > >    }
> > > +  /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */
> > >    inline char *c_ptr_quick()
> > >    {
> > > -    if (Ptr && str_length < Alloced_length)
> > > -      Ptr[str_length]=0;
> > > -    return Ptr;
> > > +    return c_ptr_safe();
> >
> > 1. why not to remove it now?
> > 2. it's strange that you write "use c_ptr() instead" but you actually
> >    use c_ptr_safe() instead.
>
> What I meant is that the developer should instead use c_ptr().
> I will make the message more clear.
> I decided to call c_ptr_safe() here as this is the most safe
> version to use.  In practice the new c_ptr() should be safe
> for 99% of our code.
>
>
> > >    }
> > >    inline char *c_ptr_safe()
> > >    {
> > >      if (Ptr && str_length < Alloced_length)
> > >        Ptr[str_length]=0;
> > >      else
> > > -      (void) realloc(str_length);
> > > +      (void) realloc(str_length + 1);
> > >      return Ptr;
> > >    }
> >
> >
> > could you write a comment, explaining when one should use c_ptr() and
> > when - c_ptr_safe() ?
>
> Yes, try to.  Will not be easy as most cases is now handled by c_ptr().
>
> Regards,
> Monty
References

Re: f46382ba93c: Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
From: Sergei Golubchik, 2021-03-27
Re: f46382ba93c: Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
From: Michael Widenius, 2021-03-31