← Back to team overview

maria-developers team mailing list archive

Re: 53a88b2b7032: Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()

 

Hi, Michael!

On Dec 03, Michael Widenius wrote:
> revision-id: 53a88b2b7032 (mariadb-10.5.2-272-g53a88b2b7032)
> parent(s): 0aab153e05be
> author: Michael Widenius <michael.widenius@xxxxxxxxx>
> committer: Michael Widenius <michael.widenius@xxxxxxxxx>
> timestamp: 2020-09-24 17:26:40 +0300
> message:
> 
> Optimize usage of c_ptr(), c_ptr_quick() and String::alloc()
> 
> The proble was that hen using String::alloc() to allocate a string,

"hen" ? after a minute of staring I realized that you probably meant "when".

(and "problem")

> the String ensures that there is space for an extra NULL byte in the
> buffer and if not, reallocates the string. This is a problem with the
> String::set_int() that calls alloc(21), which forces
> an extra malloc/free to happen.
> 
> - We do not anymore re-allocate String if alloc() is called with the
>   Allocated_length. This reduces number of malloc() allocations,
>   especially one big re-allocation in Protocol::send_result_Set_metadata()
>   for almost every query that produced a result to the connnected client.
> - Avoid extra mallocs when using LONGLONG_BUFFER_SIZE
>   This can now be done as alloc() doesn't increase buffers if new length is
>   not bigger than old one.
> - c_ptr() is redesigned to be safer (but a bit longer) than before.
> - Remove wrong usage of c_ptr_quick()
>   c_ptr_quick() where used in many cases to get the pointer to the used
>   buffer, even when it didn't need to be \0 terminated. In this case
>   ptr() is a better substitute.
>   Another problem with c_ptr_quick() is that it didn't guarantee that
>   the string would be \0 terminated.
> - item_val_str(), an API function not used currently by the server,
>   now always returns a null terminated string (before it didn't always
>   do that).
> - Ensure that all String allocations uses STRING_PSI_MEMORY_KEY. The old
>   mixed usage of performance keys caused assert's when String buffers
>   where shrunk.
> - Binary_string::shrink() is simplifed
> 
> diff --git a/sql/item.cc b/sql/item.cc
> index 4a26cb6c140c..328fe96018a5 100644
> --- a/sql/item.cc
> +++ b/sql/item.cc
> @@ -4574,7 +4574,7 @@ const String *Item_param::value_query_val_str(THD *thd, String *str) const
>          break;
>        }
>        DBUG_ASSERT(str->length() <= typelen);
> -      buf= str->c_ptr_quick();
> +      buf= (char*) str->ptr();
>        ptr= buf + str->length();
>        *ptr++= '\'';
>        ptr+= (uint) my_TIME_to_str(&value.time, ptr, decimals);

this is just wrong. One cannot simply write data at the String's end
pointer. One must alloc() the space first.

> diff --git a/sql/partition_info.cc b/sql/partition_info.cc
> index 8ad14ca260c4..ce72a7af051d 100644
> --- a/sql/partition_info.cc
> +++ b/sql/partition_info.cc
> @@ -126,7 +126,7 @@ partition_info *partition_info::get_clone(THD *thd)
>  /**
>    Mark named [sub]partition to be used/locked.
>  
> -  @param part_name  Partition name to match.
> +  @param part_name  Partition name to match.  Must be \0 terminated!
>    @param length     Partition name length.
>  
>    @return Success if partition found
> @@ -172,9 +172,9 @@ bool partition_info::add_named_partition(const char *part_name, size_t length)
>      else
>        bitmap_set_bit(&read_partitions, part_def->part_id);
>    }
> -  DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %s",
> +  DBUG_PRINT("info", ("Found partition %u is_subpart %d for name %.*s",
>                        part_def->part_id, part_def->is_subpart,
> -                      part_name));
> +                      length, part_name));

why did you specify a length if the name "must be \0 terminated!" ?
(and I agree that it must be)

>    DBUG_RETURN(false);
>  }
>  
> diff --git a/sql/sql_plugin.cc b/sql/sql_plugin.cc
> index 633d969b3dec..2b603380aa6c 100644
> --- a/sql/sql_plugin.cc
> +++ b/sql/sql_plugin.cc
> @@ -404,12 +404,16 @@ static int item_value_type(struct st_mysql_value *value)
>  static const char *item_val_str(struct st_mysql_value *value,
>                                  char *buffer, int *length)
>  {
> +  size_t org_length= *length;
> -  String str(buffer, *length, system_charset_info), *res;
> +  String str(buffer, org_length, system_charset_info), *res;
>    if (!(res= ((st_item_value_holder*)value)->item->val_str(&str)))
>      return NULL;
>    *length= res->length();
> -  if (res->c_ptr_quick() == buffer)
> +  if (res->ptr() == buffer && res->length() < org_length)
> +  {
> +    buffer[res->length()]= 0;
>      return buffer;
> +  }

Okay. This looked like the *only* correct usage of c_ptr_quick().
Agree, better to remove c_ptr_quick().

>  
>    /*
>      Lets be nice and create a temporary string since the
> diff --git a/sql/sql_string.cc b/sql/sql_string.cc
> index e2defba434dd..8ab156d0e556 100644
> --- a/sql/sql_string.cc
> +++ b/sql/sql_string.cc
> @@ -1251,21 +1259,16 @@ bool String::append_semi_hex(const char *s, uint len, CHARSET_INFO *cs)
>  // Shrink the buffer, but only if it is allocated on the heap.
>  void Binary_string::shrink(size_t arg_length)
>  {
> -    if (!is_alloced())
> -        return;
> -    if (ALIGN_SIZE(arg_length + 1) < Alloced_length)
> +  if (is_alloced() && ALIGN_SIZE(arg_length + 1) < Alloced_length)
> +  {
> +    char *new_ptr;
> +    /* my_realloc() can't fail as new buffer is less than the original one */
> +    if ((new_ptr= (char*) my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
> +                                     MYF(thread_specific ?
> +                                         MY_THREAD_SPECIFIC : 0))))

why do you need an if() if "my_realloc() can't fail" ?
(and it really cannot, you're right)

>      {
> -        char* new_ptr;
> -        if (!(new_ptr = (char*)my_realloc(STRING_PSI_MEMORY_KEY, Ptr, arg_length,
> -            MYF(thread_specific ? MY_THREAD_SPECIFIC : 0))))
> -        {
> -            Alloced_length = 0;
> -            real_alloc(arg_length);
> -        }
> -        else
> -        {
> -            Ptr = new_ptr;
> -            Alloced_length = (uint32)arg_length;
> -        }
> +      Ptr= new_ptr;
> +      Alloced_length= (uint32) arg_length;
>      }
> +  }
>  }
> diff --git a/sql/sql_string.h b/sql/sql_string.h
> index 2ef817ea0adc..01d5211fe850 100644
> --- a/sql/sql_string.h
> +++ b/sql/sql_string.h
> @@ -599,25 +599,34 @@ class Binary_string: public Static_binary_string
>  
>    inline char *c_ptr()
>    {
> -    DBUG_ASSERT(!alloced || !Ptr || !Alloced_length ||
> -                (Alloced_length >= (str_length + 1)));
> -
> -    if (!Ptr || Ptr[str_length])              // Should be safe
> -      (void) realloc(str_length);
> +    if (unlikely(!Ptr))
> +      return (char*) "";
> +    /*
> +      Here we assume that any buffer used to initalize String has
> +      an end \0 or have at least an accessable character at end.
> +      This is to handle the case of String("Hello",5) efficently.
> +    */
> +    if (unlikely(!alloced && !Ptr[str_length]))

No, this is wrong. Note the difference between

   String a("Hello", 5)

and

  char hello[5];
  String a(buf, 5);

Your assumption should only apply to the first case, not to the second.
In  the first case alloced==Alloced_length==0, in the second case only
alloced==0 and Alloced_length==5. So in the if() above you need to look
at Alloced_length.

> +      return Ptr;
> +    if (str_length < Alloced_length)
> +    {
> +      Ptr[str_length]=0;
> +      return Ptr;
> +    }
> +    (void) realloc(str_length+1);               /* This will add end \0 */
>      return Ptr;
>    }
> +  /* Use c_ptr() instead. This will be deleted soon, kept for compatiblity */
>    inline char *c_ptr_quick()
>    {
> -    if (Ptr && str_length < Alloced_length)
> -      Ptr[str_length]=0;
> -    return Ptr;
> +    return c_ptr_safe();
>    }
>    inline char *c_ptr_safe()
>    {
>      if (Ptr && str_length < Alloced_length)
>        Ptr[str_length]=0;
>      else
> -      (void) realloc(str_length);
> +      (void) realloc(str_length + 1);

what's the difference between c_ptr_safe and c_ptr, again?
they're very similar, so they'd need a good comment explaining when to
use what.

>      return Ptr;
>    }
>  
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx