← Back to team overview

maria-developers team mailing list archive

Re: MDEV-21580: Allow packed sort keys in sort buffer

 

Hi Varun,

Please find some input below.

> commit f75829eebe96db55508cbc03c967e1c340da0cfc
> Author: Varun Gupta <varun.gupta@xxxxxxxxxxx>
> Date:   Fri Feb 7 02:30:06 2020 +0530
> 
>     MDEV-21580: Allow packed sort keys in sort buffer
>     
>     This task deals with packing the sort key inside the sort buffer, which  would
>     lead to efficient usage of the memory allocated for the sort buffer.
>     
>     The changes brought by this feature are
>       1) Sort buffers would have sort keys of variable length
>       2) The format for sort keys inside the sort buffer would look like
>          |<sort_length><null_byte><key_part1><null_byte><key_part2>.......|
>           sort_length is the extra bytes that are required to store the variable
>           length of a sort key.
>       3) When packing of sort key is done we store the ORIGINAL VALUES inside
>          the sort buffer and not the STRXFRM form (mem-comparable sort keys).
>       4) Special comparison function packed_keys_comparison() is introduced
>          to compare 2 sort keys.
> 
> diff --git a/mysql-test/main/mdev21580.result b/mysql-test/main/mdev21580.result
> new file mode 100644
> index 00000000000..c504b79d52f
> --- /dev/null
> +++ b/mysql-test/main/mdev21580.result
> @@ -0,0 +1,6427 @@
> +create table t1(a int);
> diff --git a/mysql-test/main/mdev21580.test b/mysql-test/main/mdev21580.test

This test is too long, I assume it will be gone after the "diff solution"
is implemented.

> diff --git a/sql/filesort.cc b/sql/filesort.cc
> index 763f9f59246..1cdc8e7af00 100644
> --- a/sql/filesort.cc
> +++ b/sql/filesort.cc
> @@ -215,16 +219,14 @@ SORT_INFO *filesort(THD *thd, TABLE *table, Filesort *filesort,
>    error= 1;
>    sort->found_rows= HA_POS_ERROR;
>  
> -  param.init_for_filesort(sortlength(thd, filesort->sortorder, s_length,
> -                                     &multi_byte_charset),
> -                          table, max_rows, filesort->sort_positions);
> +  param.sort_keys= sort_keys;
> +  uint sort_len= sortlength(thd, sort_keys, &multi_byte_charset,
> +                            &allow_packing_for_sortkeys);

psergey:
I think this doesn't work as intended. Consider the two tests:

create table t2 (a text collate utf8_unicode_ci,   b varchar(32));
insert into t2 select a,a from ten;

Q1: select * from t2 order by a;
Q2: select * from t2 order by a, b;

When debugging Q1, I can see allow_packing_for_sortkeys=false after the sortlength()
call. This is ok I think.

When debugging Q2, can see allow_packing_for_sortkeys= true after the
sortlength() call. This is wrong.

> @@ -491,12 +503,20 @@ uint Filesort::make_sortorder(THD *thd, JOIN *join, table_map first_table_bit)
>    for (ord = order; ord; ord= ord->next)
>      count++;
>    if (!sortorder)
> -    sortorder= (SORT_FIELD*) thd->alloc(sizeof(SORT_FIELD) * (count + 1));
> +    sortorder= (SORT_FIELD*) thd->alloc(sizeof(SORT_FIELD) * count);
> +  void *rawmem= thd->alloc(sizeof(Sort_keys));
>    pos= sort= sortorder;
>  
>    if (!pos)
>      DBUG_RETURN(0);
>  
> +  Sort_keys_array sort_keys_array(sortorder, count);
> +  Sort_keys* sort_keys= new (rawmem) Sort_keys(sort_keys_array);
> +
psergey:
Why not inherit the class from Sql_alloc and use its operator new? 

The above is probably correct but it makes me wonder what is it doing every
time I encounter this piece of code.

> +  if (!sort_keys)
> +    DBUG_RETURN(0);
psergey:
You're checking this (which cant fail) but not checking the result of
thd->alloc() call above (which can fail)?

> +
> +  pos= sort_keys->begin();
>    for (ord= order; ord; ord= ord->next, pos++)
>    {
>      Item *first= ord->item[0];
...

> +/*
> +  @brief
> +    Comparison function to compare two packed sort keys
> +
> +  @param sort_param        cmp argument
> +  @param a_ptr             packed sort key
> +  @param b_ptr             packed sort key
> +
> +  @retval
> +    >0   key a_ptr greater than b_ptr
> +    =0   key a_ptr equal to b_ptr
> +    <0   key a_ptr less than b_ptr
> +
> +*/
> +
psergey: function names typically start with a verb. Can this one follow the
convention?

> +int packed_keys_comparison(void *sort_param,
> +                           unsigned char **a_ptr, unsigned char **b_ptr)
> +{
> +  int retval= 0;
> +  size_t a_len, b_len;
> +  Sort_param *param= (Sort_param*)sort_param;
> +  Sort_keys *sort_keys= param->sort_keys;
> +  uchar *a= *a_ptr;
> +  uchar *b= *b_ptr;
> +
...

> @@ -772,6 +777,21 @@ static int rr_cmp(uchar *a,uchar *b)
>  #endif
>  }
>  
> +
> +/**
> +  Copy (unpack) values appended to sorted fields from a buffer back to
> +  their regular positions specified by the Field::ptr pointers.
> +
> +  @param addon_field     Array of descriptors for appended fields
> +  @param buff            Buffer which to unpack the value from
> +
> +  @note
> +    The function is supposed to be used only as a callback function
> +    when getting field values for the sorted result set.
> +
> +  @return
> +    void.
psergey: the above two lines do not have any value, please remove :)
> +*/
>  template<bool Packed_addon_fields>
>  inline void SORT_INFO::unpack_addon_fields(uchar *buff)
>  {

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog




Follow ups