← Back to team overview

maria-developers team mailing list archive

Re: eb9fab7: MDEV-9531: GROUP_CONCAT with ORDER BY inside takes a lot of memory while it's executed

 

Hi, Oleksandr!

On Jun 28, Oleksandr Byelkin wrote:
> >> diff --git a/include/my_tree.h b/include/my_tree.h
> >> index f8be55f..f1916b9 100644
> >> --- a/include/my_tree.h
> >> +++ b/include/my_tree.h
> >> @@ -57,11 +57,14 @@ typedef struct st_tree_element {
> >>   } TREE_ELEMENT;
> >>   
> >>   #define ELEMENT_CHILD(element, offs) (*(TREE_ELEMENT**)((char*)element + offs))
> >> +#define R_ELEMENT_CHILD(element, offs) ((TREE_ELEMENT**)((char*)element + offs))
> > what does "R_ELEMENT_CHILD" mean? What's "R"?
> reference

Not intuitive at all. if you need that, call it ELEMENT_CHILD_PTR
(and then, redefine ELEMENT_CHILD to be *ELEMENT_CHILD_PTR(element, offs))

> >>   typedef struct st_tree {
> >>     TREE_ELEMENT *root,null_element;
> >>     TREE_ELEMENT **parents[MAX_TREE_HEIGHT];
> >> +  TREE_ELEMENT *free_element;
> > why do you need a free_element pointer?
> 
> We can free element not when tree grows enough but only when we are 
> going to insert new element, but then we will need to go back and make 
> path jet another time tho the place of insertion (because of possible 
> rebalancing). Instead we check in the beginning if tree is too big to 
> insert and delete element, then probably we will use it to insert new, 
> but if when we went down to the element and see that we do not need to 
> insert, the element will be left as free for the next time.
> >
> >>     uint offset_to_key,elements_in_tree,size_of_element;
> >> +  uint elements_limit, del_direction;
> > why do you need to support different directions here?
> to make the routine more general.

you don't use it. let's add functionality when we need it, not just in
case.

> >>     qsort_cmp2 compare;
> >>     void *custom_arg;
> >> diff --git a/mysql-test/r/mysqld--help.result b/mysql-test/r/mysqld--help.result
> >> index bef0390..3699b18 100644
> >> --- a/mysql-test/r/mysqld--help.result
> >> +++ b/mysql-test/r/mysqld--help.result
> >> @@ -240,6 +240,9 @@ The following options may be given as the first argument:
> >>    --group-concat-max-len=#
> >>    The maximum length of the result of function
> >>    GROUP_CONCAT()
> >> + --group-concat-max-mem=#
> >> + The maximum memory used to calculate result of function
> >> + GROUP_CONCAT()
> > No, please. There is group_concat_max_len already, it's all that user
> > needs. There is no reason why one might need to configure a second limit
> > on for the GROUP_CONCAT.
> 
> OK, the problem is that even by key length we can not predict string 
> output of the values, the only what we can say is that it will be at 
> least one character (coma) because empty lines possible. Do you have 
> ideas how predict minimal possible string size by value?

That, probably, means that limiting the total size of a TREE does not
work very well for GROUP_CONCAT...

You can fix it on the upper level, in the GROUP_CONCAT itself, by
calculating the result length. Like

  tree_insert(tree, cur_row);
  res_length+= cur_row_length;
  if (res_length > group_concat_max_len + threshold)
  {
    remove last tree elements as long as
    res_length stays larger than group_concat_max_len
  }

That won't allow some tricks, like reusing the TREE_ELEMENT, but with a
sufficiently large threshold, and every rebalancing is O(log n) anyway.

An interesting version of that would be to know the length of the left
subtree and when it grows over group_concat_max_len, replace the tree
with its left subtree. I'm not sure it will be easy to maintain, though.
And while it's not O(log n), it's a fixed cost for every operation,
while only a fraction of TREE usages will actually need it.

Perhaps calculating it could be an option, it's O(N), so if done rarely
it might be ok.

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx


Follow ups

References