← Back to team overview

maria-developers team mailing list archive

Re: MDEV-21829:Followup: On Unique object and variable-size keys


Hi Sergey,

On Sun, Sep 13, 2020 at 7:41 PM Sergey Petrunia <sergey@xxxxxxxxxxx> wrote:

> Hi Varun,
> I was looking at the code for MDEV-21829, trying to understand whether
> there
> are any issues with how Unique object was extended to handle variable-size
> keys.
> == TREE object (the RB tree) ==
> TREE (the structure and functions that operate on it) support
> variable-sized
> keys, however the keys "are expected to know" their size.
> tree_insert() has "uint key_size" argument. This way, tree_insert() knows
> how many bytes to allocate and copy when putting the key into the tree.
> Well for the tree for fixed size we also set the size during the
initialization phase
init_tree() function has
   tree->size_of_element= size > 0 ? (uint) size : 0;

For Unique with fixed size keys we use init_tree to pass the fixed size and
then we send 0 as the argument
to key_size in tree_insert().
The allocated size used in tree_insert is:

> The rest of the TREE code doesn't seem to care about the size of the keys.
> When TREE code needs to compare a couple of keys, it will call
> TREE::compare
> callack. The callback only receives key pointers as arguments, that is, it
> is
> expected to infer key size from the contents of the key.
> The same happens when when walking the tree. Only "uchar *key" is passed
> to
> tree_walk_action callback. The walk function has to figure out they key
> size
> on its own.
> == Unique ==
> Unique class cannot do the same what TREE functions function do. It needs
> to be
> aware of the sizes of the values it is storing:
> - It will need to pass value size to tree_insert()
> - It will need to know value size to write it into tmp file.
> - It will need to read it back when merging.
> How does it know it?
> Unique::merge() and Unique::walk() with merge_walk() functions do make
> certain
> assumptions about the data format being used.
> == The review points ==
> Is this bad?
> I think tight coupling between the Unique object and the specifics of the
> data format it is storing is bad. What is worse is that currently this
> coupling
> is implicit, it is not immediately apparent to somebody just looking at the
> code.

One important thing to keep in mind here is when Unique is flushed to disk
then the code that does the merging of the different chunks of file and
returning the results shares the code with filesort merge procedure.
So I think either we move the code for Unique separately or we will have to
make our data format be similar to the one used by packed sort keys
for filesort.

Currently the format for packed sort-keys is like:


where keypart_i_length is optional and only stored for packable key parts

> Possible ways out:
> 1. Explicitly declare in Unique class that variable-sized records must use
>   certain data format.

> 2. Isolate the Unique from the specifics of the data format used.
> 2A. Let Unique get the size of the key as argument of unique_add() and then
>   keep it together with the key. This creates some RAM/storage overhead.
> 2B. Pass a read_size() callback function which would get the size from the
> data?
>  The way packed keys are used is that the length is added as 4 bytes to
the record
in the beginning and then a static function is used to read its length
whenever needed.
Also I moved the implementation for packed keys in a separate class so that
we can extend unique in the future too if needed.


> BR
>  Sergei
> --
> Sergei Petrunia, Software Developer
> MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp