maria-developers team mailing list archive

Thread
Date
Re: dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner

To: Alexander Barkov <bar@xxxxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxx>
Date: Wed, 20 Oct 2021 00:48:08 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <Ev7yDCBGxZK.A.v0G.1HsbhB@pweza>
Hi, Alexander!

On Oct 19, Alexander Barkov wrote:
> revision-id: dfa33fb309d (mariadb-10.6.1-84-gdfa33fb309d)
> parent(s): 5cfb31854a4
> author: Alexander Barkov
> committer: Alexander Barkov
> timestamp: 2021-10-19 13:27:19 +0400
> message:
> 
> MDEV-26664 Store UUIDs in a more efficient manner
> 
> UUID values
> 
>   llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn
> 
> are now stored as
> 
>   nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll
> 
> inside the record:
> 
> - the groups (segments separated by dash) are reordered right-to-left.
> - the bytes inside the groups are not reordered (stored as before,
>   in big-endian format).
> 
> This provides a better sorting order: the earlier UUID was generated,
> the higher it appears in the ORDER BY output.
> 
> Also, this change enables a good key prefix compression,
> because the constant part is now in the beginning, while
> the non-constant part (the timestamp) is in the end.
> 
> diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result
> index 6a08ffe8c23..6e2e1cc87ef 100644
> --- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result
> +++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid.result
> @@ -596,10 +596,10 @@ INSERT INTO t1 VALUES (CAST(CONCAT('2','0000000-0000-0000-0000-000000000003') AS
>  SELECT * FROM t1 ORDER BY a;
>  a
>  10000000-0000-0000-0000-000000000001
> -10000000-0000-0000-0000-000000000002
> -10000000-0000-0000-0000-000000000003
>  20000000-0000-0000-0000-000000000001
> +10000000-0000-0000-0000-000000000002
>  20000000-0000-0000-0000-000000000002
> +10000000-0000-0000-0000-000000000003
>  20000000-0000-0000-0000-000000000003

please, add a test for `ORDER BY CONCAT(a)` just to test the
lexicographical ordering of UUIDs. Or may be `ORDER BY HEX(a)` ?
Or `UNHEX` ? Whatever the recommended approach should be.

>  DROP TABLE t1;
>  #
> diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
> index 42bb74d4b01..eb7a51895b9 100644
> --- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
> +++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
> @@ -25,7 +25,7 @@ a
>  00000000-0000-0000-0000-0000000000ff
>  EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
>  id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
> -1	SIMPLE	t1	ref	a	a	17	const	2	Using where
> +1	SIMPLE	t1	ref	a	a	17	const	4	Using where

why?

> diff --git a/plugin/type_uuid/sql_type_uuid.h b/plugin/type_uuid/sql_type_uuid.h
> index 9076c874e98..be9fea8ebc9 100644
> --- a/plugin/type_uuid/sql_type_uuid.h
> +++ b/plugin/type_uuid/sql_type_uuid.h
> @@ -24,8 +24,162 @@ class UUID: public FixedBinTypeStorage<MY_UUID_SIZE, MY_UUID_STRING_LENGTH>
...
> +  // Compare two in-memory values
> +  static int cmp(const LEX_CSTRING &a, const LEX_CSTRING &b)
> +  {

this is way more LoC that I would've used.
but ok, it's your code and it's a plugin, so as you like
  
> diff --git a/sql/sql_string.h b/sql/sql_string.h
> index fe57c8153bb..795f80c3e08 100644
> --- a/sql/sql_string.h
> +++ b/sql/sql_string.h
> @@ -484,6 +484,11 @@ class Binary_string: public Sql_alloc
>      if (str.Alloced_length)
>        Alloced_length= (uint32) (str.Alloced_length - offset);
>    }
> +  LEX_CSTRING to_lex_cstring() const
> +  {
> +    LEX_CSTRING tmp= {Ptr, str_length};
> +    return tmp;
> +  }
>    inline LEX_CSTRING *get_value(LEX_CSTRING *res)

may be you could remove get_value()? In a separate commit.
the name is quite bad and the method makes rather little sense,
especially if there's to_lex_cstring() now.

>    {
>      res->str=    Ptr;
> diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
> index c6e3d20bcfa..5141cb9fad4 100644
> --- a/sql/sql_type_fixedbin.h
> +++ b/sql/sql_type_fixedbin.h
> @@ -154,18 +160,13 @@ class FixedBinTypeBundle
>                                               FbtImpl::max_char_length()+1));
>        return false;
>      }
> -    int cmp(const char *str, size_t length) const
> -    {
> -      DBUG_ASSERT(length == sizeof(m_buffer));
> -      return memcmp(m_buffer, str, length);
> -    }
>      int cmp(const Binary_string &other) const

what is this one used for?

>      {
> -      return cmp(other.ptr(), other.length());
> +      return FbtImpl::cmp(FbtImpl::to_lex_cstring(), other.to_lex_cstring());
>      }
>      int cmp(const Fbt &other) const
>      {
> -      return memcmp(m_buffer, other.m_buffer, sizeof(m_buffer));
> +      return FbtImpl::cmp(FbtImpl::to_lex_cstring(), other.to_lex_cstring());
>      }
>    };
>  
> diff --git a/sql/sql_type_fixedbin_storage.h b/sql/sql_type_fixedbin_storage.h
> index 0e8ac81ab59..6e18335bd4c 100644
> --- a/sql/sql_type_fixedbin_storage.h
> +++ b/sql/sql_type_fixedbin_storage.h
> @@ -21,6 +21,38 @@
>    format and their own (variable size) canonical string representation.
>  
>    Examples are INET6 and UUID types.
> +
> +  The MariaDB server uses three binary representations of a data type:
> +
> +  1. In-memory binary representation (user visible)
> +    This representation:
> +    - can be used in INSERT..VALUES (X'AABBCC')
> +    - can be used in WHERE conditions: WHERE c1=X'AABBCC'
> +    - is returned by CAST(x AS BINARY(N))
> +    - is returned by Field::val_native() and Item::val_native()
> +
> +  2. In-record binary representation (user invisible)
> +    This representation:
> +    - is used in records (is pointed by Field::ptr)
> +    - must be comparable by memcmp()
> +
> +  3. Binlog binary (row) representation
> +    Usually, for string data types the binlog representation
> +    is based on the in-record representation with trailing byte compression:
> +    - trailing space compression for text string data types
> +    - trailing zero compression for binary string data types
> +
> +  We have to have separate in-memory and in-record representations
> +  because we use HA_KEYTYPE_BINARY for indexing. The engine API
> +  does not have a way to pass a comparison function as a parameter.
> +
> +  The default implementation below assumes that:
> +  - the in-memory and in-record representations are equal
> +  - the binlog representation is compatible with BINARY(N)
> +  This is OK for simple data types, like INET6.
> +
> +  Data type implementations that need different representations
> +  can override the default implementation (like e.g. UUID does).
>  */

thanks, this is very good

>  
>  /***********************************************************************/
> @@ -64,5 +137,37 @@ class FixedBinTypeStorage
>      return true;
>    }
>  
> +  static ulong KEY_pack_flags(uint column_nr)
> +  {
> +    /*
> +      Return zero by default. A particular data type can override
> +      this method return some flags, e.g. HA_PACK_KEY to enable
> +      key prefix compression.

what if you change it later, will it make old tables corrupted?

> +    */
> +    return 0;
> +  }

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx
Follow ups

Re: dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner
From: Alexander Barkov, 2021-10-21