maria-developers team mailing list archive

Thread
Date

Re: dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Thu, 21 Oct 2021 12:41:59 +0400
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <YW9LKO0oL0z/d0Ka@pweza>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

Hello Sergei,

Thanks for your review.

I've pushed a new patch version to bb-10.7-bar-uuid.

Please see comments inline:


On 10/20/21 2:48 AM, Sergei Golubchik wrote:

  20000000-0000-0000-0000-000000000002
+10000000-0000-0000-0000-000000000003
  20000000-0000-0000-0000-000000000003


please, add a test for `ORDER BY CONCAT(a)` just to test the
lexicographical ordering of UUIDs. Or may be `ORDER BY HEX(a)` ?
Or `UNHEX` ? Whatever the recommended approach should be.


I propose to recommend using CAST(uuid AS BINARY(16)) for
lexicographical order. This CAST only performs byte reordering
(conversion from the in-record to in-memory format)

While CONCAT and HEX additionally involve BINARY(16) -> VARCHAR(32)conversion through UUID::to_string().


The former should be faster.

  DROP TABLE t1;
  #
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
index 42bb74d4b01..eb7a51895b9 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
@@ -25,7 +25,7 @@ a
  00000000-0000-0000-0000-0000000000ff
  EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
  id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	ref	a	a	17	const	2	Using where
+1	SIMPLE	t1	ref	a	a	17	const	4	Using where


why?

Hash collision now happens differently, this affects the optimizerestimation for certain values.



Note, UUID::hash_record() is NOT used by the HEAP engine,
as it might seem.

UUID::hash_record() is only used only by partitioning.

So partitioning distribution has not changed, because Field_fbt::hash()uses UUID::hash_record(), which hashes bytes in the "user visible"(in-memory) order. I made this for partition distribution compatibilitywith the BINARY(16) data type.


But HEAP distribution has changed, because HEAP still uses
my_charset_bin.hash_sort() directly on the entire 16 byte in-record
value, which is now different. We could preserve the old HEAP
distribution by introducing a special CHARSET_INFO based on
my_charset_bin, but overriding hash_sort().
But I think it's not really necessary.


I checked on a bigger table, and it looks fine.
For some constants it returns "2 rows".
For some constants it returns "4 rows".

So it did not change to worse. Just a coincidence.

diff --git a/plugin/type_uuid/sql_type_uuid.h b/plugin/type_uuid/sql_type_uuid.h
index 9076c874e98..be9fea8ebc9 100644
--- a/plugin/type_uuid/sql_type_uuid.h
+++ b/plugin/type_uuid/sql_type_uuid.h
@@ -24,8 +24,162 @@ class UUID: public FixedBinTypeStorage<MY_UUID_SIZE, MY_UUID_STRING_LENGTH>

...

+  // Compare two in-memory values
+  static int cmp(const LEX_CSTRING &a, const LEX_CSTRING &b)
+  {


this is way more LoC that I would've used.
but ok, it's your code and it's a plugin, so as you like

diff --git a/sql/sql_string.h b/sql/sql_string.h
index fe57c8153bb..795f80c3e08 100644
--- a/sql/sql_string.h
+++ b/sql/sql_string.h
@@ -484,6 +484,11 @@ class Binary_string: public Sql_alloc
      if (str.Alloced_length)
        Alloced_length= (uint32) (str.Alloced_length - offset);
    }
+  LEX_CSTRING to_lex_cstring() const
+  {
+    LEX_CSTRING tmp= {Ptr, str_length};
+    return tmp;
+  }
    inline LEX_CSTRING *get_value(LEX_CSTRING *res)


may be you could remove get_value()? In a separate commit.
the name is quite bad and the method makes rather little sense,
especially if there's to_lex_cstring() now.


I am all for it.

Hope Monty won't mind - he added get_value().

    {
      res->str=    Ptr;
diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
index c6e3d20bcfa..5141cb9fad4 100644
--- a/sql/sql_type_fixedbin.h
+++ b/sql/sql_type_fixedbin.h
@@ -154,18 +160,13 @@ class FixedBinTypeBundle
                                               FbtImpl::max_char_length()+1));
        return false;
      }
-    int cmp(const char *str, size_t length) const
-    {
-      DBUG_ASSERT(length == sizeof(m_buffer));
-      return memcmp(m_buffer, str, length);
-    }
      int cmp(const Binary_string &other) const


what is this one used for?


It's just a convenience wrapper.

It is used only in stored_field_cmp_to_item() for now.

+ static ulong KEY_pack_flags(uint column_nr)

+  {
+    /*
+      Return zero by default. A particular data type can override
+      this method return some flags, e.g. HA_PACK_KEY to enable
+      key prefix compression.


what if you change it later, will it make old tables corrupted?


I did not check.
This method is used only in mysql_prepare_create_table().
Then flags are stored inside FRM. In theory the change should not
make old tables corrupted.

Should I try with different engines?


Your question made me also think what should be the default.
Probably we should do it the other way around:
- enable compression by default
- disable compression in INET6
What do you think?

+    */
+    return 0;
+  }


Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx

Follow ups

Re: dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner
From: Sergei Golubchik, 2021-10-21

References

Re: dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner
From: Sergei Golubchik, 2021-10-19