← Back to team overview

maria-developers team mailing list archive

Re: dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner

 

Hello Sergei,

Thanks for your review.

I've pushed a new patch version to bb-10.7-bar-uuid.

Please see comments inline:


On 10/20/21 2:48 AM, Sergei Golubchik wrote:
  20000000-0000-0000-0000-000000000002
+10000000-0000-0000-0000-000000000003
  20000000-0000-0000-0000-000000000003

please, add a test for `ORDER BY CONCAT(a)` just to test the
lexicographical ordering of UUIDs. Or may be `ORDER BY HEX(a)` ?
Or `UNHEX` ? Whatever the recommended approach should be.

I propose to recommend using CAST(uuid AS BINARY(16)) for
lexicographical order. This CAST only performs byte reordering
(conversion from the in-record to in-memory format)

While CONCAT and HEX additionally involve BINARY(16) -> VARCHAR(32) conversion through UUID::to_string().

The former should be faster.


  DROP TABLE t1;
  #
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
index 42bb74d4b01..eb7a51895b9 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
@@ -25,7 +25,7 @@ a
  00000000-0000-0000-0000-0000000000ff
  EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
  id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	ref	a	a	17	const	2	Using where
+1	SIMPLE	t1	ref	a	a	17	const	4	Using where

why?

Hash collision now happens differently, this affects the optimizer estimation for certain values.


Note, UUID::hash_record() is NOT used by the HEAP engine,
as it might seem.

UUID::hash_record() is only used only by partitioning.
So partitioning distribution has not changed, because Field_fbt::hash() uses UUID::hash_record(), which hashes bytes in the "user visible" (in-memory) order. I made this for partition distribution compatibility with the BINARY(16) data type.

But HEAP distribution has changed, because HEAP still uses
my_charset_bin.hash_sort() directly on the entire 16 byte in-record
value, which is now different. We could preserve the old HEAP
distribution by introducing a special CHARSET_INFO based on
my_charset_bin, but overriding hash_sort().
But I think it's not really necessary.


I checked on a bigger table, and it looks fine.
For some constants it returns "2 rows".
For some constants it returns "4 rows".

So it did not change to worse. Just a coincidence.



diff --git a/plugin/type_uuid/sql_type_uuid.h b/plugin/type_uuid/sql_type_uuid.h
index 9076c874e98..be9fea8ebc9 100644
--- a/plugin/type_uuid/sql_type_uuid.h
+++ b/plugin/type_uuid/sql_type_uuid.h
@@ -24,8 +24,162 @@ class UUID: public FixedBinTypeStorage<MY_UUID_SIZE, MY_UUID_STRING_LENGTH>
...
+  // Compare two in-memory values
+  static int cmp(const LEX_CSTRING &a, const LEX_CSTRING &b)
+  {

this is way more LoC that I would've used.
but ok, it's your code and it's a plugin, so as you like
diff --git a/sql/sql_string.h b/sql/sql_string.h
index fe57c8153bb..795f80c3e08 100644
--- a/sql/sql_string.h
+++ b/sql/sql_string.h
@@ -484,6 +484,11 @@ class Binary_string: public Sql_alloc
      if (str.Alloced_length)
        Alloced_length= (uint32) (str.Alloced_length - offset);
    }
+  LEX_CSTRING to_lex_cstring() const
+  {
+    LEX_CSTRING tmp= {Ptr, str_length};
+    return tmp;
+  }
    inline LEX_CSTRING *get_value(LEX_CSTRING *res)

may be you could remove get_value()? In a separate commit.
the name is quite bad and the method makes rather little sense,
especially if there's to_lex_cstring() now.

I am all for it.

Hope Monty won't mind - he added get_value().


    {
      res->str=    Ptr;
diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
index c6e3d20bcfa..5141cb9fad4 100644
--- a/sql/sql_type_fixedbin.h
+++ b/sql/sql_type_fixedbin.h
@@ -154,18 +160,13 @@ class FixedBinTypeBundle
                                               FbtImpl::max_char_length()+1));
        return false;
      }
-    int cmp(const char *str, size_t length) const
-    {
-      DBUG_ASSERT(length == sizeof(m_buffer));
-      return memcmp(m_buffer, str, length);
-    }
      int cmp(const Binary_string &other) const

what is this one used for?

It's just a convenience wrapper.

It is used only in stored_field_cmp_to_item() for now.

+ static ulong KEY_pack_flags(uint column_nr)
+  {
+    /*
+      Return zero by default. A particular data type can override
+      this method return some flags, e.g. HA_PACK_KEY to enable
+      key prefix compression.

what if you change it later, will it make old tables corrupted?

I did not check.
This method is used only in mysql_prepare_create_table().
Then flags are stored inside FRM. In theory the change should not
make old tables corrupted.

Should I try with different engines?


Your question made me also think what should be the default.
Probably we should do it the other way around:
- enable compression by default
- disable compression in INET6
What do you think?



+    */
+    return 0;
+  }

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx



Follow ups

References