maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12947
Re: dfa33fb309d: MDEV-26664 Store UUIDs in a more efficient manner
Hello Sergei,
Thanks for your review.
I've pushed a new patch version to bb-10.7-bar-uuid.
Please see comments inline:
On 10/20/21 2:48 AM, Sergei Golubchik wrote:
20000000-0000-0000-0000-000000000002
+10000000-0000-0000-0000-000000000003
20000000-0000-0000-0000-000000000003
please, add a test for `ORDER BY CONCAT(a)` just to test the
lexicographical ordering of UUIDs. Or may be `ORDER BY HEX(a)` ?
Or `UNHEX` ? Whatever the recommended approach should be.
I propose to recommend using CAST(uuid AS BINARY(16)) for
lexicographical order. This CAST only performs byte reordering
(conversion from the in-record to in-memory format)
While CONCAT and HEX additionally involve BINARY(16) -> VARCHAR(32)
conversion through UUID::to_string().
The former should be faster.
DROP TABLE t1;
#
diff --git a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
index 42bb74d4b01..eb7a51895b9 100644
--- a/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
+++ b/plugin/type_uuid/mysql-test/type_uuid/type_uuid_memory.result
@@ -25,7 +25,7 @@ a
00000000-0000-0000-0000-0000000000ff
EXPLAIN SELECT * FROM t1 WHERE a='00000000-0000-0000-0000-0000000000ff';
id select_type table type possible_keys key key_len ref rows Extra
-1 SIMPLE t1 ref a a 17 const 2 Using where
+1 SIMPLE t1 ref a a 17 const 4 Using where
why?
Hash collision now happens differently, this affects the optimizer
estimation for certain values.
Note, UUID::hash_record() is NOT used by the HEAP engine,
as it might seem.
UUID::hash_record() is only used only by partitioning.
So partitioning distribution has not changed, because Field_fbt::hash()
uses UUID::hash_record(), which hashes bytes in the "user visible"
(in-memory) order. I made this for partition distribution compatibility
with the BINARY(16) data type.
But HEAP distribution has changed, because HEAP still uses
my_charset_bin.hash_sort() directly on the entire 16 byte in-record
value, which is now different. We could preserve the old HEAP
distribution by introducing a special CHARSET_INFO based on
my_charset_bin, but overriding hash_sort().
But I think it's not really necessary.
I checked on a bigger table, and it looks fine.
For some constants it returns "2 rows".
For some constants it returns "4 rows".
So it did not change to worse. Just a coincidence.
diff --git a/plugin/type_uuid/sql_type_uuid.h b/plugin/type_uuid/sql_type_uuid.h
index 9076c874e98..be9fea8ebc9 100644
--- a/plugin/type_uuid/sql_type_uuid.h
+++ b/plugin/type_uuid/sql_type_uuid.h
@@ -24,8 +24,162 @@ class UUID: public FixedBinTypeStorage<MY_UUID_SIZE, MY_UUID_STRING_LENGTH>
...
+ // Compare two in-memory values
+ static int cmp(const LEX_CSTRING &a, const LEX_CSTRING &b)
+ {
this is way more LoC that I would've used.
but ok, it's your code and it's a plugin, so as you like
diff --git a/sql/sql_string.h b/sql/sql_string.h
index fe57c8153bb..795f80c3e08 100644
--- a/sql/sql_string.h
+++ b/sql/sql_string.h
@@ -484,6 +484,11 @@ class Binary_string: public Sql_alloc
if (str.Alloced_length)
Alloced_length= (uint32) (str.Alloced_length - offset);
}
+ LEX_CSTRING to_lex_cstring() const
+ {
+ LEX_CSTRING tmp= {Ptr, str_length};
+ return tmp;
+ }
inline LEX_CSTRING *get_value(LEX_CSTRING *res)
may be you could remove get_value()? In a separate commit.
the name is quite bad and the method makes rather little sense,
especially if there's to_lex_cstring() now.
I am all for it.
Hope Monty won't mind - he added get_value().
{
res->str= Ptr;
diff --git a/sql/sql_type_fixedbin.h b/sql/sql_type_fixedbin.h
index c6e3d20bcfa..5141cb9fad4 100644
--- a/sql/sql_type_fixedbin.h
+++ b/sql/sql_type_fixedbin.h
@@ -154,18 +160,13 @@ class FixedBinTypeBundle
FbtImpl::max_char_length()+1));
return false;
}
- int cmp(const char *str, size_t length) const
- {
- DBUG_ASSERT(length == sizeof(m_buffer));
- return memcmp(m_buffer, str, length);
- }
int cmp(const Binary_string &other) const
what is this one used for?
It's just a convenience wrapper.
It is used only in stored_field_cmp_to_item() for now.
+ static ulong KEY_pack_flags(uint column_nr)
+ {
+ /*
+ Return zero by default. A particular data type can override
+ this method return some flags, e.g. HA_PACK_KEY to enable
+ key prefix compression.
what if you change it later, will it make old tables corrupted?
I did not check.
This method is used only in mysql_prepare_create_table().
Then flags are stored inside FRM. In theory the change should not
make old tables corrupted.
Should I try with different engines?
Your question made me also think what should be the default.
Probably we should do it the other way around:
- enable compression by default
- disable compression in INET6
What do you think?
+ */
+ return 0;
+ }
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx
Follow ups
References