← Back to team overview

maria-developers team mailing list archive

Re: 940d028521f: MDEV-30164 System variable for default collations

 

Hello Sergei,


Please review a new patch version:

https://github.com/MariaDB/server/commit/8d51c6d234b1730d4ff3b2c1fe7828eeca81998b


Comments go inline:

As we discussed on slack, my plan was not to change the hard-coded
default collations in the collation library.
So in the library it will always be utf8mb4_general_ci as a default
collation for utf8mb4. And the server default collations will be
handled by this new variable character_set_collations.
I did not mean this variable to be something fringe or temporary.

I see. Then collations set as default this way must be undistinguishable
in behavior from compiled-in defaults.
In particular, SHOW COLLATIONS, I_S.COLLATIONS, etc (if anything) should
show them as default.

Done.


And performance considerations become more important, I wasn't paying
much attention to that, as I thought it's a fringe feature.

Like, a linear search in Charset_collation_map_st could be replaced with
a binary. A HASH might be too much overhead, but a binary search in the
array could be worth it. As it's a search that every single user will be
using in the future, perhaps even in every single query. Or even many
times per query.

Done. It now uses binary search.



+  LEX_CSTRING character_set_collations_str;

better LEX_CUSTRING, don't you think? It's not even a _str.

Right, it's not _str. It's a binary data. Removed the suffix.

What are benefits of LEX_CUSTRING?

LEX_CSTRING is a string, char*.

LEX_CUSTRING is an array of bytes, uchar*. I think it's more appropriate
here.

I tried LEX_CUSTRING. It required a little bit more code than
the current version. I propose to keep it LEX_CSTRING.


diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
--- a/sql/log_event_server.cc
+++ b/sql/log_event_server.cc
@@ -1194,6 +1194,14 @@ bool Query_log_event::write()
       int2store(start+2, auto_increment_offset);
       start+= 4;
     }
+
+  if (thd && thd->variables.character_set_collations.count())
+  {
+    *start++= Q_COLLATIONS_SESSION;
+    size_t len= thd->variables.character_set_collations.to_binary((char*)start);
+    start+= len;
+  }

Perhaps, detect if it's needed? A cheap way of doing it would be to
extend your Elem_st with a query_id. And every time you
find_elem_by_charset, you set this elem's query_id to thd->query_id.
And here you write only elements with the current query id. If any.

As agreed on Slack, I did it in the following way:

The map is now written to binlog only if the statement
performed a "get default collation for a character set" resolution.

Most DML statements do not do such resolution, so this
approach is very good to reduce the number of
map writes. And it's very cheap and simple: a new flag in THD::used.

Additionally, as discussed on slack, I added automatic PS re-prepare
if @@character_set_collations has changed between PREPARE and EXECUTE.


Thanks.


References