maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #13305
Re: 940d028521f: MDEV-30164 System variable for default collations
Hello Sergei,
Please review a new patch version:
https://github.com/MariaDB/server/commit/8d51c6d234b1730d4ff3b2c1fe7828eeca81998b
Comments go inline:
As we discussed on slack, my plan was not to change the hard-coded
default collations in the collation library.
So in the library it will always be utf8mb4_general_ci as a default
collation for utf8mb4. And the server default collations will be
handled by this new variable character_set_collations.
I did not mean this variable to be something fringe or temporary.
I see. Then collations set as default this way must be undistinguishable
in behavior from compiled-in defaults.
In particular, SHOW COLLATIONS, I_S.COLLATIONS, etc (if anything) should
show them as default.
Done.
And performance considerations become more important, I wasn't paying
much attention to that, as I thought it's a fringe feature.
Like, a linear search in Charset_collation_map_st could be replaced with
a binary. A HASH might be too much overhead, but a binary search in the
array could be worth it. As it's a search that every single user will be
using in the future, perhaps even in every single query. Or even many
times per query.
Done. It now uses binary search.
+ LEX_CSTRING character_set_collations_str;
better LEX_CUSTRING, don't you think? It's not even a _str.
Right, it's not _str. It's a binary data. Removed the suffix.
What are benefits of LEX_CUSTRING?
LEX_CSTRING is a string, char*.
LEX_CUSTRING is an array of bytes, uchar*. I think it's more appropriate
here.
I tried LEX_CUSTRING. It required a little bit more code than
the current version. I propose to keep it LEX_CSTRING.
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
--- a/sql/log_event_server.cc
+++ b/sql/log_event_server.cc
@@ -1194,6 +1194,14 @@ bool Query_log_event::write()
int2store(start+2, auto_increment_offset);
start+= 4;
}
+
+ if (thd && thd->variables.character_set_collations.count())
+ {
+ *start++= Q_COLLATIONS_SESSION;
+ size_t len= thd->variables.character_set_collations.to_binary((char*)start);
+ start+= len;
+ }
Perhaps, detect if it's needed? A cheap way of doing it would be to
extend your Elem_st with a query_id. And every time you
find_elem_by_charset, you set this elem's query_id to thd->query_id.
And here you write only elements with the current query id. If any.
As agreed on Slack, I did it in the following way:
The map is now written to binlog only if the statement
performed a "get default collation for a character set" resolution.
Most DML statements do not do such resolution, so this
approach is very good to reduce the number of
map writes. And it's very cheap and simple: a new flag in THD::used.
Additionally, as discussed on slack, I added automatic PS re-prepare
if @@character_set_collations has changed between PREPARE and EXECUTE.
Thanks.
References