← Back to team overview

maria-developers team mailing list archive

Re: 940d028521f: MDEV-30164 System variable for default collations

 

Hi, Alexander,

I'm sorry it took a while.

I'm still thinking about the whole thing, it's a rather big change for
a really fringe functionality. But I failed to come up with something
better so far.

Code-wise the patch is mostly fine. See few small comments below, and
one slightly larger comment re. replication.

On Mar 07, Alexander Barkov wrote:
> revision-id: 940d028521f (mariadb-10.11.1-4-g940d028521f)
> parent(s): 71dedd0ebcd
> author: Alexander Barkov
> committer: Alexander Barkov
> timestamp: 2022-12-14 20:00:22 +0400
> message:
> 
> MDEV-30164 System variable for default collations
> 
> This patch adds a way to override default collations
> (or "character set collations") for desired character sets.
>
> diff --git a/sql/lex_charset.h b/sql/lex_charset.h
> --- a/sql/lex_charset.h
> +++ b/sql/lex_charset.h
> @@ -544,6 +699,20 @@ struct Lex_exact_charset_extended_collation_attrs_st
>    {
>      return m_ci;
>    }
> +  CHARSET_INFO *charset_info(const Charset_collation_map_st &map) const
> +  {
> +    switch (m_type)
> +    {
> +    case Lex_exact_charset_extended_collation_attrs_st::TYPE_CHARACTER_SET:
> +      return map.get_collation_for_charset(m_ci);
> +    case TYPE_EMPTY:

Lex_exact_charset_extended_collation_attrs_st::TYPE_EMPTY
(or all case labels without a struct name)

> +    case Lex_exact_charset_extended_collation_attrs_st::TYPE_CHARACTER_SET_COLLATE_EXACT:
> +    case Lex_exact_charset_extended_collation_attrs_st::TYPE_COLLATE_CONTEXTUALLY_TYPED:

COLLATE DEFAULT is TYPE_COLLATE_CONTEXTUALLY_TYPED.
shouldn't it use the map too?

> +    case Lex_exact_charset_extended_collation_attrs_st::TYPE_COLLATE_EXACT:
> +      break;
> +    }
> +    return m_ci;
> +  }
>    Type type() const
>    {
>      return m_type;
> diff --git a/sql/log_event.h b/sql/log_event.h
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -2147,6 +2153,8 @@ class Query_log_event: public Log_event
>    bool sql_mode_inited;
>    bool charset_inited;
>  
> +  LEX_CSTRING character_set_collations_str;

better LEX_CUSTRING, don't you think? It's not even a _str.

> +
>    uint32 flags2;
>    sql_mode_t sql_mode;
>    ulong auto_increment_increment, auto_increment_offset;
> diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
> --- a/sql/log_event_server.cc
> +++ b/sql/log_event_server.cc
> @@ -1194,6 +1194,14 @@ bool Query_log_event::write()
>      int2store(start+2, auto_increment_offset);
>      start+= 4;
>    }
> +
> +  if (thd && thd->variables.character_set_collations.count())
> +  {
> +    *start++= Q_COLLATIONS_SESSION;
> +    size_t len= thd->variables.character_set_collations.to_binary((char*)start);
> +    start+= len;
> +  }

Perhaps, detect if it's needed? A cheap way of doing it would be to extend
your Elem_st with a query_id. And every time you find_elem_by_charset,
you set this elem's query_id to thd->query_id. And here you write only
elements with the current query id. If any.

Another approach would be to have a bitmap, like

  uchar used_default_coll_mapping;

and in find_elem_by_charset() you set the bit, like

  used_default_coll_mapping |= 1 << i;

and then, again, print affected collations, if any. Most often
used_default_coll_mapping will likely be zero

one more question. In, say, 10.10->11.1 replication
master and slave will have different default collations, but
thd->variables.character_set_collations will not reflect that.
How do you plan to solve it?

> +
>    if (charset_inited)
>    {
>      *start++= Q_CHARSET_CODE;
 
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx


Follow ups