← Back to team overview

maria-developers team mailing list archive

Re: e7762b3a725: MDEV-8334: Rename utf8 to utf8mb3

 

Hi, Rucha!

Looks great!

Just one question below re. using global vs session old_behavior
variable.

On May 05, Rucha Deodhar wrote:
> revision-id: e7762b3a725 (mariadb-10.5.2-583-ge7762b3a725)
> parent(s): bfedf1eb4b6
> author: Rucha Deodhar <rucha.deodhar@xxxxxxxxxxx>
> committer: Rucha Deodhar <rucha.deodhar@xxxxxxxxxxx>
> timestamp: 2021-04-20 12:50:32 +0530
> message:
> 
> MDEV-8334: Rename utf8 to utf8mb3
> 
> This patch changes the main name of 3 byte character set from utf8 to
> utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
> so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.

> diff --git a/libmariadb b/libmariadb
> --- a/libmariadb
> +++ b/libmariadb
> @@ -1 +1 @@
> -Subproject commit b6f8883d9687936a50a7ed79bd9e5af2340efccd
> +Subproject commit 03d983b287f8a1fe855cb5ed479a3f7ab4f922ab

when rebasing and pushing into 10.6 take care not to
rollback C/C changes. That is, only update C/C submodule reference
if it's earlier than your 03d983b287f8a1fe855cb5ed479a3f7ab4f922ab

> Binary files a/mysql-test/suite/sys_vars/r/character_set_results_basic.result and b/mysql-test/suite/sys_vars/r/character_set_results_basic.result differ
> diff --git a/sql/item.cc b/sql/item.cc
> --- a/sql/item.cc
> +++ b/sql/item.cc
> @@ -2359,6 +2359,9 @@ left_is_superset(const DTCollation *left, const DTCollation *right)
>  
>  bool DTCollation::aggregate(const DTCollation &dt, uint flags)
>  {
> +
> +  myf utf8_flag= global_system_variables.old_behavior &
> +                 OLD_MODE_UTF8_IS_UTF8MB3 ? MY_UTF8_IS_UTF8MB3 : 0;

if old_behavior is a session variable, then you should use session
value here.

>    if (!my_charset_same(collation, dt.collation))
>    {
>      /* 
> diff --git a/sql/mysqld.cc b/sql/mysqld.cc
> --- a/sql/mysqld.cc
> +++ b/sql/mysqld.cc
> @@ -4099,6 +4098,8 @@ static int init_common_variables()
>      test purposes, to be able to start "mysqld" even if
>      the requested character set is not available (see bug#18743).
>    */
> +  myf utf8_flag= global_system_variables.old_behavior &
> +                 OLD_MODE_UTF8_IS_UTF8MB3 ? MY_UTF8_IS_UTF8MB3 : 0;

ok, here there's no "session" yet, so global value is correct

>    for (;;)
>    {
>      char *next_character_set_name= strchr(default_character_set_name, ',');
> diff --git a/sql/set_var.cc b/sql/set_var.cc
> --- a/sql/set_var.cc
> +++ b/sql/set_var.cc
> @@ -533,11 +533,12 @@ static my_old_conv old_conv[]=
>  CHARSET_INFO *get_old_charset_by_name(const char *name)
>  {
>    my_old_conv *conv;
> -
> +  myf utf8_flag= global_system_variables.old_behavior &
> +                 OLD_MODE_UTF8_IS_UTF8MB3 ? MY_UTF8_IS_UTF8MB3 : 0;

technically, you should use a session value here too.
but see the old_conv array, it doesn't have "utf8" anywhere,
so it doesn't matter what the flag is, you can use 0 there.

>    for (conv= old_conv; conv->old_name; conv++)
>    {
>      if (!my_strcasecmp(&my_charset_latin1, name, conv->old_name))
> -      return get_charset_by_csname(conv->new_name, MY_CS_PRIMARY, MYF(0));
> +      return get_charset_by_csname(conv->new_name, MY_CS_PRIMARY, MYF(utf8_flag));
>    }
>    return NULL;
>  }
> diff --git a/sql/sp.cc b/sql/sp.cc
> --- a/sql/sp.cc
> +++ b/sql/sp.cc
> @@ -291,7 +291,8 @@ bool load_charset(MEM_ROOT *mem_root,
>                    CHARSET_INFO **cs)
>  {
>    LEX_CSTRING cs_name;
> -
> +  myf utf8_flag= global_system_variables.old_behavior &
> +                 OLD_MODE_UTF8_IS_UTF8MB3 ? MY_UTF8_IS_UTF8MB3 : 0;

but this needs a session value, I suppose

>    if (field->val_str_nopad(mem_root, &cs_name))
>    {
>      *cs= dflt_cs;
> @@ -324,9 +325,10 @@ bool load_collation(MEM_ROOT *mem_root,
>      *cl= dflt_cl;
>      return TRUE;
>    }
> +  myf utf8_flag= thd->get_utf8_flag();

Hmm, here you do use a session value. So, I suppose your using global
value above was not an oversight, but you intentionally did it.

What were your reasons?

>  
>    DBUG_ASSERT(cl_name.str[cl_name.length] == 0);
> -  *cl= get_charset_by_name(cl_name.str, MYF(0));
> +  *cl= get_charset_by_name(cl_name.str, MYF(utf8_flag));
>  
>    if (*cl == NULL)
>    {
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -1051,14 +1052,16 @@ static inline void update_global_memory_status(int64 size)
>    @retval         NULL on error
>    @retval         Pointter to CHARSET_INFO with the given name on success
>  */
> -static inline CHARSET_INFO *
> -mysqld_collation_get_by_name(const char *name,
> +inline CHARSET_INFO *

static inline, please

> +mysqld_collation_get_by_name(const char *name, bool utf8_is_utf8mb3,

up to you, but wouldn't it be more convenient to
pass the flag here? Then you can invoke it with thd->utf8_flag()

>                               CHARSET_INFO *name_cs= system_charset_info)
>  {
>    CHARSET_INFO *cs;
>    MY_CHARSET_LOADER loader;
> +  myf utf8_flag= utf8_is_utf8mb3 ? MY_UTF8_IS_UTF8MB3 : 0;
>    my_charset_loader_init_mysys(&loader);
> -  if (!(cs= my_collation_get_by_name(&loader, name, MYF(0))))
> +
> +  if (!(cs= my_collation_get_by_name(&loader, name, MYF(utf8_flag))))
>    {
>      ErrConvString err(name, name_cs);
>      my_error(ER_UNKNOWN_COLLATION, MYF(0), err.ptr());
> diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
> --- a/sql/sql_parse.cc
> +++ b/sql/sql_parse.cc
> @@ -10449,7 +10449,10 @@ merge_charset_and_collation(CHARSET_INFO *cs, CHARSET_INFO *cl)
>  CHARSET_INFO *find_bin_collation(CHARSET_INFO *cs)
>  {
>    const char *csname= cs->csname;
> -  cs= get_charset_by_csname(csname, MY_CS_BINSORT, MYF(0));
> +  myf utf8_flag= global_system_variables.old_behavior &
> +                  OLD_MODE_UTF8_IS_UTF8MB3 ?
> +                  MY_UTF8_IS_UTF8MB3 : 0;

Why not a session value here?

> +  cs= get_charset_by_csname(csname, MY_CS_BINSORT, MYF(utf8_flag));
>    if (!cs)
>    {
>      char tmp[65];
 
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx