← Back to team overview

maria-developers team mailing list archive

Re: 49ecf935415: MDEV-27009 Add UCA-14.0.0 collations

 

Hi, Alexander,

Few comments/questions below.
Meanwhile I'm reviewing bb-10.9-bar-uca14

On May 26, Alexander Barkov wrote:
> 
> By the way, perhaps some of these statements should display
> short collation names:
> 
>    SHOW CREATE TABLE t1;
>    SHOW CREATE DATABASE db1;
>    SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
>    SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
>    SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
> 
> Can we discuss this?

Short names, I guess. First two - for readability, the last three - so
that one could join with INFORMATION_SCHEMA.COLLATIONS table.

> >> diff --git a/include/m_ctype.h b/include/m_ctype.h
> >> index 4c6628b72b3..706764ead2a 100644
> >> --- a/include/m_ctype.h
> >> +++ b/include/m_ctype.h
> >> @@ -240,6 +242,46 @@ typedef enum enum_repertoire_t
> >>   } my_repertoire_t;
> >>   
> >> +/* ID compatibility */
> >> +typedef enum enum_collation_id_type
> >> +{
> >> +  MY_COLLATION_ID_TYPE_PRECISE=          0,
> >> +  MY_COLLATION_ID_TYPE_COMPAT_100800=    1
> >> +} my_collation_id_type_t;
> >> +
> >> +/* Collation name display modes */
> >> +typedef enum enum_collation_name_mode
> >> +{
> >> +  MY_COLLATION_NAME_MODE_FULL=                                 0,
> >> +  MY_COLLATION_NAME_MODE_CONTEXT=                              1
> >> +} my_collation_name_mode_t;
> >> +
> >> +/* Level flags */
> >> +#define MY_CS_LEVEL_BIT_PRIMARY    0x00
> >> +#define MY_CS_LEVEL_BIT_SECONDARY  0x01
> >> +#define MY_CS_LEVEL_BIT_TERTIARY   0x02
> >> +#define MY_CS_LEVEL_BIT_QUATERNARY 0x03
> >> +
> >> +#define MY_CS_COLL_LEVELS_S1       (1<<MY_CS_LEVEL_BIT_PRIMARY)
> >> +
> >> +#define MY_CS_COLL_LEVELS_AI_CS    (1<<MY_CS_LEVEL_BIT_PRIMARY)| \
> >> +                                   (1<<MY_CS_LEVEL_BIT_TERTIARY)
> >> +
> >> +#define MY_CS_COLL_LEVELS_S2       (1<<MY_CS_LEVEL_BIT_PRIMARY)| \
> >> +                                   (1<<MY_CS_LEVEL_BIT_SECONDARY)
> >> +
> >> +#define MY_CS_COLL_LEVELS_S3       (1<<MY_CS_LEVEL_BIT_PRIMARY)| \
> >> +                                   (1<<MY_CS_LEVEL_BIT_SECONDARY) | \
> >> +                                   (1<<MY_CS_LEVEL_BIT_TERTIARY)
> > 
> > AI_CS and S3 don't seem to be used yet
> 
> Right, there are no old _AI_CS and _AS_CS (aka S3) collations.
> 
> New _AI_CS and _AS_CS collations definitions
> are initialized by this function:
> 
> my_uca1400_collation_definition_init(MY_CHARSET_LOADER *loader,
>                                       struct charset_info_st *dst,
>                                       uint id)
> 
> Level flags are calculated by this function from "id".
> 
> So there are no hard-coded definitions with
> MY_CS_COLL_LEVELS_AI_CS and MY_CS_COLL_LEVELS_S3 either.
> 
> Should I remove these definitions?

as you like, but if you want to keep them - add a comment. otherwise
anyone can remove them when they'll see those defines are unused
 
> >> +
> >> +#define MY_CS_COLL_LEVELS_S4       (1<<MY_CS_LEVEL_BIT_PRIMARY)| \
> >> +                                   (1<<MY_CS_LEVEL_BIT_SECONDARY) | \
> >> +                                   (1<<MY_CS_LEVEL_BIT_TERTIARY)  | \
> >> +                                   (1<<MY_CS_LEVEL_BIT_QUATERNARY)
> >> +
> >>   /* Flags for strxfrm */
> >>   #define MY_STRXFRM_LEVEL1          0x00000001 /* for primary weights   */
> >>   #define MY_STRXFRM_LEVEL2          0x00000002 /* for secondary weights */
> >> diff --git a/strings/ctype-simple.c b/strings/ctype-simple.c
> >> index b579f0af203..d09dfba86ed 100644
> >> --- a/strings/ctype-simple.c
> >> +++ b/strings/ctype-simple.c
> >> @@ -1940,13 +1941,26 @@ my_bool my_propagate_complex(CHARSET_INFO *cs __attribute__((unused)),
...
> > 
> > why do you still use the old concept of "strength"? Why not to use
> > bitmap consistently everywhere?
> 
> The collation definition file Index.xml is based on the LDML syntax.
> 
> It uses tags like this:
> 
> <settings strength="2"/>
> 
> This function is needed to handle these LDML tags.
> 
> Btw, to define user-defined _AI_CS collations we'll need
> to add an LDML extension eventually.

Hmm. Are you saying that LDML cannot describe a UCA 14.0.0 collation?

Or L1+L3 without L2 isn't standard?

> >> +}
> >> +
> >> diff --git a/strings/ctype-uca1400data.h b/strings/ctype-uca1400data.h
> >> --- /dev/null
> >> +++ b/strings/ctype-uca1400data.h
> >> @@ -0,0 +1,44151 @@
> >> +/*
> >> +  Generated from allkeys.txt version '14.0.0'
> >> +*/
> > 
> > if it's generated, do you need to check it in?
> > perhaps it should be generated during the build?
> > you've checked in allkeys1400.txt anyway.
> 
> Right, we can consider it.
> 
> Btw, I've checked it all versions:
> 
> $ ls mysql-test/std_data/unicode/
> 
> allkeys1400.txt
> allkeys400.txt
> allkeys520.txt
> 
> So we can generate sources for all three UCA versions
> from these files.
> 
> But I suggest we do it separately.
> Should I create an MDEV for this?

Why would you want to add to the history a file that is generated?
It'll be there forever.

You already have uca-dump.c in the tree, running it during the build is
a few lines in CMakeLists.txt, basically copy-paste, because we already
do it for gen_lex_hash and comp_err.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx


Follow ups

References