← Back to team overview

maria-developers team mailing list archive

Re: 026e52bad84: MDEV-27743 Remove Lex::charset

 

Hi, Alexander,

On Mar 21, Alexander Barkov wrote:
> >> +
> >> +DELETE FROM results
> >> +WHERE cs=''
> >> +  AND cl0='BINARY'
> >> +  AND cl1=''
> >> +  AND cl2=''
> >> +  AND tcs LIKE 'CHARACTER SET%'
> >> +  AND result NOT LIKE 'ERROR%'
> >> +  AND result RLIKE CONCAT('COLLATE ', tcs_character_set_name, '_bin');
> >> +SELECT ROW_COUNT();
> 
> I suggest too keep DELETEs.

sure, that's up to you

> >> diff --git a/sql/lex_charset.h b/sql/lex_charset.h
> >> new file mode 100644
> >> index 00000000000..d95f0bc1920
> >> --- /dev/null
> >> +++ b/sql/lex_charset.h
> >> @@ -0,0 +1,202 @@
> > ...
> >> +#define LEX_CHARSET_COLLATION_TYPE_BITS 2
> >> +  static_assert(((1<<LEX_CHARSET_COLLATION_TYPE_BITS)-1) >=
> >> +                 TYPE_COLLATE_CONTEXTUALLY_TYPED,
> >> +                 "Lex_charset_collation_st::Type bits check");
> >> +
> >> +protected:
> >> +  CHARSET_INFO *m_ci;
> >> +  Type m_type;
> >> +public:
> >> +  static CHARSET_INFO *find_bin_collation(CHARSET_INFO *cs);
> >> +  static CHARSET_INFO *find_default_collation(CHARSET_INFO *cs);
> >> +public:
> > ...
> >> +  void set_contextually_typed_binary_style()
> >> +  {
> >> +    m_ci= &my_collation_contextually_typed_binary;
> >> +    m_type= TYPE_COLLATE_CONTEXTUALLY_TYPED;
> >> +  }
> >> +  bool is_contextually_typed_collate_default() const
> >> +  {
> >> +    return m_ci == &my_collation_contextually_typed_default &&
> >> +           m_type == TYPE_COLLATE_CONTEXTUALLY_TYPED;
> > 
> > what does it mean when only one of these two expressions is true?
> > Say, the first true, the second false?
> > Or the first false, the second true?
> 
> As you know, in the final patch for "UCA-14.0.0 collations",
> UCA-14.0.0 context collations use a pointer to their utf8mb4
> CHARSET_INFO versions.
> 
> For example, uca1400_as_ci uses a pointer to utf8mb4_uca1400_as_ci,
> with m_type==TYPE_COLLATE_CONTEXTUALLY_TYPED indicating that
> it the charset part of the collation is not really known yet and is a
> subject to further resolution.

I see. So it'll be possible that

  m_type == TYPE_COLLATE_CONTEXTUALLY_TYPED &&
  m_ci != &my_collation_contextually_typed_default

But what about the opposite situation?
If m_ci == &my_collation_contextually_typed_default, then
m_type must be TYPE_COLLATE_CONTEXTUALLY_TYPED, correct?
I mean, it looks that this method will always return the same as yours:

  bool is_contextually_typed_collate_default() const
  {
    return m_ci == &my_collation_contextually_typed_default;
  }

as far as I understand.

> >> +  }
> >> +  bool is_contextually_typed_binary_style() const
> >> +  {
> >> +    return m_ci == &my_collation_contextually_typed_binary &&
> >> +           m_type == TYPE_COLLATE_CONTEXTUALLY_TYPED;

same here

> >> +  }
> > ...
> >> diff --git a/strings/ctype-bin.c b/strings/ctype-bin.c
> >> index bb746ad90b0..a5c9129ebea 100644
> >> --- a/strings/ctype-bin.c
> >> +++ b/strings/ctype-bin.c
> >> @@ -630,3 +630,69 @@ struct charset_info_st my_charset_bin =
> >>       &my_charset_handler,
> >>       &my_collation_binary_handler
> >>   };
> >> +
> >> +
> >> +struct charset_info_st my_collation_contextually_typed_binary=
> >> +{
> >> +    0,0,0,                        /* number        */
> > 
> > I thought you will never need to resove the
> > &my_collation_contextually_typed_binary pointer, that is, I thought the
> > values in the my_collation_contextually_typed_binary struct should not
> > matter as they'll never be used. Was that wrong?
> 
> Correct, I will never need to resolve.
> I just thought that using an initialized memory over a non-initialized 
> memory is better.
> 
> So I'd like to avoid non-initialized variables like this:
> 
> struct charset_info_st my_collation_contextually_typed_binary;
> 
> I agree that full detailed initializing, similar to my_charset_bin,
> is not really needed for these two contextual variables.
> So something like this should be enough:
> 
> struct charset_info_st my_collation_contextually_typed_binary= {0};
> 
> Should I fix this way?

I've just realized that it'd be even better to have them as pointers,
like

  struct charset_info_st *my_collation_contextually_typed_binary=
    (struct charset_info_st *)1;

  struct charset_info_st *my_collation_contextually_typed_default=
    (struct charset_info_st *)2;

you'll have unique values to compare with, but any attempt to resolve
them will crash, as some kind of a built-in assert for an unresolvable
pointer.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx


References