← Back to team overview

maria-developers team mailing list archive

Re: Review request MDEV-6274 Collation usage statistics (for feedback plugin)

 

Hi, Alexander!

I don't particularly like the idea of the COLLATIONPROPERTY function,
The standard way of getting object metadata is to query information
schema. Introducing "property" function has long term implications. What
would be the way to get sysvar data?

  select VARIABLE_VALUE from I_S.GLOBAL_VARIABLES where VARIABLE_NAME=xxx

or

  select VARIABLEPROPERTY('xxx', 'Value')

and the help text?

  select VARIABLEPROPERTY('xxx', 'Help')

Storage engine of a table:

  select TABLEPROPERTY('xxx', 'Engine')

See - you're introducing a new way to access object metadata (and, may
be, data too - "property" is kind of ambigous). And what might look like
making sense now in 10.0 for collations, isn't necessarily reasonable
long term.

As far as collations are concerned, I would rather see this data
in the information_schema. In the new table or in the I_S.COLLATIONS -
I don't know what's better, see also MDEV-6138.

> === modified file 'include/my_sys.h'
> --- include/my_sys.h	2014-04-15 07:29:57 +0000
> +++ include/my_sys.h	2014-05-28 14:01:40 +0000
> @@ -242,6 +242,11 @@ extern MYSQL_PLUGIN_IMPORT CHARSET_INFO
>  extern MYSQL_PLUGIN_IMPORT CHARSET_INFO *all_charsets[MY_ALL_CHARSETS_SIZE];
>  extern struct charset_info_st compiled_charsets[];
>  
> +/* Collation properties and use statistics */
> +extern my_bool my_collation_is_known_id(uint id);
> +extern ulonglong my_collation_statistics_get_use_count(uint id);
> +extern const char *my_collation_get_tailoring(uint id);
> +
>  /* statistics */
>  extern ulong	my_file_opened,my_stream_opened, my_tmp_file_created;
>  extern ulong    my_file_total_opened;
> 
> === modified file 'mysys/charset.c'
> --- mysys/charset.c	2013-11-08 22:20:07 +0000
> +++ mysys/charset.c	2014-05-29 05:20:11 +0000
> @@ -483,6 +483,50 @@ void add_compiled_collation(struct chars
>  static my_pthread_once_t charsets_initialized= MY_PTHREAD_ONCE_INIT;
>  static my_pthread_once_t charsets_template= MY_PTHREAD_ONCE_INIT;
>  
> +typedef struct
> +{
> +  ulonglong use_count;
> +} MY_COLLATION_STATISTICS;
> +
> +
> +static MY_COLLATION_STATISTICS my_collation_statistics[MY_ALL_CHARSETS_SIZE];
> +
> +
> +my_bool my_collation_is_known_id(uint id)
> +{
> +  return id > 0 && id < array_elements(all_charsets) && all_charsets[id] ?
> +         TRUE : FALSE;
> +}

Hmm. I see this function is mostly used in asserts, which is great.
But you also use it in COLLATIONPROPERTY function, and I'm not sure it's
correct. One can specify an id which is not loaded yet.
COLLATIONPROPERTY should not fail because of that.

I believe this function should be strictly for asserts, perhaps, even, as

  void assert_collation_id_is_known(uint id)
  {
    DBUG_ASSERT(id > 0 && id < array_elements(all_charsets) && all_charsets[id]);
  }

> +/*
> +  Collation use statistics functions do not lock
> +  counters to avoid mutex contention. This can lose
> +  some counter increments with high thread concurrency.
> +  But this should be Ok, as we don't need exact numbers.
> +*/
> +static inline void my_collation_statistics_inc_use_count(uint id)
> +{
> +  DBUG_ASSERT(my_collation_is_known_id(id));
> +  my_collation_statistics[id].use_count++;
> +}
> +
> +
> +ulonglong my_collation_statistics_get_use_count(uint id)
> +{
> +  DBUG_ASSERT(my_collation_is_known_id(id));
> +  return my_collation_statistics[id].use_count;

Alternatively, you can define it to return 0 for unknown ids.
Or -1. Would be a bit easier to use in fill_collation_statistics()

> +}
> +
> +
> === modified file 'plugin/feedback/utils.cc'
> --- plugin/feedback/utils.cc	2013-11-20 11:05:39 +0000
> +++ plugin/feedback/utils.cc	2014-05-28 10:34:11 +0000
> @@ -383,6 +383,25 @@ int fill_misc_data(THD *thd, TABLE_LIST
>    return 0;
>  }
>  
> +int fill_collation_statistics(THD *thd, TABLE_LIST *tables)
> +{
> +  TABLE *table= tables->table;
> +  for (uint id= 1; id < MY_ALL_CHARSETS_SIZE; id++)
> +  {
> +    ulonglong count;
> +    if (my_collation_is_known_id(id) &&
> +        (count= my_collation_statistics_get_use_count(id)))
> +    {
> +      char name[MY_CS_NAME_SIZE + 32];
> +      size_t namelen= my_snprintf(name, sizeof(name),
> +                                 "Collation_usecount %s",

Let's rather call it "Collation %s used", to be in line with plugin rows
in the feedback table. Having simply "%s used" would've been even more
consistent, but we don't want to have plugins and collations in the same
namespace. I wish I had plugins prefixed with the "Plugin", but it's too
late to change.

> +                                 get_charset_name(id));
> +      INSERT2(name, namelen, (count, UNSIGNED));
> +    }
> +  }
> +  return 0;
> +};

Nothing to comment about new added function - the code is
straightforward. Two only issues to discuss about it - whether we want
it at all. And, if yes, whether arguments should be strings or, perhaps,
the second should be a keyword. Or, may be, the first one too.

Regards,
Sergei


Follow ups

References