maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #07595
Re: Review request MDEV-6274 Collation usage statistics (for feedback plugin)
Hi, Alexander!
On Jul 24, Alexander Barkov wrote:
> On 07/03/2014 01:47 PM, Sergei Golubchik wrote:
> > Hi, Alexander!
> >
> > I don't particularly like the idea of the COLLATIONPROPERTY function,
> > The standard way of getting object metadata is to query information
> > schema. Introducing "property" function has long term implications. What
> > would be the way to get sysvar data?
> >
> > select VARIABLE_VALUE from I_S.GLOBAL_VARIABLES where VARIABLE_NAME=xxx
> >
> > or
> >
> > select VARIABLEPROPERTY('xxx', 'Value')
> >
> > and the help text?
> >
> > select VARIABLEPROPERTY('xxx', 'Help')
> >
> > Storage engine of a table:
> >
> > select TABLEPROPERTY('xxx', 'Engine')
> >
> > See - you're introducing a new way to access object metadata (and, may
> > be, data too - "property" is kind of ambigous). And what might look like
> > making sense now in 10.0 for collations, isn't necessarily reasonable
> > long term.
> >
> > As far as collations are concerned, I would rather see this data
> > in the information_schema. In the new table or in the I_S.COLLATIONS -
> > I don't know what's better, see also MDEV-6138.
>
> I'm thinking of adding many collation properties in the future
> (e.g. tailoring, case/accent/kana sensitivity, Unicode version,
> comparison strength, sort strength, various Unicode Collation Algorithm
> options).
>
> So I didn't want to flood the standard table
> INFORMATION_SCHEMA.COLLATIONS with all these properties.
Ah, yes, you're right. INFORMATION_SCHEMA.COLLATIONS is the standard
one. Indeed, it's better to add a new table then.
> It should be fine to add a new table. But I have some concerns:
>
> 1. I don't like mixing usage statistics and static properties into the
> same table.
>
> Should we add two new tables with about these structures:
>
> CREATE TABLE COLLATION_STATISTICS
> (
> ID BIGINT NOT NULL,
> USE_COUNT BIGINT NOT NULL
> );
>
> CREATE TABLE COLLATION_PROPERTIES
> (
> ID BIGINT NOT NULL,
> TAILORING TEXT CHARACTER SET UTF8MB4
> -- things like Unicode version, sensitivity to be added here later
> );
I'm fine with one table, actually.
> 2. To search by name, one will need a join by ID,
> e.g. using a NATURAL JOIN:
>
> SELECT TAILORING FROM
> INFORMATION_SCHEMA.COLLATIONS
> NATURAL JOIN
> INFORMATION_SCHEMA.COLLATION_PROPERTIES
> WHERE
> COLLATION_NAME='utf8_polish_ci';
>
> A little bit cumbersome. But should probably work.
Yes, ok.
> 3. Performance should be better with a function.
> With an I_S table, we'll have to populate the table every time
> with all rows, but in fact the caller in many cases may want
> only one row (e.g. searching by name or ID).
> But performance is rather not too much important here.
There's condition pushdown for I_S tables. It's only used for
table-related I_S tables at the moment, but there's no reason why it
cannot be used for collations, if this will become an issue.
> >> === modified file 'mysys/charset.c'
> >> --- mysys/charset.c 2013-11-08 22:20:07 +0000
> >> +++ mysys/charset.c 2014-05-29 05:20:11 +0000
...
> >> +my_bool my_collation_is_known_id(uint id)
> >> +{
> >> + return id > 0 && id < array_elements(all_charsets) && all_charsets[id] ?
> >> + TRUE : FALSE;
> >> +}
> >
> > Hmm. I see this function is mostly used in asserts, which is great.
> > But you also use it in COLLATIONPROPERTY function, and I'm not sure it's
> > correct. One can specify an id which is not loaded yet.
> > COLLATIONPROPERTY should not fail because of that.
>
> It should not fail.
...
> So a not loaded collation should not erroneously be treated as unknown,
> if you mean this.
Yes.
> >> +/*
> >> + Collation use statistics functions do not lock
> >> + counters to avoid mutex contention. This can lose
> >> + some counter increments with high thread concurrency.
> >> + But this should be Ok, as we don't need exact numbers.
> >> +*/
> >> +static inline void my_collation_statistics_inc_use_count(uint id)
> >> +{
> >> + DBUG_ASSERT(my_collation_is_known_id(id));
> >> + my_collation_statistics[id].use_count++;
> >> +}
> >> +
> >> +
> >> +ulonglong my_collation_statistics_get_use_count(uint id)
> >> +{
> >> + DBUG_ASSERT(my_collation_is_known_id(id));
> >> + return my_collation_statistics[id].use_count;
> >
> > Alternatively, you can define it to return 0 for unknown ids.
> > Or -1. Would be a bit easier to use in fill_collation_statistics()
>
> Not sure. I think my way is more future prove.
> First, you check if a collation is known and thus can be queried.
> Second, on success you query for its particular property.
sure, that's up to you.
that was just a suggestion, based on how you use this function
in fill_collation_statistics().
> >> + "Collation_usecount %s",
> >
> > Let's rather call it "Collation %s used", to be in line with plugin
> > rows in the feedback table. Having simply "%s used" would've been
> > even more consistent, but we don't want to have plugins and
> > collations in the same namespace. I wish I had plugins prefixed with
> > the "Plugin", but it's too late to change.
>
> Collation names have different lengths, so putting
> collation names as the last part of the value name
> would bring more readable results:
>
> +---------------------------------------+----+
> | Collation used utf8_unicode_ci | 10 |
> | Collation used utf8_bin | 20 |
> | Collation used utf8mb4_unicode_520_ci | 30 |
> +---------------------------------------+----+
Fine.
Readability is not the goal, the result is not really for a human
consumption - the script will read and parse it.
But I don't see why we need to make it intentionally unreadable :)
So, "Collation used %s" is good
Regards,
Sergei
References