← Back to team overview

maria-developers team mailing list archive

Re: Support of the extended syntax for ANALYZE

 

Hi, Igor!

Here're my replies.
I've seen Monty's replies, and in many (not all) cases my reply will be
similar

On May 19, Igor Babaev wrote:
> On 05/19/2012 10:51 AM, Sergei Golubchik wrote:

>> ANALYZE TABLE t1 PERSISTENT FOR COLUMNS (c,e,d) INDEXES (idx2, idx4);
> 
> The global server option --stat-tables says now whether the statistics
> tables have to be taken into account (without this option the server
> ignores them altogether)
> 
> There is a system variable (global/session) that controls how the
> statistics table are used by queries/statements. The variable is called
> optimizer_use_stat_tables.

My first thought was - why do you need two variables, instead of one?
I see that Monty wondered about it too.

> There are 4 possible values for this variable:
> - 'never'
> - 'complementary'
> - 'preferably'
> - 'exclusively'.

Unlike Monty, I have no opinion about 'exclusively' mode. I'm fine with
it being present. I'm fine with it being absent too.

> Q1. Should we issue a warning with the attempt to reset this value
> when --stat-tables is disabled?

Agree with Monty. Having one variable solves this question.

> The variable setting is also affects how ANALYZE works.
> The current syntax of the ANALYZE command is
> 
> ANALYZE TABLE t1, ...;
> 
> An currently if @@optimizer_use_stat_tables is set to
> 1. 'never'
>    then it works as before (e.g. in 5.5)
> 2. 'complementary'
>    then it works as before + persistent statistics
>    for table_stat and column_stat is collected
> 3. 'preferably'
>    then it works as before + persistent statistics
>    for table_stat, column_stat and index_stat is collected
> 4. 'exclusively'
>    only persistent statistics for table_stat, column_stat
>    and index_stat is collected
> 
> Q2. Is this semantics of ANALYZE is natural?

I don't think I like it. I'd prefer @@optimizer_use_stat_tables to
affect *only* the optimizer. And ANALYZE is only affected by various
optional ANALYZE clauses, and not by @@optimizer_use_stat_tables.

> Q3. If I need statistics for all columns and some indexes of table t1
>     and everything on table t2.
>     can I use:
>       ANALYZE TABLE t1 COLUMN[S] ALL INDEX[ES] (idx1,...), t2; ?

Note that I intentionally avoided commas in the new syntax. It allows to
write

   ANALYZE TABLE t1 PERSISTENT FOR COLUMNS ALL INDEXES (idx1,...),
                 t2 PERSISTENT FOR COLUMN (a) INDEXES (i1, i2),
                 t3, t4 PERSISTENT;

That is PERSISTENT FOR COLUMNS ... INDEXES ... clauses apply to the
table.

> Q4. Do we really need the keyword PERSISTENT here?
>     If so what is its semantics?

The idea is to distinguish between

  ANALYZE TABLE t1;

and

  ANALYZE TABLE t1 PERSISTENT;

the first collects no persistent statistics, the second - all of it.

But also it's generally in the spirit of SQL standard. The standard
syntax is very verbose and has quite a few clauses that are redundant,
but make SQL more readable.

> Q5. Isn't 'FOR' in the syntax of ANALYZE confusing in the sense that
>     it commands to collect statistics for column_stat/index_stat,
>     but not for table_stat? Maybe it's better to omit it?

I agree in a sense that with this syntax there's no way *not* to collect
table statistics. Is it a problem? I think not. I mean - when you
collect any column/index statistics, if you collect table statistics at
the same time it will add no overhead, right? So generally users will
never want to avoid collecting it. And in rare cases when one wants to
play with query plans, table statistics can always be deleted from the
table_stat.

I agree that FOR is completely redundant. It's like "INTO" in

INSERT INTO table ...

I'm fine if you make FOR optional (just like INTO is).

> I plan to add a possibility to collect statistics from random sample
> records. I plan to control the process of sampling only through system
> variables.

Uhm. Arguable. Perhaps it's ok, or perhaps extending ANALYZE syntax
would be better. I cannot comment until I see what exactly you want to
control.

Hope that helps.

Regards,
Sergei



References