← Back to team overview

maria-developers team mailing list archive

MySQL Optimizer Trace Questions

 

Hello All,

I work for TokuTek as a Storage Engine developer and I have been using
the, relatively, new optimizer trace functionality.  I have found the
results very helpful when combined with analyzing the source code, but
there a few questions I want to ask.  Here is some output from the
trace for a recent SELECT query on an InnoDB table:

At the end of the trace, it shows the index chosen by, what I presume,
is the optimizer:

...
"considered_execution_plans":
  [
    {
      "plan_prefix": [],
      "table": "`reports` `a`",
      "best_access_path":
        {
            "considered_access_paths": [
               {
                 "access_type": "ref",
                 "index": "reports_group_id",
                 "rows": 1.55e6,
                 "cost": 411057,
                 "chosen": true
               },
               ...
             ]
         },
       "cost_for_plan": 411057,
       "rows_for_plan": 1.55e6,
       "chosen": true
     }
  ]
},
...

Here are the cost results for some rejected alternative indexes,
(note: the row count for two of them is similar to the chosen index):

...
"analyzing_range_alternatives": {
      "range_scan_alternatives"
...
          "rows": 2377102,
          "cost": 2.81e6,
...
          "rows": 1554164,
          "cost": 1.84e6,
...
          "rows": 2346664,
          "cost": 2.78e6,
...
          "rows": 1554164,
          "cost": 1.84e6,
...

Here is the cost analysis for a table scan, from the beginning of the
trace (still not as good as the above chosen index, but close on
cost):

...
"table": "`reports` `a`",
"range_analysis":
 {
   "table_scan":
    {
      "rows": 3108329,
      "cost": 655076
    },
...

Related to the fact that the table scan analysis is dominated by the
large row count, and probably rejected as a valid plan compared to the
chosen index for that very reason:

1. How is the row count more than merely informational to the
optimizer?  I had the impression the row count did not directly
contribute to the optimizer's decision as much as the estimated cost,
or that the row count was already accounted for/factored into in the
final cost measurement.
2.  Related to that question, what is cost exactly?  Is there a unit
of measurement (ex: Time, Disk Seeks, etc.)?
3.  Is cost more important than row count or equally important?
4.  What is the genesis of the cost variable displayed in the trace?
I assume it is just a guess on the respective storage engine's part of
how expensive (in terms of disk access, processing, time, etc.)
getting the rows will be for the query.

thanks in advance,
Christian


Follow ups