← Back to team overview

maria-developers team mailing list archive

Updated (by Guest): index_merge: fair choice between index_merge union and range access (24)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: index_merge: fair choice between index_merge union and range access
CREATION DATE..: Tue, 26 May 2009, 12:10
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Psergey
CATEGORY.......: Server-Sprint
TASK ID........: 24 (http://askmonty.org/worklog/?tid=24)
VERSION........: 9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Thu, 18 Jun 2009, 14:56)=-=-
Low Level Design modified.
--- /tmp/wklog.24.old.15612     2009-06-18 14:56:09.000000000 +0300
+++ /tmp/wklog.24.new.15612     2009-06-18 14:56:09.000000000 +0300
@@ -1 +1,162 @@
+<contents>
+1. Current implementation overview
+1.1. Problems in the current implementation 
+2. New implementation
+2.1 New tree_and()
+2.2 New tree_or()
+</contents>
+
+1. Current implementation overview
+==================================
+At the moment, range analyzer works as follows:
+
+SEL_TREE structure represents 
+  
+  # There are sel_trees, a sel_tree is either range or merge tree  
+  sel_tree =  range_tree | imerge_tree
+  
+  # a range tree has range access options, possibly for several keys
+  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
+  
+  # merge tree represents several way to index_merge
+  imerge_tree = imerge1 AND imerge2 AND ...
+  
+  # a way to do index merge == a set to use of different indexes.
+  imergeX = range_tree1 OR range_tree2 OR ..
+    where no pair of range_treeX have ranges over the same index.
+
+
+  tree_and(A, B) 
+  {
+    if (both A and B are range trees)
+      return a range_tree with computed intersection for each range;
+    if (only one of A and B is a range tree)
+      return that tree;  // DISCARD-IMERGE-1
+    // at this point both trees are index_merge trees
+    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
+  }
+
+
+  tree_or(A, B)
+  {
+    if (A and B are range trees)
+    {
+      R = new range_tree;
+      for each index i
+        R.add(range_union(A.range(i), B.range(i)));
+
+      if (R has at least one range access)
+        return R;
+      else
+      {
+        /* could not build any range accesses. construct index_merge */
+        remove non-ranges from A; // DISCARD-IMERGE-2
+        remove non-ranges from B;
+        return new index_merge(A, B);
+      }
+    }
+    else if (A is range tree and B is index_merge tree (or vice versa))
+    {
+      Perform this transformation: 
+
+      range_treeA                // this is A
+      OR
+      (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
+      (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND 
+      =
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+      (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
+      ...
+      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
+
+      Now each line represents an index_merge..
+    }
+    else if (both A and B are index_merge trees)
+    {
+      Perform this transformation:
+      
+      imergeA1 AND imergeA2 AND ... AND imergeAN
+      OR 
+      imergeB1 AND imergeB2 AND ... AND imergeBN 
+
+      -> (discard all imergeA{i=2,3,...} ->           // DISCARD-IMERGE-3
+
+      imergeA1 
+      OR 
+      imergeB1 AND imergeB2 AND ... AND imergeBN  =
+
+      = (combine imergeA1 with each of the imergeB{i} ) = 
+      
+      combine(imergeA1 OR imergeB1) AND 
+      combine(imergeA1 OR imergeB2) AND 
+      ...                           AND 
+      combine(imergeA1 OR imergeBN)
+    }
+  }
+ 
+1.1. Problems in the current implementation 
+-------------------------------------------
+As marked in the code above:
+
+DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
+the WHERE clause has this form:
+
+ (t.key1=c1 OR t.key2=c2) AND t.badkey < c3
+
+DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
+the WHERE clause has this form (conditions t.badkey may have abritrary form):
+
+ (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)
+
+DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are 
+two indexes:
+
+  INDEX i1(col1, col2), 
+  INDEX i2(col1, col3)
+
+and this WHERE clause:
+  
+  col1=c1 AND (col2=c2 OR col3=c3)
+
+The optimizer will generate the plans that only use the "col1=c1" part. The
+right side of the AND will be ignored even if it has good selectivity.
+
+
+2. New implementation
+=====================
+
+<general idea>
+* Don't start fighting combinatorial explosion until we've actually got one.
+</>
+
+SEL_TREE structure will be now able to hold both index_merge and range scan
+candidates at the same time. That is, 
+
+  sel_tree2 = range_tree AND imerge_tree
+
+where both parts are optional (i.e. can be empty)
+
+Operations on SEL_ARG trees will be modified to produce/process the trees of
+this kind:
+
+2.1 New tree_and()
+------------------
+In order not to lose plans, we'll make these changes:
+
+1. Don't remove index_merge part of the tree.
+
+2. Push range conditions down into index_merge trees that may support them.
+   if one tree has range(key1) and the other tree has imerge(key1 OR key2)
+   then perform an equvalent of this operation:
+
+     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =
+
+     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))
+
+3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
+   concatenate them together.
+
+2.2 New tree_or()
 

-=-=(Guest - Sat, 13 Jun 2009, 06:29)=-=-
Category updated.
--- /tmp/wklog.24.old.25753     2009-06-13 06:29:10.000000000 +0300
+++ /tmp/wklog.24.new.25753     2009-06-13 06:29:10.000000000 +0300
@@ -1 +1 @@
-Server-BackLog
+Server-Sprint

-=-=(Guest - Sat, 13 Jun 2009, 06:14)=-=-
Category updated.
--- /tmp/wklog.24.old.24991     2009-06-13 06:14:03.000000000 +0300
+++ /tmp/wklog.24.new.24991     2009-06-13 06:14:03.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-BackLog

-=-=(Psergey - Wed, 03 Jun 2009, 12:09)=-=-
Dependency created: 30 now depends on 24

-=-=(Guest - Mon, 01 Jun 2009, 23:30)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.21580     2009-06-01 23:30:06.000000000 +0300
+++ /tmp/wklog.24.new.21580     2009-06-01 23:30:06.000000000 +0300
@@ -64,6 +64,9 @@
 
 * How strict is the limitation on the form of the WHERE?
 
+* Which version should this be based on? 5.1? Which patches are should be in 
+  (google's/percona's/maria/etc?) 
+
 * TODO: The optimizer didn't compare costs of index_merge and range before (ok 
   it did but that was done for accesses to different tables). Will there be any
   possible gotchas here?

-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Category updated.
--- /tmp/wklog.24.old.8414      2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414      2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Client-BackLog
+Server-RawIdeaBin

-=-=(Guest - Wed, 27 May 2009, 14:41)=-=-
Version updated.
--- /tmp/wklog.24.old.8414      2009-05-27 14:41:43.000000000 +0300
+++ /tmp/wklog.24.new.8414      2009-05-27 14:41:43.000000000 +0300
@@ -1 +1 @@
-Server-9.x
+9.x

-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Title modified.
--- /tmp/wklog.24.old.9498      2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498      2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-index_merge optimizer: dont discard index_merge union strategies when range is available
+index_merge: fair choice between index_merge union and range access

-=-=(Guest - Wed, 27 May 2009, 13:59)=-=-
Version updated.
--- /tmp/wklog.24.old.9498      2009-05-27 13:59:23.000000000 +0300
+++ /tmp/wklog.24.new.9498      2009-05-27 13:59:23.000000000 +0300
@@ -1 +1 @@
-Benchmarks-3.0
+Server-9.x

-=-=(Guest - Tue, 26 May 2009, 13:27)=-=-
High-Level Specification modified.
--- /tmp/wklog.24.old.305       2009-05-26 13:27:32.000000000 +0300
+++ /tmp/wklog.24.new.305       2009-05-26 13:27:32.000000000 +0300
@@ -1 +1,70 @@
+(Not a ready HLS but draft)
+<contents>
+Solution overview
+Limitations
+TODO
+
+</contents>
+
+Solution overview
+=================
+The idea is to delay discarding potential index_merge plans until the point 
+where it is really necessary. 
+
+This way, we won't have to do much changes in the range analyzer, but will be
+able to keep potential index_merge plan just enough so that it's possible to 
+take it into consideration together with range access plans. 
+
+Since there are no changes in the optimizer, the ability to consider both
+range and index_merge options will be limited to WHERE clauses of this form:
+
+ WHERE := range_cond(key1_1) AND 
+          range_cond(key2_1) AND
+          other_cond       AND 
+          index_merge_OR_cond1(key3_1, key3_2, ...)
+          index_merge_OR_cond2(key4_1, key4_2, ...)
+  
+where 
+  
+ index_merge_OR_cond{N} := (range_cond(keyN_1) OR 
+                            range_cond(keyN_2) OR ...)
+
+
+ range_cond(keyX) := condition that allows to construct range access of keyX
+                     and doesn't allow to construct range/index_merge accesses
+                     for any keys of the table in question. 
+
+
+For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:
+
+  SEL_TREE(
+    range(key1_1),
+    ...
+    range(key2_1),
+    SEL_IMERGE(                         (1)
+      SEL_TREE(key3_1})
+      SEL_TREE(key3_2})
+      ...
+    )
+    ...
+  )
+
+which can be used to make a cost-based choice between range and index_merge.
+
+Limitations
+-----------
+This will not be a full solution in a sense that the range analyzer will not
+be able to produce sel_tree (1) if the WHERE clause is specified in other form
+(e.g. brackets were opened).
+
+TODO
+----
+* is it a problem if there are keys that are referred to both from
+  index_merge and from range access?
+
+* How strict is the limitation on the form of the WHERE?
+
+* TODO: The optimizer didn't compare costs of index_merge and range before (ok 
+  it did but that was done for accesses to different tables). Will there be any
+  possible gotchas here?
 



DESCRIPTION:

Current range optimizer will discard possible index_merge/[sort]union 
strategies when there is a possible range plan. This action is a part of
measures we take to avoid combinatorial explosion of possible range/
index_merge strategies.

A bad side effect of this is that for WHERE clauses  in form 

t.key1= 'very-frequent-value' AND (t.key2='rare-value1' OR t.key3='rare-value2')

the optimizer will 
 - discard union(key2,key3) in favor of range(key1)
 - consider costs of using range(key1) and discard that plan also
and the overall effect is that possible poor range access will cause possible
good index_merge access not to be considered.

This WL is to about lifting this limitation at least for some subset of WHERE
clauses.


HIGH-LEVEL SPECIFICATION:



(Not a ready HLS but draft)
<contents>
Solution overview
Limitations
TODO

</contents>

Solution overview
=================
The idea is to delay discarding potential index_merge plans until the point 
where it is really necessary. 

This way, we won't have to do much changes in the range analyzer, but will be
able to keep potential index_merge plan just enough so that it's possible to 
take it into consideration together with range access plans. 

Since there are no changes in the optimizer, the ability to consider both
range and index_merge options will be limited to WHERE clauses of this form:

 WHERE := range_cond(key1_1) AND 
          range_cond(key2_1) AND
          other_cond       AND 
          index_merge_OR_cond1(key3_1, key3_2, ...)
          index_merge_OR_cond2(key4_1, key4_2, ...)
  
where 
  
 index_merge_OR_cond{N} := (range_cond(keyN_1) OR 
                            range_cond(keyN_2) OR ...)


 range_cond(keyX) := condition that allows to construct range access of keyX
                     and doesn't allow to construct range/index_merge accesses
                     for any keys of the table in question. 


For such WHERE clauses, the range analyzer will produce SEL_TREE of this form:

  SEL_TREE(
    range(key1_1),
    ...
    range(key2_1),
    SEL_IMERGE(                         (1)
      SEL_TREE(key3_1})
      SEL_TREE(key3_2})
      ...
    )
    ...
  )

which can be used to make a cost-based choice between range and index_merge.

Limitations
-----------
This will not be a full solution in a sense that the range analyzer will not
be able to produce sel_tree (1) if the WHERE clause is specified in other form
(e.g. brackets were opened).

TODO
----
* is it a problem if there are keys that are referred to both from
  index_merge and from range access?

* How strict is the limitation on the form of the WHERE?

* Which version should this be based on? 5.1? Which patches are should be in 
  (google's/percona's/maria/etc?) 

* TODO: The optimizer didn't compare costs of index_merge and range before (ok 
  it did but that was done for accesses to different tables). Will there be any
  possible gotchas here?


LOW-LEVEL DESIGN:



<contents>
1. Current implementation overview
1.1. Problems in the current implementation 
2. New implementation
2.1 New tree_and()
2.2 New tree_or()
</contents>

1. Current implementation overview
==================================
At the moment, range analyzer works as follows:

SEL_TREE structure represents 
  
  # There are sel_trees, a sel_tree is either range or merge tree  
  sel_tree =  range_tree | imerge_tree
  
  # a range tree has range access options, possibly for several keys
  range_tree = range(key1) AND range(key2) AND ... AND range(keyN);
  
  # merge tree represents several way to index_merge
  imerge_tree = imerge1 AND imerge2 AND ...
  
  # a way to do index merge == a set to use of different indexes.
  imergeX = range_tree1 OR range_tree2 OR ..
    where no pair of range_treeX have ranges over the same index.


  tree_and(A, B) 
  {
    if (both A and B are range trees)
      return a range_tree with computed intersection for each range;
    if (only one of A and B is a range tree)
      return that tree;  // DISCARD-IMERGE-1
    // at this point both trees are index_merge trees
    return concat_lists( A.imerge1 ... A.imergeN, B.imerge1 ... B.imergeN);
  }


  tree_or(A, B)
  {
    if (A and B are range trees)
    {
      R = new range_tree;
      for each index i
        R.add(range_union(A.range(i), B.range(i)));

      if (R has at least one range access)
        return R;
      else
      {
        /* could not build any range accesses. construct index_merge */
        remove non-ranges from A; // DISCARD-IMERGE-2
        remove non-ranges from B;
        return new index_merge(A, B);
      }
    }
    else if (A is range tree and B is index_merge tree (or vice versa))
    {
      Perform this transformation: 

      range_treeA                // this is A
      OR
      (range_treeB_11 OR range_treeB_12 OR ... OR range_treeB_1N) AND
      (range_treeB_21 OR range_treeB_22 OR ... OR range_treeB_2N) AND
      ...
      (range_treeB_K1 OR range_treeB_K2 OR ... OR range_treeB_kN) AND 
      =
      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND
      (range_treeA OR range_treeB_21 OR ... OR range_treeB_2N) AND
      ...
      (range_treeA OR range_treeB_11 OR ... OR range_treeB_1N) AND

      Now each line represents an index_merge..
    }
    else if (both A and B are index_merge trees)
    {
      Perform this transformation:
      
      imergeA1 AND imergeA2 AND ... AND imergeAN
      OR 
      imergeB1 AND imergeB2 AND ... AND imergeBN 

      -> (discard all imergeA{i=2,3,...} ->           // DISCARD-IMERGE-3

      imergeA1 
      OR 
      imergeB1 AND imergeB2 AND ... AND imergeBN  =

      = (combine imergeA1 with each of the imergeB{i} ) = 
      
      combine(imergeA1 OR imergeB1) AND 
      combine(imergeA1 OR imergeB2) AND 
      ...                           AND 
      combine(imergeA1 OR imergeBN)
    }
  }
 
1.1. Problems in the current implementation 
-------------------------------------------
As marked in the code above:

DISCARD-IMERGE-1 step will cause index_merge option to be discarded when
the WHERE clause has this form:

 (t.key1=c1 OR t.key2=c2) AND t.badkey < c3

DISCARD-IMERGE-2 step will cause index_merge option to be discarded when
the WHERE clause has this form (conditions t.badkey may have abritrary form):

 (t.badkey<c1 AND t.key1=c1) OR (t.key1=c2 AND t.badkey < c2)

DISCARD-IMERGE-3 manifests itself as the following effect: suppose there are 
two indexes:

  INDEX i1(col1, col2), 
  INDEX i2(col1, col3)

and this WHERE clause:
  
  col1=c1 AND (col2=c2 OR col3=c3)

The optimizer will generate the plans that only use the "col1=c1" part. The
right side of the AND will be ignored even if it has good selectivity.


2. New implementation
=====================

<general idea>
* Don't start fighting combinatorial explosion until we've actually got one.
</>

SEL_TREE structure will be now able to hold both index_merge and range scan
candidates at the same time. That is, 

  sel_tree2 = range_tree AND imerge_tree

where both parts are optional (i.e. can be empty)

Operations on SEL_ARG trees will be modified to produce/process the trees of
this kind:

2.1 New tree_and()
------------------
In order not to lose plans, we'll make these changes:

1. Don't remove index_merge part of the tree.

2. Push range conditions down into index_merge trees that may support them.
   if one tree has range(key1) and the other tree has imerge(key1 OR key2)
   then perform an equvalent of this operation:

     rangeA(key1) AND ( rangeB(key1) OR rangeB(key2)) =

     (rangeA(key1) AND rangeB(key1)) OR (rangeA(key1) AND rangeB(key2))

3. Just as before: if both sel_tree A and sel_tree B have index_merge options,
   concatenate them together.

2.2 New tree_or()


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)