maria-developers team mailing list archive

Thread
Date

Review #3 for: MDEV-26278 Elimination: consider GROUP BY as unique key

To: Oleg Smirnov <oleg.smirnov@xxxxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Sun, 12 Jun 2022 18:18:48 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx

Hi Oleg,

Please fix compile errors on this platform:
https://buildbot.mariadb.org/#/builders/146/builds/13126/steps/5/logs/stdio

also please apply the attached diff that fixes the coding style and wording
of comments (input and/or adjustments are welcome).

After this, the patch is ok to push. 

The next steps are to create a branch with name like 
preview-10.10-mdev26278-table-elimination and to officially submit it for
preview tree build and testing.

> commit 7275bf4939248959b5017314d6b0393296831c0c (HEAD, origin/bb-10.9-MDEV-26278-look)
> Author: Oleg Smirnov <olernov@xxxxxxxxx>
> Date:   Thu Feb 17 22:53:37 2022 +0700
> 
>     MDEV-26278 Add functionality to eliminate derived tables from the query
>     
>     Elimination of unnecessary tables from SQL queries is already present
>     in MariaDB. But it only works for regular tables and not for derived ones.
>     
>     Imagine we have a view:
>       CREATE VIEW v1 AS SELECT a, b, max(c) AS maxc FROM t1 GROUP BY a, b
>     
>     Due to "GROUP BY a, b" the values of combinations {a, b} are unique,
>     and this fact can be treated as like derived table "v1" has a unique key
>     on fields {a, b}.
>     
>     Suppose we have a SQL query:
>       SELECT t2.* FROM t2 LEFT JOIN v1 ON t2.a=v1.a and t2.b=v1.b
>     
>     1. Since {v1.a, v1.b} is unique and both these fields are bound to t2,
>        "v1" is functionally dependent on t2.
>        This means every record of "t2" will be either joined with
>        a single record of "v1" or NULL-complemented.
>     2. No fields of "v1" are present on the SELECT list
>     
>     These two facts allow the server to completely exclude (eliminate)
>     the derived table "v1" from the query.
 
BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net

diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc
index ab8bf07868b..587ff800fb8 100644
--- a/sql/opt_table_elimination.cc
+++ b/sql/opt_table_elimination.cc
@@ -1675,8 +1675,8 @@ Dep_analysis_context::create_table_value(TABLE_LIST *table_list)
     if (key->flags & HA_NOSAME)
     {
       Dep_module_key *key_dep;
-      if (!(key_dep=
-                new Dep_module_key(tbl_dep, i, key->user_defined_key_parts)))
+      if (!(key_dep= new Dep_module_key(tbl_dep, i,
+                                        key->user_defined_key_parts)))
         return NULL;
       *key_list= key_dep;
       key_list= &(key_dep->next_table_key);
@@ -1689,8 +1689,16 @@ Dep_analysis_context::create_table_value(TABLE_LIST *table_list)
 
 
 /*
-  Create unique pseudo-key and add it as a dependency
-  if the query is not a UNION (ALL) and has GROUP BY expression
+  @brief
+    Check if we can create a unique pseudo-key for the passed table.
+    If we can, create a dependency for it
+
+  @detail
+    Currently, pseudo-key is created for the list of GROUP BY columns.
+
+    TODO: also it can be created if the query uses
+     - SELECT DISTINCT
+     - UNION DISTINCT (not UNION ALL)
 */
 
 void Dep_analysis_context::create_unique_pseudo_key_if_needed(
@@ -1716,10 +1724,6 @@ void Dep_analysis_context::create_unique_pseudo_key_if_needed(
   */
   if (first_select && first_select->group_list.elements > 0)
   {
-    /*
-      Make sure GROUP BY elements contain only fields
-     and no functions or other expressions
-    */
     bool valid= true;
     std::set<field_index_t> exposed_fields_indexes;
     for (auto cur_group= first_select->group_list.first;
@@ -1727,6 +1731,10 @@ void Dep_analysis_context::create_unique_pseudo_key_if_needed(
          cur_group= cur_group->next)
     {
       auto elem= *(cur_group->item);
+      /*
+        Make sure GROUP BY elements contain only fields
+        and no functions or other expressions
+      */
       if (elem->type() != Item::FIELD_ITEM)
       {
         valid= false;
@@ -1736,30 +1744,29 @@ void Dep_analysis_context::create_unique_pseudo_key_if_needed(
       if (field_idx == -1)
       {
         /*
-          This is not a invalid case.
-          We can construct a unique pseudo-key only in case when all field from
-          the GROUP BY list are present on the SELECT list. For example,
-          "SELECT a FROM t1 GROUP by a, b":
-          values of "a" are not unique, only a combination of "a" and "b"
-          is unique. So both fields must be present in the SELECT list
+          This GROUP BY element is not present in the select list. This is a
+          case like this:
+             (SELECT a FROM t1 GROUP by a,b) as TBL
+          Here, the combination of (a,b) is unique, but the select doesn't
+          include "b". "a" alone is not unique, so TBL doesn't have a unique
+          pseudo-key.
         */
         valid= false;
         break;
       }
-
       exposed_fields_indexes.insert(field_idx);
-
     }
     if (valid)
     {
       Dep_module_pseudo_key *pseudo_key;
-      pseudo_key= new Dep_module_pseudo_key(
-                tbl_dep, std::move(exposed_fields_indexes));
+      pseudo_key= new Dep_module_pseudo_key(tbl_dep, 
+                                            std::move(exposed_fields_indexes));
       tbl_dep->pseudo_key= pseudo_key;
     }
   }
 }
 
+
 /*
   Iterate the list of fields and look for the given field.
   Returns the index of the field if it is found on the list
@@ -1771,16 +1778,13 @@ int Dep_analysis_context::find_field_in_list(List<Item> &fields_list,
 {
   List_iterator<Item> it(fields_list);
   int field_idx= 0;
-  while (auto next_field= it.peek())
+  while (auto next_field= it++)
   {
     if (next_field->eq(field, false))
-    {
       return field_idx;
-    }
     field_idx++;
-    it++;
   }
-  return -1 /*not found*/;
+  return -1; /*not found*/
 }
 
 
@@ -1941,8 +1945,7 @@ Dep_module_pseudo_key::get_next_unbound_value(Dep_analysis_context *dac,
 
 
 /*
-  Determine whether the field_index is present in the previously composed
-  array of derived_table_field_indexes
+  Check if column number field_index is covered by the pseudo-key.
 */
 
 bool Dep_module_pseudo_key::covers_field(int field_index)

Follow ups

Re: Review #3 for: MDEV-26278 Elimination: consider GROUP BY as unique key
From: Oleg Smirnov, 2022-06-13