maria-developers team mailing list archive

Thread
Date
Review for: b4696ee97e684 : MDEV-15777: Support Early NULLs filtering-like restrictions in the range optimizer

To: Varun <varunraiko1803@xxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Wed, 22 May 2019 19:14:24 +0300
Cc: commits@xxxxxxxxxxx, maria-developers@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.9.4 (2018-02-28)
Hi Varun,

Please find review input below. Only cosmetic changes are requested.

I also assume that after this is pushed
- we add a optimizer trace printout into 10.4
- the feature is enabled by default in 10.5
?  (do we need separate MDEVs for this?)

One final question is the choice of name for @@optimizer_switch flag. Let me get back to
you on this.

> commit b4696ee97e6846ba49ef203cfc7189f50c1e53a7
> Author: Varun Gupta <varun.gupta@xxxxxxxxxxx>
> Date:   Mon May 20 15:14:30 2019 +0530
> 
>     MDEV-15777: Support Early NULLs filtering-like restrictions in the range optimizer
>     
>     For eqjoin conditions, we add a NOT NULL predicate so as to allow the range optimizer
>     to use the predicate and possibly create a range access on the given table.
>     
>     Example:
>       select * from t1,t2 where t1.a=t2.a;  we have KEY(a) on t1
>       we would inject a NOT NULL predicate t1.a IS NOT NULL for table t1
>       this would allow the range optimizer to create ranges and we get a
>       range access on table t1, then we will be able
>       to do an early NULL filtering for ranges too.
> 
> diff --git a/mysql-test/main/range.result b/mysql-test/main/range.result
> index 32e0cf2868c..d75d6734277 100644
> --- a/mysql-test/main/range.result
> +++ b/mysql-test/main/range.result
> @@ -3024,3 +3024,151 @@ drop table t1;
>  #
>  # End of 10.2 tests
>  #
> +#
> +# MDEV-15777: Support Early NULLs filtering-like restrictions in the range optimizer
> +#
> +set @save_optimizer_switch= @@optimizer_switch;
> +set @@optimizer_switch='null_rejecting_for_ranges=on';
> +create table ten(a int);
> +insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
> +create table one_k(a int);
> +insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

Did we abandon the convention of naming the tables t1,t2,t3,... ?
If not, please rename.

> +create table t1 (
> +id int(10) unsigned NOT NULL,
> +subset_id int(11) DEFAULT NULL,
> +PRIMARY KEY (id),
> +KEY t1_subset_id (subset_id));
...

> diff --git a/sql/opt_range.cc b/sql/opt_range.cc
> index ec7b3dbbd7a..cb7800d1ba6 100644
> --- a/sql/opt_range.cc
> +++ b/sql/opt_range.cc
> @@ -2539,6 +2546,42 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
>      TRP_GROUP_MIN_MAX *group_trp;
>      double best_read_time= read_time;
>  
> +    /*
> +      For the range optimizer, null_rejecting_conds are just an extra condition
> +      on which one can use index to create ranges.
> +
> +      Also here we try to allow consider range access for a column involed in a
> +      null rejecting predicate only if it is the first keypart of an index.
> +      Let try to make it clear with examples:
> +
> +      Query 1
> +        select * from t1,t2 where t1.a=2 and t1.b=t2.b;
> +        and we have a key(a,b)
> +        So lets consider range access for table t1:
> +        Null rejecting predicate : t1.b IS NOT NULL
> +        but the column t1.b is NOT the first keypart in the index,
> +        so we don't consider it while building ranges
> +
> +      Query 2
> +        select * from t1,t2 where t1.a=t2.a and t1.b=t2.b;
> +        and we have a key (a,b)
> +
> +        So lets consider range access for table t1:
> +        Null rejecting predicate : t1.a IS NOT NULL AND t1.b IS NOT NULL
> +        so here t1.a is the first keypart in the index, so we consider
> +        the predicate for building the ranges.
> +
> +        The first keypart prerequisite is made as we try to reuse range
> +        estimates (because they are more accurate) at different places
> +        and we needed to make sure that correct/accurate estimates are used there.
> +    */
> +
> +    if (null_rejecting_conds)
> +      not_null_cond_tree= null_rejecting_conds->get_mm_tree(&param,
> +                                                        &null_rejecting_conds);
Please fix indentation and fit within 80 chars. ^ 
And also use curly braces as the code inside if doesn't fit on one line.

> +    if (not_null_cond_tree)
> +      remove_nonrange_trees(&param, not_null_cond_tree);
> +
>      if (cond)
>      {
>        if ((tree= cond->get_mm_tree(&param, &cond)))
> @@ -2557,6 +2600,7 @@ int SQL_SELECT::test_quick_select(THD *thd, key_map keys_to_use,
>            tree= NULL;
>        }
>      }
> +    tree= tree_and(&param, tree, not_null_cond_tree);
>  
>      /*
>        Try to construct a QUICK_GROUP_MIN_MAX_SELECT.
> @@ -4422,6 +4466,128 @@ static bool create_partition_index_description(PART_PRUNE_PARAM *ppar)
>    return FALSE;
>  }
>  
> +inline void add_cond(THD *thd, Item **e1, Item *e2)
> +{
> +  if (*e1)
> +  {
> +    if (!e2)
> +      return;
> +    Item *res;
> +    if ((res= new (thd->mem_root) Item_cond_and(thd, *e1, e2)))
> +    {
> +      res->fix_fields(thd, 0);
> +      res->update_used_tables();
> +      *e1= res;
> +    }
> +  }
> +  else
> +    *e1= e2;
> +}
> +
> +/*
> +  Create null rejecting conditions for a table, for all the equalites
> +  present in the WHERE clause of a query.
> +
> +  SYNOPSIS
> +    make_null_rejecting_conds()
> +    @param TABLE        - Keys of this table will participate in null
> +                          rejecting conditions
> +    @param keyuse_array - array that has all the equalites of the
> +                          WHERE clasuse

The above doesn't match the function's definition. Out of date comment?
> +
> +  DESCRIPTION
> +    This function creates null rejecting conditions for a table. These
> +    conditions are created to do range analysis on them , the conditions
> +    are of the form tbl.key.keypart IS NOT NULL.
> +
> +  IMPLEMENTATION
> +    Lookup in the keyuse array to check if it has equalites that belong
> +    to the given table. If yes then find out if the conditions are null
> +    rejecting and accordingly create all the condition for the keys of a
> +    given table and AND them.
> +
> +
> +  RETURN
> +    NOT NULL - Found null rejecting conditions for the given table
> +    NULL - No null rejecting conditions for the given table
The above doesn't match the function's definition. Out of date comment?

> +*/
> +
> +void make_null_rejecting_conds(THD *thd, JOIN_TAB *tab)
> +{
> +  KEY *keyinfo;
> +  Item *cond= NULL;
> +  KEYUSE* keyuse;
> +  TABLE *table= tab->table;
> +  key_map *const_keys= &tab->const_keys;
^ Does it make sense to introduce a variable if it is only used once? 

> +
> +  /*
> +    No need to add NOT NULL condition for materialized derived tables
> +    or materialized subqueries as we do not run the range optimizer
> +    on their conditions
> +  */
> +
> +  if (!optimizer_flag(thd, OPTIMIZER_SWITCH_NULL_REJECTING_FOR_RANGES))
> +    return;
> +
> +  if (tab->table->is_filled_at_execution() ||
> +      (tab->table->pos_in_table_list->derived &&
> +       tab->table->pos_in_table_list->is_materialized_derived()))
> +    return;
> +
> +  Field_map not_null_keypart_map;
> +
> +  /*
> +    The null rejecting conds added will be on the keypart of a key, so for
> +    that we need the table to atleast have a key.
> +  */
> +  if (!table->s->keys || table->null_rejecting_conds || !tab->keyuse)
> +    return;
> +
> +  for(keyuse= tab->keyuse; keyuse->table == table; keyuse++)
> +  {
Please add space, "for ".

> +    /*
> +      No null rejecting conds for a hash key or full-text keys
> +    */
> +    if (keyuse->key == MAX_KEY || keyuse->keypart == FT_KEYPART)
> +      continue;
> +    keyinfo= keyuse->table->key_info + keyuse->key;
> +    Field *field= keyinfo->key_part[keyuse->keypart].field;
> +
> +    if (not_null_keypart_map.is_set(field->field_index))
> +      continue;
> +
> +    /*
> +      No need to add null-rejecting condition if we have a
> +      keyuse element as
> +        - table.key.keypart= const
> +        - (table.key.keypart= tbl.otherfield or table.key.keypart IS NULL)
> +        - table.key.keypart IS NOT NULLABLE
> +    */
> +
> +    if (keyuse->val->const_item() ||
> +        !(keyuse->null_rejecting && field->maybe_null()) ||
> +        keyuse->optimize & KEY_OPTIMIZE_REF_OR_NULL)
> +      continue;
> +
> +    Item_field *field_item= new (thd->mem_root)Item_field(thd, field);
> +    Item* not_null_item= new (thd->mem_root)Item_func_isnotnull(thd, field_item);
> +
> +    /*
> +      adding the key to const keys as we have the condition
> +      as key.keypart IS NOT NULL
> +    */
> +
> +    const_keys->set_bit(keyuse->key);
> +    not_null_item->fix_fields(thd, 0);
^^ please change 0 to &not_null_item.

> +    not_null_item->update_used_tables();
> +    add_cond(thd, &cond, not_null_item);
> +    not_null_keypart_map.set_bit(field->field_index);
> +  }
> +  DBUG_EXECUTE("where", print_where(cond, "Early Null Filtering",
> +                                    QT_ORDINARY););
> +  table->null_rejecting_conds= cond;
> +}
> +
>  
>  #ifndef DBUG_OFF
>  
...
> diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> index 8cdcf3afc8b..140ceed5c2d 100644
> --- a/sql/sql_select.cc
> +++ b/sql/sql_select.cc
> @@ -5510,15 +5512,30 @@ add_key_field(JOIN *join,
>      If the condition has form "tbl.keypart = othertbl.field" and 
>      othertbl.field can be NULL, there will be no matches if othertbl.field 
>      has NULL value.
> -    We use null_rejecting in add_not_null_conds() to add
> -    'othertbl.field IS NOT NULL' to tab->select_cond.
> +
> +    The field KEY_FIELD::null_rejecting is set to TRUE if either
> +    the left or the right hand side of the equality is NULLABLE
> +
> +    null_rejecting is used
> +      - by add_not_null_conds(), to add "othertbl.field IS NOT NULL" to
> +        othertbl's tab->select_cond. (This is called "Early NULLs filtering")
> +
> +      - by make_null_rejecting_conds(), to provide range optimizer with
> +        additional "tbl.keypart IS NOT NULL" condition.
>    */
>    {
>      Item *real= (*value)->real_item();
> -    if (((cond->functype() == Item_func::EQ_FUNC) ||
> -         (cond->functype() == Item_func::MULT_EQUAL_FUNC)) &&
> -        (real->type() == Item::FIELD_ITEM) &&
> -        ((Item_field*)real)->field->maybe_null())
> +    if ((
> +         (cond->functype() == Item_func::EQ_FUNC) ||
> +         (cond->functype() == Item_func::MULT_EQUAL_FUNC)
> +        ) &&
> +        (
> +          (
> +            (real->type() == Item::FIELD_ITEM) &&
> +            ((Item_field*)real)->field->maybe_null()
> +          ) ||
> +          (field->maybe_null())
> +        ))
>        (*key_fields)->null_rejecting= true;
>      else
>        (*key_fields)->null_rejecting= false;
> @@ -9982,8 +9999,12 @@ static bool create_ref_for_key(JOIN *join, JOIN_TAB *j,
>        uint maybe_null= MY_TEST(keyinfo->key_part[i].null_bit);
>        j->ref.items[i]=keyuse->val;		// Save for cond removal
>        j->ref.cond_guards[i]= keyuse->cond_guard;
> -      if (keyuse->null_rejecting) 
> +      Item *real= (keyuse->val)->real_item();
> +      if (keyuse->null_rejecting &&
> +          (real->type() == Item::FIELD_ITEM) &&
> +          ((Item_field*)real)->field->maybe_null())

This code looks as if

  if keyuse->val->maybe_null() != keyuse->val->real_item()->maybe_null()

do you have any specific case in mind or this is just to match the code in
add_key_field?

>          j->ref.null_rejecting|= (key_part_map)1 << i;
> +
>        keyuse_uses_no_tables= keyuse_uses_no_tables && !keyuse->used_tables;
>        /*
>          We don't want to compute heavy expressions in EXPLAIN, an example would

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
Follow ups

Re: Review for: b4696ee97e684 : MDEV-15777: Support Early NULLs filtering-like restrictions in the range optimizer
From: varun gupta, 2019-05-23