maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #11343
Re: f3dda8e: MDEV-15357 Wrong result with join_cache_level=4, BNLH join
Hi, Igor!
On Apr 30, Igor Babaev wrote:
> revision-id: f3dda8e2002dc2ad630c07bda0d8c00be7269907 (mariadb-10.2.14-72-gf3dda8e)
> parent(s): b4c5e4a717e3ce2c2d434106cc74417fe9a1d3dc
> author: Igor Babaev
> committer: Igor Babaev
> timestamp: 2018-04-30 21:26:38 -0700
> message:
>
> MDEV-15357 Wrong result with join_cache_level=4, BNLH join
>
> This bug was introduced by the architectural changes of the patch
> for MDEV-11640. The patch moved initialization of join caches after
> the call of make_join_readinfo(). As any failure to initialize
> a join cache caused denial of its usage the execution code for
> the query had to be revised. This revision required rolling back many
> actions taken by make_join_readinfo(). It was not done in the patch.
> As a result if a denial of join cache happened some join conditions
> could be lost. This was exactly the cause of wrong results in the
> bug's reported test case.
>
> Thus the introduced architectural change is not valid and it would be
> better to roll it back. At the same time two new methods
> adjust_read_set_for_vcol_keyread() and get_covering_index_for_scan()
> were added to the class JOIN_TAB to resolve the problems of MDEV-11640.
>
> diff --git a/sql/sql_join_cache.cc b/sql/sql_join_cache.cc
> index 3612cb6..242cb91 100644
> --- a/sql/sql_join_cache.cc
> +++ b/sql/sql_join_cache.cc
> @@ -226,6 +226,16 @@ void JOIN_CACHE::calc_record_fields()
> flag_fields+= MY_TEST(tab->table->maybe_null);
> fields+= tab->used_fields;
> blobs+= tab->used_blobs;
> + if (tab->type == JT_ALL || tab->type == JT_HASH)
> + {
> + uint idx= tab->get_covering_index_for_scan();
as you moved JOIN_CACHE initialization back, I suspect that
this calc_record_fields() now happens too early for
get_covering_index_for_scan() to be useful.
And why only JT_ALL and JT_HASH?
> + if (idx != MAX_KEY)
> + {
> + fields-= bitmap_bits_set(tab->table->read_set);
> + tab->adjust_read_set_for_vcol_keyread(idx);
> + fields+= bitmap_bits_set(tab->table->read_set);
> + }
> + }
> }
> if ((with_match_flag= join_tab->use_match_flag()))
> flag_fields++;
> diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> index 30ef26f..e5b75ec 100644
> --- a/sql/sql_select.cc
> +++ b/sql/sql_select.cc
> @@ -11404,6 +11375,61 @@ void JOIN_TAB::remove_redundant_bnl_scan_conds()
> }
>
>
> +uint JOIN_TAB::get_covering_index_for_scan()
> +{
> + int idx= MAX_KEY;
> + if (table->no_keyread)
> + return idx;
> + if (select && select->quick &&
> + select->quick->index != MAX_KEY && //not index_merge
> + table->covering_keys.is_set(select->quick->index))
> + idx= select->quick->index;
> + else if (!table->covering_keys.is_clear_all() &&
> + !(select && select->quick))
> + { // Only read index tree
> + if (loosescan_match_tab)
> + idx= loosescan_key;
> + else
> + {
> +#ifdef BAD_OPTIMIZATION
> + /*
> + It has turned out that the below change, while speeding things
> + up for disk-bound loads, slows them down for cases when the data
> + is in disk cache (see BUG#35850):
> + See bug #26447: "Using the clustered index for a table scan
> + is always faster than using a secondary index".
> + */
> + if (table->s->primary_key != MAX_KEY &&
> + table->file->primary_key_is_clustered())
> + idx= table->s->primary_key;
> + else
> +#endif
> + idx= find_shortest_key(table, & table->covering_keys);
> + }
> + }
> + return idx;
> +}
> +
> +
> +void JOIN_TAB::adjust_read_set_for_vcol_keyread(uint keyread_idx)
> +{
> + if (!table->vcol_set || bitmap_is_clear_all(table->vcol_set))
> + return;
> +
> + MY_BITMAP *keyread_set= &table->cond_set; // is free for use at call
normally one uses tmp_set for this, not hijacks cond_set
> + bitmap_clear_all(keyread_set);
> + table->mark_columns_used_by_index(keyread_idx, keyread_set);
> + bitmap_intersect(keyread_set, table->read_set);
> + bitmap_intersect(keyread_set, table->vcol_set);
this looks wrong, you probably should do
keyread_set INTERSECT (read_set UNION vcol_set)
> + if (!bitmap_is_clear_all(keyread_set))
> + {
> + bitmap_clear_all(keyread_set);
> + table->mark_columns_used_by_index(keyread_idx, keyread_set);
> + bitmap_intersect(table->read_set, keyread_set);
> + }
I don't understand that.
1. What if bitmap_is_clear_all(keyread_set) - what does it mean?
2. Why do you recalculate keyread_set, why not simply copy keyread_set
into read_set ?
> +}
> +
> +
> /*
> Plan refinement stage: do various setup things for the executor
>
> @@ -11526,7 +11552,10 @@ make_join_readinfo(JOIN *join, ulonglong options, uint no_jbuf_after)
> tab->read_first_record= tab->type == JT_SYSTEM ? join_read_system
> : join_read_const;
> if (table->covering_keys.is_set(tab->ref.key) && !table->no_keyread)
> + {
> + tab->adjust_read_set_for_vcol_keyread(tab->ref.key);
> table->file->ha_start_keyread(tab->ref.key);
better combine both into
inline void JOIN_TAB::start_keyread()
{
adjust_read_set_for_vcol_keyread(ref.key);
tab->table->file->ha_start_keyread(ref.key);
}
or even JOIN_TAB::start_keyread_if_needed() putting the if() inside too
> + }
> else if ((!jcl || jcl > 4) && !tab->ref.is_access_triggered())
> push_index_cond(tab, tab->ref.key);
> break;
> @@ -11606,35 +11641,19 @@ make_join_readinfo(JOIN *join, ulonglong options, uint no_jbuf_after)
> }
> }
> }
> - if (!table->no_keyread)
> + uint idx= tab->get_covering_index_for_scan();
> + if (idx != MAX_KEY)
> {
> if (tab->select && tab->select->quick &&
> tab->select->quick->index != MAX_KEY && //not index_merge
> table->covering_keys.is_set(tab->select->quick->index))
> - table->file->ha_start_keyread(tab->select->quick->index);
> + table->file->ha_start_keyread(idx);
why do you need an if() here - you already checked that in
get_covering_index_for_scan().
I generally don't like this code at all. In JOIN_CACHE you simply
use tab->get_covering_index_for_scan(), but here you use it
*plus* a bunch of if()s afterwards. It surely creates an impression
that JOIN_CACHE and make_join_readinfo may have a different idea
about whether keyread is used.
Ideally here should be just
if (idx != MAX_KEY)
tab->start_keyread(idx);
and even then it would be kinda questionable (because preconditions
might change between two get_covering_index_for_scan() calls).
> else if (!table->covering_keys.is_clear_all() &&
> !(tab->select && tab->select->quick))
> - { // Only read index tree
> - if (tab->loosescan_match_tab)
> - tab->index= tab->loosescan_key;
> - else
> - {
> -#ifdef BAD_OPTIMIZATION
> - /*
> - It has turned out that the below change, while speeding things
> - up for disk-bound loads, slows them down for cases when the data
> - is in disk cache (see BUG#35850):
> - See bug #26447: "Using the clustered index for a table scan
> - is always faster than using a secondary index".
> - */
> - if (table->s->primary_key != MAX_KEY &&
> - table->file->primary_key_is_clustered())
> - tab->index= table->s->primary_key;
> - else
> -#endif
> - tab->index=find_shortest_key(table, & table->covering_keys);
> - }
> + { // Only read index tree
> + tab->index= idx;
> tab->read_first_record= join_read_first;
> + table->file->ha_start_keyread(tab->index);
> /* Read with index_first / index_next */
> tab->type= tab->type == JT_ALL ? JT_NEXT : JT_HASH_NEXT;
> }
Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx