maria-developers team mailing list archive

Thread
Date

Re: Accessing read records during join for condition pushdown

To: "Berrocal, Eduardo" <eduardo.berrocal@xxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Mon, 12 Jul 2021 12:49:27 +0300
Cc: "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CO1PR11MB47702ED269545971656CC9D18B199@CO1PR11MB4770.namprd11.prod.outlook.com>
User-agent: Mutt/1.9.4 (2018-02-28)

Hi Eduardo,

On Thu, Jul 08, 2021 at 06:43:05PM +0000, Berrocal, Eduardo wrote:
> I have tried to use that route but without success. The problem is that I can only access the last record read from table1. 
> 
Yes, this is expected.

> Let say that, during the JOIN, 3 rows are read from table1. These 3 rows will
> be compared against all rows from table2. I want to avoid a full scan of
> table2 by passing the value for the field from those 3 rows down to the
> storage engine. 

Looks reasonable.

> When condition pushdown is called on table2, and I try to
> access those 3 rows, I can only access the last one read through
> f1->field->ptr (or f1->table->record[0]).
> 
> For what I understand, f1->table does not store an array of read records
> during JOIN. Am I right? That is stored in the JOIN_CACHE object. Please
> correct me if I am wrong here.

You're correct.

Other storage engines have only requested pushdown of basic conditions, so the 
code at SQL layer side doesn't handle the case where the Storage Engine would 
want to check the conditions that refer to the contents of the join buffer (aka
JOIN_CACHE structure).

Note that join buffer can get complex for multi-table joins.
Consider a query:

select * 
from 
  t1, t2, t3 
where
    t1.col1=t3.col1 and t2.col2=t3.col2

And a query plan (one gets something like this by default):

+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                                  |
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 10   |                                                        |
|    1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL | 20   | Using join buffer (flat, BNL join)                     |
|    1 | SIMPLE      | t3    | ALL  | NULL          | NULL | NULL    | NULL | 40   | Using where; Using join buffer (incremental, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------+

Suppose t3 is your storage engine table. you would want to push down the whole

    t1.col1=t3.col1 and t2.col2=t3.col2

When the join buffer is "incremental" (like in the above EXPLAIN), it will
actually be two join buffers linked through pointers. Traversing these might
be complicated.

(whether join buffer is incremental is controlled by this setting:
https://mariadb.com/kb/en/server-system-variables/#join_cache_level)

A question: do you intend to push down and check conditions in arbitrary form? 
In case they are just equalities, it might make sense for storage engine to
pretend having an index and then Batched Key Access optimization will be used,
and the SQL layer will provide batches of keys to lookup. 

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net

Follow ups

Re: Accessing read records during join for condition pushdown
From: Berrocal, Eduardo, 2021-07-12

References

Accessing read records during join for condition pushdown
From: Berrocal, Eduardo, 2021-07-06
Re: Accessing read records during join for condition pushdown
From: Sergei Golubchik, 2021-07-08
Re: Accessing read records during join for condition pushdown
From: Berrocal, Eduardo, 2021-07-08