maria-developers team mailing list archive

Thread
Date

Re: Accessing read records during join for condition pushdown

To: Sergey Petrunia <sergey@xxxxxxxxxxx>
From: "Berrocal, Eduardo" <eduardo.berrocal@xxxxxxxxx>
Date: Thu, 22 Jul 2021 00:28:09 +0000
Accept-language: en-US
Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none
Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=DG68420JqPluqQmkPlJ097/hONIyvuQ0o3id5h2IBdw=; b=CDzXsS6uEvcdWgX4xhW39kPbQ/LnLHhSCwPpA9FgbM5AZMoAlFmiHMDwVVKResNQzXQ6U0QGzgoPg6Uf20HAurtVCRU9zqtx2ofAmVmKo67O/201WY7pOG24lP53xXxtIJ97mY2aErTC8a9P5/HXfRfTqGSTt7RkxKHvyK5sDY1CAdkTwxI/KazhWBox8BZx+VgkvjwkLFI9Hl6+IV7wctDtdLXreavt19deHysdpSAN9wK1AW1Y8TqIdKN0DGkNzzN1XeNGzHJxGKG8C29HmOs0X7ck9pFYbzZko2DvYbkQ6mCt1OHr5W3uy77PsZgKzKpnCUXeoC5TglN1zGSYaA==
Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=G8YiR6t9yGVMrOXMoGCBvFtB726nR+663UzR/Wk7YtLd2+HCVNE+k8+M0OiLrveJf80i3XfxRbEoqi4JFrpvBndW5PGAf2Npmli+nFx5s+iLf72G/v2pVgkGO2O5CDHZ++z2RHXcN1Xd3+rvm3wJipy7Uv8vHvT2a7m+WOrZGVvTPEREaq72Q8KholNUeo8Ltg7BVfw6LsmNgu3lFlQjHaVc2044Hc0JvFt/yJ9KcFEqXi5w/n5Tt9SqNO9XtTAqtMhV8wx2VcaM8jwSHd1cNWX/AEqSerSN4RYIka9CqspNkOYvTeCDekMz/kagPML06H+ngtHFxn8cU27nPolGTw==
Authentication-results: mariadb.com; dkim=none (message not signed) header.d=none;mariadb.com; dmarc=none action=none header.from=intel.com;
Cc: "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
Dlp-product: dlpe-windows
Dlp-reaction: no-action
Dlp-version: 11.5.1.3
In-reply-to: <20210714092548.GB19790@pslp4>
Thread-index: Addyu53/Di069vVzQH+2/nbHVe69JABEUwuAABbMdoAAtsX4gAAcMSkwAEeQmQABf37j0A==
Thread-topic: [Maria-developers] Accessing read records during join for condition pushdown

Hello Sergei,

Thanks a lot for your response. The way you showed seems to be the way to go.
There is just one problem: all those functions are protected inside the JOIN_CACHE (and JOIN_CACHE_*)
Classes. Which seems to suggest that code changes to the MariaDB core code is a necessity in this case.

Cheers,

Eduardo Berrocal García de Carellán
Senior Software Engineer

Intel Corporation  |  intel.com

-----Original Message-----
From: Sergey Petrunia <sergey@xxxxxxxxxxx> 
Sent: Wednesday, July 14, 2021 2:26 AM
To: Berrocal, Eduardo <eduardo.berrocal@xxxxxxxxx>
Cc: Sergei Golubchik <serg@xxxxxxxxxxx>; maria-developers@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Maria-developers] Accessing read records during join for condition pushdown

Hi Eduardo,

On Mon, Jul 12, 2021 at 11:30:35PM +0000, Berrocal, Eduardo wrote:
>     "A question: do you intend to push down and check conditions in arbitrary form?
>     In case they are just equalities, it might make sense for storage engine to pretend having an index and then Batched Key    
>     Access optimization will be used, and the SQL layer will provide batches of keys to lookup."
> 
> 
> The answer is yes. I would like to check more than just equalities, if possible. The Batched key access is something worth looking into, in case that that can be used as a way to speed up at least conditions with equalities. 
> 
> A follow up question for you: When you say that traversing the linked 
> join buffers is difficult... how difficult exactly? Like a linked list, a tree...
> ?  I still think it is worth exploring in our case given that the 
> speedup can be huge if we can avoid a full scan of a table with TBs of data.
> 

Iterating the join buffer shouldn't be hard. It is just a memory area which is filled with variable-size records which have certain fields.

Debugging a query that uses join buffers, one can see that the iteration over contents of the join buffer is done in JOIN_CACHE::join_matching_records()
using these calls:

  prepare_look_for_matches();
  while (get_next_candidate_for_match()) {
     read_next_candidate_for_match();
  }

The code in JOIN_CACHE::join_matching_records() is complex, because it does other things besides just iterating the cache:

* Checking of "first match" flag for queries which only need one matching row from the second table.

* Handling outer JOINs, where we apply the restriction from ON expression always, and apply the restriction from WHERE expression only after we've figured out if LEFT JOIN has a match.

AFAIU, you should ignore these inside the storage engine (produce all matches, don't check the Item_func_trig_cond items).  This will make your copy of join_matching_records simpler.

BR
 Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net

Follow ups

Re: Accessing read records during join for condition pushdown
From: Sergey Petrunia, 2021-07-22

References

Accessing read records during join for condition pushdown
From: Berrocal, Eduardo, 2021-07-06
Re: Accessing read records during join for condition pushdown
From: Sergei Golubchik, 2021-07-08
Re: Accessing read records during join for condition pushdown
From: Berrocal, Eduardo, 2021-07-08
Re: Accessing read records during join for condition pushdown
From: Sergey Petrunia, 2021-07-12
Re: Accessing read records during join for condition pushdown
From: Berrocal, Eduardo, 2021-07-12
Re: Accessing read records during join for condition pushdown
From: Sergey Petrunia, 2021-07-14