← Back to team overview

maria-developers team mailing list archive

Re: engine condition pushdown

 

Hi Sergej,
attached you will find a file with commands for generating the effect.
It's not a script so please copy and paste the lines into your
favourite front-end.
Here is the result on my test:
# the result in cond_push():
for TestSmall: OK
for TestBig: OK

start scanning TestSmall: OK

1st scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] name=<PZN>

2nd scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN]
str_length=<2>  str_value=<37>  name=<PZN>

3rd scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] str_length=<7>  str_value=<222    >
name=<PZN>

4th scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] str_length=<3>  str_value=<371>
name=<PZN>

5th scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] str_length=<7>  str_value=<3717997>
name=<PZN>

as you cann see from the INSERT-statement above it is:
1st: is <>,         should have been <12>
2nd: is <2>,        should have been <3717968>
3rd: is <222    >,  should have been <222>
4th: ia <371>,      should have been <3717980>
5th: is <3717997>,  this is correct!The data is almost identical to the data in my text.Hope this helps.
AugustQ
Am Sonntag, den 19.02.2017, 17:32 +0100 schrieb Sergei Golubchik:
> Hi, AugustQ!
> 
> First, there won't be an answer to your question here, sorry :(
> 
> A couple of thoughts:
> 
> cond->str_value is not the how you get the value of an Item.
> Item::str_value is kind of a internal storage in the Item, can be
> used
> as a cache or temporary value holder or anything. Or not used at all,
> it's up to the individual item implementation. For some items it
> might
> hold the value.
> 
> Condition pushdown was originally created for engines like NDB, that
> store data remotely and need to send to the SQL executor for
> processing.
> With condition pushdown they could avoid sending data that will be
> anyway discarded by the server. So, if something isn't pushed down,
> it
> may be because pushing it wasn't helping in that particular use case.
> 
> Now, if you'd provided a complete test case - an sql file that I
> could
> just feed into mysql command line client and it would create and
> populate tables and execute your query - I could've looked in a
> debugger and answer your question exactly.
> 
> Without a test case - I tried to, but couldn't (read your blog too).
> 
> On Feb 16, AugustQ wrote:
> > 
> > 
> > by playing with a SQL-statement I encountered a problem. I'm not
> > sure
> > if this is a bug or if I did not use the correct method to get the
> > information I expected. 
> > 
> > So let me please explain the situation in which I encountered this
> > problem. But it will take some lines of text to come to the point.
> > 
> > My environment: MariaDB 10.0.10 source-code-version on Linux.
> > 
> > The statement: select SQL_NO_CACHE   B.PZN, B.ArtikelText  from
> > TestSmall B where exists ( select 1 from TestBig A where
> > A.Hersteller =
> > '36367' and A.PZN = B.PZN) and B.Hersteller = '00020'
> > Engine used is MyISAM but the effect should be independent of any
> > engine (not checked). No indexes exist on the tables involved.
> > 
> > You will find a lot more of details here: http://augustq.blogspot.c
> > om/2
> > 017/02/subselect-execution-4.html
> > 
> > Execution of this (silly) statement is done by a table-scan on the
> > table TestSmall. When a record is found the matches the WHERE-
> > condition 
> > the server switches over to the table TestBig and does a table-scan 
> > on
> > this table. When a match is found it returns to the table
> > TestSmall,
> > otherwise the table TestBig is read to the end and then the server
> > returns to the table TestSmall. The server continues scanning the
> > table
> > TestSmall, switching over to TestBig, returning to TestSmall,
> > continues
> > reading TestSmall and so on, until it reaches the end of the table
> > TestSmall.
> > 
> > Of interest for me was the condition tree which can be inspected by
> > looking at the function cond_push() (in my case
> > ha_myisam::cond_push()).
> > 
> > So let's start.
> > 
> > The function cond_push() is initally called for the tables
> > TestSmall
> > and TestBig, this looks good.
> > 
> > Then the server starts reading the table TestSmall and therefore
> > the
> > function cond_push()  is called again. This looks good too.
> > 
> > Now a matching record is found  in TestSmall: it switches over to
> > the
> > table TestBig and calls cond_push() with the condition-tree for
> > this
> > table. This tree looks like this:
> > 
> >     COND-ITEM   args: 0 type=[COND_AND_FUNC]    
> > FUNC-ITEM   [=] args: 2 type=[EQ_FUNC]  
> > FIELD-ITEM  [TestOpt] [A] [Hersteller]  name=<Hersteller>
> > STRING-ITEM     str_length=<5>  str_value=<36367>   name=<36367>
> > FUNC-ITEM   [=] args: 2 type=[EQ_FUNC]  
> > FIELD-ITEM  [TestOpt] [A] [PZN] name=<PZN>
> > FIELD-ITEM  [TestOpt] [B] [PZN] name=<PZN>
> > 
> > The last line is of interest (I've marked it in bold). For the
> > table
> > with the alias B (= TestSmall) no value is given for the column PZN
> > but
> > this value is available. This is the first point of interest.
> > 
> > Let's continue with the operation. When the table TestBig is read
> > to
> > the end, the scanning of TestSmall continues and a matching record
> > is
> > found over there the server again switches over to TestBig. Before
> > it
> > starts reading the records from this table the function cond_push()
> > is
> > called again. This is the output of the condition-tree, restricted
> > to
> > the line of interest (the last line):
> > 
> >     FIELD-ITEM  [TestOpt] [B] [PZN]
> >     str_length=<2>  str_value=<37>  name=<PZN>
> > 
> > So let's continue: when it switches to TestBig for the 3rd time
> > this
> > output looks like:
> > 
> >     FIELD-ITEM  [TestOpt] [B] [PZN]
> > str_length=<7>  str_value=<222    >
> >     name=<PZN>
> > 
> > Enough of data. I can deliver more examples if needed or you may
> > look
> > into my text.
> > 
> > Judgement:
> > The output of the first case des not show the value of the column
> > PZN
> > from the current record in TestSmall although the value is
> > available.
> > 
> > The output in the second case shows the value 37 (because of the
> > length
> > eq. 2) but the correct value would be 3717968 (and length eq. 7).
> > 
> > The output in the third case shows the value 222 followed by 4
> > spaces
> > (length eq. 7), but the correct value is 222 (length eq. 3).
> > 
> > What I see is the correct value (exclude the first example
> > presented
> > here) but the length-information is one row behind (one row of
> > TestSmall).
> > 
> > For accessing the information presented here I used COND-
> > >str_value.
> > 
> > So the question is:
> > 
> > 	- the information is available but I used the wrong
> > function/variable?
> > 
> > OR
> > 
> > 	- is this a bug in the code?
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
# the statements:

create database TestECP;

use TestECP;

CREATE TABLE `TestSmall` (
  `Id` char(8) DEFAULT NULL,
  `PZN` char(7) DEFAULT NULL,
  `EVP` decimal(7,2) DEFAULT NULL,
  `HAP` decimal(7,2) DEFAULT NULL,
  `ArtikelBez` varchar(40) DEFAULT NULL,
  `ArtikelText` varchar(26) DEFAULT NULL,
  `Hersteller` char(5) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
;

CREATE TABLE `TestBig` (
  `Id` char(8) DEFAULT NULL,
  `PZN` char(7) DEFAULT NULL,
  `EVP` decimal(7,2) DEFAULT NULL,
  `HAP` decimal(7,2) DEFAULT NULL,
  `ArtikelBez` varchar(40) DEFAULT NULL,
  `ArtikelText` varchar(26) DEFAULT NULL,
  `Hersteller` char(5) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
;

insert into TestSmall values( '1002100', '12',      3.95, 1.83, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '00020'), 
                            ( '1025266', '3717968', 0.00, 2.90, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '00020'),
                            ( '1025267', '222',     0.00, 4.45, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '00020'),
                            ( '1025268', '3717980', 0.00, 6.30, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '00020'),
                            ( '1025269', '3717997', 0.00, 6.30, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '00020') ;
                            

insert into TestBig values ( '01000001', '111',     0.00,    18.91,  'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000025', '999999',  82.95,   0.00,   'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000045', '222',     0.00,    383.92, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000072', '999999',  0.00,    18.32,  'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000110', '999999',  0.00,    0.00,   'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000114', '999999',  13.06,   12.54,  'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000118', '204665',  25.43,   7.87,   'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000120', '999999',  0.00,    0.00,   'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000139', '999999',  11.80,   3.90,   'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367'),
                           ( '01000141', '999999',  0.00,    0.00,   'XXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'XXXXXXXXXXXXXXXXXXXXXXXX', '36367') ;

# to produce the effect described:
set optimizer_switch='engine_condition_pushdown=on';

select SQL_NO_CACHE   B.PZN, B.ArtikelText  from TestSmall B where exists ( select 1 from TestBig A where A.Hersteller = '36367' and A.PZN = B.PZN) and B.Hersteller = '00020';

# the result in cond_push():
for TestSmall: OK
for TestBig: OK

start scanning TestSmall: OK

1st scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] name=<PZN>

2nd scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] str_length=<2>  str_value=<37>  name=<PZN>

3rd scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] str_length=<7>  str_value=<222    > name=<PZN>

4th scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] str_length=<3>  str_value=<371> name=<PZN>

5th scanning TestBig:
FIELD-ITEM  [TestECP] [B] [PZN] str_length=<7>  str_value=<3717997> name=<PZN>

as you cann see from the INSERT-statement above it is:
1st: is <>,  should have been <12>
2nd: is <2>, should have been <3717968>
3rd: is <222    >, should have been <222>
4th: ia <371>, should have been <3717980>
5th: is <3717997>, this is correct!

Attachment: signature.asc
Description: This is a digitally signed message part


Follow ups

References