maria-discuss team mailing list archive
-
maria-discuss team
-
Mailing list archive
-
Message #02379
Re: stored programs
Hi,
I was a little confusing. The query orders of the order by columns (salary
is project out into the temporary table) but, later ob_hash is used. This
is because the frame extends to all the values the order the same, and
comparing multiple columns is hard, so a hash is used instead.
--Justin
On Tue, Mar 3, 2015 at 1:28 PM, Justin Swanhart <greenlion@xxxxxxxxx> wrote:
> Hi,
>
> Well here is how Shard-Query does them. I assume you would do something
> very similarly internally with a temp table.
>
> a) it create a temporary table for the query reserving null rows for the
> window functions
> b) it adds to the temporary a unique id for each row of the resultset.
> This is used for framing.
> c) it adds a hash of the order by columns for the window function for
> ordering
> d) it adds a hash of the partition columns for partitioning
>
> After the regular resultset is stored in the temp table, a function sweeps
> the table for
> each window function, calculating the result of the function based on the
> framing clause,
> then the column in the resultset is updated to reflect the computed value.
>
> Finally, the resultset is returned to the client.
>
> Here is the SQL log of the following query:
> mysql> call shard_query.sq_helper("SELECT depname, empno, salary,
> cume_dist() OVER (PARTITION BY depname ORDER by salary rows between 1
> following and 1 following) ss FROM empsalary", "", 'test','testtab',1,1);
> +-----------+-------+--------+------------------+
> | depname | empno | salary | ss |
> +-----------+-------+--------+------------------+
> | develop | 7 | 4200 | 0.2 |
> | develop | 9 | 4500 | 0.4 |
> | develop | 11 | 5200 | 0.8 |
> | develop | 10 | 5200 | 0.8 |
> | develop | 8 | 6000 | 1 |
> | sales | 3 | 4800 | 0.66666666666667 |
> | sales | 4 | 4800 | 0.66666666666667 |
> | sales | 1 | 5000 | 1 |
> | personnel | 5 | 3500 | 0.5 |
> | personnel | 2 | 3900 | 1 |
> +-----------+-------+--------+------------------+
> 10 rows in set (0.08 sec)
>
> Query OK, 0 rows affected (0.09 sec)
>
> 150303 13:12:05 1533 Query call shard_query.sq_helper("SELECT depname,
> empno, salary, cume_dist() OVER (PARTITION BY depname ORDER by salary rows
> between 1 following and 1 following) ss FROM empsalary", "",
> 'test','testtab',1,1)
>
> -- get meta data for resultset (notice 0=1 in where clause)
> 1553 Query SELECT NULL as wf_rownum, depname AS expr$0,empno AS
> expr$1,salary AS expr$2,NULL as wf0,depname,SHA1(CONCAT_WS('#',depname)) as
> wf0_hash,salary,SHA1(CONCAT_WS('#',salary)) as wf0_obhash
> FROM empsalary AS `empsalary` WHERE 1=1 LIMIT 0
>
> -- create temp table
> 1553 Query CREATE TABLE aggregation_tmp_45403179 (wf_rownum bigint
> auto_increment primary key,expr$0 VARCHAR(255),expr$1 VARCHAR(255),expr$2
> VARCHAR(255),wf0 VARCHAR(255),depname VARCHAR(255),wf0_hash
> VARCHAR(255),salary VARCHAR(255),wf0_obhash VARCHAR(255)) ENGINE=MYISAM
>
> -- get resultset
> 1556 Query SELECT NULL as wf_rownum, depname AS expr$0,empno AS
> expr$1,salary AS expr$2,NULL as wf0,depname,SHA1(CONCAT_WS('#',depname)) as
> wf0_hash,salary,SHA1(CONCAT_WS('#',salary)) as wf0_obhash
> FROM empsalary AS `empsalary` WHERE 1=1
>
> -- store resultset
> 1557 Query INSERT INTO `aggregation_tmp_45403179` VALUES
> (NULL,'develop',11,5200,NULL,'develop','418a6bc4deccf0f7d5182192d51a54e504b3f3c9',5200,'2fad4efb8e6aaaa53ecdf638137d390e002f9783'),(NULL,'develop',7,4200,NULL,'develop','418a6bc4deccf0f7d5182192d51a54e504b3f3c9',4200,'c79fce75b1583ddd36a96178757e0d8d0ac91228'),(NULL,'develop',9,4500,NULL,'develop','418a6bc4deccf0f7d5182192d51a54e504b3f3c9',4500,'97a87b470fe9ed5ff51ff9b8543e937e6016d48c'),(NULL,'develop',8,6000,NULL,'develop','418a6bc4deccf0f7d5182192d51a54e504b3f3c9',6000,'31d9ddeaa80bc88c1f3117b9724726ebcc7fc72d'),(NULL,'develop',10,5200,NULL,'develop','418a6bc4deccf0f7d5182192d51a54e504b3f3c9',5200,'2fad4efb8e6aaaa53ecdf638137d390e002f9783'),(NULL,'personnel',5,3500,NULL,'personnel','fc08ce9ebee8734c2dc883c0dbd607686bdce8f3',3500,'65609286cc04ece831a844984a6bc9eb80450cf7'),(NULL,'personnel',2,3900,NULL,'personnel','fc08ce9ebee8734c2dc883c0dbd607686bdce8f3',3900,'5446569e8572251dcac168152d6c37074427eae3'),(NULL,'sales',3,4800,NULL,'sales','59248c4dae276a021cb296d2ee0e6a0c962a8d7f',4800,'98e9f55262269a05dd4f1ed788626580fdef2e95'),(NULL,'sales',1,5000,NULL,'sales','59248c4dae276a021cb296d2ee0e6a0c962a8d7f',5000,'f8237d8959e03355010bb85cc3dc46a46fb31110'),(NULL,'sales',4,4800,NULL,'sales','59248c4dae276a021cb296d2ee0e6a0c962a8d7f',4800,'98e9f55262269a05dd4f1ed788626580fdef2e95')
>
> -- handle window function 0 (the only one in this case)
>
> -- get the hashes for each partition
> 1553 Query SELECT distinct wf0_hash h from aggregation_tmp_45403179
> ORDER BY salary asc
>
> -- compute the values for each partition (three in this case)
>
> -- process the window and update the temp table (see code at the end for
> wf_cume) for each partition in turn.
> -- as you can see there is a select followed by updates
>
> 1553 Query SELECT *,NULL as wf0
> FROM `aggregation_tmp_45403179` where
> wf0_hash='fc08ce9ebee8734c2dc883c0dbd607686bdce8f3' ORDER BY salary asc
>
>
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 0.5 WHERE wf_rownum
> in (6)
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 1 WHERE wf_rownum
> in (7)
> 1553 Query SELECT *,NULL as wf0
> FROM `aggregation_tmp_45403179` where
> wf0_hash='59248c4dae276a021cb296d2ee0e6a0c962a8d7f' ORDER BY salary asc
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 0.66666666666667
> WHERE wf_rownum in (8,10)
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 1 WHERE wf_rownum
> in (9)
> 1553 Query SELECT *,NULL as wf0
> FROM `aggregation_tmp_45403179` where
> wf0_hash='418a6bc4deccf0f7d5182192d51a54e504b3f3c9' ORDER BY salary asc
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 0.2 WHERE wf_rownum
> in (2)
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 0.4 WHERE wf_rownum
> in (3)
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 0.8 WHERE wf_rownum
> in (1,5)
> 1553 Query UPDATE aggregation_tmp_45403179 SET wf0 = 1 WHERE wf_rownum
> in (4)
>
> -- return resultset to client
>
> 1559 Query SELECT expr$0 AS `depname`,expr$1 AS `empno`,expr$2 AS
> `salary`,wf0 as `ss`
> FROM `aggregation_tmp_45403179` ORDER BY wf0_hash,salary asc
>
> -- remove temp table
>
> 1559 Query DROP TABLE IF EXISTS aggregation_tmp_45403179
> 1559 Quit
> 1550 Quit
>
> 150303 13:12:12 1533 Query set global general_log=0
>
>
>
> protected function wf_cume_dist($num,$state) {
> static $sum;
> $win = $state->windows[$num];
> if(empty($win['order'])) {
> if($percent)
> $sql = "update " . $state->table_name . " set wf{$num}=1";
> else
> $sql = "update " . $state->table_name . " set wf{$num}=0";
>
> $state->DAL->my_query($sql);
> if($err = $state->DAL->my_error()) {
> $this->errors[] = $err;
> return false;
> }
> return true;
> } else {
> /* running sum*/
> $sql = "SELECT distinct wf{$num}_hash h from " . $state->table_name
> . " ORDER BY " . $win['order_by'];
> $stmt = $state->DAL->my_query($sql);
> if($err = $state->DAL->my_error()) {
> $this->errors[] = $err;
> return false;
> }
> $last_hash = "";
> $hash = "";
> $last_ob_hash = "";
> $ob_hash = "";
> while($row = $state->DAL->my_fetch_assoc($stmt)) {
> #$sql = "select * from " . $state->table_name . " where
> wf{$num}_hash='" . $row['h'] . "' ORDER BY " . $win['order_by'];
> $sql = "SELECT *," . $state->winfunc_sql . " where
> wf{$num}_hash='" . $row['h'] . "' ORDER BY " . $win['order_by'];
> $stmt2 = $state->DAL->my_query($sql);
> if($err = $state->DAL->my_error()) {
> $this->errors[] = $err;
> return false;
> }
> $done=array();
> $rows=array();
> while($row2=$state->DAL->my_fetch_assoc($stmt2)) {
> $rows[] = $row2;
> }
> $last_hash = "";
> $last_ob_hash = "";
> $i = 0;
> $rowlist="";
> $rank = 0;
>
> while($i<count($rows)) {
> $row2 = $rows[$i];
> ++$rank;
> $ob_hash = $row2["wf{$num}_obhash"];
> $rowlist=$row2['wf_rownum'];
> for($n=$i+1;$n<count($rows);++$n) {
> $row3 = $rows[$n];
> $new_ob_hash = $row3["wf{$num}_obhash"];
> if($new_ob_hash != $ob_hash) {
> break;
> }
> $rowlist .= "," . $row3['wf_rownum'];
> ++$i;
> ++$rank;
> }
> $dist = $rank/count($rows);
> $sql = "UPDATE " . $state->table_name . " SET wf{$num} = {$dist}
> WHERE wf_rownum in ({$rowlist})";
> $state->DAL->my_query($sql);
> if($err = $state->DAL->my_error()) {
> $this->errors[] = $err;
> return false;
> }
> ++$i;
> }
> }
> }
> return true;
> }
>
>
>
> On Tue, Mar 3, 2015 at 12:55 PM, Igor Babaev <igor@xxxxxxxxxxxx> wrote:
>
>> On 03/03/2015 10:25 AM, Sergei Golubchik wrote:
>> > Hi, Igor!
>> >
>> > On Mar 03, Igor Babaev wrote:
>> >>>
>> >>>> I'd also like to discuss window functions too. I've implemented them
>> >>>> in shard-query and have ideas about how to implement them in the
>> >>>> server, but pluggable parser would be really useful here.
>> >>>
>> >>> Window functions have a good chance of being in 10.2, it's MDEV-6115.
>> >>> But I don't think that somebody is working on MDEV-6115 yet.
>> >>
>> >> I started working on MDEV-6115 some time ago.
>> >
>> > Ah, great. Sorry, I didn't know it.
>> > Could you then discuss it with Justin, please?
>> > See above, he has some ideas about the implementation.
>>
>>
>> Justin,
>> How do you prefer discussing your ideas?
>> On IRC (#maria) or by phone? (I don't have skype at the moment)
>>
>> Regards,
>> Igor.
>>
>> >
>> > Regards,
>> > Sergei
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~maria-discuss
>> Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~maria-discuss
>> More help : https://help.launchpad.net/ListHelp
>>
>
>
Follow ups
References