← Back to team overview

maria-developers team mailing list archive

Re: c76ac1e6de8: MDEV-24089 support oracle syntax: rownum

 

Hi!

On Mon, Apr 19, 2021 at 10:52 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
>
> Hi, Monty!
>
> On Apr 18, Michael Widenius wrote:
> > > > diff --git a/mysql-test/main/rownum.test b/mysql-test/main/rownum.test
> > > > new file mode 100644
> > > > index 00000000000..8d89eed2f86
> > > > --- /dev/null
> > > > +++ b/mysql-test/main/rownum.test
> > > > @@ -0,0 +1,542 @@
> > > > +--source include/have_sequence.inc
> > > > +#
> > > > +# Test of basic rownum() functionallity
> > > > +# Test are done with Aria to ensure that row order is stable
> > >
> > > what does that mean?
> >
> > It means that SELECT * will return the rows in the same order for the test.
> > (There is normally no guaranteed that insert and select order are the same).
>
> But both MyISAM and InnoDB guarantee that too. What's so special with
> Aria?

MyISAM does guarantee that, but not InnoDB (returns rows in primary
key order) and
Aria with page storage will not do it either.


> > > > +  OPEN c_cursor;
> > > > +  FETCH c_cursor INTO v_a, v_b, v_rn;
> > > > +  WHILE c_cursor%FOUND LOOP
> > > > +     SELECT concat(v_a,'--',v_b,'--',v_rn);
> > > > +     FETCH c_cursor INTO v_a, v_b, v_rn;
> > > > +  END LOOP;
> > > > +  CLOSE c_cursor;
> > > > +END;|
> > > > +DELIMITER ;|
> > > > +
> > > > +--error ER_FUNCTION_CANNOT_BE_USED_IN_CLAUSE
> > > > +select a, rownum from t1 group by a, rownum having rownum < 3;
> > > > +select a, rownum from t1 group by a, rownum having "rownum" < 3;
> > >
> > > what's the difference between these two? Why the first is an error and the
> > > second is not?
> >
> > There are quotes around rownum in the second version.  In Oracle mode that is
> > an error, in MariaDB mode it is not.
>
> I can see the quotes, that's what I asked about.
> Why *in Oracle mode* the first is an error and the second is not?
> I'd expect them to be identical.

In Oracle you cannot use rownum function in having.
You can however, thanks to MariaDB, refer to a column in the select part.

> > > > +select a, rownum as r from t1 group by a, rownum having r < 3;
> > >
> > > And the this one. Why is it not an error either?
> >
> > Because 'r' is a reference, not a function.  'r' works on the result set.
> > rownum cannot be executed in the having clause (which is executed
> > after all rows are returned).
>
> I don't understand it. HAVING is not executed *after* all rows are
> returned, it's executed *while* they're returned. What is the difference
> between rownum directly in HAVING and referring to it by an alias?

having rownum < 0 is NOT using an alias.

You can not use a row function on a group result.
The group does NOT have a rownum.  Each group consist of a set of rows
each having it's own rownum.  Which rownum should the rownum usage in
having get?

To put things simple, we are doing exactly what Oracle is doing, which
is the purpose of this patch.

<cut>

> > > how can rownum be NULL?
> >
> > In context where there are no rows.   like SET @A=rownum;
>
> Do you have test cases for that?

Yes, of course. Was part of the patch:

--echo #
--echo # Rownum() used in not supported places (returns 0 or gives an error)
--echo #

set @a=rownum();
select @a;

> > > > +    return FALSE;
> > > > +  }
> > > > +  void cleanup() override
> > > > +  {
> > > > +    Item_longlong_func::cleanup();
> > > > +    /* Ensure we don't point to freed memory */
> > > > +    accepted_rows= 0;
> > > > +  }
> > > > +  bool check_vcol_func_processor(void *arg) override
> > > > +  {
> > > > +    return mark_unsupported_function(func_name(), "()", arg,
> > > > +                                     VCOL_IMPOSSIBLE);
> > > > +  }
> > > > +  bool check_handler_func_processor(void *arg) override
> > > > +  {
> > > > +    return mark_unsupported_function(func_name(), "()", arg,
> > > > +                                     VCOL_IMPOSSIBLE);
> > >
> > > first, as I wrote above, I think that there's no logical reason why rownum should
> > > not work with handler.
> >
> > Please read the patch. It cannot be in the handler for multiple reasons:
> > - INSERT, UPDATE, LOAD DATA
> > - handler does not know if the row is accepted
> > - joins
> >  etc.
>
> I mean the HANDLER statement, that's why my comment is just below
> check_handler_func_processor() method.

Oracle does not support HANDLER, so there is no point in even trying.
Also with handler, you have full control of the rows that you are getting,
so there is no extra benefit of having rownum.


> > > but even if it doesn't, it's much better to check for SQLCOM_HA_READ
> > > from Item_func_rownum::fix_fields instead of traversing the whole
> > > item tree for every single HANDLER READ command looking for Item_func_rownum
> > > that nobody (literally, not a single user out of all our millions) will use there.
> >
> > There is only a traverse of the tree looking for rownum if the rownum
> > function is used.
> >
> > Why do you think there is some extra travelling of the tree for HANDLER_READ ?
>
> Because, see above, this comment refers to
> check_handler_func_processor() method this is called on every item by
> walrking the item tree.

We do not do this for every item, we only do it for items in the where condition
which are in most cases very few and thus almost no overhead.
I do not think it would be good to have switch over all possible items
that cannot
be used in the handler and test for those in the caller.

Generally, I don't think items should have a list of command where
they cannot be used.
Normally it is the caller that should do the checking.

> As there is only one item weird enough to possibly need it, I'd rather
> do the check in the item directly.

And when we flag more items to not be used for handler, or other cases
we whould add a list
of commands that we check against in all of them?

I do not think it is worth the trouble for doing it in this case as I
am sure that there are other
items that we should also prohibit with handler read (like sub queries
for example).

> > > > +  }
> > > > +  Item *get_copy(THD *thd) override
> > > > +  {
> > > > +    return this;
> > >
> > > It looks wrong. We don't have any other Item that returns `this` here.
> > > I think rownum cannot be pushed down because of RAND_TABLE_BIT,
> > > So better return NULL here, meaning "cannot be pushed down".
> > > Instead of `this` meaning "if you push it down, it'll surreptitiously
> > > start corrupting the memory"
> >
> > I don't think it can trash memory even if it would be pushed down.
> > As long as the item exists, it should be safe to use anywhere.
> > I also don't trust the code to be able to cope with returning null
> > in get_copy.
>
> Quite the contrary. There are other items that return NULL from get_copy().
> It's expected and normal, the code is designed to support it, it was
> part of the original implementation. No get_copy() returns this, the
> code is not designed to handle it.

Actually, originally we had it in many places and it did work and thus
we supported it.
There is no reason to assume that this would not work in this case.
The reason this works is that Item_rownum does not change it null
state between copies and does not store it's value in itself.
That is the critera for an Item that can return itself.


> > > > +  /* This function is used in insert, update and delete */
> > > > +  void store_pointer_to_row_counter(ha_rows *row_counter)
> > >
> > > So this isn't a virtual method, and to use it you downcast after
> > > checking type() == FUNC_ITEM and functype() == ROWNUM_FUNC. To avoid
> > > adding a new virtual method thus blowing up the vtable, I presume.
> >
> > Yes, it is not a virtual and should not be.  And yes, we do not want
> > to add any new virtual functions to items when we have close to 600+
> > items in the system...
> >
> > It is also much faster to do an if on upper level than doing a
> > function call.
> >
> > > And fix_after_optimize() is virtual. To avoid error-prone
> > > downcasting, would be my guess.
> >
> > Yes. This is something we may need for other items in the future, which
> > is why I made it virtaul.
>
> Do you have in mind any other items that will need it any time soon?
> If not, then I agree with what you wrote above, we do not want to add
> any new virtual functions to items when we have close to 600+ items in
> the system.

I don't know just now of any items that would need it, except maybe
Item_equal, that is currently sub optimal.  However, I will know much more
after I start working full time on the optimizer.


> > > > --- a/sql/item_func.cc
> > > > +++ b/sql/item_func.cc
> > > > @@ -7206,3 +7206,72 @@ void Item_func_setval::print(String *str, enum_query_type query_type)
> > > >    str->append_ulonglong(round);
> > > >    str->append(')');
> > > >  }
> > > > +
> > > > +
> > > > +/*
> > > > +  Return how many row combinations has accepted so far + 1
> > > > +
> > > > +  The + 1 is to ensure that 'WHERE ROWNUM <=1' returns one row
> > >
> > > this is a very weird explanation, it sounds as if it's
> > > a heuristical a hack for ROWNUM <=1 to work
> > >
> > > Why not to write "+1 takes into account the currently computed row"
> >
> > Because the row is not yet computed or accepted. We can call rownum any
> > time, after a row is accpted, before etc.
> >
> > I personally prefer the current comment as it is more well defined.
>
> So, you mean, +1 is not needed for WHERE ROWNUM <=2 to return two rows?
> Or for WHERE ROWNUM<=100 to return 100 rows?
>
> You pick one very specific case as an explanation, this implicitly
> assumes it's the main reason. Which it is not, even if we'll only
> consider exactly the same condition template, there are still
> 18446744073709551614 other reasons :)

Sorry, I neither understand your reasoning or logic here.
The comment is correct and explains what is going on in a clear
manner.  The <= 1 is just an example.  You can replace 1 with any
number and 'one' with same number verbatim
and the comment still works.

> +1 is, exactly, to take into account the current row, the one being
> processed right now.
>
> By the way, instead of +1 here, you could simply initialize your
> accepted_rows with 1, not with 0.

Which would be even harder to explain.  Why would we have one accepted
row before we even run the query?

> > > > +*/
> > > > +
> > > > +longlong Item_func_rownum::val_int()
> > > > +{
> > > > +  if (!accepted_rows)
> > > > +  {
> > > > +    /*
> > > > +      Rownum is not properly set up. Probably used in wrong context when
> > > > +      it should not be used. In this case returning 0 is probably the best
> > > > +      solution.
> > > > +    */
> > >
> > > DBUG_ASSERT(0); here?
> >
> > No.  One can use rownum in a lot of context where it does makes sence.
> > Instead of trying to generate an error for every of these cases, I decided
> > to return 0 for these.  In the future maybe some of these will be valid
> > and then things can change.  I do not want to see asserts in test cases
> > people add for things like this.
>
> Do you have test cases for that?

Yes, we it is in the test suite (We already discussed this before).

> > > > +  select->with_rownum= 1;
> > > > +  thd->lex->with_rownum= 1;
> > > > +  thd->lex->uncacheable(UNCACHEABLE_RAND);
> > >
> > > Why? ROWNUM doesn't make a query uncacheable, it's perfectly
> > > deterministic and repeatable.
> >
> > Not necessartly.  Row order may be different from the engine on
> > subsequent reads for example.
>
> This is the same with LIMIT. Even if the row order might be different,
> the first result is still correct, so it can be cached and reused.
> LIMIT doesn't make a query uncacheable.

I think you are missing the point.
marking things uncacheable() means that we cannot cache the value
internally, for example in sub queries and into item cache.

You can verify this by removing the above line and running
mtr rownum and verify that it is failing.

> > > > diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc
> > > > index 806049b4465..5c4dff33914 100644
> > > > --- a/sql/sql_insert.cc
> > > > +++ b/sql/sql_insert.cc
> > > > @@ -746,6 +746,9 @@ bool mysql_insert(THD *thd, TABLE_LIST *table_list,
> > > >      DBUG_RETURN(TRUE);
> > > >    }
> > > > +
> > > > +  if (thd->lex->with_rownum && table_list->lock_type == TL_WRITE_DELAYED)
> > > > +    table_list->lock_type= TL_WRITE;
> > >
> > > why? do you have a test case when it's needed?
> >
> > Because we cannot calculate row number inside insert delayed...
> > (We do not know if an row is accepted or not).
>
> "Not accepted"? The only possibility for a row to be not accepted is
> INSERT IGNORE and a unique key violation, right?
> WHERE clause is done before the delayed thread.
>
> I don't see any tests for INSERT IGNORE in your patch,
> so I cannot say what the intended behavior should be.
> Could you please add a test for it? For example,
>
>   create table t1 (a int not null primary key, b int);
>   insert ignore t1 values (1,rownum()),(2,rownum()),(2,rownum()),(3,rownum());

Add edit, and it worked perfectly.

> > > > @@ -2133,8 +2141,14 @@ int write_record(THD *thd, TABLE *table, COPY_INFO *info, select_result *sink)
> > > >      autoinc values (generated inside the handler::ha_write()) and
> > > >      values updated in ON DUPLICATE KEY UPDATE.
> > > >    */
> > > > -  if (sink && sink->send_data(thd->lex->returning()->item_list) < 0)
> > > > -    trg_error= 1;
> > > > +  if (sink)
> > > > +  {
> > > > +    /* We have to adjust 'copied' to make ROWNUM() work in RETURNING */
> > > > +    info->copied--;
> > > > +    if (sink->send_data(thd->lex->returning()->item_list) < 0)
> > > > +      trg_error= 1;
> > > > +    info->copied++;
> > >
> > > this is quite silly. Can you simply increment info->copied after this
> > > sink->send_data call?
> >
> > No, as this leads to wrong resutls.  The problem is that for
> > 'returning' the row is already accepted
> > while in all other code it is called before the row is accepted.
>
> Sorry, I don't understand. Why it would lead to wrong results?

There is some cases where copied is incremented twice for a row, like
in case of versioning. (This could be a bug).
In case of "insert into t1 values(1) on duplicate key update f1=1"
If the new row is identical to the old, we are not incrementing rownum.
This means that we cannot increment it after, as we don't know if it should be
incremented 0, 1, or 2 times.

> > > > --- a/sql/sql_limit.h
> > > > +++ b/sql/sql_limit.h
> > > > @@ -67,6 +67,15 @@ class Select_limit_counters
> > > >     { return select_limit_cnt; }
> > > >     ha_rows get_offset_limit()
> > > >     { return offset_limit_cnt; }
> > > > +
> > > > +   void set_limit_if_lower(ha_rows limit)
> > > > +   {
> > > > +     if (!offset_limit_cnt)
> > >
> > > I suppose, alternatively, you could've had
> > >
> > >   if (offset_limit_cnt + select_limit_cnt > limit)
> > >     select_limit_cnt= limit - offset_limit_cnt;
> > >
> > > > +     {
> > > > +       if (select_limit_cnt > limit)
> > > > +         select_limit_cnt= limit;
> > > > +     }
> > > > +   }
> > > >  };
> >
> > In some cases LIMIT is absolute.  With rownum it only applies if limit
> > is higher. I think it is good to separate those tests.
>
> What do you mean "LIMIT is absolute"?

LIMIT 10

> I only suggested that LIMIT can take ROWNUM into account also when
> OFFSET is non-zero. In your case you only adjust LIMIT if there is no
> OFFSET.

The problem is that limit and offset counts different things.
If we have for example

SELECT * FROM (SELECT * from t1 LIMIT 10,1000)) WHERE rownum < 100
In this case we have to ignore that 10 first rows and then check
rownum and limit on the
rest. This means that we have to adjust select_limit_cnt independent of offset.

However in this case:

SELECT * from t1 where rownum < 100 LIMIT 10,1000

The rownum will have to be applied before the offset of 10.

To avoid this confusion and possible bugs, it was easer to ignore pushing the
condition down when offset exists.
In practice, very few queries has both rownum and limit, so I don't
think it is worth
the effort to spend too much time on this.

> It's not a particularly common case, so not a very important
> optimization, of course. On the other hand, the suggested change is
> quite simple.

Simple, but not trivial.

> > > >  #endif // INCLUDES_MARIADB_SQL_LIMIT_H
> > > > diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> > > > index 2bd2575b882..57ba9df42c0 100644
> > > > --- a/sql/sql_yacc.yy
> > > > +++ b/sql/sql_yacc.yy
> > > > @@ -10286,6 +10287,22 @@ function_call_nonkeyword:
> > > >              if (unlikely($$ == NULL))
> > > >                MYSQL_YYABORT;
> > > >            }
> > > > +/* Start SQL_MODE_ORACLE_SPECIFIC
> > > > +         | ROWNUM_SYM optional_braces
> > > > +          {
> > > > +            $$= new (thd->mem_root) Item_func_rownum(thd);
> > > > +            if (unlikely($$ == NULL))
> > > > +              MYSQL_YYABORT;
> > > > +          }
> > > > +End SQL_MODE_ORACLE_SPECIFIC */
> > > > +/* Start SQL_MODE_DEFAULT_SPECIFIC */
> > > > +         | ROWNUM_SYM '(' ')'
> > > > +          {
> > > > +            $$= new (thd->mem_root) Item_func_rownum(thd);
> > > > +            if (unlikely($$ == NULL))
> > > > +              MYSQL_YYABORT;
> > > > +          }
> > > > +/* End SQL_MODE_DEFAULT_SPECIFIC */
> > >
> > > You have two nearly identical rules compiled conditionally.
> > > I can pretty much guarantee that they'll go out of sync eventually.
> >
> > > Better do something like
> > >
> > >         | ROWNUM_SYM
> > > /* Start SQL_MODE_ORACLE_SPECIFIC
> > >           optional_braces
> > > End SQL_MODE_ORACLE_SPECIFIC */
> > > /* Start SQL_MODE_DEFAULT_SPECIFIC */
> > >           '(' ')'
> > > /* End SQL_MODE_DEFAULT_SPECIFIC */
> > >          {
> > >            $$= new (thd->mem_root) Item_func_rownum(thd);
> > >            if (unlikely($$ == NULL))
> > >              MYSQL_YYABORT;
> > >          }
> >
> > A little better, but not much. Still not readable.
>
> This whole work of removing sql_yacc_ora.yy and using comments for parts
> of the grammar that differ - this was done *exactly* to avoid two nearly
> identical grammars that always needs to be modified in sync. Which is
> very error-prone.
>
> By literally copying a part of the grammar you're reintroducing the old
> problem of two copies. Not completely, of course, by bit by bit we'll
> have two copies to maintain again.
>
> Please, either use the "little better" grammar that I suggested or
> rewrite it any way you want that doesn't involve two identical copies of
> the same rule.

There is now way they can go out of sync as we have test cases
to detect this.
(I don't find the new suggested code readable).

> > <cut>
> >
> > > > diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> > > > @@ -14210,7 +14236,24 @@ static ORDER *
> > > >  remove_const(JOIN *join,ORDER *first_order, COND *cond,
> > > >               bool change_list, bool *simple_order)
> > > >  {
> > > > -  *simple_order= join->rollup.state == ROLLUP::STATE_NONE;
> > > > +  /*
> > > > +    We can't do ORDER BY using filesort if the select list contains a non
> > > > +    deterministic value like RAND() or ROWNUM().
> > > > +    For example:
> > > > +    SELECT RAND() as 'r' from t1 order by r;
> > >
> > > This one uses temporary for me. Can you put here a correct example please?
> >
> > It is a correct example and probably the best possible example.
> > I do not understand your comment (does not make any sense).
> > Could you please provide a correct comment?
>
> I tried this in the 10.5 branch:
>
> MariaDB [test]> explain select rand() as r from mysql.user order by r\G
> *************************** 1. row ***************************
>            id: 1
>   select_type: SIMPLE
>         table: user
>          type: index
> possible_keys: NULL
>           key: PRIMARY
>       key_len: 228
>           ref: NULL
>          rows: 3
>         Extra: Using index; Using temporary; Using filesort
> 1 row in set (0.002 sec)
>
> as you see, it uses temporary, so there is no need to do your

> +  *simple_order= !join->rand_table_in_field_list;

> because there is no bug here. At least for the example that you've used.
> This is why I asked for an example of a query that indeed behaves
> incorrectly without this fix of yours.

The purpose of this function is to set simple_order.
If we do not have my line, it will be set always false, which is what you are
seeing.  However always having 'use temporary' for any filesort is of course not
a good idea.

Here is an example that shows this if I remove my line.

MariaDB [test]> explain select a from t1 order by a;
         +------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len |
ref  | rows | Extra                           |
+------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    |
NULL | 4    | Using temporary; Using filesort |
+------+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

There is no reason to do a temporary table for this query

> > > > +static bool is_const(Item *item)
> > > > +{
> > > > +  /*
> > > > +    Constant expression in WHERE can be also subquery or other expensive
> > > > +    thing we shoud not evaluate on OPTIMIZATION phase
> > > > +  */
> > > > +  return item->const_item() && !item->is_expensive();
> > > > +}
> > > > +
> > > > +
> > > > +// Check if this is an unexpensive functional expression over basic constants
> > > > +
> > > > +static bool is_basic_const(Item *item)
> > > > +{
> > > > +  if (item->basic_const_item())
> > > > +    return true;
> > > > +  if (!item->const_item() || item->is_expensive())
> > > > +    return false;
> > > > +  if (item->type() == Item::FUNC_ITEM)
> > > > +  {
> > > > +    Item_func *func= ((Item_func *)item);
> > > > +    if (func->argument_count() == 0)
> > > > +      return false;
> > > > +    bool result= true;
> > > > +    Item **args= func->arguments();
> > > > +    for (uint i= 0; i < func->argument_count() && result; i++)
> > > > +      result= result && is_basic_const(args[i]);
> > > > +    return result;
> > > > +  }
> > > > +  return false;
> > > > +}
> > >
> > > is_basic_const() is a misleading name, a function of basic consts isn't
> > > a basic const itself.
> >
> > This is how bar has named the item and classes. Do a grep after
> > basic_const..
>
> I know, that was not my question. See below
>
> > > what's an example of an expression has is_const() == true, but
> > > is_basic_const() == false?
> >
> > "1+1"  This is the sum of two basic constants.
>
> Yes, this is an example of the item, where item->const_item() == true
> and item->basic_const() is false. This is not what I asked.
>
> I asked for an example, where is_const(item) is true and
> is_basic_const(item) is false. The question was not about old items, but
> about your new code. Because your is_basic_const() is not just
> item->basic_const(), so I asked for an example, to understand a
> difference.

You have to ask Sanja (this is his code).
Anyway, in some part the answer to your quesiton is the same as asking when
Item::basic_const_item() is true and Item::is_const() is true, as the
functions are using
this as a base for their return value.
The function comments explains in quite a detal what is tested.

The main point with is_basic_const() is to detect functions where all
arguments are basic_consts()

like rownum < 1+2

This is true for is_basic_const.

> > > > +/**
> > > > +  Check posibility of LIMIT setting by rownum() of upper SELECT and do it
> > > > +
> > > > +  @note Ideal is to convert something like
> > > > +    SELECT ...
> > > > +      FROM (SELECT ...) table
> > > > +      WHERE rownum() < <CONSTANT>;
> > > > +  to
> > > > +    SELECT ...
> > > > +      FROM (SELECT ... LIMIT <CONSTANT>) table
> > > > +      WHERE rownum() < <CONSTANT>;
> > >
> > > 1. what if the outer select uses LIMIT, not ROWNUM?
> > >    why not to push that down too?
> >
> > We already have code for that.
>
> But then you don't need to push ROWNUM into a subquery. You have code to
> convert ROWNUM into a LIMIT, and you just said that we already have code
> to push LIMIT. So, you do not need to push ROWNUM specifically, it'll
> happen automatically, won't it?

No, as rownum is not regarded as limit in the upper code and does not
work the same way
We cannot use the same code as limit.  In other words, nothing is
happening automatically.

For example:
SELECT a,count(*) from t1 groupt by a where rownum < 100 limit 100;
We cannot push the rows to limit here as rownum works on selected rows
and limit works on result rows.
We can also not do it with order by, because of the same reason.
The only case the rownum makes sence to move to limit on the same join
group is for simple queries,
without order by, group by, joins etc.  In this case the current code
is as fast as the limit, so there is
no reason to push this either.

> > > 2. why are you not removing rownum from the outer select?
> >
> > Because it may not be safe to do in some cases. Better safe than sorry..
> >
> > >    if you remove rownum and reset with_rownum=false, it'll generally
> > >    give optimizer more freedom.
> >
> > Has no effect on our optimizer in this particular case.
>
> I'm not sure about it. ROWNUM cannot be pushed down into a subquery.
> LIMIT can be, as you said, so if you convert a ROWNUM into a LIMIT and
> remove ROWNUM from WHERE, then the whole WHERE can be pushed into a
> subquery, and a LIMIT can be too. So
>
>   SELECT * FROM (SELECT col1, col2 FROM tables WHERE cond ORDER BY something)
>   WHERE ROWNUM <=100 AND col1 = 10;
>
> can first be changed to
>
>   SELECT * FROM (SELECT col1, col2 FROM tables WHERE cond ORDER BY something)
>   WHERE col1 = 10 LIMIT 100;

That would not work if there would be an order by on the upper level
or there would be a join involved.

The code "col1 = 10 LIMIT 100" and "rownum <= 100 and col1 = 10' are
equally fast,
so there is no notable win here.

>   SELECT * FROM (SELECT col1, col2 FROM tables WHERE cond AND col1 = 10
>   ORDER BY something LIMIT 100);
>
> and for that you don't even need any special code for pushing ROWNUM
> into a subquery as a limit, you only need to convert ROWNUM to a LIMIT
> on the same level of select.

You cannot convert rownum to a limit on the same level.  The only time
you can convert rownum to a limit is when
you push it down to single inner select, which is what the current
code is doing.

We also have to consider that when pushing rownum as part of the
limit, is not the same as doing limit on the upper level or even limit
on the lower level. We also have to ensure we don't remove any
underlaying ORDER BY in
the sub query.

Still this idea has no effect on the optimizer.  The only win is the
comparison of rownum with a constant once per result row.
We also need a lot more code to cover all cases (like multiple use of
rownum etc).

I understand what you are trying to suggest. I did take a look at our
current code and almost all of it would
still be needed to handle the conversion of some rownum's to LIMIT and
to handle the general case.

> > > > +static bool check_rownum_usage(Item_func *func_item, longlong *limit,
> > > > +                               bool *inv_order, const_probe probe)
> > > > +{
> > > > +  Item *arg1, *arg2;
> > > > +  *inv_order= 0;
> > > > +  DBUG_ASSERT(func_item->argument_count() == 2);
> > > > +
> > > > +  /* 'rownum op const' or 'const op field' */
> > > > +  arg1= func_item->arguments()[0]->real_item();
> > > > +  if (arg1->type() == Item::FUNC_ITEM &&
> > > > +      ((Item_func*) arg1)->functype() == Item_func::ROWNUM_FUNC)
> > > > +  {
> > > > +    arg2= func_item->arguments()[1]->real_item();
> > > > +    if ((*probe)(arg2))
> > > > +    {
> > > > +      *limit= arg2->val_int();
> > > > +      return *limit <= 0 || (ulonglong) *limit >= HA_POS_ERROR;
> > > > +    }
> > > > +  }
> > > > +  else if ((*probe)(arg1))
> > > > +  {
> > > > +    arg2= func_item->arguments()[1]->real_item();
> > >
> > > why not to assign arg2 outside of the if() ?
> >
> > Because we should not evaluate it if probe is not true.
> > It can be expensive to evalute it or it can have side effects that
> > we may want to avoid.  For example, it could be a sub query
> > and probe is there to ensure we don't evaluate it here.
>
> Sure, but func_item->arguments()[1]->real_item() is not evaluating
> anything. I only mean arg2= line, that's identical in both branches.
> I didn't mean doing val_int() unconditionally.

Good point. Normally I don't like to use functions on items before
they are properly checked.
However in this function, we know the item is a function with 2
elements, so it is safe to calculate arg2.

The only benefit of the curent code is that we avoid the call
completely if there is no rownum function,
so there is one line more code, but the code is faster as in most
cases we don't have to
call real_item for arg2.

<cut>

> > > > +static void optimize_rownum(THD *thd, SELECT_LEX_UNIT *unit, Item *cond)
> > > > +{
> > > > +  DBUG_ENTER("optimize_rownum");
> > > > +
> > > > +  if (cond->type() == Item::COND_ITEM)
> > > > +  {
> > > > +    if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC)
> > > > +    {
> > > > +      List_iterator<Item> li(*((Item_cond*) cond)->argument_list());
> > > > +      Item *item;
> > > > +      while ((item= li++))
> > > > +        optimize_rownum(thd, unit, item);
> > > > +    }
> > > > +    DBUG_VOID_RETURN;
> > > > +  }
> > > > +
> > > > +  process_direct_rownum_comparison(thd, unit, cond, &is_const,
> > > > +                                   &temporary_set_limit);
> > >
> > > please, explain why it's is_const/temporary_set_limit here and
> > > is_basic_const/permanent_limit in optimize_upper_rownum_func().
> >
> > Please ask Sanja.
> > I think the short version is that const/temporary is for the current select
> > why is_basic_const/permanent is for pushing limit from an outer select
> > to an inner select and there is more restrictions for that.
>
> Sanja said that it's your code and if it were up to him he'd do both
> permanent.

All code related to pushing down rownum is Sanjas.

In the above function, the loop is mine, but the call to
 process_direct_rownum_comparison(thd, unit, cond, &is_const,

&temporary_set_limit);

Is Sanjas.

The whole idea of temporary set limit was his idea.
If I remember correctly,  he did only want to have LIMIT pushdown shown
in explain extend for one case, but not for another.

For example:
SELECT * from (SELECT * from a1) as tmp where rownum < 10

In this case the LIMIT is shown in show example

SELECT * from t1 where rownum < 10

In this case the limit is now shown (and we are not using a limit anway).

Regards,
Monty


Follow ups

References