maria-developers team mailing list archive

Thread
Date

Re: Sachin weekly report

To: Sachin Setia <sachinsetia1001@xxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxx>
Date: Fri, 19 Aug 2016 11:12:12 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CANhFX6U-MkMdT11s1Ac-Xx0=ch6_tTk7Vx9xBxjV-y=hFQ4sCg@mail.gmail.com>
User-agent: Mutt/1.5.24 (2015-08-30)

Hi, Sachin!

First. I believe you'll need to do your final evaluation soon, and it
will need to have a link to the code. Did you check google guidelines
about it? Is everything clear there? Do you need help publishing your
work in a format that google requires?

They don't accept delays for any reasons, so even if your code is not
100% complete and ready, you'd better still publish it and submit the
evaluation, because otherwise google will fail you and that'd be too sad.

If you'd like you can publish the google-way only the unique-constraint
part without further optimizer work. Or at least please mention that
you'd completed the original project and went working on extensions.
I mean, it's better than saying "the code is not 100% complete" :)

On Aug 18, Sachin Setia wrote:
> Hello Sergei!
> I am stuck at one problem
> consider table t1(a blob , b blob , unique(a,b));
> although select * from t1 where a= 12 and b= 23 works
> but consider the case like select * from t1 where ( a= 12  or a=45 ) and
> (b= 23 or b=45 ) does not works
> and also update and delete using long index does not work
> simple query like
> delete/ update table t1 where a=1 and b=3;
> does not works

"does not work" means "return wrong results"? Or "does not use the index
but results are correct" ?

> The reason is that because these query uses test_quick_select function
> which does not recognize hash index basically in get_mm_parts it always
> return false because it compare db_row_hash_1 to a and b i solved this
> problem
> but now the problems
> 1. first how to assure we have both column a, and b is in where currently i
> do this as crawl through tree->key[i]->next_key_part untill
> next_key_part is null this works for simple case like
> a= 1 and b=2 but i am not sure how will i do this in
> conditions like
> ((a =  1 or a =34) and (b=34 or b=33)) or (b=34)
> ^^^^^^^^^^^^^^^^^^^^^^^
> in this condition before or past it is okay to use hash but for b=34 we
> should not use hash index but do not know how to do

Range optimizer can do it. With normal indexes, for example, it needs to
convert that query into four "ranges" (I use quotes, because these are
degenerate ranges of one value only, but they're still ranges
internally):

  (1,33), (1,34), (34,44), (34, 34)

For every range the optimizer will call handler::records_in_range() to
do its cost estimations. If you'd run a test, like

  create table t1 (a int, b int, index(a,b), c int);
  insert t1 values (1,2,3),(3,4,5),(5,6,7);
  explain select * from t1 force index (a) where (a=1 or a=3) and (b=2 or b=4);

and put a breakpoint on ha_myisam::records_in_range(), you'll be able to
find where the SEL_ARG tree is converted to ranges.

check_quick_select() sets

 RANGE_SEQ_IF seq_if = {NULL, sel_arg_range_seq_init, sel_arg_range_seq_next, 0, 0};

and later in handler::multi_range_read_info_const() these two functions
(seq_if.init() and seq_if.next()) are used to iterate the tree and
create ranges. The main work is done in sel_arg_range_seq_next().

But please try not to copy-paste it.

[ Side note: could you please try to use more punctuation in your
emails?  :) It's a bit difficult to understand what you mean when
sentences get long. ]

> 2. when SEL_TREE is evaluated ?

I believe I've answered that above :)
handler::multi_range_read_info_const() converts the tree into two (min
and max) key images.

> 3. where should i evaluate hash i can do that but main problem is that i
> need to preserve original data  like a=1 and b=3
> etc  because hash does not guaranty same.

Hmm... You can compute the hash as above, where the key image is
created. Original data... SQL layer does not always trust the engine,
in some cases it evaluates the WHERE expression after getting the row
from the engine. I think (not sure) that if you keep the a=1 and b=3
part in the COND then the upper layer will evaluate it normally, like

  if (!cond || cond->val_int() != 0)

so you just need to make sure that
 1. a=1 and b=3 is not removed from COND
 2. when cond->val_int() is 0, upper layer does *not* treat it
    as reaching the end of the index range and does *not* stop
    the loop of index_next() calls. (for a normal index, if you search
    for a=5 and do index_next, as soon as you found a!=5 you can stop
    searching).

> 4 there is one more problem  in test_quick_select
> 
> there is one code
> 
> if ((range_trp= get_key_scans_params(&param, tree, FALSE, TRUE,
>                                            best_read_time)))
> 
> in the case of hash key before reaching this function we must have to
> calculate hash after this it will work but i guess upto this point our
> tree is not evaluated , i do not know

Sorry, I didn't quite understand that :(

> please review branch
> https://github.com/SachinSetiya/server/tree/unique_index_where_up_de
> and tell me what should i do and also review the orignal branch
> unique_index_sachin  i added more test cases and reinsert also works

I won't be able to do that until next week. I've already started
reviewing a big diff with your commits, if I'll recreate it now with
more commits, I'll need to start over. I'd better finish reviewing what
I have now (up to commit 03e29c6) and then look at newer changes.

> On Sun, Aug 14, 2016 at 1:46 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> >
> > > >> +    cs= str->charset();
> > > >> +    uchar l[4];
> > > >> +    int4store(l,str->length());
> > > >> +    cs->coll->hash_sort(cs,l,sizeof(l), &nr1, &nr2);
> >
> > You use
> >
> >   cs->coll->hash_sort(cs,l,sizeof(l), &nr1, &nr2);
> >
> > to sort the length, the byte value of str->length().
> > This is binary data, you should have
> >
> >   cs= my_charset_binary;
> i do not find any variable name my_charset_binary but i used
> 
> my_charset_utf8_bin is it ok ?

No, utf8 charset uses a concept of "chracter" and "letter", it still
compares characters and expects the string to be a valid utf8 string.

You have just four bytes, they can contain anything (invalid utf8 too).
And must be compated as binary data.

The correct name is my_charset_bin, sorry. You can grep for it, it's
used everywhere.

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx

References

Re: Sachin weekly report
From: Sergei Golubchik, 2016-06-14
Re: Sachin weekly report
From: Sachin Setia, 2016-06-20
Re: Sachin weekly report
From: Sachin Setia, 2016-06-26
Re: Sachin weekly report
From: Sachin Setia, 2016-07-05
Re: Sachin weekly report
From: Sergei Golubchik, 2016-07-11
Re: Sachin weekly report
From: sachin setiya, 2016-08-13
Re: Sachin weekly report
From: Sergei Golubchik, 2016-08-14
Re: Sachin weekly report
From: Sachin Setia, 2016-08-18