Re: Sachin weekly report


Hi Sergei!

Actually, I have completed the work on UPDATE and DELETE. They now use the
index for looking up records.

But I think I have made a lot of changes in the optimizer which may break
it, and there are also lots of queries where my code does not work. Fixing
this might take a long time.
I am thinking of a change to my existing code:
Suppose there is a table t1:
    create table t1 (a blob, b blob, c blob, unique(a,b,c));
In the current code, for a table like this there will be a KEY with only
one keypart, which points to the field DB_ROW_HASH_1.
That was okay for normal updates, inserts and deletes, but for WHERE
optimization I have to do a lot of work: first match the fields (like in
add_key_part), then check whether all the fields in hash_str are present in
the WHERE clause, and then create the keys by calculating the hash. I do
this by checking the HA_UNIQUE_HASH flag in KEY, but I think this also
makes the optimizer code worse because of too much dependence. I also need
to patch the get_mm_parts and get_mm_leaf functions, which I think should
not be patched.
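
To make the current flow concrete, here is a rough standalone sketch of the
"are all hash_str fields present in WHERE, then calculate the hash" step. It
is not the server code: HashColumns, EqConds, row_hash and make_hash_key are
names made up for illustration, and std::hash is only a placeholder for
whatever hash function the server really uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the column list kept in hash_str: {"a","b","c"}.
typedef std::vector<std::string> HashColumns;
// Hypothetical stand-in for the "column = constant" conditions from WHERE.
typedef std::map<std::string, std::string> EqConds;

// Combine the values in hash-column order (placeholder hash function).
static size_t row_hash(const std::vector<std::string> &vals)
{
  size_t h = 0;
  for (size_t i = 0; i < vals.size(); i++)
    h = h * 31 + std::hash<std::string>()(vals[i]);
  return h;
}

// Returns true and fills *key with the lookup value for DB_ROW_HASH_1 only
// if every hash column has an equality condition. Otherwise the hash index
// cannot be used for this WHERE clause.
static bool make_hash_key(const HashColumns &cols, const EqConds &conds,
                          size_t *key)
{
  assert(key != nullptr);
  std::vector<std::string> vals;
  for (size_t i = 0; i < cols.size(); i++)
  {
    EqConds::const_iterator it = conds.find(cols[i]);
    if (it == conds.end())
      return false;              // a hash column is missing from WHERE
    vals.push_back(it->second);
  }
  *key = row_hash(vals);
  return true;
}
```

With conditions on a, b and c the function produces a key; with only a and
b it returns false, which is the case where the optimizer must not pick the
hash index.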

I am thinking of another approach to this problem, at the server level:
instead of having just one keypart we can have 1+3 keyparts. The last three
keyparts will be for the fields a, b, c and the first one for
DB_ROW_HASH_1. These will exist only at the server level, not at the
storage level. key_info->key_part will point at the keypart containing
field a, while the keypart for field DB_ROW_HASH_1 will be at index -1.
This way I do not have to patch more of the optimizer code. But there is
one problem: what should be the length of a key_part? I am thinking of
making it equal to field->pack_length(). This would not work as is, because
while creating keys the optimizer calls get_key_image() (which returns the
real data, so it can exceed pack_length() in the case of a blob). So to get
this to work I have to patch the optimizer where it calls get_key_image()
and check whether the key is HA_UNIQUE_HASH. If yes, then instead of
get_key_image() just use
                                     memcpy(key, field->ptr(),

This won't copy the actual data, but we do not need the actual data. I will
patch handler methods like ha_index_read and ha_index_idx_read, basically
the handler methods which are related to index or range search.
In these methods I need to calculate the hash, which I can calculate from
key_ptr, but key_ptr does not have the actual data (in the case of blobs
etc.). So to get the data for the hash, I will make clones of the fields
(a, b, c etc.), but their ptr will point into key_ptr. Then the
field->val_str() method will simply work and I can calculate the hash. I
can also compare the returned result with the actual key in the handler
method itself.
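
Here is a rough standalone sketch of the idea, only to show the layout. It
is not server code: KeyPart, BlobRef, Key, store_key, val_str and
hash_from_key are simplified stand-ins I made up, REF_LEN plays the role of
pack_length(), the hash keypart length of 8 is an arbitrary assumption, and
std::hash is a placeholder for the real hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>

// Hypothetical, much simplified stand-ins for the server structures.
struct KeyPart { const char *field_name; unsigned length; };

// What the record holds for a blob in this sketch: length + pointer to data.
struct BlobRef { unsigned length; const char *data; };
static const unsigned REF_LEN = sizeof(BlobRef);

struct Key
{
  KeyPart parts[4];                          // [DB_ROW_HASH_1, a, b, c]
  // Points at "a", so key_part()[-1] is the hash keypart.
  KeyPart *key_part() { return parts + 1; }
};

static Key make_key()
{
  Key k = {{{"DB_ROW_HASH_1", 8}, {"a", REF_LEN}, {"b", REF_LEN}, {"c", REF_LEN}}};
  return k;
}

// Optimizer side: copy only the fixed-size reference of each field into the
// key buffer, never the blob data itself.
static void store_key(char *key, const BlobRef *fields, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    std::memcpy(key + i * REF_LEN, &fields[i], REF_LEN);
}

// Handler side: a "cloned field" whose ptr points into key_ptr reads the
// reference back and follows it to the real data.
static std::string val_str(const char *key_ptr, unsigned idx)
{
  assert(key_ptr != nullptr);
  BlobRef ref;
  std::memcpy(&ref, key_ptr + idx * REF_LEN, REF_LEN);
  return std::string(ref.data, ref.length);
}

// Handler side: calculate the hash from the values reachable via key_ptr.
static size_t hash_from_key(const char *key_ptr, unsigned n)
{
  size_t h = 0;
  for (unsigned i = 0; i < n; i++)
    h = h * 31 + std::hash<std::string>()(val_str(key_ptr, i));
  return h;
}
```

The key buffer stays at a fixed size per keypart no matter how large the
blobs are, and the hash depends only on the data the references point to,
not on the pointer values themselves.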
What do you think of this approach?


On Sat, Aug 20, 2016 at 11:16 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Sachin!
> On Aug 19, Sachin Setia wrote:
> > On Fri, Aug 19, 2016 at 2:42 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> >
> > > First. I believe you'll need to do your final evaluation soon, and
> > > it will need to have a link to the code. Did you check google
> > > guidelines about it? Is everything clear there? Do you need help
> > > publishing your work in a format that google requires?
> > >
> > > They don't accept delays for any reasons, so even if your code is
> > > not 100% complete and ready, you'd better still publish it and
> > > submit the evaluation, because otherwise google will fail you and
> > > that'd be too sad.
> > >
> > > If you'd like you can publish the google-way only the
> > > unique-constraint part without further optimizer work. Or at least
> > > please mention that you'd completed the original project and went
> > > working on extensions.  I mean, it's better than saying "the code is
> > > not 100% complete" :)
> > >
> > Okay I am thinking of writing a blog post with a link to my github
> > repository.
> > Blog Link <http://sachin1001gsoc.blogspot.in/2016/08/gsoc-2016.html>
> > Please check this.
> I think that'll do, yes.
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx

