
maria-developers team mailing list archive

Re: GSoC weekly reports (Unique indexes for blobs)


GSoC (week 7)

Hello everyone,

After implementing actual row comparison, there was a problem with
retrieving multiple clustered records with the same hash value. The problem
was solved by putting the search function for the clustered record inside
the mini-transaction. The next thing I implemented this week is the alter
table operation. Most alter table operations are working fine except
renaming a column. I modified the function which renames the column, and
the names of the fields in an index containing that column, in the
dictionary cache. It works fine while the server is running, but after
restarting the server, an error is generated that the table doesn't exist
in the storage engine. After debugging, I found out that the changes are
not getting written to disk for the hash index. This might be because I
have shifted the index->fields pointer.
    The other problem I was trying to solve was the duplicate-key error
message shown when a unique index is added through an alter table operation
and the table already contains duplicate key entries. The error message
contained null values at first, but I have solved the problem now and it
displays the correct error message.

The following alter table operations are working:
1. alter table add column.
2. alter table drop column (if the column was present in the hash index,
the hash value is recalculated).
3. alter table add index / drop index (the table is rebuilt if the hash
index is dropped, as the hash column in the table has to be dropped as well).
4. alter ignore table add index.

On 5 July 2016 at 00:33, Shubham Barai <shubhambaraiss@xxxxxxxxx> wrote:

> GSoC (week 6)
> Hello everyone,
> 1. defined some new functions to get the clustered record from a secondary
> index record and extract its key value to compare with the secondary index
> key. It works for a single clustered record. I am currently trying to solve
> the problem of multiple records with the same hash value.
> 2. implemented some new functions for the update operation.
>     2.1 a function which checks if the hash columns in a table need to be
> updated.
>     2.2 a function to add the hash fields to the update vector.
> 3. When updating a row, the SQL layer calls the index_read function for
> faster retrieval of a row if the column used in the where clause is one of
> the keys or a part of a key. So I modified the index_read function to
> convert the MySQL search key into the innobase format and then create a new
> search key with a hash value. As the hash index stores only the hash value,
> it is only possible to search for a row with the hash index if all of the
> key parts are present in the search key.
> current branch for InnoDB : temp
> On 27 June 2016 at 19:23, Shubham Barai <shubhambaraiss@xxxxxxxxx> wrote:
>> GSoC (week 5)
>> Hello everyone,
>> Here is the list of things I have done in the 5th week of GSoC.
>> 1. implemented unique key violation with a hash collision (actual row
>> comparison is remaining).
>> 2. modified the hash function for data types like varchar and binary
>> types.
>> 3. fixed a bug which was causing the server to crash for complex unique keys.
>> 4. added support to allow any number of nulls, which will not cause a
>> unique key violation.
>> 5. added test cases for the above features.
>> On 22 June 2016 at 16:36, Shubham Barai <shubhambaraiss@xxxxxxxxx> wrote:
>>> Hi,
>>> can we discuss on IRC first?
>>> Regards,
>>> Shubham
>>> On 22 June 2016 at 13:21, Jan Lindström <jan.lindstrom@xxxxxxxxxxx>
>>> wrote:
>>>> Hi,
>>>> Please commit and push these changes to your git branch; I have not yet
>>>> seen them. In my opinion, as this is only a working branch, you can push
>>>> often. I still fail to see any test cases on the InnoDB branch; do you
>>>> have more than one branch, and if you do, why? Depending on the extent of
>>>> these changes, my estimate is that you are behind schedule to complete
>>>> the project in time. Based on your progress report you are still missing
>>>> update, delete, and redo logging. For alter table you should start by
>>>> forcing the copy method and then, if time permits, develop the on-line
>>>> method. This naturally comes only after everything else has been
>>>> completed and tested.
>>>> R: Jan
>>>> On Tue, Jun 21, 2016 at 11:06 PM, Shubham Barai <
>>>> shubhambaraiss@xxxxxxxxx> wrote:
>>>>> GSoC (week 4)
>>>>> Hello everyone,
>>>>> After working on the create table operation, the next thing I had to
>>>>> work on was insert operations. So I explored some functions like
>>>>> row_ins_scan_index_for_duplicates and btr_pcur_get_rec to get a clear
>>>>> understanding of how to implement duplicate search on the hash index.
>>>>> There was a problem in the hash function that I wrote. It would
>>>>> calculate the same hash value for two different keys if the prefix
>>>>> length of the blob key part was zero. Now it seems to be working after I
>>>>> checked it in the debugger. I still have to modify it for data types
>>>>> like varchar etc.
>>>>> I have added test cases for insert operations in MyISAM.
>>>>> In MyISAM, I found one problem in the update operation. When updating a
>>>>> row, if the key is conflicting then the server crashes because some
>>>>> pointer becomes invalid in compare_record. I haven't fixed this issue yet.
>>>>> I also modified some functions in dict0load.cc to adjust some members
>>>>> of dict_index_t for the new index type. The main problem is that an
>>>>> index entry for a hash-based index contains only two fields (hash value
>>>>> and row id), while dict_index_t contains the hash field and the other
>>>>> user-defined fields which are used to calculate the hash value. Some
>>>>> operations, like alter table (e.g. rename column), need access to all
>>>>> fields, while other functions, like rec_get_offsets and
>>>>> row_build_index_entry_low, need access only to the hash field and row
>>>>> id. I am still working on finding an efficient solution to this problem.
>>>>> On 16 June 2016 at 23:29, Sergei Golubchik <vuvova@xxxxxxxxx> wrote:
>>>>>> Hi, Shubham!
>>>>>> What I wanted to say on IRC was:
>>>>>> here's what the comment of cmp_dtuple_rec_with_match_low() says:
>>>>>>   ...............   If rec has an externally stored field we do not
>>>>>>   compare it but return with value 0 if such a comparison should be
>>>>>>   made.
>>>>>> Note that blobs are externally stored fields in InnoDB, so, I think,
>>>>>> this means that you cannot use cmp_dtuple_rec() to compare blobs.
>>>>>> Regards,
>>>>>> Sergei
>>>>>> Chief Architect MariaDB
>>>>>> and security@xxxxxxxxxxx
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~maria-developers
>>>>> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~maria-developers
>>>>> More help   : https://help.launchpad.net/ListHelp
