← Back to team overview

maria-developers team mailing list archive

Re: InnoDB blob for primary key

 

Hello Sergei
Sorry I did not see your mail. Actually i was thinking something like
this before implementing the prototype but if i am more closer to
innodb the more performance i will i get. I will definitively think
about it.
Regards
sachin

On Wed, Apr 13, 2016 at 9:17 PM, Sachin Setia <sachinsetia1001@xxxxxxxxx> wrote:
> Hello Jan!
>
> Actually I was going through the problems raised by you
>
> 1>Firstly, existing tables will not have that column
>
> Yes Sir, you are right that existing table woudn't have DB_BLOB_HASH
> column, but this is fine because they indexed there blob column then
> they must have provided blob length, and if so then length of key_part
> would not be zero and so need to make changes in patch to detect this.
> And also by doing I can also eliminate another problem what if existing
>  table have a coloumn  named 'DB_BLOB_HASH'
>
> 2>secondly it will effect e.g maximum row size
> For this i can  increase reclength by 4 in pack_header function(but i
> am not sure whether this will work or not)
>
> 3>and external tools like backup
> Here, I am assuming that by backup you mean mysqldump actually i tried
> this by create a table with blob coloum (length was given ) and than
> changing mariadb binary(to changed code) and .sql file of backup was
> working fine
>
> But I think there are more problems with this this prototype
> 1>Existing table with coloumn named 'DB_BLOB_HASH' .We can solve this
> as written above but some code in patch compares first coloumn name
> with 'DB_BLOB_HASH' so this will create problem Sir, can we add a flag
> in dict_table_t->flag so that we can reliable check for blob hash coloumn
>
> 2>This is very bad problem , what happen if if data is different and crc
> hash of data is same.And this will occur when we have medium amount of
> data like 100 thousand records.To avoid this we can do like
>
> 2.1> use crc 64 reduces probability
> 2.2> use two different algorithm for hash and treat these two algoriths
> hash for uniquesness.Make  collusion probability very very less
>
> But the main problem is what would happen if collusion occurs to solve
> this i have two ideas
> 2.3> like 2.2 but instead of using two algorithm use one and add column
> for number of times this hash is repeated collusion frequency will be zero
> i think this will work fine but need to test it
> 2.4> whole prototype is crap start again with some different method. Can you
> please suggest some differnt way of doing this
> Regards
> sachin
>
> On Tue, Apr 12, 2016 at 4:52 PM, Jan Lindström
> <jan.lindstrom@xxxxxxxxxxx> wrote:
>> Hi,
>>
>> Sachin, naturally you may continue, this is open source, please read
>> https://mariadb.com/kb/en/mariadb/community-contributing-to-the-mariadb-project/
>> From InnoDB point of view there is issue if you add a new system generated
>> column to the row. Firstly, existing tables will not have that column and
>> secondly it will effect e.g maximum row size and external tools like backup.
>>
>> R: Jan
>>
>> On Mon, Apr 11, 2016 at 12:33 PM, Sachin Setia <sachinsetia1001@xxxxxxxxx>
>> wrote:
>>>
>>> Hello Developers,
>>> Hi this is sachin.Actually i was currently experimenting with with
>>> blob uniques in innodb
>>> there is three main problems
>>> 1.Unique blob
>>> 2.Blob Forigen key
>>> 3.Blob primary key
>>>
>>> 1. For blob unique we can simply store hash in unclustered index
>>> 2. Blob Forigen key i am currently working on it
>>> 3. Blob primary key :- for this i thought we create a 4 byte column
>>> which stores the hash of blob primary key.Internally this column will work
>>> as
>>> primary key and key for clustered index. I already successufully tested
>>> this
>>> here is my output
>>>
>>> MariaDB [sachin]> create table t4 (abc blob primary key);
>>> Query OK, 0 rows affected (0.10 sec)
>>>
>>> MariaDB [sachin]> insert into t4 values('sachin');
>>> Query OK, 1 row affected (0.01 sec)
>>>
>>> MariaDB [sachin]> insert into t4 values('sachin');
>>> ERROR 1062 (23000): Duplicate entry 'sachin' for key 'PRIMARY'
>>> MariaDB [sachin]> insert into t4 values('sachin setiya');
>>> Query OK, 1 row affected (0.00 sec)
>>>
>>> MariaDB [sachin]> insert into t4 values('sachin setiya');
>>> ERROR 1062 (23000): Duplicate entry 'sachin setiya' for key 'PRIMARY'
>>>
>>> MariaDB [sachin]> desc t4;
>>> +-------+------+------+-----+---------+-------+
>>> | Field | Type | Null | Key | Default | Extra |
>>> +-------+------+------+-----+---------+-------+
>>> | abc   | blob | NO   | PRI | NULL    |       |
>>> +-------+------+------+-----+---------+-------+
>>> 1 row in set (0.01 sec)
>>>
>>> MariaDB [sachin]> select * from t4;
>>> +---------------+
>>> | abc           |
>>> +---------------+
>>> | sachin        |
>>> | sachin setiya |
>>> +---------------+
>>> 2 rows in set (0.01 sec)
>>>
>>> @Sergei hi! Actually i invested arround 2 months in mariadb So for me now
>>> it  does not matter either i got selected in gsoc i want to do work in
>>> innodb blob unique from today.Please sir allow me to do this
>>>
>>> I am including the patch file and t4.ibd and t4.frm file
>>> Regards
>>> sachin
>>>
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~maria-developers
>>> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~maria-developers
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>


References