← Back to team overview

maria-developers team mailing list archive

Re: Gsoc 2016 Mdev 371 Unique index for blob

 

Hello Sergei
Today I made some progress related to project.
MyISAM/ARIA
Got clear understanding of how to implement unique index for query like
create table tbl(col1 int primary key , col2 blob ,col3 blob ,
unique(col2,col3))
InnoDB
Reading about it.Actually Sir, I want to do this project whether I will
select in
gsoc or not(because InnoDB is amazing).
Proposal
Still Writing

Actually sir i have one doubt in table2myisam function definition

recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),
^^^^^     ^   ^
why we allocating these many number of recinfo because we only require
share->fields + 1 .

One more doubt in optimizing "select distinct coloumn_name(here it is a
blob coloumn)  from table"
query. In mi write which take one record and write it we check for unique
constraint. It takes O(n^2)
time. I was thinking if we can optimize this by first fetching the whole
table record and calculating hash for
each record.Instead of comparing one hash with all other we can sort the
hashes and ignore the duplicate
(we can make an array of 0 and 1 and if it 1 that means record is not
duplicate and for 0 it is duplicte)
.buy doing this we can reduce the time complexity to O(nlog(n)).This will
work fast if we have enough buffer_storage
in case of low buffer memory this will turn to tradeoff between cpu and i/o
requests because in order to sort keys
in low ram we need to use m way merge sort which ultimately result in more
I/O because we have to send back records to
hard disk which we can not store in ram and then once again fetch unique
record for storing in tmp table.But we can get
performance if records fit in ram .For caching the records  we can do it
over here
sql/sql_select.cc
  18313 error= info->read_record(info);
 Regards
 sachin

On Wed, Mar 23, 2016 at 12:06 AM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Sachin!
>
> On Mar 22, Sachin Setia wrote:
> > Hello Sergei
> > Actually I was prototyping for blob and varchar   for aria and myisam
> > storage engine.
> > My prototype worked for complex definations like
> > craete table(abc int primary key, blob_col blob unique, varchar_col
> > varchar(1000) unique) engine=myisam;
> > Solved the issue of frm file incosistance.
> >
> > As you suggested for doing it for innodb i am current working on
> it.Innodb
>
> *I* did not suggest that. But you're welcome to try, of course.
> If you think that just MyISAM is too simple for a three month project.
> (Aria doesn't count it's a couple of days after you done MyISAM).
>
> > does not natively support hash based index.
> > when we run select distinct column from tbl;
> > it use create_internal_tmp_table() which uses maria storage engine for
> > creating tmp table.
> > But query like this works
> > MariaDB [sachin]> create table iu2(abc blob unique);
> > Query OK, 0 rows affected (0.04 sec)
> >
> > MariaDB [sachin]> insert into iu2 values(1);
> > Query OK, 1 row affected (0.03 sec)
> >
> > MariaDB [sachin]> insert into iu2 values(1);
> > ERROR 1062 (23000): Duplicate entry '1' for key 'abc'
> > this query does not use hash but it simply compares values
>
> Interesting.
>
> > Will write a proposal shortly.
>
> Okay.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
>

Follow ups

References