maria-developers team mailing list archive

Re: Gsoc 2016 Mdev 371 Unique index for blob

To: Sachin Setia <sachinsetia1001@xxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxx>
Date: Wed, 23 Mar 2016 18:47:40 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CANhFX6VEp5Scv5qSkF1D9=zTyMaBEEAWtUBTi3pUuH7n6drYNg@mail.gmail.com>
User-agent: Mutt/1.5.24 (2015-08-30)

Hi, Sachin!

On Mar 23, Sachin Setia wrote:
> Hello Sergei
> Today I made some progress on the project.
> MyISAM/ARIA
>   Got a clear understanding of how to implement a unique index for a query like
>   create table tbl(col1 int primary key, col2 blob, col3 blob, unique(col2, col3))
> InnoDB
>   Reading about it. Actually, Sir, I want to do this project whether or not
>   I am selected for GSoC (because InnoDB is amazing).

:)

> Proposal
>   Still Writing

don't miss the deadline :)

> Actually, sir, I have one question about the table2myisam function definition:
> 
> recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),
> ^^^^^     ^   ^
> Why are we allocating this many recinfo entries when we only require
> share->fields + 1?

Good question :) I don't know. This line apparently goes back at least
to 2001; there is no revision history beyond that date.

It could be that it allocates twice as many columns as necessary
nowadays, even if it made sense many years ago.

> One more question, about optimizing a "select distinct column_name from
> table" query (where the column is a blob). In mi_write, which takes one
> record and writes it, we check the unique constraint. It takes O(n^2) time. I

This isn't O(n^2), because hashes are stored in the index, in a b-tree.
So, it's O(n*log(n)).
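
To illustrate the cost argument, here is a minimal sketch, not the actual
MyISAM code. It assumes rows are plain strings and uses std::multimap (an
ordered tree) as a stand-in for the b-tree index; the function name
insert_with_tree_index is hypothetical. Each insert does one O(log n)
lookup by hash, so n inserts cost O(n*log(n)) comparisons in total.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a unique index over blob hashes: an ordered
// tree keyed by hash value. Returns the number of rows accepted as unique.
std::size_t insert_with_tree_index(const std::vector<std::string>& rows) {
    std::multimap<std::size_t, std::string> index;  // hash -> stored value
    std::size_t accepted = 0;
    for (const std::string& row : rows) {
        const std::size_t h = std::hash<std::string>{}(row);
        // O(log n) lookup of all index entries that share this hash.
        auto range = index.equal_range(h);
        bool duplicate = false;
        for (auto it = range.first; it != range.second; ++it) {
            // Equal hashes do not guarantee equal values, so compare them.
            if (it->second == row) {
                duplicate = true;
                break;
            }
        }
        if (!duplicate) {
            index.emplace(h, row);
            ++accepted;
        }
    }
    return accepted;
}
```

Note that a hash match only marks a candidate duplicate; the stored value
still has to be compared before the row is rejected.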

> was thinking we could optimize this by first fetching all of the table's
> records and calculating a hash for each record. Instead of comparing one
> hash with all the others, we can sort the hashes and ignore the duplicates
> (we can make an array of 0s and 1s, where 1 means the record is not a
> duplicate and 0 means it is a duplicate). By doing this we can reduce the
> time complexity to O(n*log(n)).

As you see, we already have O(n*log(n)). But if we put these hashes into
a hash table in memory (instead of just sorting them), the cost will go
down to O(n). Sounds interesting :)
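
A rough sketch of the in-memory hash table idea, under the assumption that
all distinct values fit in memory and that rows are plain strings. The name
distinct_values is hypothetical and nothing here is taken from the server
code. std::unordered_set hashes each value and compares values only within
a bucket, so the expected cost is O(n) for n rows (treating the hashing of
a single value as constant time).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the distinct rows in first-seen order, using an in-memory
// hash table instead of sorting or a tree index.
std::vector<std::string> distinct_values(const std::vector<std::string>& rows) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> result;
    for (const std::string& row : rows) {
        // insert() reports whether the value was new; expected O(1) per row.
        if (seen.insert(row).second) {
            result.push_back(row);
        }
    }
    return result;
}
```

The O(n) bound is an average-case expectation; with many hash collisions
the lookups degrade, and for large blobs the memory use may be the real
limit.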

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx
