maria-developers team mailing list archive

Re: Extending storage engine API for random-row extraction for histogram collection (and others)

 

Hi, Vicențiu!

On Dec 11, Vicențiu Ciorbaru wrote:
> On Tue, 11 Dec 2018 at 14:33 Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> >
> > But then I was thinking, why do you need to specify an index at all?
> > Shouldn't it be just "get me a random row"? Index or whatever -
> > that's an engine implementation detail. For example, MyISAM with
> > fixed-size rows can just read from
> > lseek(floor((file_size/row_size)*rand())*row_size).
> 
> I agree that the need for an index seems a bit much. My reasoning was
> that I wanted to allow random sampling on a particular range. This
> could help for example when one wants to collect histograms for a
> multi-distribution dataset, to get individual distributions (if the
> indexed column is able to separate them).
> 
> A more generic idea would be if one could pass some conditions for
> random row retrieval to the storage engine, but it feels like this
> would complicate storage engine implementation by quite a bit.
> 
> For the first iteration, after considering your input, I'd go with
> "init function", "get random row", "end function", without imposing an
> index, but somehow passing a (COND or similar) arg to the init
> function.

For the first iteration I'd go without a condition. You probably
shouldn't add an API that you won't use, and in the first iteration you
won't use it, right? It can be added later when needed.
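
To make the shape concrete, here is a minimal sketch of the three calls
("init function", "get random row", "end function") with no index and no
condition argument. It is written in Python for brevity, not against the
real handler API. The names FixedRowSampler, sample_init, sample_next,
sample_end, write_rows and the row layout are all hypothetical. The
"engine" is just a file of fixed-size rows, sampled with the
floor((file_size/row_size)*rand())*row_size offset mentioned above. It
ignores anything a real engine would have to handle, such as file headers
or deleted-row slots.

```python
import os
import random
import struct
import tempfile

# Hypothetical fixed-size row layout: int32 id followed by int64 value.
ROW_FORMAT = "<iq"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 12 bytes, no padding with "<"


def write_rows(path, rows):
    """Write (id, value) tuples back to back as fixed-size records."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(ROW_FORMAT, *row))


class FixedRowSampler:
    """Hypothetical init / get-random-row / end interface."""

    def __init__(self, path, seed=None):
        self._path = path
        self._rng = random.Random(seed)
        self._file = None
        self._row_count = 0

    def sample_init(self):
        # No index and no condition: just open the data file and
        # work out how many fixed-size slots it holds.
        self._file = open(self._path, "rb")
        file_size = os.fstat(self._file.fileno()).st_size
        self._row_count = file_size // ROW_SIZE

    def sample_next(self):
        # Pick a slot uniformly at random and seek straight to it.
        if self._row_count == 0:
            return None
        slot = self._rng.randrange(self._row_count)
        self._file.seek(slot * ROW_SIZE)
        return struct.unpack(ROW_FORMAT, self._file.read(ROW_SIZE))

    def sample_end(self):
        if self._file is not None:
            self._file.close()
            self._file = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "rows.dat")
        write_rows(data_path, [(i, i * 10) for i in range(1000)])
        sampler = FixedRowSampler(data_path)
        sampler.sample_init()
        print([sampler.sample_next() for _ in range(5)])
        sampler.sample_end()
```

How the row is found stays entirely inside sample_next, so another engine
could implement the same three calls with an index dive or a page-level
pick, and a condition argument could be added to the init call later
without changing the other two.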

Regards,
Sergei

