maria-developers team mailing list archive

Re: Extending storage engine API for random-row extraction for histogram collection (and others)


Hi, Vicențiu!

On Dec 11, Vicențiu Ciorbaru wrote:
> On Tue, 11 Dec 2018 at 14:33 Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> >
> > But then I was thinking, why do you need to specify an index at all?
> > Shouldn't it be just "get me a random row"? Index or whatever -
> > that's an engine implementation detail. For example, MyISAM with
> > fixed-size rows can just read from
> > lseek(floor((file_size/row_size)*rand())*row_size).
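
The lseek expression above can be sketched as follows. This is only an illustration of the arithmetic, not MyISAM code: the function name `read_random_fixed_row`, the flat file of back-to-back records, and the absence of any file header or deleted-row slots are all assumptions made for the example.

```python
import os
import random


def read_random_fixed_row(path, row_size, rng=random):
    """Return one uniformly chosen fixed-size record from a flat data file.

    Hypothetical helper: assumes the file is nothing but records of
    `row_size` bytes laid end to end, with no header and no deleted slots.
    """
    file_size = os.path.getsize(path)
    n_rows = file_size // row_size          # number of complete rows in the file
    if n_rows == 0:
        return None                         # empty table: nothing to sample
    row_no = rng.randrange(n_rows)          # integer in [0, n_rows), like floor(n_rows * rand())
    with open(path, "rb") as f:
        f.seek(row_no * row_size)           # jump straight to the start of that row
        return f.read(row_size)
```

Picking an integer row number first and then multiplying by the row size keeps the offset aligned to a row boundary, which is what the floor() in the expression is for. An engine that leaves deleted rows in place would additionally need to detect such a slot and retry.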
> I agree that the need for an index seems a bit much. My reasoning was
> that I wanted to allow random sampling on a particular range. This
> could help for example when one wants to collect histograms for a
> multi-distribution dataset, to get individual distributions (if the
> indexed column is able to separate them).
> A more generic idea would be if one could pass some conditions for
> random row retrieval to the storage engine, but it feels like this
> would complicate storage engine implementation by quite a bit.
> For the first iteration, after considering your input, I'd go with
> "init function", "get random row", "end function", without imposing an
> index, but somehow passing a (COND or similar) arg to the init
> function.

For the first iteration I'd go without a condition. You probably
shouldn't add an API that you won't use, and in the first iteration you
won't use it, right? It can be added later when needed.
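
To make the shape of that first iteration concrete, here is a rough sketch of an "init", "get random row", "end" interface with no index and no condition argument. All names here (`RandomRowSource`, `random_init`, `random_next`, `random_end`, `InMemoryTable`, `sample_rows`) are made up for illustration and are not part of the storage engine API; the toy engine samples with replacement from an in-memory list.

```python
import random
from abc import ABC, abstractmethod


class RandomRowSource(ABC):
    """Hypothetical three-call interface: init, get random row, end."""

    @abstractmethod
    def random_init(self):
        """Prepare for sampling. Takes no index and no condition."""

    @abstractmethod
    def random_next(self):
        """Return one random row, or None if the table is empty."""

    @abstractmethod
    def random_end(self):
        """Release whatever random_init() set up."""


class InMemoryTable(RandomRowSource):
    """Toy engine: how a row is picked stays an implementation detail."""

    def __init__(self, rows, seed=None):
        self._rows = list(rows)
        self._seed = seed
        self._rng = None

    def random_init(self):
        self._rng = random.Random(self._seed)

    def random_next(self):
        if self._rng is None:
            raise RuntimeError("random_init() was not called")
        if not self._rows:
            return None
        return self._rng.choice(self._rows)   # sampling with replacement

    def random_end(self):
        self._rng = None


def sample_rows(table, n):
    """Caller side, e.g. a histogram collector drawing n rows."""
    table.random_init()
    try:
        return [table.random_next() for _ in range(n)]
    finally:
        table.random_end()
```

A condition or range argument could later be added as an optional parameter of the init call without changing the other two calls, which is one reason to leave it out until something actually uses it.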

