← Back to team overview

maria-developers team mailing list archive

Re: GSoC 2015: Indexes on Virtual Columns and Array Based UDFs

 

Thanks Sergei for the explanation. The modified project for MDEV-5199
(Table Functions) now makes sense - as in the proposals mentioned in the
ticket as possible solutions now looks appropriate :)

I am looking into the index of virtual columns. One question: can we submit
proposals for 2 projects at MariaDB?

Now as for Table Functions:

As mentioned there are two ways:
1. Materializing the table - this has a disadvantage for extra space.
2. Treating the UDF result as an "abstract table". For eg, the code might
have a base class called Table. Then we can create a derived class called
UDFTable and pass this instance in the regular query processing. This class
would implement all functionalities like Init(), Next(), Read(int
col_index), Seek(), etc
Consider:
 Select A where A.x > 10 FROM UDF_TABLE

In this case, we will first construct the table UDF_TABLE through regular
query - evaluation of the UDF. Then we can create an instance of UDFTable
from the table and run normal query processing on this.

Is this the right approach? Can you send me the next steps?

Thanks
Richa

On Fri, Mar 13, 2015 at 4:23 AM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Richa!
>
> On Mar 13, Richa Sehgal wrote:
> > Dear mentors,
> >
> > I am currently pursuing Master's in University of Illinois at Urbana
> > Champaign, USA and completed by B.Tech from Indian Institute of
> Technology
> > - Delhi (IIT- Delhi)
> >
> > This is regarding GSoC 2015. I am really interested in databases, and I
> was
> > very excited to see all these projects listed here. The exciting part was
> > that some of the projects are really “hard” as in they have challenged
> the
> > database community since a long time, and thus it would be very
> interesting
> > to solve some of these challenges as part of GSoC.
> >
> > I want to discuss 2 projects:
> >
> > A. Indexes on virtual columns
> >
> > Materialization gives us two things:
> >
> > 1. A name to the column which we can use in queries
>
> Not quite. Even non-materialized columns always have names.
> See https://mariadb.com/kb/en/mariadb/virtual-computed-columns/
>
> > 2. A formal "regular" column which is stored and indexed in the regular
> > fashion - Disadvantage: Extra memory requirements for the materialized
> > column.
>
> Exactly.
>
> > My initial thoughts on this project are the following:
> >
> > We do need the name of the column which can be used to query. So maybe we
> > can expose a command such as:
> >
> > create virtual_index <name> on <column_name> <expression>
> >
> > What this would do would run a regular query which evaluates expressions
> > (like in WHERE clause) and the feed the result into the indexer. This
> index
> > can then be stored in the regular fashion.
>
> Yes, in a sense.
>
> The update_virtual_fields() function calculates values of virtual
> columns (e.g. if one does SELECT * from table_with_virtual columns).
>
> In MyISAM, _mi_make_key() creates a key from the row image.
>
> Pure virtual columns are not in the row image, so they cannot be indexed
> that way.
>
> So, in a very simplified way, this project is about calling
> update_virtual_fields() from _mi_make_key().
>
> > B. Having UDFs returning an array/set
> >
> > There are three approaches that I can think of:
> ...
> > I would like to discuss these and then decide on one of them. Am I
> > approaching this in the right direction? Can you please point me to
> > the next steps?
>
> I'm afraid, the project was badly formulated.
>
> I've rewritten it now to highlight what it is really about.
> It is not about arrays, the important part is a "result set".
> See MDEV-5199
>
> Regards,
> Sergei
>
>

Follow ups

References