
maria-developers team mailing list archive

GSoC 2015: Indexes on Virtual Columns and Array Based UDFs

 

Dear mentors,

I am currently pursuing a Master's at the University of Illinois at
Urbana-Champaign, USA, and completed my B.Tech at the Indian Institute of
Technology Delhi (IIT Delhi).

This is regarding GSoC 2015. I am really interested in databases, and I was
very excited to see the projects listed here. What excites me most is that
some of the projects are really “hard”, in that they have challenged the
database community for a long time, so it would be very interesting to work
on some of these challenges as part of GSoC.


I want to discuss 2 projects:

A. Indexes on virtual columns

Materialization gives us two things:

1. A name for the column, which we can use in queries.
2. A formal "regular" column which is stored and indexed in the regular
fashion. Disadvantage: extra memory requirements for the materialized
column.

My initial thoughts on this project are the following:

We do need a name for the column that can be used in queries. So maybe we
can expose a command such as:

create virtual_index <name> on <column_name> <expression>

This would run a regular query that evaluates the expression for each row
(as is done for a WHERE clause) and then feed the results into the indexer.
The index can then be stored in the regular fashion.
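
To make the idea above concrete, here is a rough Python sketch (not MariaDB
code) of evaluating an expression once per row and feeding the results into
a sorted index, without storing the computed column in the table. The table
contents, the column names, and the VirtualIndex class are all made up for
illustration; a real implementation would use the storage engine's own
index structures.

```python
import bisect

# Hypothetical table: row id -> row. Column names are illustrative only.
rows = {
    1: {"price": 10, "qty": 3},
    2: {"price": 5, "qty": 2},
    3: {"price": 7, "qty": 6},
    4: {"price": 15, "qty": 2},
}


class VirtualIndex:
    """Sorted index over the value of an expression, not a stored column."""

    def __init__(self, rows, expression):
        # Evaluate the expression once per row and keep (value, row_id) sorted.
        entries = sorted((expression(row), row_id) for row_id, row in rows.items())
        self._keys = [value for value, _ in entries]
        self._row_ids = [row_id for _, row_id in entries]

    def lookup_range(self, low, high):
        """Return row ids whose expression value lies in [low, high]."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._row_ids[start:end]


# Index on the expression price * qty; the table itself is left unchanged.
total_index = VirtualIndex(rows, lambda r: r["price"] * r["qty"])
matching = total_index.lookup_range(20, 35)
```

Only the index holds the computed values here, so the extra cost is the
index itself rather than a full materialized column.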


B. Having UDFs returning an array/set

There are three approaches that I can think of:

1. Supporting array/set as a native datatype inside MariaDB (like int64,
double, etc.). This might be hard, as it touches all levels of the stack.

2. Passing the array/set in serialized form to the node above in the query
execution tree, with an appropriate deserializer wherever we want to
interpret the result. Coming up with a ser/deser strategy might be tough,
and this would be expensive too.

3. The query execution would be in a tree structure where each node exposes
functions like init(), next(), read(int col_index), etc. Maybe we can use
this to emulate the evaluation of the UDF against each row. I think this is
the suggestion that is listed in the project. I would like to get some
direction on this from the mentors.
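
As a rough illustration of approach 3, the Python sketch below models an
execution node with init(), next() and read(col_index), plus a node that
wraps a set-returning UDF and emits one output row per returned element.
The class names TableScan and SetUdfNode, the split UDF, and the sample
rows are assumptions made for this sketch and do not correspond to actual
MariaDB interfaces.

```python
class TableScan:
    """Leaf node: iterates over an in-memory list of tuples."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = -1

    def init(self):
        self._pos = -1

    def next(self):
        self._pos += 1
        return self._pos < len(self._rows)

    def read(self, col_index):
        return self._rows[self._pos][col_index]


class SetUdfNode:
    """Applies a list-returning UDF to one child column.

    Each element of the UDF result becomes its own output row, exposed as
    an extra column after the child's columns.
    """

    def __init__(self, child, udf, arg_col, child_width):
        self._child = child
        self._udf = udf
        self._arg_col = arg_col
        self._child_width = child_width
        self._values = []
        self._idx = -1

    def init(self):
        self._child.init()
        self._values = []
        self._idx = -1

    def next(self):
        self._idx += 1
        # Pull child rows until the UDF yields at least one element.
        while self._idx >= len(self._values):
            if not self._child.next():
                return False
            self._values = list(self._udf(self._child.read(self._arg_col)))
            self._idx = 0
        return True

    def read(self, col_index):
        if col_index < self._child_width:
            return self._child.read(col_index)
        return self._values[self._idx]


def run(node, width):
    """Drain a node into a list of tuples."""
    node.init()
    out = []
    while node.next():
        out.append(tuple(node.read(i) for i in range(width)))
    return out


def split(text):
    # Example UDF: comma-separated string -> list (empty string -> no rows).
    return text.split(",") if text else []


scan = TableScan([(1, "a,b"), (2, ""), (3, "c")])
result = run(SetUdfNode(scan, split, arg_col=1, child_width=2), 3)
```

In this sketch a row whose UDF result is empty produces no output rows, so
the node behaves like a join between the input row and the returned set.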

I would like to discuss these and then decide on one of them. Am I
approaching this in the right direction? Can you please point me to the
next steps?

Thanks
Richa
