
maria-discuss team mailing list archive

Re: The future of Dynamic Columns

 

I use dynamic columns without problems. They don't have a big future on their
own, because they are essentially just a set of functions to encode/decode
data. In other words, the future is today's feature set plus some new features
to handle data more easily, or to optimize space or CPU/memory use.

Make sure you are using a good encoding system. Dynamic columns are an
encoding system, like JSON, like PHP encode/decode, like base64 encode/decode,
but with their own pros and cons.

When using an encoding system, check whether the CPU and data-size costs are
acceptable for your problem. For example, the integer 1024 can be saved as a
4-byte string or as a 2-byte integer; with 1,000,000 rows you can save
2,000,000 bytes (allowing better CPU/memory use). Now think about encoding a
64-bit integer as a string versus as a BIGINT column, across 1M rows...
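The size difference above can be checked directly. A minimal sketch using Python's `struct` module to compare the string encoding against fixed-width binary integers (the numbers match the 2-bytes-per-row claim from the text):

```python
import struct

# The integer 1024 stored as a decimal string vs. a fixed-width integer.
as_string = b"1024"                    # 4 bytes
as_uint16 = struct.pack("<H", 1024)    # 2 bytes: little-endian unsigned short

print(len(as_string), len(as_uint16))  # 4 2

# Scaled to 1,000,000 rows, the string encoding costs 2,000,000 extra bytes.
rows = 1_000_000
saving = (len(as_string) - len(as_uint16)) * rows
print(saving)                          # 2000000

# A 64-bit integer is worse: up to 19 decimal digits as a string vs. 8 bytes
# for a fixed-width BIGINT-style encoding.
big = 2**63 - 1
print(len(str(big)), len(struct.pack("<q", big)))  # 19 8
```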

The problem with encoding data isn't only the encoding itself; the problem is
how the database searches these encoded values.
We don't have indexes over functions, and the optimizer doesn't rewrite
virtual columns and functions (yet). That's probably something document-based
databases do very well: searching inside documents.

---
some ideas....

1) specific index idea
We could have a dynamic-columns index, instead of a btree index, something
like what full-text indexes do...

2) general idea
If we could have an "index( some_function(field) )", and could optimize
"some_function(field)='xxxx'" using that index, this would solve many problems
with dynamic columns and with encoding data in the database. With more
complexity, we could include some index optimizations for functions and
"rewrite" the queries.

examples....
2.1) the optimizer should be intelligent enough to rewrite expressions to
virtual columns that use the same function

Say we have virtual_column_indexed = substring(column,1,20), and we execute:

WHERE substring(column,1,20)='1234'  <- full table scan + executing the
function for each row (is substring CPU/memory intensive?)

The optimizer should rewrite this to

WHERE virtual_column_indexed = '1234'  <- using the index, without executing
the substring function for each row (less CPU/memory used)

Or, if the CPU time spent on the substring function is very high and we have
virtual_column_not_indexed = substring(column,1,20), the optimizer could
rewrite to

WHERE virtual_column_not_indexed = '1234'  <- full table scan, but without
executing the SUBSTRING function for each row
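The rewrite in 2.1 can be sketched as a simple lookup table inside the optimizer. A toy sketch (the expression strings and column names are made up; a real optimizer would match parsed expression trees, not text):

```python
# Toy sketch of the rewrite rule in 2.1: the "optimizer" keeps a map from a
# virtual column's defining expression to the column name, and replaces
# matching expressions in the WHERE clause with the (possibly indexed) column.

virtual_columns = {
    "substring(column,1,20)": "virtual_column_indexed",  # hypothetical schema
}

def rewrite_where(where_clause: str) -> str:
    """Replace any known virtual-column expression with its column name."""
    for expression, column_name in virtual_columns.items():
        where_clause = where_clause.replace(expression, column_name)
    return where_clause

print(rewrite_where("substring(column,1,20)='1234'"))
# virtual_column_indexed='1234'
```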


2.2) functions should be indexable (only deterministic functions)
Example: when we create an index over a function, we don't need a new column.
Think about the MyISAM storage engine... we don't need the data in the MYD
file, we only need the data in the MYI file.

Indexed functions ~= "indexed hidden virtual columns"
When we execute

WHERE substring(column,1,20)='1234'  <- full table scan + CPU/memory cost of
the substring function for each row

the optimizer should rewrite it to

WHERE internal_hidden_column = '1234'  <- internal index
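For what it's worth, SQLite already implements something very close to this "indexed hidden virtual column" idea with its indexes on expressions: the index stores the function's result without adding a column to the table. A small sketch via Python's `sqlite3` module (table, column, and index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("1234",), ("1234567890abcdefghijklmnop",), ("xyz",)])

# Index over the expression itself: no extra stored column is needed,
# matching the "data only in the MYI file, not the MYD file" idea above.
conn.execute("CREATE INDEX idx_sub ON t (substr(col, 1, 20))")

# The planner can use the expression index when the WHERE clause contains
# the same deterministic expression.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE substr(col, 1, 20) = '1234'"
).fetchall()
print(plan)
```

The query plan output should show a search using `idx_sub` rather than a full table scan, which is exactly the rewrite proposed here.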


2.3) functions could expose rewrite hints to the optimizer
Example: when we execute

WHERE substring(any_column,1,20)='1234'

we can rewrite it with some understanding of the substring(value, start,
length) function.

We know that it returns the same string, starting at the start position and
with at most `length` characters. In other words, if we want to 'optimize'
(or not) this function, we could rewrite to

WHERE any_column LIKE '1234%' AND  <-- (1) can the column use an index?
length(any_column)<=20 AND  <-- (2) a length check is 'faster' than comparing
strings
substring(any_column,1,20)='1234'  <-- (3) execute the function as the last
check, only if (1) and (2) are true

If there are many negative cases for (3), we can optimize some searches with
(1) and (2) alone.
If any_column is an indexed column, we get an optimization on the LIKE
operator (really fast).
But if there are many positive cases for (1) and (2), we just add more CPU
time by executing two extra checks; in that case plain substring could be
faster (it uses less CPU than three checks).


The same goes for

WHERE substring(any_column,5,20)='1234'

which could be rewritten to

WHERE any_column LIKE '____1234%' AND length(any_column)<=24 AND
substring(any_column,5,20)='1234'

(four '_' wildcards to skip positions 1-4, and 5+20-1 = 24 as the last
position the substring can reach)
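A quick sanity check of 2.3's rewrite, as a sketch in Python assuming exact (binary) string comparison: the cheap checks (1) and (2) are necessary conditions for the substring predicate, so using them as pre-filters never discards a real match.

```python
# Verify that the rewritten predicate (cheap pre-filters + final exact check)
# agrees with the original substring predicate, under exact string equality.

def substr(value: str, start: int, length: int) -> str:
    """1-based SUBSTRING(value, start, length), as in SQL."""
    return value[start - 1 : start - 1 + length]

def original_1_20(col: str) -> bool:
    return substr(col, 1, 20) == "1234"

def rewritten_1_20(col: str) -> bool:
    return (col.startswith("1234")             # (1) LIKE '1234%'
            and len(col) <= 20                 # (2) cheap length check
            and substr(col, 1, 20) == "1234")  # (3) final exact check

def original_5_20(col: str) -> bool:
    return substr(col, 5, 20) == "1234"

def rewritten_5_20(col: str) -> bool:
    return (len(col) >= 8 and col[4:8] == "1234"  # (1) LIKE '____1234%'
            and len(col) <= 24                    # (2) last position is 24
            and substr(col, 5, 20) == "1234")     # (3) final exact check

samples = ["1234", "12345", "1234" + "x" * 30, "abcd1234", "abcd12345",
           "abcd", ""]
for col in samples:
    assert original_1_20(col) == rewritten_1_20(col)
    assert original_5_20(col) == rewritten_5_20(col)
print("rewrites agree on all samples")
```

Note that SQL collations that ignore trailing spaces can make the length bound slightly looser; the sketch above assumes binary comparison.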

2.4) if we have a very CPU-intensive function, the SELECT part could be
optimized too

SELECT EXPENSIVE_FUNCTION()
FROM table
WHERE single_condition

Today we can optimize this with virtual columns:
virtual_column = EXPENSIVE_FUNCTION()
and instead of SELECT EXPENSIVE_FUNCTION() we write SELECT virtual_column.

But since the optimizer is intelligent, why can't it rewrite the query itself?
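The saving in 2.4 is easy to see with a call counter. A toy illustration (the function and names are made up): with a materialized virtual column the expensive function runs once per row at write time, and later SELECTs just read the stored values.

```python
# Recomputing an expensive function per row vs. reading a precomputed
# "virtual column" of already-materialized values.

calls = 0

def expensive_function(x: int) -> int:
    """Stand-in for something CPU-intensive; counts its invocations."""
    global calls
    calls += 1
    return x * x

rows = list(range(1000))

# Without the rewrite: the function runs once per selected row, every query.
naive = [expensive_function(x) for x in rows]
print(calls)              # 1000

# With a materialized virtual column, the values were computed at write time;
# a later SELECT reads them back with zero function calls.
virtual_column = naive    # the precomputed values
calls = 0
selected = [virtual_column[i] for i in range(len(rows))]
print(calls)              # 0
```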

----

Note that this is not a limitation of dynamic columns, only a limitation of
the optimizer/indexes and of rewriting functions to search faster when
functions are used.

I don't know if we could include other ideas to tune/optimize the database.
