
maria-discuss team mailing list archive

Re: The future of Dynamic Columns

 

On 12/26/14 I posted the following question to this mailing list:



Dynamic columns support in libmaria client library?

This page
https://mariadb.com/kb/en/mariadb/documentation/nosql/dynamic-columns-api/
does not mention anything about it. It just says:

"The API is a part of libmysql C client library. In order to use it, one
needs to include this header file
#include <mysql/ma_dyncol.h>
and link against libmysql."

Now, can a program using ma_dyncol.h be linked against libmaria as well?

There are actually two questions:
1) Technical: will it work with libmaria?
2) Legal: can a client program using libmaria under the LGPL license
include ma_dyncol.h?




... no reply yet. Very bad!  :-(  :-(



-- Peter
-- Webyog

On Tue, Jan 27, 2015 at 5:44 PM, Roberto Spadim <roberto@xxxxxxxxxxxxx>
wrote:

> I use dynamic columns without problems. They don't have a grand future as
> such, because they are essentially just a set of functions to encode and
> decode data; in other words, the future is today's feature set plus some
> new features to handle data more easily or to optimize space and
> CPU/memory use.
>
> Make sure you are using a good encoding scheme. Dynamic columns are an
> encoding scheme, like JSON, like PHP serialize/unserialize, like base64
> encode/decode, but with their own pros and cons.
>
> When using an encoding scheme, check whether the CPU and storage costs
> are acceptable for your problem. For example, the integer 1024 can be
> stored as a 4-byte string or as a 2-byte integer; with 1,000,000 rows,
> that saves 2,000,000 bytes (and allows better CPU/memory use). Now think
> about encoding a 64-bit integer as a string versus using a BIGINT column,
> multiplied over 1M rows...
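The arithmetic above can be checked directly; a minimal Python sketch using the standard struct module (the figures are the ones from the example):

```python
import struct

# The integer 1024 stored as text takes 4 bytes ...
as_text = b"1024"                    # 4-byte string
# ... but it fits in an unsigned 16-bit integer, which takes 2 bytes.
as_u16 = struct.pack("<H", 1024)     # little-endian unsigned short

saved_per_row = len(as_text) - len(as_u16)
rows = 1_000_000
print(saved_per_row * rows)  # → 2000000  (the 2,000,000 bytes above)
```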
>
> The problem with encoding data isn't only the encoding itself; the
> problem is how the database searches those encoded values. We don't have
> indexes over functions, and the optimizer doesn't rewrite virtual columns
> and functions (yet). That is probably something document-oriented
> databases do very well: searching inside documents.
>
> ---
> some ideas....
>
> 1) Specific index idea:
> We could have a dedicated dynamic-columns index instead of a btree index,
> something like what full-text indexes do...
>
> 2) General idea:
> If we could have "INDEX( some_function(field) )", and could optimize
> "some_function(field) = 'xxxx'" using that index, this would solve many
> problems with dynamic columns and with encoding data in the database.
> With more complexity, we could add index optimizations for functions and
> "rewrite" the queries.
>
> Examples...
> 2.1) The optimizer should be intelligent enough to rewrite expressions
> that match an indexed virtual column.
>
> Say we have virtual_column_indexed = SUBSTRING(column,1,20), and we
> execute:
>
> WHERE SUBSTRING(column,1,20)='1234'  <- full table scan + the function
> executed for each row (is SUBSTRING CPU/memory intensive?)
>
> The optimizer should rewrite this to:
>
> WHERE virtual_column_indexed = '1234'  <- uses the index without
> executing the SUBSTRING function for each row (less CPU/memory used)
>
> Or, if the CPU time spent on the SUBSTRING function is very high and we
> only have virtual_column_not_indexed = SUBSTRING(column,1,20), the
> optimizer could rewrite to:
>
> WHERE virtual_column_not_indexed = '1234'  <- full table scan, but
> without executing the SUBSTRING function for each row
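What that rewrite buys can be modeled outside SQL. A toy Python sketch (every name here is illustrative, none of it is MariaDB internals): a precomputed "virtual column" held in an "index" answers the query without re-running the function per row.

```python
# Toy model of idea 2.1: a precomputed, indexed "virtual column"
# versus a full scan that re-runs the function on every row.
calls = 0

def substring_1_20(s):
    """Stand-in for SUBSTRING(column, 1, 20); counts invocations."""
    global calls
    calls += 1
    return s[:20]

rows = ["1234" + str(i) * 30 for i in range(1000)] + ["1234"]

# Full scan: the function runs once per row.
scan_hits = [r for r in rows if substring_1_20(r) == "1234"]
scan_calls = calls

# "Indexed virtual column": values computed once at write time;
# the lookup itself runs the function zero times.
index = {}
for r in rows:
    index.setdefault(r[:20], []).append(r)   # write-time computation

calls = 0
index_hits = index.get("1234", [])
print(scan_calls, calls)   # 1001 calls for the scan, 0 for the lookup
```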
>
>
> 2.2) Functions should be indexable (deterministic functions only).
> Example: when we create an index over a function, we don't need a new
> column. Think about the MyISAM storage engine: we don't need the data in
> the MYD file, we only need it in the MYI file.
>
> Function indexes ~= "indexed hidden virtual columns"
> When we execute:
>
> WHERE SUBSTRING(column,1,20)='1234'  <- full table scan + CPU/memory cost
> of the SUBSTRING function for each row
>
> the optimizer should rewrite it to:
>
> WHERE internal_hidden_column = '1234'  <- internal index lookup
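For what it's worth, idea 2.2 exists in other engines: SQLite has supported indexes on expressions since 3.9.0, and its planner matches an identical WHERE expression against such an index. A sketch via Python's sqlite3 module (the table and index names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("1234",), ("abcdef" * 10,)])

# Index directly on the expression -- no extra column needed, much like
# "data only in the MYI file" above.
con.execute("CREATE INDEX idx_prefix ON t (substr(col, 1, 20))")

# The planner rewrites the identical expression to an index search.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE substr(col, 1, 20) = '1234'"
).fetchall()
print(plan[0][-1])   # e.g. "SEARCH t USING INDEX idx_prefix (<expr>=?)"
```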
>
>
> 2.3) Functions could expose hints about other possible rewrites.
> Example: when we execute
> WHERE SUBSTRING(any_column,1,20)='1234'
>
> we can rewrite it using some knowledge of SUBSTRING(value, start, length):
>
> we know it returns the same string, but starting at position start and
> with at most length characters. In other words, if we want to 'optimize'
> (or not) this function, we could rewrite to:
>
> WHERE any_column LIKE '1234%' AND    <-- (1) possibly an indexed column?
> LENGTH(any_column) <= 20 AND  <-- (2) comparing lengths is 'faster' than
> comparing strings
> SUBSTRING(any_column,1,20)='1234'  <-- (3) run the function as the final
> check only when (1) and (2) are true
>
> If many rows fail check (3), we can prune some searches with (1) and (2),
> and if any_column is an indexed column we get an index optimization on
> the LIKE operator (really fast). But if most rows pass (1) and (2), we
> just spend more CPU time executing two extra checks; in that case
> SUBSTRING alone could be faster (since it uses less CPU than three
> functions).
>
>
> The same applies to
> WHERE SUBSTRING(any_column,5,20)='1234'
>
> which could be rewritten to
>
> WHERE any_column LIKE '____1234%' AND LENGTH(any_column) <= 20 AND
> SUBSTRING(any_column,5,20)='1234'
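The prefilter logic in the rewrite above can be sanity-checked in a few lines. This sketch assumes MySQL's 1-based SUBSTRING(str, pos, len) semantics; the sample strings are hypothetical:

```python
# Property check for idea 2.3: conditions (1) and (2) never reject a row
# that the final check (3) would accept, so they are safe prefilters.

def mysql_substring(s, pos, length):
    """MySQL SUBSTRING(str, pos, len): pos is 1-based."""
    return s[pos - 1:pos - 1 + length]

samples = ["1234", "12345", "1234" + "x" * 30, "abcd", "", "123"]

for s in samples:
    if mysql_substring(s, 1, 20) == "1234":     # check (3)
        assert s.startswith("1234")             # (1) LIKE '1234%'
        assert len(s) <= 20                     # (2) LENGTH(s) <= 20

print("prefilters (1) and (2) are implied by (3)")
```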
>
> 2.4) If we have a very CPU-intensive function, the SELECT part could be
> optimized too:
>
> SELECT expensive_function()
> FROM table
> WHERE single_condition
>
> Today we can optimize this with virtual columns:
> virtual_column = expensive_function()
> and instead of SELECT expensive_function() we write SELECT virtual_column.
>
> But if the optimizer is intelligent, why can't it rewrite the query
> itself?
>
> ----
>
> Note that this is not a limitation of dynamic columns, only a limitation
> of the optimizer/indexes and of rewriting queries to search faster when
> functions are involved.
>
> I don't know whether there are other ideas we could include to
> tune/optimize the database.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>
>
