← Back to team overview

maria-discuss team mailing list archive

Re: The future of Dynamic Columns

 

2015-01-29 12:34 GMT-02:00 Tom Worster <fsb@xxxxxxxxxx>:

> I have one final technical question. It is for my understanding and does
> not reflect a problem I have as a user.
>
> The manual says:
>
> > The answer is: SQL is a statically-typed language. The SQL interpreter
> needs to know the datatypes of all expressions before the query is run
> (for example, when one is using prepared statements and runs
>  "select COLUMN_GET(...)", the prepared statement API requires the server
>  to inform the client about the datatype of the column being read before
> the
>  query is executed and the server can see what datatype the column actually
>  has).
>
>
> This appears to say that the constraint is that the server needs to
> determine the type of each column in the data sent to the client. SQL such
> as:
>
>   SELECT x, y, COLUMN_GET(stuff, 'z') FROM t
>
> can't work because the datatype of the 3rd column in "table" returned to
> the client can't be determined before reading data.


hum... no
at 3rd column, you have a function returning a value, the function have a
know data type, check this:
http://dev.mysql.com/doc/refman/5.1/en/udf-calling.html
that's how user defined functions are implemented at mysql, think about
dynamic column as many functions to handle a string and extract/include
information about variables with defined data type

for example... when you create a dynamic column with:

-- MariaDB 5.3+:INSERT INTO tbl SET dyncol_blob=COLUMN_CREATE(1
/*column id*/, "value");-- MariaDB 10.0.1+:INSERT INTO tbl SET
dyncol_blob=COLUMN_CREATE("column_name", "value");

check: https://mariadb.com/kb/en/mariadb/dynamic-columns/#column_create
you have a STRING as return of function COLUMN_CREATE()

The return value is suitable for

   -
      - storing in a table
      - further modification with other dynamic columns functions


 at "value" parameter, COLUMN_CREATE know that we are using a STRING, but
should be any other data type

The *as type* part allows one to specify the value type. In most cases,
this is redundant because MariaDB will be able to deduce the type of the
value. Explicit type specification may be needed when the type of the value
is not apparent. For example, a literal '2012-12-01' has a CHAR type by
default, one will need to specify'2012-12-01' AS DATE to have it stored as
a date. See the Datatypes
<https://mariadb.com/kb/en/mariadb/dynamic-columns/#Datatypes> section for
further details.





> I think I can follow this explanation. But I have no use for SQL like that
> (the naked
> COLUMN_GET, i.e. a COLUMN_GET that has no context). I use COLUMN_JSON() to
> load data into memory. It's a better fit for the AR ORM I use.
>
> I only need COLUMN_GET in the context of a larger expression, such as:
>
>   SELECT x, y FROM t WHERE COLUMN_GET(stuff, 'z') = 5
>
> this is a example of why we need index =)



> In this select, the datatypes for the returned data are clear before
> reading any data.

hummm in this example, the COLUMN_GET will check if stuff column have the
'z' key each row, and if it exists, it will get the value/data type, if it
not exist it will return NULL and compare NULL = 5?
check:
COLUMN_GET

COLUMN_GET(dyncol_blob, column_nr as type);
COLUMN_GET(dyncol_blob, column_name as type);

Get the value of a dynamic column by its name. If no column with the given
name exists, NULL will be returned.

*column_name as type* requires that one specify the datatype of the dynamic
column they are reading.

This may seem counter-intuitive: why would one need to specify which
datatype they're retrieving? Can't the dynamic columns system figure the
datatype from the data being stored?

The answer is: SQL is a statically-typed language. The SQL interpreter
needs to know the datatypes of all expressions before the query is run (for
example, when one is using prepared statements and runs"select
COLUMN_GET(...)", the prepared statement API requires the server to inform
the client about the datatype of the column being read before the query is
executed and the server can see what datatype the column actually has).

See the Datatypes
<https://mariadb.com/kb/en/mariadb/dynamic-columns/#datatypes> section for
more information about datatypes.


THAT'S A BIG PROBLEM FOR EXPRESSION INDEX!
since COLUMN_GET could return STRING OR DECIMAL OR INTEGER OR ....  we need
only one data type to create the index, instead of many data type + many
data, since today index at mysql/mariadb only accept one datatype for each
index, for example if we only have COLUMN_GET()  returning INTEGER, well
let's create a INTEGER index... but if COLUMN_GET() return STRING AND/OR
INTEGER, what should we do? create a STRING index? i think yes... but check
that big STRINGs are a problem to index, we have a limit, today we can't
create index over BLOBs columns



> So why do I have to explicitly cast the dyncol 'z'?

well you dont need, but if you want good and deterministic results you
should cast


> Or what if there is no returned data at all:


>   DELETE FROM t WHERE COLUMN_GET(stuff, 'z') = 5

COLUMN_GET should return NULL in this case, if WHERE part return true the
row will be deleted


>
> If the need to cast is to determine the type of data sent to the client,
>
yes and no... mysql use the data type internally too


> that argument surely doesn't apply here. It really wouldn't apply to this
> either:
>
>
>   SELECT x, y, CONCAT('Hello. My name is ', COLUMN_GET(stuff, 'z')) FROM t
>

in this case if COLUMN_GET return INTEGER, CONCAT  funciton will cast it to
string


>
> If I understand correctly (doubtful) and the cast is only necessary for
> the naked COLUMN_GET, then isn't the cast in the wrong place?

the cast if for COLUMN_GET return, each dynamic column could have different
keys/values from others dynamic columns... think about a field where you
create a table each row/column you use dynamic columns


> The cast should be put after the COLUMN_GET function to remove the
> nakedness.

well, from what i understand COLUMN_GET cast as a parameter of COLUMN_GET
funciton, is to predict what COLUMN_GET function should we use, think about
a function like this:

int function(parameter)
string function(parameter)

what function should be used in this cases?
function()+0    <- the int function(parameter)
function()+" "  <- the string function(parameter)

if you don't include the cast, the COLUMN_GET will return the data type
based on dynamic column information



> The
> server should throw an SQL error if it encounters a naked COLUMN_GET.

well.. i don't know, but if you want a error, the error should be generated
by COLUMN_CREATE, but that's a problem...
each row/column have a dynamic column value, the data type of one row don't
need to be the same of another row, in a row/column you could use 'z' as
integer, at another row/column you could use 'z' as string


> In
> other words, COLUMN_GET can be used only as _part_of_ an expression.
>

COLUMN_GET is a function, use as a function =]


>
>
> Tom
>
>
>
>

Follow ups

References