← Back to team overview

maria-discuss team mailing list archive

Re: Dynamic columns Library



>>>>> "Oleksandr" == Oleksandr Byelkin <sanja@xxxxxxxxxxxx> writes:


Oleksandr> - I changed indices from int to uint + some other minor changes in prototypes.
Oleksandr> - dynamic_column_exists() has no way to report error (what to do?)
>> I assume that same problem exists for all the other functions.
>> Two options:
>> - Change function type from my_bool to int where -1 means something
>> went bad (like bad format for the blob).
>> - Add an 'error' argument to all functions that is set to <> 0 if
>> something goes wrong.

Oleksandr> OK, then imho better to have something like this:
Oleksandr> #define ER_DYNCOL_EOM  -1
Oleksandr> #define ER_DYNCOL_FORMAT  -2
Oleksandr> ...

That's ok

Oleksandr> [skip]
>>> +my_bool dynamic_column_init_str(DYNAMIC_COLUMN *str, size_t size)
>>> +{
>>> +  if (!size)
>>> +    size= DYNCOL_SYZERESERVE;
>>> +  /*
>>> +    Make string with no fields (empty header)
>>> +    - First \0 is flags
>>> +    - other four \0 is 0 fileds counter
>>> +  */
>>> +  if (init_dynamic_string(str, NULL,
>>> +                              size + 5, DYNCOL_SYZERESERVE))
>>> +    return TRUE;
>>> +  return dynstr_append_mem(str, "\0\0\0\0\0", 5);
>>> +}
>> Why is an empty data 5 bytes instead of 1 or even 0?

Oleksandr> We have fixed size part of headers - flags and number of columns.

Yes, but in case of dynamic_column data that has no columns,
the size should be zero.

The reason for this is to allow one to easy do COLUMN_ADD() on an
empty blob.

What does the header consist of?

Name              Offset  Length
Flag              0       1
Column_count      1       2

This is 3 bytes. Why 5 bytes?
(We don't have to support more than 65K columns)

Oleksandr> [skip]
>>> +  for (i= 0; i < column_count - 1; i++)
>>> +    if (columns_order[i][0] == columns_order[i+1][0])
>>> +      goto err;
>>> +
>>> +  DBUG_ASSERT(str->length == FIXED_HEADER_SIZE);
>>> +  str->str[0]|= (offset_size - 1); // size of offset
>>> +  int4store(str->str + 1, non_null_count); // columns number
>> Lets limit the number of columns to 65536 (2 bytes)
>> Don't understand why we store non_null_count in header
>> Don't we need to store total number of columns ?

Oleksandr> No. We need number of columns stored in the string and we do not store NULLs.  we even decided that NULL mean removing the column.

Ok, lets for now have 'NULL' mean that we are not storing the data.
(But we should be prepared to change this in the future if it's
*critically* needed;  I don't know of any such case just now but let
at least be prepared and think how we could store it if needed).

>> I think we should store also null's in the dynamic columns as someone
>> may want to distinguish between a non existing columns and a column
>> with not a set value.
>> We could store this either as an extended type or as datetime with
>> size 0.

Oleksandr> We discussed it and decided that NULL mean nothing (BTW it was your idea). In case of we need it we reserved offset of all 1 bits.

We can't use offset for this as we need the offset to always point to
the place where the data start.  This is becasue we will use the
offset to calculate the length of the previous entry.

Anyway, using datetime with length 0 should work as a way to mark
'null' columns.

Oleksandr> [skip]
>> We also need functions:
>> Initialize the DYNAMIC_COLUMN. (Basicly just a call to
>> init_dynamic_string())

Oleksandr> I think about it, but creation is that function.

Not the same thing as creation means that there is at least one field
of data.

init_dynamic_string() should create a string with no fields of data.
(Ie, length should be 0, not 3 or 5)


>>> +my_bool dynamic_column_update(DYNAMIC_COLUMN *str, uint column_nr,
>>> +                              DYNAMIC_COLUMN_VALUE *value)
>>> +{
>>> +  size_t offset_size, new_offset_size, data_size, new_data_size,
>>> +         add_data_size, entry_size, new_entry_size,
>>> +         header_size, new_header_size, additional_size, next_offs,
>>> +         data_ins_offset;
>>> +  uint column_count, i;
>>> +  uchar *read, *write;
>>> +  my_bool added;
>>> +
>>> +  /* TODO: optimize update without deleting */
>>> +  if (dynamic_column_delete(str, column_nr))
>>> +    return TRUE;
>> The problem with doing delete + insert later is that the whole
>> function needs to be rewritten and you can't use any of the old code
>> (for columns that exists).
>> So I suggest you write this ASAP!

Oleksandr> IMHO SQL layer is mor important now :)

I will start working on the SQL layer today.
You can fix the above in the mean time.

>>> +
>>> +  if (value->type == DYN_COL_NULL)
>>> +    return FALSE;
>> Should be stored.

Oleksandr> It is strange that first you persuaded me to make as I did, then request to do it as I had wanted. Let me live it as is and change it if it will be need by somebody really. 

The reason for why I changed my mind was that I did later see a need
for being able to store nulls in the dynamic_column if we want to use
it as a layer to a non-sql table.  In this case it may be useful for
the user to be able to distingush between columns stored in the engine
(even if they have a value other than null) and columns that doesn't

However, as I said above we can wait for this, as long as we are
prepared to do a change that is not incompatible with current format
if we critically need it later.

>>> +
>>> +  if (!str || str->length < FIXED_HEADER_SIZE ||
>>> +      (str->str[0] & (~DYNCOL_FLG_KNOWN)))
>>> +    return TRUE;
>> length of 0 should be regarded as ok.

Oleksandr> Why? Any column that went through create functions will be at least size 5. If it is not true that string is malformed.

This is becasue one is allowed to update empty columns too!
We can't assume that one has done a create for every column.

In other words, the following code should work:

CREATE TABLE t1 (a int, b blob);
insert into t1 values (1,"");
update t1 set b=column_add(b,"hello") where a=1;

The same goes for delete. One should be able to do:

update t1 b=column_delete(b,1);

to delete all column '1' in the table and it should work ok even if
the column is empty.

>> We should just have one function to rewrite the headers!
>> To make life easer, maybe we should assume that header size change
>> operation is a very unlikely one and accept to do an extra memmove
>> when header offset changes?
>> This would allow us to have a simple function that just rewrites the
>> header (from a given point) and another that increases/decreases the
>> header size + all data ?

Oleksandr> Why it should be rare? (We can't know balance of delete/update/insert)

Because the header change only happens when we go over a given
threshold and it's not very likely that we will cross that for any
given update/delete/insert.

If we notice that this happens a lot, we should just fix the offset
length to the biggest found so far and never go down in size.  (For
which we would need 2 bits in the flag to tell us what offset size to