maria-discuss team mailing list archive
-
maria-discuss team
-
Mailing list archive
-
Message #00355
Re: Dynamic columns Library
Hi!
Sorry for delay, but I found that the answer could not be postponed.
28 марта 2011, в 16:19, Michael Widenius написал(а):
>
> Hi!
>
> I am cc:ing maria-discuss as you accidently forgot to also email the
> patch there.
>
>>>>>> "Oleksandr" == Oleksandr Byelkin <sanja@xxxxxxxxxxxx> writes:
>
> Oleksandr> Hi!
> Oleksandr> Here it is dynamic columns library. I do not think that you will have time to review it completely, but please look at it.
>
> Oleksandr> Some notes:
>
> Oleksandr> - I changed indices from int to uint + some other minor changes in prototypes.
>
> Oleksandr> - dynamic_column_exists() has no way to report error (what to do?)
>
> I assume that same problem exists for all the other functions.
>
> Two options:
> - Change function type from my_bool to int where -1 means something
> went bad (like bad format for the blob).
> - Add an 'error' argument to all functions that is set to <> 0 if
> something goes wrong.
OK, then imho better to have something like this:
#define ER_DYNCOL_EOM -1
#define ER_DYNCOL_FORMAT -2
...
[skip]
>>
>> +my_bool dynamic_column_init_str(DYNAMIC_COLUMN *str, size_t size)
>> +{
>> + if (!size)
>> + size= DYNCOL_SYZERESERVE;
>> + /*
>> + Make string with no fields (empty header)
>> + - First \0 is flags
>> + - other four \0 is 0 fileds counter
>> + */
>> + if (init_dynamic_string(str, NULL,
>> + size + 5, DYNCOL_SYZERESERVE))
>> + return TRUE;
>> + return dynstr_append_mem(str, "\0\0\0\0\0", 5);
>> +}
>
> Why is an empty data 5 bytes instead of 1 or even 0?
We have fixed size part of headers - flags and number of columns.
[skip]
>
>> + for (i= 0; i < column_count - 1; i++)
>> + if (columns_order[i][0] == columns_order[i+1][0])
>> + goto err;
>> +
>> + DBUG_ASSERT(str->length == FIXED_HEADER_SIZE);
>> + str->str[0]|= (offset_size - 1); // size of offset
>> + int4store(str->str + 1, non_null_count); // columns number
>
> Lets limit the number of columns to 65536 (2 bytes)
>
> Don't understand why we store non_null_count in header
> Don't we need to store total number of columns ?
No. We need number of columns stored in the string and we do not store NULLs. we even decided that NULL mean removing the column.
>
> I think we should store also null's in the dynamic columns as someone
> may want to distinguish between a non existing columns and a column
> with not a set value.
>
> We could store this either as an extended type or as datetime with
> size 0.
We discussed it and decided that NULL mean nothing (BTW it was your idea). In case of we need it we reserved offset of all 1 bits.
[skip]
>>
>
> We also need functions:
>
> Initialize the DYNAMIC_COLUMN. (Basicly just a call to
> init_dynamic_string())
I think about it, but creation is that function.
>
>> + /* rewrite header*/
>> + str->str[0]|= (new_offset_size - 1); // size of offset
>> + int4store(str->str + 1, column_count - 1); // columns number
>> + for (i= 0, write= read= (uchar *)str->str + FIXED_HEADER_SIZE;
>> + i < column_count;
>> + i++, read+= entry_size, write+= new_entry_size)
>> + {
>> + size_t offs;
>> + uint nm;
>> + DYNAMIC_COLUMN_TYPE tp;
>> + if (read == header_entry)
>> + {
>> +#ifndef DBUG_OFF
>> + nm= uint2korr(read);
>> + type_and_offset_read(&tp, &offs, read + 2,
>> + offset_size);
>> + DBUG_ASSERT(nm == column_nr);
>> + DBUG_ASSERT(offs == deleted_entry_offset);
>> +#endif
>> + write-= new_entry_size; // do not move writer
>> + continue; // skip removed field
>> + }
>> +
>> + nm= uint2korr(read),
>> + type_and_offset_read(&tp, &offs, read + 2,
>> + offset_size);
>> +
>> + if (offs > deleted_entry_offset)
>> + offs-= length; // data stored after removed data
>> +
>> + int2store(write, nm);
>> + type_and_offset_store(write + 2, new_offset_size, tp, offs);
>> + }
>
> Please optimize that if new_offset_size == old_offset_size then
> we should start from header_entry.
>
> You should also make the above a function as update also needs to do this!
>
>> +
>> + /* move data */
>> + {
>> + size_t first_chunk_len= (data - (uchar *)str->str) -
>> + FIXED_HEADER_SIZE - header_size;
>> + size_t second_chunk_len= new_data_size - first_chunk_len;
>> + if (first_chunk_len)
>> + memmove(str->str + FIXED_HEADER_SIZE + new_header_size,
>> + str->str + FIXED_HEADER_SIZE + header_size,
>> + first_chunk_len);
>> + if (second_chunk_len)
>> + memmove(str->str +
>> + FIXED_HEADER_SIZE + new_header_size + first_chunk_len,
>> + str->str +
>> + FIXED_HEADER_SIZE + header_size + first_chunk_len + length,
>> + second_chunk_len);
>> + }
>> +
>> + /* fix str length */
>> + DBUG_ASSERT(str->length >=
>> + FIXED_HEADER_SIZE + new_header_size + new_data_size);
>> + str->length= FIXED_HEADER_SIZE + new_header_size + new_data_size;
>> +
>> + return FALSE;
>> +}
>> +
>> +
>> +my_bool dynamic_column_update(DYNAMIC_COLUMN *str, uint column_nr,
>> + DYNAMIC_COLUMN_VALUE *value)
>> +{
>> + size_t offset_size, new_offset_size, data_size, new_data_size,
>> + add_data_size, entry_size, new_entry_size,
>> + header_size, new_header_size, additional_size, next_offs,
>> + data_ins_offset;
>> + uint column_count, i;
>> + uchar *read, *write;
>> + my_bool added;
>> +
>> + /* TODO: optimize update without deleting */
>> + if (dynamic_column_delete(str, column_nr))
>> + return TRUE;
>
> The problem with doing delete + insert later is that the whole
> function needs to be rewritten and you can't use any of the old code
> (for columns that exists).
>
> So I suggest you write this ASAP!
IMHO SQL layer is mor important now :)
>
>> +
>> + if (value->type == DYN_COL_NULL)
>> + return FALSE;
>
> Should be stored.
It is strange that first you persuaded me to make as I did, then request to do it as I had wanted. Let me live it as is and change it if it will be need by somebody really.
>> +
>> + if (!str || str->length < FIXED_HEADER_SIZE ||
>> + (str->str[0] & (~DYNCOL_FLG_KNOWN)))
>> + return TRUE;
>
> length of 0 should be regarded as ok.
Why? Any column that went through create functions will be at least size 5. If it is not true that string is malformed.
[skip]
>
>
>> + /* rewrite header */
>> + next_offs= str->length - FIXED_HEADER_SIZE - new_header_size;
>> + for (i= column_count,
>> + read= (uchar *)str->str + FIXED_HEADER_SIZE + header_size - entry_size,
>> + write= ((uchar *)str->str + FIXED_HEADER_SIZE +
>> + (new_entry_size * column_count));
>> + i > 0;
>> + i--, read-= entry_size, write-= new_entry_size)
>> + {
>> + size_t offs;
>> + uint nm;
>> + DYNAMIC_COLUMN_TYPE tp;
>> +
>> + nm= uint2korr(read);
>> + type_and_offset_read(&tp, &offs, read + 2, offset_size);
>> +
>> + if (!added && column_nr > nm)
>> + {
>> + /* place to put the new column */
>> + int2store(write, column_nr);
>> + type_and_offset_store(write + 2, new_offset_size,
>> + value->type,
>> + (data_ins_offset= next_offs - add_data_size));
>> + /* move previous data */
>> + memmove(str->str + FIXED_HEADER_SIZE + new_header_size,
>> + str->str + FIXED_HEADER_SIZE + new_header_size + add_data_size,
>> + next_offs - add_data_size);
>> + added= TRUE;
>> + write-= new_entry_size;
>> + }
>> + int2store(write, nm);
>> + if (!added)
>> + offs+= add_data_size;
>> + type_and_offset_store(write + 2, new_offset_size, tp, offs);
>> + next_offs= offs;
>> + }
>
> We should just have one function to rewrite the headers!
>
> To make life easer, maybe we should assume that header size change
> operation is a very unlikely one and accept to do an extra memmove
> when header offset changes?
>
> This would allow us to have a simple function that just rewrites the
> header (from a given point) and another that increases/decreases the
> header size + all data ?
Why it should be rare? (We can't know balance of delete/update/insert)
>
> Then all operations are basicly:
>
> - If offset_pointer_size changes, fix data
> - Add / remove things from header and data (with new offset size)
> - Update header
>
> The whole things above would then be very trivial:
>
> - realloc str->length if it needs to grow
> - If offset_pointer_size changes, fix data and update entry_pos
> - Update data (and create new header entry last)
> - memmove(entry, entry + entry_size, header_end - entry);
> - Write data at entry
>
[skip]
Follow ups
References