← Back to team overview

maria-discuss team mailing list archive

Re: Dynamic columns Library

 

Hi!

Sorry for delay, but I found that the answer could not be postponed.

28 марта 2011, в 16:19, Michael Widenius написал(а):

> 
> Hi!
> 
> I am cc:ing maria-discuss as you accidently forgot to also email the
> patch there.
> 
>>>>>> "Oleksandr" == Oleksandr Byelkin <sanja@xxxxxxxxxxxx> writes:
> 
> Oleksandr> Hi!
> Oleksandr> Here it is dynamic columns library. I do not think that you will have time to review it completely, but please look at it.
> 
> Oleksandr> Some notes:
> 
> Oleksandr> - I changed indices from int to uint + some other minor changes in prototypes.
> 
> Oleksandr> - dynamic_column_exists() has no way to report error (what to do?)
> 
> I assume that same problem exists for all the other functions.
> 
> Two options:
> - Change function type from my_bool to int where -1 means something
>  went bad (like bad format for the blob).
> - Add an 'error' argument to all functions that is set to <> 0 if
>  something goes wrong.

OK, then imho better to have something like this:
#define ER_DYNCOL_EOM  -1
#define ER_DYNCOL_FORMAT  -2
...

[skip]
>> 
>> +my_bool dynamic_column_init_str(DYNAMIC_COLUMN *str, size_t size)
>> +{
>> +  if (!size)
>> +    size= DYNCOL_SYZERESERVE;
>> +  /*
>> +    Make string with no fields (empty header)
>> +    - First \0 is flags
>> +    - other four \0 is 0 fileds counter
>> +  */
>> +  if (init_dynamic_string(str, NULL,
>> +                              size + 5, DYNCOL_SYZERESERVE))
>> +    return TRUE;
>> +  return dynstr_append_mem(str, "\0\0\0\0\0", 5);
>> +}
> 
> Why is an empty data 5 bytes instead of 1 or even 0?

We have fixed size part of headers - flags and number of columns.

[skip]
> 
>> +  for (i= 0; i < column_count - 1; i++)
>> +    if (columns_order[i][0] == columns_order[i+1][0])
>> +      goto err;
>> +
>> +  DBUG_ASSERT(str->length == FIXED_HEADER_SIZE);
>> +  str->str[0]|= (offset_size - 1); // size of offset
>> +  int4store(str->str + 1, non_null_count); // columns number
> 
> Lets limit the number of columns to 65536 (2 bytes)
> 
> Don't understand why we store non_null_count in header
> Don't we need to store total number of columns ?

No. We need number of columns stored in the string and we do not store NULLs.  we even decided that NULL mean removing the column.

> 
> I think we should store also null's in the dynamic columns as someone
> may want to distinguish between a non existing columns and a column
> with not a set value.
> 
> We could store this either as an extended type or as datetime with
> size 0.

We discussed it and decided that NULL mean nothing (BTW it was your idea). In case of we need it we reserved offset of all 1 bits.

[skip]
>> 
> 
> We also need functions:
> 
> Initialize the DYNAMIC_COLUMN. (Basicly just a call to
> init_dynamic_string())

I think about it, but creation is that function.

> 
>> +  /* rewrite header*/
>> +  str->str[0]|= (new_offset_size - 1); // size of offset
>> +  int4store(str->str + 1, column_count - 1); // columns number
>> +  for (i= 0, write= read= (uchar *)str->str + FIXED_HEADER_SIZE;
>> +       i < column_count;
>> +       i++, read+= entry_size, write+= new_entry_size)
>> +  {
>> +    size_t offs;
>> +    uint nm;
>> +    DYNAMIC_COLUMN_TYPE tp;
>> +    if (read == header_entry)
>> +    {
>> +#ifndef DBUG_OFF
>> +      nm= uint2korr(read);
>> +      type_and_offset_read(&tp, &offs, read + 2,
>> +                           offset_size);
>> +      DBUG_ASSERT(nm == column_nr);
>> +      DBUG_ASSERT(offs == deleted_entry_offset);
>> +#endif
>> +      write-= new_entry_size; // do not move writer
>> +      continue; // skip removed field
>> +    }
>> +
>> +    nm= uint2korr(read),
>> +    type_and_offset_read(&tp, &offs, read + 2,
>> +                         offset_size);
>> +
>> +    if (offs > deleted_entry_offset)
>> +      offs-= length; // data stored after removed data
>> +
>> +    int2store(write, nm);
>> +    type_and_offset_store(write + 2, new_offset_size, tp, offs);
>> +  }
> 
> Please optimize that if new_offset_size == old_offset_size then
> we should start from header_entry.
> 
> You should also make the above a function as update also needs to do this!
> 
>> +
>> +  /* move data */
>> +  {
>> +    size_t first_chunk_len= (data - (uchar *)str->str) -
>> +      FIXED_HEADER_SIZE - header_size;
>> +    size_t second_chunk_len= new_data_size - first_chunk_len;
>> +    if (first_chunk_len)
>> +      memmove(str->str + FIXED_HEADER_SIZE + new_header_size,
>> +              str->str + FIXED_HEADER_SIZE + header_size,
>> +              first_chunk_len);
>> +    if (second_chunk_len)
>> +      memmove(str->str +
>> +              FIXED_HEADER_SIZE + new_header_size + first_chunk_len,
>> +              str->str +
>> +              FIXED_HEADER_SIZE + header_size + first_chunk_len + length,
>> +              second_chunk_len);
>> +  }
>> +
>> +  /* fix str length */
>> +  DBUG_ASSERT(str->length >=
>> +              FIXED_HEADER_SIZE + new_header_size + new_data_size);
>> +  str->length= FIXED_HEADER_SIZE + new_header_size + new_data_size;
>> +
>> +  return FALSE;
>> +}
>> +
>> +
>> +my_bool dynamic_column_update(DYNAMIC_COLUMN *str, uint column_nr,
>> +                              DYNAMIC_COLUMN_VALUE *value)
>> +{
>> +  size_t offset_size, new_offset_size, data_size, new_data_size,
>> +         add_data_size, entry_size, new_entry_size,
>> +         header_size, new_header_size, additional_size, next_offs,
>> +         data_ins_offset;
>> +  uint column_count, i;
>> +  uchar *read, *write;
>> +  my_bool added;
>> +
>> +  /* TODO: optimize update without deleting */
>> +  if (dynamic_column_delete(str, column_nr))
>> +    return TRUE;
> 
> The problem with doing delete + insert later is that the whole
> function needs to be rewritten and you can't use any of the old code
> (for columns that exists).
> 
> So I suggest you write this ASAP!


IMHO SQL layer is mor important now :)

> 
>> +
>> +  if (value->type == DYN_COL_NULL)
>> +    return FALSE;
> 
> Should be stored.

It is strange that first you persuaded me to make as I did, then request to do it as I had wanted. Let me live it as is and change it if it will be need by somebody really. 


>> +
>> +  if (!str || str->length < FIXED_HEADER_SIZE ||
>> +      (str->str[0] & (~DYNCOL_FLG_KNOWN)))
>> +    return TRUE;
> 
> length of 0 should be regarded as ok.

Why? Any column that went through create functions will be at least size 5. If it is not true that string is malformed.

[skip]
> 
> 
>> +  /* rewrite header */
>> +  next_offs= str->length - FIXED_HEADER_SIZE - new_header_size;
>> +  for (i= column_count,
>> +       read= (uchar *)str->str + FIXED_HEADER_SIZE + header_size - entry_size,
>> +       write= ((uchar *)str->str + FIXED_HEADER_SIZE +
>> +               (new_entry_size * column_count));
>> +       i > 0;
>> +       i--, read-= entry_size, write-= new_entry_size)
>> +  {
>> +    size_t offs;
>> +    uint nm;
>> +    DYNAMIC_COLUMN_TYPE tp;
>> +
>> +    nm= uint2korr(read);
>> +    type_and_offset_read(&tp, &offs, read + 2, offset_size);
>> +
>> +    if (!added && column_nr > nm)
>> +    {
>> +      /* place to put the new column */
>> +      int2store(write, column_nr);
>> +      type_and_offset_store(write + 2, new_offset_size,
>> +                            value->type,
>> +                            (data_ins_offset= next_offs - add_data_size));
>> +      /* move previous data */
>> +      memmove(str->str + FIXED_HEADER_SIZE + new_header_size,
>> +              str->str + FIXED_HEADER_SIZE + new_header_size + add_data_size,
>> +              next_offs - add_data_size);
>> +      added= TRUE;
>> +      write-= new_entry_size;
>> +    }
>> +    int2store(write, nm);
>> +    if (!added)
>> +      offs+= add_data_size;
>> +    type_and_offset_store(write + 2,  new_offset_size, tp, offs);
>> +    next_offs= offs;
>> +  }
> 
> We should just have one function to rewrite the headers!
> 
> To make life easer, maybe we should assume that header size change
> operation is a very unlikely one and accept to do an extra memmove
> when header offset changes?
> 
> This would allow us to have a simple function that just rewrites the
> header (from a given point) and another that increases/decreases the
> header size + all data ?

Why it should be rare? (We can't know balance of delete/update/insert)

> 
> Then all operations are basicly:
> 
> - If offset_pointer_size changes, fix data
> - Add / remove things from header and data (with new offset size)
> - Update header
> 
> The whole things above would then be very trivial:
> 
> - realloc str->length if it needs to grow
> - If offset_pointer_size changes, fix data and update entry_pos
> - Update data (and create new header entry last)
> - memmove(entry, entry + entry_size, header_end - entry);
> - Write data at entry
> 

[skip]


Follow ups

References