← Back to team overview

maria-developers team mailing list archive

Re: InnoDB blob for primary key

 

Hi, Sachin!

First, it's *very* good that you commit often and ask me to review
early. This way we can do adjustments early on.

See below my first review of your patch. To summarize it here, there're
three main points:

1. Coding style. Look at how the rest of the code base look like,
   particularly files that you're modifying, and try to make you code to
   look similarly.

2. Simplifications: use virtual columns, and the server will do most of
   the job for you. Plug your code where the server has already detected
   the very long unique key, and you won't need to search the column
   list and code all checks manually.

Now, the review:

=======================

Ok, this is no longer a prototype, right?
So, I'll be a bit more strict about coding style from now on.

> diff --git a/sql/item_func.cc b/sql/item_func.cc
> index e5c1f4c..57d87d7 100644
> --- a/sql/item_func.cc
> +++ b/sql/item_func.cc
> @@ -1915,7 +1915,17 @@ void Item_func_int_div::fix_length_and_dec()
>    unsigned_flag=args[0]->unsigned_flag | args[1]->unsigned_flag;
>  }
> 
> -
> +longlong Item_func_hash::val_int(){
> +    String * str =  args[0]->val_str();
> +    int length = str->length();
> +    char * ptr= str->c_ptr();
> +    int return_val = 1;
> +    for(int i=0;i<length;i++){
> +        return_val +=((int)*ptr)*2;
> +        ptr++;
> +    }
> +    return return_val;
> +}

1. two empty lines between functions/methods
2. opening curly bracket on its own line
3. one space after the assignment, no spaces before

and it's very bad hash function, that you're using. Lots of collisions,
not charset-aware (wrong results for utf8), and c_str() does realloc()
that you don't need here.

Instead, do like my_hash_sort() does:

 ulong nr1= 1, nr2= 4;
 CHARSET_INFO *cs= str->charset();
 cs->coll->hash_sort(cs, (uchar*) str->ptr(), str->length(), &nr1, &nr2);
 return nr1;

Besides, you've forgot to handle NULLs:

 String *str= args[0]->val_str();
 if ((null_value= !str))
   return 0;

>  longlong Item_func_mod::int_op()
>  {
>    DBUG_ASSERT(fixed == 1);
> diff --git a/sql/item_func.h b/sql/item_func.h
> index a5f6bf2..aa2e6d6 100644
> --- a/sql/item_func.h
> +++ b/sql/item_func.h
> @@ -773,6 +773,15 @@ class Item_func_int_div :public Item_int_func
>  };
> 
> 
> +class Item_func_hash :public Item_int_func
> +{
> +public:
> +  Item_func_hash(THD *thd, Item *a): Item_int_func(thd, a)
> +  {}
> +  longlong val_int();
> +  const char *func_name() const { return "HASH"; }

you probably need to have fix_length_and_dec() method here to set max_length.

> +};
> +
>  class Item_func_mod :public Item_num_op
>  {
>  public:
> diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc
> index 9b4238b..e411da7 100644
> --- a/sql/sql_insert.cc
> +++ b/sql/sql_insert.cc
> @@ -173,7 +173,12 @@ static bool check_view_single_update(List<Item> &fields, List<Item> *values,
>    return TRUE;
>  }
> 
> -
> +/*
> + * to chech the total number of blob uniques in the table
> + * */

Use comment style like everywhere else in this file, please!

> +int check_total_blob_unique(TABLE *table){

Why do you need this function? It doesn't seem to be used now.

> +   // table->s->field;
> +}
>  /*
>    Check if insert fields are correct.
> 
> @@ -723,7 +728,21 @@ bool mysql_insert(THD *thd,TABLE_LIST *table_list,
>      if (open_and_lock_tables(thd, table_list, TRUE, 0))
>        DBUG_RETURN(TRUE);
>    }
> -
> +    //now assume that table list contains only one table //work
> +    //simple insert
> +    //now i need to add default value into  value_list

There's a much easier solution where the server will do almost everything for
you. When a table is created, you automatically create a *virtual column*
for every long unique constraint. Make it persistent and create an index
over it. That is, your

  create table tbl (abc blob unique);

should create (almost) the same table definition as

  create table tbl (abc blob,
                    db_row_hash_1 int as hash(abc) persistent,
                    unique index (db_row_hash_1));

when you do that (basically, automatically creating columns and indexes
as necessary), MariaDB will take care of almost everything - defaults,
calculating hash values, and so on.

When it'll work, you'll need to fix deficiencies of this solution:
1. automatically chosing column names to avoid conflict with user columns
2. hiding these automatic columns from the user

> +    context= &thd->lex->select_lex.context;
> +    Field ** f=table_list->table->field;
> +    List<Item> * i_list = (List<Item> *)values_list.first_node()->info;
> +    for(uint i=0;i<table_list->table->s->fields;i++){
> +        if(strncmp((*f)->field_name,"DB_ROW_HASH_",12)==0){
> +            i_list->push_back(new (thd->mem_root)
> +                Item_default_value(thd,context),thd->mem_root);
> +        }
> +        f++;
> +    }
> +     List_iterator_fast<List_item> its(values_list);
> +
>    lock_type= table_list->lock_type;
> 
>    THD_STAGE_INFO(thd, stage_init);
> diff --git a/sql/sql_table.cc b/sql/sql_table.cc
> index dfce503..bc37660 100644
> --- a/sql/sql_table.cc
> +++ b/sql/sql_table.cc
> @@ -3223,6 +3223,82 @@ mysql_prepare_create_table(THD *thd, HA_CREATE_INFO *create_info,
>    bool tmp_table= create_table_mode == C_ALTER_TABLE;
>    DBUG_ENTER("mysql_prepare_create_table");
> 
> +//scan the the whole alter list  //work
> +//and add one field if length of blob is zero

not quite. this is the wrong place to do it.
See this code (in mysql_prepare_create_table, the function is correct):

	  key_part_length= MY_MIN(column->length,
                              blob_length_by_type(sql_field->sql_type)
                              * sql_field->charset->mbmaxlen);
	  if (key_part_length > max_key_length ||
	      key_part_length > file->max_key_part_length())
	  {
	    key_part_length= MY_MIN(max_key_length, file->max_key_part_length());
	    if (key->type == Key::MULTIPLE)
	    {
	      /* not a critical problem */
	      push_warning_printf(thd, Sql_condition::WARN_LEVEL_WARN,
                                 ER_TOO_LONG_KEY,
                                 ER_THD(thd, ER_TOO_LONG_KEY),
                                 key_part_length);
             /* Align key length to multibyte char boundary */
             key_part_length-= key_part_length % sql_field->charset->mbmaxlen;
	    }
	    else
	    {
	      my_error(ER_TOO_LONG_KEY, MYF(0), key_part_length);
	      DBUG_RETURN(TRUE);
	    }

it checks whether a key is too long. for non-unique keys it simply truncates
the key, for unique keys it issues an error.
you can change it to not issue an error, but instead use your HASH approach
for too long unique keys.
This way every unique key was used to be too long, will now be allowed,
and it'll use HASH automatically.
No need to iterate fields, look for blobs, or anything...

> +
> +    List_iterator<Key> key_iter(alter_info->key_list);
> +    Key *key_iter_key;
> +    Key_part_spec *temp_colms;
> +    char num = '1';
> +    bool is_blob_unique=false;
> +    int key_initial_elements=alter_info->key_list.elements;
> +    while((key_iter_key=key_iter++)&&key_initial_elements){

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx


References