maria-developers team mailing list archive

Thread
Date
Re: HASH and unique constraint for blobs

To: Sachin Setia <sachinsetia1001@xxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxx>
Date: Mon, 6 Jun 2016 13:30:51 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CANhFX6VscL9-3j1OWidL_Kz3w-8RkUdd3uEc2QBTtGh5G47jTg@mail.gmail.com>
User-agent: Mutt/1.5.24 (2015-08-30)
Hi, Sachin!

On Jun 06, Sachin Setia wrote:
> Hello Sir
> Weekly Report for second week of gsoc
> 1. Implemented unique in the case of null
> 2. Implemented unique(A,b,c....)
> 
> Currently working on
> tests , field hiding

Good, thanks!

I'd suggest to postpone field hiding until bb-10.2-default
branch is merged into 10.2.

Start with tests, it's very important to have them.
See mysql-test/t/*.test and the manual is here:
https://mariadb.com/kb/en/mariadb/mysql-test/
Don't forget direct tests of your HASH function.

Then, may be, look at UPDATE.

And here's a code review up to the commit b69e141

> diff --git a/sql/handler.cc b/sql/handler.cc
> index fb2921f..e68e37b 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -5863,16 +5863,86 @@ int handler::ha_reset()
>    DBUG_RETURN(reset());
>  }
>  
> +/**  @breif

typo: brief

> +   Compare two records
> +   Need a better place for this function 
> +   @returns true if equal else false*/
> +my_bool rec_hash_cmp(TABLE *tbl ,Field * hash_field)

in C++ you can use 'bool'. 'my_bool' is only for C

and, please, pay attention to where you put spaces :)

> +{
> +  Item * t_item;
> +  t_item=hash_field->vcol_info->expr_item->next;

No, please don't rely on the 'next' pointer. It used to link
all items into a list, so that they could be destroyed all at once.
It's not how you traverse expression arguments.

Use, for example,

   Item **arg,**arg_end;
   for (arg=args, arg_end=args+arg_count; arg != arg_end ; arg++)
   {
     Item t_item= *arg;

or
 for (uint i= 0; i < arg_count; i++)
 {
   Item *t_item= args[i];

See these and other examples in item_func.cc

> +  int diff = tbl->record[1]-tbl->record[0];

record[1] - record[0] is always table->s->rec_buff_length
But there's one problem with that - you cannot use record[1],
see a comment below.

> +  Field * t_field;
> +  int first_length,second_length;
> +  while(t_item)
> +  {
> +    for(int i=0;i<tbl->s->fields;i++)
> +    {
> +      t_field=tbl->field[i];
> +      if(!my_strcasecmp(system_charset_info,
> +           t_item->name,t_field->field_name))
> +        break;
> +    }

your t_item is an Item_field, so your t_field
will be t_item->field, no need to iterate anything.

> +    /* First check for nulls */
> +    if(t_field->is_real_null()||t_field->is_real_null(diff))
> +    {
> +      return false; 
> +    }

This shouldn't be necessary, see below, you shouldn't call this function
when a field value is NULL.

>  
> +    /*
> +       length of field is not equal to pack length when it is
> +       subclass of field_blob and field _varsting
> +     */
> +    first_length = t_field->data_length();
> +    second_length = t_field->data_length(diff);
> +    if(first_length!=second_length)
> +      return false;
> +    if(t_field->cmp_max(t_field->ptr,t_field->ptr+diff,first_length))
> +      return false;
> +    t_item=t_item->next;
> +  }
> +  return true;
> +}
>  int handler::ha_write_row(uchar *buf)
>  {
> -  int error;
> +  int error,map;
>    Log_func *log_func= Write_rows_log_event::binlog_row_logging_function;
> +  Field **field_iter;
>    DBUG_ASSERT(table_share->tmp_table != NO_TMP_TABLE ||
>                m_lock_type == F_WRLCK);
>    DBUG_ENTER("handler::ha_write_row");
>    DEBUG_SYNC_C("ha_write_row_start");
> +  /* First need to whether inserted record is unique or not */
>  
> +  /* One More Thing  if i implement hidden field then detection can be easy */
> +  field_iter=table->field;
> +  for(uint i=0;i<table->s->fields;i++)

You can iterate only virtual fields, there's a separate list for them.

> +  {
> +    if(strncmp((*field_iter)->field_name,"DB_ROW_HASH_",12)==0)
> +    {
> +      table->file->ha_index_init(0,0);

this might be a problem, as it changes the current index.
I think it's ok for INSERT, but may be not ok for UPDATE
(imagine, UPDATE ... WHERE indexed_column=5, the server does
ha_index_init for indexed_column, and then, in a loop,
ha_index_next/ha_update_row. So you cannot swicth to another index
in the middle of UPDATE loop)

So, use ha_index_read_idx_map(), it doesn't change the current index.

by the way, how could it be ha_index_init(0,0) ???
you need to specify your unique key number, not 0.

> +      /* We need to add the null bit */
> +      /* If the column can be NULL, then in the first byte we put 1 if the
> +	   field value is NULL, 0 otherwise. */
> +      uchar * ptr = (uchar *)my_malloc(sizeof(char ) *1+8, MYF(MY_WME));

1. where do you free it?
2. doing malloc/free for _every row_ is too expensive, mallocs
generally are not very good for concurrency.
You can simply use

   uchar ptr[9]

can you not?

> +      *ptr=0;

why *ptr=0 ? You need to check whether the column value is actually NULL,
because now you set the key to "not null" unconditionally.
Besides, because NULL value can never cause HA_ERR_FOUND_DUPP_KEY,
you don't need to search if the column is NULL, you can skip this whole
check.

> +      ptr++;
> +      memcpy(ptr,(*field_iter)->ptr,8);
> +      ptr--;

No need to go low-level, you can do: Why wouldn't you use
key_copy() function? Like

  uchar buf[9];
  DBUG_ASSERT(key->length == sizeof(buf));
  key_copy(ptr, buf, key, 0, false);

> +      map= table->file->ha_index_read_map(table->record[1],ptr,HA_WHOLE_KEY,HA_READ_KEY_EXACT);

1. why 'map'? what does it mean?
2. Now it's getting a bit tricky. in UPDATE you cannot use record[1] -
it's the row before the update. I'm afraid you'll need to allocate another
table->s->rec_buff_length buffer for this, I don't see any other choice.
Of course, you can do it on demand, on the table's memroot, preserve
between calls, and so on.

> +      if(!map)
> +      {
> +        if(rec_hash_cmp(table,*field_iter))
> +        {
> +          table->file->ha_index_end();
> +          DBUG_RETURN(HA_ERR_FOUND_DUPP_KEY);
> +        }
> +          
> +      }
> +      table->file->ha_index_end();
> +    }
> +    field_iter++;
> +  }
>    MYSQL_INSERT_ROW_START(table_share->db.str, table_share->table_name.str);
>    mark_trx_read_write();
>    increment_statistics(&SSV::ha_write_count);
> diff --git a/sql/item_func.cc b/sql/item_func.cc
> index e5c1f4c..fb23495 100644
> --- a/sql/item_func.cc
> +++ b/sql/item_func.cc
> @@ -1916,6 +1916,41 @@ void Item_func_int_div::fix_length_and_dec()
>  }
>  
>  
> +longlong  Item_func_hash::val_int()
> +{
> +  unsigned_flag =true;

the other way around, no space before, one space after :)

> +  ulong nr1= 1,temp=1, nr2= 0;
> +  CHARSET_INFO *cs;
> +  for(int i=0;i<arg_count;i++)
> +  {
> +    String * str = args[i]->val_str();
> +    if(args[i]->null_value)
> +    {
> +      null_value=1;

don't forget to set null_value=0 if no arguments are NULL

> +      return 0;
> +    }
> +    nr2=args[i]->max_length;

Hmm, I don't know. This makes your hash depend on metadata, not only on data.
I don't think it's a good idea.

> +    cs=str->charset();
> +    char l[4];
> +    int4store(l,str->length());
> +    cs->coll->hash_sort(cs, (uchar *)l,sizeof(str->length()), &temp, &nr2);

not sizeof(str->length()), but sizeof(l).

> +    nr1^=temp;
> +    cs->coll->hash_sort(cs, (uchar *)str->ptr(),str->length(), &temp, &nr2);

Why did you change the hashing function? &nr1 should be the 4th argument
of hash_sort(), no xors afterwards.

If you change it, you need to prove mathematically that your version
is better. That's why I personally prefer to not to mess with it :)

> +    nr1^=temp;
> +  }
> +  return   (longlong)nr1;
> +}
> +
> +
> +void  Item_func_hash::fix_length_and_dec()
> +{
> +  maybe_null= 1;
> +  null_value= 0;

no need to set null_value here, the value is calculated
in val_int, not in fix_length_and_dec.
fix_length_and_dec is called once, at the very beginning of a query,
val_int is called for every row. And you need to set null_value=0
for every row.

> +  decimals= 0;
> +  max_length= 8;
> +}
> +
> +
>  longlong Item_func_mod::int_op()
>  {
>    DBUG_ASSERT(fixed == 1);
> diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc
> index 9b4238b..2c90f3c 100644
> --- a/sql/sql_insert.cc
> +++ b/sql/sql_insert.cc
> @@ -724,6 +723,20 @@ bool mysql_insert(THD *thd,TABLE_LIST *table_list,
>        DBUG_RETURN(TRUE);
>    }
>      

Please mark places you're going to fix later, e.g.
// TODO when we'll have hidden fields, this will be removed

> +  context= &thd->lex->select_lex.context;
> +  Field ** f=table_list->table->field;
> +  List<Item> * i_list = (List<Item> *)values_list.first_node()->info;
> +  for(uint i=0;i<table_list->table->s->fields;i++)
> +  {
> +    if(strncmp((*f)->field_name,"DB_ROW_HASH_",12)==0)
> +    {
> +      i_list->push_back(new (thd->mem_root) 
> +      Item_default_value(thd,context),thd->mem_root);
> +    }
> +    f++;
> +  }
> +  List_iterator_fast<List_item> its(values_list);
> +     
>    lock_type= table_list->lock_type;
>  
>    THD_STAGE_INFO(thd, stage_init);
> diff --git a/sql/sql_table.cc b/sql/sql_table.cc
> index dfce503..ac2988f 100644
> --- a/sql/sql_table.cc
> +++ b/sql/sql_table.cc
> @@ -3012,8 +3012,8 @@ int prepare_create_field(Create_field *sql_field,
>                            (sql_field->decimals << FIELDFLAG_DEC_SHIFT));
>      break;
>    }
> -  if (!(sql_field->flags & NOT_NULL_FLAG) ||
> -      (sql_field->vcol_info))  /* Make virtual columns allow NULL values */
> +  if (!(sql_field->flags & NOT_NULL_FLAG) )
> +     /* (sql_field->vcol_info))   Make virtual columns allow NULL values */

don't forget to test (see above about mysql-test) that your change
does not affect normal generated columns and they are still always nullable.
to test that, create a table with a persistent generated column
with an expression that can return NULL.

>      sql_field->pack_flag|= FIELDFLAG_MAYBE_NULL;
>    if (sql_field->flags & NO_DEFAULT_VALUE_FLAG)
>      sql_field->pack_flag|= FIELDFLAG_NO_DEFAULT;
> @@ -3223,6 +3223,80 @@ mysql_prepare_create_table(THD *thd, HA_CREATE_INFO *create_info,
>    bool tmp_table= create_table_mode == C_ALTER_TABLE;
>    DBUG_ENTER("mysql_prepare_create_table");
>  
> +  /* 
> +    scan the the whole alter list  
> +    and add one field if length of blob is zero 
> +  */

you need a TODO marker here (TODO change to work for all too-long unique keys)

> +    
> +    List_iterator<Key> key_iter(alter_info->key_list);
> +    Key *key_iter_key;
> +    Key_part_spec *temp_colms;
> +    char num = '1';
> diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> index 24dc3c4..0135d9c 100644
> --- a/sql/sql_yacc.yy
> +++ b/sql/sql_yacc.yy
> @@ -9543,6 +9543,15 @@ function_call_keyword:
>              if ($$ == NULL)
>                MYSQL_YYABORT;
>            }
> +        /*Currently hash for just 1 field
> +         * */

your comment is already obsolete :)

> +        |HASH_SYM '(' expr_list ')'
> +            {
> +                $$= new (thd->mem_root)Item_func_hash(thd,*$3);
> +                if($$==NULL)
> +                    MYSQL_YYABORT;
> +            }
> +        
>          | INSERT '(' expr ',' expr ',' expr ',' expr ')'
>            {
>              $$= new (thd->mem_root) Item_func_insert(thd, $3, $5, $7, $9);

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx
Follow ups

Re: HASH and unique constraint for blobs
From: Sachin Setia, 2016-07-01
References

Re: InnoDB blob for primary key
From: Sergei Golubchik, 2016-04-18
Re: InnoDB blob for primary key
From: Sergei Golubchik, 2016-04-18
Re: InnoDB blob for primary key
From: Sachin Setia, 2016-04-19
Re: InnoDB blob for primary key
From: Sachin Setia, 2016-04-19
Re: InnoDB blob for primary key
From: Sachin Setia, 2016-04-22
Re: InnoDB blob for primary key
From: Sachin Setia, 2016-06-06