maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #09796
Re: HASH and unique constraint for blobs
Hello Sergei,
Sir actually I added flags for unique_hash and index_hash for KEY these are
#define HA_INDEX_HASH 4
#define HA_UNIQUE_HASH 8
but ./mtr inoodb test are failing bacause KEY flag is 40 which is
HA_BINARY_PACK_KEY + HA_UNIQUE_HASH but HA_UNIQUE_HASH should not be
there . Is 4 and 8 are already used ,or should i go for another
approach. Sir please review
the code.
Regards
sachin
On Mon, Jun 6, 2016 at 5:00 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> Hi, Sachin!
>
> On Jun 06, Sachin Setia wrote:
>> Hello Sir
>> Weekly Report for second week of gsoc
>> 1. Implemented unique in the case of null
>> 2. Implemented unique(A,b,c....)
>>
>> Currently working on
>> tests , field hiding
>
> Good, thanks!
>
> I'd suggest to postpone field hiding until bb-10.2-default
> branch is merged into 10.2.
>
> Start with tests, it's very important to have them.
> See mysql-test/t/*.test and the manual is here:
> https://mariadb.com/kb/en/mariadb/mysql-test/
> Don't forget direct tests of your HASH function.
>
> Then, may be, look at UPDATE.
>
> And here's a code review up to the commit b69e141
>
>> diff --git a/sql/handler.cc b/sql/handler.cc
>> index fb2921f..e68e37b 100644
>> --- a/sql/handler.cc
>> +++ b/sql/handler.cc
>> @@ -5863,16 +5863,86 @@ int handler::ha_reset()
>> DBUG_RETURN(reset());
>> }
>>
>> +/** @breif
>
> typo: brief
>
>> + Compare two records
>> + Need a better place for this function
>> + @returns true if equal else false*/
>> +my_bool rec_hash_cmp(TABLE *tbl ,Field * hash_field)
>
> in C++ you can use 'bool'. 'my_bool' is only for C
>
> and, please, pay attention to where you put spaces :)
>
>> +{
>> + Item * t_item;
>> + t_item=hash_field->vcol_info->expr_item->next;
>
> No, please don't rely on the 'next' pointer. It used to link
> all items into a list, so that they could be destroyed all at once.
> It's not how you traverse expression arguments.
>
> Use, for example,
>
> Item **arg,**arg_end;
> for (arg=args, arg_end=args+arg_count; arg != arg_end ; arg++)
> {
> Item t_item= *arg;
>
> or
> for (uint i= 0; i < arg_count; i++)
> {
> Item *t_item= args[i];
>
> See these and other examples in item_func.cc
>
>> + int diff = tbl->record[1]-tbl->record[0];
>
> record[1] - record[0] is always table->s->rec_buff_length
> But there's one problem with that - you cannot use record[1],
> see a comment below.
>
>> + Field * t_field;
>> + int first_length,second_length;
>> + while(t_item)
>> + {
>> + for(int i=0;i<tbl->s->fields;i++)
>> + {
>> + t_field=tbl->field[i];
>> + if(!my_strcasecmp(system_charset_info,
>> + t_item->name,t_field->field_name))
>> + break;
>> + }
>
> your t_item is an Item_field, so your t_field
> will be t_item->field, no need to iterate anything.
>
>> + /* First check for nulls */
>> + if(t_field->is_real_null()||t_field->is_real_null(diff))
>> + {
>> + return false;
>> + }
>
> This shouldn't be necessary, see below, you shouldn't call this function
> when a field value is NULL.
>
>>
>> + /*
>> + length of field is not equal to pack length when it is
>> + subclass of field_blob and field _varsting
>> + */
>> + first_length = t_field->data_length();
>> + second_length = t_field->data_length(diff);
>> + if(first_length!=second_length)
>> + return false;
>> + if(t_field->cmp_max(t_field->ptr,t_field->ptr+diff,first_length))
>> + return false;
>> + t_item=t_item->next;
>> + }
>> + return true;
>> +}
>> int handler::ha_write_row(uchar *buf)
>> {
>> - int error;
>> + int error,map;
>> Log_func *log_func= Write_rows_log_event::binlog_row_logging_function;
>> + Field **field_iter;
>> DBUG_ASSERT(table_share->tmp_table != NO_TMP_TABLE ||
>> m_lock_type == F_WRLCK);
>> DBUG_ENTER("handler::ha_write_row");
>> DEBUG_SYNC_C("ha_write_row_start");
>> + /* First need to whether inserted record is unique or not */
>>
>> + /* One More Thing if i implement hidden field then detection can be easy */
>> + field_iter=table->field;
>> + for(uint i=0;i<table->s->fields;i++)
>
> You can iterate only virtual fields, there's a separate list for them.
>
>> + {
>> + if(strncmp((*field_iter)->field_name,"DB_ROW_HASH_",12)==0)
>> + {
>> + table->file->ha_index_init(0,0);
>
> this might be a problem, as it changes the current index.
> I think it's ok for INSERT, but may be not ok for UPDATE
> (imagine, UPDATE ... WHERE indexed_column=5, the server does
> ha_index_init for indexed_column, and then, in a loop,
> ha_index_next/ha_update_row. So you cannot swicth to another index
> in the middle of UPDATE loop)
>
> So, use ha_index_read_idx_map(), it doesn't change the current index.
>
> by the way, how could it be ha_index_init(0,0) ???
> you need to specify your unique key number, not 0.
>
>> + /* We need to add the null bit */
>> + /* If the column can be NULL, then in the first byte we put 1 if the
>> + field value is NULL, 0 otherwise. */
>> + uchar * ptr = (uchar *)my_malloc(sizeof(char ) *1+8, MYF(MY_WME));
>
> 1. where do you free it?
> 2. doing malloc/free for _every row_ is too expensive, mallocs
> generally are not very good for concurrency.
> You can simply use
>
> uchar ptr[9]
>
> can you not?
>
>> + *ptr=0;
>
> why *ptr=0 ? You need to check whether the column value is actually NULL,
> because now you set the key to "not null" unconditionally.
> Besides, because NULL value can never cause HA_ERR_FOUND_DUPP_KEY,
> you don't need to search if the column is NULL, you can skip this whole
> check.
>
>> + ptr++;
>> + memcpy(ptr,(*field_iter)->ptr,8);
>> + ptr--;
>
> No need to go low-level, you can do: Why wouldn't you use
> key_copy() function? Like
>
> uchar buf[9];
> DBUG_ASSERT(key->length == sizeof(buf));
> key_copy(ptr, buf, key, 0, false);
>
>> + map= table->file->ha_index_read_map(table->record[1],ptr,HA_WHOLE_KEY,HA_READ_KEY_EXACT);
>
> 1. why 'map'? what does it mean?
> 2. Now it's getting a bit tricky. in UPDATE you cannot use record[1] -
> it's the row before the update. I'm afraid you'll need to allocate another
> table->s->rec_buff_length buffer for this, I don't see any other choice.
> Of course, you can do it on demand, on the table's memroot, preserve
> between calls, and so on.
>
>> + if(!map)
>> + {
>> + if(rec_hash_cmp(table,*field_iter))
>> + {
>> + table->file->ha_index_end();
>> + DBUG_RETURN(HA_ERR_FOUND_DUPP_KEY);
>> + }
>> +
>> + }
>> + table->file->ha_index_end();
>> + }
>> + field_iter++;
>> + }
>> MYSQL_INSERT_ROW_START(table_share->db.str, table_share->table_name.str);
>> mark_trx_read_write();
>> increment_statistics(&SSV::ha_write_count);
>> diff --git a/sql/item_func.cc b/sql/item_func.cc
>> index e5c1f4c..fb23495 100644
>> --- a/sql/item_func.cc
>> +++ b/sql/item_func.cc
>> @@ -1916,6 +1916,41 @@ void Item_func_int_div::fix_length_and_dec()
>> }
>>
>>
>> +longlong Item_func_hash::val_int()
>> +{
>> + unsigned_flag =true;
>
> the other way around, no space before, one space after :)
>
>> + ulong nr1= 1,temp=1, nr2= 0;
>> + CHARSET_INFO *cs;
>> + for(int i=0;i<arg_count;i++)
>> + {
>> + String * str = args[i]->val_str();
>> + if(args[i]->null_value)
>> + {
>> + null_value=1;
>
> don't forget to set null_value=0 if no arguments are NULL
>
>> + return 0;
>> + }
>> + nr2=args[i]->max_length;
>
> Hmm, I don't know. This makes your hash depend on metadata, not only on data.
> I don't think it's a good idea.
>
>> + cs=str->charset();
>> + char l[4];
>> + int4store(l,str->length());
>> + cs->coll->hash_sort(cs, (uchar *)l,sizeof(str->length()), &temp, &nr2);
>
> not sizeof(str->length()), but sizeof(l).
>
>> + nr1^=temp;
>> + cs->coll->hash_sort(cs, (uchar *)str->ptr(),str->length(), &temp, &nr2);
>
> Why did you change the hashing function? &nr1 should be the 4th argument
> of hash_sort(), no xors afterwards.
>
> If you change it, you need to prove mathematically that your version
> is better. That's why I personally prefer to not to mess with it :)
>
>> + nr1^=temp;
>> + }
>> + return (longlong)nr1;
>> +}
>> +
>> +
>> +void Item_func_hash::fix_length_and_dec()
>> +{
>> + maybe_null= 1;
>> + null_value= 0;
>
> no need to set null_value here, the value is calculated
> in val_int, not in fix_length_and_dec.
> fix_length_and_dec is called once, at the very beginning of a query,
> val_int is called for every row. And you need to set null_value=0
> for every row.
>
>> + decimals= 0;
>> + max_length= 8;
>> +}
>> +
>> +
>> longlong Item_func_mod::int_op()
>> {
>> DBUG_ASSERT(fixed == 1);
>> diff --git a/sql/sql_insert.cc b/sql/sql_insert.cc
>> index 9b4238b..2c90f3c 100644
>> --- a/sql/sql_insert.cc
>> +++ b/sql/sql_insert.cc
>> @@ -724,6 +723,20 @@ bool mysql_insert(THD *thd,TABLE_LIST *table_list,
>> DBUG_RETURN(TRUE);
>> }
>>
>
> Please mark places you're going to fix later, e.g.
> // TODO when we'll have hidden fields, this will be removed
>
>> + context= &thd->lex->select_lex.context;
>> + Field ** f=table_list->table->field;
>> + List<Item> * i_list = (List<Item> *)values_list.first_node()->info;
>> + for(uint i=0;i<table_list->table->s->fields;i++)
>> + {
>> + if(strncmp((*f)->field_name,"DB_ROW_HASH_",12)==0)
>> + {
>> + i_list->push_back(new (thd->mem_root)
>> + Item_default_value(thd,context),thd->mem_root);
>> + }
>> + f++;
>> + }
>> + List_iterator_fast<List_item> its(values_list);
>> +
>> lock_type= table_list->lock_type;
>>
>> THD_STAGE_INFO(thd, stage_init);
>> diff --git a/sql/sql_table.cc b/sql/sql_table.cc
>> index dfce503..ac2988f 100644
>> --- a/sql/sql_table.cc
>> +++ b/sql/sql_table.cc
>> @@ -3012,8 +3012,8 @@ int prepare_create_field(Create_field *sql_field,
>> (sql_field->decimals << FIELDFLAG_DEC_SHIFT));
>> break;
>> }
>> - if (!(sql_field->flags & NOT_NULL_FLAG) ||
>> - (sql_field->vcol_info)) /* Make virtual columns allow NULL values */
>> + if (!(sql_field->flags & NOT_NULL_FLAG) )
>> + /* (sql_field->vcol_info)) Make virtual columns allow NULL values */
>
> don't forget to test (see above about mysql-test) that your change
> does not affect normal generated columns and they are still always nullable.
> to test that, create a table with a persistent generated column
> with an expression that can return NULL.
>
>> sql_field->pack_flag|= FIELDFLAG_MAYBE_NULL;
>> if (sql_field->flags & NO_DEFAULT_VALUE_FLAG)
>> sql_field->pack_flag|= FIELDFLAG_NO_DEFAULT;
>> @@ -3223,6 +3223,80 @@ mysql_prepare_create_table(THD *thd, HA_CREATE_INFO *create_info,
>> bool tmp_table= create_table_mode == C_ALTER_TABLE;
>> DBUG_ENTER("mysql_prepare_create_table");
>>
>> + /*
>> + scan the the whole alter list
>> + and add one field if length of blob is zero
>> + */
>
> you need a TODO marker here (TODO change to work for all too-long unique keys)
>
>> +
>> + List_iterator<Key> key_iter(alter_info->key_list);
>> + Key *key_iter_key;
>> + Key_part_spec *temp_colms;
>> + char num = '1';
>> diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
>> index 24dc3c4..0135d9c 100644
>> --- a/sql/sql_yacc.yy
>> +++ b/sql/sql_yacc.yy
>> @@ -9543,6 +9543,15 @@ function_call_keyword:
>> if ($$ == NULL)
>> MYSQL_YYABORT;
>> }
>> + /*Currently hash for just 1 field
>> + * */
>
> your comment is already obsolete :)
>
>> + |HASH_SYM '(' expr_list ')'
>> + {
>> + $$= new (thd->mem_root)Item_func_hash(thd,*$3);
>> + if($$==NULL)
>> + MYSQL_YYABORT;
>> + }
>> +
>> | INSERT '(' expr ',' expr ',' expr ',' expr ')'
>> {
>> $$= new (thd->mem_root) Item_func_insert(thd, $3, $5, $7, $9);
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
Follow ups
References