← Back to team overview

maria-developers team mailing list archive

Re: MDEV-9369 IN operator with ( num, NULL ) gives inconsistent result

 

Hi, Alexander!

On Feb 08, Alexander Barkov wrote:
> Hello Sergei,
> 
> Please review a patch for MDEV-9369.
> 
> Briefly, it fixes cmp_item::cmp() to return three possible values: 
> FALSE, TRUE, UNKNOWN. These values are then recursively pulled to
> the very top level, to the owner Item_xxx.
> 
> The owner Item_func_in (as well as Item_func_case and Item_func_equal)
> then further "translate" these tree-values logic into the traditional
> val_int()+null_value combination using some additional information.
> 
> For example, Item_func_in::val_int() uses "negated" as an additional
> information.
> 
> The patch looks Ok for me.

Looks ok to me too. A couple of stylistic comments, see below.
ok to push afterwards.

> diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc
> index d5f5087..39003de 100644
> --- a/sql/item_cmpfunc.cc
> +++ b/sql/item_cmpfunc.cc
> @@ -3837,15 +3843,15 @@ void cmp_item_decimal::store_value(Item *item)
>    /* val may be zero if item is nnull */
>    if (val && val != &value)
>      my_decimal2decimal(val, &value);
> +  set_null_value(item->null_value);
>  }
>  
>  
>  int cmp_item_decimal::cmp(Item *arg)
>  {
>    my_decimal tmp_buf, *tmp= arg->val_decimal(&tmp_buf);
> -  if (arg->null_value)
> -    return 1;
> -  return my_decimal_cmp(&value, tmp);
> +  return (m_null_value || arg->null_value) ?

if you access m_null_value directly, you don't need a setter for it.
it'll be less confusing if you replace set_null_value(X) with m_null_value=X.

alternatively, you can add a getter is_null_value(). But I personally
wouldn't do it, cmp_xxx is a small set of simple classes, no need to get
too enterprisey there.

> +    UNKNOWN : (my_decimal_cmp(&value, tmp) != 0);
>  }
>  
>  
> @@ -3892,10 +3900,10 @@ cmp_item *cmp_item_datetime::make_same()
>  }
>  
>  
> -bool Item_func_in::nulls_in_row()
> +bool Item_func_in::list_contains_null()

I found the old name clear enough. But ok, whatever...

>  {
>    Item **arg,**arg_end;
> -  for (arg= args+1, arg_end= args+arg_count; arg != arg_end ; arg++)
> +  for (arg= args + 1, arg_end= args+arg_count; arg != arg_end ; arg++)
>    {
>      if ((*arg)->null_inside())
>        return 1;
> @@ -4010,6 +4018,32 @@ void Item_func_in::fix_length_and_dec()
>      }
>    }
>  
> +  /*
> +    First conditions for bisection to be possible:
> +     1. All types are similar, and
> +     2. All expressions in <in value list> are const
> +  */
> +  bool bisection_possible=
> +    type_cnt == 1 &&                                   // 1
> +    const_itm;                                         // 2
> +  if (bisection_possible)
> +  {
> +    /*
> +      In the presence of NULLs, the correct result of evaluating this item
> +      must be UNKNOWN or FALSE. To achieve that:
> +      - If type is scalar, we can use bisection and the "have_null" boolean.
> +      - If type is ROW, we will need to scan all of <in value list> when
> +        searching, so bisection is impossible. Unless:
> +        3. UNKNOWN and FALSE are equivalent results
> +        4. Neither left expression nor <in value list> contain any NULL value
> +      */
> +
> +    if (cmp_type == ROW_RESULT &&
> +        !((is_top_level_item() && !negated) ||              // 3
> +          (!list_contains_null() && !args[0]->maybe_null))) // 4
> +      bisection_possible= false;

I think the condition will be easier to understand if you remove the
negation of that complex condition in parentheses:

    if (cmp_type == ROW_RESULT &&
        ((!is_top_level_item() || negated) &&             // 3
          (list_contains_null() || args[0]->maybe_null))) // 4

I find this ^^^ much easier to understand.

> +  }
> +
>    if (type_cnt == 1)
>    {
>      if (cmp_type == STRING_RESULT && 

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx


Follow ups