← Back to team overview

maria-developers team mailing list archive

Re: INET6, data type plugins

 

  Hi Sergei,

Thanks for review. Sorry for delay with this.
I've been busy with the other tasks.


I won't reply to every your comment in this letter.
I skip all comments that I agree with, and those that I need
some time to think.


Only some critical and unclear follow below:


On 08/10/2014 03:22 PM, Sergei Golubchik wrote:
<skip>
=== modified file 'sql/field.h'
--- sql/field.h	2014-06-11 08:08:08 +0000
+++ sql/field.h	2014-07-01 10:38:12 +0000
@@ -59,6 +60,15 @@ enum Derivation
    DERIVATION_EXPLICIT= 0
  };

+
+/* Data type derivation */
+enum Type_derivation
+{
+  TYPE_DERIVATION_COERCIBLE= 1,  // Non-typified literals
+  TYPE_DERIVATION_IMPLICIT= 0   // Fields, functions, typified literals
+};

Interesting. Does it affect the existing behavior?
Or you only use it for new types?
I think both alternatives are far from ideal: the former means
incompatibilities, the latter - that new types behave differently from old
ones. Still, given the choice, I'd prefer consistency over compatibility.

Currently my patch implements this only for the new data types.
But I've been thinking about the idea of type derivation for a long
time for the standard types as well.

I find that taking into account type derivation during type aggregation
makes behavior clearer in most cases.
Also, this approach forces using indexes in more cases, when you do:
  WHERE field = literal


There should be three levels actually:

enum Type_derivation
{
 TYPE_DERIVATION_COERCIBLE= 2,  // Non-typified literals
 TYPE_DERIVATION_IMPLICIT=  1,  // Fields, functions, typified literals
 TYPE_DERIVATION_EXPLICIT=  0,  // Explicit CAST(expr AS type)
};

So when you do:

  WHERE field=CAST(literal AS force_type)

comparison is done according to "force_type",
no matter what the type of "field" is.



Some examples:

1. WHERE char_field=10
will compare as string.
Currently it compares as real.

2. WHERE int_field='10'
will compare as int.
Currently it compares as real.

3. WHERE int_field='string'
will return "impossible condition",
as the literal does not cast to INT safely.
Currently it compares as real, and just wastes time,
because the result set is empty anyway.

4. WHERE int_field=10.1
will return "impossible condition",
as the literal does not cast to INT safely.
Currently it compares as real (and wastes time again).

5. WHERE time_field='00:00:00'
will compare as TIME.
Currently it does the same. Nothing change.

6. WHERE string_field=TIME'00:00:00'
Open question. Some variants are possible:
a. typified literals are weaker than fields
  (as it is still a literal on the right side)
b. typified literals are stronger than fields
  (as mentioning the type is a sort of CAST)
c. typified literals have the same type strength with fields,
use the current rules that we have for the type aggregation now.

If we go (a) or (b), then a separate type derivation level is needed.

Currently it compares as TIME (in all cases when a TIME argument is given).


If we agree on this proposal with the idea of type derivation,
then it should be a separate task.



<skip>


I don't like the idea of fallback_field_type.
You have Field_type_joiner for new types, but you fallback to
Field::field_type_merge() for old types.

Not only. We need something to do when two different non-standard
types are being aggregated.

My idea was:

Data types will go in packages. INET6 and INET4 will be in the same
package and will know about each other, and will know how to aggregate
to each other (INET6 is a super-type for INET4).

But we also need to do something when we have two absolutely different data types, which do not know each other.

1. We could return an error in such cases, which would be more SQL standard strict behavior.

2. Or, we could use fallback_field_type(), which would be a more MariaDB
behavior (and more consistent with the standard data types).


So what would you prefer?
N1,
or N2 (have fallback_field_type(), but limit its use only for the cases when two data types that do not know each other are being aggregated)?


I'd rather not distinguish
between "old" and "new" types. Just like we have with storage engines or
authentication - there are no "native" and "plugin" storage engines (or
authentication), *all* storage engines are plugins. So there's no code like

   if (is_plugin(engine_type))
     ...
   else
     ...

so, I prefer uniform code with no special checks for type handlers.
Implementations differ, but the interface is the same for all types.


<skip>


+  virtual bool varbinary_to_raw(const String *from, String *to) const = 0;

1. This most certainly needs a comment. It took me quite a while to
understand what this method is for and why str_to_raw isn't used instead.
A comment should say that it's used when a field binary representation is
specified in a query directly was 0xHHHHHHHH...

2. Still, I'm not sure this method is needed. It means a different
behavior for

    where   ipv6 = 0xHHHHHH...

and

    create t1 (b binary(16)); insert t1 values (0xHHHHH....);
    ... where ipv6 = a ...

Right. Currently it also works differently if you mix numbers and strings:

 WHERE string_field=0xHHHHHH

and

 CREATE TABLE t1 (int_column);
 INSERT INTO t1 VALUES (0xHHHHHH);
 SELECT .. WHERE string_field=int_column;



an alternative would be to treat any binary string as raw,
directly in str_to_raw. But a drawback would be that

    where ipv6 = "ffff::1"

and

    where ipv6 = convert("ffff::1" to binary)

will work differently. And it's not particularly good either.

So, it's a tradeoff, and the question is - what is "less gotcha" ?
I don't know. I have a slight preference towards removing varbinary_to_raw()
just to make the API smaller. And thinking about adding it if there would be
user complains.

Another thought. I'm starting to doubt that we should support direct
raw values in the SQL at all. It might work for ipv6

    where ipv6 = 0xHHHHHH

but it certainly won't work for, say, integers:

    where int = 0xHHHHH

^^^ this is not raw, but a normal hexadecimal number.
So, unless you introduce a special unambiguous syntax (like RAW(0xHHHHH)),
I think that raw values should not be supported in SQL.

So we have three alternatives:
1. Add a method, like I did in the patch, just rename it properly.
  Each type will define its behavior when processing a hex literal.

2. Add RAW().
  Return error for:
   WHERE new_type=0xHHHHHH
  Preserve the existing behavior for:
   WHERE old_type=0XHHHHHH

3. Treat all binary strings as RAW for the new data types.
  Preserve the current behavior for the standard data types.


I find that N1 is the best.
The price is not high:
a method in API, with a clear purpose
(providing that we rename and comment it properly),
and the reward is a good flexibility.


<skip>

+  bool make_sort_key(struct st_sort_field *order, uchar *to) const
+  {
+    Item *item= order->item;
+    String buf((char *) to, order->length, &my_charset_bin);
+    /*
+      make_sortkey() uses str_result(), val_result(), val_int_result(),
+      val_decimal_result() for the standard result types.
+      QQ: should we introduce val_raw_result() and use here instead of
+      val_raw() ?

Or, perhaps, we should fix make_sortkey() to use val_str, val_real, etc.
I failed to understant why *result() methos are used there.
I've even replaced them with non-*result versions and not a single test
failed. Including the test case that was added when val* methods
were replaced with val*result methods.

I don't know. I copied this code from filesort.cc.
It uses val_xxx_result() methods.

Are you saying that filesort.cc can be fixed to use val_xxx()
instead of val_xxx_result()? If so, then it should be just
fixed in a separate patch in 10.1 before my patch.


+Type_handler_register::Type_handler_register()
+{
+  memset(m_handler, 0, sizeof(m_handler));
+  m_min_type= 256;
+  m_max_type= 0;
+  m_min_length= 256;
+  m_max_length= 0;
+  add(&Type_handler_inet6::s_singleton);

Eh? I thought it's added in the Type_handler_inet6 constructor.

Well, I could not understand how to guarantee the proper order
call of the constructors for Type_handler_register and Type_handler_inet6. They there called in the opposite order
and it crashed.

Perhaps it should not be even done in constructors at all.


The code somewhere in mysqld.cc could add Type_handler_inet6
and Type_hander_xxx for the all standard types:

  Type_handlers.add(&Type_handler_int);
  Type_handlers.add(&Type_handler_tinyint);
  ... all standard types ...
  Type_handlers.add(&Type_handler_inet6);


Later Type_handlers.add() will also be used when a type handler
is loaded from a dynamic library.

Does it sound fine?


=== modified file 'sql/item_strfunc.cc'
--- sql/item_strfunc.cc	2014-06-10 18:20:33 +0000
+++ sql/item_strfunc.cc	2014-06-27 15:48:18 +0000
@@ -4303,6 +4303,7 @@ bool Item_func_dyncol_create::prepare_ar
      DYNAMIC_COLUMN_TYPE type= defs[i].type;
      if (type == DYN_COL_NULL) // auto detect
      {
+      DBUG_ASSERT(!Type_handlers.handler(args[valpos]->field_type()));

mark this TODO, please

Can you clarify please?


        /*
          We don't have a default here to ensure we get a warning if
          one adds a new not handled MYSQL_TYPE_...

=== modified file 'sql/rpl_utility.cc'
--- sql/rpl_utility.cc	2014-05-13 09:53:30 +0000
+++ sql/rpl_utility.cc	2014-06-27 15:48:45 +0000
@@ -675,6 +675,9 @@ can_convert_field_to(Field *field,
    */

    DBUG_PRINT("debug", ("Base types are different, checking conversion"));
+  if (Type_handlers.handler(source_type))
+    DBUG_RETURN(false);

add a TODO comment here, please

Can you clarify please?


+
    switch (source_type)                      // Source type (on master)
    {
    case MYSQL_TYPE_DECIMAL:

Regards,
Sergei


Thanks.


References