maria-developers team mailing list archive

Thread
Date

Type_handler and INET6

To: Sergei Golubchik <serg@xxxxxxxxxxx>, maria-developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Fri, 12 Feb 2016 14:32:55 +0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

Hello Sergei,

I'm sending a new version for Type_handler and INET6 for your review.


There are three patches attached.

1. inet6-v65-prepare.diff.gz
  Includes preparatory changes, to avoid different execution paths
  for different data types.

2. inet6-v65-incremental.diff.gz
  Actually adds INET6, together with val_raw family methods,
  last minute changes that did not get into #1.

3. inet6-v65-full.diff.gz is a full patch that includes
   the previous two.


Details on the patches as well as my answers to your previous review
suggestions follow below.


Thanks.



The first patch makes preparatory changes, to avoid different execution
paths and adds some helper classes needed for the API.

- Adds a class Record_addr, to simplify API (pass just one value
  instead of 3 separate values for ptr, null_ptr and null_bit) and share
  some code moving initialization (e.g. for a temporary table purposes)
  to Record_addr constructors. Derives Field from it.

- Adds a class Type_ext_attributes. This collects extra members of
  Item_type_holder. Currently it covers only extra attributes for GEOMETRY,
  DECIMAL, ENUM/SET. We'll add a generic "void *extra" pointer here,
  when user-defined types need even more extra attributes.
  This class will also be needed inside Item_hybrid_func, which currently
  have a bug (do not preserve geometry type and srid of the arguments).
  I'll fix it later. See MDEV-9405.

- Adds a class Create_attr, to pass Create_field related attributes tohandlers.


- Adds a class Type_handler_register,
  to look up handlers by type code and names.

- Continues unification of the code that creates temporary fields
  for various Items, see ::create_tmp_field() in sql_select.cc.
  This task is not fully finished. Some more chages are still needed.
  The intent is to get rid of ::create_tmp_field() and switch to
  use methods in Item instead.

- Adds a data type handler pointer to Arg_comparator.
  Previously the was only "Item_result m_compare_type".

- Splits Item_func_case::fix_length_and_dec() into three separate steps:
  fix_field_type(), fix_return_arguments(), fix_comparison_arguments(),
  to incorporate the type handler methods easier.

- Splits Item_func_between::val_int() into virtual methods inType_handler_xxx.


- Adds virtual methods to Type_handler, to avoid different execution paths
  for non-traditional types:

  calc_pack_length(), check_column_definition(),
  prepare_column_definition(), make_table_field(),
  Item_save_in_field(), make_cache_item(), calc_pack_length(),
  calc_display_length(), Item_type_holder_join_attributes(),
  Item_func_hex_val_str_ascii(), Item_func_hybrid_field_type_val_xxx(),
  Item_func_between_fix_length_and_dec(), Item_func_between_val_int(),
  Item_sum_hybrid_fix_length_and_dec(),

- Adds Type_handler_numeric, to share some code in real, int decimalhandlers.


- Moves field_type_for_temporal_comparison() from Item to Type_handler.

- Makes some other changes in data type aggregation in various pieces of the
  code. This will need some more changes though.

- Removes Field::result_merge_type(). Uses Type_handler::cmp_type()
  and Type_handler::result_type() instead.

- Simplifies ::make_field() in field.cc, usingType_handler::make_table_field().


- Removes the global Arg_comparator::comparator_matrix[6][2] and
  uses methods in Type_handler instead.



The second patch actually adds INET6 and:
- Adds val_raw family methods into Items
- Adds some pieces into the type aggregation code in a few places
- Fixes sql_yacc.yy to handle non-traditional types,
  adds Cast_type and fixes Lex_cast_type_st.
- Fixes Type_handler_register to search non-traditional types by name.
- Disables equal expression propagation for non-traditional types
  (the code is not ready yet for this, e.g. Item_equal needs some changes)
- Adds the INET6 handler with tests



Also, I answer your questions and suggestions that you made during
the first review. Please see below.



>> > > === added file 'mysql-test/r/type_inet6.result'
> > > --- mysql-test/r/type_inet6.result	1970-01-01 00:00:00 +0000
> > > +++ mysql-test/r/type_inet6.result	2014-07-01 11:10:33 +0000
> > > +::ff
> > > +::ffff
> > > +::0.255.255.255
> > > +::255.255.255.255
> > > +::ff:ffff:ffff
> > > +::ffff:255.255.255.255
>
> Hmm. A bit unexpected. Do you have a link to some
> document that specifies how ipv6 numbers should be parsed and printed?


https://en.wikipedia.org/wiki/IPv6_address says:

:ffff:0:0/96 — This prefix is designated as an IPv4-mapped IPv6 address.

Note, I also checked this in PostgreSQL and it works in the same way:

bar=> select inet'::ffff:ffff:ffff';
          inet
------------------------
 ::ffff:255.255.255.255


> > > +SELECT * FROM t1 WHERE a=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF;
> > > +a
> > > +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
> > > +ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
> > > +Warnings:

> > > +Warning 1105 Cast to INET6 converted'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF'to 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff'

>
> Too many warnings. I'd rather use Item_cache or something similar
> to have only one actual type conversion and one warning

It now creates Item_cache. So there's only one warning.



> > > === modified file 'sql/field.h'
> > > --- sql/field.h	2014-06-11 08:08:08 +0000
> > > +++ sql/field.h	2014-07-01 10:38:12 +0000
> > > @@ -366,6 +437,10 @@ class Field
> > >    virtual ~Field() {}
> > >    /* Store functions returns 1 on overflow and -1 on fatal error */

> > > virtual int store(const char *to, uint length,CHARSET_INFO*cs)=0;

> > > +  virtual int  store_raw(const String *raw, enum_field_types type)
> > > +  {
> > > +    return store(raw->ptr(), raw->length(), raw->charset());
> > > +  }
>

> How do you guarantee that the 'raw' is in the correct internal rawformat?

> Or is it only used after int_to_raw, str_to_raw, etc?

raw representation can only be created by a method of the corresponding
Type_handler_xxx, or come from "ptr" of the corresponding Field_xxx.
Bad raw format means that there is a bug inside Type_handler_xxx.

Btw, In the latest patch I changed it to:

  int  store_raw_native(const char *str, uint length);

Conversion (when needed) is done outside of Field.

> > > +class Field_type_joiner
>
> It's rather "aggregator", not "joiner".
> But in fact, it's not even that, it's a pair of { type, derivation }.
> May be it should be Field_type_with_derivation ? :)
> And a method Field_type_with_derivation::aggregate(...)

It's now Type_handler_hybrid_field_type, which I earlier added in 10.1.

There is no "derivation" in this version, as we discussed in Luxembough.

Sigh, I'd really really prefer if we had it though.

I think when field=expr is compared according to the field data type isclearer

than they're compared accoring to the expr datatype.
Also, comparing according to field will use the optimizer by default,
without having to cast "expr" to the field data type.


> > > +class Type_handler
> > > +{
> > > +public:
> > > +  Type_handler() {}
> > > +  ~Type_handler() {}
> > > +  virtual const char *field_type_name() const = 0;
> > > +  virtual uint field_type_name_length() const = 0;
>
> may be, remove field_ prefix, or even a field_type_ prefix?
> the method is still unambiguous, but much shorter and readable.
> and, may be, make it non-virtual, like
>
>   ...
>   private:
>   LEX_STRING type_name;
>   public:
>   Type_handler(const char *name_arg)
>   {
>      type_name.str= name_arg;
>      type_name.length= strlen(name_arg);
>   }
>   const char *name() const { return type_name.str; }
>   const char *name_length() const { return type_name.length; }
>   ...
>
> and your new type will do in the constructor:
>
>    Type_handler_inet6(): Type_handler("INET6") { }
>
> less boilerplate code for the new types, and you know how I'm trying
> to avoid boilerplate code in the APIs :)


I'm using the very needed new class Name for it.
It now looks simpler, I believe.


>
> > > +  virtual enum_field_types field_type() const = 0;
>
> That doesn't work for loadable types.
> I think that we eventually need to do with types what we've done with

> storage engines. There was db_type enum, but now the engine is almostalways

> identified by the pointer to the handlerton. So, for types a pointer to
> the Type_handler might be a unique type identifier in the server.
> Replacing enum_field_types where possible.


Sure, in the end we should get a code that will identify
handlers by a pointer rather than field_type().
This will need a few additional steps, which I just did not have
a chance to do yet:

- Fix all existing Items to return type_handler() instead of field_type()

as a type identifier. This will be easier after we add virtualinheritance

  (or templates) soon, on top of my Type_handler changes.
- Get rid of all field_type() calls: split the remaining pieces of the code
  that use field_type() into more virtual methods in Type_handler.

- Fix FRM to store new field types as a textual name rather than anumeric ID.


Can we do these steps a separately?
Note, it can be done after adding INET6 and UUID,
before adding dynamic loadability.

> > > +  virtual enum_field_types fallback_field_type() const = 0;
>
> I don't like the idea of fallback_field_type.
> You have Field_type_joiner for new types, but you fallback to
> Field::field_type_merge() for old types. I'd rather not distinguish
> between "old" and "new" types. Just like we have with storage engines or
> authentication - there are no "native" and "plugin" storage engines (or

> authentication), *all* storage engines are plugins. So there's nocode like

>
>   if (is_plugin(engine_type))
>     ...
>   else
>     ...
>
> so, I prefer uniform code with no special checks for type handlers.
> Implementations differ, but the interface is the same for all types.

I nodified the code, so there is no a separate fallback_field_type() anymore.

But there is still cmp_type(), which is uses in very rare cases,

to exchange data to/from other data types: INET6->something,something->INET6.


Also, there are still some different execution paths inside the methods
that aggregate data types. We'll need to discuss it. I could not come up
with a simple single method for aggregation.
Also, currently field type aggregation is done in different ways in
various pieces of the code (hybrid functions, UNION, comparison).
We should try to unify these ways.


> > > +  virtual Item_result cast_to_int_type() const = 0;
>
> Why is this abstract? It could have a reasonable default implementation
> reducing the amount of boilerplate code in the Type_handler's
>
> > > +  virtual int cmp(Item *a, Item *b) const = 0;
> > > +  virtual int cmp(String *a, String *b) const = 0;
>
> These too


Not sure that exactly cast_to_int_type() can have a reasonable default.

For INET6 and UUID it's DECIMAL_RESULT.
For compressed URLs (MDEV-4630) and external ENUM (MDEV-4913) it's
more likely to be STRING_RESULT.

Anyway, I agree that we should add some reasonable defaults when possible.

Perhaps we could add two abstract classes with some defaults:


Type_handler
 |
 +-- Type_handler_traditional (derive all existing types from this)
 |

+-- Type_handler_binary_fixed_length_storage (derive INET6 and UUIDfrom this)



But I'd propose not to hurry with that. We have time before we
add loadabale data type plugins.
Can we add UUID first? It will show how much INET6 and UUID can share
and thus how much can be moved to Type_handler_binary_fixed_length_storage.

> > > + virtual int save_in_field(Item_hex_hybrid *item, Field *field)const = 0;

>
> Why do you need that virtual? It could be always val_raw/store_raw

I guess this question is not relevant any more, because we now
have handlers for the traditional data types, and they need to treat
hex hybrids differenty (as string for Field_str and Field_temporal,
and as a longlong for the other datatypes).

Note, I modified this piece slighly.
It's now a virtual method in Field rather than in Type_handler:

    int Field::store_hex_hybrid(const char *str, uint length);

> > > + virtual int set_cmp_func(Arg_comparator *cmp, Item *a, Item*b) const = 0;

>
> this can be removed, see below (under Type_handler_inet6::set_cmp_func)

I could not get rid of it. Moreover, the traditional types now use thismethod.

For example:

bool Type_handler_int_result::set_comparator_func(Arg_comparator *cmp) const
{
  return cmp->set_cmp_func_int();
}

So now I have the same execution path for all data types here.

> > > + virtual bool varbinary_to_raw(const String *from, String *to)const = 0;

>
> 1. This most certainly needs a comment. It took me quite a while to
> understand what this method is for and why str_to_raw isn't used instead.
> A comment should say that it's used when a field binary representation is
> specified in a query directly was 0xHHHHHHHH...
>
> 2. Still, I'm not sure this method is needed. It means a different
> behavior for
>
>    where   ipv6 = 0xHHHHHH...
>
> and
>
>    create t1 (b binary(16)); insert t1 values (0xHHHHH....);
>    ... where ipv6 = a ...
>

Note, it's the same with what we do for numbers:

CREATE OR REPLACE TABLE t1 (a INT, b BLOB);
INSERT INTO t1 VALUES (0x31,0x31);
SELECT COUNT(*) FROM t1 WHERE a=0x31;  -- returns a row
SELECT * FROM t1 WHERE a=b;            -- returns no rows

See below.

> an alternative would be to treat any binary string as raw,
> directly in str_to_raw. But a drawback would be that
>
>    where ipv6 = "ffff::1"
>
> and
>
>    where ipv6 = convert("ffff::1" to binary)
>
> will work differently. And it's not particularly good either.

I like the idea to treat all binary strings as raw with INET6,
and then with UUID.

I think this is not really a drawback.

Remember, when we discussed "is [VAR]BINARY a data type or a character set"
around the year of 2005, we fortunately decided that they are data types, so
we introduced data types [VAR]BINARY as a replacement to
"CHAR CHARACTER SET binary".

So [VAR]CHAR abd [VAR]BINARY are different data types!

It should be OK that INET6 will communicate with two
***different*** datatypes [VAR]CHAR vs [VAR]BINARY differently.

If we agree on that, I'll change the code accordingly.

We preliminary agreed on IRC to make explicit and implicit conversionbetween

a data type and BINARY use methods in Type_handler.
(but I have not added them yet).


>
> So, it's a tradeoff, and the question is - what is "less gotcha" ?

> I don't know. I have a slight preference towards removingvarbinary_to_raw()> just to make the API smaller. And thinking about adding it if therewould be

> user complains.
>
> Another thought. I'm starting to doubt that we should support direct
> raw values in the SQL at all. It might work for ipv6
>
>    where ipv6 = 0xHHHHHH
>
> but it certainly won't work for, say, integers:
>
>    where int = 0xHHHHH
>
> ^^^ this is not raw, but a normal hexadecimal number.

> So, unless you introduce a special unambiguous syntax (likeRAW(0xHHHHH)),

> I think that raw values should not be supported in SQL.

This syntax looks useful for me:
  WHERE inet6_field=0xFFFF;

Btw, in case of integer numbers, it's exactly the low level raw formatin Field.

This is how integer numbers are stored, isn't it? I didn't understand why
it won't work for numbers.

Note, comparison to a hex hybrid works fine for INET6 in my patch anddoes not

break the existing data types.

Numeric comparion goes through Item_hex_hybrid::val_int().
String comparison goes through Item_hex_hybrid::val_str().
INET6 comparison goes through Item_hex_hybrid::val_raw().
This is Arg_comparator::func who decides which method to use.
In the future, we could probably unify this: implement val_raw()
for the traditional data types, and use val_raw() for comparison
instead of the usual methods like val_int().


It's more tricky what should happen in case of VARCHAR-alike or BLOB-alike
data types (suppose we'll decide the standard CLOB type in the future).

Should val_raw() include the length bytes? I think no, as it will assume
duplication of "length", once in String::length(), and one more time in
the beginning of String::ptr().

So I was thinking of "raw" as something very close to how the data is stored

in Field, but it's not necessary a 100% match. Perhaps we just need adifferent

name than "raw"...

I agree we don't need the field-style low level raw format
(e.g. with length bytes for a VARCHAR-alike data type).



> > > +  virtual Item *make_cast(MEM_ROOT *mem_root, Item *arg) const = 0;
>
> Hmm. This implies an Item_typecast_xxx for every type.
> I would be easier to have one Item_typecast_handled, that stores
> a pointer to Type_handler. Would be much easier than implementing
> new Item_typecast_xxx for every type.

Some data types could probably share the same Item_typecast_xxx,
but I don't think it's generally doable with a single Item for
all data types.

There are a few reasons why I prefer a separate method in handler
to create a CAST Item.

- Different data types can have different set of extra attributes,
  which the corresponding Item_typecast_xxx should store in.
  It's easier to store attributes in typed structures rather than
  in a generic memory pointed by "void *". It will make our life
  easier when we start debugging in gdb :)

- I plan to simplify sql_yacc.yy so the loadable datatypes
  and the traditional datatypes share the same CAST and field_type grammar.
  It will be needed soon even for the traditional data types.

- A method in handler will help to add the missing CAST types easier,
  e.g. FLOAT, SMALLINT, etc. It looks natural that every Field_xxx has
  a corresponding Item_typecast_xxx.


> > > +    int store_raw(const String *str, enum_field_types type_arg)
> > > +    {
> > > +      if (type_arg != type())
> > > +      {
> > > +        set_null();
> > > +        return 1; // TODO: go through str/int/real/dec/date?
> > > +      }
> > > +      DBUG_ASSERT(str->length() == binary_length());
> > > +      memcpy(ptr, str->ptr(), binary_length());
> > > +      return 0;
> > > +    }
>
> val_raw and store_raw should not be virtual, they are always
> identical for all types. Just don't forget the validity check.


<offtopic>
Btw, it now looks like this:
  int store_raw_native(const char *str, uint length)
  {
    DBUG_ASSERT(length == binary_length());
    memcpy(ptr, str, binary_length());
    return 0;
  }
I moved type conversion to the upper level.
</offtopic>


Right, the actual "store" is often a simple memcpy() for fixed length data
types like INET6. But it's not generally the case.

- It can be different for non-fixed length data types.
  It's again about the fact that the value of val_raw() and the byte array
  stored in Field::ptr() can have slighly differrent formats.
  Variable length types will pass the values using a normal String format,
  without additional length bytes coded inside the array pointed by
  String::ptr().

- Also, in case we have a shared method to store, we'll have to haveadditional

  virtual methods for validity check, to call inside.
  There's no sense for two virtual calls.

- Also, in some cases it will be possible to use val_raw() instead of
  val_int(), val_decimal(), val_str(), val_real(), get_date(), and thus
  share almost duplicate code that's now implemented for different
  Item_result types. I already mentioned Arg_comparator.
  But there could also be only one universal Item_cache, etc.


> > > +    int store_time_dec(MYSQL_TIME *ltime, uint dec)
> > > +    {
> > > +      my_decimal tmp;
> > > +      return store_decimal(date2my_decimal(ltime, &tmp));
> > > +    }
>
> Numerous val_xxx and store_xxx could be replaced by xxx_to_raw
> and raw_to_xxx + val_raw and store_raw. You shouldn't introduce
> both val_xxx and raw_to_xxx. Same for store_xxx and xxx_to_raw


Right. I don't expose raw_to_xxx publically any more.
They are now internal protected Type_handler_inet6 routines.


>
> > > +
> > > +    uint size_of() const { return sizeof(*this); }
> > > +  };
> > > +
> > > +
> > > +  class Arg_comparator_inet6: public Arg_comparator
> > > +  {
> > > +  public:
> > > +    int compare_inet6()
> > > +    {
> > > +      if ((*a)->val_raw(&value1, cmp_field_type()) ||
> > > +          (*b)->val_raw(&value2, cmp_field_type()))
> > > +        goto null;
> > > +      if (set_null)
> > > +        owner->null_value= 0;

> > > + DBUG_ASSERT(value1.length() ==Type_handler_inet6::binary_length());> > > + DBUG_ASSERT(value2.length() ==Type_handler_inet6::binary_length());> > > + return memcmp(value1.ptr(), value2.ptr(),Type_handler_inet6::binary_length());

> > > +    null:
> > > +      if (set_null)
> > > +        owner->null_value= 1;
> > > +      return -1;
> > > +    }
> > > +    int compare_e_inet6()
> > > +    {
> > > +      bool a_is_null= (*a)->val_raw(&value1, cmp_field_type());
> > > +      bool b_is_null= (*b)->val_raw(&value2, cmp_field_type());
> > > +      if (a_is_null || b_is_null)
> > > +        return MY_TEST(a_is_null == b_is_null);

> > > + DBUG_ASSERT(value1.length() ==Type_handler_inet6::binary_length());> > > + DBUG_ASSERT(value2.length() ==Type_handler_inet6::binary_length());> > > + return MY_TEST(memcmp(value1.ptr(), value2.ptr(),Type_handler_inet6::binary_length()) == 0);

> > > +    }
>
> we should try to get rid of one of the functions, see below
> ...
> ...
> ...
> > > +  int set_cmp_func(Arg_comparator *cmp, Item *a, Item *b) const
> > > +  {
> > > +    cmp->set_func(cmp->is_owner_equal_func() ?

> > > + (arg_cmp_func)&Arg_comparator_inet6::compare_e_inet6 :> > > + (arg_cmp_func)&Arg_comparator_inet6::compare_inet6,

> > > +                  field_type());
>
> Again, boilerplate code, better:
>
>  cmp->set_func(Arg_comparator_inet6::compare_e_inet6,
>                Arg_comparator_inet6::compare_inet6,
>                field_type());
>
> no "cmp->is_owner_equal_func() ?" - this can be done inside set_func()

> Also, I wonder, whether every type really needs two functions, forequal func> and not. Perhaps having only one - compare_e_inet6() should besufficient and

> checks for NULL can be done in the common wrapper?
> And, even more, set_cmp_func() is not necessary at all. Type_handler
> should only have compare_type() and compare_e_type() virtual methods
> (or only one compare_e_type()) and the implementation of set_cmp_func()
> becomes identical for all types, and thus doesn't have to be virtual
> anymore. One method less to implement.

I agree.

We should try to do that. Let's try to decompose the functions of theexisting

data types and the INET6 related functions in a separate patch.

>
> > > +    return 0;
> > > +  }

> > > + void sortlength(struct st_sort_field *order, const Item *item)const

> > > +  {
> > > +    order->result_type= item->cmp_type();
> > > +    order->length= Type_handler_inet6::binary_length();
> > > +    order->suffix_length= 0;
> > > +    order->need_strxnfrm= 0;
> > > +  }
> > > +  bool make_sort_key(struct st_sort_field *order, uchar *to) const
> > > +  {
> > > +    Item *item= order->item;
> > > +    String buf((char *) to, order->length, &my_charset_bin);
> > > +    /*

> > > + make_sortkey() uses str_result(), val_result(),val_int_result(),

> > > +      val_decimal_result() for the standard result types.

> > > + QQ: should we introduce val_raw_result() and use hereinstead of

> > > +      val_raw() ?
>
> Or, perhaps, we should fix make_sortkey() to use val_str, val_real, etc.
> I failed to understant why *result() methos are used there.

So did I.

But as I had to implement the val_raw_result() methods anyway for otherpurposes,

it was easy to use val_raw_result() here, for symmetry with the other types,
and in the latest patch I still use _result_ here.
It can be fixed later for all data types to use the usual val_xxx() instead.


> I've even replaced them with non-*result versions and not a single test
> failed. Including the test case that was added when val* methods
> were replaced with val*result methods.
>
> > > +
> > > +Type_handler_register::Type_handler_register()
> > > +{
> > > +  memset(m_handler, 0, sizeof(m_handler));
> > > +  m_min_type= 256;
> > > +  m_max_type= 0;
> > > +  m_min_length= 256;
> > > +  m_max_length= 0;
> > > +  add(&Type_handler_inet6::s_singleton);
>
> Eh? I thought it's added in the Type_handler_inet6 constructor.


I think in the end there should be two separate steps actually:
- a constructor for Type_handler_xxx
- a method to add it into the registry

So the plugin loading routine will first create a handler from a dynamiclibrary,

and then put it into the registry.

Now I just implemented it by a special "class Type_handler_inet6_singleton".

Attachment: inet6-v65-full.diff.gz
Description: application/gzip

Attachment: inet6-v65-incremental.diff.gz
Description: application/gzip

Attachment: inet6-v65-prepare.diff.gz
Description: application/gzip

Follow ups

Re: Type_handler and INET6
From: Sergei Golubchik, 2016-02-26
Re: Type_handler and INET6
From: Sergei Golubchik, 2016-02-26