maria-developers team mailing list archive

Re: Fwd: mixing of user-defined data types with other data types

To: Alexander Barkov <bar@xxxxxxxxxxx>, maria-developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
From: Vicențiu Ciorbaru <vicentiu@xxxxxxxxxxx>
Date: Wed, 28 Dec 2016 14:58:55 +0000
In-reply-to: <5d8795d9-e5e8-7f4d-51cf-9b91abf51f27@mariadb.org>
Hi Alexander,

I've reviewed your second patch. Comments inline.

> >> +}
> >>
> >>
> >>  /*
> >> diff --git a/sql/field.h b/sql/field.h
> >> index 541da5a..fa84e7d 100644
> >> --- a/sql/field.h
> >> +++ b/sql/field.h
> >> @@ -835,7 +835,6 @@ class Field: public Value_source
> >>    virtual Item_result cmp_type () const { return result_type(); }
> >>    static bool type_can_have_key_part(enum_field_types);
> >>
> > The field_type_merge function breaks all the other naming patterns. We
> > have result_type, cmp_type, real_type and now field_type_merge. Wouldn't
> > it be better to name it field_merge_type, to be consistent? Or since this
> > is a lookup kind of operation, perhaps name it
> > (get|lookup)_merge_field_type?
> > I know it's not _exactly_ like result_type and compare_type, but it
> > generally is used in a similar context.
>
> Don't we make our life harder in respect of merging when renaming?
>
> Usually I try not to rename existing functions:
> - There is a chance that we'll merge something from earlier versions,
>   and renaming can cause conflicts.
> - Also, developers are used to this name and can do
>   "grep field_type_merge" or similar when searching the code.
>

I see your point. I am not 100% in favor of protecting ourselves (making
life easier) in future merges, as it tends to keep us stuck with the same
version of (often) difficult-to-read code. In this case, I guess we can live
with it, but I'll keep pointing out such things in future reviews. :)


>
> >>
> >>    static enum_field_types field_type_merge(enum_field_types, enum_field_types);
> >> -  static Item_result result_merge_type(enum_field_types);
> >>    virtual bool eq(Field *field)
> >>    {
> >>      return (ptr == field->ptr && null_ptr == field->null_ptr &&
> >> diff --git a/sql/item.cc b/sql/item.cc
> >> index c97f41f..c9b6155 100644
> >> --- a/sql/item.cc
> >> +++ b/sql/item.cc
> >> @@ -9657,20 +9657,8 @@ Item_type_holder::Item_type_holder(THD *thd, Item *item)
> >>    maybe_null= item->maybe_null;
> >>    collation.set(item->collation);
> >>    get_full_info(item);
> >> -  /**
> >> -    Field::result_merge_type(real_field_type()) should be equal to
> >> -    result_type(), with one exception when "this" is a Item_field for
> >> -    a BIT field:
> >> -    - Field_bit::result_type() returns INT_RESULT, so does its Item_field.
> >> -    - Field::result_merge_type(MYSQL_TYPE_BIT) returns STRING_RESULT.
> >> -    Perhaps we need a new method in Type_handler to cover these type
> >> -    merging rules for UNION.
> >> -  */
> >> -  DBUG_ASSERT(real_field_type() == MYSQL_TYPE_BIT ||
> >> -              Item_type_holder::result_type()  ==
> >> -              Field::result_merge_type(Item_type_holder::real_field_type()));
> >>    /* fix variable decimals which always is NOT_FIXED_DEC */
> >> -  if (Field::result_merge_type(real_field_type()) == INT_RESULT)
> >>
> > Alright so this seems to be fixed here, looking at Type_handler_bit,
> > inheriting from Type_handler_int_result. Do we test this somewhere though?
> > I couldn't find it in the test case, perhaps you can point it out to me.
>
>
> In theory, decimals is always 0 if result_type() is INT_RESULT.
> But I'm not fully sure that in reality none of the Items return
> non-zero decimals in combination with INT_RESULT.
> There are so many hacks in the code, so this combination can
> be used somewhere.
>
> I just tried to comment out these two lines:
> > -  if (Item_type_holder::result_type() == INT_RESULT)
> > -    decimals= 0;
> > +//  if (Item_type_holder::result_type() == INT_RESULT)
> > +//    decimals= 0;
> both in the constructor and in the method join_types()
> and ran the tests. Nothing failed.
>
> So perhaps these two lines can just be replaced with:
>
> DBUG_ASSERT(decimals == 0 ||
>             Item_type_holder::result_type() != INT_RESULT);
>
> push, and see.
>
> Any suggestions?
>

This is indeed ugly. I've tried to look into the calling places, then to
backtrack from there, but I gave up after about 30 minutes of following a
never-ending branching of possible items. Please add the assert.
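
To make the difference between the two styles concrete, here is a minimal
sketch. It assumes nothing about the real Item_type_holder: the struct, the
enum and all the *_sketch names are hypothetical stand-ins, and plain
assert() plays the role of DBUG_ASSERT.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real Item_result and Item_type_holder
// live in the server sources and carry far more state.
enum Item_result_sketch { STRING_RESULT_SKETCH, INT_RESULT_SKETCH };

struct Type_holder_sketch
{
  Item_result_sketch result;
  unsigned decimals;

  // Current style: silently repair the value as a "safety measure".
  void fix_decimals_defensively()
  {
    if (result == INT_RESULT_SKETCH)
      decimals= 0;
  }

  // The invariant from the proposed assert, as a testable predicate.
  bool decimals_invariant_holds() const
  {
    return decimals == 0 || result != INT_RESULT_SKETCH;
  }

  // Proposed style: state the invariant and let debug builds catch
  // any Item that violates it (DBUG_ASSERT in the server).
  void check_decimals() const
  {
    assert(decimals_invariant_holds());
  }
};
```

With the defensive reset a violation is hidden; with the assert it shows up
in debug builds, which is the "push, and see" part.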


>
> > Let's discuss cleaning this up later. To me it feels like this Item does
> > not really belong in the Item class and should be factored out. Probably
> > a whole project on its own :)
>
> I made attempts to move Item_type_holder out of the Item hierarchy in
> the past, but failed. It caused too much refactoring, because
> Item_type_holder is used with List<Item> all around the UNION
> and table creation code.
>
> Perhaps we should make another attempt eventually.
> I suggest postponing this at least until after the main Type_handler related
> changes are done.
>
>
I agree.


> >>
> >>        item_decimals= 0;
> >>      decimals= MY_MAX(decimals, item_decimals);
> >>    }
> >> diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc
> >> index 98b179b..e5e366e 100644
> >> --- a/sql/item_cmpfunc.cc
> >> +++ b/sql/item_cmpfunc.cc
> >> @@ -180,32 +179,40 @@ static int agg_cmp_type(Item_result *type, Item **items, uint nitems)
> >>    @return aggregated field type.
> >>  */
> >>
> >> -enum_field_types agg_field_type(Item **items, uint nitems,
> >> -                                bool treat_bit_as_number)
> >>
> > Why is this function in item_cmpfunc.cc and not in sql_type.cc?
>
> Moving this to sql_type.cc can cause additional merge conflicts.
>
> But perhaps it's Ok to move it, as it gets changed significantly anyway
> (not logically, but textually).
>
>
I vote for moving it. I hate it when the implementation is all over the
place.


>
> >> +bool
> >> +Type_handler_hybrid_field_type::aggregate_for_result(const char *funcname,
> >> +                                                     Item **items, uint nitems,
> >> +                                                     bool treat_bit_as_number)
> >>  {
> > I would move uint i to be local to the for loop. This is a C-style loop.
> > Is there a compiler that doesn't support this in one of our builders?
>
> Done.
>
> > Also, how about size_t instead of uint? (Probably not necessary but
> > wlad made a point of preferring to use that for iterators and such).
>
> size_t is fine. But this should be done together with changing
> Item_args::arg_count, which is passed to this method.
> I suggest not doing this change as part of this patch.
>
>
I agree.


> <cut>
>
> >> diff --git a/sql/sql_type.cc b/sql/sql_type.cc
> >> index 8746595..397b5cf 100644
> >> --- a/sql/sql_type.cc
> >> +++ b/sql/sql_type.cc
> >> @@ -54,6 +51,41 @@ static Type_handler_set         type_handler_set;
> >>  Type_handler_null        type_handler_null;
> >>  Type_handler_row         type_handler_row;
> >>  Type_handler_varchar     type_handler_varchar;
> >> +Type_handler_newdecimal  type_handler_newdecimal;
> >> +Type_handler_longlong    type_handler_longlong;
> >> +Type_handler_bit         type_handler_bit;
> >> +
> >> +
> > I'm sure there's a better way to write this so that it gets initialized at
> > compile time instead of at runtime (before main).
> > Perhaps we can define the Type_aggregator differently. I need to look this
> > up. For now it will work.
> > The standard offers no guarantees regarding the order, but it shouldn't matter
> > for us as the address shouldn't change for global objects during
> > initialization.
>
> The part from Static_data_initializer should eventually be gone.
> We need to extend the Type_handler API first, so a data type handler
> (plugin) can provide an array of its aggregation rules.
> So in the future the server will do the calls like
> type_aggregator_for_result.add() when loading a new data
> type plugin, either on startup, or during INSTALL PLUGIN.
>
> For now, type handlers reside statically in the server anyway,
> so this should work fine, and I think it's 100% safe.
> As you noted, the addresses should not change even if
> they get initialized in some non-reliable order.
>
>
> I chose this approach because I didn't want to expose this code to
> mysqld.cc now. Exposing it would be too early at this point.
>
>
I agree.
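
For the record, the address-stability argument can be sketched as below.
This is an illustration under assumptions, not the code from the patch:
Handler_sketch, Registry_sketch and Static_initializer_sketch are
hypothetical names, and everything sits in one translation unit so that the
initialization order is known.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical handler type with a non-trivial constructor, so its
// objects are dynamically initialized before main().
struct Handler_sketch
{
  const char *m_name;
  explicit Handler_sketch(const char *name) : m_name(name) {}
};

// Hypothetical registry. It has no constructor, so it is zero-initialized
// before any dynamic initialization runs and is always safe to add() to.
struct Registry_sketch
{
  const Handler_sketch *m_items[8];
  std::size_t m_count;
  bool add(const Handler_sketch *h)
  {
    if (m_count >= 8)
      return true;               // registry is full
    m_items[m_count++]= h;       // only the address is stored here
    return false;
  }
};

extern Handler_sketch handler_b; // defined (and constructed) further down

Registry_sketch registry;        // static storage: zero-initialized

struct Static_initializer_sketch
{
  Static_initializer_sketch();
};

Handler_sketch handler_a("a");
Static_initializer_sketch initializer; // runs after handler_a, before handler_b
Handler_sketch handler_b("b");

Static_initializer_sketch::Static_initializer_sketch()
{
  registry.add(&handler_a);
  // handler_b is not constructed yet at this point, but its storage
  // already exists, so taking (not dereferencing) its address is fine.
  registry.add(&handler_b);
}

// By the time this can be called from main(), both handlers are constructed.
void check_registry()
{
  assert(registry.m_count == 2);
  assert(registry.m_items[1] == &handler_b);
}
```

The registry only records addresses during static initialization and reads
through them later, which is why the construction order of the handlers does
not matter in this sketch.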


> > I don't like this class too much. One can easily break it by changing either
> > LEX_CSTRING::str or LEX_CSTRING::length without changing the other one.
> > I suggest we make the inheritance private so that the only way to access the
> > members is through the methods available.
>
> Done: I changed it to derive privately.
>
> > A suggestion I have is to create a generic "String" class that has this same
> > behaviour, without calling it Name. Afterwards typedefing it to Name.
>
> Earlier I proposed to add similar classes to struct.h,
> something like this:
>
> struct Lex_cstring_st: public LEX_CSTRING; // without initialization
>
> class Lex_cstring: protected Lex_cstring_st; // with initialization
>
> and to move all global functions operating
> on LEX_CSTRING as methods into these new struct and class.
>
>
> But Monty disliked it. Monty thinks that having more globally visible
> classes makes the code harder to read.
> I think it makes the code easier to read, to use, and to reuse.
> We never could agree :)
>
> So if we're adding Lex_cstring_st at this point we should be ready:
> - either to convince Monty that this is good
> - or to revert our changes in struct.h and move the class locally
>   to sql_type.h again.
>
> Let's go with Lex_cstring_st in struct.h ?
>
> :)
>

I say we keep it as is for the moment. The patch changes enough as is. It
can happen in a follow up, once we're done.
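
As a reminder of what the private derivation buys us, here is a minimal
sketch. Lex_cstring_sketch is an assumed pointer-plus-length stand-in for
LEX_CSTRING and Name_sketch is a hypothetical name; the real class in
sql_type.h may differ in details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for the server's LEX_CSTRING (assumed layout: pointer + length).
struct Lex_cstring_sketch
{
  const char *str;
  std::size_t length;
};

// Private inheritance: callers cannot reach str or length directly,
// so the two members can only be set together, in the constructor.
class Name_sketch: private Lex_cstring_sketch
{
public:
  Name_sketch(const char *str_arg, std::size_t length_arg)
  {
    // Qualified names are needed because length() below hides the member.
    Lex_cstring_sketch::str= str_arg;
    Lex_cstring_sketch::length= length_arg;
  }
  const char *ptr() const { return Lex_cstring_sketch::str; }
  std::size_t length() const { return Lex_cstring_sketch::length; }
};

// Small self-check of the accessors.
void check_name_sketch()
{
  Name_sketch n("date", 4);
  assert(n.length() == 4);
  assert(std::strcmp(n.ptr(), "date") == 0);
}
```

With this shape, something like `n.length= 0;` from outside the class no
longer compiles, which is the breakage the review comment was worried about.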


> >> +  static const
> >> +  Type_handler *aggregate_for_result_traditional(const Type_handler *h1,
> >> +                                                 const Type_handler *h2);
> >> +
> > This function can return a const reference from a const static object
> > within the class's namespace. Why create an object every time this is called?
> > The compiler might optimize it, but why risk it?
> > This goes for all the implementations.
>
> In my opinion it's 100% safe. There is no risk here.
> It should be as safe as doing "return 10" from an "int" function.
>
> It just reserves 16 bytes for a LEX_CSTRING on the stack and populates
> it, and then the caller uses this populated LEX_CSTRING to access
> its members through the methods ptr() or length().
> Some compilers should probably even be able to use registers instead
> of the stack for this. But my intent was not to rely on using registers.
> I just found this style to be the shortest possible and the most readable.


> As for performance, it requires the same amount of resources as, for
> example, passing a LEX_CSTRING by value to some function, or just
> creating a local LEX_CSTRING/LEX_STRING variable.
>
> Here are some examples in the existing code:
>
> sp_sql->append(C_STRING_WITH_LEN("CREATE "));
> sp_sql->append(C_STRING_WITH_LEN("PROCEDURE "));
> LEX_STRING pw= { C_STRING_WITH_LEN("password") };
>
> Notice, we don't create static LEX_STRING or LEX_CSTRING for all
> possible strings we need in the server.
> The proposed code should be exactly as cheap as these examples.
>
> Another approach would be to:
> - have a static variable for every Type_handler name
> - return this variable from the method name().
>
> This could give slight benefits when we need ptr() without length(),
> or the other way around. And the caller in item.cc actually uses
> name().ptr() without name().length().
>
> But as name() will be used mostly for errors and for DBUG_PRINT,
> I thought it would be more useful to keep the number of lines down.
> And it's easier to read this way.
> You can see the name right inside the class definition,
> you don't have to go to sql_class.cc to see it.
>


The solution I propose is this:
// in sql_type.h
class Type_handler {
public:
    virtual const Name& name() const = 0;
};

class Type_handler_xxx: public Type_handler {
    static const Name xxx_name;   // xxx
public:
    const Name& name() const;
};

// In sql_type.cc
const Name Type_handler_xxx::xxx_name(C_STRING_WITH_LEN("xxx"));
const Name& Type_handler_xxx::name() const
{
   return xxx_name;
}

Performance wise this is superior. Have a look at assembly for the
following function calls:

// these are defined in say sql_type.cc;
Type_handler *give_random_type_handler()
{
  return new Type_handler_date();
}

bool use_random_name(const Name& name)
{
  printf("%s\n", name.ptr());
  return true;
}

Using them in sql_<something_else>.cc

////// Assembly generated by proposed patch: (GCC 6.2.0)
  Type_handler *hnd = give_random_type_handler();
   1cc5c:       e8 00 00 00 00          call   1cc61
  const Name& n = hnd->name();
   1cc61:       48 8b 10                mov    rdx,QWORD PTR [rax]
   1cc64:       48 89 c7                mov    rdi,rax
   1cc67:       ff 12                   call   QWORD PTR [rdx]
   1cc69:       48 8d bd 00 cf ff ff    lea    rdi,[rbp-0x3100]
   1cc70:       48 89 85 00 cf ff ff    mov    QWORD PTR [rbp-0x3100],rax
   1cc77:       48 89 95 08 cf ff ff    mov    QWORD PTR [rbp-0x30f8],rdx

  use_random_name(n);
   1cc7e:       e8 00 00 00 00          call   1cc83

///// Assembly generated by my suggestion: (GCC 6.2.0)
  Type_handler *hnd = give_random_type_handler();
   1cc5c:       e8 00 00 00 00          call   1cc61
  const Name& n = hnd->name();
   1cc61:       48 8b 10                mov    rdx,QWORD PTR [rax]
   1cc64:       48 89 c7                mov    rdi,rax
   1cc67:       ff 12                   call   QWORD PTR [rdx]
  use_random_name(n);
   1cc69:       48 89 c7                mov    rdi,rax
   1cc6c:       e8 00 00 00 00          call   1cc71

The difference is that the version with the static variable only uses
registers, while the version from the patch has to write the returned Name
to the stack (memory, or cache I guess).

To me, readability seems the same for both implementations. I'm not going
to insist on this too much, but I believe it's best practice to have this
sort of code pass a const reference, instead of always creating a
new object on the stack. We can discuss this in more detail if you'd like.
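
If it helps the discussion, here is a self-contained sketch of the
static-variable approach that compiles outside the server. All the *_sketch
names are hypothetical, Static_name_sketch is a simplified stand-in for
Name, and the literal length is spelled out instead of using
C_STRING_WITH_LEN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for Name: an immutable pointer + length pair.
class Static_name_sketch
{
  const char *m_str;
  std::size_t m_length;
public:
  Static_name_sketch(const char *str, std::size_t length)
   :m_str(str), m_length(length) {}
  const char *ptr() const { return m_str; }
  std::size_t length() const { return m_length; }
};

class Type_handler_sketch
{
public:
  virtual ~Type_handler_sketch() {}
  // Returns a reference to a long-lived object; nothing is built per call.
  virtual const Static_name_sketch &name() const = 0;
};

class Type_handler_date_sketch: public Type_handler_sketch
{
  static const Static_name_sketch m_name;
public:
  const Static_name_sketch &name() const { return m_name; }
};

// One definition per handler, in the .cc file.
const Static_name_sketch Type_handler_date_sketch::m_name("date", 4);

// Every call hands back the same object.
void check_name_identity()
{
  Type_handler_date_sketch handler;
  const Type_handler_sketch &base= handler;
  assert(&base.name() == &base.name());
}
```

The cost is one extra out-of-class definition per handler; whether that is
worth the saved stack writes is the judgment call discussed above.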


So to sum up, please add the assert you proposed and remove setting
decimals to 0 as a "safety measure". I would like to make use of a static
variable instead of how we now construct "Name" objects for every name()
call.

Ok to push otherwise.

Regards,
Vicențiu
References

Fwd: mixing of user-defined data types with other data types
From: Vicențiu Ciorbaru, 2016-12-12
Re: Fwd: mixing of user-defined data types with other data types
From: Alexander Barkov, 2016-12-13