← Back to team overview

maria-developers team mailing list archive

Re: Please review MDEV-8092 Change Create_field::field from "const char *" to LEX_CSTRING

 

Hi Vicențiu,

On 01/08/2016 02:27 PM, Vicențiu Ciorbaru wrote:
Hi Alexander,

Thanks for the explanation. I agree with your points regarding
std::string. Although it is possible to control allocation using an
allocator, it does indeed make sense that std::string is not easily
applicable in this case.

I am personally for the introduction of an extra string class if it
serves our purpose better. However we should strive to make the
interfaces for our classes more uniform, as it quickly becomes confusing
which operations are supported and which are not. The classes at hand
seem reasonable in this regard. I think I would override the equality
operator also.

It should be fine to override the equality operator.

But on the other hand, it's much easier to search for a code
using "grep eq_name". So I'm not sure what is better.



Vicentiu


On Fri, 8 Jan 2016 at 09:36, Alexander Barkov <bar@xxxxxxxxxxx
<mailto:bar@xxxxxxxxxxx>> wrote:

    Hi Vicențiu,


    On 01/08/2016 02:48 AM, Vicențiu Ciorbaru wrote:
     > Hi Alexander, Sergei,
     >
     > On Fri, 8 Jan 2016 at 00:10 Alexander Barkov <bar@xxxxxxxxxxx
    <mailto:bar@xxxxxxxxxxx>
     > <mailto:bar@xxxxxxxxxxx <mailto:bar@xxxxxxxxxxx>>> wrote:
     >
     >     Hi Sergei,
     >
     >
     >     On 01/07/2016 08:53 PM, Sergei Golubchik wrote:
     >      > Hi, Alexander!
     >      >
     >      > On Nov 27, Alexander Barkov wrote:
     >      >>     Hi Serg,
     >      >>
     >      >> Please review a patch for MDEV-8092.
     >      >
     >      > I didn't review a lot of it, sorry. Because my first two
    comments
     >     - if
     >      > we agree on them - might cause quite a few changes
     >     (search/replace kind
     >      > of changes, but still). So I'd better look at the whole
    patch after
     >      > that.
     >
     >     Sure. Thanks.
     >
     >     Please see below.
     >
     >      >
     >      >> diff --git a/sql/field.h b/sql/field.h
     >      >> index 0591914..a40e307 100644
     >      >> --- a/sql/field.h
     >      >> +++ b/sql/field.h
     >      >> @@ -537,6 +537,115 @@ inline bool
     >     is_temporal_type_with_time(enum_field_types type)
     >      >>   }
     >      >>
     >      >>
     >      >> +/**
     >      >> +  @class Name
     >      >> +  An arbitrary SQL name, consisting of a NULL-terminated
    string
     >      >> +  and its length.
     >      >> +  Later it can be reused for table and schema names.
     >      >> +*/
     >      >> +class Name
     >      >> +{
     >      >> +  const char *m_str;
     >      >> +  uint m_length;
     >      >
     >      > Sorry, we have too many "string" classes already.
     >      > Either use LEX_STRING or create a Lex_string
     >      > which is inherited from LEX_STRING, binary compatible with it
     >      > and only adds C++ methods (and can be casted back and forth
     >     automatically).
     >      > So that one could use Lex_string where a LEX_STRING is
    expected
     >     and vice
     >      > versa
     >
     >     Sorry for a long reply.
     >
     >     This is a very convenient moment of time when
     >     we can improve a few things. I disagree to go the easier way
     >     with LEX_STRING.
     >
     >
     >     LEX_STRING is a very unfortunately designed thing for the server
     >     side code.
     >
     >     It was a kind of Ok on 32 bit platforms.
     >     But now on 64bit platforms it's just a waste of memory.
     >     Length can never be large enough to use the entire "size_t".
     >
     >     Btw, it can even never be long enough to use "uint". It
    should be enough
     >     to use uint16 for length and use the available remainder (6
    bytes) to
     >     store extra useful things, like string metadata.
     >     It will help to avoid some operations, like character set
    conversion
     >     in the client-server protocol, and I planned to add some
    optimization
     >     in the future.
     >
     >
     >     I propose to do the other way around:
     >     - gradually get rid of using LEX_STRING and LEX_CSTRING for names
     >     in the server side and use more suitable Name-alike classes
    instead.
     >     - keep LEX_XXX only for compatibility with the existing pure
    C API,
     >         where it already presents now.
     >
     >
     >     Please note, in the current patch version the Name class has
     >     constructors that can accept LEX_STRING and LEX_CSTRING.
     >     This means that you can pass LEX_STRING and LEX_CSTRING to any
     >     function or method that is designed to accept Name,
     >     and the patch already uses this. So this switch will go very
    smoothly.
     >
     >
     >     As for "too many string classes", I have heard this a few
    times already,
     >     both in MariaDB and earlier in Oracle :)
     >     I don't agree with this statement.
     >
     >
     > Hi Alexander, Sergei!
     >
     > Since this discussion might be somewhat similar to a difficulty I'm
     > facing (see recent email on dev list "Use of std::string"), I figured
     > I'd join in.
     >
     > In that email I suggested making use of std::string instead of
    regular
     > raw pointers and length fields. From looking at the patch (mostly
    just
     > the name classes, did not investigate everything), the
    std::string class
     > would work just fine for the Name class and be very clear cut parent
     > class for the Column_name. Implementing an equality operator perhaps,
     > instead of eq_name function would improve clarity also.
     >
     > The advantages of using the standard library would be that compilers
     > provide specific optimizations for them. At the same time, we
    would not
     > be adding an extra string API.

    I think std::string would be a non-optimal storage for column,
    table, database names.


    There are at least two serious reasons not to use std::string:

    1. std::string is a true dynamic string. It supports reallocation.
    Internally it consists of a structure which is effectively similar to:

    {
        size_t m_length;
        size_t m_alloced_length;
        int m_reference_count;
        char *m_ptr;
    }

    Note, sizeof(std::string) reports only 8 bytes,
    but this is just because of implementation tricks.
    The real minimum size is in fact 32 bytes.

    Column names are not dynamic string. Once they're created,
    they never change. So they don't need reallocation.
    They only need 10 bytes for a structure like this:

    {
        char *m_ptr;      // 8 bytes
        uint16 m_length;  // 2 bytes
        // we can use the remaining unaligned 6 bytes for some other
    purposes
        // e.g. for string metadata, to avoid character set conversion in
    many cases
    }

    10 vs 32 is better.


    2. Another, and the most important thing, is that names are created on
    MEM_ROOT, not directly on heap. This is true both the structure itself,
    and the char array buffer.

    std::string must be freed via destructor or delete, otherwise you'll get
    memory leaks.

    Names are automatically freed whenever the containing MEM_ROOT is freed.
    If we switch to std::string, we'll have to maintain the list of all
    created names, to free them properly at the end. This will make the
    things more complicated.


     >
     > Regards,
     > Vicentiu



References