← Back to team overview

maria-developers team mailing list archive

Re: Please review MDEV-8092 Change Create_field::field from "const char *" to LEX_CSTRING

 

Hi Vicențiu,


On 01/08/2016 02:48 AM, Vicențiu Ciorbaru wrote:
Hi Alexander, Sergei,

On Fri, 8 Jan 2016 at 00:10 Alexander Barkov <bar@xxxxxxxxxxx
<mailto:bar@xxxxxxxxxxx>> wrote:

    Hi Sergei,


    On 01/07/2016 08:53 PM, Sergei Golubchik wrote:
     > Hi, Alexander!
     >
     > On Nov 27, Alexander Barkov wrote:
     >>     Hi Serg,
     >>
     >> Please review a patch for MDEV-8092.
     >
     > I didn't review a lot of it, sorry. Because my first two comments
    - if
     > we agree on them - might cause quite a few changes
    (search/replace kind
     > of changes, but still). So I'd better look at the whole patch after
     > that.

    Sure. Thanks.

    Please see below.

     >
     >> diff --git a/sql/field.h b/sql/field.h
     >> index 0591914..a40e307 100644
     >> --- a/sql/field.h
     >> +++ b/sql/field.h
     >> @@ -537,6 +537,115 @@ inline bool
    is_temporal_type_with_time(enum_field_types type)
     >>   }
     >>
     >>
     >> +/**
     >> +  @class Name
     >> +  An arbitrary SQL name, consisting of a NULL-terminated string
     >> +  and its length.
     >> +  Later it can be reused for table and schema names.
     >> +*/
     >> +class Name
     >> +{
     >> +  const char *m_str;
     >> +  uint m_length;
     >
     > Sorry, we have too many "string" classes already.
     > Either use LEX_STRING or create a Lex_string
     > which is inherited from LEX_STRING, binary compatible with it
     > and only adds C++ methods (and can be casted back and forth
    automatically).
     > So that one could use Lex_string where a LEX_STRING is expected
    and vice
     > versa

    Sorry for a long reply.

    This is a very convenient moment of time when
    we can improve a few things. I disagree to go the easier way
    with LEX_STRING.


    LEX_STRING is a very unfortunately designed thing for the server
    side code.

    It was a kind of Ok on 32 bit platforms.
    But now on 64bit platforms it's just a waste of memory.
    Length can never be large enough to use the entire "size_t".

    Btw, it can even never be long enough to use "uint". It should be enough
    to use uint16 for length and use the available remainder (6 bytes) to
    store extra useful things, like string metadata.
    It will help to avoid some operations, like character set conversion
    in the client-server protocol, and I planned to add some optimization
    in the future.


    I propose to do the other way around:
    - gradually get rid of using LEX_STRING and LEX_CSTRING for names
    in the server side and use more suitable Name-alike classes instead.
    - keep LEX_XXX only for compatibility with the existing pure C API,
        where it already presents now.


    Please note, in the current patch version the Name class has
    constructors that can accept LEX_STRING and LEX_CSTRING.
    This means that you can pass LEX_STRING and LEX_CSTRING to any
    function or method that is designed to accept Name,
    and the patch already uses this. So this switch will go very smoothly.


    As for "too many string classes", I have heard this a few times already,
    both in MariaDB and earlier in Oracle :)
    I don't agree with this statement.


Hi Alexander, Sergei!

Since this discussion might be somewhat similar to a difficulty I'm
facing (see recent email on dev list "Use of std::string"), I figured
I'd join in.

In that email I suggested making use of std::string instead of regular
raw pointers and length fields. From looking at the patch (mostly just
the name classes, did not investigate everything), the std::string class
would work just fine for the Name class and be very clear cut parent
class for the Column_name. Implementing an equality operator perhaps,
instead of eq_name function would improve clarity also.

The advantages of using the standard library would be that compilers
provide specific optimizations for them. At the same time, we would not
be adding an extra string API.

I think std::string would be a non-optimal storage for column,
table, database names.


There are at least two serious reasons not to use std::string:

1. std::string is a true dynamic string. It supports reallocation.
Internally it consists of a structure which is effectively similar to:

{
  size_t m_length;
  size_t m_alloced_length;
  int m_reference_count;
  char *m_ptr;
}

Note, sizeof(std::string) reports only 8 bytes,
but this is just because of implementation tricks.
The real minimum size is in fact 32 bytes.

Column names are not dynamic string. Once they're created,
they never change. So they don't need reallocation.
They only need 10 bytes for a structure like this:

{
  char *m_ptr;      // 8 bytes
  uint16 m_length;  // 2 bytes
  // we can use the remaining unaligned 6 bytes for some other purposes
// e.g. for string metadata, to avoid character set conversion in many cases
}

10 vs 32 is better.


2. Another, and the most important thing, is that names are created on
MEM_ROOT, not directly on heap. This is true both the structure itself,
and the char array buffer.

std::string must be freed via destructor or delete, otherwise you'll get memory leaks.

Names are automatically freed whenever the containing MEM_ROOT is freed.
If we switch to std::string, we'll have to maintain the list of all
created names, to free them properly at the end. This will make the
things more complicated.



Regards,
Vicentiu


Follow ups

References