← Back to team overview

maria-developers team mailing list archive

Re: Please review MDEV-8092 Change Create_field::field from "const char *" to LEX_CSTRING

 

Hi Alexander,

Thanks for the explanation. I agree with your points regarding std::string.
Although it is possible to control allocation using an allocator, it does
indeed make sense that std::string is not easily applicable in this case.

I am personally for the introduction of an extra string class if it serves
our purpose better. However we should strive to make the interfaces for our
classes more uniform, as it quickly becomes confusing which operations are
supported and which are not. The classes at hand seem reasonable in this
regard. I think I would override the equality operator also.

Vicentiu


On Fri, 8 Jan 2016 at 09:36, Alexander Barkov <bar@xxxxxxxxxxx> wrote:

> Hi Vicențiu,
>
>
> On 01/08/2016 02:48 AM, Vicențiu Ciorbaru wrote:
> > Hi Alexander, Sergei,
> >
> > On Fri, 8 Jan 2016 at 00:10 Alexander Barkov <bar@xxxxxxxxxxx
> > <mailto:bar@xxxxxxxxxxx>> wrote:
> >
> >     Hi Sergei,
> >
> >
> >     On 01/07/2016 08:53 PM, Sergei Golubchik wrote:
> >      > Hi, Alexander!
> >      >
> >      > On Nov 27, Alexander Barkov wrote:
> >      >>     Hi Serg,
> >      >>
> >      >> Please review a patch for MDEV-8092.
> >      >
> >      > I didn't review a lot of it, sorry. Because my first two comments
> >     - if
> >      > we agree on them - might cause quite a few changes
> >     (search/replace kind
> >      > of changes, but still). So I'd better look at the whole patch
> after
> >      > that.
> >
> >     Sure. Thanks.
> >
> >     Please see below.
> >
> >      >
> >      >> diff --git a/sql/field.h b/sql/field.h
> >      >> index 0591914..a40e307 100644
> >      >> --- a/sql/field.h
> >      >> +++ b/sql/field.h
> >      >> @@ -537,6 +537,115 @@ inline bool
> >     is_temporal_type_with_time(enum_field_types type)
> >      >>   }
> >      >>
> >      >>
> >      >> +/**
> >      >> +  @class Name
> >      >> +  An arbitrary SQL name, consisting of a NULL-terminated string
> >      >> +  and its length.
> >      >> +  Later it can be reused for table and schema names.
> >      >> +*/
> >      >> +class Name
> >      >> +{
> >      >> +  const char *m_str;
> >      >> +  uint m_length;
> >      >
> >      > Sorry, we have too many "string" classes already.
> >      > Either use LEX_STRING or create a Lex_string
> >      > which is inherited from LEX_STRING, binary compatible with it
> >      > and only adds C++ methods (and can be casted back and forth
> >     automatically).
> >      > So that one could use Lex_string where a LEX_STRING is expected
> >     and vice
> >      > versa
> >
> >     Sorry for a long reply.
> >
> >     This is a very convenient moment of time when
> >     we can improve a few things. I disagree to go the easier way
> >     with LEX_STRING.
> >
> >
> >     LEX_STRING is a very unfortunately designed thing for the server
> >     side code.
> >
> >     It was a kind of Ok on 32 bit platforms.
> >     But now on 64bit platforms it's just a waste of memory.
> >     Length can never be large enough to use the entire "size_t".
> >
> >     Btw, it can even never be long enough to use "uint". It should be
> enough
> >     to use uint16 for length and use the available remainder (6 bytes) to
> >     store extra useful things, like string metadata.
> >     It will help to avoid some operations, like character set conversion
> >     in the client-server protocol, and I planned to add some optimization
> >     in the future.
> >
> >
> >     I propose to do the other way around:
> >     - gradually get rid of using LEX_STRING and LEX_CSTRING for names
> >     in the server side and use more suitable Name-alike classes instead.
> >     - keep LEX_XXX only for compatibility with the existing pure C API,
> >         where it already presents now.
> >
> >
> >     Please note, in the current patch version the Name class has
> >     constructors that can accept LEX_STRING and LEX_CSTRING.
> >     This means that you can pass LEX_STRING and LEX_CSTRING to any
> >     function or method that is designed to accept Name,
> >     and the patch already uses this. So this switch will go very
> smoothly.
> >
> >
> >     As for "too many string classes", I have heard this a few times
> already,
> >     both in MariaDB and earlier in Oracle :)
> >     I don't agree with this statement.
> >
> >
> > Hi Alexander, Sergei!
> >
> > Since this discussion might be somewhat similar to a difficulty I'm
> > facing (see recent email on dev list "Use of std::string"), I figured
> > I'd join in.
> >
> > In that email I suggested making use of std::string instead of regular
> > raw pointers and length fields. From looking at the patch (mostly just
> > the name classes, did not investigate everything), the std::string class
> > would work just fine for the Name class and be very clear cut parent
> > class for the Column_name. Implementing an equality operator perhaps,
> > instead of eq_name function would improve clarity also.
> >
> > The advantages of using the standard library would be that compilers
> > provide specific optimizations for them. At the same time, we would not
> > be adding an extra string API.
>
> I think std::string would be a non-optimal storage for column,
> table, database names.
>
>
> There are at least two serious reasons not to use std::string:
>
> 1. std::string is a true dynamic string. It supports reallocation.
> Internally it consists of a structure which is effectively similar to:
>
> {
>    size_t m_length;
>    size_t m_alloced_length;
>    int m_reference_count;
>    char *m_ptr;
> }
>
> Note, sizeof(std::string) reports only 8 bytes,
> but this is just because of implementation tricks.
> The real minimum size is in fact 32 bytes.
>
> Column names are not dynamic string. Once they're created,
> they never change. So they don't need reallocation.
> They only need 10 bytes for a structure like this:
>
> {
>    char *m_ptr;      // 8 bytes
>    uint16 m_length;  // 2 bytes
>    // we can use the remaining unaligned 6 bytes for some other purposes
>    // e.g. for string metadata, to avoid character set conversion in
> many cases
> }
>
> 10 vs 32 is better.
>
>
> 2. Another, and the most important thing, is that names are created on
> MEM_ROOT, not directly on heap. This is true both the structure itself,
> and the char array buffer.
>
> std::string must be freed via destructor or delete, otherwise you'll get
> memory leaks.
>
> Names are automatically freed whenever the containing MEM_ROOT is freed.
> If we switch to std::string, we'll have to maintain the list of all
> created names, to free them properly at the end. This will make the
> things more complicated.
>
>
> >
> > Regards,
> > Vicentiu
>

Follow ups

References