← Back to team overview

maria-developers team mailing list archive

Re: Please review MDEV-8092 Change Create_field::field from "const char *" to LEX_CSTRING

 

Hi, Alexander!

On Jan 09, Alexander Barkov wrote:
> >> I disagree to go the easier way with LEX_STRING.  LEX_STRING is a
> >> very unfortunately designed thing for the server side code.
> >>
> >> It was a kind of Ok on 32 bit platforms.
> >> But now on 64bit platforms it's just a waste of memory.
> >> Length can never be large enough to use the entire "size_t".
> >
> > I wouldn't sweat over it. No matter what you do with LEX_STRING it won't
> > decrease server memory requirements in any noticeable way.
> 
> Monty proposes to collect some of the bool flags in Items into
> bit mask, to save a few bytes, because he thinks it's important.
> You propose to lose 4 or 6 bytes, because it does not affect
> server memory in any noticeable ways and thus is not important.
> 
> I agree with Monty here.
> Saving a few bytes is at least not harmful and should affect
> performance positively. A little bit, but still positively.
> 
> I am not afraid of having more string classes.

It is harmful in a sense that it adds more concepts to the source code
and a developer needs to keep them in mind when writing new code.
These new string classes make code base more complex. Sometimes the
added complexity is unavoidable. Sometimes it's unnecessary.

Anyway, you say that this change improves the performance? I don't think
it does. May even make the performance worse - access to unaligned
memory can be slower (presuming you've packed your data, otherwise the
compiler can align them, which would destroy all your "space savings").
So, the final proof is to benchmark it and see.

> When column metadata is sent to the client side, there is a character
> conversion from utf8 to character_set_client.
> If we remember the column name metadata inside the Name class,
> we can save on conversion and use faster non-converting methods instead.

What kind of metadata? The repertoire?

> > I've checked, LEX_STRING is only used in the plugin API that is
> > versioned and we can safely change LEX_STRING in 10.2 to use, say,
> > uint16, if needed.
> I propose not to change LEX_STRING.

Fine :)

> >> I think that:
> >>
> >> 1. Trying to reuse classes that are not fully suitable for a task is
> >> more harmful than having more classes, because this is simply
> >> not efficient.
> >>
> >> 2. Nobody objects to have many Item or Field classes that suite
> >> a particular functionality better. But some people are very afraid
> >> of having more strings. Why?
> >> Various strings that we have in the server code *are* different enough.
> >> Different string do deserve different classes.
> >
> > Precisely. So, to know what different strings are there in the server, I
> > want to list them all. Then we'll see whether how string classes fit in.
> >
> > 1. char* - zero terminated string. I don't see why one should ever use
> >     it (in the server).
> > 2. uchar* - byte array, not zero terminated, no length. Even worse.
> >     Never ever.
> > 3. LEX_STRING - a string and a length, no charset. Used in C with a
> >     preallocated buffer.
> > 4. LEX_CSTRING - constant string, no charset.
> > 5. LEX_CUSTRING - constant byte array? weird, but possible.
> > 6. DYNAMIC_STRING - C string allocated on the heap. Can be appended to,
> >     reallocates automatically. No charset.
> > 7. CSET_STRING - C++ LEX_STRING with a charset.
> > 8. String - swiss army knife C++ string. Uses a preallocated buffer or
> >     heap. Grows, shrinks, convert charsets, can copy, move, initialize
> >     from anything, print hex, make coffee, escape, append anything, and
> >     so on.
> > 9. StringBuffer - helper template to initialize a String from a local
> >     fixed buffer (very common usage pattern).
> >
> > And I'm pretty sure I've missed a few.
> > Two questions:
> >
> > - what did I miss?
> 
> 10. ConnectSE strings, which re-implement String with a different 
> allocator, but otherwise repeat String functionality.

Ok, so you're basically saying that String is not generic enough and
sometimes one needs a String that uses a different allocator.
I agree with that.

> 11. There are also places where we have just pairs
> 
>    const char *name;
>    uint/size_t name_length;
> 
> inside bigger structures.

That's basically my #1, we should never do it.
They should be replaced with LEX_STRINGs asap.

> > - how do your new classes fits into it?
> 
> They do not fit.
> 
> Various strings have these properties:
> 1. maximum possible length
> 2. pointer signness
> 3. "const" qualifier in the pointer
> 4. memory allocation/realloction possibility.
> 5. Needed in the pure C API

Right. So

LEX_STRING:     signed/noconst/no/C
LEX_CSTRING:    signed/const/no/C
LEX_CUSTRING:   unsigned/const/no/C
DYNAMIC_STRING: signed/noconst/yes/C
CSET_STRING:    signed/noconst/no/C++
String:         signed/noconst/yes/C++

hmm, apparently, you've forgot

   6. charset and other metadata

last two classes have that, other do not.
And, as you can see, I've omitted "maximum possible length" part, as I
don't think it matters much.

> A column name is a string with these properties:
> 1. Not longer than 64K (perhaps even not longer than 255)
> 2. Signed pointer
> 3. Constant pointer
> 4. No [re]allocation
> 5. Not needed in C API
>
> Non of the currently existing classes or structures perfectly fit.

it's signed/const/no/C++ which is pretty much LEX_CSTRING, just as
you've originally had in MDEV. If you add

  6. knows its charset (and repertoire)

then it's CSET_STRING or a (new) CSET_CSTRING, if you want the 'const'
qualifier (although, I suspect, that const CSET_STRING might work too).

> > please add 10, 11, etc to the list as you see fit.
> 
> 12. Name, with "uint16 length" :)

I do not think we need it.

> Note, some of these strings could share some code if:
> 
> 1. We use a template, with abstract pointer and length types.
> So we can pass say "const char *" as PTR and uint16 as LENGTH,
> to get Name.
> String, Name, CSET_STRING can have a common template root.

Yes.

> 2. We accept:
> 
> MDEV-7063 Split String into logical components
> 
> Unfortunately, nobody was willing to review this patch when
> I tried to help Olivier not to re-implement String.

This is assigned to Monty now.

> This is very very very pity and discouraging. I'm crying
> in the nights and eat anti-depressants :)
> Well, not really yet, but getting very close to this :)

Sorry :(

Let's speed up the discussion and take this string-class question on the
next maria call?

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx
-- 
Vote for my Percona Live 2016 talks:
https://www.percona.com/live/data-performance-conference-2016/sessions/mariadb-connectors-fast-and-smart-new-protocol-optimizations#community-voting
https://www.percona.com/live/data-performance-conference-2016/sessions/mariadb-101-security-validation-authentication-encryption#community-voting


Follow ups

References