← Back to team overview

maria-developers team mailing list archive

Re: Please review MDEV-8092 Change Create_field::field from "const char *" to LEX_CSTRING


Hi, Alexander!

On Jan 08, Alexander Barkov wrote:
> >> diff --git a/sql/field.h b/sql/field.h
> >> index 0591914..a40e307 100644
> >> --- a/sql/field.h
> >> +++ b/sql/field.h
> >> @@ -537,6 +537,115 @@ inline bool is_temporal_type_with_time(enum_field_types type)
> >>   }
> >>
> >>
> >> +/**
> >> +  @class Name
> >> +  An arbitrary SQL name, consisting of a NULL-terminated string
> >> +  and its length.
> >> +  Later it can be reused for table and schema names.
> >> +*/
> >> +class Name
> >> +{
> >> +  const char *m_str;
> >> +  uint m_length;
> >
> > Sorry, we have too many "string" classes already.
> > Either use LEX_STRING or create a Lex_string
> > which is inherited from LEX_STRING, binary compatible with it
> > and only adds C++ methods (and can be casted back and forth automatically).
> > So that one could use Lex_string where a LEX_STRING is expected and vice
> > versa
> Sorry for a long reply.
> This is a very convenient moment of time when
> we can improve a few things.

I agree with this, to a certain extent. Cleanups and refactoring is
good. But I'm starting to feel that you do too many too big changes and
that makes the task MDEV much larger than it was supposed to be. And
much delayed too.

Wouldn't it be a good time now to actually start implementing a feature?

> I disagree to go the easier way with LEX_STRING.
> LEX_STRING is a very unfortunately designed thing for the server side code.
> It was a kind of Ok on 32 bit platforms.
> But now on 64bit platforms it's just a waste of memory.
> Length can never be large enough to use the entire "size_t".

I wouldn't sweat over it. No matter what you do with LEX_STRING it won't
decrease server memory requirements in any noticeable way.

> Btw, it can even never be long enough to use "uint". It should be enough
> to use uint16 for length and use the available remainder (6 bytes) to
> store extra useful things, like string metadata.
> It will help to avoid some operations, like character set conversion
> in the client-server protocol, and I planned to add some optimization
> in the future.

I could say that one "should not use LEX_STRING where character sets are
involved", but let's postpone that till the end of the email.

> I propose to do the other way around:
> - gradually get rid of using LEX_STRING and LEX_CSTRING for names
> in the server side and use more suitable Name-alike classes instead.

More suitable for *what*? See below.

> - keep LEX_XXX only for compatibility with the existing pure C API,
>    where it already presents now.

I've checked, LEX_STRING is only used in the plugin API that is
versioned and we can safely change LEX_STRING in 10.2 to use, say,
uint16, if needed.

> As for "too many string classes", I have heard this a few times already,
> both in MariaDB and earlier in Oracle :)

Perhaps, because it's true? :)

> I don't agree with this statement.
> I think that:
> 1. Trying to reuse classes that are not fully suitable for a task is 
> more harmful than having more classes, because this is simply
> not efficient.
> 2. Nobody objects to have many Item or Field classes that suite
> a particular functionality better. But some people are very afraid
> of having more strings. Why?
> Various strings that we have in the server code *are* different enough.
> Different string do deserve different classes.

Precisely. So, to know what different strings are there in the server, I
want to list them all. Then we'll see whether how string classes fit in.

1. char* - zero terminated string. I don't see why one should ever use
   it (in the server).
2. uchar* - byte array, not zero terminated, no length. Even worse.
   Never ever.
3. LEX_STRING - a string and a length, no charset. Used in C with a
   preallocated buffer.
4. LEX_CSTRING - constant string, no charset.
5. LEX_CUSTRING - constant byte array? weird, but possible.
6. DYNAMIC_STRING - C string allocated on the heap. Can be appended to,
   reallocates automatically. No charset.
7. CSET_STRING - C++ LEX_STRING with a charset.
8. String - swiss army knife C++ string. Uses a preallocated buffer or
   heap. Grows, shrinks, convert charsets, can copy, move, initialize
   from anything, print hex, make coffee, escape, append anything, and
   so on.
9. StringBuffer - helper template to initialize a String from a local
   fixed buffer (very common usage pattern).

And I'm pretty sure I've missed a few.
Two questions:

- what did I miss?
- how do your new classes fits into it?

please add 10, 11, etc to the list as you see fit.

> > this is logically wrong.
> > A field *has* a name, that's right. But a field *is not* a name.
> > The Field class should include the name, not be inherited from it.
> Can I spend a few more seconds of you time, to have some fun?
> This reminds me the phrase:
> A rabbit IS not only costly fur, but IS also 3-4 kilograms of dietetic
> meat.
> Looks like this phrase is also logically wrong.
> Or should I rather say it *has* a logically wrong phrasing? :)

Yeah, human languages are often (most of the time?) use words loosely
and rely on implicit meaning. Like "can you pass the salt, please?" -
the expected answer is to pass the salt, not to say "yes, I can".
(and, strictly speaking, human languages don't use words and don't rely
on the meaning. Humans do :)

Chief Architect MariaDB
and security@xxxxxxxxxxx
Vote for my Percona Live 2016 talks:

Follow ups