← Back to team overview

maria-developers team mailing list archive

Re: A proposal to deprecate syntax: SELECT a'test'

 

Hello Monty,

Thanks for you reply.

Did I understand correctly that you have two major concerns:

1. It will be harder to explain in the documentation what a valid alias is.

2. It will be hard to implement, because currently sql_lex.cc returns:
- TEXT_STRING for both single quoted strings and double quoted strings
  in the default sql_mode
- TEXT_STRING for single quoted strings, but IDENT_QUOTED for double
  quoted strings in sql_mode=ANSI_QUOTES.
So you if we add a new terminal symbol DOUBLE_QUOTED_STRING,
then there will be a question how to modify sql_yacc.yy to cover both
sql_mode in a compatible way for the non-alias related grammar.

Can you please clarify what you think how serious is #2?
Will be a real obstacle? Or is the main obstacle only #1?



Also, we could add this into feedback plugin, to figure out
what kind of aliases people really use. It can appear that
nobody really uses single quoted strings as aliases.
What do you think about this?


Please also find comments inline:


On 02/27/2016 12:11 AM, Michael Widenius wrote:
Hi!

On 25 Feb 2016 19:16, "Alexander Barkov" <bar@xxxxxxxxxxx
<mailto:bar@xxxxxxxxxxx>> wrote:

 >> As ' and " are identical in MySQL mode, I don't see how you can fix
 >> this without causing even more confusion.

I want to emphasise the above, as you haven't really understood the
implications of this.
This, in addition to that we support that one can use a string as an
alias makes things a bit tricky to fix in a compatible way.

Ok, so you think that #2 will be a problem, correct?


 > I only propose to change the behavior for the SELECT expression aliases,
 > not to all strings generally.

As I understand it, you propose to change things in the following ways:

From:
- aliases can be identifiers or strings
To:
- aliases has to be identifiers or double quoted strings when used
without AS. When used with AS they can be identifiers or single or
double quoted strings.

I don't think there is any clear logic for the end user to understand
why we would do this.

I think that the final goal should be to deprecate single quoted
strings in all aliases, both with and without AS.

My proposal was to deprecate a'test' (without AS) in an earlier version,
because we need this syntax right now, while "a AS 'text'" could
be deprecated in a later version.

But if you think it will be confusing, then I'll happily agree
to deprecate single quoted strings both with and without AS in
the same version :)


 >>> What can be used instead
 >>> ------------------------
 >>> We have a number of other ways to specify aliases:
 >>>
 >>> The standard ways:
 >>>
 >>> SELECT a AS test;
 >>> SELECT a "test";
 >>> SELECT a AS "test";
 >>
 >>
 >> Ok, this answers my question above. The standard says that an alias is
 >> a text string.

 > Sorry, no. The standard says that an alias is an identifier.
 > Text strings are not valid aliases.

Sorry, I was not clear.
In MySQL mode " is a text string. If we want to support the above
statements , we also have to support using ' for aliases.

Sorry, I still cannot understand this point.
Are you mostly concerned about *implementation* here
(#2 in the beginning of this letter)?
Or mostly about documenting?


The other option would be to say that one must use back tick for
aliases, as that's the equal to ansi sql ".

Removing double quotes from aliases would break the standard
compatibility.
It's possible that there are users who use double quotes
in aliases only and don't use double quotes for any other purposes.
They will be in troubles.
I think we should not break the behavior where we are standard
compliant now. I vote to preserve double quotes in aliases.


So, if your concern is how to ***implement*** this,
will you agree on the change if I come up with a simple patch for
sql_yacc.yy and sql_lex.cc that will do the following:

1. Remove these alias variants in all sql_modes:
SELECT a 'test';
SELECT a AS 'test';

2. Preserve these alias variants for all sql_modes:

SELECT a test;
SELECT a "test";
SELECT a `test`;
SELECT a AS test;
SELECT a AS "test";
SELECT a AS `test`;

3. Preserve the behavior of single and double quoted strings
anywhere else (in all non-alias related grammar parts)
for all sql_modes.

?



 > I only propose to deprecate string literals (i.e. single quoted strings)

In MariaDB, a string literal starts with ' or ". If your are deprecating
one in some context you have also to depricate the other.

I think it should be doable, if implementation is the main concern.


 > as SELECT expression aliases, because this is a non-standard and
 > harmful  extension, which conflicts with another standard syntax assumed
 > for typified literals.

To make things clear. There are 3 non standard things we are talking
about here:
- Both ' and " are string delimiters.
- quoting of identifiers are done with back tick.
- Aliases can be given with identifiers or strings.

You can't depricate something in one context and keep it in another
context. That's not consistent. Either ' and " are string delimiterx or
they are not. They should not be something different depending on context.

 >>
 >>> MySQL/MariaDB extensions:
 >>>
 >>>
 >>> SELECT a `test`;
 >>> SELECT a AS `test`;
 >>>
 >>> That should be enough.
 >>
 >>
 >> Note that this has nothing exclusively to do with alias, but with the
 >> MySQL/MariaDB extension that you can use ' instead of " as a string
 >> delimiter.
 >
 >
 > My proposal is exclusively about aliases, not about using of " as a
 > string delimiter.

See above. You can't make a special case for aliases that breaks the
general rules.
Things gets to complex to describe.

 >> This would break the specification of what a text strings is in
 >> MariaDB.

 > No, this will only change  what a SELECT expression alias is.

If your proposal would be to only accept identifiers as aliases, then we
shouldn't accept " either for aliases. You can't say that " is sometimes
an identifier and sometimes a string.

According to the standard, single quoted strings and double quoted strings are different things. If we try to see them as such and
document them as such, then all problems disappear.

Just think of single quoted strings and double quoted strings as a
different terminal symbols.

- single quoted string can be an expression
- double quoted string can be an expression
- double quoted string can be an alias

but

- single quoted string cannot be an alias

I think there is not any inconsistency in here.

I also think that
It's wrong to state that "both single quoted strings and double
quoted strings must be valid aliases because both single quoted
strings and double quoted strings are valid expressions".


You will not insist that  "both numbers and double
quoted strings must be valid aliases because both numbers and double
quoted strings are valid expressions", will not you? What is the difference?



 >> It would also be very confusing that text strings needs to be
 >> specified differently in different places of the SQL syntax.
 >
 > You're reordering causes and consequences :)
 >
 > Again, this is not about text strings. It's about
 > what can be used as an alias.

It's well defined what a text string is in MySQL mode. You can't change
the definition just for some part of the alias syntax.

Just try to think of single and double quoted strings as different
terminal symbols.


 > I think it will be very obvious:
 >
 > - Expressions support single-quoted and double-quoted strings
 >
 > - Aliases support regular identifiers, delimited (i.e. double-quoted)
 >   identifiers and backtick-quoted identifiers.

We don't have double quoted identifiers in MySQL mode!

This is just a question of mapping from terminal symbols to syntactic items.

The terminal symbol will be DOUBLE_QUOTED_STRING.
I think there are no any problems to map DOUBLE_QUOTED_STRING to
- string literals in expressions
- column names in aliases

It would be very similar to how another terminal symbol "NUM" is mapped
in:

SELECT a,b,1,2 FROM t1 ORDER BY 1,2;

NUM is
- integer literals in expressions
- column references in ORDER BY



You CAN NOT change the meaning of " just for aliases.
Note also that depending on the syntax suggested, you would in some
cases allow strings but not in other cases.

 > Expressions and aliases are instances of very different nature.

Syntax of things must be consistent. You should be able to look at any
part of a sql construct and know what the part is. In your proposal the
meaning of " depends of context, which is the wrong way to do it.

This is only about aliases.
An alias is immediately followed by a comma or by "FROM" or by
END-OF-INPUT. It's perfectly clear, so no needs to go far from
the alias to understand what it is.

I think that in case of aliases it will be even more clearer
than a different meaning of NUM in ORDER BY.


Your options are to allow one to specify aliases with either identifiers
or strings. You should not try to change the meaning of ' or " depending
on context.

I see no problems to interpret a new terminal symbol DOUBLE_QUOTED_STRING as a string literal in most of the grammar,
while as a column name when in alias.

See NUM in ORDER BY.


 > Expression is a value that has a data type and attributes.
 > Alias is an identifier, nothing else.

If that's would be the case in MySQL mode, then you should not be
allowed to use " for aliases either.

I don't agree. See NUM in ORDER BY again.

The fact that NUM is an integer constant almost all around the grammar
does not mean that NUM must be an integer constant in ORDER BY.


 > There is nothing common in them at all.
 > It's quite obvious that they *can* have different syntactic rules.

Sorry, no. It's important to be consistent.

 >>> 2. In 10.2 we disallow this syntax by default and add either
sql_mode or
 >>> old_mode to enable the old behavior.
 >>
 >>
 >> As you can already get the behavior you want by specifying ANSI MODE,
 >
 >
 > Supporting typified literals in ANSI MODE only is not a way to go.
 > We should support them in all modes.

I agree.

 >
 >> I don't think this change is necessary. I think it's also bad as it
 >> makes it harder to define what a text string is.
 >
 > This is not about to define what a text string is.
 > It's the other way around,
 > to define what are valid expressions and valid aliases.

Same thing, if we allow aliases in any case to be defined with a string.
And remember that "aaa" in MySQL mode is ALWAYS a text string.

See NUM and ORDER BY again.



 >> For example:
 >> In MariaDB, a text string can be delimited with ' or ", except in the
 >> case of alias when you have to use '

 > Sorry, not sure that I understood. Can you clarify about aliases and '
 > please?

That's the point. With our proposed change it well be hard to define
what is a text string and what is an identifier and when one can use
which where.

It will be trivial, because expressions and aliases do not intercept in
any ways. They are absolutely different grammar parts!!!
I want to emphasis this!!!

The fact that a user variable can be an valid expression
does not also mean that a user variable must be a valid alias.
This is simply wrong. I think that it's equally wrong to say this about
double-quoted strings.

And see again about meaning of NUM in expressions and in ORDER BY.
NUM being not an expression in ORDER BY is much much much more
confusing than the proposed change about double-strings.



 >> I think it's better to require, as most ANSI SQL databases does, one
 >> to quote column names like 'date' and 'time' to ensure they are not
 >> mixed with operators or string prefixes.
 >
 > I am afraid this is not really true.
 >
 > Both date and time are perfectly valid regular identifiers.
 > Requiring to put them in quotes will be non-Standard.

As far as I remember, date and time are in the sql reserved keywords.
It's allowed by the standard to require that identifiers that are
keywords should be quoted

Oops. Sorry, my mistake.

The SQL standard has two types of keywords:
- reserved keywords, they must be quoted
- non-reserved keywords, they don't have to be quoted

TIME, DATE and TIMESTAMP are in fact reserved keywords and
therefore must be quoted.

But for some reasons we treat them as non-reserved keywords.
Many other databases also treat them as non-reserved keywords.

Don't know why.

If we make TIME, DATE and TIMESTAMP reserved keywords,
it will remove some shift/reduce conflicts.


But the problem is that we cannot make all dynamically
loadable data type names to be reserved keywords.
They must be non-reserved keywords.

Otherwise, loading a new data type plugin can break
existing applications and the server will start to return
a syntax error.

Suppose a query like this:

SELECT test FROM t1;

Suppose now the DBA loads a plugin implementing the "test" data type.

Now the above SELECT query will return a syntax error.


A DBA cannot know all column or table names that its users use in their databases, to safely choose a name for a new plugin.

So, dynamic type names must be non-reserved keywords.


 > I just checked this script:
 >
 >
 > DROP TABLE t1;
 > CREATE TABLE t1 (date DATE);
 > INSERT INTO t1 VALUES('2001-01-01');
 > SELECT date FROM t1;
 >
 > It works perfectly well in SQL Server, PostgreSQL and SQLite3.

Doesn't say anything about the standard.

Right. I replied to
"I think it's better to require, as most ANSI SQL databases".

It seems most of the databases (including MariaDB)
do it in a wrong way :)

But anyway, fixing this for TIME, DATE and TIMESTAMP does
not solve the problem for dynamically loadable types.



 > It will be pity if we won't be able to support this syntax:
 >
 > SELECT INET6'ffff::ffff';
 >
 > unless we deprecate this useless extension that allows string
 > literals as aliases (i.e. single quoted string literals).

That's not true. Wei can easily support this. Just define inet6 as a
keyword that has higher precidence than an alias.

How will it look like in terms of the sql_yacc.yy syntax?


This means that if a user has inet6 as a column name he needs to quote
it if he is using it with an alias without AS.

Aha, you're accepting the idea of having different meanings for these
two examples:
SELECT a'a' FROM t1;
SELECT a AS 'a' FROM t1;
I think this is much more confusing than my proposal to deprecate
single  quotes in aliases.


And there is still a problem here.
Suppose some user uses inet6 as an alias like this:

SELECT inet'inet6' FROM t1;

Now a DBA loads a plugin implementing the 'inet6' data type.

The meaning of the above SELECT changes.

I think deprecating single quoted strings in aliases is a better way to go and will give more consistency and safety.



Note that we also have to support:

SELECT INET6"ffff::ffff";

As " and ' are interchangeable.

Which is something you have not discussed before and would break your
original proposal as you can't distinguish if "..." is a string or an
identifier.

Oops. Sorry for not mentioned this earlier.
This is actually the key point!!!!!

My idea was *exactly*  to get rid of the confusing syntax,
and make these two examples unambiguously mean different things:

SELECT inet6'ffff::ffff';   -- a typified literal of type inet6
SELECT inet6"ffff::ffff";   -- an identifier with an alias "ffff:ffff".

Always! No matter what the sql_mode is, and no matter what the dynamic
type plugins are loaded.

The former should just return an error "unknown data type inet6" if it
is not known to the server, and return a constant if "inet6" is know.


It's true that currently we do support this syntax for literals:

SELECT TIME"10:20:30";
SELECT DATE"2001-01-01";
SELECT TIMESTAMP"2001-01-01 10:20:30";

I think it should be also deprecated.
We should start to return a warning as soon as possible.
Then in a later version change the meaning from typified
literals to aliased identifiers.


Thanks.


Regards,
Monty



References