maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #10761
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
Hi, Alexander!
On Jun 16, Alexander Barkov wrote:
>
> Agreed. This version fixes around 16 classes that had the same problem,
> and which only needed to swap "str" and "tmp_value" in their val_str()
> implementations.
>
> As agreed on slack, the following three classes:
> - Item_str_conv
> - Item_func_make_set
> - Item_char_typecast
> with more complex implementations will be fixed separately.
> I'll create separate MDEVs for these three classes.
Right, thanks!
> > And, by the way, could you also fix the comment in item.h around
> > Item::val_str(String *) not to say that it can only change allocated
> > strings? May be it should say about const (String::make_const)
> > strings instead. Thanks. Also you could add something about this
> > your CONCAT fix. Like, the item can use internal buffer as a
> > temporary storage, but the result should be in the String argument,
> > not the other way around.
>
> Can you please suggest proper comments and proper places for them?
> I got stuck with that :)
Let's see. Old comment:
============================
NOTE
Buffer passed via argument should only be used if the item itself
doesn't have an own String buffer. In case when the item maintains
it's own string buffer, it's preferable to return it instead to
minimize number of mallocs/memcpys.
The caller of this method can modify returned string, but only in case
when it was allocated on heap, (is_alloced() is true). This allows
the caller to efficiently use a buffer allocated by a child without
having to allocate a buffer of it's own. The buffer, given to
val_str() as argument, belongs to the caller and is later used by the
caller at it's own choosing.
A few implications from the above:
- unless you return a string object which only points to your buffer
but doesn't manages it you should be ready that it will be
modified.
- even for not allocated strings (is_alloced() == false) the caller
can change charset (see Item_func_{typecast/binary}. XXX: is this
a bug?
- still you should try to minimize data copying and return internal
object whenever possible.
============================
New comment suggestion:
============================
NOTE
The caller can modify the returned String, if it's not marked
"const" (with the String::mark_as_const() method). That means that
if the item returns its own internal buffer (e.g. tmp_value), it
*must* be marked "const" [1]. So normally it's preferrable to
return the result value in the String, that was passed as an
argument. But, for example, SUBSTR() returns a String that simply
points into the buffer of SUBSTR()'s args[0]->val_str(). Such a
String is always "const", so it's ok to use tmp_value for that and
avoid reallocating/copying of the argument String.
[1] consider SELECT CONCAT(f, ":", f) FROM (SELECT func() AS f);
here the return value of f() is used twice in the top-level
select, and if they share the same tmp_value buffer, modifying the
first one will implicitly modify the second too.
============================
> commit cec33b1026c54bd1e0e92d26053843b3bced7c81
> Author: Alexander Barkov <bar@xxxxxxxxxxx>
> Date: Fri Jun 16 12:58:01 2017 +0400
>
> MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
>
> The bug happens because of a combination of unfortunate circumstances:
>
> 1. Arguments args[0] and args[2] of Item_func_concat point recursively
> (through Item_direct_view_ref's) to the same Item_func_conv_charset.
> Both args[0]->args[0]->ref[0] and args[2]->args[0]->ref[0] refer to
> this Item_func_conv_charset.
>
> 2. When Item_func_concat::args[0]->val_str() is called,
> Item_func_conv_charset::val_str() writes its result to
> Item_func_conc_charset::tmp_value.
>
> 3. Then, for optimization purposes (to avoid copying),
> Item_func_substr::val_str() initializes Item_func_substr::tmp_value
> to point to the buffer fragment owned by Item_func_conv_charset::tmp_value
> Item_func_substr::tmp_value is returned as a result of
> Item_func_concat::args[0]->val_str().
>
> 4. Due to optimization to avoid memory reallocs,
> Item_func_concat::val_str() remembers the result of args[0]->val_str()
> in "res" and further uses "res" to collect the return value.
>
> 5. When Item_func_concat::args[2]->val_str() is called,
> Item_func_conv_charset::tmp_value gets overwritten (see #1),
> which effectively overwrites args[0]'s Item_func_substr::tmp_value (see #3),
> which effectively overwrites "res" (see #4).
>
> This patch does the following:
>
> a. Changes Item_func_conv_charset::val_str(String *str) to use
> tmp_value and str the other way around. After this change tmp_value
> is used to store a temporary result, while str is used to return the value.
> The fixes the second problem (without SUBSTR):
> SELECT CONCAT(t2,'-',t2) c2
> FROM (SELECT CONVERT(t USING latin1) t2 FROM t1) sub;
> As Item_func_concat::val_str() supplies two different buffers when calling
> args[0]->val_str() and args[2]->val_str(), in the new reduction the result
> created during args[0]->val_str() does not get overwritten by
> args[2]->val_str().
>
> b. Fixing the same problem in val_str() for similar classes
>
> Item_func_to_base64
> Item_func_from_base64
> Item_func_weight_string
> Item_func_hex
> Item_func_unhex
> Item_func_quote
> Item_func_compress
> Item_func_uncompress
> Item_func_des_encrypt
> Item_func_des_decrypt
> Item_func_conv_charset
> Item_func_reverse
> Item_func_soundex
> Item_func_aes_encrypt
> Item_func_aes_decrypt
> Item_func_buffer
>
> c. Fixing Item_func::val_str_from_val_str_ascii() the same way.
> Now Item_str_ascii_func::ascii_buff is used for temporary value,
> while the parameter passed to val_str() is used to return the result.
> This fixes the same problem when conversion (from ASCII to e.g. UCS2)
> takes place. See the ctype_ucs.test for example queries that returned
> wrong results before the fix.
>
> d. Some Item_func descendand classes had temporary String buffers
> (tmp_value and tmp_str), but did not really use them.
> Removing these temporary buffers from:
>
> Item_func_decode_histogram
> Item_func_format
> Item_func_binlog_gtid_pos
> Item_func_spatial_collection:
>
> e. Removing Item_func_buffer::tmp_value, because it's not used any more.
>
> f. Renaming Item_func_[un]compress::buffer to "tmp_value",
> for consistency with other classes.
>
> Note, this patch does not fix the following classes
> (although they have a similar problem):
>
> Item_str_conv
> Item_func_make_set
> Item_char_typecast
>
> They have a complex implementations and simple swapping between "tmp_value"
> and "str" won't work. These classes will be fixed separately.
Looks ok, thanks!
Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx
References
-
MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Alexander Barkov, 2017-02-28
-
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Sergei Golubchik, 2017-03-01
-
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Sergey Petrunia, 2017-03-02
-
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Alexander Barkov, 2017-03-03
-
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Sergei Golubchik, 2017-03-03
-
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Alexander Barkov, 2017-03-06
-
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Sergei Golubchik, 2017-03-06
-
Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
From: Alexander Barkov, 2017-06-16