← Back to team overview

maria-developers team mailing list archive

Re: MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery

 

Hi, Alexander!

On Jun 16, Alexander Barkov wrote:
> 
> Agreed. This version fixes around 16 classes that had the same problem,
> and which only needed to swap "str" and "tmp_value" in their val_str()
> implementations.
> 
> As agreed on slack, the following three classes:
> - Item_str_conv
> - Item_func_make_set
> - Item_char_typecast
> with more complex implementations will be fixed separately.
> I'll create separate MDEVs for these three classes.

Right, thanks!

> > And, by the way, could you also fix the comment in item.h around
> > Item::val_str(String *) not to say that it can only change allocated
> > strings? May be it should say about const (String::make_const)
> > strings instead. Thanks. Also you could add something about this
> > your CONCAT fix. Like, the item can use internal buffer as a
> > temporary storage, but the result should be in the String argument,
> > not the other way around.
> 
> Can you please suggest proper comments and proper places for them?
> I got stuck with that :)

Let's see. Old comment:
============================
    NOTE
      Buffer passed via argument  should only be used if the item itself
      doesn't have an own String buffer. In case when the item maintains
      it's own string buffer, it's preferable to return it instead to
      minimize number of mallocs/memcpys.
      The caller of this method can modify returned string, but only in case
      when it was allocated on heap, (is_alloced() is true).  This allows
      the caller to efficiently use a buffer allocated by a child without
      having to allocate a buffer of it's own. The buffer, given to
      val_str() as argument, belongs to the caller and is later used by the
      caller at it's own choosing.
      A few implications from the above:
      - unless you return a string object which only points to your buffer
        but doesn't manages it you should be ready that it will be
        modified.
      - even for not allocated strings (is_alloced() == false) the caller
        can change charset (see Item_func_{typecast/binary}. XXX: is this
        a bug?
      - still you should try to minimize data copying and return internal
        object whenever possible.
============================
New comment suggestion:
============================
    NOTE
      The caller can modify the returned String, if it's not marked
      "const" (with the String::mark_as_const() method). That means that
      if the item returns its own internal buffer (e.g. tmp_value), it
      *must* be marked "const" [1]. So normally it's preferrable to
      return the result value in the String, that was passed as an
      argument. But, for example, SUBSTR() returns a String that simply
      points into the buffer of SUBSTR()'s args[0]->val_str(). Such a
      String is always "const", so it's ok to use tmp_value for that and
      avoid reallocating/copying of the argument String.

      [1] consider SELECT CONCAT(f, ":", f) FROM (SELECT func() AS f);
      here the return value of f() is used twice in the top-level
      select, and if they share the same tmp_value buffer, modifying the
      first one will implicitly modify the second too.
============================

> commit cec33b1026c54bd1e0e92d26053843b3bced7c81
> Author: Alexander Barkov <bar@xxxxxxxxxxx>
> Date:   Fri Jun 16 12:58:01 2017 +0400
> 
>     MDEV-10306 Wrong results with combination of CONCAT, SUBSTR and CONVERT in subquery
>     
>     The bug happens because of a combination of unfortunate circumstances:
>     
>     1. Arguments args[0] and args[2] of Item_func_concat point recursively
>     (through Item_direct_view_ref's) to the same Item_func_conv_charset.
>     Both args[0]->args[0]->ref[0] and args[2]->args[0]->ref[0] refer to
>     this Item_func_conv_charset.
>     
>     2. When Item_func_concat::args[0]->val_str() is called,
>     Item_func_conv_charset::val_str() writes its result to
>     Item_func_conc_charset::tmp_value.
>     
>     3. Then, for optimization purposes (to avoid copying),
>     Item_func_substr::val_str() initializes Item_func_substr::tmp_value
>     to point to the buffer fragment owned by Item_func_conv_charset::tmp_value
>     Item_func_substr::tmp_value is returned as a result of
>     Item_func_concat::args[0]->val_str().
>     
>     4. Due to optimization to avoid memory reallocs,
>     Item_func_concat::val_str() remembers the result of args[0]->val_str()
>     in "res" and further uses "res" to collect the return value.
>     
>     5. When Item_func_concat::args[2]->val_str() is called,
>     Item_func_conv_charset::tmp_value gets overwritten (see #1),
>     which effectively overwrites args[0]'s Item_func_substr::tmp_value (see #3),
>     which effectively overwrites "res" (see #4).
>     
>     This patch does the following:
>     
>     a. Changes Item_func_conv_charset::val_str(String *str) to use
>        tmp_value and str the other way around. After this change tmp_value
>        is used to store a temporary result, while str is used to return the value.
>        The fixes the second problem (without SUBSTR):
>          SELECT CONCAT(t2,'-',t2) c2
>            FROM (SELECT CONVERT(t USING latin1) t2 FROM t1) sub;
>        As Item_func_concat::val_str() supplies two different buffers when calling
>        args[0]->val_str() and args[2]->val_str(), in the new reduction the result
>        created during args[0]->val_str() does not get overwritten by
>        args[2]->val_str().
>     
>     b. Fixing the same problem in val_str() for similar classes
>     
>        Item_func_to_base64
>        Item_func_from_base64
>        Item_func_weight_string
>        Item_func_hex
>        Item_func_unhex
>        Item_func_quote
>        Item_func_compress
>        Item_func_uncompress
>        Item_func_des_encrypt
>        Item_func_des_decrypt
>        Item_func_conv_charset
>        Item_func_reverse
>        Item_func_soundex
>        Item_func_aes_encrypt
>        Item_func_aes_decrypt
>        Item_func_buffer
>     
>     c. Fixing Item_func::val_str_from_val_str_ascii() the same way.
>        Now Item_str_ascii_func::ascii_buff is used for temporary value,
>        while the parameter passed to val_str() is used to return the result.
>        This fixes the same problem when conversion (from ASCII to e.g. UCS2)
>        takes place. See the ctype_ucs.test for example queries that returned
>        wrong results before the fix.
>     
>     d. Some Item_func descendand classes had temporary String buffers
>        (tmp_value and tmp_str), but did not really use them.
>        Removing these temporary buffers from:
>     
>        Item_func_decode_histogram
>        Item_func_format
>        Item_func_binlog_gtid_pos
>        Item_func_spatial_collection:
>     
>     e. Removing Item_func_buffer::tmp_value, because it's not used any more.
>     
>     f. Renaming Item_func_[un]compress::buffer to "tmp_value",
>        for consistency with other classes.
>     
>     Note, this patch does not fix the following classes
>     (although they have a similar problem):
>     
>        Item_str_conv
>        Item_func_make_set
>        Item_char_typecast
>     
>     They have a complex implementations and simple swapping between "tmp_value"
>     and "str" won't work. These classes will be fixed separately.

Looks ok, thanks!

Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx


References