maria-developers team mailing list archive

Thread
Date

Re: Review of base64.diff and german2.diff

To: Roberto Spadim <roberto@xxxxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Tue, 17 Sep 2013 19:38:13 +0400
Cc: "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAH3kUhFhwUoier+YYf6Uu6oBHgN7-wobWHkdx9f5QjX3tMRjkg@mail.gmail.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6

Hello Roberto,


On 09/17/2013 06:38 PM, Roberto Spadim wrote:

hi guys!
one question, as a mariadb user...
base64 will be exposed to sql layer? could i use TO_BASE64() and
FROM_BASE64() at mysql client?
there's a MDEV in JIRA about it, if you could attach it to this patch
could be nice: MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387>

and with time, implement base32 and base16: MDEV-4800
<https://mariadb.atlassian.net/browse/MDEV-4800>

base64 is very usefull, and many php users use it to "send/receive"
files from database
base32 from what i know is usefull only for OTP and other password apps
base16 is a hexadecimal function HEX() and UNHEX()


Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level,
so one case use for example things like this:

INSERT INTO t1 VALUES (FROM_BASE64('base64-string'));

or

SELECT TO_BASE64(column) FROM t1;


There are no plans to implement base32.


thanks guys


2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx <mailto:bar@xxxxxxxxxxx>>

    Hi Monty,

    thanks for review.

    I have addressed most of your suggestions. See the new version attached,
    and the detailed comments inline:



    On 09/12/2013 04:32 PM, Michael Widenius wrote:


        Hi!

        Here is the review for the code that we should put into 10.0

        First the base64:

            === modified file 'mysql-test/t/func_str.test'
            --- mysql-test/t/func_str.test  2013-05-07 11:05:09 +0000
            +++ mysql-test/t/func_str.test  2013-08-28 13:14:24 +0000
            @@ -1555,3 +1555,118 @@ drop table t1,t2;
               --echo # End of 5.5 tests
               --echo #

            +
            +--echo #
            +--echo # Start of 5.6 tests
            +--echo #


        Shouldn't this be start of 10.0 tests ?
        (I know that this code is from MySQL 5.6, but still for us this
        is 10.0...)


    This code is (almost) copy-and-paste from MySQL-5.6.
    I think it's a good idea when looking inside a test file
    to be able to see which tests are coming from MySQL-5.6,
    and which tests are coming from MariaDB-10.0.

    I'd suggest to keep "Start of 5.6 tests" in this particular case,
    and also when merging the tests for the other merged MySQL-5.6 features.




            === modified file 'mysys/base64.c'
            --- mysys/base64.c      2011-06-30 15:46:53 +0000
            +++ mysys/base64.c      2013-03-09 06:22:59 +0000
            @@ -1,5 +1,4 @@
            -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems,
            Inc.
            -   Use is subject to license terms.
            +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates.
            All rights reserved.


        Removed 'all rights reserved'. You can't have that for GPL code.

        If there is any new code from us, please add a copyright message
        for the
        MariaDB foundation too!


    Removed 'all rights reserved' and added "MariaDB", as there are some
    our own changes.




                  This program is free software; you can redistribute it
            and/or modify
                  it under the terms of the GNU General Public License
            as published by
            @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
                                            "abcdefghijklmnopqrstuvwxyz"
                                            "0123456789+/";

            +/**
            + * Maximum length base64_needed_encoded_length()
            + * can handle without overflow.
            + */
            +int
            +base64_encode_max_arg_length(__)
            +{
            +#if (SIZEOF_INT == 8)
            +  /*
            +    6827690988321067803 ->   9223372036854775805
            +    6827690988321067804 ->  -9223372036854775807
            +  */


        Please describe from where the following number comes from
        (so that anyone can recalculate and check the number if needed):


    I added more comments where these numbers come from.



            +  return 0x5EC0D4C77B03531BLL
            +#else
            +  /*
            +    1589695686 -> 2147483646 <tel:2147483646>
            +    1589695687 -> -2147483645 <tel:2147483645>
            +  */
            +  return 0x5EC0D4C6;
            +#endif
            +}
            +

               int
               base64_needed_encoded_length(__int length_of_data)
            @@ -39,6 +60,21 @@ base64_needed_encoded_length(__int length_
               }


            +/**
            + * Maximum length base64_needed_decoded_length()
            + * can handle without overflow.
            + */
            +int
            +base64_decode_max_arg_length(__)
            +{


        The following also need a comment.


    Also added comments.



            +#if (SIZEOF_INT == 8)
            +  return 0x2AAAAAAAAAAAAAAALL;
            +#else
            +  return 0x2AAAAAAA;
            +#endif
            +}
            +
            +


        <cut>

            +/**
            + * Get the next character from a base64 encoded stream.
            + * Skip spaces, then scan the next base64 character or a
            pad character
            + * and collect bits into decoder->c.
            + *
            + * @param  decoder  Pointer to MY_BASE64_DECODER
            + * @return
            + *  FALSE on success (a valid base64 encoding character found)
            + *  TRUE  on error (unexpected character or unexpected
            end-of-input found)
            + */


        Add an empty line here.


    Done.



        It would also be good to have a good description of the base64
        format
        in this file so that one can understand what the functions are
        supposed to do (or at least a link to the specification).

        Some thing I figured out by asking bar on IRC:

        - We never generate spaces
        - When decoding, we allow spaces anywhere.
        - The end of an 64 base  decoded string is always padded with '='
            so that the final length is dividable with 4.


    I added a reference to http://en.wikipedia.org/wiki/__Base64
    <http://en.wikipedia.org/wiki/Base64>,
    as well all added these your comments in proper places of the code.




            +static inline my_bool
            +my_base64_decoder_getch(MY___BASE64_DECODER *decoder)


        Remove inline from the above; It's likely to make the code
        slower, not
        faster as this is a big function with lot of if's.


    Removed "inline".



               {
            -  char b[3];
            -  size_t i= 0;
            -  char *dst_base= (char *)dst;
            -  char const *src= src_base;
            -  char *d= dst_base;
            -  size_t j;
            +  if (my_base64_decoder_skip___spaces(decoder))
            +    return TRUE; /* End-of-input */

            -  while (i < len)
            +  if (!my_base64_add(decoder)) /* Valid base64 character
            found */
                 {
            -    unsigned c= 0;
            -    size_t mark= 0;
            +    if (decoder->mark)
            +    {
            +      /* If we have scanned '=' already, then only '=' is
            valid */
            +      DBUG_ASSERT(decoder->state == 3);
            +      decoder->error= 1;
            +      decoder->src--;
            +      return TRUE; /* expected '=', but encoding character
            found */
            +    }
            +    decoder->state++;
            +    return FALSE;
            +  }


        If you want to have the funtion inline, then move the following to
        another not inline function;  We don't need to generate code for
        this
        for every function call:

            +
            +  /* Process error */
            +  switch (decoder->state)
            +  {
            +  case 0:
            +  case 1:
            +    decoder->src--;
            +    return TRUE; /* base64 character expected */
            +    break;
            +
            +  case 2:
            +  case 3:
            +    if (decoder->src[-1] == '=')
            +    {
            +      decoder->error= 0; /* Not an error - it's a pad
            character */
            +      decoder->mark++;
            +    }
            +    else
            +    {
            +      decoder->src--;
            +      return TRUE; /* base64 character or '=' expected */
            +    }
            +    break;
            +  default:
            +    DBUG_ASSERT(0);
            +    return TRUE; /* Wrong state, should not happen */
            +  }


        <cut>

        Add to the comment for base64_decode that the 'dst' buffer has to be
        at least 2 character longer than what is needed to decode src_base.


    I added this comment:

    * Note: 'dst' must have sufficient space to store the decoded data.
    * Use base64_needed_decoded_length() to calculate the correct space
    size.






            +int
            +base64_decode(const char *src_base, size_t len,
            +              void *dst, const char **end_ptr, int flags)
            +{
            +    if (my_base64_decoder_getch(&__decoder) ||
            +        my_base64_decoder_getch(&__decoder) ||
            +        my_base64_decoder_getch(&__decoder) ||
            +        my_base64_decoder_getch(&__decoder))
            +      break;


        Generating the error handling with a switch for every of the
        above is wasteful. See previous comment.


    Right. my_base64_decoder_getch() is not inline anymore.



            +  /* Return error if there are more non-space characters */
            +  decoder.state= 0;
            +  if (!my_base64_decoder_skip___spaces(&decoder))
            +    decoder.error= 1;
            +
                 if (end_ptr != NULL)


        Note that base64_needed_decoded_length() used ceil() when it's
        not needed.


    Removed.




        <cut>

            === modified file 'sql/item_strfunc.cc'
            @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix___length_a
                  set_persist_maybe_null(1);
               }

            +
            +void Item_func_to_base64::fix___length_and_dec()
            +{
            +  maybe_null= args[0]->maybe_null;
            +  collation.set(default_charset(__), DERIVATION_COERCIBLE,
            MY_REPERTOIRE_ASCII);


        Maybe better to cast both arguments to (ulonglong) as the rest
        of the code
        is depending on this.

        Also, why is base64_encode_max_arg_length() int instead of uint or
        even better size_t?


    This is because base64_encode() and base64_decode() are used
    in the replication code using "int" as return value.

    When exposing TO_BASE64() and FROM_BASE64() to the SQL level,
    I did not want to touch the replication code.



            +  if (args[0]->max_length > (uint)
            base64_encode_max_arg_length()__)
            +  {
            +    maybe_null= 1;
            +    fix_char_length_ulonglong((__ulonglong)
            base64_encode_max_arg_length()__);
            +  }
            +  else
            +  {
            +    int length= base64_needed_encoded_length((__int)
            args[0]->max_length);
            +    DBUG_ASSERT(length > 0);


        Don't think assert is needed as the function gurantees it already.

            +    fix_char_length_ulonglong((__ulonglong) length - 1);
            +  }
            +}


        <cut>

            +String *Item_func_from_base64::val___str(String *str)
            +{
            +  String *res= args[0]->val_str_ascii(str);
            +  bool too_long= false;
            +  int length;
            +  const char *end_ptr;
            +
            +  if (!res ||
            +      res->length() > (uint) base64_decode_max_arg_length() ||
            +      (too_long=
            +       ((uint) (length=
            base64_needed_decoded_length((__int) res->length())) >
            +        current_thd->variables.max___allowed_packet)) ||
            +      tmp_value.alloc((uint) length) ||
            +      (length= base64_decode(res->ptr(), (int) res->length(),
            +                             (char *) tmp_value.ptr(),
            &end_ptr, 0)) < 0 ||
            +      end_ptr < res->ptr() + res->length())
            +  {


        Shouldn't we get a readable error or warning if the string
        contained wrong
        characters?
        Now it looks like we will just return null.

        Something like 'Malformed base64 string. Error at position %d" would
        be nice.


    Good idea. I have added a warning. Now it looks clear what went wrong
    if FROM_BASE64() returns NULL.





            +    null_value= 1; // NULL input, too long input, OOM, or
            badly formed input
            +    if (too_long)
            +    {
            +      push_warning_printf(current___thd,
            Sql_condition::WARN_LEVEL___WARN,
            +                          ER_WARN_ALLOWED_PACKET___OVERFLOWED,
            +
              ER(ER_WARN_ALLOWED_PACKET___OVERFLOWED), func_name(),
            +
              current_thd->variables.max___allowed_packet);
            +    }
            +    return 0;
            +  }
            +  tmp_value.length((uint) length);
            +  null_value= 0;
            +  return &tmp_value;
            +}


        <cut>

    <cut>


    _______________________________________________
    Mailing list: https://launchpad.net/~maria-developers
    Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
    <mailto:maria-developers@xxxxxxxxxxxxxxxxxxx>
    Unsubscribe : https://launchpad.net/~maria-developers
    More help   : https://help.launchpad.net/ListHelp




--
Roberto Spadim
SPAEmpresarial

Follow ups

Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-17

References

Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-12
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-17