maria-developers team mailing list archive

Thread
Date

Re: Review of base64.diff and german2.diff

To: Roberto Spadim <roberto@xxxxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Mon, 23 Sep 2013 19:05:32 +0400
Cc: "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAH3kUhHfH=uS4EROPmOu8_TndmJRCqLwBza9AtaR+7FZkDA_yg@mail.gmail.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6

Hi Roberto,

the patch has been pushed to the 10.0 tree:

lp:~maria-captains/maria/10.0


See here for more details about the 10.0 tree.
https://launchpad.net/maria/10.0

On 09/17/2013 07:42 PM, Roberto Spadim wrote:

nice =)
after this patch commited, could report where i could get it? i will try
to rewrite to base32 if possible :) and send at jira
thanks!


2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx <mailto:bar@xxxxxxxxxxx>>

    Hello Roberto,



    On 09/17/2013 06:38 PM, Roberto Spadim wrote:

        hi guys!
        one question, as a mariadb user...
        base64 will be exposed to sql layer? could i use TO_BASE64() and
        FROM_BASE64() at mysql client?
        there's a MDEV in JIRA about it, if you could attach it to this
        patch
        could be nice: MDEV-4387
        <https://mariadb.atlassian.__net/browse/MDEV-4387
        <https://mariadb.atlassian.net/browse/MDEV-4387>>


        and with time, implement base32 and base16: MDEV-4800
        <https://mariadb.atlassian.__net/browse/MDEV-4800
        <https://mariadb.atlassian.net/browse/MDEV-4800>>


        base64 is very usefull, and many php users use it to "send/receive"
        files from database
        base32 from what i know is usefull only for OTP and other
        password apps
        base16 is a hexadecimal function HEX() and UNHEX()


    Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level,
    so one case use for example things like this:

    INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))__;

    or

    SELECT TO_BASE64(column) FROM t1;


    There are no plans to implement base32.


        thanks guys


        2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx
        <mailto:bar@xxxxxxxxxxx> <mailto:bar@xxxxxxxxxxx
        <mailto:bar@xxxxxxxxxxx>>>


             Hi Monty,

             thanks for review.

             I have addressed most of your suggestions. See the new
        version attached,
             and the detailed comments inline:



             On 09/12/2013 04:32 PM, Michael Widenius wrote:


                 Hi!

                 Here is the review for the code that we should put into
        10.0

                 First the base64:

                     === modified file 'mysql-test/t/func_str.test'
                     --- mysql-test/t/func_str.test  2013-05-07 11:05:09
        +0000
                     +++ mysql-test/t/func_str.test  2013-08-28 13:14:24
        +0000
                     @@ -1555,3 +1555,118 @@ drop table t1,t2;
                        --echo # End of 5.5 tests
                        --echo #

                     +
                     +--echo #
                     +--echo # Start of 5.6 tests
                     +--echo #


                 Shouldn't this be start of 10.0 tests ?
                 (I know that this code is from MySQL 5.6, but still for
        us this
                 is 10.0...)


             This code is (almost) copy-and-paste from MySQL-5.6.
             I think it's a good idea when looking inside a test file
             to be able to see which tests are coming from MySQL-5.6,
             and which tests are coming from MariaDB-10.0.

             I'd suggest to keep "Start of 5.6 tests" in this particular
        case,
             and also when merging the tests for the other merged
        MySQL-5.6 features.




                     === modified file 'mysys/base64.c'
                     --- mysys/base64.c      2011-06-30 15:46:53 +0000
                     +++ mysys/base64.c      2013-03-09 06:22:59 +0000
                     @@ -1,5 +1,4 @@
                     -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun
        Microsystems,
                     Inc.
                     -   Use is subject to license terms.
                     +/* Copyright (c) 2003, 2010, Oracle and/or its
        affiliates.
                     All rights reserved.


                 Removed 'all rights reserved'. You can't have that for
        GPL code.

                 If there is any new code from us, please add a
        copyright message
                 for the
                 MariaDB foundation too!


             Removed 'all rights reserved' and added "MariaDB", as there
        are some
             our own changes.




                           This program is free software; you can
        redistribute it
                     and/or modify
                           it under the terms of the GNU General Public
        License
                     as published by
                     @@ -25,6 +24,28 @@ static char base64_table[] =
        "ABCDEFGHIJ

        "abcdefghijklmnopqrstuvwxyz"
                                                     "0123456789+/";

                     +/**
                     + * Maximum length base64_needed_encoded_length()
                     + * can handle without overflow.
                     + */
                     +int
                     +base64_encode_max_arg_length(____)

                     +{
                     +#if (SIZEOF_INT == 8)
                     +  /*
                     +    6827690988321067803 ->   9223372036854775805
                     +    6827690988321067804 ->  -9223372036854775807
                     +  */


                 Please describe from where the following number comes from
                 (so that anyone can recalculate and check the number if
        needed):


             I added more comments where these numbers come from.



                     +  return 0x5EC0D4C77B03531BLL
                     +#else
                     +  /*
                     +    1589695686 -> 2147483646 <tel:2147483646>
        <tel:2147483646 <tel:2147483646>>
                     +    1589695687 -> -2147483645 <tel:2147483645>
        <tel:2147483645 <tel:2147483645>>

                     +  */
                     +  return 0x5EC0D4C6;
                     +#endif
                     +}
                     +

                        int
                        base64_needed_encoded_length(____int length_of_data)
                     @@ -39,6 +60,21 @@
        base64_needed_encoded_length(____int length_

                        }


                     +/**
                     + * Maximum length base64_needed_decoded_length()
                     + * can handle without overflow.
                     + */
                     +int
                     +base64_decode_max_arg_length(____)

                     +{


                 The following also need a comment.


             Also added comments.



                     +#if (SIZEOF_INT == 8)
                     +  return 0x2AAAAAAAAAAAAAAALL;
                     +#else
                     +  return 0x2AAAAAAA;
                     +#endif
                     +}
                     +
                     +


                 <cut>

                     +/**
                     + * Get the next character from a base64 encoded
        stream.
                     + * Skip spaces, then scan the next base64
        character or a
                     pad character
                     + * and collect bits into decoder->c.
                     + *
                     + * @param  decoder  Pointer to MY_BASE64_DECODER
                     + * @return
                     + *  FALSE on success (a valid base64 encoding
        character found)
                     + *  TRUE  on error (unexpected character or unexpected
                     end-of-input found)
                     + */


                 Add an empty line here.


             Done.



                 It would also be good to have a good description of the
        base64
                 format
                 in this file so that one can understand what the
        functions are
                 supposed to do (or at least a link to the specification).

                 Some thing I figured out by asking bar on IRC:

                 - We never generate spaces
                 - When decoding, we allow spaces anywhere.
                 - The end of an 64 base  decoded string is always
        padded with '='
                     so that the final length is dividable with 4.


             I added a reference to
        http://en.wikipedia.org/wiki/____Base64
        <http://en.wikipedia.org/wiki/__Base64>
             <http://en.wikipedia.org/wiki/__Base64
        <http://en.wikipedia.org/wiki/Base64>>,

             as well all added these your comments in proper places of
        the code.




                     +static inline my_bool
                     +my_base64_decoder_getch(MY_____BASE64_DECODER
        *decoder)



                 Remove inline from the above; It's likely to make the code
                 slower, not
                 faster as this is a big function with lot of if's.


             Removed "inline".



                        {
                     -  char b[3];
                     -  size_t i= 0;
                     -  char *dst_base= (char *)dst;
                     -  char const *src= src_base;
                     -  char *d= dst_base;
                     -  size_t j;
                     +  if (my_base64_decoder_skip_____spaces(decoder))

                     +    return TRUE; /* End-of-input */

                     -  while (i < len)
                     +  if (!my_base64_add(decoder)) /* Valid base64
        character
                     found */
                          {
                     -    unsigned c= 0;
                     -    size_t mark= 0;
                     +    if (decoder->mark)
                     +    {
                     +      /* If we have scanned '=' already, then only
        '=' is
                     valid */
                     +      DBUG_ASSERT(decoder->state == 3);
                     +      decoder->error= 1;
                     +      decoder->src--;
                     +      return TRUE; /* expected '=', but encoding
        character
                     found */
                     +    }
                     +    decoder->state++;
                     +    return FALSE;
                     +  }


                 If you want to have the funtion inline, then move the
        following to
                 another not inline function;  We don't need to generate
        code for
                 this
                 for every function call:

                     +
                     +  /* Process error */
                     +  switch (decoder->state)
                     +  {
                     +  case 0:
                     +  case 1:
                     +    decoder->src--;
                     +    return TRUE; /* base64 character expected */
                     +    break;
                     +
                     +  case 2:
                     +  case 3:
                     +    if (decoder->src[-1] == '=')
                     +    {
                     +      decoder->error= 0; /* Not an error - it's a pad
                     character */
                     +      decoder->mark++;
                     +    }
                     +    else
                     +    {
                     +      decoder->src--;
                     +      return TRUE; /* base64 character or '='
        expected */
                     +    }
                     +    break;
                     +  default:
                     +    DBUG_ASSERT(0);
                     +    return TRUE; /* Wrong state, should not happen */
                     +  }


                 <cut>

                 Add to the comment for base64_decode that the 'dst'
        buffer has to be
                 at least 2 character longer than what is needed to
        decode src_base.


             I added this comment:

             * Note: 'dst' must have sufficient space to store the
        decoded data.
             * Use base64_needed_decoded_length() to calculate the
        correct space
             size.






                     +int
                     +base64_decode(const char *src_base, size_t len,
                     +              void *dst, const char **end_ptr, int
        flags)
                     +{
                     +    if (my_base64_decoder_getch(&____decoder) ||
                     +        my_base64_decoder_getch(&____decoder) ||
                     +        my_base64_decoder_getch(&____decoder) ||
                     +        my_base64_decoder_getch(&____decoder))

                     +      break;


                 Generating the error handling with a switch for every
        of the
                 above is wasteful. See previous comment.


             Right. my_base64_decoder_getch() is not inline anymore.



                     +  /* Return error if there are more non-space
        characters */
                     +  decoder.state= 0;
                     +  if (!my_base64_decoder_skip_____spaces(&decoder))

                     +    decoder.error= 1;
                     +
                          if (end_ptr != NULL)


                 Note that base64_needed_decoded_length() used ceil()
        when it's
                 not needed.


             Removed.




                 <cut>

                     === modified file 'sql/item_strfunc.cc'
                     @@ -451,6 +452,101 @@ void
        Item_func_aes_decrypt::fix_____length_a
                           set_persist_maybe_null(1);
                        }

                     +
                     +void Item_func_to_base64::fix_____length_and_dec()

                     +{
                     +  maybe_null= args[0]->maybe_null;
                     +  collation.set(default_charset(____),
        DERIVATION_COERCIBLE,

                     MY_REPERTOIRE_ASCII);


                 Maybe better to cast both arguments to (ulonglong) as
        the rest
                 of the code
                 is depending on this.

                 Also, why is base64_encode_max_arg_length() int instead
        of uint or
                 even better size_t?


             This is because base64_encode() and base64_decode() are used
             in the replication code using "int" as return value.

             When exposing TO_BASE64() and FROM_BASE64() to the SQL level,
             I did not want to touch the replication code.



                     +  if (args[0]->max_length > (uint)
                     base64_encode_max_arg_length()____)
                     +  {
                     +    maybe_null= 1;
                     +    fix_char_length_ulonglong((____ulonglong)
                     base64_encode_max_arg_length()____);
                     +  }
                     +  else
                     +  {
                     +    int length= base64_needed_encoded_length((____int)

                     args[0]->max_length);
                     +    DBUG_ASSERT(length > 0);


                 Don't think assert is needed as the function gurantees
        it already.

                     +    fix_char_length_ulonglong((____ulonglong)
        length - 1);
                     +  }
                     +}


                 <cut>

                     +String *Item_func_from_base64::val_____str(String
        *str)

                     +{
                     +  String *res= args[0]->val_str_ascii(str);
                     +  bool too_long= false;
                     +  int length;
                     +  const char *end_ptr;
                     +
                     +  if (!res ||
                     +      res->length() > (uint)
        base64_decode_max_arg_length() ||
                     +      (too_long=
                     +       ((uint) (length=
                     base64_needed_decoded_length((____int)
        res->length())) >
                     +
          current_thd->variables.max_____allowed_packet)) ||

                     +      tmp_value.alloc((uint) length) ||
                     +      (length= base64_decode(res->ptr(), (int)
        res->length(),
                     +                             (char *) tmp_value.ptr(),
                     &end_ptr, 0)) < 0 ||
                     +      end_ptr < res->ptr() + res->length())
                     +  {


                 Shouldn't we get a readable error or warning if the string
                 contained wrong
                 characters?
                 Now it looks like we will just return null.

                 Something like 'Malformed base64 string. Error at
        position %d" would
                 be nice.


             Good idea. I have added a warning. Now it looks clear what
        went wrong
             if FROM_BASE64() returns NULL.





                     +    null_value= 1; // NULL input, too long input,
        OOM, or
                     badly formed input
                     +    if (too_long)
                     +    {
                     +      push_warning_printf(current_____thd,
                     Sql_condition::WARN_LEVEL_____WARN,
                     +
          ER_WARN_ALLOWED_PACKET_____OVERFLOWED,
                     +
                       ER(ER_WARN_ALLOWED_PACKET_____OVERFLOWED),
        func_name(),
                     +
                       current_thd->variables.max_____allowed_packet);

                     +    }
                     +    return 0;
                     +  }
                     +  tmp_value.length((uint) length);
                     +  null_value= 0;
                     +  return &tmp_value;
                     +}


                 <cut>

             <cut>


             _________________________________________________
             Mailing list: https://launchpad.net/~maria-__developers
        <https://launchpad.net/~maria-developers>
             Post to     : maria-developers@lists.__launchpad.net
        <mailto:maria-developers@xxxxxxxxxxxxxxxxxxx>
             <mailto:maria-developers@__lists.launchpad.net
        <mailto:maria-developers@xxxxxxxxxxxxxxxxxxx>>

             Unsubscribe : https://launchpad.net/~maria-__developers
        <https://launchpad.net/~maria-developers>
             More help   : https://help.launchpad.net/__ListHelp
        <https://help.launchpad.net/ListHelp>




        --
        Roberto Spadim
        SPAEmpresarial




--
Roberto Spadim
SPAEmpresarial

Follow ups

Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-23

References

Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-12
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-17