maria-developers team mailing list archive

Thread
Date
Re: Review of base64.diff and german2.diff

To: Alexander Barkov <bar@xxxxxxxxxxx>
From: Roberto Spadim <roberto@xxxxxxxxxxxxx>
Date: Mon, 23 Sep 2013 13:26:45 -0300
Cc: "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAH3kUhEKTeEKAnsGQBAs3G-E4p4On0CORFFtErHXs5XGgRrzNQ@mail.gmail.com>
Just one information for everybody it's a veeeeery verrrrry old bug at
mysql reported by me :)

http://bugs.mysql.com/bug.php?id=18861
Add base64_encode, base64_decode functions
6 Apr 2006 18:00

thanks mariadb team! =) 7 years old bug resolved =)



2013/9/23 Roberto Spadim <roberto@xxxxxxxxxxxxx>

> thanks alexander! :D i will read and test it as soon as possible :)
>
>
> 2013/9/23 Alexander Barkov <bar@xxxxxxxxxxx>
>
>> Hi Roberto,
>>
>> the patch has been pushed to the 10.0 tree:
>>
>> lp:~maria-captains/maria/10.0
>>
>>
>> See here for more details about the 10.0 tree.
>> https://launchpad.net/maria/**10.0 <https://launchpad.net/maria/10.0>
>>
>>
>> On 09/17/2013 07:42 PM, Roberto Spadim wrote:
>>
>>> nice =)
>>> after this patch commited, could report where i could get it? i will try
>>> to rewrite to base32 if possible :) and send at jira
>>> thanks!
>>>
>>>
>>> 2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx <mailto:bar@xxxxxxxxxxx>>
>>>
>>>
>>>     Hello Roberto,
>>>
>>>
>>>
>>>     On 09/17/2013 06:38 PM, Roberto Spadim wrote:
>>>
>>>         hi guys!
>>>         one question, as a mariadb user...
>>>         base64 will be exposed to sql layer? could i use TO_BASE64() and
>>>         FROM_BASE64() at mysql client?
>>>         there's a MDEV in JIRA about it, if you could attach it to this
>>>         patch
>>>         could be nice: MDEV-4387
>>>         <https://mariadb.atlassian.__**net/browse/MDEV-4387
>>>
>>>         <https://mariadb.atlassian.**net/browse/MDEV-4387<https://mariadb.atlassian.net/browse/MDEV-4387>
>>> >>
>>>
>>>
>>>         and with time, implement base32 and base16: MDEV-4800
>>>         <https://mariadb.atlassian.__**net/browse/MDEV-4800
>>>
>>>         <https://mariadb.atlassian.**net/browse/MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800>
>>> >>
>>>
>>>
>>>         base64 is very usefull, and many php users use it to
>>> "send/receive"
>>>         files from database
>>>         base32 from what i know is usefull only for OTP and other
>>>         password apps
>>>         base16 is a hexadecimal function HEX() and UNHEX()
>>>
>>>
>>>     Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level,
>>>     so one case use for example things like this:
>>>
>>>     INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))**__;
>>>
>>>
>>>     or
>>>
>>>     SELECT TO_BASE64(column) FROM t1;
>>>
>>>
>>>     There are no plans to implement base32.
>>>
>>>
>>>         thanks guys
>>>
>>>
>>>         2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx
>>>         <mailto:bar@xxxxxxxxxxx> <mailto:bar@xxxxxxxxxxx
>>>
>>>         <mailto:bar@xxxxxxxxxxx>>>
>>>
>>>
>>>              Hi Monty,
>>>
>>>              thanks for review.
>>>
>>>              I have addressed most of your suggestions. See the new
>>>         version attached,
>>>              and the detailed comments inline:
>>>
>>>
>>>
>>>              On 09/12/2013 04:32 PM, Michael Widenius wrote:
>>>
>>>
>>>                  Hi!
>>>
>>>                  Here is the review for the code that we should put into
>>>         10.0
>>>
>>>                  First the base64:
>>>
>>>                      === modified file 'mysql-test/t/func_str.test'
>>>                      --- mysql-test/t/func_str.test  2013-05-07 11:05:09
>>>         +0000
>>>                      +++ mysql-test/t/func_str.test  2013-08-28 13:14:24
>>>         +0000
>>>                      @@ -1555,3 +1555,118 @@ drop table t1,t2;
>>>                         --echo # End of 5.5 tests
>>>                         --echo #
>>>
>>>                      +
>>>                      +--echo #
>>>                      +--echo # Start of 5.6 tests
>>>                      +--echo #
>>>
>>>
>>>                  Shouldn't this be start of 10.0 tests ?
>>>                  (I know that this code is from MySQL 5.6, but still for
>>>         us this
>>>                  is 10.0...)
>>>
>>>
>>>              This code is (almost) copy-and-paste from MySQL-5.6.
>>>              I think it's a good idea when looking inside a test file
>>>              to be able to see which tests are coming from MySQL-5.6,
>>>              and which tests are coming from MariaDB-10.0.
>>>
>>>              I'd suggest to keep "Start of 5.6 tests" in this particular
>>>         case,
>>>              and also when merging the tests for the other merged
>>>         MySQL-5.6 features.
>>>
>>>
>>>
>>>
>>>                      === modified file 'mysys/base64.c'
>>>                      --- mysys/base64.c      2011-06-30 15:46:53 +0000
>>>                      +++ mysys/base64.c      2013-03-09 06:22:59 +0000
>>>                      @@ -1,5 +1,4 @@
>>>                      -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun
>>>         Microsystems,
>>>                      Inc.
>>>                      -   Use is subject to license terms.
>>>                      +/* Copyright (c) 2003, 2010, Oracle and/or its
>>>         affiliates.
>>>                      All rights reserved.
>>>
>>>
>>>                  Removed 'all rights reserved'. You can't have that for
>>>         GPL code.
>>>
>>>                  If there is any new code from us, please add a
>>>         copyright message
>>>                  for the
>>>                  MariaDB foundation too!
>>>
>>>
>>>              Removed 'all rights reserved' and added "MariaDB", as there
>>>         are some
>>>              our own changes.
>>>
>>>
>>>
>>>
>>>                            This program is free software; you can
>>>         redistribute it
>>>                      and/or modify
>>>                            it under the terms of the GNU General Public
>>>         License
>>>                      as published by
>>>                      @@ -25,6 +24,28 @@ static char base64_table[] =
>>>         "ABCDEFGHIJ
>>>
>>>         "abcdefghijklmnopqrstuvwxyz"
>>>                                                      "0123456789+/";
>>>
>>>                      +/**
>>>                      + * Maximum length base64_needed_encoded_length()
>>>                      + * can handle without overflow.
>>>                      + */
>>>                      +int
>>>                      +base64_encode_max_arg_length(**____)
>>>
>>>
>>>                      +{
>>>                      +#if (SIZEOF_INT == 8)
>>>                      +  /*
>>>                      +    6827690988321067803 ->   9223372036854775805
>>>                      +    6827690988321067804 ->  -9223372036854775807
>>>                      +  */
>>>
>>>
>>>                  Please describe from where the following number comes
>>> from
>>>                  (so that anyone can recalculate and check the number if
>>>         needed):
>>>
>>>
>>>              I added more comments where these numbers come from.
>>>
>>>
>>>
>>>                      +  return 0x5EC0D4C77B03531BLL
>>>                      +#else
>>>                      +  /*
>>>                      +    1589695686 -> 2147483646 <tel:2147483646>
>>>         <tel:2147483646 <tel:2147483646>>
>>>
>>>                      +    1589695687 -> -2147483645 <tel:2147483645>
>>>         <tel:2147483645 <tel:2147483645>>
>>>
>>>
>>>                      +  */
>>>                      +  return 0x5EC0D4C6;
>>>                      +#endif
>>>                      +}
>>>                      +
>>>
>>>                         int
>>>                         base64_needed_encoded_length(_**___int
>>> length_of_data)
>>>                      @@ -39,6 +60,21 @@
>>>         base64_needed_encoded_length(_**___int length_
>>>
>>>
>>>                         }
>>>
>>>
>>>                      +/**
>>>                      + * Maximum length base64_needed_decoded_length()
>>>                      + * can handle without overflow.
>>>                      + */
>>>                      +int
>>>                      +base64_decode_max_arg_length(**____)
>>>
>>>
>>>                      +{
>>>
>>>
>>>                  The following also need a comment.
>>>
>>>
>>>              Also added comments.
>>>
>>>
>>>
>>>                      +#if (SIZEOF_INT == 8)
>>>                      +  return 0x2AAAAAAAAAAAAAAALL;
>>>                      +#else
>>>                      +  return 0x2AAAAAAA;
>>>                      +#endif
>>>                      +}
>>>                      +
>>>                      +
>>>
>>>
>>>                  <cut>
>>>
>>>                      +/**
>>>                      + * Get the next character from a base64 encoded
>>>         stream.
>>>                      + * Skip spaces, then scan the next base64
>>>         character or a
>>>                      pad character
>>>                      + * and collect bits into decoder->c.
>>>                      + *
>>>                      + * @param  decoder  Pointer to MY_BASE64_DECODER
>>>                      + * @return
>>>                      + *  FALSE on success (a valid base64 encoding
>>>         character found)
>>>                      + *  TRUE  on error (unexpected character or
>>> unexpected
>>>                      end-of-input found)
>>>                      + */
>>>
>>>
>>>                  Add an empty line here.
>>>
>>>
>>>              Done.
>>>
>>>
>>>
>>>                  It would also be good to have a good description of the
>>>         base64
>>>                  format
>>>                  in this file so that one can understand what the
>>>         functions are
>>>                  supposed to do (or at least a link to the
>>> specification).
>>>
>>>                  Some thing I figured out by asking bar on IRC:
>>>
>>>                  - We never generate spaces
>>>                  - When decoding, we allow spaces anywhere.
>>>                  - The end of an 64 base  decoded string is always
>>>         padded with '='
>>>                      so that the final length is dividable with 4.
>>>
>>>
>>>              I added a reference to
>>>         http://en.wikipedia.org/wiki/_**___Base64<http://en.wikipedia.org/wiki/____Base64>
>>>         <http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64>
>>> >
>>>
>>>              <http://en.wikipedia.org/wiki/**__Base64<http://en.wikipedia.org/wiki/__Base64>
>>>         <http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64>
>>> >>,
>>>
>>>              as well all added these your comments in proper places of
>>>         the code.
>>>
>>>
>>>
>>>
>>>                      +static inline my_bool
>>>                      +my_base64_decoder_getch(MY___**__BASE64_DECODER
>>>
>>>         *decoder)
>>>
>>>
>>>
>>>                  Remove inline from the above; It's likely to make the
>>> code
>>>                  slower, not
>>>                  faster as this is a big function with lot of if's.
>>>
>>>
>>>              Removed "inline".
>>>
>>>
>>>
>>>                         {
>>>                      -  char b[3];
>>>                      -  size_t i= 0;
>>>                      -  char *dst_base= (char *)dst;
>>>                      -  char const *src= src_base;
>>>                      -  char *d= dst_base;
>>>                      -  size_t j;
>>>                      +  if (my_base64_decoder_skip_____**
>>> spaces(decoder))
>>>
>>>
>>>                      +    return TRUE; /* End-of-input */
>>>
>>>                      -  while (i < len)
>>>                      +  if (!my_base64_add(decoder)) /* Valid base64
>>>         character
>>>                      found */
>>>                           {
>>>                      -    unsigned c= 0;
>>>                      -    size_t mark= 0;
>>>                      +    if (decoder->mark)
>>>                      +    {
>>>                      +      /* If we have scanned '=' already, then only
>>>         '=' is
>>>                      valid */
>>>                      +      DBUG_ASSERT(decoder->state == 3);
>>>                      +      decoder->error= 1;
>>>                      +      decoder->src--;
>>>                      +      return TRUE; /* expected '=', but encoding
>>>         character
>>>                      found */
>>>                      +    }
>>>                      +    decoder->state++;
>>>                      +    return FALSE;
>>>                      +  }
>>>
>>>
>>>                  If you want to have the funtion inline, then move the
>>>         following to
>>>                  another not inline function;  We don't need to generate
>>>         code for
>>>                  this
>>>                  for every function call:
>>>
>>>                      +
>>>                      +  /* Process error */
>>>                      +  switch (decoder->state)
>>>                      +  {
>>>                      +  case 0:
>>>                      +  case 1:
>>>                      +    decoder->src--;
>>>                      +    return TRUE; /* base64 character expected */
>>>                      +    break;
>>>                      +
>>>                      +  case 2:
>>>                      +  case 3:
>>>                      +    if (decoder->src[-1] == '=')
>>>                      +    {
>>>                      +      decoder->error= 0; /* Not an error - it's a
>>> pad
>>>                      character */
>>>                      +      decoder->mark++;
>>>                      +    }
>>>                      +    else
>>>                      +    {
>>>                      +      decoder->src--;
>>>                      +      return TRUE; /* base64 character or '='
>>>         expected */
>>>                      +    }
>>>                      +    break;
>>>                      +  default:
>>>                      +    DBUG_ASSERT(0);
>>>                      +    return TRUE; /* Wrong state, should not happen
>>> */
>>>                      +  }
>>>
>>>
>>>                  <cut>
>>>
>>>                  Add to the comment for base64_decode that the 'dst'
>>>         buffer has to be
>>>                  at least 2 character longer than what is needed to
>>>         decode src_base.
>>>
>>>
>>>              I added this comment:
>>>
>>>              * Note: 'dst' must have sufficient space to store the
>>>         decoded data.
>>>              * Use base64_needed_decoded_length() to calculate the
>>>         correct space
>>>              size.
>>>
>>>
>>>
>>>
>>>
>>>
>>>                      +int
>>>                      +base64_decode(const char *src_base, size_t len,
>>>                      +              void *dst, const char **end_ptr, int
>>>         flags)
>>>                      +{
>>>                      +    if (my_base64_decoder_getch(&____**decoder) ||
>>>                      +        my_base64_decoder_getch(&____**decoder) ||
>>>                      +        my_base64_decoder_getch(&____**decoder) ||
>>>                      +        my_base64_decoder_getch(&____**decoder))
>>>
>>>
>>>                      +      break;
>>>
>>>
>>>                  Generating the error handling with a switch for every
>>>         of the
>>>                  above is wasteful. See previous comment.
>>>
>>>
>>>              Right. my_base64_decoder_getch() is not inline anymore.
>>>
>>>
>>>
>>>                      +  /* Return error if there are more non-space
>>>         characters */
>>>                      +  decoder.state= 0;
>>>                      +  if (!my_base64_decoder_skip_____**
>>> spaces(&decoder))
>>>
>>>
>>>                      +    decoder.error= 1;
>>>                      +
>>>                           if (end_ptr != NULL)
>>>
>>>
>>>                  Note that base64_needed_decoded_length() used ceil()
>>>         when it's
>>>                  not needed.
>>>
>>>
>>>              Removed.
>>>
>>>
>>>
>>>
>>>                  <cut>
>>>
>>>                      === modified file 'sql/item_strfunc.cc'
>>>                      @@ -451,6 +452,101 @@ void
>>>         Item_func_aes_decrypt::fix____**_length_a
>>>                            set_persist_maybe_null(1);
>>>                         }
>>>
>>>                      +
>>>                      +void Item_func_to_base64::fix_____**
>>> length_and_dec()
>>>
>>>
>>>                      +{
>>>                      +  maybe_null= args[0]->maybe_null;
>>>                      +  collation.set(default_charset(**____),
>>>
>>>         DERIVATION_COERCIBLE,
>>>
>>>                      MY_REPERTOIRE_ASCII);
>>>
>>>
>>>                  Maybe better to cast both arguments to (ulonglong) as
>>>         the rest
>>>                  of the code
>>>                  is depending on this.
>>>
>>>                  Also, why is base64_encode_max_arg_length() int instead
>>>         of uint or
>>>                  even better size_t?
>>>
>>>
>>>              This is because base64_encode() and base64_decode() are used
>>>              in the replication code using "int" as return value.
>>>
>>>              When exposing TO_BASE64() and FROM_BASE64() to the SQL
>>> level,
>>>              I did not want to touch the replication code.
>>>
>>>
>>>
>>>                      +  if (args[0]->max_length > (uint)
>>>                      base64_encode_max_arg_length()**____)
>>>                      +  {
>>>                      +    maybe_null= 1;
>>>                      +    fix_char_length_ulonglong((___**_ulonglong)
>>>                      base64_encode_max_arg_length()**____);
>>>                      +  }
>>>                      +  else
>>>                      +  {
>>>                      +    int length= base64_needed_encoded_length((**
>>> ____int)
>>>
>>>
>>>                      args[0]->max_length);
>>>                      +    DBUG_ASSERT(length > 0);
>>>
>>>
>>>                  Don't think assert is needed as the function gurantees
>>>         it already.
>>>
>>>                      +    fix_char_length_ulonglong((___**_ulonglong)
>>>
>>>         length - 1);
>>>                      +  }
>>>                      +}
>>>
>>>
>>>                  <cut>
>>>
>>>                      +String *Item_func_from_base64::val___**
>>> __str(String
>>>
>>>         *str)
>>>
>>>                      +{
>>>                      +  String *res= args[0]->val_str_ascii(str);
>>>                      +  bool too_long= false;
>>>                      +  int length;
>>>                      +  const char *end_ptr;
>>>                      +
>>>                      +  if (!res ||
>>>                      +      res->length() > (uint)
>>>         base64_decode_max_arg_length() ||
>>>                      +      (too_long=
>>>                      +       ((uint) (length=
>>>                      base64_needed_decoded_length((**____int)
>>>         res->length())) >
>>>                      +
>>>           current_thd->variables.max____**_allowed_packet)) ||
>>>
>>>
>>>                      +      tmp_value.alloc((uint) length) ||
>>>                      +      (length= base64_decode(res->ptr(), (int)
>>>         res->length(),
>>>                      +                             (char *)
>>> tmp_value.ptr(),
>>>                      &end_ptr, 0)) < 0 ||
>>>                      +      end_ptr < res->ptr() + res->length())
>>>                      +  {
>>>
>>>
>>>                  Shouldn't we get a readable error or warning if the
>>> string
>>>                  contained wrong
>>>                  characters?
>>>                  Now it looks like we will just return null.
>>>
>>>                  Something like 'Malformed base64 string. Error at
>>>         position %d" would
>>>                  be nice.
>>>
>>>
>>>              Good idea. I have added a warning. Now it looks clear what
>>>         went wrong
>>>              if FROM_BASE64() returns NULL.
>>>
>>>
>>>
>>>
>>>
>>>                      +    null_value= 1; // NULL input, too long input,
>>>         OOM, or
>>>                      badly formed input
>>>                      +    if (too_long)
>>>                      +    {
>>>                      +      push_warning_printf(current___**__thd,
>>>                      Sql_condition::WARN_LEVEL_____**WARN,
>>>                      +
>>>           ER_WARN_ALLOWED_PACKET_____**OVERFLOWED,
>>>                      +
>>>                        ER(ER_WARN_ALLOWED_PACKET_____**OVERFLOWED),
>>>         func_name(),
>>>                      +
>>>                        current_thd->variables.max____**_allowed_packet);
>>>
>>>
>>>                      +    }
>>>                      +    return 0;
>>>                      +  }
>>>                      +  tmp_value.length((uint) length);
>>>                      +  null_value= 0;
>>>                      +  return &tmp_value;
>>>                      +}
>>>
>>>
>>>                  <cut>
>>>
>>>              <cut>
>>>
>>>
>>>              ______________________________**___________________
>>>              Mailing list: https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers>
>>>         <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
>>> >
>>>              Post to     : maria-developers@lists.__launc**hpad.net<http://launchpad.net>
>>>         <mailto:maria-developers@**lists.launchpad.net<maria-developers@xxxxxxxxxxxxxxxxxxx>
>>> >
>>>              <mailto:maria-developers@__lis**ts.launchpad.net<http://lists.launchpad.net>
>>>         <mailto:maria-developers@**lists.launchpad.net<maria-developers@xxxxxxxxxxxxxxxxxxx>
>>> >>
>>>
>>>              Unsubscribe : https://launchpad.net/~maria-_**_developers<https://launchpad.net/~maria-__developers>
>>>         <https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
>>> >
>>>              More help   : https://help.launchpad.net/__**ListHelp<https://help.launchpad.net/__ListHelp>
>>>
>>>         <https://help.launchpad.net/**ListHelp<https://help.launchpad.net/ListHelp>
>>> >
>>>
>>>
>>>
>>>
>>>         --
>>>         Roberto Spadim
>>>         SPAEmpresarial
>>>
>>>
>>>
>>>
>>> --
>>> Roberto Spadim
>>> SPAEmpresarial
>>>
>>
>
>
> --
> Roberto Spadim
> SPAEmpresarial
>



-- 
Roberto Spadim
SPAEmpresarial
Follow ups

Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-23
The Usability of Free/Libre/Open Source Projects, A Review
From: Jan Lindström, 2013-09-23
References

Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-12
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-23
Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-23