maria-developers team mailing list archive

Thread
Date
Re: Review of base64.diff and german2.diff

To: Alexander Barkov <bar@xxxxxxxxxxx>
From: Roberto Spadim <roberto@xxxxxxxxxxxxx>
Date: Tue, 17 Sep 2013 12:42:45 -0300
Cc: "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <52387765.7010507@mariadb.org>
nice =)
after this patch commited, could report where i could get it? i will try to
rewrite to base32 if possible :) and send at jira
thanks!


2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx>

> Hello Roberto,
>
>
>
> On 09/17/2013 06:38 PM, Roberto Spadim wrote:
>
>> hi guys!
>> one question, as a mariadb user...
>> base64 will be exposed to sql layer? could i use TO_BASE64() and
>> FROM_BASE64() at mysql client?
>> there's a MDEV in JIRA about it, if you could attach it to this patch
>> could be nice: MDEV-4387 <https://mariadb.atlassian.**
>> net/browse/MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387>>
>>
>>
>> and with time, implement base32 and base16: MDEV-4800
>> <https://mariadb.atlassian.**net/browse/MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800>
>> >
>>
>>
>> base64 is very usefull, and many php users use it to "send/receive"
>> files from database
>> base32 from what i know is usefull only for OTP and other password apps
>> base16 is a hexadecimal function HEX() and UNHEX()
>>
>
> Yes, TO_BASE64() and FROM_BASE64() will be exposed to the SQL level,
> so one case use for example things like this:
>
> INSERT INTO t1 VALUES (FROM_BASE64('base64-string'))**;
>
> or
>
> SELECT TO_BASE64(column) FROM t1;
>
>
> There are no plans to implement base32.
>
>
>> thanks guys
>>
>>
>> 2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx <mailto:bar@xxxxxxxxxxx>>
>>
>>
>>     Hi Monty,
>>
>>     thanks for review.
>>
>>     I have addressed most of your suggestions. See the new version
>> attached,
>>     and the detailed comments inline:
>>
>>
>>
>>     On 09/12/2013 04:32 PM, Michael Widenius wrote:
>>
>>
>>         Hi!
>>
>>         Here is the review for the code that we should put into 10.0
>>
>>         First the base64:
>>
>>             === modified file 'mysql-test/t/func_str.test'
>>             --- mysql-test/t/func_str.test  2013-05-07 11:05:09 +0000
>>             +++ mysql-test/t/func_str.test  2013-08-28 13:14:24 +0000
>>             @@ -1555,3 +1555,118 @@ drop table t1,t2;
>>                --echo # End of 5.5 tests
>>                --echo #
>>
>>             +
>>             +--echo #
>>             +--echo # Start of 5.6 tests
>>             +--echo #
>>
>>
>>         Shouldn't this be start of 10.0 tests ?
>>         (I know that this code is from MySQL 5.6, but still for us this
>>         is 10.0...)
>>
>>
>>     This code is (almost) copy-and-paste from MySQL-5.6.
>>     I think it's a good idea when looking inside a test file
>>     to be able to see which tests are coming from MySQL-5.6,
>>     and which tests are coming from MariaDB-10.0.
>>
>>     I'd suggest to keep "Start of 5.6 tests" in this particular case,
>>     and also when merging the tests for the other merged MySQL-5.6
>> features.
>>
>>
>>
>>
>>             === modified file 'mysys/base64.c'
>>             --- mysys/base64.c      2011-06-30 15:46:53 +0000
>>             +++ mysys/base64.c      2013-03-09 06:22:59 +0000
>>             @@ -1,5 +1,4 @@
>>             -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems,
>>             Inc.
>>             -   Use is subject to license terms.
>>             +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates.
>>             All rights reserved.
>>
>>
>>         Removed 'all rights reserved'. You can't have that for GPL code.
>>
>>         If there is any new code from us, please add a copyright message
>>         for the
>>         MariaDB foundation too!
>>
>>
>>     Removed 'all rights reserved' and added "MariaDB", as there are some
>>     our own changes.
>>
>>
>>
>>
>>                   This program is free software; you can redistribute it
>>             and/or modify
>>                   it under the terms of the GNU General Public License
>>             as published by
>>             @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
>>                                             "abcdefghijklmnopqrstuvwxyz"
>>                                             "0123456789+/";
>>
>>             +/**
>>             + * Maximum length base64_needed_encoded_length()
>>             + * can handle without overflow.
>>             + */
>>             +int
>>             +base64_encode_max_arg_length(**__)
>>
>>             +{
>>             +#if (SIZEOF_INT == 8)
>>             +  /*
>>             +    6827690988321067803 ->   9223372036854775805
>>             +    6827690988321067804 ->  -9223372036854775807
>>             +  */
>>
>>
>>         Please describe from where the following number comes from
>>         (so that anyone can recalculate and check the number if needed):
>>
>>
>>     I added more comments where these numbers come from.
>>
>>
>>
>>             +  return 0x5EC0D4C77B03531BLL
>>             +#else
>>             +  /*
>>             +    1589695686 -> 2147483646 <tel:2147483646>
>>             +    1589695687 -> -2147483645 <tel:2147483645>
>>
>>             +  */
>>             +  return 0x5EC0D4C6;
>>             +#endif
>>             +}
>>             +
>>
>>                int
>>                base64_needed_encoded_length(_**_int length_of_data)
>>             @@ -39,6 +60,21 @@ base64_needed_encoded_length(_**_int
>> length_
>>
>>                }
>>
>>
>>             +/**
>>             + * Maximum length base64_needed_decoded_length()
>>             + * can handle without overflow.
>>             + */
>>             +int
>>             +base64_decode_max_arg_length(**__)
>>
>>             +{
>>
>>
>>         The following also need a comment.
>>
>>
>>     Also added comments.
>>
>>
>>
>>             +#if (SIZEOF_INT == 8)
>>             +  return 0x2AAAAAAAAAAAAAAALL;
>>             +#else
>>             +  return 0x2AAAAAAA;
>>             +#endif
>>             +}
>>             +
>>             +
>>
>>
>>         <cut>
>>
>>             +/**
>>             + * Get the next character from a base64 encoded stream.
>>             + * Skip spaces, then scan the next base64 character or a
>>             pad character
>>             + * and collect bits into decoder->c.
>>             + *
>>             + * @param  decoder  Pointer to MY_BASE64_DECODER
>>             + * @return
>>             + *  FALSE on success (a valid base64 encoding character
>> found)
>>             + *  TRUE  on error (unexpected character or unexpected
>>             end-of-input found)
>>             + */
>>
>>
>>         Add an empty line here.
>>
>>
>>     Done.
>>
>>
>>
>>         It would also be good to have a good description of the base64
>>         format
>>         in this file so that one can understand what the functions are
>>         supposed to do (or at least a link to the specification).
>>
>>         Some thing I figured out by asking bar on IRC:
>>
>>         - We never generate spaces
>>         - When decoding, we allow spaces anywhere.
>>         - The end of an 64 base  decoded string is always padded with '='
>>             so that the final length is dividable with 4.
>>
>>
>>     I added a reference to http://en.wikipedia.org/wiki/_**_Base64<http://en.wikipedia.org/wiki/__Base64>
>>     <http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64>
>> >,
>>
>>     as well all added these your comments in proper places of the code.
>>
>>
>>
>>
>>             +static inline my_bool
>>             +my_base64_decoder_getch(MY___**BASE64_DECODER *decoder)
>>
>>
>>
>>         Remove inline from the above; It's likely to make the code
>>         slower, not
>>         faster as this is a big function with lot of if's.
>>
>>
>>     Removed "inline".
>>
>>
>>
>>                {
>>             -  char b[3];
>>             -  size_t i= 0;
>>             -  char *dst_base= (char *)dst;
>>             -  char const *src= src_base;
>>             -  char *d= dst_base;
>>             -  size_t j;
>>             +  if (my_base64_decoder_skip___**spaces(decoder))
>>
>>             +    return TRUE; /* End-of-input */
>>
>>             -  while (i < len)
>>             +  if (!my_base64_add(decoder)) /* Valid base64 character
>>             found */
>>                  {
>>             -    unsigned c= 0;
>>             -    size_t mark= 0;
>>             +    if (decoder->mark)
>>             +    {
>>             +      /* If we have scanned '=' already, then only '=' is
>>             valid */
>>             +      DBUG_ASSERT(decoder->state == 3);
>>             +      decoder->error= 1;
>>             +      decoder->src--;
>>             +      return TRUE; /* expected '=', but encoding character
>>             found */
>>             +    }
>>             +    decoder->state++;
>>             +    return FALSE;
>>             +  }
>>
>>
>>         If you want to have the funtion inline, then move the following to
>>         another not inline function;  We don't need to generate code for
>>         this
>>         for every function call:
>>
>>             +
>>             +  /* Process error */
>>             +  switch (decoder->state)
>>             +  {
>>             +  case 0:
>>             +  case 1:
>>             +    decoder->src--;
>>             +    return TRUE; /* base64 character expected */
>>             +    break;
>>             +
>>             +  case 2:
>>             +  case 3:
>>             +    if (decoder->src[-1] == '=')
>>             +    {
>>             +      decoder->error= 0; /* Not an error - it's a pad
>>             character */
>>             +      decoder->mark++;
>>             +    }
>>             +    else
>>             +    {
>>             +      decoder->src--;
>>             +      return TRUE; /* base64 character or '=' expected */
>>             +    }
>>             +    break;
>>             +  default:
>>             +    DBUG_ASSERT(0);
>>             +    return TRUE; /* Wrong state, should not happen */
>>             +  }
>>
>>
>>         <cut>
>>
>>         Add to the comment for base64_decode that the 'dst' buffer has to
>> be
>>         at least 2 character longer than what is needed to decode
>> src_base.
>>
>>
>>     I added this comment:
>>
>>     * Note: 'dst' must have sufficient space to store the decoded data.
>>     * Use base64_needed_decoded_length() to calculate the correct space
>>     size.
>>
>>
>>
>>
>>
>>
>>             +int
>>             +base64_decode(const char *src_base, size_t len,
>>             +              void *dst, const char **end_ptr, int flags)
>>             +{
>>             +    if (my_base64_decoder_getch(&__**decoder) ||
>>             +        my_base64_decoder_getch(&__**decoder) ||
>>             +        my_base64_decoder_getch(&__**decoder) ||
>>             +        my_base64_decoder_getch(&__**decoder))
>>
>>             +      break;
>>
>>
>>         Generating the error handling with a switch for every of the
>>         above is wasteful. See previous comment.
>>
>>
>>     Right. my_base64_decoder_getch() is not inline anymore.
>>
>>
>>
>>             +  /* Return error if there are more non-space characters */
>>             +  decoder.state= 0;
>>             +  if (!my_base64_decoder_skip___**spaces(&decoder))
>>
>>             +    decoder.error= 1;
>>             +
>>                  if (end_ptr != NULL)
>>
>>
>>         Note that base64_needed_decoded_length() used ceil() when it's
>>         not needed.
>>
>>
>>     Removed.
>>
>>
>>
>>
>>         <cut>
>>
>>             === modified file 'sql/item_strfunc.cc'
>>             @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix___**
>> length_a
>>                   set_persist_maybe_null(1);
>>                }
>>
>>             +
>>             +void Item_func_to_base64::fix___**length_and_dec()
>>
>>             +{
>>             +  maybe_null= args[0]->maybe_null;
>>             +  collation.set(default_charset(**__), DERIVATION_COERCIBLE,
>>
>>             MY_REPERTOIRE_ASCII);
>>
>>
>>         Maybe better to cast both arguments to (ulonglong) as the rest
>>         of the code
>>         is depending on this.
>>
>>         Also, why is base64_encode_max_arg_length() int instead of uint or
>>         even better size_t?
>>
>>
>>     This is because base64_encode() and base64_decode() are used
>>     in the replication code using "int" as return value.
>>
>>     When exposing TO_BASE64() and FROM_BASE64() to the SQL level,
>>     I did not want to touch the replication code.
>>
>>
>>
>>             +  if (args[0]->max_length > (uint)
>>             base64_encode_max_arg_length()**__)
>>             +  {
>>             +    maybe_null= 1;
>>             +    fix_char_length_ulonglong((__**ulonglong)
>>             base64_encode_max_arg_length()**__);
>>             +  }
>>             +  else
>>             +  {
>>             +    int length= base64_needed_encoded_length((**__int)
>>
>>             args[0]->max_length);
>>             +    DBUG_ASSERT(length > 0);
>>
>>
>>         Don't think assert is needed as the function gurantees it already.
>>
>>             +    fix_char_length_ulonglong((__**ulonglong) length - 1);
>>             +  }
>>             +}
>>
>>
>>         <cut>
>>
>>             +String *Item_func_from_base64::val___**str(String *str)
>>
>>             +{
>>             +  String *res= args[0]->val_str_ascii(str);
>>             +  bool too_long= false;
>>             +  int length;
>>             +  const char *end_ptr;
>>             +
>>             +  if (!res ||
>>             +      res->length() > (uint) base64_decode_max_arg_length()
>> ||
>>             +      (too_long=
>>             +       ((uint) (length=
>>             base64_needed_decoded_length((**__int) res->length())) >
>>             +        current_thd->variables.max___**allowed_packet)) ||
>>
>>             +      tmp_value.alloc((uint) length) ||
>>             +      (length= base64_decode(res->ptr(), (int) res->length(),
>>             +                             (char *) tmp_value.ptr(),
>>             &end_ptr, 0)) < 0 ||
>>             +      end_ptr < res->ptr() + res->length())
>>             +  {
>>
>>
>>         Shouldn't we get a readable error or warning if the string
>>         contained wrong
>>         characters?
>>         Now it looks like we will just return null.
>>
>>         Something like 'Malformed base64 string. Error at position %d"
>> would
>>         be nice.
>>
>>
>>     Good idea. I have added a warning. Now it looks clear what went wrong
>>     if FROM_BASE64() returns NULL.
>>
>>
>>
>>
>>
>>             +    null_value= 1; // NULL input, too long input, OOM, or
>>             badly formed input
>>             +    if (too_long)
>>             +    {
>>             +      push_warning_printf(current___**thd,
>>             Sql_condition::WARN_LEVEL___**WARN,
>>             +                          ER_WARN_ALLOWED_PACKET___**
>> OVERFLOWED,
>>             +
>>               ER(ER_WARN_ALLOWED_PACKET___**OVERFLOWED), func_name(),
>>             +
>>               current_thd->variables.max___**allowed_packet);
>>
>>             +    }
>>             +    return 0;
>>             +  }
>>             +  tmp_value.length((uint) length);
>>             +  null_value= 0;
>>             +  return &tmp_value;
>>             +}
>>
>>
>>         <cut>
>>
>>     <cut>
>>
>>
>>     ______________________________**_________________
>>     Mailing list: https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
>>     Post to     : maria-developers@lists.**launchpad.net<maria-developers@xxxxxxxxxxxxxxxxxxx>
>>     <mailto:maria-developers@**lists.launchpad.net<maria-developers@xxxxxxxxxxxxxxxxxxx>
>> >
>>
>>     Unsubscribe : https://launchpad.net/~maria-**developers<https://launchpad.net/~maria-developers>
>>     More help   : https://help.launchpad.net/**ListHelp<https://help.launchpad.net/ListHelp>
>>
>>
>>
>>
>> --
>> Roberto Spadim
>> SPAEmpresarial
>>
>


-- 
Roberto Spadim
SPAEmpresarial
Follow ups

Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-23
References

Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-12
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Roberto Spadim, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17