maria-developers team mailing list archive

Thread
Date
Re: Review of base64.diff and german2.diff

To: Alexander Barkov <bar@xxxxxxxxxxx>
From: Roberto Spadim <roberto@xxxxxxxxxxxxx>
Date: Tue, 17 Sep 2013 11:38:22 -0300
Cc: "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <52382F28.4020800@mariadb.org>
hi guys!
one question, as a mariadb user...
base64 will be exposed to sql layer? could i use TO_BASE64() and
FROM_BASE64() at mysql client?
there's a MDEV in JIRA about it, if you could attach it to this patch could
be nice: MDEV-4387 <https://mariadb.atlassian.net/browse/MDEV-4387>

and with time, implement base32 and base16:
MDEV-4800<https://mariadb.atlassian.net/browse/MDEV-4800>

base64 is very usefull, and many php users use it to "send/receive" files
from database
base32 from what i know is usefull only for OTP and other password apps
base16 is a hexadecimal function HEX() and UNHEX()

thanks guys


2013/9/17 Alexander Barkov <bar@xxxxxxxxxxx>

> Hi Monty,
>
> thanks for review.
>
> I have addressed most of your suggestions. See the new version attached,
> and the detailed comments inline:
>
>
>
> On 09/12/2013 04:32 PM, Michael Widenius wrote:
>
>>
>> Hi!
>>
>> Here is the review for the code that we should put into 10.0
>>
>> First the base64:
>>
>>  === modified file 'mysql-test/t/func_str.test'
>>> --- mysql-test/t/func_str.test  2013-05-07 11:05:09 +0000
>>> +++ mysql-test/t/func_str.test  2013-08-28 13:14:24 +0000
>>> @@ -1555,3 +1555,118 @@ drop table t1,t2;
>>>   --echo # End of 5.5 tests
>>>   --echo #
>>>
>>> +
>>> +--echo #
>>> +--echo # Start of 5.6 tests
>>> +--echo #
>>>
>>
>> Shouldn't this be start of 10.0 tests ?
>> (I know that this code is from MySQL 5.6, but still for us this is
>> 10.0...)
>>
>
> This code is (almost) copy-and-paste from MySQL-5.6.
> I think it's a good idea when looking inside a test file
> to be able to see which tests are coming from MySQL-5.6,
> and which tests are coming from MariaDB-10.0.
>
> I'd suggest to keep "Start of 5.6 tests" in this particular case,
> and also when merging the tests for the other merged MySQL-5.6 features.
>
>
>
>
>>  === modified file 'mysys/base64.c'
>>> --- mysys/base64.c      2011-06-30 15:46:53 +0000
>>> +++ mysys/base64.c      2013-03-09 06:22:59 +0000
>>> @@ -1,5 +1,4 @@
>>> -/* Copyright (c) 2003-2008 MySQL AB, 2009 Sun Microsystems, Inc.
>>> -   Use is subject to license terms.
>>> +/* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights
>>> reserved.
>>>
>>
>> Removed 'all rights reserved'. You can't have that for GPL code.
>>
>> If there is any new code from us, please add a copyright message for the
>> MariaDB foundation too!
>>
>
> Removed 'all rights reserved' and added "MariaDB", as there are some
> our own changes.
>
>
>
>>
>>>      This program is free software; you can redistribute it and/or modify
>>>      it under the terms of the GNU General Public License as published by
>>> @@ -25,6 +24,28 @@ static char base64_table[] = "ABCDEFGHIJ
>>>                                "abcdefghijklmnopqrstuvwxyz"
>>>                                "0123456789+/";
>>>
>>> +/**
>>> + * Maximum length base64_needed_encoded_length()
>>> + * can handle without overflow.
>>> + */
>>> +int
>>> +base64_encode_max_arg_length(**)
>>> +{
>>> +#if (SIZEOF_INT == 8)
>>> +  /*
>>> +    6827690988321067803 ->   9223372036854775805
>>> +    6827690988321067804 ->  -9223372036854775807
>>> +  */
>>>
>>
>> Please describe from where the following number comes from
>> (so that anyone can recalculate and check the number if needed):
>>
>
> I added more comments where these numbers come from.
>
>
>
>>  +  return 0x5EC0D4C77B03531BLL
>>> +#else
>>> +  /*
>>> +    1589695686 ->  2147483646
>>> +    1589695687 -> -2147483645
>>> +  */
>>> +  return 0x5EC0D4C6;
>>> +#endif
>>> +}
>>> +
>>>
>>>   int
>>>   base64_needed_encoded_length(**int length_of_data)
>>> @@ -39,6 +60,21 @@ base64_needed_encoded_length(**int length_
>>>   }
>>>
>>>
>>> +/**
>>> + * Maximum length base64_needed_decoded_length()
>>> + * can handle without overflow.
>>> + */
>>> +int
>>> +base64_decode_max_arg_length(**)
>>> +{
>>>
>>
>> The following also need a comment.
>>
>
> Also added comments.
>
>
>
>>  +#if (SIZEOF_INT == 8)
>>> +  return 0x2AAAAAAAAAAAAAAALL;
>>> +#else
>>> +  return 0x2AAAAAAA;
>>> +#endif
>>> +}
>>> +
>>> +
>>>
>>
>> <cut>
>>
>>  +/**
>>> + * Get the next character from a base64 encoded stream.
>>> + * Skip spaces, then scan the next base64 character or a pad character
>>> + * and collect bits into decoder->c.
>>> + *
>>> + * @param  decoder  Pointer to MY_BASE64_DECODER
>>> + * @return
>>> + *  FALSE on success (a valid base64 encoding character found)
>>> + *  TRUE  on error (unexpected character or unexpected end-of-input
>>> found)
>>> + */
>>>
>>
>> Add an empty line here.
>>
>
> Done.
>
>
>
>> It would also be good to have a good description of the base64 format
>> in this file so that one can understand what the functions are
>> supposed to do (or at least a link to the specification).
>>
>> Some thing I figured out by asking bar on IRC:
>>
>> - We never generate spaces
>> - When decoding, we allow spaces anywhere.
>> - The end of an 64 base  decoded string is always padded with '='
>>    so that the final length is dividable with 4.
>>
>
> I added a reference to http://en.wikipedia.org/wiki/**Base64<http://en.wikipedia.org/wiki/Base64>
> ,
> as well all added these your comments in proper places of the code.
>
>
>
>
>>  +static inline my_bool
>>> +my_base64_decoder_getch(MY_**BASE64_DECODER *decoder)
>>>
>>
>> Remove inline from the above; It's likely to make the code slower, not
>> faster as this is a big function with lot of if's.
>>
>
> Removed "inline".
>
>
>
>>    {
>>> -  char b[3];
>>> -  size_t i= 0;
>>> -  char *dst_base= (char *)dst;
>>> -  char const *src= src_base;
>>> -  char *d= dst_base;
>>> -  size_t j;
>>> +  if (my_base64_decoder_skip_**spaces(decoder))
>>> +    return TRUE; /* End-of-input */
>>>
>>> -  while (i < len)
>>> +  if (!my_base64_add(decoder)) /* Valid base64 character found */
>>>     {
>>> -    unsigned c= 0;
>>> -    size_t mark= 0;
>>> +    if (decoder->mark)
>>> +    {
>>> +      /* If we have scanned '=' already, then only '=' is valid */
>>> +      DBUG_ASSERT(decoder->state == 3);
>>> +      decoder->error= 1;
>>> +      decoder->src--;
>>> +      return TRUE; /* expected '=', but encoding character found */
>>> +    }
>>> +    decoder->state++;
>>> +    return FALSE;
>>> +  }
>>>
>>
>> If you want to have the funtion inline, then move the following to
>> another not inline function;  We don't need to generate code for this
>> for every function call:
>>
>>  +
>>> +  /* Process error */
>>> +  switch (decoder->state)
>>> +  {
>>> +  case 0:
>>> +  case 1:
>>> +    decoder->src--;
>>> +    return TRUE; /* base64 character expected */
>>> +    break;
>>> +
>>> +  case 2:
>>> +  case 3:
>>> +    if (decoder->src[-1] == '=')
>>> +    {
>>> +      decoder->error= 0; /* Not an error - it's a pad character */
>>> +      decoder->mark++;
>>> +    }
>>> +    else
>>> +    {
>>> +      decoder->src--;
>>> +      return TRUE; /* base64 character or '=' expected */
>>> +    }
>>> +    break;
>>> +  default:
>>> +    DBUG_ASSERT(0);
>>> +    return TRUE; /* Wrong state, should not happen */
>>> +  }
>>>
>>
>> <cut>
>>
>> Add to the comment for base64_decode that the 'dst' buffer has to be
>> at least 2 character longer than what is needed to decode src_base.
>>
>
> I added this comment:
>
> * Note: 'dst' must have sufficient space to store the decoded data.
> * Use base64_needed_decoded_length() to calculate the correct space size.
>
>
>
>
>
>
>>  +int
>>> +base64_decode(const char *src_base, size_t len,
>>> +              void *dst, const char **end_ptr, int flags)
>>> +{
>>> +    if (my_base64_decoder_getch(&**decoder) ||
>>> +        my_base64_decoder_getch(&**decoder) ||
>>> +        my_base64_decoder_getch(&**decoder) ||
>>> +        my_base64_decoder_getch(&**decoder))
>>> +      break;
>>>
>>
>> Generating the error handling with a switch for every of the
>> above is wasteful. See previous comment.
>>
>
> Right. my_base64_decoder_getch() is not inline anymore.
>
>
>
>>  +  /* Return error if there are more non-space characters */
>>> +  decoder.state= 0;
>>> +  if (!my_base64_decoder_skip_**spaces(&decoder))
>>> +    decoder.error= 1;
>>> +
>>>     if (end_ptr != NULL)
>>>
>>
>> Note that base64_needed_decoded_length() used ceil() when it's not needed.
>>
>
> Removed.
>
>
>
>
>> <cut>
>>
>>  === modified file 'sql/item_strfunc.cc'
>>> @@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_**length_a
>>>      set_persist_maybe_null(1);
>>>   }
>>>
>>> +
>>> +void Item_func_to_base64::fix_**length_and_dec()
>>> +{
>>> +  maybe_null= args[0]->maybe_null;
>>> +  collation.set(default_charset(**), DERIVATION_COERCIBLE,
>>> MY_REPERTOIRE_ASCII);
>>>
>>
>> Maybe better to cast both arguments to (ulonglong) as the rest of the code
>> is depending on this.
>>
>> Also, why is base64_encode_max_arg_length() int instead of uint or
>> even better size_t?
>>
>
> This is because base64_encode() and base64_decode() are used
> in the replication code using "int" as return value.
>
> When exposing TO_BASE64() and FROM_BASE64() to the SQL level,
> I did not want to touch the replication code.
>
>
>
>>  +  if (args[0]->max_length > (uint) base64_encode_max_arg_length()**)
>>> +  {
>>> +    maybe_null= 1;
>>> +    fix_char_length_ulonglong((**ulonglong)
>>> base64_encode_max_arg_length()**);
>>> +  }
>>> +  else
>>> +  {
>>> +    int length= base64_needed_encoded_length((**int)
>>> args[0]->max_length);
>>> +    DBUG_ASSERT(length > 0);
>>>
>>
>> Don't think assert is needed as the function gurantees it already.
>>
>>  +    fix_char_length_ulonglong((**ulonglong) length - 1);
>>> +  }
>>> +}
>>>
>>
>> <cut>
>>
>>  +String *Item_func_from_base64::val_**str(String *str)
>>> +{
>>> +  String *res= args[0]->val_str_ascii(str);
>>> +  bool too_long= false;
>>> +  int length;
>>> +  const char *end_ptr;
>>> +
>>> +  if (!res ||
>>> +      res->length() > (uint) base64_decode_max_arg_length() ||
>>> +      (too_long=
>>> +       ((uint) (length= base64_needed_decoded_length((**int)
>>> res->length())) >
>>> +        current_thd->variables.max_**allowed_packet)) ||
>>> +      tmp_value.alloc((uint) length) ||
>>> +      (length= base64_decode(res->ptr(), (int) res->length(),
>>> +                             (char *) tmp_value.ptr(), &end_ptr, 0)) <
>>> 0 ||
>>> +      end_ptr < res->ptr() + res->length())
>>> +  {
>>>
>>
>> Shouldn't we get a readable error or warning if the string contained wrong
>> characters?
>> Now it looks like we will just return null.
>>
>> Something like 'Malformed base64 string. Error at position %d" would
>> be nice.
>>
>
> Good idea. I have added a warning. Now it looks clear what went wrong
> if FROM_BASE64() returns NULL.
>
>
>
>
>>
>>  +    null_value= 1; // NULL input, too long input, OOM, or badly formed
>>> input
>>> +    if (too_long)
>>> +    {
>>> +      push_warning_printf(current_**thd, Sql_condition::WARN_LEVEL_**
>>> WARN,
>>> +                          ER_WARN_ALLOWED_PACKET_**OVERFLOWED,
>>> +                          ER(ER_WARN_ALLOWED_PACKET_**OVERFLOWED),
>>> func_name(),
>>> +                          current_thd->variables.max_**allowed_packet);
>>> +    }
>>> +    return 0;
>>> +  }
>>> +  tmp_value.length((uint) length);
>>> +  null_value= 0;
>>> +  return &tmp_value;
>>> +}
>>>
>>
>> <cut>
>>
>>  <cut>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp
>
>


-- 
Roberto Spadim
SPAEmpresarial
Follow ups

Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
References

Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-12
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17