maria-developers team mailing list archive

Thread
Date

Re: Review of base64.diff and german2.diff

To: monty@xxxxxxxxxxxx
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Tue, 17 Sep 2013 20:52:06 +0400
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <21048.32631.552422.77783@narttu.askmonty.org>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6

Hi Monty,


On 09/17/2013 08:12 PM, Michael Widenius wrote:

Hi!

"Alexander" == Alexander Barkov <bar@xxxxxxxxxxx> writes:


Alexander> Hi Monty,
Alexander> thanks for review.

Alexander> I have addressed most of your suggestions. See the new version attached,
Alexander> and the detailed comments inline:


Alexander> On 09/12/2013 04:32 PM, Michael Widenius wrote:


Hi!

Here is the review for the code that we should put into 10.0

First the base64:

=== modified file 'mysql-test/t/func_str.test'
--- mysql-test/t/func_str.test	2013-05-07 11:05:09 +0000
+++ mysql-test/t/func_str.test	2013-08-28 13:14:24 +0000
@@ -1555,3 +1555,118 @@ drop table t1,t2;
--echo # End of 5.5 tests
--echo #

+
+--echo #
+--echo # Start of 5.6 tests
+--echo #


Shouldn't this be start of 10.0 tests ?
(I know that this code is from MySQL 5.6, but still for us this is 10.0...)


Alexander> This code is (almost) copy-and-paste from MySQL-5.6.
Alexander> I think it's a good idea when looking inside a test file
Alexander> to be able to see which tests are coming from MySQL-5.6,
Alexander> and which tests are coming from MariaDB-10.0.

Alexander> I'd suggest to keep "Start of 5.6 tests" in this particular case,
Alexander> and also when merging the tests for the other merged MySQL-5.6 features.

As long a this is an exact copy paste it's ok.

<cut>

It would also be good to have a good description of the base64 format
in this file so that one can understand what the functions are
supposed to do (or at least a link to the specification).

Some thing I figured out by asking bar on IRC:

- We never generate spaces
- When decoding, we allow spaces anywhere.
- The end of an 64 base  decoded string is always padded with '='
so that the final length is dividable with 4.


Alexander> I added a reference to http://en.wikipedia.org/wiki/Base64,
Alexander> as well all added these your comments in proper places of the code.

Thanks. That will make the code much easier to understand.

I just read trough the definition and noticed that some versions
doesn't use '=' padding.

Should we allow not '=' padding in our decoder too?
(I think it's best to always pad on encoding).



http://dev.mysql.com/doc/refman/5.6/en/string-functions.html#function_to-base64
says:

> Different base-64 encoding schemes exist. These are the encoding and
> decoding rules used by TO_BASE64() and FROM_BASE64():
> ...
> * Encoded output consists of groups of 4 printable characters. Each 3
> bytes of the input data are encoded using 4 characters. If the last
> group is incomplete, it is padded with '=' characters to a length of
> 4.

So we always pad on encoding.

So do most of the modern pieces of software.

So does PHP:
http://php.net/manual/en/function.base64-encode.php

So does Linux "bas64" program:

$ echo "123" |base64
MTIzCg==


This is why we always require pad characters on decoding.
I don't think we should accept not-padded values.


<cut>

=== modified file 'sql/item_strfunc.cc'
@@ -451,6 +452,101 @@ void Item_func_aes_decrypt::fix_length_a
set_persist_maybe_null(1);
}

+
+void Item_func_to_base64::fix_length_and_dec()
+{
+  maybe_null= args[0]->maybe_null;
+  collation.set(default_charset(), DERIVATION_COERCIBLE, MY_REPERTOIRE_ASCII);


Maybe better to cast both arguments to (ulonglong) as the rest of the code
is depending on this.

Also, why is base64_encode_max_arg_length() int instead of uint or
even better size_t?


Alexander> This is because base64_encode() and base64_decode() are used
Alexander> in the replication code using "int" as return value.

Alexander> When exposing TO_BASE64() and FROM_BASE64() to the SQL level,
Alexander> I did not want to touch the replication code.

ok.  We should however at some case fix that.

<cut>

+String *Item_func_from_base64::val_str(String *str)
+{
+  String *res= args[0]->val_str_ascii(str);
+  bool too_long= false;
+  int length;
+  const char *end_ptr;
+
+  if (!res ||
+      res->length() > (uint) base64_decode_max_arg_length() ||
+      (too_long=
+       ((uint) (length= base64_needed_decoded_length((int) res->length())) >
+        current_thd->variables.max_allowed_packet)) ||
+      tmp_value.alloc((uint) length) ||
+      (length= base64_decode(res->ptr(), (int) res->length(),
+                             (char *) tmp_value.ptr(), &end_ptr, 0)) < 0 ||
+      end_ptr < res->ptr() + res->length())
+  {


Shouldn't we get a readable error or warning if the string contained wrong
characters?
Now it looks like we will just return null.

Something like 'Malformed base64 string. Error at position %d" would
be nice.


Alexander> Good idea. I have added a warning. Now it looks clear what went wrong
Alexander> if FROM_BASE64() returns NULL.

Thanks!

Regards,
Monty

Follow ups

Re: Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-23
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17

References

Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-12
Re: Review of base64.diff and german2.diff
From: Alexander Barkov, 2013-09-17
Re: Review of base64.diff and german2.diff
From: Michael Widenius, 2013-09-17