← Back to team overview

maria-developers team mailing list archive

Re: Correct parsing of BINLOG 'XXX' with comment inside

 

Alexander, Sachin, hello.

> Hi Andrei,
>
> We need your help on:
>
> MDEV-10362 mysqlbinlog verbose output cannot be decoded
>
>
> The problem is that "mysqlbinlog -vvv" writes
> comments right inside the BINLOG statement,
> between base64 chunks:
>
>
> BINLOG '
> kxiEVxMBAAAALQAAAMUBAAAAABMAAAAAAAEABHRlc3QAAnQxAAMIDwMCIAAG
> ...
> AAAAAAAAAQAAAPo/AAAAAAAAAAAAAAD6QAAAAAAAAAABAAAA+kAAAAAAAAAAAAAAAA==
> ### UPDATE `test`.`t1`
> ### WHERE
> ###   @1=1 /* LONGINT meta=0 nullable=0 is_null=0 */
> ###   @2=NULL /* LONGINT meta=32 nullable=1 is_null=1 */
> ###   @3=1 /* INT meta=0 nullable=1 is_null=0 */
> AAAAAAD63QUAAAAAAAABAAAA+t0FAAAAAAAAAAAAAPreBQAAAAAAAAEAAAD63gUAAAAAAAAAAAAA
> '/*!*/;

Right, the comments get embedded into base64 stream for no reason.

>
>
> So when later we try to load this output to "mysql", it fails,
> because when mysql_client_binlog_statement() calls base64_decode(),
> the latter fails as it does not expect any comments.
>
> Sachin proposes to add a new flag to base64_decode(), to make
> skip comments when this flags is on.
>
> I'm not sure that we should do this.
>
> Would it be possible to fix mysqlbinlog instead, to print:
> - a BINLOG statement with only base64 data, without any comments inside
> - followed by comments, fully outside of the BINLOG statement.
> ?

A sort of what you both are suggesting, let's add up one more cache,
pass it to print_base64(body, vcom, print_event_info,...) [ actually,
do we really need the cache args as long as they represent
`print_event_info' members? - to check out closely]


void Rows_log_event::print_helper(FILE *file,
                                  PRINT_EVENT_INFO *print_event_info,
                                  char const *const name)
{
  IO_CACHE *const head= &print_event_info->head_cache;
  IO_CACHE *const body= &print_event_info->body_cache;
+ IO_CACHE ...    vcom= ... // vcom after Verbose COMments
  bool do_print_encoded=
    print_event_info->base64_output_mode != BASE64_OUTPUT_DECODE_ROWS &&
    !print_event_info->short_form;

  if (!print_event_info->short_form)
  {
    bool const last_stmt_event= get_flags(STMT_END_F);
    print_header(head, print_event_info, !last_stmt_event);
    my_b_printf(head, "\t%s: table id %lu%s\n",
                name, m_table_id,
                last_stmt_event ? " flags: STMT_END_F" : "");
-   print_base64(body, print_event_info, do_print_encoded);
+   print_base64(body, vcom, print_event_info, do_print_encoded);

so inside

void Log_event::print_base64(IO_CACHE* file,
+                            IO_CACHE* vcom, 
                             PRINT_EVENT_INFO* print_event_info,
                             bool do_print_encoded)

...

  if (print_event_info->verbose)
  {
    ...
    if (ev)
    {
-     ev->print_verbose(file, print_event_info);
+     ev->print_verbose(vcom, print_event_info);


Finally, empty up the vcom cache in a correct order.
Should it be before BINLOG '...' I think, right?


  if (get_flags(STMT_END_F))
  {
     ...
   copy_event_cache_to_file_and_reinit(head, file);
+   copy_event_cache_to_file_and_reinit(vcom, file);
   copy_event_cache_to_file_and_reinit(body, file);

Sachin, regardless whether you like this elaboration,
you got to check out bb-10.1-MDEV-10963 or the latest
patch in the commit list to find some refactoring in this area.
There's a chance it would be pushed before the current task's one.

Cheers,

Andrei


>
> Or even, should not we split such BINLOG statement with multiple
> base64 chunks into individual BINLOG statements?
>
>
> Thanks.
>
>
> On 10/16/2018 04:03 PM, Sachin Setiya wrote:
>> Hi Bar!
>> On Tue, Oct 16, 2018 at 7:33 AM Alexander Barkov <bar@xxxxxxxxxxx> wrote:
>>>
>>> Hi Sachin,
>>>
>>>
>>> On 10/16/2018 03:49 AM, Sachin Setiya wrote:
>>>> Hi Bar!
>>>> On Tue, Sep 11, 2018 at 1:47 PM Alexander Barkov <bar@xxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Sachin,
>>>>>
>>>>>
>>>>> On 09/10/2018 03:47 PM, Sachin Setia wrote:
>>>>>> Hi Bar,
>>>>>>
>>>>>> Currently if we have generated the sql file using mysqlbinlog -v and
>>>>>> if we have big statement binlog
>>>>>> , then mysqlbinlog add comment in between of Binlog 'XXX' statement ,
>>>>>> but unfortunately base64_decode
>>>>>> does not understands it, and it through error since # is not a base64 char
>>>>>>
>>>>>> This patches solves this.
>>>>>
>>>>> Note, base64_decode() is used in at least two places:
>>>>> - for the binary log
>>>>> - for the SQL function FROM_BASE64()
>>>>>
>>>>>
>>>>> I don't like that your patch changes the behavior of the SQL function
>>>>> FROM_BASE64(). It should not recognize any '#' inside the base64 data
>>>>> as comments.
>>>> Right , So I have creates a flag MY_BASE64_DECODE_ALLOW_COMMENTS it is
>>>> kind of similar to as MY_BASE64_DECODE_ALLOW_MULTIPLE_CHUNKS flag is used
>>>> So no impact on FROM_BASE64
>>>>
>>>>>
>>>>> So perhaps this should be fixed in some other place, not in base64_decode().
>>>>>
>>>>>
>>>>> I checked the output of these commands:
>>>>>
>>>>> ./bin/mysqlbinlog      ./data/retsina-bin.000003
>>>>> ./bin/mysqlbinlog -vvv ./data/retsina-bin.000003
>>>>>
>>>>> It looks suspicious for me.
>>>>>
>>>>> Can you please remind why mysqlbinlog prints multiple base64 chunks
>>>>> inside the same BINLOG statement?
>>>> It is beacuse if we have long row we can have more then one
>>>> ROWS_LOG_EVENT after TABLE_MAP_EVENT
>>>
>>> Do you know where in the code the server decides to end
>>> the current chunk and start a new one?
>> Here In binlog_prepare_pending_rows_event
>>   if (!pending ||
>>       pending->server_id != serv_id ||
>>       pending->get_table_id() != table->s->table_map_id ||
>>       pending->get_general_type_code() != general_type_code ||
>>       pending->get_data_size() + needed > opt_binlog_rows_event_max_size ||
>>       ^^^ this line
>>       pending->get_width() != colcnt ||
>>       !bitmap_cmp(pending->get_cols(), cols))
>>   {
>> 
>>>
>>> Is there some limit (in bytes, or in number or records)?
>>>
>>>>>
>>>>>
>>>>> From my understanding, it is "mysqlbinlog -vvv" who should be fixed
>>>>> to print good base64 data inside the string literal that follows the
>>>>> BINLOG keyword.
>>>>>
>>>>> At least  the additional comments printed by -vvv should be outside of
>>>>> the BINLOG statement (presumably before), not inside.
>>>>>
>>>> Idk , this can be lot of work , plus it will slow mysqlbinlog also ,
>>>> And I think with new solution FROM_BASE64 behavior is not changed
>>>> I have added test for FROM_BASE64
>>>> #No change in BASE64_FROM behaviour
>>>> SELECT FROM_BASE64('TWFy
>>>> #Comment 344
>>>> #
>>>> aWE=') AS 'Output';
>>>> this will generate Maria in earlier patch which is wrong , with new
>>>> patch it gives NULL
>>>
>>> If we really go this way, I suggest to cover all these scenarios too:
>>>
>>> SELECT FROM_BASE64('TWFyaWE=# comment1');
>>> SELECT FROM_BASE64('TWFyaWE=\n# comment1');
>>> SELECT FROM_BASE64('TWFyaWE=\n# comment1\nTWFyaWE=');
>>> SELECT FROM_BASE64('TWFyaWE=\n# comment1\nTWFyaWE=\n# comment2\n');
>>>
>>> They all should give NULL.
>> They all are giving null
>>>
>>>>> Thanks.
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~maria-developers
>>>>> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~maria-developers
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>>
>> 
>> 
>> 


Follow ups

References