← Back to team overview

maria-developers team mailing list archive

Re: Correct parsing of BINLOG 'XXX' with comment inside

 

Hello Sachin, Andrei,


Andrei, thanks for your analysis!

Sachin, can you please try to implement according to Andrei's proporal?

Thanks!

On 10/18/2018 03:57 PM, andrei.elkin@xxxxxxxxxx wrote:
> Alexander, Sachin, hello.
> 
>> Hi Andrei,
>>
>> We need your help on:
>>
>> MDEV-10362 mysqlbinlog verbose output cannot be decoded
>>
>>
>> The problem is that "mysqlbinlog -vvv" writes
>> comments right inside the BINLOG statement,
>> between base64 chunks:
>>
>>
>> BINLOG '
>> kxiEVxMBAAAALQAAAMUBAAAAABMAAAAAAAEABHRlc3QAAnQxAAMIDwMCIAAG
>> ...
>> AAAAAAAAAQAAAPo/AAAAAAAAAAAAAAD6QAAAAAAAAAABAAAA+kAAAAAAAAAAAAAAAA==
>> ### UPDATE `test`.`t1`
>> ### WHERE
>> ###   @1=1 /* LONGINT meta=0 nullable=0 is_null=0 */
>> ###   @2=NULL /* LONGINT meta=32 nullable=1 is_null=1 */
>> ###   @3=1 /* INT meta=0 nullable=1 is_null=0 */
>> AAAAAAD63QUAAAAAAAABAAAA+t0FAAAAAAAAAAAAAPreBQAAAAAAAAEAAAD63gUAAAAAAAAAAAAA
>> '/*!*/;
> 
> Right, the comments get embedded into base64 stream for no reason.
> 
>>
>>
>> So when later we try to load this output to "mysql", it fails,
>> because when mysql_client_binlog_statement() calls base64_decode(),
>> the latter fails as it does not expect any comments.
>>
>> Sachin proposes to add a new flag to base64_decode(), to make
>> skip comments when this flags is on.
>>
>> I'm not sure that we should do this.
>>
>> Would it be possible to fix mysqlbinlog instead, to print:
>> - a BINLOG statement with only base64 data, without any comments inside
>> - followed by comments, fully outside of the BINLOG statement.
>> ?
> 
> A sort of what you both are suggesting, let's add up one more cache,
> pass it to print_base64(body, vcom, print_event_info,...) [ actually,
> do we really need the cache args as long as they represent
> `print_event_info' members? - to check out closely]
> 
> 
> void Rows_log_event::print_helper(FILE *file,
>                                   PRINT_EVENT_INFO *print_event_info,
>                                   char const *const name)
> {
>   IO_CACHE *const head= &print_event_info->head_cache;
>   IO_CACHE *const body= &print_event_info->body_cache;
> + IO_CACHE ...    vcom= ... // vcom after Verbose COMments
>   bool do_print_encoded=
>     print_event_info->base64_output_mode != BASE64_OUTPUT_DECODE_ROWS &&
>     !print_event_info->short_form;
> 
>   if (!print_event_info->short_form)
>   {
>     bool const last_stmt_event= get_flags(STMT_END_F);
>     print_header(head, print_event_info, !last_stmt_event);
>     my_b_printf(head, "\t%s: table id %lu%s\n",
>                 name, m_table_id,
>                 last_stmt_event ? " flags: STMT_END_F" : "");
> -   print_base64(body, print_event_info, do_print_encoded);
> +   print_base64(body, vcom, print_event_info, do_print_encoded);
> 
> so inside
> 
> void Log_event::print_base64(IO_CACHE* file,
> +                            IO_CACHE* vcom, 
>                              PRINT_EVENT_INFO* print_event_info,
>                              bool do_print_encoded)
> 
> ...
> 
>   if (print_event_info->verbose)
>   {
>     ...
>     if (ev)
>     {
> -     ev->print_verbose(file, print_event_info);
> +     ev->print_verbose(vcom, print_event_info);
> 
> 
> Finally, empty up the vcom cache in a correct order.
> Should it be before BINLOG '...' I think, right?
> 
> 
>   if (get_flags(STMT_END_F))
>   {
>      ...
>    copy_event_cache_to_file_and_reinit(head, file);
> +   copy_event_cache_to_file_and_reinit(vcom, file);
>    copy_event_cache_to_file_and_reinit(body, file);
> 
> Sachin, regardless whether you like this elaboration,
> you got to check out bb-10.1-MDEV-10963 or the latest
> patch in the commit list to find some refactoring in this area.
> There's a chance it would be pushed before the current task's one.
> 
> Cheers,
> 
> Andrei
> 
> 
>>
>> Or even, should not we split such BINLOG statement with multiple
>> base64 chunks into individual BINLOG statements?
>>
>>
>> Thanks.
>>
>>
>> On 10/16/2018 04:03 PM, Sachin Setiya wrote:
>>> Hi Bar!
>>> On Tue, Oct 16, 2018 at 7:33 AM Alexander Barkov <bar@xxxxxxxxxxx> wrote:
>>>>
>>>> Hi Sachin,
>>>>
>>>>
>>>> On 10/16/2018 03:49 AM, Sachin Setiya wrote:
>>>>> Hi Bar!
>>>>> On Tue, Sep 11, 2018 at 1:47 PM Alexander Barkov <bar@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi Sachin,
>>>>>>
>>>>>>
>>>>>> On 09/10/2018 03:47 PM, Sachin Setia wrote:
>>>>>>> Hi Bar,
>>>>>>>
>>>>>>> Currently if we have generated the sql file using mysqlbinlog -v and
>>>>>>> if we have big statement binlog
>>>>>>> , then mysqlbinlog add comment in between of Binlog 'XXX' statement ,
>>>>>>> but unfortunately base64_decode
>>>>>>> does not understands it, and it through error since # is not a base64 char
>>>>>>>
>>>>>>> This patches solves this.
>>>>>>
>>>>>> Note, base64_decode() is used in at least two places:
>>>>>> - for the binary log
>>>>>> - for the SQL function FROM_BASE64()
>>>>>>
>>>>>>
>>>>>> I don't like that your patch changes the behavior of the SQL function
>>>>>> FROM_BASE64(). It should not recognize any '#' inside the base64 data
>>>>>> as comments.
>>>>> Right , So I have creates a flag MY_BASE64_DECODE_ALLOW_COMMENTS it is
>>>>> kind of similar to as MY_BASE64_DECODE_ALLOW_MULTIPLE_CHUNKS flag is used
>>>>> So no impact on FROM_BASE64
>>>>>
>>>>>>
>>>>>> So perhaps this should be fixed in some other place, not in base64_decode().
>>>>>>
>>>>>>
>>>>>> I checked the output of these commands:
>>>>>>
>>>>>> ./bin/mysqlbinlog      ./data/retsina-bin.000003
>>>>>> ./bin/mysqlbinlog -vvv ./data/retsina-bin.000003
>>>>>>
>>>>>> It looks suspicious for me.
>>>>>>
>>>>>> Can you please remind why mysqlbinlog prints multiple base64 chunks
>>>>>> inside the same BINLOG statement?
>>>>> It is beacuse if we have long row we can have more then one
>>>>> ROWS_LOG_EVENT after TABLE_MAP_EVENT
>>>>
>>>> Do you know where in the code the server decides to end
>>>> the current chunk and start a new one?
>>> Here In binlog_prepare_pending_rows_event
>>>   if (!pending ||
>>>       pending->server_id != serv_id ||
>>>       pending->get_table_id() != table->s->table_map_id ||
>>>       pending->get_general_type_code() != general_type_code ||
>>>       pending->get_data_size() + needed > opt_binlog_rows_event_max_size ||
>>>       ^^^ this line
>>>       pending->get_width() != colcnt ||
>>>       !bitmap_cmp(pending->get_cols(), cols))
>>>   {
>>>
>>>>
>>>> Is there some limit (in bytes, or in number or records)?
>>>>
>>>>>>
>>>>>>
>>>>>> From my understanding, it is "mysqlbinlog -vvv" who should be fixed
>>>>>> to print good base64 data inside the string literal that follows the
>>>>>> BINLOG keyword.
>>>>>>
>>>>>> At least  the additional comments printed by -vvv should be outside of
>>>>>> the BINLOG statement (presumably before), not inside.
>>>>>>
>>>>> Idk , this can be lot of work , plus it will slow mysqlbinlog also ,
>>>>> And I think with new solution FROM_BASE64 behavior is not changed
>>>>> I have added test for FROM_BASE64
>>>>> #No change in BASE64_FROM behaviour
>>>>> SELECT FROM_BASE64('TWFy
>>>>> #Comment 344
>>>>> #
>>>>> aWE=') AS 'Output';
>>>>> this will generate Maria in earlier patch which is wrong , with new
>>>>> patch it gives NULL
>>>>
>>>> If we really go this way, I suggest to cover all these scenarios too:
>>>>
>>>> SELECT FROM_BASE64('TWFyaWE=# comment1');
>>>> SELECT FROM_BASE64('TWFyaWE=\n# comment1');
>>>> SELECT FROM_BASE64('TWFyaWE=\n# comment1\nTWFyaWE=');
>>>> SELECT FROM_BASE64('TWFyaWE=\n# comment1\nTWFyaWE=\n# comment2\n');
>>>>
>>>> They all should give NULL.
>>> They all are giving null
>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~maria-developers
>>>>>> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~maria-developers
>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>


References