maria-developers team mailing list archive

Thread
Date

Re: [MariaDB/server] Restrict the speed of reading binlog from Master (#246)

To: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Wed, 19 Oct 2016 16:42:10 +0200
Cc: MariaDB/server <reply+000136c1bd7a21746203ed0fe7e542253f2f72dcf1b11f7892cf0000000114186ec792a169ce0ae8850a@xxxxxxxxxxxxxxxx>
In-reply-to: <MariaDB/server/pull/246@github.com> (GCSAdmin's message of "Fri, 14 Oct 2016 03:00:39 -0700")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

GCSAdmin <notifications@xxxxxxxxxx> writes:

> In some case, the speed of reading binlog from master is high, especially when doing a new replica.
> It would bring the high traffic in master.
> So We introduce a new variable "read_binlog_speed_limit" to control the binlog reading rate for IO thread to solve the problem.

> It can work when slave_compressed_protocol is on.
> But it maybe doesn't work well when the binlog event is very big.
> You can view, comment on, or merge this pull request online at:
>
>   https://github.com/MariaDB/server/pull/246

> -- Patch Links --
>
> https://github.com/MariaDB/server/pull/246.patch
> https://github.com/MariaDB/server/pull/246.diff

Overall this looks clean and simple.

There is one problem. The patch adds a field real_network_read_len to the
NET structure. This will break the client library ABI, because NET is a part
of MYSQL. So this would cause client programs to crash if they are linked
with a different version of the client library. So this needs to be changed
(if I understand correctly).

One option might be to introduce new functions like cli_safe_read_reallen()
and my_net_read_packet_reallen(), which return in addition the actual amount
of bytes read from the server. The old cli_safe_read() and
my_net_read_packet() could then become simple wrapper functions around
those. And cli_safe_read_reallen() can be used in read_event() in
sql/slave.cc.

A smaller issue is that in case of a large packet, a large my_sleep() may be
invoked, which will cause STOP SLAVE to hang. I think this can be solved
simply by calling slave_sleep() instead, it handles terminating the wait
early if interrupted by STOP SLAVE.

Detailed comments on the patch below. I rebased the series against latest
10.2 to get a clean diff (the pull request includes a couple merges against
the main 10.2 tree, these changes are unrelated to the patch). The rebase is
in https://github.com/knielsen/server/tree/GCSAdmin-10.2-binlog-speed-limit-2

> diff --git a/include/mysql.h.pp b/include/mysql.h.pp
> index 857f5b9..1da038c 100644
> --- a/include/mysql.h.pp
> +++ b/include/mysql.h.pp
> @@ -35,6 +35,7 @@ typedef struct st_net {
>    my_bool thread_specific_malloc;
>    unsigned char compress;
>    my_bool unused3;
> +  unsigned long real_network_read_len;

As explained above, I believe this would break the ABI (that's the purpose
of mysql.h.pp, to catch such problems).

> diff --git a/sql/slave.cc b/sql/slave.cc
> index 20bf68e..52bb668 100644
> --- a/sql/slave.cc
> +++ b/sql/slave.cc

> @@ -3307,13 +3308,14 @@ static int request_dump(THD *thd, MYSQL* mysql, Master_info* mi,
>                          try a reconnect.  We do not want to print anything to
>                          the error log in this case because this a anormal
>                          event in an idle server.
> +    network_read_len    get the real network read length in VIO, especially using compressed protocol 
>  
>      RETURN VALUES
>      'packet_error'      Error
>      number              Length of packet
>  */
>  
> -static ulong read_event(MYSQL* mysql, Master_info *mi, bool* suppress_warnings)
> +static ulong read_event(MYSQL* mysql, Master_info *mi, bool* suppress_warnings, ulong* network_read_len)

Generally, lines longer than 80 characters should be avoided (coding style).

> @@ -4473,6 +4479,34 @@ Stopping slave I/O thread due to out-of-memory error from master");
>          goto err;
>        }
>  
> +      /* Control the binlog read speed of master when read_binlog_speed_limit is non-zero
> +      */
> +      ulonglong read_binlog_speed_limit_in_bytes = opt_read_binlog_speed_limit * 1024;
> +      if (read_binlog_speed_limit_in_bytes) 
> +      {
> +        /* prevent the tokenamount become a large value, 
> +        for example, the IO thread doesn't work for a long time
> +        */
> +        if (tokenamount > read_binlog_speed_limit_in_bytes * 2) 
> +        {
> +          lastchecktime = my_hrtime().val;
> +          tokenamount = read_binlog_speed_limit_in_bytes * 2;
> +        }
> +
> +        do
> +        {
> +          ulonglong currenttime = my_hrtime().val;
> +          tokenamount += (currenttime - lastchecktime) * read_binlog_speed_limit_in_bytes / (1000*1000);
> +          lastchecktime = currenttime;
> +          if(tokenamount < network_read_len)
> +          {
> +            ulonglong micro_sleeptime = 1000*1000 * (network_read_len - tokenamount) / read_binlog_speed_limit_in_bytes ;  
> +            my_sleep(micro_sleeptime > 1000 ? micro_sleeptime : 1000); // at least sleep 1000 micro second
> +          }
> +        }while(tokenamount < network_read_len);
> +        tokenamount -= network_read_len;
> +      }
> +

As explained above, probably better to use slave_sleep() here to allow STOP
SLAVE to interrupt a long sleep.

Would it make sense to do this wait after calling queue_event()? This way
the SQL thread can start applying the event immediately, reducing slave lag.

What do you think?

Thanks,

 - Kristian.