maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #10009
Re: [MariaDB/server] Restrict the speed of reading binlog from Master (#246)
GCSAdmin <notifications@xxxxxxxxxx> writes:
> In some case, the speed of reading binlog from master is high, especially when doing a new replica.
> It would bring the high traffic in master.
> So We introduce a new variable "read_binlog_speed_limit" to control the binlog reading rate for IO thread to solve the problem.
> It can work when slave_compressed_protocol is on.
> But it maybe doesn't work well when the binlog event is very big.
> You can view, comment on, or merge this pull request online at:
>
> https://github.com/MariaDB/server/pull/246
> -- Patch Links --
>
> https://github.com/MariaDB/server/pull/246.patch
> https://github.com/MariaDB/server/pull/246.diff
Overall this looks clean and simple.
There is one problem. The patch adds a field real_network_read_len to the
NET structure. This will break the client library ABI, because NET is a part
of MYSQL. So this would cause client programs to crash if they are linked
with a different version of the client library. So this needs to be changed
(if I understand correctly).
One option might be to introduce new functions like cli_safe_read_reallen()
and my_net_read_packet_reallen(), which return in addition the actual amount
of bytes read from the server. The old cli_safe_read() and
my_net_read_packet() could then become simple wrapper functions around
those. And cli_safe_read_reallen() can be used in read_event() in
sql/slave.cc.
A smaller issue is that in case of a large packet, a large my_sleep() may be
invoked, which will cause STOP SLAVE to hang. I think this can be solved
simply by calling slave_sleep() instead, it handles terminating the wait
early if interrupted by STOP SLAVE.
Detailed comments on the patch below. I rebased the series against latest
10.2 to get a clean diff (the pull request includes a couple merges against
the main 10.2 tree, these changes are unrelated to the patch). The rebase is
in https://github.com/knielsen/server/tree/GCSAdmin-10.2-binlog-speed-limit-2
> diff --git a/include/mysql.h.pp b/include/mysql.h.pp
> index 857f5b9..1da038c 100644
> --- a/include/mysql.h.pp
> +++ b/include/mysql.h.pp
> @@ -35,6 +35,7 @@ typedef struct st_net {
> my_bool thread_specific_malloc;
> unsigned char compress;
> my_bool unused3;
> + unsigned long real_network_read_len;
As explained above, I believe this would break the ABI (that's the purpose
of mysql.h.pp, to catch such problems).
> diff --git a/sql/slave.cc b/sql/slave.cc
> index 20bf68e..52bb668 100644
> --- a/sql/slave.cc
> +++ b/sql/slave.cc
> @@ -3307,13 +3308,14 @@ static int request_dump(THD *thd, MYSQL* mysql, Master_info* mi,
> try a reconnect. We do not want to print anything to
> the error log in this case because this a anormal
> event in an idle server.
> + network_read_len get the real network read length in VIO, especially using compressed protocol
>
> RETURN VALUES
> 'packet_error' Error
> number Length of packet
> */
>
> -static ulong read_event(MYSQL* mysql, Master_info *mi, bool* suppress_warnings)
> +static ulong read_event(MYSQL* mysql, Master_info *mi, bool* suppress_warnings, ulong* network_read_len)
Generally, lines longer than 80 characters should be avoided (coding style).
> @@ -4473,6 +4479,34 @@ Stopping slave I/O thread due to out-of-memory error from master");
> goto err;
> }
>
> + /* Control the binlog read speed of master when read_binlog_speed_limit is non-zero
> + */
> + ulonglong read_binlog_speed_limit_in_bytes = opt_read_binlog_speed_limit * 1024;
> + if (read_binlog_speed_limit_in_bytes)
> + {
> + /* prevent the tokenamount become a large value,
> + for example, the IO thread doesn't work for a long time
> + */
> + if (tokenamount > read_binlog_speed_limit_in_bytes * 2)
> + {
> + lastchecktime = my_hrtime().val;
> + tokenamount = read_binlog_speed_limit_in_bytes * 2;
> + }
> +
> + do
> + {
> + ulonglong currenttime = my_hrtime().val;
> + tokenamount += (currenttime - lastchecktime) * read_binlog_speed_limit_in_bytes / (1000*1000);
> + lastchecktime = currenttime;
> + if(tokenamount < network_read_len)
> + {
> + ulonglong micro_sleeptime = 1000*1000 * (network_read_len - tokenamount) / read_binlog_speed_limit_in_bytes ;
> + my_sleep(micro_sleeptime > 1000 ? micro_sleeptime : 1000); // at least sleep 1000 micro second
> + }
> + }while(tokenamount < network_read_len);
> + tokenamount -= network_read_len;
> + }
> +
As explained above, probably better to use slave_sleep() here to allow STOP
SLAVE to interrupt a long sleep.
Would it make sense to do this wait after calling queue_event()? This way
the SQL thread can start applying the event immediately, reducing slave lag.
What do you think?
Thanks,
- Kristian.