maria-developers team mailing list archive
Message #04920
Re: Re: MDEV-520: consider parallel replication patch from taobao patches
Hi, Kristian
(I was writing to you when I received your mail; it is good news that you have recovered from your illness.)
After running mysql-test-run, I found that once other engines like MyISAM, other binlog formats (mixed, statement), and more complex scenarios (temporary tables, LOAD DATA statements) are taken into account, it is necessary to restructure the code and handle the details more carefully.
So I have changed the structure of the patch over the last two days. The main concept is unchanged.
The changes are:
1) Fix the case-insensitivity problem.
2) Fix the invalid memory access when the key buffer is not long enough.
3) Change the strategy for mixed cases: a transaction is buffered as a whole first, and the way of applying it is decided case by case. If a transaction contains one or more statement-format queries, it is treated like a DDL (a sketch of this decision follows below). This may occur when the master's binlog format is MIXED.
My aim is that if the master fulfills the conditions under which we think multiple threads can increase performance, we exploit that; if not, we just guarantee correctness.
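To make the strategy concrete, here is a minimal sketch of the dispatch decision in C++. The Event struct, trx_events, and max_parallel_events are hypothetical illustrations, not the patch's actual code:

    // Hypothetical sketch: buffer the whole transaction first, then decide
    // the apply path. All names here are illustrative, not from the patch.
    #include <cstddef>
    #include <vector>

    enum class ApplyMode { PARALLEL, SERIAL_LIKE_DDL };

    struct Event {
      bool is_statement_format; // statement-based event in a MIXED binlog
      bool is_ddl;
    };

    ApplyMode decide_apply_mode(const std::vector<Event> &trx_events,
                                std::size_t max_parallel_events) {
      // Oversized transactions fall back to the serial path (see the
      // huge-transaction discussion later in this mail).
      if (trx_events.size() > max_parallel_events)
        return ApplyMode::SERIAL_LIKE_DDL;
      for (const Event &ev : trx_events) {
        // Any statement-format query or DDL forces the DDL-like serial path.
        if (ev.is_statement_format || ev.is_ddl)
          return ApplyMode::SERIAL_LIKE_DDL;
      }
      return ApplyMode::PARALLEL; // pure row-format transaction
    }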
Please fetch the new version in the following URL.
http://mysql.taobao.org/index.php/RLPR_for_MariaDB#Source_code
Considering the general cases, there may still be many bugs in the patch.
There are some failures in the ./mtr results. I am dealing with them one by one, but some are complex enough that I need your help (such as rpl.rpl_deadlock_innodb).
You mentioned the GTIDs in 5.6. I quite agree with you.
I used an internal queue to record the order of completed transactions. The reason is that in version 5.5 and older, log_file_name and log_file_pos cannot express the adjacency relation between transactions.
If GTIDs (or a similar mechanism from MariaDB) are introduced, the logic for getting the "slowest" worker position can become simpler.
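For illustration, here is a minimal sketch of such a completion tracker, assuming each dispatched transaction carries a monotonically increasing sequence number (which is exactly what a GTID-like mechanism would provide). All names are hypothetical:

    // Workers may commit out of order; the reportable slave position is
    // the end of the unbroken committed prefix (the "low-water mark").
    #include <cstdint>
    #include <map>
    #include <mutex>

    class CommitTracker {
      std::mutex mtx;
      std::map<uint64_t, bool> in_flight; // seq_no -> committed?
      uint64_t low_water = 0; // everything up to here is committed

    public:
      void dispatched(uint64_t seq) {
        std::lock_guard<std::mutex> g(mtx);
        in_flight[seq] = false;
      }
      void committed(uint64_t seq) {
        std::lock_guard<std::mutex> g(mtx);
        in_flight[seq] = true;
        // Advance over the committed prefix in sequence order.
        while (!in_flight.empty() && in_flight.begin()->second) {
          low_water = in_flight.begin()->first;
          in_flight.erase(in_flight.begin());
        }
      }
      uint64_t safe_position() { // the "slowest worker" position to report
        std::lock_guard<std::mutex> g(mtx);
        return low_water;
      }
    };

With (log_file_name, log_file_pos) pairs instead of sequence numbers, the same tracker would need adjacency information that the binlog coordinates alone do not provide, which is what the internal queue works around.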
Best Regards,
Xiaobin
________________________________
From: Kristian Nielsen [knielsen@xxxxxxxxxxxxxxx]
Sent: 12 October 2012 19:25
To: 丁奇
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Maria-developers] Re: MDEV-520: consider parallel replication patch from taobao patches
Hi 丁奇,
Thanks for your answers, you have a good understanding of the potential issues
and how to solve them.
(I've replied to individual items below, but I mostly agree with your
answers).
I have thought a bit more about the overall idea, and I quite like it. In a
way, this is the natural way to do parallel slave: analyse the changes for
conflicts, and run in parallel any that do not conflict. So I think it is
great that you went and actually tried this and got some real code for it.
I will mention a couple more challenges that will need to be overcome. But
I hope you will first try to complete your plans as you explained in your
mail. This will allow us to see if it works in practice (I think it will), and
then we can work together to handle the possible challenges (which I am sure
can be overcome).
The challenges are mainly to get this working with other replication features
that are already in MariaDB 10.0 or are planned.
----
The first challenge is multi-source replication, which is now in 10.0. This is
the patch by Lixun Peng, which you may already know.
With multi-source replication, we already have multiple threads, one SQL
thread per master connection. In your parallel replication patch, we also
have multiple threads.
So now we could have 16 threads (parallel replication) for each master
connection (multi-source). Then the thread handling starts to be a bit
complex. Then there are a couple of other ideas for parallel replication that
we might want to do later (eg. that work for statement-based binlog or for
DDL), they will require other threads and even more complex thread handling.
I think the way to solve this is to start with your plan, where you just have
16 (by default) threads. And then later we can extend this so that we have a
general pool of replication threads, and some general mechanism of
distributing work to the appropriate thread. (I would be happy to help getting
this working).
Then eventually we will have just one pool of threads which are shared between
multi-source replication, your parallel replication, and whatever else might
be implemented in the future.
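As a rough illustration of the shared-pool idea, here is a sketch assuming a plain task queue; nothing here is MariaDB's actual API:

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ReplWorkerPool {
      std::vector<std::thread> workers;
      std::queue<std::function<void()>> tasks; // work from any master connection
      std::mutex mtx;
      std::condition_variable cv;
      bool shutdown = false;

    public:
      explicit ReplWorkerPool(std::size_t n) {
        for (std::size_t i = 0; i < n; i++)
          workers.emplace_back([this] {
            for (;;) {
              std::unique_lock<std::mutex> lk(mtx);
              cv.wait(lk, [this] { return shutdown || !tasks.empty(); });
              if (shutdown && tasks.empty()) return;
              auto task = std::move(tasks.front());
              tasks.pop();
              lk.unlock();
              task(); // apply one transaction (or other replication work)
            }
          });
      }
      void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> g(mtx); tasks.push(std::move(task)); }
        cv.notify_one();
      }
      ~ReplWorkerPool() {
        { std::lock_guard<std::mutex> g(mtx); shutdown = true; }
        cv.notify_all();
        for (auto &t : workers) t.join();
      }
    };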
----
The second challenge is the commit order, and global transaction ID.
With your patch, transactions on the slave can be committed in a different
order than on the master (because they are run in parallel). This means that
the order in the binlog on the slave (slave-bin.XXXXXX) will be different from
on the master (master-bin.XXXXXX).
This makes it harder if the old master is removed, and one of the slaves
should become the new master. Because the different slaves will have
transactions in different order, and it will be hard to know which
transactions from a new master to apply and which have already been applied.
The MySQL 5.6 implementation of global transaction ID has a way to handle
this, but it is very complex and has some problems. We plan to do another
design (MDEV-26, https://mariadb.atlassian.net/browse/MDEV-26) which requires
that transactions are committed in the same order on the slave as they are on
the master.
Besides, if transactions are committed in different order on the slave, then
some applications may have problems if they require that SELECT sees
transactions in the same order on all slaves (but other applications will not
have problems with this, it depends on the application).
I think this can be solved by implementing a configuration option that
controls whether commits in parallel replication happen in the same order as
on the master. If enabled, each thread will apply its transaction in parallel
as normal, but wait (on a condition variable) for the previous transaction to
commit before committing itself. If disabled, we do as your patch does now.
Then the user can choose to keep the order, which is safe for all
applications and works with global transaction ID, but is somewhat less
parallel; or to run at full parallelism, but then get commits in a different
order on the slave.
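A minimal sketch of this "commit in master order" option, assuming each transaction is numbered in master binlog order; the names are hypothetical and shown only to illustrate the condition-variable wait:

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    class CommitOrderer {
      std::mutex mtx;
      std::condition_variable cv;
      uint64_t next_to_commit = 1; // master binlog order, 1-based

    public:
      // Called by a worker that has finished applying transaction `seq`
      // and is about to commit it.
      void wait_for_turn(uint64_t seq) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&] { return next_to_commit == seq; });
      }
      // Called right after transaction `seq` has committed.
      void done(uint64_t seq) {
        { std::lock_guard<std::mutex> g(mtx); next_to_commit = seq + 1; }
        cv.notify_all();
      }
    };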
----
丁奇 <dingqi.lxb@xxxxxxxxxx> writes:
> 1. There is indeed a possible invalid memory access in get_pk_value(). I have changed the definition of st_hash_item::key to char[1024+2*NAME_CHAR_LEN+2], and when building the hash key, if (item->key_len + pack_length >= 1024) break;
> This guarantees that even if the total length of the primary key is bigger than 1024, at least one key_part of the key can be recorded into the hash key (as the max length of one key_part is 1000 in MySQL).
Ok, sounds good.
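For illustration, the bounds check described in the quote could look like this sketch; st_hash_item here is a simplified stand-in, and the surrounding key_part loop is assumed:

    #include <cstring>

    static const int NAME_CHAR_LEN = 64; // MySQL identifier length limit

    struct st_hash_item {
      char key[1024 + 2 * NAME_CHAR_LEN + 2]; // db/table name + key image
      int key_len;
    };

    // Append one packed key part unless it would exceed the 1024-byte
    // budget. Since a single key part is at most 1000 bytes in MySQL,
    // at least the first key part always fits.
    bool append_key_part(st_hash_item *item, const char *pack,
                         int pack_length) {
      if (item->key_len + pack_length >= 1024)
        return false; // caller breaks out of the key_part loop here
      memcpy(item->key + item->key_len, pack, pack_length);
      item->key_len += pack_length;
      return true;
    }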
> 2. The problem of case insensitivity is a bug. I will fix it in the next version. We can simply check the column definition in the table schema and decide whether to convert the string to lower case before adding it to pk_hash.
Yes, agree.
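A tiny sketch of that normalization idea; real server code would consult the column's CHARSET_INFO rather than tolower(), so this is only an illustration:

    #include <cctype>

    // Lower-case the key bytes before hashing when the column's collation
    // is case-insensitive, so 'ABC' and 'abc' hash to the same bucket.
    void normalize_for_hash(char *key, int len,
                            bool case_insensitive_collation) {
      if (!case_insensitive_collation)
        return; // binary / case-sensitive collations hash as-is
      for (int i = 0; i < len; i++)
        key[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(key[i])));
    }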
> 4. When a deadlock occurs, the whole transaction needs to be retried. In the current implementation, a whole transaction is packed into one "Query", so retrying a query is equivalent to retrying one transaction.
Ah, yes you are right, missed that. Thanks for the explanation.
> As I told you before, I am changing the patch to make the patched mysqld a "pure" slave. In pure slave mode, all other events will be treated like a DDL statement. That means a User_var_event will wait for all the worker queues to be empty, and then call ev->apply_event. Is this strategy suitable? Please point out any potential problems.
Yes, I think this will work fine. There are lots of details with other events,
but they can be handled like DDL. Then most things should just work.
There will probably be some details and corner cases to work out, but I think
it should be manageable, we can deal with it at the appropriate time.
> About some strategies:
> 1. For simplicity, I used some sleep(1000000) calls in the patch; a condition variable would be better. This will be changed in the future, but it is not the highest priority.
Yes, agree. It is a good idea to start with a simple thread scheduling
strategy, if that works then we can improve things later.
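For illustration, a sketch of the queue-drain barrier (the DDL-like wait mentioned above) using a condition variable instead of sleep() polling; the structure is hypothetical, not the patch's code:

    #include <condition_variable>
    #include <mutex>

    class WorkerQueues {
      std::mutex mtx;
      std::condition_variable all_empty;
      int pending = 0; // total events queued across all workers

    public:
      void enqueue() {
        std::lock_guard<std::mutex> g(mtx);
        pending++;
      }
      void event_done() {
        std::lock_guard<std::mutex> g(mtx);
        if (--pending == 0)
          all_empty.notify_all();
      }
      // Coordinator: block until every worker queue has drained, then
      // apply the DDL / User_var_event / other serial event directly.
      void wait_all_empty() {
        std::unique_lock<std::mutex> lk(mtx);
        all_empty.wait(lk, [this] { return pending == 0; });
      }
    };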
> 2. Huge transactions may drive up memory usage and CPU load; thank you for pointing it out, I had not thought of it before. I think we can deal with it this way: if a transaction contains too many events, say more than a certain number, we can treat it like a DDL statement. Because it will be executed after all the worker queues are empty, there is no "order problem" here, so we do not need to construct the pk_hash. Please give me some suggestions on this issue.
Yes, I agree. If a transaction is too big, we can fall back to applying it serially.
> My next plan is:
> 1. Fix the case-insensitivity bug that you mentioned before.
> 2. Run the mysql-test suites and pass all the tests.
Sounds good! Looking forward to seeing the result.
- Kristian.
________________________________