maria-developers team mailing list archive

Thread
Date
答复: MDEV-520: consider parallel replication patch from taobao patches

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>, "maria-developers@xxxxxxxxxxxxxxxxxxx" <maria-developers@xxxxxxxxxxxxxxxxxxx>
From: 丁奇 <dingqi.lxb@xxxxxxxxxx>
Date: Thu, 4 Oct 2012 15:24:18 +0000
Accept-language: zh-CN, en-US
In-reply-to: <87vceryexp.fsf@frigg.knielsen-hq.org>
Thread-index: AQHNoVnxWJfBjZEP6ECy/sQ+GZT6VZepQ3aO
Thread-topic: MDEV-520: consider parallel replication patch from taobao patches
Hi, Kristian

Thank you very much for looking  into the patch and comments. You have got a deep understanding of the patch.

Let's discuss the issues one by one.
About Bugs：
1. There indeed is a possible invalid memory access in get_pk_value(). I have changed the definition of  st_hash_item::key  to char[1024+2*NAME_CHAR_LEN+2], and when building the hash key, if (item->key_len + pack_length >= 1024) break;
   This can guarantee that, even if the total length of the primary key is bigger than 1024, at least one key_part of the key can be recorded into the hash_key.(As the max length of one key_part is 1000 in  MySQL).
   As you mentioned in last mail,  this can lead to false positives  where potential parallelism is not exploited, but that is still safe and should be ok.


2. The problem of case insensitivity is a bug. I will modify it in next version. Simply we can test the definition in the table schema, and decide whethere change the string to lower case before adding to pk_hash.

3. There is lock protection in  dispatch_transaction(). It is pk_hash_lock in the function "check_pk_exist_key"

4. When deadlock occurs, the whole transaction needs to retry. In current implement, a whole transaction is packed into one "Query", so retrying a query is equal to one transaction.

5. We handle the events including Query_event, Table_map_event, Write_row_event, Update_row_event, Delete_row_event, Xid_event. For all other events, in current implement, I just run it locally (that is in Transfer), but not in the remote_slave. This will lose some information when master use statment-based-binlog. If the master use row-based-binlog, I have not found the serious affects in data consistency.
  As I told you before, I am changing the patch to make the patched mysqld as a "pure" slave. In the pure slave mode, all other events will be treated like a DDL statement.That means an User_var_event will wait for all the worker queues to be empty, and call ev->apply_event. Is this strategy suitable? Pleast point out if there are potential problems.

About some strategies
1. For simplify , I used some sleep(1000000) in the patch, a condition variable will be better, it will be modified in future, but not the highest priority.

2. Huge transactions may lead to the memory usage and cpu load, thank you for pointing it out, I never think of it before. I think we can deal with it in this way: If a transaction contains too many events, such as bigger than a centain number, we can treat it like a DDL statement. Because it will be executed after all the worker queues to be empty, there is no "order problem" here, so we do not need to construct the pk_hash. Please give me some suggestion for this issue.

3. There are N(default as 16) threads for parallel execution. But the No. 0 thread is special. It is used for the transactions whose affecting rows do not have any primary key or unique key, that means all modification on no-primary-rows will be execute serially.

About uses and plans
I tested it using some simple cases,  valgrind used during the cases. The important method for testing is using the productive data. Before a project use it, a necessary step is to build a slave(and a Transfer) in the test environment, data consistency will be checked after running for some days.
In my plan, the next modification is to change it to the "pure slave", that is to run every event using the ev->apply_event directly locally. After it is done, the test-case suit can be used for test.

I will ping you after I upload the new version of the patch.

Thanks
Xiaobin

________________________________________
发件人: Kristian Nielsen [knielsen@xxxxxxxxxxxxxxx]
发送时间: 2012年10月3日 19:26
到: 丁奇; maria-developers@xxxxxxxxxxxxxxxxxxx
Cc: Sergei Golubchik
主题: MDEV-520: consider parallel replication patch from taobao patches

Hi xiaobin,

I read through your patch found here:

    http://mysql.taobao.org/index.php/RLPR_for_MariaDB

I am still a bit confused about the status of this patch. It would be useful
if you could explain a bit about the status of the patch:

 - How have you tested the patch.

 - How are you currently using the patch? Or is it still just
   development/test?

 - What are your current plans for the patch? (You mentioned that you have
   some changes you are implementing at the moment).

Anyway, I think I have a fair understanding of the overall idea, which seems
very interesting! I will try to summarise here in a bit more detail, as well
as give a few general comments on the patch. Please correct me if I
misunderstood anything.

----

The basic idea is to build a set (implemented as a hash table) of every unique
key in every row-event in each transaction received from the master. If two
transactions have no overlapping keys, they can be applied in parallel
(foreign_key_checks is set to OFF to make this work).

There is a single thread that processes each event received in sequence. It
checks the key set of the new transaction against the key sets of all
transactions queued for each of N worker threads:

 - If there is no conflict with any already queued transaction, the new
   transaction can be queued to any worker thread (the one with fewest queued
   transaction is chosen).

 - If there is a conflict with just one thread, then the transaction can be
   queued for that thread - it will then execute in correct sequence with the
   conflicting transactions.

 - If there is a conflict with multiple threads, wait until the conflicting
   transactions have been processed. No new transactions are queued while
   waiting.

Then there are N (default 16) threads that apply the queued
transactions. Queued transactions are processsed in parallel within one
thread, but transaction in different queues are processed in parallel with
each other.

If a transaction is received which is not using row events (statement-based
replication or DDL), then we wait until all queued transactions are
processed. Only after that is the new transaction queued, so that such events
are processed correctly (but not in parallel, of course).

Likewise, if the worker threads get too far behind, we wait with queuing any
more transactions.

----

I think this is a very interesting approach, and I am happy that someone have
actually tried implementing it.

The biggest potential issue seems to be the performance of the thread that
checks the transactions for conflicts and queues them for the appropriate
thread:

 - Main issue will be huge transactions, if the size of hash tables of key
   values become too big for available memory. Maybe this could be solved by
   having some limit above which transactions are not tried in parallel.

 - There will be a lot of hash lookups of key values, all done in a single
   thread. This could become a limit I suppose - testing will be needed to
   understand this fully. But I think there will be many workloads where this
   will be useful.

Overall, it seems to me this technique is very promising. It seems to be
better at getting maximum parallelism on the slave than any of the other
approaches that have been considered.

As to the patch itself, it seems a bit immature in the present form, but I
understood you are still working on it. Let us discuss what the plans could be
for finishing it and get it ready for inclusion in MariaDB.

The main criteria is that it must be reasonably complete - it must not crash
or silently corrupt data in any usage. However, reasonable restrictions for
what is supported is fine. (For example, we could say that LOAD DATA is not
supported and give an error, but not crash or silently ignore such
statement). But we can discuss this in more details.

----

Finally, a couple general comments on the code:

1. get_pk_value() does not seem to handle collations, eg. case insensitivity?
   It needs to understand that eg. "abab" and "ABAB" can conflict. I think
   this should be solved by copying in values with non-binary collations using
   the appropriate strxfrm() variant.

2. remote_query() retries the query in case of deadlock - but that seems
   wrrong, doesn't it need to retry the whole _transaction_, not just the
   query?

3. What about other events like User_var_event ? Will they be included in a
   later patch?

4. There is a my_sleep(10000) in the code when a transaction conflicts with
   two or more workers. It would be better to replace this with a condition
   variable so we can wake up as soon as the conflict is resolved. Same for
   some other sleeps like in wait_for_all_dml_done().

5. Shouldn't there be some locking in dispatch_transaction()? To protect the
   hash lookups when checking conflicts against worker threads doing
   pk_hash_minus() ?

6. get_pk_value(). There seems to be a buffer overflow - st_hash_item::key is
   fixed size, but there seems no check against overrunning it. One solution
   is to just truncate if data does not fit - this can lead to false positives
   where potential parallelism is not exploited, but that is still safe and
   should be ok.

Thanks!

 - Kristian.

________________________________

This email (including any attachments) is confidential and may be legally privileged. If you received this email in error, please delete it immediately and do not copy it or use it for any purpose or disclose its contents to any other person. Thank you.

本电邮(包括任何附件)可能含有机密资料并受法律保护。如您不是正确的收件人，请您立即删除本邮件。请不要将本电邮进行复制并用作任何其他用途、或透露本邮件之内容。谢谢。
Follow ups

Re: 答复: MDEV-520: consider parallel replication patch from taobao patches
From: Kristian Nielsen, 2012-10-12
答复: MDEV-520: consider parallel replication patch from taobao patches
From: 丁奇, 2012-10-08
References

MDEV-520: consider parallel replication patch from taobao patches
From: Kristian Nielsen, 2012-10-03