linux-traipu team mailing list archive

[Bug 941176] Re: slave error doesn't change sys_replication.applier_state

 

** Description changed:

  sys_replication.applier_state is not always correct; it can show RUNNING
  even though a slave error has broken replication.
  
  To reproduce:
  
  1. Start a master and create a schema called "crash".
  
  2. Start a slave with max-commit-id set to the tx id of that event on the
  master, so the slave does *not* create the crash schema.
  
  3. DROP SCHEMA crash; on the master.
  
  Replication on the slave will break.  Its error log shows:
  
  (SQLSTATE 00000) Can't drop schema 'crash'; schema doesn't exist
  Failure while executing:
  COMMIT
  DROP SCHEMA `crash`
  UPDATE `sys_replication`.`applier_state` SET `last_applied_commit_id` = 12, `originating_server_uuid` = '9908C6AA-A982-4763-B9BA-4EF5F933D219' , `originating_commit_id` = 12 WHERE `master_id` = 1
  
  But sys_replication.applier_state implies that replication is ok:
  
  drizzle> select * from sys_replication.applier_state\G
  *************************** 1. row ***************************
-               master_id: 1
-  last_applied_commit_id: 12
+               master_id: 1
+  last_applied_commit_id: 12
  originating_server_uuid: 9908C6AA-A982-4763-B9BA-4EF5F933D219
-   originating_commit_id: 12
-                  status: RUNNING
-               error_msg: 
+   originating_commit_id: 12
+                  status: RUNNING
+               error_msg:
  
  More than ok, it implies that it actually applied tx id 12, the one that
  caused the error.  This tx is still in the queue:
  
- 
  drizzle> select * from sys_replication.queue\G
  *************************** 1. row ***************************
-                  trx_id: 925
-                  seg_id: 1
-            commit_order: 12
+                  trx_id: 925
+                  seg_id: 1
+            commit_order: 12
  originating_server_uuid: 9908C6AA-A982-4763-B9BA-4EF5F933D219
-   originating_commit_id: 12
-                     msg: transaction_context {
-   server_id: 1
-   transaction_id: 925
-   start_timestamp: 1330211976689868
-   end_timestamp: 1330211976689874
+   originating_commit_id: 12
+                     msg: transaction_context {
+   server_id: 1
+   transaction_id: 925
+   start_timestamp: 1330211976689868
+   end_timestamp: 1330211976689874
  }
  statement {
-   type: DROP_SCHEMA
-   start_timestamp: 1330211976689872
-   end_timestamp: 1330211976689873
-   drop_schema_statement {
-     schema_name: "crash"
-   }
+   type: DROP_SCHEMA
+   start_timestamp: 1330211976689872
+   end_timestamp: 1330211976689873
+   drop_schema_statement {
+     schema_name: "crash"
+   }
  }
  segment_id: 1
  end_segment: true
  
-               master_id: 1
+               master_id: 1
  
  Suggested fix: when the slave encounters an error, update
  sys_replication.applier_state.  I have seen
  sys_replication.applier_state be updated on an error, but in this case
  it doesn't work.  Perhaps it detects some errors but not others?
+ 
+ Workaround: delete the offending transactions from sys_replication.queue
+ and restart the slave.

-- 
You received this bug notification because you are a member of UBUNTU -
AL - BR, which is subscribed to Drizzle.
https://bugs.launchpad.net/bugs/941176

Title:
  slave error doesn't change sys_replication.applier_state

Status in A Lightweight SQL Database for Cloud Infrastructure and Web Applications:
  Confirmed

Bug description:
  sys_replication.applier_state is not always correct; it can show
  RUNNING even though a slave error has broken replication.

  To reproduce:

  1. Start a master and create a schema called "crash".

  2. Start a slave with max-commit-id set to the tx id of that event on
  the master, so the slave does *not* create the crash schema.

  3. DROP SCHEMA crash; on the master.

  Replication on the slave will break.  Its error log shows:

  (SQLSTATE 00000) Can't drop schema 'crash'; schema doesn't exist
  Failure while executing:
  COMMIT
  DROP SCHEMA `crash`
  UPDATE `sys_replication`.`applier_state` SET `last_applied_commit_id` = 12, `originating_server_uuid` = '9908C6AA-A982-4763-B9BA-4EF5F933D219' , `originating_commit_id` = 12 WHERE `master_id` = 1

  But sys_replication.applier_state implies that replication is ok:

  drizzle> select * from sys_replication.applier_state\G
  *************************** 1. row ***************************
                master_id: 1
   last_applied_commit_id: 12
  originating_server_uuid: 9908C6AA-A982-4763-B9BA-4EF5F933D219
    originating_commit_id: 12
                   status: RUNNING
                error_msg:

  More than ok, it implies that it actually applied tx id 12, the one
  that caused the error.  This tx is still in the queue:

  drizzle> select * from sys_replication.queue\G
  *************************** 1. row ***************************
                   trx_id: 925
                   seg_id: 1
             commit_order: 12
  originating_server_uuid: 9908C6AA-A982-4763-B9BA-4EF5F933D219
    originating_commit_id: 12
                      msg: transaction_context {
    server_id: 1
    transaction_id: 925
    start_timestamp: 1330211976689868
    end_timestamp: 1330211976689874
  }
  statement {
    type: DROP_SCHEMA
    start_timestamp: 1330211976689872
    end_timestamp: 1330211976689873
    drop_schema_statement {
      schema_name: "crash"
    }
  }
  segment_id: 1
  end_segment: true

                master_id: 1
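
  The mismatch between the two tables can be checked mechanically. What
  follows is only a sketch, not Drizzle code: Python's sqlite3 module
  stands in for the slave, the tables are trimmed copies holding the
  values shown in this report, and it assumes that commit_order in the
  queue is comparable to last_applied_commit_id. The function name
  find_stuck_transactions is made up for illustration.

```python
import sqlite3

# SQLite stands in for the Drizzle slave (assumption). ATTACH lets the sketch
# use the same sys_replication.* names as the report; columns are trimmed.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sys_replication")
conn.execute(
    "CREATE TABLE sys_replication.applier_state ("
    "master_id INTEGER, last_applied_commit_id INTEGER, "
    "status TEXT, error_msg TEXT)"
)
conn.execute(
    "CREATE TABLE sys_replication.queue ("
    "trx_id INTEGER, seg_id INTEGER, commit_order INTEGER, master_id INTEGER)"
)
# Values copied from the output in this report.
conn.execute(
    "INSERT INTO sys_replication.applier_state VALUES (1, 12, 'RUNNING', '')"
)
conn.execute("INSERT INTO sys_replication.queue VALUES (925, 1, 12, 1)")
conn.commit()


def find_stuck_transactions(conn):
    """Return (master_id, trx_id, commit_order) for queued transactions that
    applier_state reports as already applied while status is RUNNING."""
    return conn.execute(
        "SELECT s.master_id, q.trx_id, q.commit_order "
        "FROM sys_replication.applier_state AS s "
        "JOIN sys_replication.queue AS q ON q.master_id = s.master_id "
        "WHERE s.status = 'RUNNING' "
        "AND q.commit_order <= s.last_applied_commit_id "
        "ORDER BY q.commit_order"
    ).fetchall()


stuck = find_stuck_transactions(conn)
print(stuck)  # [(1, 925, 12)]
```

  A non-empty result means the slave claims to have applied a
  transaction that is still waiting in the queue, which is the state
  described in this bug.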

  Suggested fix: when the slave encounters an error, update
  sys_replication.applier_state.  I have seen
  sys_replication.applier_state be updated on an error, but in this case
  it doesn't work.  Perhaps it detects some errors but not others?
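
  A rough sketch of the suggested behaviour, under heavy assumptions:
  this is not the Drizzle applier, SQLite stands in for the slave, DROP
  TABLE stands in for DROP SCHEMA, the starting value 11 for
  last_applied_commit_id is invented, and the status value 'STOPPED' is
  hypothetical (this report only shows RUNNING). The point is that the
  error path writes status and error_msg and leaves
  last_applied_commit_id alone.

```python
import sqlite3

# SQLite stands in for the slave (assumption); columns are trimmed.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sys_replication")
conn.execute(
    "CREATE TABLE sys_replication.applier_state ("
    "master_id INTEGER, last_applied_commit_id INTEGER, "
    "status TEXT, error_msg TEXT)"
)
# Invented starting state: everything up to commit id 11 has been applied.
conn.execute(
    "INSERT INTO sys_replication.applier_state VALUES (1, 11, 'RUNNING', '')"
)
conn.commit()


def apply_transaction(conn, master_id, commit_id, statements):
    """Apply one replicated transaction; record any failure in applier_state.

    Hypothetical helper. Returns True if the transaction was applied.
    """
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute(
            "UPDATE sys_replication.applier_state "
            "SET last_applied_commit_id = ? WHERE master_id = ?",
            (commit_id, master_id),
        )
        conn.commit()
        return True
    except sqlite3.Error as exc:
        conn.rollback()
        # 'STOPPED' is a hypothetical status value, not taken from Drizzle.
        conn.execute(
            "UPDATE sys_replication.applier_state "
            "SET status = 'STOPPED', error_msg = ? WHERE master_id = ?",
            (str(exc), master_id),
        )
        conn.commit()
        return False


# The object does not exist on the slave, so the statement fails.
ok = apply_transaction(conn, 1, 12, ["DROP TABLE crash"])
row = conn.execute(
    "SELECT last_applied_commit_id, status, error_msg "
    "FROM sys_replication.applier_state WHERE master_id = 1"
).fetchone()
print(ok, row)
```

  After the failed apply, last_applied_commit_id still reads 11 and the
  row carries the error text, so a monitoring query would see that
  replication has stopped.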

  Workaround: delete the offending transactions from
  sys_replication.queue and restart the slave.
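
  The delete step of the workaround might look like the sketch below.
  It is an illustration only: SQLite stands in for the slave, the
  second queued row (trx_id 926) is invented to show that later
  transactions are left alone, and selecting the row by master_id and
  commit_order is an assumption about which key identifies the
  offending transaction. The helper name drop_queued_transaction is
  hypothetical. Restarting the slave is not covered.

```python
import sqlite3

# SQLite stands in for the slave (assumption); columns are trimmed.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sys_replication")
conn.execute(
    "CREATE TABLE sys_replication.queue ("
    "trx_id INTEGER, seg_id INTEGER, commit_order INTEGER, master_id INTEGER)"
)
# First row is from this report; the second row is invented for illustration.
conn.executemany(
    "INSERT INTO sys_replication.queue VALUES (?, ?, ?, ?)",
    [(925, 1, 12, 1), (926, 1, 13, 1)],
)
conn.commit()


def drop_queued_transaction(conn, master_id, commit_order):
    """Delete every queued segment of one transaction; return rows removed."""
    cur = conn.execute(
        "DELETE FROM sys_replication.queue "
        "WHERE master_id = ? AND commit_order = ?",
        (master_id, commit_order),
    )
    conn.commit()
    return cur.rowcount


removed = drop_queued_transaction(conn, 1, 12)
remaining = conn.execute(
    "SELECT trx_id, commit_order FROM sys_replication.queue "
    "ORDER BY commit_order"
).fetchall()
print(removed, remaining)
```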

To manage notifications about this bug go to:
https://bugs.launchpad.net/drizzle/+bug/941176/+subscriptions