← Back to team overview

maria-developers team mailing list archive

Parallel replication slave stops with duplicate key errors

 

Hi Kristian,

I was trying out parallel replication and got an error that I am not sure what
to do with.

== Setup ==
I started a master with binlog_format=row,
I ran sysbench there to load the dataset ("prepare" mode, about 10M rows IIRC)
then I restarted the master and ran sysbench in "run" mode a few times.
Then I restarted the master again an ran a "CREATE TABLE" statement manually.
Just so that I have third binlog.

The master state is:
mysql> show binary logs;
+-------------+-----------+
| Log_name    | File_size |
+-------------+-----------+
| pslp.000001 | 194918772 |
| pslp.000002 |  28895350 |
| pslp.000003 |       589 |
+-------------+-----------+

== Non-parallel test run ==
Then I connected with a slave, with all default settings. The slave had empty
datadir, so I made it a slave with this command:

CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3310, MASTER_USER='root';

All server settings were pretty much default. It took a slave one hour to catch
up, but it did so, and encountered no errors.

Ok, this was serial replication

== Parallel test run == 
Then I killed the slave, 
deleted its data directory, put an empty directory instead
I added this to slave.cnf file:

slave_parallel_threads=4
slave_parallel_mode=aggressive

and started the slave.
It ran in parallel for a few minutes, I saw this:

mysql> show processlist;
+----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                                         | Info             | Progress |
+----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
|  3 | root        | localhost | test | Query   |    0 | init                                          | show processlist |    0.000 |
|  4 | system user |           | NULL | Connect |   15 | Waiting for master to send event              | NULL             |    0.000 |
|  5 | system user |           | NULL | Connect |    0 | Write_rows_log_event::write_row(-1)           | NULL             |    0.000 |
|  6 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread              | NULL             |    0.000 |
|  7 | system user |           | NULL | Connect |    0 | Write_rows_log_event::write_row(-1)           | NULL             |    0.000 |
|  8 | system user |           | NULL | Connect |    0 | Waiting for work from SQL thread              | NULL             |    0.000 |
|  9 | system user |           | NULL | Connect |   14 | Waiting for room in worker thread event queue | NULL             |    0.000 |
+----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
7 rows in set (0.00 sec)

but then it stopped. The error log has this:

2016-06-15 15:09:58 140702586379008 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './psergey-desktop-relay-bin.000001' position: 4
2016-06-15 15:09:59 140702586985216 [Note] InnoDB: Online DDL : Start
2016-06-15 15:09:59 140702586985216 [Note] InnoDB: Online DDL : Start reading clustered index of the table and create temporary files
2016-06-15 15:09:59 140702586985216 [Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
2016-06-15 15:09:59 140702586985216 [Note] InnoDB: Online DDL : Start applying row log
2016-06-15 15:09:59 140702586985216 [Note] InnoDB: Online DDL : End of applying row log
2016-06-15 15:09:59 140702586985216 [Note] InnoDB: Online DDL : Completed
2016-06-15 15:11:18 140702586985216 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-1021' for key 'PRIMARY', Gtid 0-1-1021, Internal MariaDB error code: 1942
2016-06-15 15:11:18 140702586985216 [ERROR] Slave (additional info): Duplicate entry '0-1021' for key 'PRIMARY' Error_code: 1062
2016-06-15 15:11:18 140702586985216 [Warning] Slave: Duplicate entry '0-1021' for key 'PRIMARY' Error_code: 1062
2016-06-15 15:11:18 140702586985216 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'pslp.000002' position 8266
2016-06-15 15:11:18 140702586682112 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702586682112 [Warning] Slave: Duplicate entry '821827' for key 'PRIMARY' Error_code: 1062
2016-06-15 15:11:18 140702586682112 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
2016-06-15 15:11:18 140702586682112 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702586682112 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'pslp.000002' position 8266
2016-06-15 15:11:18 140702602623744 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702602623744 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702602623744 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'pslp.000002' position 8266
2016-06-15 15:11:18 140702602926848 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702602926848 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702602926848 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'pslp.000002' position 8266
2016-06-15 15:11:18 140702586985216 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702586985216 [Warning] Slave: Duplicate entry '0-1021' for key 'PRIMARY' Error_code: 1062
2016-06-15 15:11:18 140702586985216 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702586985216 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'pslp.000002' position 8266
2016-06-15 15:11:18 140702586682112 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:18 140702586682112 [Warning] Slave: Duplicate entry '821827' for key 'PRIMARY' Error_code: 1062

... and there were lots of these errors. Eventually it stopped with:


2016-06-15 15:11:22 140702586682112 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:22 140702586682112 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:22 140702586682112 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-06-15 15:11:22 140702586682112 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'pslp.000002' position 8266
2016-06-15 15:11:22 140702586379008 [Note] Slave SQL thread exiting, replication stopped in log 'pslp.000002' at position 8266

The status is:


mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: root
                  Master_Port: 3310
                Connect_Retry: 60
              Master_Log_File: pslp.000003
          Read_Master_Log_Pos: 589
               Relay_Log_File: psergey-desktop-relay-bin.000003
                Relay_Log_Pos: 8549
        Relay_Master_Log_File: pslp.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1942
                   Last_Error: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-1021' for key 'PRIMARY'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 8266
              Relay_Log_Space: 223815659
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1942
               Last_SQL_Error: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-1021' for key 'PRIMARY'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: aggressive


Can you advise, what does this error mean? 


(is there any benefit in cc'ing this to maria-developers@ ? if yes, feel free
to..)

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog




Follow ups