
Re: gtid_slave_pos row count

 

> Do you have any errors in the error log about failure to delete rows?

Nope, no errors.


> Anything else special to your setup that might be causing this?

At some point I suspected that tokudb_analyze_in_background / tokudb_auto_analyze was messing things up with its background checks (you can also see the row count growing here):

2018-09-29 11:05:48 134488 [Note] TokuDB: Auto scheduling background analysis for ./mysql/gtid_slave_pos_TokuDB, delta_activity 423840 is greater than 40 percent of 1059601 rows. - succeeded.
2018-09-29 11:09:35 134490 [Note] TokuDB: Auto scheduling background analysis for ./mysql/gtid_slave_pos_TokuDB, delta_activity 424359 is greater than 40 percent of 1060885 rows. - succeeded.
2018-09-29 11:13:23 134488 [Note] TokuDB: Auto scheduling background analysis for ./mysql/gtid_slave_pos_TokuDB, delta_activity 424888 is greater than 40 percent of 1062196 rows. - succeeded.

(it also triggers in conservative mode, but then just because a single row is >40% of the table)
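
For reference, a rough sketch of how the auto analysis can be ruled out (assuming both TokuDB variables are dynamic as documented; tokudb_auto_analyze is a percentage threshold, so 0 disables it):

-- Check the current settings
SHOW GLOBAL VARIABLES LIKE 'tokudb_analyze_in_background';
SHOW GLOBAL VARIABLES LIKE 'tokudb_auto_analyze';
-- Take the background analysis out of the picture
SET GLOBAL tokudb_analyze_in_background = OFF;
SET GLOBAL tokudb_auto_analyze = 0;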

I tried switching off gtid_pos_auto_engines to use a single InnoDB gtid_slave_pos table, and it makes no difference: in conservative mode everything is fine, in optimistic mode the table fills up.
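
Roughly what the switch-off amounts to (a sketch; gtid_pos_auto_engines is a dynamic global variable, so clearing the list should be enough for new transactions):

-- Clear the auto-engines list so only the default
-- mysql.gtid_slave_pos (InnoDB) table is used going forward
SET GLOBAL gtid_pos_auto_engines = '';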



The odd thing is that I'm actually not using GTID for the replication:

MariaDB [mysql]> show slave status\G

                Slave_IO_State: Waiting for master to send event
                   Master_Host: 10.0.8.211
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.096519
           Read_Master_Log_Pos: 79697585
                Relay_Log_File: db-relay-bin.000142
                 Relay_Log_Pos: 78464847
         Relay_Master_Log_File: mysql-bin.096519
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 79697245
               Relay_Log_Space: 595992008
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
..
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 211
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: No
                   Gtid_IO_Pos:
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
              Slave_DDL_Groups: 25
Slave_Non_Transactional_Groups: 284193
    Slave_Transactional_Groups: 452098720



The other maybe-"special" thing is that the master is still on 10.2.4 - but that shouldn't be the problem?


I have 2 slaves (both on 10.3.9; I might try downgrading back to 10.2.x or to earlier 10.3.x versions, since I don't know the exact point when this started happening), and the issue is triggered immediately when switching the parallel mode.
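
For the record, the mode switch itself is just the usual dance (a sketch, assuming the variable still requires the slave to be stopped while changing it):

STOP SLAVE;
SET GLOBAL slave_parallel_mode = 'optimistic';  -- or back to 'conservative'
START SLAVE;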


> Can you share the contents of the mysql.gtid_slave_pos table when this
> happens?

Sure, 

MariaDB [mysql]> select @@gtid_slave_pos;
+--------------------+
| @@gtid_slave_pos   |
+--------------------+
| 0-211-211038653075 |
+--------------------+
1 row in set (0.000 sec)


MariaDB [mysql]> select * from gtid_slave_pos limit 10;
+-----------+--------+-----------+--------------+
| domain_id | sub_id | server_id | seq_no       |
+-----------+--------+-----------+--------------+
|         0 |  29488 |       211 | 210594092751 |
|         0 |  29490 |       211 | 210594092753 |
|         0 |  29957 |       211 | 210594093220 |
|         0 |  29958 |       211 | 210594093221 |
|         0 |  29961 |       211 | 210594093224 |
|         0 |  29962 |       211 | 210594093225 |
|         0 |  30095 |       211 | 210594093358 |
|         0 |  30096 |       211 | 210594093359 |
|         0 |  30247 |       211 | 210594093510 |
|         0 |  30275 |       211 | 210594093538 |
+-----------+--------+-----------+--------------+
10 rows in set (0.000 sec)

MariaDB [mysql]> select count(*) from gtid_slave_pos;
+----------+
| count(*) |
+----------+
|  2395877 |
+----------+
1 row in set (0.578 sec)
 

MariaDB [mysql]> select * from gtid_slave_pos_TokuDB limit 10;
+-----------+--------+-----------+--------------+
| domain_id | sub_id | server_id | seq_no       |
+-----------+--------+-----------+--------------+
|         0 |  29373 |       211 | 210594092636 |
|         0 |  29911 |       211 | 210594093174 |
|         0 |  29912 |       211 | 210594093175 |
|         0 |  30282 |       211 | 210594093545 |
|         0 |  30283 |       211 | 210594093546 |
|         0 |  30284 |       211 | 210594093547 |
|         0 |  30285 |       211 | 210594093548 |
|         0 |  30287 |       211 | 210594093550 |
|         0 |  30348 |       211 | 210594093611 |
|         0 |  30349 |       211 | 210594093612 |
+-----------+--------+-----------+--------------+
10 rows in set (0.001 sec)

MariaDB [mysql]> select count(*) from gtid_slave_pos_TokuDB;
+----------+
| count(*) |
+----------+
|   840001 |
+----------+
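
In case it is useful, here is a quick way to see how far back the undeleted rows go (a hypothetical diagnostic query; the column aliases are mine):

SELECT domain_id, COUNT(*) AS rows_kept,
       MIN(sub_id) AS oldest_sub_id,
       MAX(sub_id) AS newest_sub_id
  FROM mysql.gtid_slave_pos
 GROUP BY domain_id;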



> > Is there something wrong with the purger?
> > (something similar like in https://jira.mariadb.org/browse/MDEV-12147
> > ? )
> 
> That bug is rather different - the row count in the table is not growing, but
> number of unpurged rows is.

True, I just searched Jira for similar issues (related to gtid_slave_pos) and saw that this one is still open.



Optimistic mode makes a big difference in our setup, as with conservative mode there are times when the slaves start to lag several days behind.


rr


