
maria-discuss team mailing list archive

Scaling up cluster results in missing rows on some existing nodes


Apologies for the cross-post; we submitted this question to the Codership
forum a week ago and haven't received a response.


Hoping to get some perspective on an issue we saw yesterday. While scaling
up from a single-node cluster, we inserted rows into a table on node 0 and
selected the row count for that table on each subsequently added node
(scripts below). For each node that was brought online, we observed that
connections to node 0 were suspended during SST, and that an additional IST
followed. At that point the nodes reported that they were synced. However,
replication sometimes did not occur correctly: the additional nodes reported
fewer rows than node 0. Each new node was added by itself, and all other
nodes were restarted during each scale-up. Nodes that fell out of sync kept
reporting an incorrect row count even after a restart, and sometimes drifted
further out of sync (i.e. the discrepancy between the row count on node 0 and
on the other nodes increased).
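A minimal sketch of how the row counts could be compared across all nodes in
one pass, once the writer loop has stopped (while it is still running, counts
will differ transiently). The host names passed as arguments, with node 0
first, are assumptions; db_user, db_password, db_name and table_name are the
same variables used in the scripts below. Alongside the count it records each
node's wsrep_local_state_comment and wsrep_last_committed status values: a
node that reports Synced with the same last committed seqno as node 0 but
fewer rows would point at divergent data rather than replication lag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare each node's row count with node 0's.
# Assumes db_user, db_password, db_name and table_name are set, and that the
# hosts given as arguments are reachable (first argument = node 0).

# Print "<host> <count> <wsrep state> <last committed seqno>" for one node.
node_report() {
  local host=$1 count state seqno
  count=$(mysql -h "$host" -u"$db_user" -p"$db_password" -N -B \
    -e "select count(*) from $db_name.$table_name;")
  state=$(mysql -h "$host" -u"$db_user" -p"$db_password" -N -B \
    -e "show global status like 'wsrep_local_state_comment';" | cut -f2)
  seqno=$(mysql -h "$host" -u"$db_user" -p"$db_password" -N -B \
    -e "show global status like 'wsrep_last_committed';" | cut -f2)
  echo "$host $count $state $seqno"
}

# Read node_report lines on stdin; the first line is the reference (node 0).
# Print only the nodes whose count differs, with the size of the gap.
find_drift() {
  awk 'NR == 1 { ref = $2; next }
       $2 != ref { printf "%s count=%s missing=%d state=%s seqno=%s\n", $1, $2, ref - $2, $3, $4 }'
}

# Only query the cluster when host names are supplied.
if [ "$#" -gt 0 ]; then
  for host in "$@"; do
    node_report "$host"
  done | find_drift
fi
```

For example, `./compare_counts.sh node0 node1 node2` (hypothetical script and
host names) prints nothing when every node matches node 0.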

We were conducting this experiment to see whether innodb_disallow_writes was
functioning correctly, but aren't sure whether this issue is related. Has
anyone seen similar behavior while scaling up?


# Writer: run against node 0
mysql -u"$db_user" -p"$db_password" -e "create database if not exists $db_name;"
mysql -u"$db_user" -p"$db_password" -e "use $db_name; create table if not exists $table_name (val int);"

for i in $(seq "$start_val" "$end_val"); do
  echo "inserting $i"
  mysql -u"$db_user" -p"$db_password" -e "use $db_name; insert into $table_name values ($i); select count(*) from $table_name;"
  sleep 0.1
done

# Reader: run against each node that joins the cluster
for i in $(seq "$start_val" "$end_val"); do
  echo "statement $i"
  mysql -u"$db_user" -p"$db_password" -e "use replication_test; select count(*) from vals;"
  sleep 0.1
done
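Counts alone don't show which writes were lost. A hypothetical follow-up
sketch that lists the values present on node 0 but absent on one other node;
the two host-name arguments are assumptions, and db_user, db_password,
db_name and table_name are the variables from the writer script above. If
the missing values cluster around the moments a node joined, that would
suggest the rows were lost around SST/IST rather than at random.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list values on node 0 that are missing on another node.
# Assumes db_user, db_password, db_name and table_name are set.

# Dump the table's values from one host, one per line.
dump_vals() {
  mysql -h "$1" -u"$db_user" -p"$db_password" -N -B \
    -e "select val from $db_name.$table_name;"
}

# Print lines present in file $1 but not in file $2, in numeric order.
# comm(1) needs lexically sorted input, so both files are sorted first.
missing_values() {
  LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2") | sort -n
}

# Only query the cluster when exactly two host names are supplied.
if [ "$#" -eq 2 ]; then
  ref=$(mktemp)
  other=$(mktemp)
  trap 'rm -f "$ref" "$other"' EXIT
  dump_vals "$1" > "$ref"
  dump_vals "$2" > "$other"
  missing_values "$ref" "$other"
fi
```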

Thank you,

Cloud Foundry Services, Pivotal