
randgen team mailing list archive

Re: A question about the RecoveryConsistency Validator



I will do my best to expand this area of documentation in the upcoming days, but in the meantime, here is some short info.

The Recovery Reporter crashes the server, attempts recovery, and then issues CHECK|ANALYZE|OPTIMIZE|REPAIR TABLE against each table on the server to check for corruption. It also issues SELECTs against each table with various FORCE INDEX hints, so that the table and its indexes are read in different ways.

If CHECK|ANALYZE|OPTIMIZE|REPAIR reports an error, or if the different SELECTs are not consistent with one another, a recovery failure is reported. These methods work regardless of the structure of the tables, the actual data, or the pre-crash workload, but they may not catch all issues. Imagine a storage engine in which CHECK|ANALYZE|OPTIMIZE|REPAIR are all unimplemented and wired to return "Unsupported", and which deletes all of its data on recovery. All SELECTs will report a consistently empty table, so recovery will be reported as successful.
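
As a rough illustration of the checks described above (this is not the Reporter's actual code, which lives in the RQG source), the statements issued per table and the comparison of the SELECT results could be sketched like this. The function names and the table and index names are made up for the example.

```python
def build_recovery_checks(table, indexes):
    """Return the SQL statements a post-recovery check might issue for one table."""
    quoted = f"`{table}`"
    # Table maintenance statements; any error they report counts as a failure.
    statements = [
        f"{verb} TABLE {quoted}"
        for verb in ("CHECK", "ANALYZE", "OPTIMIZE", "REPAIR")
    ]
    # One full read per index, so the same rows are fetched along different paths.
    statements += [
        f"SELECT * FROM {quoted} FORCE INDEX (`{index}`)" for index in indexes
    ]
    return statements


def reads_are_consistent(result_sets):
    """True if every read returned the same rows, ignoring row order."""
    normalized = [sorted(rows) for rows in result_sets]
    return all(rows == normalized[0] for rows in normalized)
```

Note that reads_are_consistent([[], []]) is True: two empty reads agree with each other, which is exactly the blind spot described above.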

To cover for this eventuality, the RecoveryConsistency Reporter uses a different mechanism. Upon recovery, it performs the following query:

SELECT (SUM(`int_key`)  + SUM(`int`)) / COUNT(*) FROM `$table`

and reports a failure if the result of this query, that is, the per-row average of int_key plus int, is not 200.

This requires a grammar that performs various invariant transactions that move data around but keep the average of the entire table at 200. If a crash happens and recovery does not consistently recover or roll back entire transactions, the average will be off and this will be reported. One grammar that maintains this invariant is transactions/transactions.yy, combined with the respective ZZ file. One possible improvement would be to issue the SELECT numerous times with different FORCE INDEX hints, to make sure that the table remains consistent regardless of how its data is read.
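
To make the invariant idea concrete, here is a small self-contained sketch. It uses an in-memory SQLite database as a stand-in for the server, a made-up table t with a pk column, and a simulated failure instead of a real crash, so it only illustrates the principle and is not what transactions.yy or the Reporter actually does. It compares the total against 200 times the row count rather than dividing, to sidestep integer division in SQLite.

```python
import sqlite3

TARGET = 200  # expected per-row average of int_key + int


def make_table(conn, rows=10):
    """Create a hypothetical table in which every row sums to TARGET."""
    conn.execute(
        "CREATE TABLE t (pk INTEGER PRIMARY KEY, `int_key` INTEGER, `int` INTEGER)"
    )
    conn.executemany(
        "INSERT INTO t (pk, `int_key`, `int`) VALUES (?, 100, 100)",
        [(pk,) for pk in range(1, rows + 1)],
    )
    conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from one row's int_key to another row's int, atomically."""
    try:
        conn.execute(
            "UPDATE t SET `int_key` = `int_key` - ? WHERE pk = ?", (amount, src)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE t SET `int` = `int` + ? WHERE pk = ?", (amount, dst))
        conn.commit()
    except RuntimeError:
        # A correct engine must undo the half-applied transaction.
        conn.rollback()


def invariant_holds(conn):
    """Check that the table is non-empty and still averages TARGET per row."""
    total, count = conn.execute(
        "SELECT SUM(`int_key`) + SUM(`int`), COUNT(*) FROM t"
    ).fetchone()
    return count > 0 and total == TARGET * count
```

A complete transfer, or one that is fully rolled back, leaves invariant_holds() true. Applying only one of the two UPDATEs, or losing all rows, makes it false, which is the situation the check is meant to expose after a crash.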

The two Reporters validate the Durability of ACID to a large extent, protecting against data corruption and incompletely written or recovered transactions. One hole that remains open, however, is a storage engine that loses entire transactions: in this case the database remains consistent, so the loss goes undetected. The solution would be to record the progress of the test and the committed transactions in some separate storage, and then make sure that the server has all the transactions that have been recorded in that separate storage.
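
Nothing like this exists in the two Reporters as described here, but the comparison step of such a solution could be sketched as below. It assumes, hypothetically, that every transaction writes a unique id into some table and that the test records the id in separate storage once COMMIT has been acknowledged.

```python
def find_lost_transactions(recorded_commits, recovered_ids):
    """Return ids recorded as committed that are absent after recovery."""
    recovered = set(recovered_ids)
    # Keep the recorded order so the earliest lost transaction is listed first.
    return [txn_id for txn_id in recorded_commits if txn_id not in recovered]
```

Any id returned by this function would be a transaction that was acknowledged as committed but did not survive recovery.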

Philip Stoev

----- Original Message ----- From: "Patrick Crews" <gleebix@xxxxxxxxx>
To: <Philip.Stoev@xxxxxxx>
Sent: Friday, May 21, 2010 1:53 AM
Subject: A question about the RecoveryConsistency Validator


Hi.  If you have some time to spare, could you describe the
RecoveryConsistency Validator ?  It isn't described here:

Can / should it work with --threads > 1? Do you have any recommended usage scenarios (i.e., is it like the Recovery Validator)?