randgen team mailing list archive

ExecutionTimeComparator is made more statistics-aware



To those of you who may be using the ExecutionTimeComparator validator in RQG testing:

I have just pushed a patch to the randgen repository which changes parts of how this validator works. The default case should behave more or less as before. However, some of the tunable settings are slightly different, and more advanced statistical measurements are now possible.

To summarize:

The default is still that the validator compares the execution times from each of two servers for each query that is generated, and reports those that have a difference (or ratio) above or below a given threshold.
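
As a rough illustration of the default comparison (a Python sketch, not the validator's actual Perl code), a ratio check could look like the following. The function name is_notable and the parameter ratio_threshold are made up for this example and are not settings of the validator.

```python
def is_notable(time_a, time_b, ratio_threshold=2.0):
    """Return True if the two execution times differ by at least ratio_threshold.

    time_a and time_b are the execution times (in seconds) of the same
    query on the two servers. ratio_threshold is a hypothetical tunable.
    """
    if time_a <= 0 or time_b <= 0:
        # No meaningful ratio can be formed from a zero or negative time.
        return False
    ratio = time_a / time_b
    # Notable if server A is much slower or much faster than server B.
    return ratio >= ratio_threshold or ratio <= 1.0 / ratio_threshold
```

With a threshold of 2.0, a query taking 1.0s on one server and 3.0s on the other would be reported, while 1.0s versus 1.5s would not.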

Previous extensions (pushed 2011-01-21) enabled the tester to tell the validator to repeat each query a number of times and calculate averages, e.g. to reduce the chance of false positives. This was tunable via the setting QUERY_REPEATS, which is now replaced by the MIN_SAMPLES and MAX_SAMPLES settings.

If MIN_SAMPLES and/or MAX_SAMPLES is 0, only the original results will be used. If MIN_SAMPLES and MAX_SAMPLES are both larger than 0 and the Statistics::Descriptive module is not available, the query will be repeated MAX_SAMPLES times (read more below for details).

NEW: Statistics::Descriptive:

A big change with this patch is in the cases where the module Statistics::Descriptive is available in the Perl runtime.

In that case, if MIN_SAMPLES and MAX_SAMPLES are both set (greater than 0), the query will be repeated at least MIN_SAMPLES times (or at least twice if MIN_SAMPLES = 1) and at most MAX_SAMPLES times. The mean value of these samples will be used as the result for each server. However, if the standard deviation of the samples for a query is still above a given threshold (MAX_DEVIATION) after MAX_SAMPLES samples, the result is discarded.

The MAX_DEVIATION threshold is relative, and is given as a percentage of the mean value. The higher the threshold, the more unstable the results that are accepted.
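
To make the "relative" part concrete, here is a small Python sketch of a relative standard deviation expressed as a percentage of the mean. It uses Python's statistics module as a stand-in for Statistics::Descriptive; the function name and the use of the sample (n-1) standard deviation are assumptions made for this example.

```python
import statistics


def relative_std_dev(samples):
    """Return the standard deviation as a percentage of the mean, or None.

    None is returned when fewer than two samples are given or when the
    mean is zero, since the value is undefined in those cases.
    """
    if len(samples) < 2:
        return None
    mean = statistics.mean(samples)
    if mean == 0:
        return None
    # statistics.stdev is the sample standard deviation (n - 1 divisor).
    return 100.0 * statistics.stdev(samples) / mean
```

For example, two samples of 9 and 11 seconds have a mean of 10 and a relative standard deviation of roughly 14 percent, so they would pass a threshold of 20 but fail a threshold of 10.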

The idea behind the MIN/MAX samples approach is that the standard deviation and the statistical significance of the result may improve if we have more samples.

The standard deviation is used as an indication of how widely dispersed the measurements are. If the standard deviation is below the threshold after MIN_SAMPLES samples or more, the result is deemed stable enough and returned for further validation (comparison).
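
The sampling approach described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions, not the validator's implementation: run_query is a hypothetical callable returning one execution time, the parameter names merely mirror the settings, and accepting a deviation exactly equal to the threshold is my own choice.

```python
import statistics


def collect_stable_mean(run_query, min_samples, max_samples, max_deviation):
    """Sample until the relative standard deviation is low enough.

    Returns the mean of the samples if the result became stable, or None
    if sampling is disabled or the result stayed too unstable.
    """
    if min_samples <= 0 or max_samples <= 0:
        return None  # sampling disabled; caller uses the original result
    # A standard deviation needs at least two samples.
    needed = max(min_samples, 2)
    samples = []
    while len(samples) < max_samples:
        samples.append(run_query())
        if len(samples) < needed:
            continue
        mean = statistics.mean(samples)
        if mean > 0:
            deviation = 100.0 * statistics.stdev(samples) / mean
            if deviation <= max_deviation:
                return mean  # stable enough, pass on for comparison
    return None  # still unstable after max_samples, discard
```

The loop stops as soon as the samples look stable, so well-behaved queries cost only MIN_SAMPLES executions, while noisy ones are given up to MAX_SAMPLES attempts before being dropped.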

Relative standard deviations for each notable query will be included in the output file that is generated, if applicable.

If the --debug option is given to the RQG, more statistical details are written to the output when each query is validated (Warning: Output can be huge if the number of queries is large).

I hope this will be of value, and further improvements are of course welcome.