
randgen team mailing list archive

ExecutionTimeComparator is made more statistics-aware


Hi,


To those of you who may be using the ExecutionTimeComparator validator in RQG testing:

I have just pushed a patch to the randgen repository that changes parts of how this validator works. The default behavior should be more or less as before; however, some of the tunable settings are slightly different, and more advanced statistical measurements are now possible.


To summarize:

The default is still that the validator compares the execution times from each of two servers for every query that is generated, and reports those whose difference (or ratio) is above or below a given threshold.
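
To illustrate the principle, the default comparison boils down to something like the sketch below. This is simplified for illustration; the names and the threshold value are made up here, not the validator's actual internals:

    # Simplified sketch of the default comparison; names and the
    # threshold value are illustrative, not the validator's internals.
    sub notable_difference {
        my ($time1, $time2) = @_;   # execution times from the two servers
        my $threshold = 1.5;        # hypothetical ratio threshold
        return 0 if $time1 <= 0 or $time2 <= 0;
        my $ratio = $time1 / $time2;
        # Report the query if either server is notably slower than the other.
        return ($ratio >= $threshold or $ratio <= 1 / $threshold);
    }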

Previous extensions (pushed 2011-01-21) enabled the tester to tell the validator to repeat each query a number of times and calculate averages, e.g. to reduce the chance of false positives. This was tunable via the QUERY_REPEATS setting, which is now renamed/replaced by the MIN_SAMPLES and MAX_SAMPLES settings.

If MIN_SAMPLES and/or MAX_SAMPLES is 0, only the original results will be used. If MIN_SAMPLES and MAX_SAMPLES are both larger than 0 and the Statistics::Descriptive module is not available, the query will be repeated MAX_SAMPLES times (read more below for details).
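
In other words, the fallback path is a plain average over a fixed number of runs. Roughly like the following sketch, where timed_average() and the use of a DBI handle are my own illustration, not the actual code:

    # Sketch of the fallback when Statistics::Descriptive is unavailable:
    # repeat MAX_SAMPLES times and use the plain mean. Illustrative only.
    use Time::HiRes qw(time);

    sub timed_average {
        my ($dbh, $query, $max_samples) = @_;   # $dbh: a DBI handle
        my $total = 0;
        for (1 .. $max_samples) {
            my $start = time();
            $dbh->do($query);
            $total += time() - $start;
        }
        return $total / $max_samples;
    }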


NEW: Statistics::Descriptive:
=============================

A big change with this patch is in the cases where the module Statistics::Descriptive is available in the Perl runtime.

In that case, if MIN_SAMPLES and MAX_SAMPLES are both set (greater than 0), the query will be repeated at least MIN_SAMPLES times (or at least twice if MIN_SAMPLES = 1) and at most MAX_SAMPLES times. The mean value of these samples will be used as the result for each server. However, if the standard deviation of the samples for a query is still above a given threshold (MAX_DEVIATION) after MAX_SAMPLES samples, the result is discarded.

The MAX_DEVIATION threshold is relative and is given as a percentage of the mean value. For example, a MAX_DEVIATION of 10 accepts samples whose standard deviation is at most 10% of their mean. The higher the threshold, the more unstable the results that are accepted.

The idea behind the MIN/MAX samples approach is that the standard deviation, and thus the statistical significance of the result, may improve as more samples are collected.

The standard deviation is used as an indication of how widely dispersed the measurements are. If the standard deviation is below the threshold after MIN_SAMPLES samples or more, the result is deemed stable enough and is passed on for further validation (comparison).
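
Putting this together, the sampling logic can be sketched like this. It is an illustration of the idea only: run_query() is a placeholder for timing one execution of the query on a server, and none of this is copied from the actual validator:

    # Adaptive sampling sketch using Statistics::Descriptive.
    # run_query($query) is assumed to return one timed execution in seconds.
    use Statistics::Descriptive;

    sub sample_query {
        my ($query, $min_samples, $max_samples, $max_deviation) = @_;
        $min_samples = 2 if $min_samples < 2;   # always take at least two
        my $stats = Statistics::Descriptive::Full->new();
        while ($stats->count() < $max_samples) {
            $stats->add_data(run_query($query));
            next if $stats->count() < $min_samples;
            # Relative standard deviation, as a percentage of the mean.
            my $rel_dev = 100 * $stats->standard_deviation() / $stats->mean();
            return $stats->mean() if $rel_dev <= $max_deviation;
        }
        return undef;   # still too unstable after MAX_SAMPLES: discard
    }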

Relative standard deviations for each notable query will be included in the output file that is generated, if applicable.

If the --debug option is given to the RQG, more statistical details are written to the output as each query is validated. (Warning: the output can be huge if the number of queries is large.)
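
For reference, an invocation could look something like the following. Apart from --debug, the option names and paths here are my assumptions about a typical two-server RQG setup, not taken from this patch:

    # Hypothetical invocation; adjust option names and paths to your setup.
    perl runall.pl \
        --grammar=conf/example.yy \
        --basedir1=/path/to/server1 \
        --basedir2=/path/to/server2 \
        --validators=ExecutionTimeComparator \
        --debug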


I hope this will be of value, and further improvements are of course welcome.


--
John