And if, let's say, we decide that N=100 (or N=10%) is the best cutoff
value, and then find out that by not filling the queue completely we
lose even 1% in recall, we might want to stay with the full queue. What
is the time difference between running 50 tests and 100 tests? Almost
nothing, especially compared to what we spend on preparing the
tests.
I don't think this project will determine what the "best" value is. It
can only find the "best set of model parameters" for recall as a
function of cutoff (here "best set" means that, for any given value of
cutoff in the target recall range, every other set of parameters yields
lower recall).
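
To illustrate that dominance criterion, here is a minimal Python
sketch. The parameter-set names and the recall numbers are invented
purely for illustration; nothing here is measured data:

    # Sketch: pick the parameter set whose recall(cutoff) curve dominates
    # the others over the target recall range. All numbers are invented.
    # cutoff = fraction of the test queue that is actually run.

    cutoffs = [0.05, 0.10, 0.25, 0.50]

    recall_curves = {
        "params_A": {0.05: 0.80, 0.10: 0.90, 0.25: 0.93, 0.50: 0.95},
        "params_B": {0.05: 0.75, 0.10: 0.85, 0.25: 0.92, 0.50: 0.95},
    }

    def dominates(a, b, target=(0.80, 0.95)):
        """a dominates b if, at every cutoff where either curve's recall
        falls inside the target range, a's recall is at least b's."""
        lo, hi = target
        relevant = [c for c in cutoffs
                    if lo <= a[c] <= hi or lo <= b[c] <= hi]
        return all(a[c] >= b[c] for c in relevant)

    best = [name for name, curve in recall_curves.items()
            if all(dominates(curve, other)
                   for other_name, other in recall_curves.items()
                   if other_name != name)]
    print(best)  # -> ['params_A']: never below params_B in the range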
In other words, this project will deliver a function recall(cutoff).
And then we can decide what cutoff we want and how many failures we can
afford to miss. For example, 80% recall might be achieved in 5% of the
time, 90% recall - in 10% of the time, 95% recall - in 50% of the time,
etc.
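
Once recall(cutoff) is delivered, choosing a cutoff is mechanical. A
small sketch, reusing the example numbers above (which are
hypothetical, not measurements):

    # Sketch: given the delivered recall(cutoff) data, find the smallest
    # cutoff (fraction of time spent) that reaches a chosen recall target.

    recall_at = {0.05: 0.80, 0.10: 0.90, 0.50: 0.95, 1.00: 1.00}

    def min_cutoff(target_recall):
        """Smallest fraction of the queue to run to hit target_recall."""
        feasible = [c for c, r in sorted(recall_at.items())
                    if r >= target_recall]
        return feasible[0] if feasible else None

    print(min_cutoff(0.90))  # 0.10 -> run 10% of the queue
    print(min_cutoff(0.95))  # 0.50 -> the "only 50% speedup" case below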
And then you can say, "no, we cannot miss more than 5% of failures, so
we'll have to live with only a 50% speedup". But no experiment will
tell you how many failures are acceptable for us to miss.
Regards,
Sergei