maria-developers team mailing list archive

Re: [GSoC] Optimize mysql-test-runs - Results of new strategy

To: Pablo Estrada <polecito.em@xxxxxxxxx>
From: Elena Stepanova <elenst@xxxxxxxxxxxxxxxx>
Date: Wed, 06 Aug 2014 15:04:48 +0400
Cc: maria-developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CABDWuang8RaivLjeN=myX-WJFNVJfUQNZ6TWaFgVTje7a7a8Kg@mail.gmail.com>
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

(sorry, forgot the list in my reply, resending)

Hi Pablo,


On 03.08.2014 17:51, Pablo Estrada wrote:
> Hi Elena,
>
>

>> One thing that I want to see there is fully developed platform mode. I see
>> that mode option is still there, so it should not be difficult. I actually
>> did it myself while experimenting, but since I only made hasty and crude
>> changes, I don't expect them to be useful.
>>
>
> I'm not sure what code you are referring to. Can you be more specific on
> what seems to be missing? I might have missed something when migrating from
> the previous architecture...

I was mainly referring to the learning stage. Currently, the learning stage is "global". You go through X test runs, collect data, distribute it between platform-specific queues, and from the X+1 test run you start predicting based on whatever platform-specific data you have at the moment.

But this is bound to cause rather sporadic quality of prediction, because it could happen that out of 3000 learning runs, 1000 belong to platform A, while platform B only had 100, and platform C was introduced later, after your learning cycle. So, for platform B the statistical data will be very limited, and for platform C there will be none -- you will simply start randomizing tests from the very beginning (or using data from other platforms as you suggest below, which is still not quite the same as a pure platform-specific approach).

It seems more reasonable, if the platform-specific mode is used, to do learning per platform too. It is not just about the current investigation activity, but about the real-life implementation too.

Let's suppose tomorrow we start collecting the data and calculating the metrics. Some platforms will run more often than others, so let's say in 2 weeks you will have X test runs on these platforms so you can start predicting for them; while other platforms will run less frequently, and it will take 1 month to collect the same amount of data. And 2 months later there will be Ubuntu Utopic Unicorn which will have no statistical data at all, and it will be cruel to jump into predicting there right away.

It sounds more complicated than it is; in fact, pretty much all you need to add to your algorithm is making 'count' in your run_simulation a dict rather than a constant.

So, I imagine that when you store your metrics after a test run, you will also store the number of test runs per platform, and only start predicting for a particular platform when the count for it reaches the configured number.
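
As a rough sketch of the per-platform counter idea (the names LEARNING_RUNS, runs_seen, should_predict and record_run are made up for illustration and are not taken from the actual run_simulation code; the threshold of 3 is arbitrary):

```python
from collections import defaultdict

# Hypothetical threshold: how many test runs to observe on a platform
# before predicting for it (would come from configuration in practice).
LEARNING_RUNS = 3

# Per-platform run counter, replacing a single global count.
runs_seen = defaultdict(int)


def should_predict(platform):
    """Return True once this platform has finished its own learning stage."""
    return runs_seen[platform] >= LEARNING_RUNS


def record_run(platform):
    """Call after the metrics for a finished test run have been stored."""
    runs_seen[platform] += 1


# Platform A runs often, platform B rarely, platform C shows up later.
for platform in ["A", "A", "B", "A", "C"]:
    mode = "predict" if should_predict(platform) else "learn"
    print(platform, mode)
    record_run(platform)
```

With this shape, a platform that appears late simply starts with a count of zero and goes through its own learning stage, regardless of how much data the other platforms already have.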


>
> Of the code that's definitely not there, there are a couple things that
> could be added:

> 1. When we calculate the relevance of a test on a given platform, we might
> want to set the relevance to 0, or we might want to derive a default
> relevance from other platforms (An average, the 'standard', etc...).
> Currently, it's just set to 0.

I think you could combine this idea with what was described above. While it makes sense to run *some* full learning cycles on a new platform, it does not have to be thousands, especially since some non-LTS platforms come and go awfully fast. So, we run these not-too-many cycles, get clean platform-specific data, and if necessary enrich it with the other platforms' data.
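
One possible way to do the "enrich with other platforms' data" part is a simple fallback to the average over the other platforms. This is only a sketch under assumptions: the nested-dict layout of `metrics`, the function name `relevance` and the test name are hypothetical, and a plain average is just one of the options mentioned (the 'standard' value would be another).

```python
def relevance(test, platform, metrics, default=0.0):
    """Relevance of a test on a platform.

    Uses the platform-specific value when there is one; otherwise falls
    back to the mean over the platforms that do have a value, and to
    `default` when the test is unknown everywhere.
    """
    per_platform = metrics.get(test, {})
    if platform in per_platform:
        return per_platform[platform]
    others = list(per_platform.values())
    if not others:
        return default
    return sum(others) / len(others)


# Hypothetical stored metrics: test name -> {platform -> relevance}.
metrics = {"main.select": {"A": 0.5, "B": 0.25}}

print(relevance("main.select", "A", metrics))   # platform-specific value
print(relevance("main.select", "C", metrics))   # average of A and B
print(relevance("main.unknown", "C", metrics))  # default
```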

> 2. We might also, just in case, want to keep the 'standard' queue for when
> we don't have the data for this platform (related to the previous point).

If we do what's described above, we should always have data for the platform. But if you mean calculating and storing the standard metrics, then yes -- since we are going to store the values rather than re-calculate them every time, there is no reason to be greedy about it. It might even make sense to calculate both metrics that you developed, too. Who knows, maybe one day we'll find out that the other one gives us better results.


>
>
>> It doesn't matter in which order they fail/finish; the problem is, when
>> builder2 starts, it doesn't have information about builder1 results, and
>> builder3 doesn't know anything about the first two. So, the metric for test
>> X could not be increased yet.
>>
>> But in your current calculation, it is. So, naturally, if we happen to
>> catch the failure on builder1, the metric raises dramatically, and the
>> failure will be definitely caught on builders 2 and 3.
>>
>> It is especially important now, when you use incoming lists, and the
>> running sets might be not identical for builders 1-3 even in standard mode.
>>
>
> Right, I see your point. Although if test_run 1 would catch the error,
> test_run 2, although it would be using the same data, might not catch the
> same errors if the running set makes it such that they are pushed out due
> to lower relevance. The effect might not be too big, but it definitely has
> potential to affect the results.
>
> Over-pessimistic part:
>>
>> It is similar to the previous one, but look at the same problem from a
>> different angle. Suppose the push broke test X, and the test started
>> failing on all builders (platforms). So, you have 20 failures, one per test
>> run, for the same push. Now, suppose you caught it on one platform but not
>> on others. Your statistics will still show 19 failures missed vs 1 failure
>> caught, and recall will be dreadful (~0.05). But in fact, the goal is
>> achieved: the failure has been caught for this push. It doesn't really
>> matter whether you catch it 1 time or 20 times. So, recall here should be 1.
>>
>> It should mainly affect per-platform approach, but probably the standard
>> one can also suffer if running sets are not identical for all builders.
>>
>
> Right. It seems that solving these two issues is non-trivial (the test_run
> table does not contain duration of the test_run, or anything). But we can
> keep in mind these issues.

Right. At this point it doesn't even make sense to solve them -- in a real-life application, the first one will be gone naturally, just because there will be no data from unfinished test runs.

The second one only affects recall calculation, in other words -- evaluation of the algorithm. It is interesting from a theoretical point of view, but not critical for real-life application.
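
To put numbers on the over-pessimistic case from the quoted example (one push breaks test X on 20 builders, and the failure is caught on only one of them), here is a small sketch comparing recall counted per test run with recall counted per push. The tuple layout and the function names are assumptions made for illustration, not the existing evaluation code.

```python
def recall_per_run(failures):
    """failures: list of (push_id, test, caught), one entry per failed test run."""
    caught = sum(1 for _, _, was_caught in failures if was_caught)
    return caught / len(failures)


def recall_per_push(failures):
    """Count a (push, test) failure as caught if any test run caught it."""
    outcome = {}
    for push_id, test, was_caught in failures:
        key = (push_id, test)
        outcome[key] = outcome.get(key, False) or was_caught
    return sum(outcome.values()) / len(outcome)


# One push breaks test X on 20 builders; only the first builder catches it.
failures = [("push1", "X", i == 0) for i in range(20)]

print(recall_per_run(failures))   # 0.05
print(recall_per_push(failures))  # 1.0
```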



> I fixed up the repositories with updated versions of the queries, as well
> as instructions in the README on how to generate them.
>
> Now I am looking a bit at the buildbot code, just to try to suggest some
> design ideas for adding the statistician and the pythia into the MTR
> related classes.

As you know, we have the soft pencil-down in a few days, and the hard one a week later. At this point, there isn't much reason to keep frantically improving the algorithm (which is never perfect), so you are right not to plan on it.


In the remaining time I suggest to

- address the points above;
- make sure that everything that should be configurable is configurable (algorithm, mode, learning set, db connection details);
- create structures to store the metrics, and reading them from/writing them to the database;
- make sure the predicting and the calculating part can be called separately;
- update documentation, clean up logging and code in general.

As long as we have these two parts easily callable, we will find a place in buildbot/MTR to put them to, so don't waste too much time on it.


Regards,
Elena


>
> Regards
> Pablo
>
