
maria-developers team mailing list archive

Re: [GSoC] Accepted student ready to work : )

 

Hello everyone:
I'm answering both of your emails here (Elena's first, then Sergei's).


On Thu, May 22, 2014 at 4:12 PM, Elena Stepanova <elenst@xxxxxxxxxxxxxxxx>
 wrote:
>
> I suggest staying with the terminology, for clarity.

You are right. I'll stick to MTR terminology.

> But even on an ideal data set the mixed approach should still be most
> efficient, so it should be okay to use it even if some day we fix all the
> broken tests and collect reliable data.

Yes, I agree. I'll keep the Mixed (Branch/Platform) approach.
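For reference, the mixing I have in mind is roughly this (just a sketch; the
counters, the failure-rate definition and the 50/50 weight are all
placeholders for illustration, not final code):

    from collections import defaultdict

    # counters filled from the buildbot history; keys are (test, branch)
    # or (test, platform) pairs -- all names here are placeholders
    failures = defaultdict(int)
    runs = defaultdict(int)

    def failure_rate(key):
        return float(failures[key]) / runs[key] if runs[key] else 0.0

    def mixed_relevancy(test, branch, platform, w=0.5):
        # weighted mix of the branch-local and platform-local failure
        # rates; the 50/50 weight is just a starting point to tune
        return (w * failure_rate((test, branch))
                + (1 - w) * failure_rate((test, platform)))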


>>     2. Include a new measure that increases relevancy: Time since last
>>     run.
>>
>>     The relevancy index should have a component that makes the test more
>>     relevant the longer it spends not running.
>>
>
> I agree with the idea, but have doubts about the criteria.
> I think you should measure not the time, but the number of test runs that
> happened since the last time the test was run (it would be even better if
> we could count the number of revisions, but that's probably not easy).
> The reason is that some branches are very active, while others can be
> extremely slow. So, with the same time-based coefficient the relevancy of
> a test can spike between two consecutive test runs just because they
> happened a month apart, but will be changing too slowly on a branch which
> has a dozen commits a day.
>

Yes, I agree with you on this. This is what I had in mind, but I couldn't
express it properly in my email : )
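In code, the adjustment could be as simple as this (a sketch;
runs_since_last_run would be tracked per branch, and the constant k is a
placeholder to be tuned against the simulation):

    # a sketch: runs_since_last_run would be tracked per branch as the
    # number of buildbot test runs in which this test did not run
    def staleness_boost(runs_since_last_run, k=0.01):
        # k is a placeholder constant to tune against the simulation
        return k * runs_since_last_run

    def adjusted_relevancy(base_relevancy, runs_since_last_run):
        return base_relevancy + staleness_boost(runs_since_last_run)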



>>     3. Include also correlation. I still don't have a great idea of how
>>     correlation will be considered, but it's something like this:
>>        1. The data contains the list of test_runs where each test_suite
>>        has failed. If two test suites have failed together a certain
>>        percentage of times (>30%?), then when test A fails, the relevancy
>>        index of test B also goes up... and when test A runs without
>>        failing, the relevancy index of test B goes down too.
>>
>
> We'll need to see how it goes.
> In real life correlation of this kind does exist, but I'd say much more
> often related failures happen due to some environmental problems, so the
> presumed correlation will be fake.


Good point. Let's see how the numbers play out, but I think you are right
that this will end up with a severe bias from test blowups and failures
caused by environmental problems.
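Still, for when we look at the numbers, the counting I have in mind is
roughly this (a sketch; the names are illustrative and the 30% threshold is
the tentative one from my earlier email):

    from collections import defaultdict
    from itertools import combinations

    failure_count = defaultdict(int)  # failures per test
    cofailures = defaultdict(int)     # joint failures per test pair

    def record_test_run(failed_tests):
        for t in failed_tests:
            failure_count[t] += 1
        for a, b in combinations(sorted(failed_tests), 2):
            cofailures[(a, b)] += 1

    def correlated(a, b, threshold=0.3):
        # True if b failed in more than ~30% of the runs where a failed
        key = (a, b) if a < b else (b, a)
        if not failure_count[a]:
            return False
        return float(cofailures[key]) / failure_count[a] > threshold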



>
> I think in any case we'll have to rely on the fact that your script will
> choose tests not from the whole universe of tests, but from an initial list
> that MTR produces for this particular test run. That is, it will go
> something like this:
> - test run is started in buildbot;
> - MTR collects test cases to run, according to the startup parameters, as
> it always does;
> - the list is passed to your script;
> - the script filters it according to the algorithm that you developed,
> keeps only a small portion of the initial list, and passes it back to MTR;
> - MTR runs the requested tests.
>
> That is, you do exclusion of tests rather than inclusion.
>
> This will solve two problems:
> - first test run: when a new test is added, only MTR knows about it,
> buildbot doesn't; so, when MTR passes to you a test that you know nothing
> about (and assuming that we do have a list of all executed tests in
> buildbot), you'll know it's a new test and will act accordingly;
> - abandoned tests: MTR just won't pass them to your script, so it won't
> take them into account.


Great. This is good to know; it gives me a more precise idea of how the
project will fit into MariaDB development.
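The script's side of that hand-off could then look roughly like this (a
sketch; RUNNING_SET_SIZE and the relevancy function are placeholders, not
decided yet):

    RUNNING_SET_SIZE = 500  # assumed per-run budget, to be decided

    def filter_tests(mtr_list, known_tests, relevancy):
        # tests the history knows nothing about are new: always keep them
        new_tests = [t for t in mtr_list if t not in known_tests]
        seen = sorted((t for t in mtr_list if t in known_tests),
                      key=relevancy, reverse=True)
        budget = max(RUNNING_SET_SIZE - len(new_tests), 0)
        return new_tests + seen[:budget]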

On Thu, May 22, 2014 at 5:39 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
>
> >    - *test_suite, test suite, test case* - When I say test suite or test
> >    case, I am referring to a single test file. For instance
> >    '*pbxt.group_min_max*'. They are the ones that fail, and whose
> >    failures we want to attempt to predict.
>
> may I suggest distinguishing between a test *suite* and a test *case*?
> the latter is usually one test file, but a suite (for mtr) is a
> directory with many test files. Like, "main", "pbxt", etc.
>

Right. I didn't define this properly. Let's keep MTR's exact definitions,
as Elena suggested.

> I don't think you should introduce artificial limitations that make the
> recall worse, because they "look realistic".
>
> You can make it realistic instead, not just look realistic - simply
> pretend that your code is already running on buildbot and limits the
> number of tests to run. So, if the test didn't run - you don't have any
> failure information about it.
>
> And then you only need to do what improves recall, nothing else :)
>
> (of course, to calculate the recall you need to use all failures,
> even for tests that you didn't run)


Yes, my code *already works this way*. It doesn't consider failure
information from tests that were not supposed to run.
The graphs that I sent are from scripts that ran like this.

Of course, the recall is just the fraction of spotted failures out of the
100% of known failures : )
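Spelled out (assuming spotted_failures and known_failures are collections
of failure events):

    def recall(spotted_failures, known_failures):
        # fraction of all known failures that happened in tests we ran
        return float(len(spotted_failures)) / len(known_failures)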

Anyway, with all this, I will get to work on adapting the simulation a
little bit:

   - The number of test runs since a test last ran (rather than wall-clock
   time) will also affect its relevancy
   - I will try to use the list of changed files from commits to make sure
   new tests start running right away (rough sketch below)
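For the second point, something along these lines (a sketch; the path
pattern is my assumption of where mtr keeps test files in the tree):

    import re

    # mysql-test/t/foo.test -> main.foo
    # mysql-test/suite/pbxt/t/foo.test -> pbxt.foo
    TEST_FILE = re.compile(
        r'mysql-test/(?:suite/(?P<suite>[^/]+)/)?t/(?P<case>[^/]+)\.test$')

    def new_tests_from_commits(changed_files):
        tests = []
        for path in changed_files:
            m = TEST_FILE.search(path)
            if m:
                tests.append('%s.%s' % (m.group('suite') or 'main',
                                        m.group('case')))
        return tests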

Any other comments are welcome.

Regards
Pablo
