maria-developers team mailing list archive

Re: [GSoC] Optimize mysql-test-runs - Results of new strategy

Hello Elena,
It took me a while to figure out how the files and the test_runs
correspond to each other, and there may still be some hard-to-resolve
inconsistencies: in a few cases it is not easy to determine
automatically which file corresponds to which test_run (for example,
when there are more platform+build test_runs than files). Excluding
those cases, yes, there are about 28k files that can be matched to
test_runs appropriately.
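
For reference, this is roughly the matching logic I have in mind. It is only a
sketch: the field names ('platform', 'bname') and the '<platform>_<bname>'
file-naming convention are assumptions for illustration, not the exact layout
of the data.

    import os
    from collections import defaultdict

    def match_files_to_test_runs(test_runs, report_dir):
        """Best-effort matching of report files to test_runs.

        Assumes each test_run is a dict with 'id', 'platform' and 'bname'
        keys, and that report files are named '<platform>_<bname>.txt'.
        Both assumptions are illustrative only.
        """
        files_by_key = defaultdict(list)
        for fname in os.listdir(report_dir):
            files_by_key[fname.rsplit('.', 1)[0]].append(fname)

        runs_by_key = defaultdict(list)
        for run in test_runs:
            runs_by_key['%s_%s' % (run['platform'], run['bname'])].append(run)

        matches, ambiguous = {}, []
        for key, runs in runs_by_key.items():
            files = files_by_key.get(key, [])
            if len(runs) == 1 and len(files) == 1:
                matches[runs[0]['id']] = files[0]
            elif files:
                # More test_runs than files (or vice versa) for this
                # platform+build: we cannot decide automatically which file
                # belongs to which test_run, so leave these unmatched.
                ambiguous.append(key)
        return matches, ambiguous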

The distribution of these matches is quite irregular: files start matching
around test_run #10,000, and from then on some test_runs have a matching file
and others do not.

What I'm doing is the following:

   1. If there is a file that matches this test_run: parse the file and
   return the tests in the file as the input list. I am not considering
   'skipped' tests, since it seems they are skipped because they cannot be
   run.
   2. If there is no file matching the test_run: consider ALL known tests as
   being in the input list. (A rough sketch of both cases follows.)
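
To make the two cases concrete, here is a minimal sketch of how I plan to
build the input list. The function name and the report-file layout assumed
here (one test name and status per line, with a 'skipped' status) are for
illustration only, not the actual MTR report format.

    def build_input_list(test_run_id, file_index, all_known_tests):
        """Return the list of candidate tests for one test_run.

        `file_index` is assumed to map test_run ids to report file paths;
        `all_known_tests` is the full set of test names seen so far.
        """
        report_path = file_index.get(test_run_id)
        if report_path is None:
            # Case 2: no matching file, so fall back to every known test.
            return sorted(all_known_tests)

        # Case 1: parse the matching report file, dropping skipped tests.
        input_list = []
        with open(report_path) as report:
            for line in report:
                parts = line.split()
                if len(parts) < 2:
                    continue
                test_name, status = parts[0], parts[1].lower()
                if status != 'skipped':
                    input_list.append(test_name)
        return input_list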

I would like to get your feedback on a few things:

   - I would still like to define some structure for the interfaces, even
   if a bit loose.
   - You mentioned earlier that rather than a specific running_set size, you
   wanted to use a percentage. We can work that way (a small sketch of what I
   have in mind follows this list).
   - Do you have any feedback on points 1 and 2 above regarding the handling
   of the input test lists?
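
On the percentage point, this is roughly what I have in mind (just a sketch;
the 10% default below is an arbitrary example, not a proposed value):

    def running_set_size(input_list, percentage=10.0):
        """Size of the running set as a percentage of the input list,
        instead of a fixed number such as 300. Always run at least one test."""
        return max(1, int(round(len(input_list) * percentage / 100.0)))

    # e.g. with a 4,000-test input list and 10%, we would run 400 tests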

And one more thing:

   - I have not incorporated the test variant into the data yet, but I'll
   spend some time thinking about how to do this.

That's it for now.
Thanks

Pablo


On Wed, Jul 16, 2014 at 1:10 AM, Pablo Estrada <polecito.em@xxxxxxxxx>
wrote:

> Hi Elena,
> A small progress report: I was able to quickly make the changes related to
> selecting code changes to measure correlations with test failures. Recall
> is still around 80% with a running set of 300 and short prediction stages. I
> can now focus on the input file list, since I believe this will make
> results more realistic and (I expect) help push recall further up.
>
> Can you please upload the report files from MTR, so that I can include the
> logic of an input test list?
>
> Also, since I am going to incorporate this logic, it might be good to
> define (even if just roughly) the "core module" and the "wrapper module"
> that you had mentioned earlier, rather than just incorporating the list and
> making the code that I have now even more bloated with mixed-up
> functionalities. What do you think?
>
> Regards
> Pablo
>
>
> On Tue, Jul 15, 2014 at 2:18 PM, Pablo Estrada <polecito.em@xxxxxxxxx>
> wrote:
>
>> Hello Elena,
>>
>>> Can you give a rough estimate of the ratio of failures missed due to being
>>> low in the priority queue vs those that were not in the queue at all?
>>>
>>
>> I sent this information in a previous email, here:
>> https://lists.launchpad.net/maria-developers/msg07482.html
>>
>>> Also, once again, I would like you to start using an incoming test list
>>> as an initial point of your test set generation. It must be done sooner or
>>> later; I already explained earlier why. And while it's not difficult to
>>> implement even after the end of your project, it might affect the result
>>> considerably, so we need to know if it makes it better or worse, and adjust
>>> the algorithm accordingly.
>>>
>>
>> You are right. I understand that this information is not fully available
>> for all the test_runs, so can you upload the information going back as far
>> as possible? I can parse these files and adjust the program to work with
>> them. I will get to work on this; I think it should significantly
>> improve results. I think it might even push my current strategy from
>> promising results into attractive ones.
>>
>>
>>> There are several options which change the way the tests are executed;
>>> e.g. tests can be run in a "normal" mode, or in PS protocol mode, or with
>>> valgrind, or with embedded server. And it might well be that some tests
>>> always fail e.g. with valgrind, but almost never fail otherwise.
>>> Information about these options is partially available in test_run.info,
>>> but it would require some parsing. It would be perfect if you could analyze
>>> the existing data to understand whether using it can affect your results
>>> before spending time on actual code changes.
>>>
>>
>> I will take this into consideration, but for now I will focus on these two
>> main things:
>>
>>    - Improving the precision of selecting code changes to estimate their
>>    correlation with test failures
>>    - Adding the use of an incoming test list
>>
>>
>>
>>> When we are trying to watch all code changes and find correlation with
>>> test failures, if it's done well, it should actually provide immediate
>>> gain; however, it's very difficult to do it right, as there is way too much
>>> noise in the statistical data to get a reliable picture. So, while it will
>>> be nice if you get it to work (since you already started doing it), don't take
>>> it as a defeat if you eventually find out that it doesn't work very well.
>>>
>>
>> Well, actually, this is the only big difference between the original
>> strategy, which uses just a weighted average of failures, and the new
>> strategy, which performs *significantly better* over longer testing
>> settings. It has been working for a few weeks, and is up on GitHub.
>>
>> Either way, as I said before, starting today I will focus on improving the
>> precision of selecting code changes to estimate their correlation with test
>> failures.
>>
>> Regards
>> Pablo
>>
>
>
