Re: [GSoC] Optimize mysql-test-runs - Results of new strategy
Hello Elena and all,
First, addressing the previous email:
> Looking at the dump, I see it can also happen that the dump contains
> several records for a pair platform/bbnum. I am not sure why it happens, I
> think it shouldn't, might be a bug in buildbot and/or configuration, or
> environmental problems. Anyway, due to the way we store output files, they
> can well override each other in this case, thus for several platform/bbnum
> records you will have only one file. I suppose that's what was hard to
> resolve, sorry about that.
>
No worries ; ). There are several cases where the platform and build number
pair is the same for multiple test_runs. The system just names the files as
follows:
<platform>_<build_id>-log-test_1-stdio
<platform>_<build_id>-log-test_2-stdio
...
<platform>_<build_id>-log-test_5-stdio
These files seem to correspond temporally with the test runs (*test_1-stdio
belongs to the first test_run of the same platform/build number, and so
on). Unfortunately, in some cases there are more test_runs in the dump than
files available, which means that it's impossible to be sure which file
belongs to which test_run.
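For reference, this is roughly the pairing logic I'm using. It's a sketch
only: the regex and the (platform, build_id, run_id) tuples are
illustrative, not the exact dump schema.

import re
from collections import defaultdict

LOG_NAME = re.compile(
    r'^(?P<platform>.+)_(?P<build_id>\d+)-log-test_(?P<seq>\d+)-stdio$')

def pair_logs_with_runs(filenames, test_runs):
    """test_runs: (platform, build_id, run_id) tuples, sorted by start time."""
    logs = defaultdict(dict)
    for name in filenames:
        m = LOG_NAME.match(name)
        if m:
            key = (m.group('platform'), int(m.group('build_id')))
            logs[key][int(m.group('seq'))] = name

    paired, unmatched = {}, []
    seen = defaultdict(int)
    for platform, build_id, run_id in test_runs:
        # The Nth run of a platform/build pair gets the *test_N-stdio file.
        seen[(platform, build_id)] += 1
        name = logs.get((platform, build_id), {}).get(seen[(platform, build_id)])
        if name:
            paired[run_id] = name
        else:
            # More test_runs in the dump than files: cannot attribute reliably.
            unmatched.append(run_id)
    return paired, unmatched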
> You should consider skipped tests, at least for now. Your logic that they
> are skipped because they can't be run is generally correct; unfortunately,
> MTR first produces the *full* list of tests to run, and determines whether
> a test can be run or not on a later stage, when it starts running the
> tests. Your tool will receive the initial test list, and I'm not sure it's
> realistic to re-write MTR so that it takes into account limitations that
> cause skipping tests before creating the list.
>
I see. Okay then, duly noted.
Possibly it's better to skip a test run altogether if there is no input
> list for it; it would be definitely the best if there were 5K (or whatever
> slice you are currently using) of continuous test runs with input lists; if
> it so happens that there are lists for some branches but not others, you
> can skip the branch entirely.
>
This doesn't seem like a good option. Recall drops seriously, and the
test_runs that have a corresponding file don't follow any particular
pattern and tend to have long gaps between them, so the information becomes
stale and, seemingly, not useful.
> The core module should take as parameters
> - list of tests to choose from,
> - size of the running set (%),
> - branch/platform (if we use them in the end),
> and produce a new list of tests of the size of the running set.
>
> The wrapper module should
> - read the list of tests from the outside world (for now, from a file),
> - receive branch/platform as command-line options,
> - have the running set size set as an easily changeable constant or as a
> configuration parameter,
>
> and return the list of tests -- let's say for now, in the form of <test
> suite>.<test name>, blank-separated, e.g.
> main.select innodb.create-index ...
>
I am almost done 'translating' the code into a solution that divides it
into 'core' and 'wrapper'. There are a few bugs that I still haven't
figured out, but I believe I can iron those out pretty soon. I will also
express the running set as a percentage rather than a fixed size.
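In case it helps to see it concretely, the skeleton I am converging on
looks roughly like this; the option names are placeholders and the ranking
function is stubbed out:

import argparse

RUNNING_SET_PCT = 50  # running set size; an easily changeable constant

def relevance(test, branch, platform):
    # Stub: the real core scores tests from failure history / correlations.
    return 0.0

def core(candidate_tests, running_set_pct, branch=None, platform=None):
    """Rank the candidates and keep the top running_set_pct percent."""
    ranked = sorted(candidate_tests,
                    key=lambda t: relevance(t, branch, platform),
                    reverse=True)
    return ranked[:max(1, len(ranked) * running_set_pct // 100)]

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('test_list_file')  # test list from the outside world
    parser.add_argument('--branch')
    parser.add_argument('--platform')
    args = parser.parse_args()
    with open(args.test_list_file) as f:
        tests = f.read().split()
    # Blank-separated <suite>.<name>, e.g. "main.select innodb.create-index"
    print(' '.join(core(tests, RUNNING_SET_PCT, args.branch, args.platform)))

if __name__ == '__main__':
    main()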
Now, regarding the state of the project (and the recall numbers that I am
able to achieve so far), here are some observations:
- Unfortunately, I am running out of ideas for improving recall. I tried
tuning some parameters, giving more weight to some factors than others,
etc. I still wasn't able to push recall beyond ~87% with the strategy that
uses file correlations (the first sketch after this list shows how I
measure recall). From what I've seen, some failures are just extremely
hard to predict.
- The strategy that uses only a weighted average of the failure frequency
achieves a higher recall, but only for a short time; the recall decays
quickly afterwards. I may try adding some file correlations to this
strategy, to see if the recall can be sustained for a longer term (the
second sketch after this list illustrates the kind of weighted frequency
I mean).
- There is one problem that I see regarding the data and a potential
real-world implementation of the program: by verifying recall against the
historical data, we run the risk of overfitting to it, so the results
measured against the historical data and the results that a real-world
implementation could obtain are potentially different. A possible way to
address that issue would be to modify the buildbot to gather more data
over a longer term.
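As for how I measure recall over the historical data, it is essentially
this; a minimal sketch, where a run is reduced to (failed_tests,
predicted_tests) sets and the names are illustrative:

def recall(history):
    caught = total = 0
    for failed_tests, predicted_tests in history:
        total += len(failed_tests)
        caught += len(failed_tests & predicted_tests)
    return caught / float(total) if total else 1.0

# Two runs, three failures in total, two of them caught -> recall ~ 0.67
history = [({'main.select', 'innodb.create-index'}, {'main.select'}),
           ({'main.join'}, {'main.join', 'main.union'})]
print(recall(history))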
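And this is the kind of weighted failure frequency I mean; a sketch using
an exponential decay, where DECAY and the (run_index, failed_test) input
format are illustrative guesses rather than the final tuning:

import math

DECAY = 0.05  # per-run decay rate; a tunable, illustrative value

def failure_scores(history, now):
    """history: (run_index, failed_test) pairs, run_index growing with time."""
    scores = {}
    for run_index, test in history:
        # Recent failures weigh more; old ones fade instead of dropping out.
        scores[test] = scores.get(test, 0.0) + math.exp(-DECAY * (now - run_index))
    return scores  # higher score -> schedule the test earlier

# e.g. two recent failures of main.select vs. one old failure elsewhere:
print(failure_scores([(98, 'main.select'), (99, 'main.select'),
                      (10, 'innodb.create-index')], now=100))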
So, having said that, I am looking for some advice on the following points:
- I will try to take a step back from the new strategy and see how I can
adapt the original strategy so that its recall doesn't decline so sharply
over time.
- I will also spend some time restructuring the codebase so that it better
fits the model we need for the implementation. I will upload code soon.
All suggestions are welcome.
- Nonetheless, I feel that more data would allow me to improve the
algorithm greatly. Is it possible to add logging to the buildbot that
would allow for more precise data collection? A slower, more iterative
process, working closer with the buildbot and doing more detailed data
collection, might deliver better results. (I understand that this would
probably affect the time scope of the project.)
Let me know what you think about my suggestions.
Regards,
Pablo