maria-developers team mailing list archive
Re: [GSoC] Optimize mysql-test-runs - Setback
Hello Elena and all,
I have pushed the fixed code. There are a lot of changes because I went through all of the code making sure that it made sense. The commit is here
<https://github.com/pabloem/Kokiri/commit/7c47afc45a7b1f390e8737df58205fa53334ba09>,
and although there are a lot of changes, the main line where failures are counted as caught or missed is this one
<https://github.com/pabloem/Kokiri/blob/7c47afc45a7b1f390e8737df58205fa53334ba09/simulator.py#L496>.
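Roughly, the decision made at that line boils down to something like the sketch below (the names are just for illustration; the real code in simulator.py is structured differently):

    # Hypothetical sketch of the caught/missed bookkeeping; names are
    # illustrative only and not taken from simulator.py.
    def tally_test_run(failed_tests, running_set, stats):
        """Count a real failure as caught only if its test was scheduled."""
        for test in failed_tests:
            if test in running_set:
                stats['caught'] += 1   # the prioritized set included the failing test
            else:
                stats['missed'] += 1   # the test failed, but we chose not to run it
        return stats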
A couple of observations (a toy sketch of how these factors could feed the relevance score follows right below):
1. The test result file edit information helps improve recall, if only marginally.
2. The time since last run information does not improve recall much at all - see the second point under Weaknesses below.
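Just to make these two factors concrete, I think of them as feeding a test's relevance roughly like the toy weighted sum below (the weights and names are made up for the example; the actual metric is computed differently):

    # Toy illustration of folding the two factors into a test's relevance score.
    # Weights and names are made up; the real metric in the simulator differs.
    def adjusted_relevance(base_relevance, result_file_edited,
                           runs_since_last_execution,
                           edit_bonus=0.1, staleness_weight=0.01):
        score = base_relevance
        if result_file_edited:          # the test's result file was edited recently
            score += edit_bonus
        score += staleness_weight * runs_since_last_execution  # time since last run
        return score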
A couple of concepts that I want to define before going on:
- *First failures*. These are failures that happen because of new bugs. They don't occur close in time as part of a chain of failures; they occur as a consequence of a transaction that introduces a bug, and they might show up soon or long after that transaction (usually soon rather than long). Some of them may be correlated with the frequency of failure of a test (core or basic tests that fail often might be especially good at exposing bugs), but many of them are not (tests of a feature that don't fail often, but rather only when that feature is modified).
- *Strict simulation mode*. This is the mode where, if a test is not part of the running set, its failure is not considered by the simulator (see the sketch right below).
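In code, that restriction amounts to something like the following (names are made up; the sketch only illustrates that failures of unscheduled tests are invisible to the simulator):

    # Sketch of strict simulation mode with made-up names: failures of tests
    # that were not in the running set are simply not seen by the simulator.
    def observed_failures(failed_tests, running_set, strict=True):
        if not strict:
            return set(failed_tests)    # non-strict: use every recorded failure
        return {t for t in failed_tests if t in running_set}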
Weaknesses:
- It's very difficult to predict 'first failures'. With the current strategy, if it has been a long time since a test failed (or if it has never failed before), the relevancy of the test just goes down, and it never runs.
- Especially in database and parallel software, there are bugs that hide in the code for a long time until one test discovers them. Unfortunately, the analysis that I'm doing requires that the test runs exactly when the data indicates it will fail. If a test that would fail doesn't run in test run Z, even though it might run in test run Z+1, the failure is just counted as missed, as if the bug was never encountered (the toy example right after this list illustrates this).
  - This affects the *time since last run* factor. This factor helps encounter 'hidden' bugs that can be exposed by tests that have not run in a while, but the available data makes it hard to show that benefit.
  - This would also affect the *correlation* factor. If tests A and B fail together often, and in test_run Z both of them would fail but only A runs, the heightened relevancy of B in the next test_run would not make it catch anything (again, this is a limitation of the data, not of reality).
- Humans are probably a lot better at predicting first failures than the
current strategy.
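A toy example of that data limitation (all names and values here are made up):

    # Made-up data: the failure of innodb.deadlock is recorded only for
    # test_run Z, so scheduling the test one run later recovers nothing.
    failures = {                    # test_run -> tests recorded as failing in it
        'Z':   {'innodb.deadlock'},
        'Z+1': set(),               # the bug is still there, but no failure is recorded
    }
    running_sets = {
        'Z':   {'main.select'},     # innodb.deadlock was not scheduled here...
        'Z+1': {'innodb.deadlock'}, # ...and running it one run later finds nothing
    }
    missed = sum(len(failures[r] - running_sets[r]) for r in failures)
    print(missed)                   # 1 -> the failure counts as missed forever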
Some ideas:
- I need to be stricter with my testing, and with reviewing my code : )
- I need to improve prediction of 'first failures'. What would be a good way to improve this?
- Correlation between files changed and tests failed? Apparently Sergei tried this, but the results were not too good; however, that was before running in strict simulation mode. With strict simulation mode, anything that could help spot first failures is worth considering (a sketch of this idea follows right after this list).
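The kind of correlation I have in mind would look roughly like the sketch below (the input layout is an assumption, and this only shows the idea, not what Sergei actually tried):

    # Sketch of the files-changed vs. tests-failed idea. The input format
    # (a list of (changed_files, failed_tests) pairs per push) is assumed.
    from collections import defaultdict

    def file_test_cofailure_counts(push_history):
        counts = defaultdict(lambda: defaultdict(int))
        for changed_files, failed_tests in push_history:
            for path in changed_files:
                for test in failed_tests:
                    counts[path][test] += 1   # file and test failed together once more
        return counts

A new push touching, say, files under storage/ could then boost the relevance of the tests that have most often failed together with changes to those files.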
I am currently running tests to get the adjusted results. I will graph them and send them out in a couple of hours.
Regards
Pablo
On Fri, Jun 13, 2014 at 12:40 AM, Elena Stepanova <elenst@xxxxxxxxxxxxxxxx>
wrote:
> Hi Pablo,
>
> Thanks for the update.
>
>
> On 12.06.2014 19:13, Pablo Estrada wrote:
>
>> Hello Sergei, Elena and all,
>> Today while working on the script, I found and fixed an issue:
>>
>> There is some faulty code in my script that is in charge of
>> collecting
>> the statistics about whether a test failure was caught or not (here
>> <https://github.com/pabloem/Kokiri/blob/master/basic_simulator.py#L393>).
>> I
>> looked into fixing it, and then I could see another *problem*: The *recall
>> numbers* that I had collected previously were *too high*.
>>
>> The actual recall numbers, once we consider the test failures that are *not
>> caught*, are disappointingly lower. I won't show you results yet, since I
>> want to make sure that the code has been fixed, and I have accurate tests
>> first.
>>
>> This is all for now. The strategy that I was using is a lot less effective
>> than it seemed initially. I will send out a more detailed report with
>> results, my opinion on the weak points of the strategy, and ideas,
>> including a roadmap to try to improve results.
>>
>> Regards. All feedback is welcome.
>>
>
> Please push your fixed code that triggered the new results, even if you
> are not ready to share the results themselves yet. It will be easier to
> discuss then.
>
> Regards,
> Elena
>
>
>> Pablo