maria-developers team mailing list archive
Re: [GSoC] Optimize mysql-test-runs - Results of new strategy
> Can you give a rough estimate of the ratio of failures missed due to being
> low in the priority queue vs those that were not in the queue at all?
I sent this information in a previous email, here:
> Also, once again, I would like you to start using an incoming test list as
> an initial point of your test set generation. It must be done sooner or
> later, I already explained earlier why; and while it's not difficult to
> implement even after the end of your project, it might affect the result
> considerably, so we need to know if it makes it better or worse, and adjust
> the algorithm accordingly.
You are right. I understand that this information is not fully available
for all the test_runs, so can you upload the data going back as far as
possible? I can parse these files and adjust the program to work with
them. I will start on this now; I think it should significantly improve
the results, and it might even push my current strategy from promising
results to attractive ones.
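
To make the idea concrete, here is a minimal sketch of how an incoming test
list could restrict the prioritized queue before the running set is chosen.
The function name, the (test, score) layout, and the test names are all
hypothetical and are not taken from the actual project code.

```python
def select_tests(priority_queue, incoming_tests, running_set_size):
    """Pick up to running_set_size tests, highest score first,
    considering only tests present in the incoming test list."""
    incoming = set(incoming_tests)
    # Drop every queued test that this test run would not execute anyway.
    candidates = [(name, score) for name, score in priority_queue
                  if name in incoming]
    # Highest relevance first; sorted() is stable, so ties keep queue order.
    candidates = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:running_set_size]]


# Hypothetical queue of (test name, relevance score) pairs.
queue = [("main.select", 0.9), ("rpl.rpl_row", 0.7),
         ("innodb.alter", 0.4), ("main.join", 0.2)]
incoming = ["main.select", "innodb.alter", "main.join"]

print(select_tests(queue, incoming, 2))
```

In this sketch a test that is high in the queue but absent from the incoming
list (rpl.rpl_row above) no longer takes up a slot in the running set.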
> There are several options which change the way the tests are executed;
> e.g. tests can be run in a "normal" mode, or in PS protocol mode, or with
> valgrind, or with embedded server. And it might well be that some tests
> always fail e.g. with valgrind, but almost never fail otherwise.
> Information about these options is partially available in test_run.info,
> but it would require some parsing. It would be perfect if you could analyze
> the existing data to understand whether using it can affect your results
> before spending time on actual code changes.
I will keep this in mind, but for now I will focus on these two tasks:
- Improving the precision of selecting code changes to estimate correlation
  with test failures
- Adding the use of an incoming test list
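
For the suggestion about run options (normal, PS protocol, valgrind,
embedded server), a quick check on existing data could look like the sketch
below. It assumes the options have already been parsed out of test_run.info
into (test, mode, failed) records; that record layout, the mode labels, and
the test name are assumptions made for illustration only.

```python
from collections import defaultdict


def failure_rates_by_mode(records):
    """Return {(test, mode): failure rate} from (test, mode, failed) records."""
    counts = defaultdict(lambda: [0, 0])  # (test, mode) -> [failures, runs]
    for test, mode, failed in records:
        entry = counts[(test, mode)]
        entry[0] += int(failed)
        entry[1] += 1
    return {key: fails / runs for key, (fails, runs) in counts.items()}


# Hypothetical records: one test that fails only under valgrind.
records = [
    ("main.select", "valgrind", True),
    ("main.select", "valgrind", True),
    ("main.select", "normal", False),
    ("main.select", "normal", False),
    ("main.select", "ps-protocol", False),
]

rates = failure_rates_by_mode(records)
for (test, mode), rate in sorted(rates.items()):
    print(test, mode, rate)
```

If the rates for a test differ strongly between modes, that would suggest
the run mode is worth adding as an input; if they are similar, the extra
parsing is probably not worth the effort.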
> When we are trying to watch all code changes and find correlation with
> test failures, if it's done well, it should actually provide immediate
> gain; however, it's very difficult to do it right, there is way too much
> noise in the statistical data to get a reliable picture. So, while it will
> be nice if you get it to work (since you already started doing it), don't take
> it as a defeat if you eventually find out that it doesn't work very well.
Well, actually, this is the only big difference between the original
strategy, which uses just a weighted average of failures, and the new
strategy, which performs *significantly better* in longer testing settings.
It has been working for a few weeks, and is up on GitHub.
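
The difference between the two strategies can be illustrated with a rough
sketch. This is not the code that is on GitHub: the exponential decay, the
co-failure ratio, the way the two terms are added, and all names below are
assumptions chosen only to show the general shape of the idea.

```python
from collections import defaultdict


class RelevanceModel:
    """Toy model: weighted failure average plus a file/test correlation term."""

    def __init__(self, decay=0.95):
        self.decay = decay
        self.failure_avg = defaultdict(float)  # test -> weighted failure average
        self.cofailures = defaultdict(int)     # (file, test) -> failures seen with file changed
        self.file_changes = defaultdict(int)   # file -> test runs in which it changed

    def update(self, changed_files, executed_tests, failed_tests):
        failed = set(failed_tests)
        # Original-strategy part: exponentially weighted average of failures.
        for test in executed_tests:
            outcome = 1.0 if test in failed else 0.0
            self.failure_avg[test] = (self.decay * self.failure_avg[test]
                                      + (1 - self.decay) * outcome)
        # New-strategy part: remember which failures co-occur with which files.
        for path in changed_files:
            self.file_changes[path] += 1
            for test in failed:
                self.cofailures[(path, test)] += 1

    def score(self, test, changed_files):
        # Share of past changes to a file that came with a failure of this test;
        # take the strongest such signal among the files changed now.
        file_term = max(
            (self.cofailures.get((p, test), 0) / self.file_changes[p]
             for p in changed_files if self.file_changes.get(p, 0)),
            default=0.0,
        )
        return self.failure_avg.get(test, 0.0) + file_term


model = RelevanceModel(decay=0.5)
# Hypothetical history: one run where sql/item.cc changed and t1 failed.
model.update(["sql/item.cc"], ["t1", "t2"], ["t1"])

print(model.score("t1", ["sql/item.cc"]))  # failure average plus file term
print(model.score("t1", ["other.cc"]))     # failure average only
```

Dropping the file term from score() gives back a pure weighted-average
strategy, which makes it easy to compare the two on the same history.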
Either way, as I said before, I will, from today, focus on improving
precision of selecting code changes to estimate correlation with test