
maria-developers team mailing list archive

Re: [GSoC] Optimize mysql-test-runs - Results of new strategy


Hi Pablo,

Okay, thanks for the update.

As I understand it, the last two graphs were for the new strategy taking into account all edited files, with no branch/platform and no time factor? If that's not quite right, could you please indicate which exact options/metrics you used?

Also, if it doesn't take too long and is possible with your current code, could you run the old strategy on exactly the same data, learning/running set, and input files, so that we can clearly see the difference?

Meanwhile, I will look at what we have and perhaps come up with some ideas for improving the results.

I suppose your new tree does not include the input lists? Are you using the raw log files, or have you pre-processed them into clean lists? If you are using the raw files, did you rename them?


On 24.07.2014 14:51, Pablo Estrada wrote:
Hi Elena,
I tracked down the issue with matching files and test_runs. It was simpler
than we thought.
1. I was using the index in the array, rather than the test_run.id field to
identify test runs. Sorry, that was my bad. I changed and reuploaded the


This accounted for cases: 148470, 148471, 148472, 148473, 148474, 148476,
148478, 148481, 148482, 148483.
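A minimal Python sketch of the difference described in point 1, assuming test runs are loaded into a list of records that carry an `id` field (the `RunRecord` type and the sample values are hypothetical): keying a lookup by list position stops matching `test_run.id` as soon as the list is filtered or reordered, while keying by the id itself does not.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical record layout; only `id` matters for this example.
    id: int
    platform: str
    build_id: int


def index_by_position(runs):
    # Buggy approach: the key is the position in the list, so it no longer
    # corresponds to test_run.id once runs are filtered, skipped or reordered.
    return {pos: run for pos, run in enumerate(runs)}


def index_by_id(runs):
    # Fixed approach: the key is the test_run.id value itself.
    return {run.id: run for run in runs}


# Hypothetical sample data.
runs = [
    RunRecord(148470, "platform-a", 1),
    RunRecord(148471, "platform-b", 2),
]
by_id = index_by_id(runs)
print(by_id[148470].platform)             # platform-a
print(148470 in index_by_position(runs))  # False: the keys are 0 and 1
```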

2. The other 'false misses' happened because there are earlier test_runs
that match the files:

148467 - win32-packages_3172-log-test-stdio

It happens that test_runs 100940 and 101104 have the same platform and
build id, so the file is matched with one of them first.
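A rough Python sketch of why an earlier run "claims" the file in point 2, assuming log file names follow a `<platform>_<build id>-log-<step>` pattern inferred from the single example name above, and assuming runs are scanned oldest first. The regex, `RunRecord`, and helper names are hypothetical; the run ids mirror those in the message, but the records themselves are made up.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern inferred from "win32-packages_3172-log-test-stdio".
LOG_NAME = re.compile(r"^(?P<platform>.+)_(?P<build_id>\d+)-log-")


@dataclass
class RunRecord:
    id: int
    platform: str
    build_id: int


def parse_log_name(name):
    """Return (platform, build_id) parsed from a log file name, or None."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return m.group("platform"), int(m.group("build_id"))


def candidate_runs(runs, platform, build_id):
    """All runs sharing the platform and build id, oldest (lowest id) first."""
    return sorted(
        (r for r in runs if r.platform == platform and r.build_id == build_id),
        key=lambda r: r.id,
    )


def first_match(runs, log_name):
    # Taking the first candidate means an older run with the same key
    # gets the file before the intended, newer run is considered.
    key = parse_log_name(log_name)
    if key is None:
        return None
    candidates = candidate_runs(runs, *key)
    return candidates[0] if candidates else None


# Made-up records sharing one platform/build id.
runs = [
    RunRecord(148467, "win32-packages", 3172),
    RunRecord(101104, "win32-packages", 3172),
    RunRecord(100940, "win32-packages", 3172),
]
print(first_match(runs, "win32-packages_3172-log-test-stdio").id)  # 100940
```

Checking `len(candidate_runs(...)) > 1` instead of silently taking the first candidate would be one way to flag such ambiguous files for inspection.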

By the way, I just ran some tests with a running_set size of 30%, and the
results were quite consistent at around 80%, even for long runs, although
they decrease slowly over time. I still feel that much more consistent
performance can be obtained with consistent input lists.
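A minimal Python sketch of how such a percentage could be computed, assuming it is recall: the share of a run's failed tests that land in the top 30% of tests as ranked by the strategy, averaged over runs that had failures. The function names and the averaging scheme are hypothetical, not the project's actual code.

```python
import math


def recall_at(ranked_tests, failed_tests, running_percent=30):
    """Fraction of failed tests that fall inside the running set.

    ranked_tests: test names ordered by predicted priority (highest first).
    failed_tests: tests that actually failed in this test run.
    """
    if not failed_tests:
        return None  # nothing to catch in this run
    size = math.ceil(len(ranked_tests) * running_percent / 100)
    running_set = set(ranked_tests[:size])
    caught = sum(1 for t in failed_tests if t in running_set)
    return caught / len(failed_tests)


def mean_recall(per_run):
    """Average recall over runs, skipping runs with no failures."""
    values = [r for r in per_run if r is not None]
    return sum(values) / len(values) if values else None


ranked = [f"t{i}" for i in range(10)]        # running set is t0, t1, t2
print(recall_at(ranked, ["t0", "t2", "t7"]))  # 2 of 3 failures caught
```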


On Thu, Jul 24, 2014 at 5:00 PM, Pablo Estrada <polecito.em@xxxxxxxxx> wrote:

Hi Elena,

Thanks. I hoped you would have results of the experiments involving
incoming lists of tests, as I think it's an important factor which might
affect the results (and hence the strategy); but I'll look at what we have

I have them now. There was one more bug I hadn't figured out. There are
still a couple of bugs related to matching of the input test lists, but
these results should be quite close to the expected ones. I ran them with
3000 rounds of training and about 1500 rounds of prediction (skipping all
runs without an input list).
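A small Python sketch of the training/prediction split described above, assuming runs are processed in chronological order and that runs without an input list are skipped only during prediction (presumably because the input list is what tells the strategy which tests could have run). The `split_rounds` helper and the dict layout are hypothetical.

```python
def split_rounds(runs, training_rounds=3000):
    """Split chronologically ordered runs into training and prediction phases.

    Assumption: only the prediction phase skips runs that have no input list.
    """
    training = runs[:training_rounds]
    prediction = [r for r in runs[training_rounds:] if r.get("input_list")]
    return training, prediction


# Hypothetical data: odd-numbered runs have an input list, even ones do not.
runs = [{"id": i, "input_list": ["test_a"] if i % 2 else []} for i in range(10)]
training, prediction = split_rounds(runs, training_rounds=4)
print(len(training), [r["id"] for r in prediction])  # 4 [5, 7, 9]
```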


Although the results are not what we originally expected (a 20-80
ratio), I feel that they are quite acceptable.

I will see what we can do about getting reliable lists one way or another;
certainly the log files are a temporary solution, but it would be nice to
use them for experiments and see the results anyway, because modifying the
MTR/buildbot tandem, and especially collecting new data of considerable
volume, will take time.

I understand; nonetheless, I feel that this is a reasonable long-term goal
for this project.

This is not to say that parsing logs is the best way to do things, but
apparently something went wrong either with my archiving or with your
matching. If you don't have these files, please let me know.

It seems there's a bug with matching. I am looking at it now.

I've uploaded the fresh dump. Same location, file name

I will run more detailed tests with the new fresh dump, focusing on a
running set size of 30%. I believe the results will be reasonable.

