maria-developers team mailing list archive
Re: [GSoC] Optimize mysql-test-runs - Setback
Thank you Elena, I really appreciate that you took the time to make a
thorough review of the code. Answers to your comments are inline.
Conclusion at the end : )
- When I started coding, I had no idea how the project would fit in the
whole system, so, after asking a bit, I wrote a script much like Sergei's
example: only with the simulations and statistics in mind, not
considering the implementation phase. I tried to make it a bit more
practical to use, but I did not have in mind using this code as part of the
implementation - only the lessons learned from it.
- If you think it's a better idea, we can define exactly what the
interfaces of my project should be, so that I can write something with
views on the implementation, rather than a pure research-oriented script.
- Regarding the multi-mode logic: I decided to keep the logic from all
modes because it's only a constant overhead (it doesn't affect the
complexity of the original algorithm significantly), and at first I was
considering all failures, not only those from tests that ran. I can easily
add the code to run only the mode of interest.
> if you call several simulations from run_basic_simulations.py, only the
> very first one will use the correct data and get real results. All
> consequent ones will use the data modified by the previous ones, and the
> results will be totally irrelevant.
Yup, this is a huge mistake. When I implemented the series of simulations,
I completely forgot about the metrics, and thought there was no extra
cleanup necessary; but I was wrong. This is the first thing I'll fix.
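For what it's worth, the fix can be as simple as handing each simulation a deep copy of the pristine metrics, so no run ever sees another run's updates. A minimal sketch - names like `run_one_simulation` are hypothetical, not the script's actual API:

```python
import copy

def run_simulations(pristine_metrics, modes, run_one_simulation):
    """Run each simulation against an untouched copy of the metrics.

    `pristine_metrics` holds the per-test statistics loaded from the
    historical data; `run_one_simulation` is assumed to mutate the
    metrics it is given while replaying the test runs.
    """
    results = {}
    for mode in modes:
        # Deep-copy so one simulation's updates never leak into the next.
        metrics = copy.deepcopy(pristine_metrics)
        results[mode] = run_one_simulation(metrics, mode)
    return results
```

The deep copy costs memory proportional to the metrics dict, but it guarantees every simulation starts from identical data without writing a manual cleanup routine.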
> These flaws should be easy to fix by doing proper cleanup before each
> simulation. But there are also other fragments of code where, for example,
> logic for 'standard' mode is supposed to be always run and is relied upon,
> even if the desired mode is different.
> In fact, you build all queues every time. It would be an understandable
> trade-off to save the time on simulations, but you re-run them separately
> anyway, and only return the requested queue.
I decided to keep the logic of all modes because it does not affect the
complexity of the algorithm... and yes, I relied on the standard mode
because it always runs. This is not a mistake, but rather a 'cheat'
that I was aware of but indulged in. It is easy to fix, and I don't
think it is too serious a problem. If I were to code something with a
view towards the implementation, I would make sure that only strictly
necessary code runs.
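To make that concrete, the per-mode logic could be dispatched through a table so only the requested queue is ever built. This is just a sketch - the mode names and ranking rules below are illustrative, not the project's actual ones:

```python
def build_queue(mode, tests, metrics):
    """Build only the queue for the requested mode.

    `metrics` maps test names to a failure-based priority; missing
    tests get 0.0. Mode names here are made up for illustration.
    """
    builders = {
        # rank by the failure metric, highest priority first
        'standard': lambda: sorted(tests,
                                   key=lambda t: metrics.get(t, 0.0),
                                   reverse=True),
        # keep the order MTR asked for, no reordering
        'plain': lambda: list(tests),
    }
    if mode not in builders:
        raise ValueError('unknown mode: %r' % mode)
    return builders[mode]()
```

With this shape, adding a mode means adding one entry to the table, and no mode silently depends on another having run first.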
4. Failed tests vs executed tests
> - if we say we can afford running 500 tests, we'd rather run 500 than 20,
> even if some of them never failed before. This will also help us break the
> bubble, especially if we randomize the "tail" (tests with the minimal
> priority that we add to fill the queue). If some of them fail, they'll get
> a proper metric and will migrate to the meaningful part of the queue.
I don't understand what you mean by bubble, but I see your point. You are
right. I will work on making sure that we run RUNNING_SET tests, even when
the priority queue contains fewer than that.
> To populate the queue, you don't really need the information which tests
> had ever been run; you only need to know which ones MTR *wants* to run, if
> the running set is unlimited. If we assume that it passes the list to you,
> and you iterate through it, you can use your metrics for tests that failed
> or were edited before, and a default minimal metric for other tests. Then,
> if the calculated tests are not enough to fill the queue, you'll randomly
> choose from the rest. It won't completely solve the problem of tests that
> never failed and were never edited, but at least it will make it less
This brings us back to the question of research-only vs.
preparing-for-implementation - yes, this can be done.
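A minimal sketch of that idea, assuming MTR hands us its full list and we keep a metrics dict keyed by test name (`DEFAULT_METRIC` and `fill_queue` are made-up names, not existing code):

```python
import random

# Assumed floor priority for tests that never failed and were never edited.
DEFAULT_METRIC = 0.0

def fill_queue(mtr_wants, metrics, running_set, rng=None):
    """Rank the tests MTR wants to run by their known metric, then pad
    the queue with a randomized tail up to `running_set` entries."""
    rng = rng or random.Random()
    ranked = sorted(
        (t for t in mtr_wants if metrics.get(t, DEFAULT_METRIC) > DEFAULT_METRIC),
        key=lambda t: metrics[t],
        reverse=True,
    )
    queue = ranked[:running_set]
    if len(queue) < running_set:
        # Randomize the tail so never-failed tests still get a chance
        # to run, fail, and earn a real metric.
        chosen = set(queue)
        tail = [t for t in mtr_wants if t not in chosen]
        rng.shuffle(tail)
        queue.extend(tail[:running_set - len(queue)])
    return queue
```

Any tail test that fails will acquire a real metric and migrate into the ranked part of the queue on later runs, which is exactly the feedback loop described above.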
6. Full / non-full simulation mode
> I couldn't understand what the *non*-full simulation mode is for, can you
> explain this?
This is from the beginning of the implementation, when I was considering
all failures - even from tests that were not supposed to run. I can remove it.
> 7. Matching logic (get_test_file_change_history)
> The logic where you are trying to match result file names to test names is
> not quite correct. There are some highlights:
> There can also be subsuites. Consider the example:
> The result file can live not only in /r dir, but also in /t dir, together
> with the test file. It's not cool, but it happens, see for example
> Here are some other possible patterns for engine/plugin suites:
> Also, in release builds they can be in mysql-test/plugin folder:
When I coded that logic, I used the website that you had suggested and
thought that I had covered all possibilities. I see that I didn't, and I
will work on that too.
> Be aware that the logic where you compare branch names doesn't currently
> work as expected. Your list of "fail branches" consists of clean names
> only, e.g. "10.0", while row[BRANCH] can be like
> "lp:~maria-captains/maria/10.0". I'm not sure yet why it is sometimes
> stored this way, but it is.
Yes, I am aware of this; but when I looked at the data, there were very,
very few cases like this, so I decided to just ignore them. All the other
cases (>27 thousand) are presented as clean names. I believe this is
negligible, but if you think otherwise, I can add the logic to parse these
names as well:

MariaDB [kokiri_apr23]> select branch, count(*) from changes
    -> where branch like 'lp%' group by 1 order by 2;
+------------------------------------------------------+----------+
| branch                                               | count(*) |
+------------------------------------------------------+----------+
| lp:~maria-captains/maria/5.3-knielsen                |        1 |
| lp:~maria-captains/maria/5.5                         |        1 |
| lp:~maria-captains/maria/mariadb-5.1-mwl132          |        1 |
| lp:~maria-captains/maria/mariadb-5.1-mwl163          |        1 |
| lp:~maria-captains/maria/maria-5.1-table-elimination |        3 |
| lp:~maria-captains/maria/10.0-FusionIO               |        7 |
| lp:~maria-captains/maria/5.1-converting              |        9 |
| lp:~maria-captains/maria/10.0-connect                |       15 |
| lp:~maria-captains/maria/10.0                        |       45 |
+------------------------------------------------------+----------+
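If we do decide to handle them, a small normalizer would probably suffice. This is only a sketch, and it assumes the last path component of an 'lp:' branch is the clean name - which holds for rows like 10.0 above but not for feature branches like 5.1-converting:

```python
def clean_branch(branch):
    """Normalize branch names such as 'lp:~maria-captains/maria/10.0'
    to the plain form '10.0' used by the rest of the data.

    Assumption: the clean name is the last path component; feature
    branches (e.g. '5.1-converting') would still not match a series
    name and would need separate handling or ignoring.
    """
    if branch.startswith('lp:'):
        return branch.rsplit('/', 1)[-1]
    return branch
```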
> Let's get back to it (both of them) after the logic with dependent
> simulations is fixed, after that we'll review it and see why it doesn't
> work if it still doesn't. Right now any effect that file edition might have
> is rather coincidental, possibly the other one is also broken.
>> - Humans are probably a lot better at predicting first failures than
>> current strategy.
> This is true, unfortunately it's a full time job which we can't afford to
> waste a human resource on.
I was not suggesting to hire a person for this : ). What I meant is that a
developer would be a lot better at choosing how to test their own changes
than a script that just accounts for recent failures. This is information
that a developer can easily leverage on their own, along with their code
changes, etc. I was trying to say that, considering all this, I need to
improve the algorithm.
In conclusion, this is what I will work on now:
- Add cleanup for every run - I just added (and pushed) this. Right now
I am running tests with this logic.
- Get new results, graph and report them ASAP.
- Add randomized running of tests that have not run or been modified - I
will make this optional.
- Fix the logic that converts test result file names to test names.
Questions to consider:
- Should we define precisely what the implementation should look like,
so that I can code the core, wrapper, etc? Or should I focus on finishing
the experiments and the 'research' first?
- I believe the logic to compare branch names should be good enough, but
if you consider that it should be 100% fixed, I can do that. What do you
think?