
maria-developers team mailing list archive

Re: [GSoC] Accepted student ready to work : )


Hi Pablo,

Thanks for the update. Some comments inline.

On 02.06.2014 18:44, Pablo Estrada wrote:
Hello everyone,
Here's a small report on the news that I have so far:

    1. I had a slow couple of weeks because of a quick holiday that I took.
    I will make up for that.
    2. I added the metric that considers the number of test_runs since a
    test_case last ran. I graphed it, and it barely affects the results at
    all. I still think this is useful for uncovering hidden bugs that might
    lurk in the code for a long time, but testing this condition is
    difficult with our data. I would like to keep this measure, especially
    since it doesn't seem to affect results negatively. Opinions?
    3. I liked Sergei's idea about using changes to the test files to
    calculate the relevancy index. If a test has been changed recently, its
    relevancy index should be high. This is also more realistic, and uses
    information that is easy for us to obtain.
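A minimal sketch of how the metric from point 2 might be combined with a failure-based score. The function name, the weight, and the logarithmic form of the staleness bonus are all illustrative assumptions, not the actual simulator code:

```python
import math

# Hypothetical relevancy index: failure history plus a small bonus for
# tests that have not run for many test_runs. Names and weights are
# illustrative only.
def relevance(recent_failure_score, runs_since_last_run,
              staleness_weight=0.1):
    """recent_failure_score: decayed count of recent failures
    runs_since_last_run:  test_runs since this test last executed
    """
    # The log keeps the staleness term growing slowly, so it reorders
    # tests with similar failure scores without drowning the failure signal.
    staleness_bonus = staleness_weight * math.log1p(runs_since_last_run)
    return recent_failure_score + staleness_bonus
```

With a small weight like this, a long-unrun test gradually climbs the ranking but never outranks a test that is actively failing, which matches the observation that the metric doesn't change results much.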

Right, change should be taken into account, but test files cannot be completely relied upon. The problem is that many MTR tests have a more complicated structure than just test/result pair. They call various other files from inside; sometimes a test file is not much more than setting a few variables and then calling some common logic. Thus, the test file itself never changes, while the logic might.

I think it makes more sense to use *result* files. A result file will almost always reflect a change to the logic, it should be more accurate, although not perfect either. If a change was to fix the test itself, e.g. to get rid of a sporadic timeout and such, it's possible that the result file stays the same, while a test or an included file changes.
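A sketch of how the result-file idea could be wired in: given the set of files changed in a push (e.g. from the change_files table), boost tests whose result file was touched. The directory layout used here (mysql-test/r/ for the main suite, mysql-test/suite/&lt;suite&gt;/r/ for the others) is the standard MTR layout, but the function names and the 'main' handling are assumptions to verify against the real data:

```python
# Hypothetical mapping from a test name (as stored in test_failure) to
# the result file path that would appear in change_files.
def result_file_for(test_name):
    """Map 'suite.testname' (or a bare 'testname') to its result file."""
    if '.' in test_name:
        suite, name = test_name.split('.', 1)
        if suite != 'main':
            return f"mysql-test/suite/{suite}/r/{name}.result"
        test_name = name  # main-suite tests live directly under mysql-test/
    return f"mysql-test/r/{test_name}.result"

def result_recently_changed(test_name, changed_files):
    """changed_files: set of paths changed in the push being simulated."""
    return result_file_for(test_name) in changed_files
```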

    - I am trying to match the change_files table or the mysql-test
       directory with the test_failure table. I was confused about the
       naming of tests and test suites, but I am making progress on it.
       Once I am able to match at least 90% of the test_names in
       test_failure with the filenames in the change_files table, I will
       incorporate this data into the code and see how it works out.

I recommend reading these pages:
They might help to resolve some confusion.

And of course there is a good "official" MTR manual which you've probably seen: http://dev.mysql.com/doc/mysqltest/2.0/en/index.html

       - *Question*: Looking at the change_files table, there are files that
       have been *ADDED* several times. Why would this be? Maybe when a new
       branch is created, all files are ADDED to it? Any ideas? : ) (If no one
       knows, I'll figure it out, but maybe you do know ; ))

My guess is that the most common reason is multiple merges. A file is added to, let's say, 5.1; then the change, along with others, is merged into 5.2, and hence the file is added again; then into 5.3, etc. Besides, there might be numerous custom development trees registered on buildbot into which the change is also merged, so the file is added yet again.

I'm sure there are more reasons.

    4. I uploaded inline comments for my code last week, let me know if it's
    clear enough. You can start by run_basic_simulations.py, where the most
    important functions are called... and after, you can dive into
    basic_simulator.py, where the simulation is actually done. The repository
    is a bit messy, I admit. I'll clean it up in the following commits.

Thanks for the hints. I started looking at your code, but haven't made much progress yet. I will follow the path you recommended.


This is all I have to report for now. Any advice on the way I'm proceeding
is welcome : )
Have a nice week, everyone.


On Sun, May 25, 2014 at 11:43 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

Hi, Pablo!

On May 25, Pablo Estrada wrote:
On Thu, May 22, 2014 at 5:39 PM, Sergei Golubchik <serg@xxxxxxxxxxx>
I don't think you should introduce artificial limitations that make
the recall worse, because they "look realistic".

You can make it realistic instead of merely looking realistic - simply
pretend that your code is already running on buildbot and limits the
number of tests to run. So, if a test didn't run - you don't have any
failure information about it.

And then you only need to do what improves recall, nothing else :)

(of course, to calculate the recall you need to use all failures,
even for tests that you didn't run)

Yes, my code *already works this way*. It doesn't consider failure
information from tests that were not supposed to run.  The graphs that
I sent are from scripts that ran like this.

Good. I hoped that would be the case (but I haven't checked your scripts
on github yet, sorry).

Of course, the recall is just the number of spotted failures out of the
total number of known failures : )
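In code, that recall measure might look like the following set-based sketch (the names are illustrative, not the actual simulator functions):

```python
# Recall as described: of all failures known in the full history, what
# fraction did the limited test run actually catch.
def recall(spotted_failures, known_failures):
    """Both arguments are sets of failure records (e.g. (run_id, test))."""
    if not known_failures:
        return 1.0  # nothing to catch
    return len(spotted_failures & known_failures) / len(known_failures)
```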

Anyway, with all this, I will get to work on adapting the simulation a
little bit:

    - Time since last run will also affect the relevancy of a test
    - I will try to use the list of changed files from commits to make
    new tests start running right away

Any other comments are welcome.

Getting back to your "potential fallacy" in how you start taking tests
into account only when they fail for the first time...

I agree, in real life we cannot do that. Instead we start from a
complete list of tests that is known in advance. And you don't have
that list.

An option would be to create a complete list of all tests that have
ever failed (and perhaps remove tests that were added in some revision
present in the history). And use that as a "starting set" of tests.
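This suggestion could be sketched roughly as follows; the record shapes are assumptions about what the history tables provide:

```python
# Sketch: build a starting set from every test that has ever failed in
# the history, then drop tests whose addition is itself recorded in the
# history (they did not exist at the simulated start).
def starting_set(failure_history, added_in_history):
    """failure_history: iterable of (revision, test_name) failure records
    added_in_history: set of test names first added during the period
    """
    ever_failed = {test for _revision, test in failure_history}
    return ever_failed - added_in_history
```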

Alternatively, we can generate a list of all tests currently present in
the 10.0 tree - everything that you have in the history tables should be
a subset of that.
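A sketch of this alternative, assuming the standard MTR layout where test files sit in 't' subdirectories (mysql-test/t and mysql-test/suite/&lt;suite&gt;/t) of a checked-out tree:

```python
import os

# Enumerate every .test file under a 10.0 checkout's mysql-test
# directory to get the full universe of test names.
def all_tests(mysql_test_dir):
    tests = set()
    for root, _dirs, files in os.walk(mysql_test_dir):
        # test files live in 't' subdirectories
        if os.path.basename(root) != 't':
            continue
        for f in files:
            if f.endswith('.test'):
                tests.add(f[:-len('.test')])
    return tests
```

Everything appearing in the history tables should then be a subset of this set, which also gives a sanity check on the name matching.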