
maria-developers team mailing list archive

Re: [GSoC] Accepted student ready to work : )

 

Hello everyone,
Well, I have now familiarized myself with the data and will start trying to
simulate some scenarios. In this email I summarize my roadmap for the next
few days. The text is *quite dense* (and my writing is somewhat clumsy too),
so it is *not necessary to read it in detail*. This email is mostly to keep
a record of what I'll do.

Anyhow, if anybody has questions or comments, I will be happy to address
them, as well as receive any feedback. I will start coding the simulations
in the next few days. Also, if you want more details of what I'm doing, I
can write that up too.

So here's what I'll do for the simulations:

*1. Calculating the "relevancy index"* of a test. I have considered two
simple options so far (see the sketch after this list):

   - *Exponential decay*: The relevancy index of a test is the *sum over
   each failure* of *exp((FailureTime - CurrentTime)/DecayRate)*. It
   decreases exponentially as time passes, and increases each time the test fails.
      - DecayRate is a tunable constant that controls how quickly old
      failures stop mattering.
      - e.g. if TestA failed on days 5 and 7, and it is now day 9 (taking
      DecayRate = 1 day for the example), the RI will be
      exp(5-9) + exp(7-9) = exp(-4) + exp(-2).
      - Time is measured in seconds, as a UNIX_TIMESTAMP.
   - *Weighted moving average*: The relevancy index of a test is
   R[now] = R[now-1]*alpha + fail*(1-alpha), where fail is 1 if the test
   failed in this run and 0 if it did not. The value stays between 0 and 1.
   It decreases slowly while a test runs without failing, and increases
   slowly when the test fails.
      - 0 < alpha < 1 (initially set at 0.95 for testing, so 1-alpha = 0.05).
      - e.g. if TestB's index is already 1 and it fails again in this run:
      R[t] = 1*0.95 + 1*0.05 = 1.
      - If TestB runs once more and does not fail: R[t+1] = 1*0.95 + 0*0.05 = 0.95.
      - The *advantage of this method* is that it does not have to look at
      the whole history every time it is calculated (unlike the exponential
      decay method).
      - Much like the exponentially weighted averages used in TCP
      (1<http://www.cl.cam.ac.uk/~jac22/books/mm/book/node153.html>).
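
To make the two options concrete, here is a minimal Python sketch of both
calculations. The constants are just the values mentioned above, and the
function names and data layout are mine, not anything that already exists
in buildbot:

    import math
    import time

    DECAY_RATE = 24 * 3600   # assumed: one day, expressed in UNIX_TIMESTAMP seconds
    ALPHA = 0.95             # weight of the previous value in the moving average

    def exponential_decay_index(failure_times, now=None):
        """Sum exp((failure_time - now) / DECAY_RATE) over all recorded failures."""
        now = time.time() if now is None else now
        return sum(math.exp((t - now) / DECAY_RATE) for t in failure_times)

    def moving_average_index(previous_index, failed):
        """R[now] = R[now-1]*alpha + fail*(1-alpha); needs only the previous value."""
        fail = 1.0 if failed else 0.0
        return previous_index * ALPHA + fail * (1.0 - ALPHA)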

Regarding the *relevancy index*, it can be calculated by grouping test results
in several ways: *roughly*, using test_name+variation, or *more granularly*, by
also including *branch* and *platform*. I'll add some thoughts regarding these
options at the bottom of the email.

*2. To run the simulation*, I'll gather data from the first few thousand
test_run entries and then start simulating results. Here's what I'll do (a
rough sketch of the loop follows this list):

   1. *Gather data* from the first few thousand test_run entries (e.g. 4
   thousand).
   2. After those N thousand test_runs, I'll go through the remaining test_run
   entries *one by one* and, using the data gathered up to that point, select a
   '*running set*' of *100 test suites* for each test_run entry. (The number
   can be adjusted.)
   3. If the list of *failed tests* in this *test_run* entry contains tests
   that are *NOT part* of the *running set*, those failures are ignored, so
   the information about them is lost (not used as part of the relevancy
   index). *(See Comment 2)*
   4. If the set of *failed tests* in the *test_run* entry intersects with
   the *running set*, that counts toward better *recall*, and this information
   is used to continue calculating the *relevancy index*.
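
As a rough Python sketch of that loop: I assume a hypothetical relevancy
object with update() and top() methods, that top() returns a set of the N
highest-ranked suites, and that test_runs yields (run_id, failed_tests)
pairs with failed_tests as a set; only the control flow matters here.

    RUNNING_SET_SIZE = 100   # test suites selected per test_run (adjustable)
    WARMUP_RUNS = 4000       # entries used only to gather initial statistics

    def simulate(test_runs, relevancy):
        """test_runs yields (run_id, failed_tests); relevancy is the index under test."""
        caught = missed = 0
        for i, (run_id, failed_tests) in enumerate(test_runs):
            if i < WARMUP_RUNS:
                relevancy.update(failed_tests)              # step 1: just gather data
                continue
            running_set = relevancy.top(RUNNING_SET_SIZE)   # step 2: pick the running set
            caught_here = failed_tests & running_set        # step 4: failures we would see
            missed_here = failed_tests - running_set        # step 3: failures that are lost
            caught += len(caught_here)
            missed += len(missed_here)
            relevancy.update(caught_here)                   # only observed failures feed the index
        return caught, missed   # recall ~ caught / (caught + missed)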

Based on the results obtained from the simulations, we can adjust the
algorithm (e.g. to consider the *relevancy index by platform and branch*,
etc.).


Comments about the *relevancy index:*

   - The methods to calculate the relevancy index are very simple. There
   are some other useful metrics that could be incorporated:
      - *Time since last run*. With the current methods, if a *test
      completely stops running*, it only *becomes less relevant with time*,
      so even if it could expose defects, it never gets to run because its
      relevancy index just keeps going down. Incorporating a function that
      *increases the relevancy index* as the *time since the last run
      increases* can help solve this issue. I believe this measure will be
      useful (one possible formulation is sketched after this list).
      - *Correlation between test failures*. If two tests tend to fail
      together, is it better to just run one of them? Incorporating this
      measure seems difficult, but it is on the table in case we decide to
      consider it.
   - As you might have seen, I decided not to consider any data concerned
   with *code changes*. I'll work like this and see whether the results are
   satisfactory.
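
For the "time since last run" idea, here is a minimal sketch of one possible
formulation. The functional form, STALENESS_RATE and STALENESS_WEIGHT are
purely my assumptions for illustration, not a decided design:

    import math

    STALENESS_RATE = 7 * 24 * 3600   # assumed: one week, in seconds
    STALENESS_WEIGHT = 0.5           # assumed: how strongly staleness counts

    def adjusted_index(relevancy_index, seconds_since_last_run):
        """Add a term that grows toward STALENESS_WEIGHT the longer a test goes
        unrun, so tests that stopped running eventually climb back into the
        running set."""
        staleness = 1.0 - math.exp(-seconds_since_last_run / STALENESS_RATE)
        return relevancy_index + STALENESS_WEIGHT * staleness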


Comments regarding the *buildbot infrastructure*:
These comments are out of the scope of this project, but they describe
features that would be very desirable for the buildbot infrastructure.

   - Unfortunately, given the data available in the database, it is NOT
   possible to know *which tests ran* on each *test_run*. This information
   would be very useful, as it would help estimate the *exact failure rate*
   of a test. I didn't look into the code, but it seems that *class
   MtrLogObserver* (2<http://buildbot.sourcearchive.com/documentation/0.8.3p1-1/mtrlogobserver_8py_source.html>)
   contains most of the infrastructure necessary to just add one or two more
   tables (*test_suite* and *test_suite_test_run*), some code, and start
   keeping track of this information (a rough sketch of such tables follows
   this list).
   - Another problem with the data available in the database is that it is
   not possible to know *how many test suites exist*. It is only possible
   to estimate *how many different test suites have failed*. This would
   also be helpful information.
   - Actually, this information would be useful not only for this project,
   but in general for book-keeping of the development of MariaDB.
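
Purely as an illustration of the kind of bookkeeping I mean (the table and
column names are my guess at what the two extra tables could look like, not
an actual proposal for the buildbot schema):

    # Hypothetical extra tables; names and columns are illustrative only.
    PROPOSED_TABLES = [
        """CREATE TABLE test_suite (
               id    INTEGER PRIMARY KEY,
               name  VARCHAR(100) NOT NULL UNIQUE
           )""",
        """CREATE TABLE test_suite_test_run (
               test_suite_id INTEGER NOT NULL,   -- suite that was executed
               test_run_id   INTEGER NOT NULL,   -- the test_run it ran in
               PRIMARY KEY (test_suite_id, test_run_id)
           )""",
    ]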

Thanks to all,
Pablo


On Mon, Apr 28, 2014 at 9:57 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Kristian!
>
> On Apr 28, Kristian Nielsen wrote:
> > Sergei Golubchik <serg@xxxxxxxxxxx> writes:
> >
> > > note, that two *different* revisions got the same revno! And the
> changes
> > > from the first revision are completely and totally lost, there is no
> way
> > > to retrieve from from anywhere.
> >
> > Indeed.
> >
> > But note that in main trees (5.1, 5.2, 5.3, 5.5, and 10.0), this cannot
> occur,
> > since we have set the append_revision_only option (or
> "append_revisions_only",
> > can't remember). This prevents a revision number from changing, once
> pushed.
> >
> > So in main trees, the revision number _should_ in fact be unique.
>
> Yes. I omitted that detail, because I hope that we can find a solution
> that works for all trees without checks that only work for main trees.
> But, of course, as the last resort we can rely on append_revisions_only.
>
> > > Revision-id is the only unique identifier for a revision,
> unfortunately,
> > > it's not logged in these tables. I believe we'll change buildbot so
> that
> > > revid would be logged in the future. But so far it wasn't needed, and
> > > this is one of the defects in the data.
> >
> > I actually wanted to log it when I wrote the code. The problem is that
> the
> > revision-id is not available to buildbot when the change is received from
> > Launchpad. I even asked the bzr/launchpad developers to provide the
> revid: so
> > it could be logged. The answer I got was that it is a deliberate feature
> to
> > hide the revision id :-(
> >
> >     https://bugs.launchpad.net/launchpad/+bug/419057
> >
> > So I don't think we will get revid in Buildbot. Of course, if we go to
> git, we
> > will not have this problem anymore, as it always uses a consistent,
> stable
> > revision identifier.
>
> Oh, I see, thanks.
>
> Git - yes, that's not an issue. Bzr - perhaps we could figure out
> something regardless. May be get the revid on the tarbake builder - it
> needs the tree anyway. Or use fake revids. Or something. It is not a
> showstopper for this project, we can think about it later, when we
> finish the research part and get to the integration.
>
> Regards,
> Sergei
>
>
