Re: [GSoC] Accepted student ready to work : )
Well, I have now familiarized myself with the data, and I will start trying
to simulate some scenarios. In this email, I will summarize my roadmap for
the next few days. The text is *quite complicated* (and my writing is kind
of clumsy too). It's *not necessary to read it in detail*; this email is
mostly to keep a record of what I'll do.
Anyhow, if anybody has questions or comments, I will be happy to address
them, as well as receive any feedback. I will start coding the simulations
in the next few days. Also, if you want more details of what I'm doing, I
can write that up too.
So here's what I'll do for the simulations:
*1. Calculating the "Relevancy Index"* for a test. I have considered two
simple options so far (a short code sketch of both follows this list):
   - *Exponential decay*: The relevancy index of a test is the *sum over
   each failure* of *exp((FailureTime - CurrentTime)/DecayRate)*. It
   decreases exponentially as time passes, and increases each time the test
   fails.
      - DecayRate is a tunable constant that controls how quickly old
      failures lose weight.
      - e.g. if TestA failed at days 5 and 7, and now is day 9 (with
      DecayRate = 1 day), RI will be exp(5-9) + exp(7-9) = exp(-4) + exp(-2).
      - Time is measured in seconds (UNIX_TIMESTAMP).
   - *Weighted moving average*: The relevancy index of a test is *R[now] =
   R[now-1]*alpha + fail*(1-alpha)*, where fail is 1 if the test failed in
   this run, and 0 if it did not fail. The value stays between 0 and 1. It
   decreases slowly if a test runs without failing, and it increases slowly
   if the test fails.
      - 0 < alpha < 1 (initially set at 0.95 for testing).
      - e.g. suppose TestB's relevancy index has reached 1 and it fails
      again in this run: R[t] = 1*0.95 + 1*0.05 = 1.
      - If TestB then runs once more and does not fail: R[t+1] = 1*0.95 +
      0*0.05 = 0.95.
      - The *advantage of this method* is that it doesn't have to look at
      the whole history every time it's calculated (unlike the exponential
      decay option).
      - It works much like the smoothed round-trip-time estimate in the TCP
      protocol.
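To make both options concrete, here is a minimal Python sketch of the two
calculations. The function and parameter names are mine, and the one-day
default DecayRate is just a placeholder:

    import math

    def exponential_decay_ri(failure_times, now, decay_rate=86400.0):
        # Sum of exp((FailureTime - CurrentTime)/DecayRate) over all
        # failures; times are UNIX timestamps in seconds.
        return sum(math.exp((t - now) / decay_rate) for t in failure_times)

    def ewma_ri(previous_ri, failed, alpha=0.95):
        # R[now] = R[now-1]*alpha + fail*(1-alpha)
        return previous_ri * alpha + (1.0 if failed else 0.0) * (1.0 - alpha)

    # The TestA example above: failures at days 5 and 7, "now" at day 9,
    # DecayRate = 1 day, giving exp(-4) + exp(-2).
    day = 86400.0
    print(exponential_decay_ri([5 * day, 7 * day], 9 * day, day))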
Regarding the *Relevancy Index*, it can be calculated by grouping test
results in many ways: *roughly*, using test_name+variation, or *more
granularly*, by also including *branch* and *platform*. I'll add some
thoughts regarding these options at the bottom of the email.
*2. To run the simulation*, I'll gather data from the first few thousand
test_run entries, and then start simulating results. Here's what I'll do (a
rough sketch of this loop follows the list):
   1. *Gather data* from the first few thousand test_run entries (e.g. the
   first 4,000).
   2. After those N thousand test_runs, I'll go through the remaining
   test_run entries *one by one*, and, using the data gathered up to that
   point, I will select '*running sets*' of *100 test suites* to run on
   each test_run entry. (The number can be adjusted.)
   3. If in a *test_run* entry the list of *failed tests* contains tests
   that are *NOT part* of the *running set*, the failure will be ignored,
   and so the information from this failure will be lost (not used as part
   of the relevancy index). *(See Comment 2)*
   4. If the set of *failed tests* in a *test_run* entry intersects with
   the *running set*, that means better *recall*. This information will be
   used to continue calculating the *relevancy index*.
According to the results obtained from the simulations, we can adjust the
algorithm (e.g. calculating the *relevancy index* per *platform* and
*branch*, or tuning the parameters).
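Here is a minimal sketch of that simulation loop, reusing the ewma_ri()
function from the sketch above. It assumes test_runs is a list of (run_id,
failed_tests) pairs pulled from the database, with failed_tests a set of
test names; all names and defaults are placeholders:

    def simulate(test_runs, warmup=4000, set_size=100):
        relevancy = {}  # test name -> relevancy index
        caught = missed = 0
        for i, (run_id, failed_tests) in enumerate(test_runs):
            if i < warmup:
                # Step 1: warm-up phase, every failure seeds the index.
                for test in failed_tests:
                    relevancy[test] = ewma_ri(relevancy.get(test, 0.0), True)
                continue
            # Step 2: pick the running set of the most relevant tests.
            ranked = sorted(relevancy, key=relevancy.get, reverse=True)
            running_set = set(ranked[:set_size])
            hits = failed_tests & running_set
            caught += len(hits)                 # step 4: failures we catch
            missed += len(failed_tests - hits)  # step 3: information lost
            # Only results of tests in the running set update the index.
            for test in running_set:
                relevancy[test] = ewma_ri(relevancy.get(test, 0.0),
                                          test in hits)
        return caught, missed

Recall can then be estimated as caught / (caught + missed).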
Comments about the *relevancy index:*
   - The methods to calculate the relevancy index are very simple. There
   are some other useful metrics that could be incorporated:
      - *Time since last run*. With the current methods, if a test is *not
      running*, it only *becomes less relevant with time*, and so even if
      it could expose defects, it never gets to run, because its relevancy
      index just keeps going down. Incorporating a function that *increases
      the relevancy index* as the *time since the last run increases* can
      help solve this issue (see the sketch after this list). I believe
      this measure will be useful.
      - *Correlation between test failures*. If two tests tend to fail
      together, is it better to just run one of them? Incorporating this
      seems difficult, but it is on the table, in case we should consider
      it.
   - As you might have seen, I decided not to consider any data concerned
   with *code changes*. I'll work like this and see if the results are good
   enough.
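For the time-since-last-run idea, one possible shape (both the functional
form and the one-week boost_rate are just my assumptions) is an additive
term that saturates at 1 as a test sits idle:

    import math

    def boosted_ri(base_ri, last_run_time, now, boost_rate=7 * 86400.0):
        # The longer a test has been idle, the closer the boost gets to 1,
        # so even a low-relevancy test eventually gets scheduled again.
        idle = now - last_run_time
        return base_ri + (1.0 - math.exp(-idle / boost_rate))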
Comments regarding the *buildbot infrastructure*:
These comments are out of the scope of this project, but they describe
features that would be very desirable for the buildbot infrastructure.
   - Unfortunately, given the data available in the database, it is NOT
   possible to know *which tests ran* on each *test_run*. This information
   would be very useful, as it would help estimate the *exact failure rate*
   of a test. I didn't look into the code, but it seems that the existing
   buildbot code already has most of the infrastructure necessary; adding
   one or two more tables (*test_suite* and *test_suite_test_run*, see the
   sketch after this list) and some code would be enough to start keeping
   track of this information.
- Another problem with the data available in the database is that it is
not possible to know *how many test suites exist*. It is only possible
to estimate *how many different test suites have failed*. This would
also be helpful information.
- Actually, this information would be useful not only for this project,
but in general for book-keeping of the development of MariaDB.
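As a rough illustration, the two extra tables might look like the DDL
below. Only the table names come from this email; every column name and
type is an assumption on my part, not the actual buildbot schema:

    # Hypothetical DDL for the suggested tables; columns are assumptions.
    CREATE_TEST_SUITE = """
    CREATE TABLE test_suite (
        id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        suite_name VARCHAR(100) NOT NULL,
        UNIQUE KEY (suite_name)
    )"""

    CREATE_TEST_SUITE_TEST_RUN = """
    CREATE TABLE test_suite_test_run (
        test_suite_id INT NOT NULL,  -- references test_suite.id
        test_run_id   INT NOT NULL,  -- references the existing test_run
        PRIMARY KEY (test_suite_id, test_run_id)
    )"""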
Thanks to all,
On Mon, Apr 28, 2014 at 9:57 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> Hi, Kristian!
> On Apr 28, Kristian Nielsen wrote:
> > Sergei Golubchik <serg@xxxxxxxxxxx> writes:
> > > note that two *different* revisions got the same revno! And the results
> > > from the first revision are completely and totally lost; there is no way
> > > to retrieve them from anywhere.
> > Indeed.
> > But note that in the main trees (5.1, 5.2, 5.3, 5.5, and 10.0), this
> > cannot happen, since we have set the append_revisions_only option (or
> > whatever it is called, I can't remember). This prevents a revision number
> > from changing once it has been set.
> > So in main trees, the revision number _should_ in fact be unique.
> Yes. I omitted that detail, because I hope that we can find a solution
> that works for all trees without checks that only work for main trees.
> But, of course, as the last resort we can rely on append_revisions_only.
> > > Revision-id is the only unique identifier for a revision, but
> > > it's not logged in these tables. I believe we'll change buildbot so
> > > that revid would be logged in the future. But so far it wasn't needed,
> > > and this is one of the defects in the data.
> > I actually wanted to log it when I wrote the code. The problem is that
> > the revision-id is not available to buildbot when the change is received
> > from Launchpad. I even asked the bzr/launchpad developers to provide the
> > revid: so it could be logged. The answer I got was that it is a deliberate
> > feature to hide the revision id :-(
> > https://bugs.launchpad.net/launchpad/+bug/419057
> > So I don't think we will get revid in Buildbot. Of course, if we go to
> > git, we will not have this problem anymore, as it always uses a
> > consistent, unique revision identifier.
> Oh, I see, thanks.
> Git - yes, that's not an issue. Bzr - perhaps we could figure out
> something regardless. Maybe get the revid on the tarbake builder - it
> needs the tree anyway. Or use fake revids. Or something. It is not a
> showstopper for this project, we can think about it later, when we
> finish the research part and get to the integration.