
maria-developers team mailing list archive

Re: For Google Summer of Code 2014, Interested in the task of "statistically optimize mysql-test runs by running less tests"


Hi, Zhongyi Hu!

On Mar 14, Zhongyi Hu wrote:
> Dear Sergei Golubchik,
> I am a post graduate student of Institute of Software, Chinese Academy
> of Sciences and my name is Zhongyi Hu.
> I major in computer science and my research field is data stream
> mining.  Because I have got enough papers and works for graduation, I
> want to do something interesting, meaningful and valuable in the rest
> time as student.

I see. That's very nice :)

> I have participated in two projects about database, one is main memory
> database and the other is database cluster.  I got some experience of
> database system design and implementation from them.  Although I am
> just a beginner of this area, I really like it and expect to make it
> as my career.  I often use Mysql in research and work, but MariaDB is
> not very familiar to me.  I am tremendously optimistic about its
> future because of all of you.
> Well, let's come to the point.  I am interested in the task of
> "statistically optimize mysql-test runs by running less tests".  I
> chose this task because I have written a few tools for automatic test.
> I know the performance is very important if there are a large amount
> of data or cases to test.

This task won't make you familiar with database system design or
implementation. For this task it doesn't matter whether tests are
database tests, unit tests, or something completely different. As far as
this task is concerned, they're abstract units of work that can be
executed in arbitrary order and they can "succeed" or "fail", and the
goal is to execute as few of these "tests" as possible, while detecting
as many "failures" as possible.
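
To make that trade-off concrete, here is a minimal sketch in Python. The
function name and the test names are hypothetical, not part of mysql-test
or of MDEV-5776; it only assumes that each test has a name and that we
know which tests actually failed.

```python
def evaluate_selection(all_tests, selected, failed):
    """Return (fraction of tests executed, fraction of failures detected)."""
    all_tests = set(all_tests)
    selected = set(selected) & all_tests
    failed = set(failed) & all_tests
    run_fraction = len(selected) / len(all_tests) if all_tests else 0.0
    # With no failures at all, nothing was missed, so count that as 1.0.
    recall = len(selected & failed) / len(failed) if failed else 1.0
    return run_fraction, recall


tests = ["t1", "t2", "t3", "t4"]
# Ran half of the tests and caught one of the two failures.
print(evaluate_selection(tests, ["t1", "t3"], ["t3", "t4"]))  # (0.5, 0.5)
```

A good selection strategy pushes the first number down while keeping the
second one close to 1.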

> I read the MDEV-5776 and I think the major job is as follows.
> When the code is changed, the mysql-test is used to do the requisite
> tests.  We need to integrate the information of the changes and the
> scenarios to predict the probability of failure for each test and get
> the relationships of the tests.
> Then decide what to test and what test cases should be used.  The
> purpose is to optimize the efficiency of testing.  All of these should
> be done by algorithm and program.

Yes. But it's also useful to take into account the historical data -
what tests failed before and where.

In my experiments historical data were most important (I've got good
results purely from statistical analysis of historical data), and the
information about what files were changed didn't improve the results
much. But perhaps I was doing it wrong?
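
As an illustration only (this is not the analysis described above), one
simple way to use historical data is to rank tests by how often they
failed, weighting recent runs more. The input format, the function name
and the decay factor below are all assumptions made for the sketch.

```python
from collections import Counter


def rank_by_failure_history(past_runs, decay=0.9):
    """Rank test names by recency-weighted failure count.

    past_runs: list of sets of failed test names, oldest run first
    (a hypothetical format, not what buildbot stores).
    """
    scores = Counter()
    weight = 1.0
    for failed in reversed(past_runs):  # newest run gets weight 1.0
        for name in failed:
            scores[name] += weight
        weight *= decay
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda n: (-scores[n], n))


history = [{"a", "b"}, {"b"}, {"b", "c"}]
print(rank_by_failure_history(history)[:2])  # ['b', 'c']
```

Running only the top of such a ranking is the crudest possible strategy;
tests that never failed before get no score at all here.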

> In addition, I think that the job is in some ways like mining in data
> stream, such as many data need to be statistically analyzed and the
> hidden patterns changing over time.

Yes, exactly.

> At last, I have two basic questions.
> 1) What exactly are the builder and the combination?
> I thought they refer to compiler and runtime environment.

Kind of, yes. See this reply of mine:

it contains links to our buildbot (the tool that automatically builds
and tests mariadb on different platforms - "builders").

There you will see what builders are, what combinations are, and so on.

> 2) What does the "individual tests within a big test file" mean?

Most tests use the "mysqltest" tool. It is conceptually very simple -
execute a set of commands, record the output. Compare with the correct
pre-recorded output.
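
The record-and-compare idea can be sketched in a few lines of Python.
This is not how mysqltest is implemented; run_test and the fake
"execute" callable are hypothetical stand-ins for sending a statement to
the server and getting its output back.

```python
def run_test(commands, execute, expected_output):
    """Run commands in order, collect the output, compare with the record.

    execute: callable taking one command and returning its output as text.
    Returns (passed, actual_output).
    """
    actual = "\n".join(execute(cmd) for cmd in commands)
    return actual == expected_output, actual


def fake_execute(cmd):
    # Stand-in for a real server: just echo the command in upper case.
    return cmd.upper()


commands = ["select 1", "select 2"]
recorded = "SELECT 1\nSELECT 2"  # the pre-recorded "correct" output
print(run_test(commands, fake_execute, recorded)[0])  # True
```

If the actual output differs from the recorded one in any way, the test
counts as failed.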

A test file contains SQL statements (and sometimes mysqltest
directives). Technically, one can have many logical tests in one test
file.

> Maybe I am completely wrong, but I still look forward to your reply.
> I hope to have the opportunity to learn from you in work and discussion.

If you want to participate in Google Summer of Code, don't forget to
submit a proposal before the deadline:

