← Back to team overview

maria-developers team mailing list archive

Re: [GSoC] self-tuning optimizer


Hi, Anshu!

On Jun 25, Anshu Avinash wrote:
> Hi Sergei,
> I have been looking into various methods that can meet our conditions.
> The Gauss-Seidel Method (http://en.wikipedia.org/wiki/Gauss-Seidel_method)
> might meet our requirements. I had just started writing the code.
> So in a nutshell, Gauss-seidel method is an iterative method. In our
> case, the initial values for all variables would be obtained by
> (total_query_time)/total_queries if total_queries not equal to 0, 0
> otherwise. After each query, we will get update the value of one
> variable (which should have a non-zero coefficient in the current
> equation). We would cycle through all constants query after query. The
> logic is that after many queries, the values would converge to the
> correct value. Also updating one variable after each query shouldn't
> be much of an overhead and we just need to store coefficients for the
> current query.
> The code for this is very simple, I can give a patch in 1-2 days, if
> the overall idea seems ok.

I don't have much experience in here :(
Are you sure that Gauss-Seidel will converge? It's not obvious to me
that the convergence criteria are met on our data.

Besides, we do have a sparse matrix, that's true. But we don't (and
won't ever) have a large matrix with millions of unknowns.  We might get
a thousand in a few years (ten per-engine constants times MAX_HA is
already 640). So, methods that are asymptotically faster may not be
faster in our case.

I'd suggest the following: in your method that should be solving the
equation - instead of solving anything, just dump the data to a file
(make sure that data from different threads aren't intermixed, or use
different files per thread). Then you can use any existing tool or
software package to play with the data, try different methods, even
prototype in python, if you'd like.

One of the issues to keep in mind - the system won't be completely
defined. Basically never. MAX_HA = 64, but practically you can expect
the data for one or two engines only, so no matter how many equations
you'll collect, ~120 variables will always have zero coefficients.

By the way, just a thought, in iterative methods you can use your
current values of optimizer constants as initial values for the first