
Re: MTL4 backend: Significant performance results

Dag Lindbo wrote:
Robert Kirby wrote:
Good news, but remember there is a tradeoff here -- MTL is designed by some
of the best generic programming folks around, and I am not surprised that
these results are great. Remember, first, that they are probably only
supporting serial computing. PETSc has to keep track of a bunch of stuff
that enables parallel computing but may affect serial performance.

True. PETSc provides bindings to some very fancy external packages as
well (BoomerAMG, etc.), which is a huge benefit.

My intentions with an MTL backend are:
*) Explore the efficiency of the dolfin assembler and see how far we can
push it
*) Possibly compete with uBLAS in the serial backend domain


With the new linear algebra design, this should be pretty easy to add.
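
For illustration, a new backend amounts to a class implementing the
GenericMatrix interface. A rough skeleton -- the method names and signatures
here are approximations from memory, not the actual DOLFIN headers:

    #include <boost/numeric/mtl/mtl.hpp>

    // Sketch of an MTL4 matrix backend. The methods below approximate
    // the DOLFIN GenericMatrix interface; they are not verbatim.
    class MTL4Matrix // : public GenericMatrix
    {
    public:
      typedef mtl::compressed2D<double> mtl4_sparse_matrix;

      void init(unsigned int M, unsigned int N)
      { A.change_dim(M, N); }

      // Add a dense block of element contributions (declaration only;
      // an MTL4 inserter would do the actual work, see the insertion
      // example further down the thread).
      void add(const double* block,
               unsigned int m, const unsigned int* rows,
               unsigned int n, const unsigned int* cols);

      // Finalize pending insertions.
      void apply();

    private:
      mtl4_sparse_matrix A;
    };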

Also, there are issues besides simple assembly and matrix-vector products.
Many algebraic preconditioners need different kinds of queries that may be
optimized to different degrees in different packages (e.g. extracting the
diagonal). While assembly and matvec are probably the two most crucial
benchmarks, it would be interesting to design a more robust set of
benchmarks that tests some of these other features as well; a sketch of one
such query benchmark follows below. Not working primarily in
preconditioners, I'm not sure what should go in, but it should be bigger
rather than smaller (e.g. doing SOR, SSOR, ILU, pivoting for LU, etc.).
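
A minimal sketch of such a query benchmark, where get_diagonal is an
assumed per-backend hook rather than an existing interface:

    #include <ctime>

    // Hypothetical micro-benchmark for one backend query. get_diagonal
    // is an assumed hook, not an existing DOLFIN or MTL4 function.
    template <class Matrix, class Vector>
    double time_diagonal_extraction(const Matrix& A, Vector& d, int reps)
    {
      std::clock_t t0 = std::clock();
      for (int r = 0; r < reps; ++r)
        get_diagonal(d, A);
      return double(std::clock() - t0) / CLOCKS_PER_SEC;
    }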

MTL4 sits nicely with a related project: ITL (from the same group),
which provides the basic Krylov methods and the preconditioners you
mention. I have yet to benchmark these, but I suspect they will be
pretty solid for serial performance. Bindings to at least one high-end
LU solver will have to be provided, as is the case today for uBLAS (UMFPACK).
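
For reference, a Krylov solve with ITL on top of an MTL4 matrix looks
roughly like this (modeled on the MTL4/ITL examples; exact namespaces and
class names may differ between snapshots):

    #include <boost/numeric/mtl/mtl.hpp>
    #include <boost/numeric/itl/itl.hpp>

    int main()
    {
      typedef mtl::compressed2D<double> Matrix;
      const int n = 1000;
      Matrix A(n, n);
      mtl::matrix::laplacian_setup(A, 100, 10);  // 5-point test matrix

      mtl::dense_vector<double> x(n, 0.0), b(n, 1.0);

      itl::pc::ilu_0<Matrix> P(A);               // ILU(0) preconditioner
      itl::basic_iteration<double> iter(b, 500, 1.0e-6);
      itl::cg(A, x, b, P, iter);                 // preconditioned CG
      return 0;
    }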


Is it possible to access plain pointers to the underlying CSR matrix? If so, it's easy to bolt on serial preconditioners and LU solvers.
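
If the three CSR arrays can be extracted, the UMFPACK hookup is short. A
sketch, assuming hypothetical accessors hand us the raw row pointers,
column indices and values (UMFPACK stores matrices column-wise, so CSR
data is passed as the transpose):

    #include <umfpack.h>

    // Ap/Ai/Ax: raw CSR arrays assumed to come from the backend; how to
    // obtain them from MTL4 (if possible) is exactly the open question.
    void csr_lu_solve(int n, const int* Ap, const int* Ai, const double* Ax,
                      double* x, const double* b)
    {
      void *Symbolic, *Numeric;
      umfpack_di_symbolic(n, n, Ap, Ai, Ax, &Symbolic, 0, 0);
      umfpack_di_numeric(Ap, Ai, Ax, Symbolic, &Numeric, 0, 0);
      umfpack_di_free_symbolic(&Symbolic);
      // CSR read as CSC is the transpose, hence UMFPACK_Aat.
      umfpack_di_solve(UMFPACK_Aat, Ap, Ai, Ax, x, b, Numeric, 0, 0);
      umfpack_di_free_numeric(&Numeric);
    }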

It is my observation that beating PETSc or Trilinos at simple things is not
that hard, but there is a lot of expert knowledge built into these systems
over many years that adds robustness and safety and at least decent
performance across a wide range of operations.  Newer packages targeting a
specific research idea (e.g. template metaprogramming) rather than serving
the scientific computing world may or may not have this extra robustness
built in yet.

MTL4 is cutting edge in its own right. Whether it is mature enough, I
can't tell for sure. It feels very solid to work with, though.


I looked at what I recall being MTL2 around the time the uBLAS backend was implemented. The issues at the time were typical for research projects: maintenance, continuity and completeness. Now that we have some solid linear algebra backends (PETSc and uBLAS) and we've cleaned up the interface, we can afford to experiment with additional backends.


Did you use DOLFIN::Assembler + MTL, or did you write another assembler for testing?

The memory use looks strange to me. Did you perform an LU solve for the uBLAS case? The sparsity pattern doesn't use much memory (just integers). Can you check the memory use with PETSc?

Garth

Thanks for your input!
/Dag

Rob

On Wed, Jul 16, 2008 at 9:47 AM, <kent-and@xxxxxxxxx> wrote:

Sounds amazing!

I'd like to see that code, although I cannot promise you much response
during my holiday, which starts tomorrow.

Have you compared matrix-vector products against uBLAS or PETSc?
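
For instance, with something like this loop (sketched with MTL4 types and
the laplacian_setup helper from the MTL4 examples; the uBLAS analogue is
the same loop with ublas types, and PETSc would use MatMult):

    #include <boost/numeric/mtl/mtl.hpp>
    #include <ctime>
    #include <iostream>

    int main()
    {
      typedef mtl::compressed2D<double> Matrix;
      Matrix A(250000, 250000);
      mtl::matrix::laplacian_setup(A, 500, 500); // 5-point test matrix

      mtl::dense_vector<double> x(mtl::num_cols(A), 1.0), y(mtl::num_rows(A));

      std::clock_t t0 = std::clock();
      for (int r = 0; r < 100; ++r)
        y = A * x;                               // sparse matvec
      std::cout << "100 matvecs: "
                << double(std::clock() - t0) / CLOCKS_PER_SEC << " s\n";
      return 0;
    }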

Kent


Hello!

In light of the long and interesting discussion we had a while ago about
assembler performance, I decided to try to squeeze more out of the uBLAS
backend. This was not very successful.

However, I've been following the development of MTL4
(http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the
interesting insertion scheme they provide. I implemented a backend --
without sparsity pattern computation -- for the dolfin assembler, and here
are some first benchmark results:
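
The insertion scheme works roughly like this (following the MTL4
tutorial; exact template parameters and namespaces may differ between
MTL4 snapshots):

    #include <boost/numeric/mtl/mtl.hpp>

    int main()
    {
      typedef mtl::compressed2D<double> Matrix;
      Matrix A(10, 10);
      {
        // The inserter buffers entries and compresses the matrix when
        // it goes out of scope. update_plus accumulates repeated
        // entries, which is what element-by-element assembly needs.
        // The slot size (5 here) is a per-row buffer hint, not a limit,
        // so no sparsity pattern has to be computed up front.
        mtl::matrix::inserter<Matrix, mtl::update_plus<double> > ins(A, 5);
        ins[0][0] << 2.0;
        ins[0][0] << 1.0;  // accumulated: A(0,0) == 3.0
      } // A is compressed and ready for use from here on
      return 0;
    }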

Incompressible Navier-Stokes on a 50x50x50 unit cube

MTL --------------------------------------------------------
assembly time: 8.510000
reassembly time: 6.750000
vector assembly time: 6.070000

memory: 230 MB

UBLAS ------------------------------------------------------
assembly time: 23.030000
reassembly time: 12.140000
vector assembly time: 6.030000

memory: 642 MB

Poisson on a 2000x2000 unit square

MTL --------------------------------------------------------
assembly time: 9.520000
reassembly time: 6.650000
vector assembly time: 4.730000
linear solve: 0.000000

memory: 452 MB

UBLAS ------------------------------------------------------
assembly time: 15.400000
reassembly time: 7.520000
vector assembly time: 5.020000

memory: 1169 MB

Conclusions? MTL is more than twice as fast and allocates less than half
the memory (since there is no sparsity pattern computation) across a set
of forms I've tested.

The code is not perfectly done yet, but I'd still be happy to share it
with whoever wants to mess around with it.

Cheers!

/Dag

_______________________________________________
DOLFIN-dev mailing list
DOLFIN-dev@xxxxxxxxxx
http://www.fenics.org/mailman/listinfo/dolfin-dev
