
dolfin team mailing list archive

Re: MTL4 backend: Significant performance results

 

Good news, but remember there is a tradeoff here -- MTL is designed by some
of the best generic programming folks around, and I am not surprised that
these results are great.  Remember, first, that they are probably only
supporting serial computing.  PETSc has to keep track of a bunch of state
that enables parallel computing but may affect serial performance.

Also, there are issues besides simple assembly and matrix-vector products.
Many algebraic preconditioners need different kinds of queries that may be
optimized to different degrees in different packages (e.g. extracting the
diagonal).  While assembly and matvec are probably the two most crucial
benchmarks, it would be interesting to design a more robust set of
benchmarks that would test some of these other features as well.  Not
working primarily in preconditioners, I'm not sure what should go in, but it
should be bigger rather than smaller (e.g. doing SOR, SSOR, ILU, pivoting
for LU, etc.).

It is my observation that beating PETSc or Trilinos at simple things is not
that hard, but there is a lot of expert knowledge built into these systems
over many years that adds robustness, safety, and at least decent
performance across a wide range of operations.  Newer packages targeting a
specific research idea (e.g. template metaprogramming) rather than serving
the scientific computing world at large may or may not have this extra
robustness built in yet.

Rob

On Wed, Jul 16, 2008 at 9:47 AM, <kent-and@xxxxxxxxx> wrote:

>
> Sounds amazing!
>
> I'd like to see that code, although I cannot promise you too
> much response during my holiday, which is starting tomorrow.
>
> Have you compared the matrix-vector product with the matrix-vector
> products in uBLAS or PETSc?
>
> Kent
>
>
> > Hello!
> >
> > In light of the long and interesting discussion we had a while ago about
> > assembler performance I decided to try to squeeze more out of the uBlas
> > backend. This was not very successful.
> >
> > However, I've been following the development of MTL4
> > (http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the
> > interesting insertion scheme they provide. I implemented a backend --
> > without sparsity pattern computation -- for the dolfin assembler and here
> > are some first benchmark results:
> >
> > Incomp Navier Stokes on 50x50x50 unit cube
> >
> > MTL --------------------------------------------------------
> > assembly time: 8.510000
> > reassembly time: 6.750000
> > vector assembly time: 6.070000
> >
> > memory: 230 mb
> >
> > UBLAS ------------------------------------------------------
> > assembly time: 23.030000
> > reassembly time: 12.140000
> > vector assembly time: 6.030000
> >
> > memory: 642 mb
> >
> > Poisson on 2000x2000 unit square
> >
> > MTL --------------------------------------------------------
> > assembly time: 9.520000
> > reassembly time: 6.650000
> > vector assembly time: 4.730000
> > linear solve time: 0.000000
> >
> > memory: 452 mb
> >
> > UBLAS ------------------------------------------------------
> > assembly time: 15.400000
> > reassembly time: 7.520000
> > vector assembly time: 5.020000
> >
> > memory: 1169 mb
> >
> > Conclusions? MTL is more than twice as fast and allocates less than half
> > the memory (since there is no sparsity pattern computation) across a set
> > of forms I've tested.
> >
> > The code is not perfectly done yet, but I'd still be happy to share it
> > with whoever wants to mess around with it.
> >
> > Cheers!
> >
> > /Dag
> >
> > _______________________________________________
> > DOLFIN-dev mailing list
> > DOLFIN-dev@xxxxxxxxxx
> > http://www.fenics.org/mailman/listinfo/dolfin-dev
> >
>
>
