dolfin team mailing list archive

Re: MTL4 backend: Significant performance results

To: Robert Kirby <robert.c.kirby@xxxxxxxxx>
From: Dag Lindbo <dag@xxxxxxxxxx>
Date: Wed, 16 Jul 2008 18:40:14 +0200
Cc: dolfin-dev@xxxxxxxxxx
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <b376f5650807160858y244c95bes87be97a8d3328c6@mail.gmail.com>
User-agent: Thunderbird 2.0.0.14 (X11/20080505)

Robert Kirby wrote:
> Good news, but remember there is a tradeoff here: MTL is designed by some
> of the best generic programming folks around, and I am not surprised that
> these results are great.  Remember, first, that they are probably only
> supporting serial computing.  PETSc has to keep track of a bunch of stuff that
> enables parallel computing but may affect serial performance.

True. PETSc provides bindings to some very fancy external packages as
well (BoomerAMG etc), which is a huge benefit.

My intentions with an MTL backend are:
*) Explore the efficiency of the dolfin assembler and see how far we can
push it
*) Possibly compete with uBLAS in the serial backend domain
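
As background for the first point, the step being benchmarked is the accumulation of small element matrices into one global sparse matrix. The following is only a minimal sketch of doing that without a precomputed sparsity pattern. It is neither the MTL4 inserter nor DOLFIN's assembler; the names `assemble`, `to_csr` and `p1_stiffness_1d` are hypothetical, and a 1D two-node element stands in for a real form.

```python
from collections import defaultdict


def assemble(n_dofs, cells, element_matrix):
    """Accumulate element matrices row by row, creating entries on first touch."""
    rows = [defaultdict(float) for _ in range(n_dofs)]
    for dofs in cells:
        Ae = element_matrix(dofs)
        for a, i in enumerate(dofs):
            for b, j in enumerate(dofs):
                rows[i][j] += Ae[a][b]  # update-plus: add to any existing value
    return rows


def to_csr(rows):
    """Compress the per-row dictionaries into CSR arrays with sorted columns."""
    indptr, indices, data = [0], [], []
    for row in rows:
        for j in sorted(row):
            indices.append(j)
            data.append(row[j])
        indptr.append(len(indices))
    return indptr, indices, data


def p1_stiffness_1d(h):
    """Element matrix of a 1D linear element of length h (illustrative only)."""
    def element_matrix(dofs):
        return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    return element_matrix


n = 5
cells = [(i, i + 1) for i in range(n - 1)]
indptr, indices, data = to_csr(assemble(n, cells, p1_stiffness_1d(0.25)))
```

The trade-off in this sketch is that nothing is known about the matrix structure up front, so storage grows as entries arrive and is compressed once at the end.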

> 
> Also, there are issues besides simple assembly and matrix-vector products.  Many
> algebraic preconditioners need different kinds of queries that may be
> optimized to different degrees in different packages (e.g. extract the
> diagonal).  While assembly and matvec are probably the two most crucial
> benchmarks, it would be interesting to design a more robust set of
> benchmarks that would test some of these other features as well.  Not
> working primarily in preconditioners, I'm not sure what should go in, but it
> should be bigger rather than smaller (e.g. doing SOR, SSOR, ILU, pivoting
> for LU, etc)

MTL4 sits nicely with a related project: ITL (from the same group),
which provides the basic Krylov methods and the preconditioners you
mention. I have yet to benchmark these, but I suspect they will be
pretty solid for serial performance. Bindings to at least one high-end
LU solver will have to be provided, as is the case today for uBLAS (UMFPACK).
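
To make the kind of extra benchmark discussed above concrete, here is a minimal sketch that times two of the queries mentioned: diagonal extraction and one forward SOR sweep on a CSR matrix. It is pure Python on a tridiagonal test matrix and uses neither ITL, uBLAS nor PETSc; the helper names (`csr_diagonal`, `sor_sweep`, `tridiag_csr`, `timed`) are hypothetical.

```python
import time


def csr_diagonal(indptr, indices, data):
    """Return the diagonal of a CSR matrix (0.0 where no entry is stored)."""
    n = len(indptr) - 1
    diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
                break
    return diag


def sor_sweep(indptr, indices, data, x, b, omega=1.0):
    """One in-place forward SOR sweep (omega = 1 gives Gauss-Seidel)."""
    for i in range(len(indptr) - 1):
        sigma, aii = 0.0, 0.0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                aii = data[k]
            else:
                sigma += data[k] * x[j]
        x[i] += omega * ((b[i] - sigma) / aii - x[i])
    return x


def tridiag_csr(n):
    """CSR arrays for the n x n matrix tridiag(-1, 2, -1)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


A = tridiag_csr(1000)
diag, t_diag = timed(csr_diagonal, *A)
x, t_sor = timed(sor_sweep, *A, [0.0] * 1000, [1.0] * 1000)
```

A real comparison would run the same queries through each backend's own interface on the assembled matrices; the point here is only which operations get a timer.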

> 
> It is my observation that beating PETSc or Trilinos at simple things is not
> that hard, but there is a lot of expert knowledge built into these systems
> over many years that adds robustness and safety and at least decent
> performance across a wide range of operations.  Newer packages targeting a
> specific research idea (e.g. template metaprogramming) rather than servicing
> the scientific computing world may or may not have this extra robustness
> built in yet.

MTL4 is cutting edge in its own right. Whether it is mature enough, I
can't tell for sure. It feels very solid to work with though.

Thanks for your input!
/Dag

> 
> Rob
> 
> On Wed, Jul 16, 2008 at 9:47 AM, <kent-and@xxxxxxxxx> wrote:
> 
>> Sounds amazing!
>>
>> I'd like to see that code, although I cannot promise you too
>> much response during my holiday, which is starting tomorrow.
>>
>> Have you compared the matrix-vector product with the matrix-vector
>> products of uBLAS or PETSc?
>>
>> Kent
>>
>>
>>> Hello!
>>>
>>> In light of the long and interesting discussion we had a while ago about
>>> assembler performance I decided to try to squeeze more out of the uBLAS
>>> backend. This was not very successful.
>>>
>>> However, I've been following the development of MTL4
>>> (http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the
>>> interesting insertion scheme they provide. I implemented a backend --
>>> without sparsity pattern computation -- for the dolfin assembler and here
>>> are some first benchmark results:
>>>
>>> Incompressible Navier-Stokes on 50x50x50 unit cube
>>>
>>> MTL --------------------------------------------------------
>>> assembly time: 8.510000
>>> reassembly time: 6.750000
>>> vector assembly time: 6.070000
>>>
>>> memory: 230 MB
>>>
>>> UBLAS ------------------------------------------------------
>>> assembly time: 23.030000
>>> reassembly time: 12.140000
>>> vector assembly time: 6.030000
>>>
>>> memory: 642 MB
>>>
>>> Poisson on 2000x2000 unit square
>>>
>>> MTL --------------------------------------------------------
>>> assembly time: 9.520000
>>> reassembly time: 6.650000
>>> vector assembly time: 4.730000
>>> linear solve: 0.000000
>>>
>>> memory: 452 MB
>>>
>>> UBLAS ------------------------------------------------------
>>> assembly time: 15.400000
>>> reassembly time: 7.520000
>>> vector assembly time: 5.020000
>>>
>>> memory: 1169 MB
>>>
>>> Conclusions? MTL is more than twice as fast and allocates less than half
>>> the memory (since there is no sparsity pattern computation) across a set
>>> of forms I've tested.
>>>
>>> The code is not perfectly done yet, but I'd still be happy to share it
>>> with whoever wants to mess around with it.
>>>
>>> Cheers!
>>>
>>> /Dag
>>>
>>> _______________________________________________
>>> DOLFIN-dev mailing list
>>> DOLFIN-dev@xxxxxxxxxx
>>> http://www.fenics.org/mailman/listinfo/dolfin-dev
>>>
>>
>
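
For reference, the uBLAS/MTL ratios implied by the figures quoted above can be worked out as follows. This is a minimal sketch: the numbers are copied from the quoted benchmark output, the dictionary layout and the `ratios` helper are hypothetical, and 4.73 is taken to be the vector assembly time of the MTL Poisson run (the third timing line of that block).

```python
# (MTL, uBLAS) pairs copied from the quoted benchmark output.
# Times are as printed by the benchmark; memory is in MB.
timings = {
    "Navier-Stokes 50x50x50": {
        "assembly": (8.51, 23.03),
        "reassembly": (6.75, 12.14),
        "vector assembly": (6.07, 6.03),
        "memory": (230, 642),
    },
    "Poisson 2000x2000": {
        "assembly": (9.52, 15.40),
        "reassembly": (6.65, 7.52),
        "vector assembly": (4.73, 5.02),  # assumed label, see the note above
        "memory": (452, 1169),
    },
}


def ratios(table):
    """Return uBLAS / MTL for every measurement (above 1 means MTL is lower)."""
    return {
        problem: {name: ublas / mtl for name, (mtl, ublas) in rows.items()}
        for problem, rows in table.items()
    }


r = ratios(timings)
for problem, rows in r.items():
    for name, value in rows.items():
        print(f"{problem:24s} {name:16s} {value:5.2f}x")
```

On these two problems the first-time matrix assembly ratio comes out at roughly 2.7 (Navier-Stokes) and 1.6 (Poisson), and the memory ratio at roughly 2.8 and 2.6.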
