dolfin team mailing list archive

Re: MTL4 backend: Significant performance results

> Very nice!
>

Thanks :)

> Some comments:
>
> 1. Beating uBLAS by a factor 3 is not that hard. Didem Unat (PhD
> student at UCSD/Simula) and Ilmar have been looking at the assembly in
> DOLFIN recently. We've done some initial benchmarks and have started
> investigating how to speedup the assembly. Take a look at what happens
> when we assemble into uBLAS:
>
>   (i)   Compute sparsity pattern
>   (ii)  Reset tensor
>   (iii) Assemble
>
> For uBLAS, each of these steps is approximately an assembly process.
> I don't remember the exact numbers, but by just using an
> std::vector<std::map<int, double> > instead of a uBLAS matrix, one may
> skip (i) and (ii) and get a speedup.
>

I think this simplifies it too much. uBLAS has a matrix type called
"generalized vector of vectors" (gvov) that one can assemble into without
(i) and (ii), but then one has to copy the whole matrix to a compressed
row-major format afterwards. Artefacts of this pre-sparsity-pattern
"assembly matrix" can still be found in DOLFIN.

That MTL can take you from a fresh bilinear form to a matrix ripe for
Krylov iteration three times faster is, in my opinion, impressive.

> We've just started and don't have anything to present yet.
>
> 2. I've also looked at MTL before. We even considered using it as the
> main LA backend a (long) while back.
>
> 3. With the new LA interfaces in place, I wouldn't mind having MTL as
> an optional backend.
>
> --
> Anders
>
>
> On Tue, Jul 15, 2008 at 11:58:05PM +0200, Dag Lindbo wrote:
>> Hello!
>>
>> In light of the long and interesting discussion we had a while ago about
>> assembler performance, I decided to try to squeeze more out of the uBLAS
>> backend. This was not very successful.
>>
>> However, I've been following the development of MTL4
>> (http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the
>> interesting insertion scheme they provide. I implemented a backend --
>> without sparsity pattern computation -- for the DOLFIN assembler, and
>> here are some first benchmark results:
>>
>> Incomp Navier Stokes on 50x50x50 unit cube
>>
>> MTL --------------------------------------------------------
>> assembly time: 8.510000
>> reassembly time: 6.750000
>> vector assembly time: 6.070000
>>
>> memory: 230 mb
>>
>> UBLAS ------------------------------------------------------
>> assembly time: 23.030000
>> reassembly time: 12.140000
>> vector assembly time: 6.030000
>>
>> memory: 642 mb
>>
>> Poisson on 2000x2000 unit square
>>
>> MTL --------------------------------------------------------
>> assembly time: 9.520000
>> reassembly time: 6.650000
>> vector assembly time: 4.730000
>> linear solve: 0.000000
>>
>> memory: 452 mb
>>
>> UBLAS ------------------------------------------------------
>> assembly time: 15.400000
>> reassembly time: 7.520000
>> vector assembly time: 5.020000
>>
>> memory: 1169 mb
>>
>> Conclusions? MTL is more than twice as fast and allocates less than half
>> the memory (since there is no sparsity pattern computation) across a set
>> of forms I've tested.
>>
>> The code is not perfectly done yet, but I'd still be happy to share it
>> with whoever wants to mess around with it.
>>
>> Cheers!
>>
>> /Dag
>>
>> _______________________________________________
>> DOLFIN-dev mailing list
>> DOLFIN-dev@xxxxxxxxxx
>> http://www.fenics.org/mailman/listinfo/dolfin-dev