Dag Lindbo wrote:
kent-and@xxxxxxxxx wrote:
> Sounds amazing! I'd like to see that code, although I cannot promise you
> too much response during my holiday, which starts tomorrow. Have you
> compared matrix-vector products with vector products using uBlas or PETSc?

Will do. More good news: after some discussion about insertion operations on the MTL list, Peter Gottschling (the lead developer of MTL4) implemented some optimizations based on some other benchmarks I ran. For insertion into very sparse matrices (like Poisson) I got a further 30% speedup. I could put the MTL4 experimental stuff in the sandbox. Does that sound good? I'm going on vacation too.
Sounds good. Send an hg bundle and any special instructions for building.

Garth
/Dag

Kent

Hello!

In light of the long and interesting discussion we had a while ago about assembler performance, I decided to try to squeeze more out of the uBlas backend. This was not very successful. However, I've been following the development of MTL4 (http://www.osl.iu.edu/research/mtl/mtl4/) with a keen eye on the interesting insertion scheme it provides. I implemented a backend -- without sparsity pattern computation -- for the dolfin assembler, and here are some first benchmark results:

Incompressible Navier-Stokes on 50x50x50 unit cube

MTL
--------------------------------------------------------
assembly time:        8.510000
reassembly time:      6.750000
vector assembly time: 6.070000
memory:               230 mb

UBLAS
------------------------------------------------------
assembly time:        23.030000
reassembly time:      12.140000
vector assembly time: 6.030000
memory:               642 mb

Poisson on 2000x2000 unit square

MTL
--------------------------------------------------------
assembly time:        9.520000
reassembly time:      6.650000
vector assembly time: 4.730000
linear solve:         0.000000
memory:               452 mb

UBLAS
------------------------------------------------------
assembly time:        15.400000
reassembly time:      7.520000
vector assembly time: 5.020000
memory:               1169 mb

Conclusions? MTL is more than twice as fast and allocates less than half the memory (since there is no sparsity pattern computation) across the set of forms I've tested. The code is not perfectly done yet, but I'd still be happy to share it with whoever wants to mess around with it.

Cheers!
/Dag

_______________________________________________
DOLFIN-dev mailing list
DOLFIN-dev@xxxxxxxxxx
http://www.fenics.org/mailman/listinfo/dolfin-dev