On Wed, Jul 16, 2008 at 09:48:16PM +0100, Garth N. Wells wrote:
Anders Logg wrote:
Very nice!
Some comments:
1. Beating uBLAS by a factor of 3 is not that hard.
A factor of 3 is quite something if the matrix is in compressed row format.
DOLFIN assembly times into uBLAS, PETSc and Trilinos matrices are all
very close to each other.
Didem Unat (PhD
student at UCSD/Simula) and Ilmar have been looking at the assembly in
DOLFIN recently. We've done some initial benchmarks and have started
investigating how to speed up the assembly. Take a look at what happens
when we assemble into uBLAS:
(i) Compute sparsity pattern
(ii) Reset tensor
(iii) Assemble
For uBLAS, each of these steps costs roughly as much as the assembly itself.
I don't remember the exact numbers, but by just using an
std::vector<std::map<int, double> > instead of a uBLAS matrix, one may
skip (i) and (ii) and get a speedup.
You can do this with uBLAS too by using the uBLAS mapped_matrix (uses
std::map internally) instead of compressed_matrix. The problem is that
it is dead slow for matrix-vector multiplications. Most uBLAS sparse
matrix types are faster to assemble than the compressed_matrix, but are
slower to traverse.
Before the computation of the sparsity pattern was implemented, DOLFIN
assembled into a uBLAS vector of compressed vectors, since it is quite
fast to assemble into without preallocation and can be converted quite
quickly to a compressed row matrix. This approach may still have merit
for some problems.
Garth
I think eventually we should assemble into a special-purpose data
structure that is fast for assembly and then convert row-wise (which is
fast) into something suitable for linear algebra.