
dolfin team mailing list archive

Re: Fwd: Assembly benchmark

 



Dag Lindbo wrote:
Anders Logg wrote:
On Mon, Aug 11, 2008 at 10:32:23AM +0200, Dag Lindbo wrote:
Hello! (I've been out of e-mail range for a week)

I've posted some results on the wiki (and I actually have MTL4 :-) )
where I just show total assembly times with and without optimization
turned on. MTL4 responds quite well to -O3 (unlike uBLAS).

As far as the MTL4 backend goes, I have implemented and tested almost
all the operators and member functions. We are running into some quirks
with namespaces in MTL4 which Peter G is going to sort out. Also, there
is a bug somewhere in the version of the backend currently in place --
so please don't use it without manually initializing matrix dimensions.

I will sort this out in a day or two and then have a bundle ready that
finalizes the MTL4 backend. The performance results have been
encouraging enough to keep me (and Garth) motivated to get this backend
ready for "real" use.

/Dag
Good. I've updated my results (this time with MTL4 actually installed)
and MTL4 does very well.

The problem is it's cheating. Other backends would also do much better
if they were allowed to guess the number of nonzeros and guess right.


Of course, PETSc should be just as fast as MTL4. I don't think uBLAS
could be. As far as Epetra goes, I don't understand its insertion
procedure. The STL backend is cheating as well, since the matrix is not
(as far as I know) left in a state suitable for a linear solve.

In my opinion it really is necessary to give the user the option of
specifying the number of nonzeros per row. Many forms get assembled a
_lot_ (e.g. the ICNS-3D form used by Hoffman et al. has not changed in
years) and it seems absurd that DOLFIN should insist on recomputing the
information that is already known by the user.


The same form is not always assembled on the same mesh, so the number of non-zeroes cannot be specified from the form alone. The best approach is likely to let FFC create a function that computes the maximum number of non-zeroes per row from mesh information. Apparently the necessary function already exists in UFC. Bear in mind that for most computationally intensive applications the cost of computing the sparsity pattern will be insignificant compared to the repeated reassembly of the matrix (not that this means we shouldn't try to make it better, but it should remain generic).

What you suggest with specifying the non-zeroes can be done within the existing framework. A "sparsity pattern" could just constitute the maximum number of non-zeroes on a row, and the matrix class can then initialise the matrix as it wishes. It's also possible to access the underlying matrix object and initialise it.
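
To illustrate what such a minimal "sparsity pattern" might hold, here is a small sketch that is not taken from the thread. It assumes the mesh information is available as a plain list of degrees of freedom per cell, and that every cell couples all of its degrees of freedom with each other. The names `row_nonzeros` and `max_row_nonzeros` are hypothetical and are not DOLFIN or UFC functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: count the distinct column indices touched in each row,
// assuming every cell couples all of its degrees of freedom with each other.
std::vector<std::size_t> row_nonzeros(
    std::size_t num_dofs,
    const std::vector<std::vector<std::size_t>>& cell_dofs)
{
  std::vector<std::set<std::size_t>> columns(num_dofs);
  for (const auto& dofs : cell_dofs)
  {
    for (std::size_t i : dofs)
    {
      assert(i < num_dofs);  // every dof index must be a valid row
      for (std::size_t j : dofs)
        columns[i].insert(j);  // the set discards repeated (i, j) pairs
    }
  }

  std::vector<std::size_t> counts(num_dofs);
  for (std::size_t row = 0; row < num_dofs; ++row)
    counts[row] = columns[row].size();
  return counts;
}

// Hypothetical helper: the single number a backend could use for a
// "maximum non-zeroes per row" style of initialisation.
std::size_t max_row_nonzeros(const std::vector<std::size_t>& counts)
{
  return counts.empty() ? 0 : *std::max_element(counts.begin(), counts.end());
}
```

For a 1D mesh of three linear elements with cell dofs {0,1}, {1,2}, {2,3}, the interior rows couple to three columns and the boundary rows to two, so the maximum per row is three. Using `std::set` keeps the sketch short; it is not meant to be a fast way to build the pattern.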

I don't think a function

  GenericMatrix::init(uint m, uint n, uint nz_max)

is suitable because it's not appropriate for all linear algebra backends (e.g., uBLAS).
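
One way to picture the alternative is an interface where every backend receives the same pattern object and decides for itself how much of it to use. The following is only a sketch of that idea, not DOLFIN code: `PatternSketch`, `MatrixSketch`, `FixedWidthMatrix` and `PerRowMatrix` are hypothetical names, and the storage layouts are deliberately simplistic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pattern type: only the per-row non-zero counts are stored.
struct PatternSketch
{
  std::size_t rows;
  std::size_t cols;
  std::vector<std::size_t> row_nonzeros;  // one entry per row

  std::size_t max_row_nonzeros() const
  {
    return row_nonzeros.empty()
               ? 0
               : *std::max_element(row_nonzeros.begin(), row_nonzeros.end());
  }
};

// Hypothetical backend interface: each backend interprets the pattern itself.
class MatrixSketch
{
public:
  virtual ~MatrixSketch() = default;
  virtual void init(const PatternSketch& pattern) = 0;
  virtual std::size_t allocated_entries() const = 0;
};

// A backend that only needs one number: the widest row.
class FixedWidthMatrix : public MatrixSketch
{
public:
  void init(const PatternSketch& p) override
  {
    assert(p.row_nonzeros.size() == p.rows);
    values.assign(p.rows * p.max_row_nonzeros(), 0.0);
  }
  std::size_t allocated_entries() const override { return values.size(); }

private:
  std::vector<double> values;
};

// A backend that allocates exactly what each row needs.
class PerRowMatrix : public MatrixSketch
{
public:
  void init(const PatternSketch& p) override
  {
    assert(p.row_nonzeros.size() == p.rows);
    row_values.assign(p.rows, std::vector<double>());
    for (std::size_t i = 0; i < p.rows; ++i)
      row_values[i].assign(p.row_nonzeros[i], 0.0);
  }
  std::size_t allocated_entries() const override
  {
    std::size_t total = 0;
    for (const auto& row : row_values)
      total += row.size();
    return total;
  }

private:
  std::vector<std::vector<double>> row_values;
};
```

With row counts {2, 3, 3, 2}, the fixed-width backend allocates 4 x 3 = 12 entries while the per-row backend allocates 10. The assembler only ever calls `init(pattern)`, so a backend for which a single `nz_max` is not meaningful is free to ignore it.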

Garth

There has been a lot of discussion of assembler performance on the list
and so I made the MTL4 backend as a demonstration of how fast the core
DOLFIN/FFC/UFC assembler design could be. I intend to keep trying to
squeeze more performance out of DOLFIN and you may then decide if my
suggestions are worthwhile or "cheating".

/Dag


------------------------------------------------------------------------

_______________________________________________
DOLFIN-dev mailing list
DOLFIN-dev@xxxxxxxxxx
http://www.fenics.org/mailman/listinfo/dolfin-dev



