dolfin team mailing list archive
Message #08829
Re: [HG DOLFIN] Bug fix in assembly benchmark (don't trust values in previous changeset)
On Tue, Jul 22, 2008 at 11:41:26PM +0100, Garth N. Wells wrote:
>
>
> DOLFIN wrote:
> > One or more new changesets pushed to the primary dolfin repository.
> > A short summary of the last three changesets is included below.
> >
> > changeset: 4491:cb0fdfa3514ab65e67b2f922cf482b4f2aa008eb
> > tag: tip
> > user: Anders Logg <logg@xxxxxxxxx>
> > date: Tue Jul 22 23:44:14 2008 +0200
> > files: bench/fem/assembly/cpp/main.cpp
> > description:
> > Bug fix in assembly benchmark (don't trust values in previous changeset)
> > and add reassembly benchmark. New preliminary results:
> >
> > Assemble | Poisson2DP1 Poisson2DP2 Poisson2DP3 THStokes2D StabStokes2D Elasticity3D NSEMomentum3D
> > ---------------------------------------------------------------------------------------------------------
> > uBLAS | 0.45 3.84 3.77 15.1 3.81 8.8 9.13
> > PETSc | 0.42 3.6 3.56 14.07 3.2 7.6 7.9
> > Epetra | 0.45 3.76 3.76 14.94 3.72 8.71 9.06
> > MTL4 | 0.44 3.75 3.75 14.77 3.73 8.75 9.11
> > Assembly | 0.43 3.78 3.8 14.88 3.36 7.05 7.49
> >
> > Reassemble | Poisson2DP1 Poisson2DP2 Poisson2DP3 THStokes2D StabStokes2D Elasticity3D NSEMomentum3D
> > -----------------------------------------------------------------------------------------------------------
> > uBLAS | 0.2 0.64 0.64 4.37 1.49 4.39 4.74
> > PETSc | 0.19 0.54 0.55 3.08 1.06 3.24 3.55
> > Epetra | 0.2 0.65 0.65 4.41 1.5 4.36 4.71
> > MTL4 | 0.22 0.65 0.64 4.42 1.5 4.38 4.73
> > Assembly | 0.17 0.53 0.53 2.92 0.89 2.36 2.73
> >
> > From these results, it looks like the AssemblyMatrix backend is the fastest
> > but there may be bugs etc.
> >
>
> I'm getting quite different results.
Strange that we are getting such different results. Did you use any
particular compiler options?
> Assemble | Poisson2DP1 Poisson2DP2 Poisson2DP3 THStokes2D StabStokes2D Elasticity3D NSEMomentum3D
> ---------------------------------------------------------------------------------------------------------
> uBLAS | 0.34 2.94 2.88 11.49 2.86 6.67 6.98
> PETSc | 0.31 2.7 2.71 10.24 2.44 5.72 5.94
> Epetra | 0.35 2.41 2.39 7.22 2.14 10.88 10.98
> MTL4 | 0.2 1.78 1.79 2.97 0.83 1.99 2.32
> Assembly | 0.31 2.85 2.89 11.26 2.57 5.46 5.77
>
> Reassemble | Poisson2DP1 Poisson2DP2 Poisson2DP3 THStokes2D StabStokes2D Elasticity3D NSEMomentum3D
> -----------------------------------------------------------------------------------------------------------
> uBLAS | 0.14 0.47 0.47 3.22 1.1 3.23 3.46
> PETSc | 0.16 0.42 0.42 2.37 0.81 2.46 2.68
> Epetra | 0.17 0.43 0.43 2.28 0.82 2.2 2.51
> MTL4 | 0.18 0.49 0.48 1.55 0.85 1.73 1.9
> Assembly | 0.12 0.43 0.42 2.32 0.68 1.82 2.1
>
> MTL4 is the fastest, which is due in large part to the fact that the
> MTL4SparsityPattern doesn't do anything. Once we get the sparsity
> pattern sorted out, I expect PETSc to be very close to MTL4.
>
> I don't think that AssemblyMatrix is particularly interesting other than
> for curiosity because it's not good for linear algebra.
I still think it's interesting since it allows us to experiment with
new special-purpose backends for assembly, followed by a conversion to
one of the other formats (like for uBLAS before). The current
implementation as vector<map<uint, real>> is just an example.
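To make the vector<map<uint, real>> idea concrete, here is a minimal sketch of an assembly-only matrix with a conversion step to compressed row storage. It is not the actual AssemblyMatrix code: the class name MapMatrix, the methods add() and to_csr(), and the typedefs Index and Real (standing in for DOLFIN's uint and real) are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Stand-ins for DOLFIN's uint and real (assumed here, not taken from DOLFIN).
typedef unsigned int Index;
typedef double Real;

// Hypothetical assembly-only matrix: one ordered map (column -> value) per row.
class MapMatrix
{
public:
  explicit MapMatrix(Index num_rows) : rows(num_rows) {}

  // Add a dense m x n element block (row-major) into the global rows r[] and
  // columns c[]. operator[] creates a missing entry as 0.0 before the +=.
  void add(const Real* block, Index m, const Index* r, Index n, const Index* c)
  {
    for (Index i = 0; i < m; ++i)
    {
      assert(r[i] < rows.size());
      for (Index j = 0; j < n; ++j)
        rows[r[i]][c[j]] += block[i*n + j];
    }
  }

  // Convert to compressed row storage. std::map iterates in key order, so the
  // column indices of each row come out sorted.
  void to_csr(std::vector<Index>& row_ptr,
              std::vector<Index>& cols,
              std::vector<Real>& vals) const
  {
    row_ptr.assign(1, 0);
    cols.clear();
    vals.clear();
    for (std::size_t i = 0; i < rows.size(); ++i)
    {
      std::map<Index, Real>::const_iterator it;
      for (it = rows[i].begin(); it != rows[i].end(); ++it)
      {
        cols.push_back(it->first);
        vals.push_back(it->second);
      }
      row_ptr.push_back(static_cast<Index>(cols.size()));
    }
  }

private:
  std::vector<std::map<Index, Real> > rows;
};
```

In this sketch no sparsity pattern is needed before assembly, since each row grows on demand. The price is a tree lookup per inserted entry, plus the conversion pass before the matrix can be used for linear algebra.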
> For Stokes + Taylor-Hood, most of the time is in the generation of the
> sparsity pattern. I tested PETSc and MTL4 earlier today for Taylor-Hood
> by not generating the vector-of-a-set in SparsityPattern and just
> prescribing the maximum number of non-zeroes per row. The assembly was
> much faster, and the difference between PETSc and MTL4 was very small.
>
> I also made a modification of SparsityPattern to work with a 'homemade'
> unsorted set using a vector of vectors. It's a lot faster than using
> std::set in SparsityPattern and can return the number of non-zeroes per
> row. However, it isn't ordered for each row so it's not very useful for
> initialising sparse uBLAS matrices by filling the matrix in order. What
> I'll do is implement it, so PETSc and MTL4, and probably Epetra, will be
> considerably faster. For uBLAS, I'll revert to the old strategy of
> assembling into a fast to assemble matrix and converting that to
> compressed row when apply() is called.
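As a rough illustration of the unsorted vector-of-vectors pattern described in the quoted paragraph above, the sketch below stores the column indices of each row in insertion order and reports the number of non-zeroes per row. It is only an assumption about how such a structure might look, not the modification Garth made: the class UnsortedPattern, its methods, the Index typedef, and the linear-search duplicate check are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for DOLFIN's uint (assumed here, not taken from DOLFIN).
typedef unsigned int Index;

// Hypothetical unsorted sparsity pattern: one plain vector of columns per row.
class UnsortedPattern
{
public:
  explicit UnsortedPattern(Index num_rows) : rows(num_rows) {}

  // Record entry (i, j). A linear search keeps each row free of duplicates;
  // rows are short for typical finite element patterns, so this stays cheap.
  void insert(Index i, Index j)
  {
    assert(i < rows.size());
    std::vector<Index>& row = rows[i];
    if (std::find(row.begin(), row.end(), j) == row.end())
      row.push_back(j);
  }

  // Number of non-zeroes in each row, e.g. for preallocating a backend matrix.
  std::vector<Index> num_nonzeros() const
  {
    std::vector<Index> nnz(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i)
      nnz[i] = static_cast<Index>(rows[i].size());
    return nnz;
  }

  // Sorted copy of one row, for backends that must be filled in column order.
  std::vector<Index> sorted_row(Index i) const
  {
    assert(i < rows.size());
    std::vector<Index> row(rows[i]);
    std::sort(row.begin(), row.end());
    return row;
  }

private:
  std::vector<std::vector<Index> > rows;
};
```

The per-row counts are all that backends with per-row preallocation need. The sorted_row() helper is included only to show what an ordered fill would additionally require, which is the extra cost the quoted text avoids for uBLAS by assembling into a different matrix and converting afterwards.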
ok, sounds good.
Once we've settled on the assembly benchmark, I'd like to run it on
DOLFIN 0.8.0 for reference, so if you modify what we have now we need
to do some backporting.
--
Anders