dolfin team mailing list archive


Re: Assembly benchmark

To: dolfin-dev@xxxxxxxxxx
From: "Matthew Knepley" <knepley@xxxxxxxxx>
Date: Mon, 21 Jul 2008 17:05:05 -0500
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <20080721214816.GQ4531@simula.no>

On Mon, Jul 21, 2008 at 4:48 PM, Anders Logg <logg@xxxxxxxxx> wrote:
> On Mon, Jul 21, 2008 at 04:37:28PM -0500, Matthew Knepley wrote:
>> On Mon, Jul 21, 2008 at 4:35 PM, Anders Logg <logg@xxxxxxxxx> wrote:
>> > On Mon, Jul 21, 2008 at 04:03:11PM -0500, Matthew Knepley wrote:
>> >> On Mon, Jul 21, 2008 at 3:55 PM, Matthew Knepley <knepley@xxxxxxxxx> wrote:
>> >> > On Mon, Jul 21, 2008 at 3:50 PM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>> >> >>
>> >> >>
>> >> >> Anders Logg wrote:
>> >> >>> On Mon, Jul 21, 2008 at 01:48:23PM +0100, Garth N. Wells wrote:
>> >> >>>>
>> >> >>>> Anders Logg wrote:
>> >> >>>>> I have updated the assembly benchmark to include also MTL4, see
>> >> >>>>>
>> >> >>>>>    bench/fem/assembly/
>> >> >>>>>
>> >> >>>>> Here are the current results:
>> >> >>>>>
>> >> >>>>> Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>> >> >>>>> -------------------------------------------------------------------------------------------------------------
>> >> >>>>> uBLAS               |        9.0789    0.45645     3.8042     8.0736      14.937         9.2507        3.8455
>> >> >>>>> PETSc               |        7.7758    0.42798     3.5483     7.3898      13.945         8.1632         3.258
>> >> >>>>> Epetra              |        8.9516    0.45448     3.7976     8.0679      15.404         9.2341        3.8332
>> >> >>>>> MTL4                |        8.9729    0.45554     3.7966     8.0759       14.94         9.2568        3.8658
>> >> >>>>> Assembly            |         7.474    0.43673     3.7341     8.3793      14.633         7.6695        3.3878
>> >> >>>>>
>> >> >>
>> >> >>
>> >> >> I specified a maximum of 30 nonzeroes per row in MTL4Matrix, and the
>> >> >> results change quite a bit:
>> >> >>
>> >> >>  Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>> >> >> -------------------------------------------------------------------------------------------------------------
>> >> >>  uBLAS               |        7.1881    0.32748     2.7633     5.8311      10.968         7.0735        2.8184
>> >> >>  PETSc               |        5.7868    0.30673     2.5489     5.2344      9.8896          6.069        2.3661
>> >> >>  MTL4                |        2.8641    0.18339     1.6628     2.6811      2.8519         3.4843       0.85029
>> >> >>  Assembly            |        5.5564    0.30896     2.6858     5.9675      10.622         5.7144        2.4519
>> >> >>
>> >> >>
>> >> >> MTL4 is a lot faster in all cases.
>> >>
>> >> Okay, if you run KSP ex2 (Poisson 2D) and add a logging stage that
>> >> times assembly (I checked it in to petsc-dev),
>> >> then 1M unknowns take about 1s:
>> >>
>> >>   Matrix Object:
>> >>     type=seqaij, rows=1000000, cols=1000000
>> >>     total: nonzeros=4996000, allocated nonzeros=5000000
>> >>       not using I-node routines
>> >> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>> >>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>> >>  0:      Main Stage: 1.4997e+00  56.3%  3.8891e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.200e+01  51.2%
>> >>  1:        Assembly: 1.1648e+00  43.7%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>> >>
>> >> I just cut the solve off. Thus all those numbers are extremely fishy.
>> >>
>> >>   Matt
>> >
>> > We shouldn't trust those numbers just yet. Some of it may be Python
>> > overhead (calling the FFC JIT compiler etc).
>> >
>> > Does 1M unknowns mean a unit square divided into 2x1000x1000 right
>> > triangles?
>>
>> It's FD Poisson, which gives the same sparsity and values as P1 Poisson, so
>> it's a 1000x1000 quadrilateral grid. This was just to time insertion.
>>
>>   Matt
>
> But this is a different problem. Since you know the sparsity pattern a
> priori, you may be able to (i) not compute the sparsity pattern, (ii)

No, we only allocate correctly here.

> compute the entries more efficiently, (iii) not compute the
> local-to-global mapping, and (iv) insert the entries more efficiently.

Insertion is the same and we compute the same mapping we always use.
I think you guys overcompute the local-to-global (l2g) mapping.

  Matt

> Our timings include all these steps + Python overhead. I'm going to
> rewrite it in C++ so we can eliminate that source of uncertainty.
>
> --
> Anders
>
>
> _______________________________________________
> DOLFIN-dev mailing list
> DOLFIN-dev@xxxxxxxxxx
> http://www.fenics.org/mailman/listinfo/dolfin-dev
>
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

Follow ups

Re: Assembly benchmark
From: Garth N. Wells, 2008-07-22

References

Assembly benchmark
From: Anders Logg, 2008-07-21
Re: Assembly benchmark
From: Garth N. Wells, 2008-07-21
Re: Assembly benchmark
From: Anders Logg, 2008-07-21
Re: Assembly benchmark
From: Garth N. Wells, 2008-07-21
Re: Assembly benchmark
From: Matthew Knepley, 2008-07-21
Re: Assembly benchmark
From: Matthew Knepley, 2008-07-21
Re: Assembly benchmark
From: Anders Logg, 2008-07-21
Re: Assembly benchmark
From: Matthew Knepley, 2008-07-21
Re: Assembly benchmark
From: Anders Logg, 2008-07-21