dolfin team mailing list archive
Message #08810
Re: Assembly benchmark
On Tue, Jul 22, 2008 at 3:30 AM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>
>
> Matthew Knepley wrote:
>>
>> On Mon, Jul 21, 2008 at 4:48 PM, Anders Logg <logg@xxxxxxxxx> wrote:
>>>
>>> On Mon, Jul 21, 2008 at 04:37:28PM -0500, Matthew Knepley wrote:
>>>>
>>>> On Mon, Jul 21, 2008 at 4:35 PM, Anders Logg <logg@xxxxxxxxx> wrote:
>>>>>
>>>>> On Mon, Jul 21, 2008 at 04:03:11PM -0500, Matthew Knepley wrote:
>>>>>>
>>>>>> On Mon, Jul 21, 2008 at 3:55 PM, Matthew Knepley <knepley@xxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Jul 21, 2008 at 3:50 PM, Garth N. Wells <gnw20@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Anders Logg wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Jul 21, 2008 at 01:48:23PM +0100, Garth N. Wells wrote:
>>>>>>>>>>
>>>>>>>>>> Anders Logg wrote:
>>>>>>>>>>>
>>>>>>>>>>> I have updated the assembly benchmark to also include MTL4; see
>>>>>>>>>>>
>>>>>>>>>>> bench/fem/assembly/
>>>>>>>>>>>
>>>>>>>>>>> Here are the current results:
>>>>>>>>>>>
>>>>>>>>>>> Assembly benchmark | Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>>>>>>>>>>> -----------------------------------------------------------------------------------------------------------
>>>>>>>>>>> uBLAS              | 9.0789        0.45645    3.8042     8.0736     14.937      9.2507         3.8455
>>>>>>>>>>> PETSc              | 7.7758        0.42798    3.5483     7.3898     13.945      8.1632         3.258
>>>>>>>>>>> Epetra             | 8.9516        0.45448    3.7976     8.0679     15.404      9.2341         3.8332
>>>>>>>>>>> MTL4               | 8.9729        0.45554    3.7966     8.0759     14.94       9.2568         3.8658
>>>>>>>>>>> Assembly           | 7.474         0.43673    3.7341     8.3793     14.633      7.6695         3.3878
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> I specified a maximum of 30 nonzeroes per row in MTL4Matrix, and the
>>>>>>>> results change quite a bit:
>>>>>>>>
>>>>>>>> Assembly benchmark | Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>>>>>>>> -----------------------------------------------------------------------------------------------------------
>>>>>>>> uBLAS              | 7.1881        0.32748    2.7633     5.8311     10.968      7.0735         2.8184
>>>>>>>> PETSc              | 5.7868        0.30673    2.5489     5.2344     9.8896      6.069          2.3661
>>>>>>>> MTL4               | 2.8641        0.18339    1.6628     2.6811     2.8519      3.4843         0.85029
>>>>>>>> Assembly           | 5.5564        0.30896    2.6858     5.9675     10.622      5.7144         2.4519
>>>>>>>>
>>>>>>>>
>>>>>>>> MTL4 is a lot faster in all cases.
>>>>>>
>>>>>> Okay, if you run KSP ex2 (Poisson 2D) and add a logging stage that
>>>>>> times assembly (I checked it in to petsc-dev)
>>>>>> then 1M unknowns takes about 1s
>>>>>>
>>>>>> Matrix Object:
>>>>>> type=seqaij, rows=1000000, cols=1000000
>>>>>> total: nonzeros=4996000, allocated nonzeros=5000000
>>>>>> not using I-node routines
>>>>>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>>>>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>>>>>  0:      Main Stage: 1.4997e+00  56.3%  3.8891e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.200e+01  51.2%
>>>>>>  1:        Assembly: 1.1648e+00  43.7%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>>>>
>>>>>> I just cut the solve off. Thus all those numbers are extremely fishy.
>>>>>>
>>>>>> Matt
>>>>>
>>>>> We shouldn't trust those numbers just yet. Some of it may be Python
>>>>> overhead (calling the FFC JIT compiler etc).
>>>>>
>>>>> Does 1M unknowns mean a unit square divided into 2x1000x1000 right
>>>>> triangles?
>>>>
>>>> It's FD Poisson, which gives the same sparsity and values as P1 Poisson,
>>>> so it's a 1000x1000 quadrilateral grid. This was just to time insertion.
>>>>
>>>> Matt
>>>
>>> But this is a different problem. Since you know the sparsity pattern a
>>> priori, you may be able to (i) not compute the sparsity pattern, (ii)
>>
>> No, we only allocate correctly here.
>>
>
> Matt,
>
> Is there much of a performance difference with MatSeqAIJSetPreallocation
> between setting the maximum number of non-zeroes per row (PetscInt nz), and
> setting the number of non-zeroes for each row (PetscInt nnz[]) when the
> number of non-zeroes per row doesn't differ greatly?
There should be no difference at all.
Matt
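
A rough sketch of why the two forms allocate almost the same storage in this
case, assuming the 5-point FD stencil on an n x n grid from the ex2 run quoted
above. This is plain Python, not PETSc code, and the function names are made up
for illustration. It counts the exact nonzeros per row (the kind of array one
would pass as nnz[]) and compares the total with a single bound nz applied to
every row.

```python
def five_point_row_counts(n):
    """Nonzeros per row of the 5-point stencil matrix on an n x n grid.

    Hypothetical helper: rows are grid points in row-major order, and a
    row holds the diagonal plus one entry per neighbour inside the grid.
    """
    counts = []
    for i in range(n):
        for j in range(n):
            nnz = 1                  # diagonal entry
            nnz += int(i > 0)        # neighbour in the previous grid row
            nnz += int(i < n - 1)    # neighbour in the next grid row
            nnz += int(j > 0)        # left neighbour
            nnz += int(j < n - 1)    # right neighbour
            counts.append(nnz)
    return counts


def allocation_summary(n):
    """Return (entries actually used, entries allocated with one uniform nz)."""
    counts = five_point_row_counts(n)
    used = sum(counts)                    # exact per-row allocation (nnz[] style)
    uniform = max(counts) * len(counts)   # one bound for every row (nz style)
    return used, uniform


if __name__ == "__main__":
    used, uniform = allocation_summary(1000)
    print(used, uniform, uniform - used)
```

For n = 1000 this prints 4996000 5000000 4000, which agrees with the
"nonzeros=4996000, allocated nonzeros=5000000" line in the matrix info above:
the uniform bound over-allocates only 4000 entries out of 5 million.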
> Garth
>
>
>>> compute the entries more efficiently, (iii) not compute the
>>> local-to-global mapping, and (iv) insert the entries more efficiently.
>>
>> Insertion is the same and we compute the same mapping we always use.
>> I think you guys overcompute for the l2g.
>>
>> Matt
>>
>>> Our timings include all these steps + Python overhead. I'm going to
>>> rewrite it in C++ so we can eliminate that source of uncertainty.
>>>
>>> --
>>> Anders
>>>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener