dolfin team mailing list archive

Re: Assembly benchmark

 

On Mon, Jul 21, 2008 at 3:55 PM, Matthew Knepley <knepley@xxxxxxxxx> wrote:
> On Mon, Jul 21, 2008 at 3:50 PM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>>
>>
>> Anders Logg wrote:
>>> On Mon, Jul 21, 2008 at 01:48:23PM +0100, Garth N. Wells wrote:
>>>>
>>>> Anders Logg wrote:
>>>>> I have updated the assembly benchmark to include also MTL4, see
>>>>>
>>>>>    bench/fem/assembly/
>>>>>
>>>>> Here are the current results:
>>>>>
>>>>> Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>>>>> -------------------------------------------------------------------------------------------------------------
>>>>> uBLAS               |        9.0789    0.45645     3.8042     8.0736  14.937         9.2507        3.8455
>>>>> PETSc               |        7.7758    0.42798     3.5483     7.3898  13.945         8.1632         3.258
>>>>> Epetra              |        8.9516    0.45448     3.7976     8.0679  15.404         9.2341        3.8332
>>>>> MTL4                |        8.9729    0.45554     3.7966     8.0759  14.94          9.2568        3.8658
>>>>> Assembly            |         7.474    0.43673     3.7341     8.3793  14.633         7.6695        3.3878
>>>>>
>>
>>
>> I specified a maximum of 30 nonzeros per row in MTL4Matrix, and the results
>> change quite a bit:
>>
>>  Assembly benchmark  |  Elasticity3D  PoissonP1  PoissonP2  PoissonP3  THStokes2D  NSEMomentum3D  StabStokes2D
>> -------------------------------------------------------------------------------------------------------------
>>  uBLAS               |        7.1881    0.32748     2.7633     5.8311  10.968         7.0735        2.8184
>>  PETSc               |        5.7868    0.30673     2.5489     5.2344  9.8896          6.069        2.3661
>>  MTL4                |        2.8641    0.18339     1.6628     2.6811  2.8519         3.4843       0.85029
>>  Assembly            |        5.5564    0.30896     2.6858     5.9675  10.622         5.7144        2.4519
>>
>>
>> MTL4 is a lot faster in all cases.
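[Editor's note: for context, here is a minimal standalone sketch of the MTL4 idiom being discussed. The inserter takes a "slot size" hint, i.e. the expected maximum number of nonzeros per row, and insertion is much faster when the hint is large enough. The matrix size and the value 30 below are placeholders, and the exact namespaces of inserter/update_plus may differ between MTL4 versions.]

// Sketch only: insert a dummy diagonal into an MTL4 compressed matrix,
// reserving ~30 slots per row up front.
#include <boost/numeric/mtl/mtl.hpp>

int main()
{
  const std::size_t n = 1000;
  mtl::compressed2D<double> A(n, n);

  {
    // Reserve ~30 entries per row; too small a hint forces slower
    // overflow handling during insertion.
    mtl::matrix::inserter<mtl::compressed2D<double>, mtl::update_plus<double> > ins(A, 30);
    for (std::size_t i = 0; i < n; ++i)
      ins[i][i] << 1.0;   // entries are accumulated because of update_plus
  } // the matrix is compressed when the inserter goes out of scope

  return 0;
}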

Okay, if you run KSP ex2 (Poisson 2D) and add a logging stage that times
assembly (I checked it in to petsc-dev), then assembling 1M unknowns takes
about 1 s:

  Matrix Object:
    type=seqaij, rows=1000000, cols=1000000
    total: nonzeros=4996000, allocated nonzeros=5000000
      not using I-node routines
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.4997e+00  56.3%  3.8891e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.200e+01  51.2%
 1:        Assembly: 1.1648e+00  43.7%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

I just cut the solve off. Thus all those numbers are extremely fishy.

  Matt
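[Editor's note: a rough, self-contained sketch (not Matt's actual change to ex2) of the combination being discussed: preallocate a seqaij matrix for the 2D Poisson stencil and wrap the insertion loop in its own logging stage, so that -log_summary (or -log_view in newer PETSc) reports assembly time separately from the solve. Mesh size and error checking are omitted, and the call signatures follow recent PETSc, so they may differ slightly from the 2008 API.]

// Sketch only: preallocated assembly of the 2D Laplacian, timed in its own stage.
#include <petscmat.h>

int main(int argc, char** argv)
{
  PetscInitialize(&argc, &argv, NULL, NULL);

  const PetscInt m = 1000, n = 1000, N = m*n;   // ~1M unknowns
  Mat A;
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
  MatSetType(A, MATSEQAIJ);
  MatSeqAIJSetPreallocation(A, 5, NULL);        // 5 nonzeros per row for the 5-point stencil

  PetscLogStage assembly;
  PetscLogStageRegister("Assembly", &assembly);
  PetscLogStagePush(assembly);
  for (PetscInt I = 0; I < N; I++)
  {
    PetscInt i = I / n, j = I % n;
    PetscScalar v = 4.0;
    MatSetValues(A, 1, &I, 1, &I, &v, ADD_VALUES);
    v = -1.0;
    if (i > 0)     { PetscInt J = I - n; MatSetValues(A, 1, &I, 1, &J, &v, ADD_VALUES); }
    if (i < m - 1) { PetscInt J = I + n; MatSetValues(A, 1, &I, 1, &J, &v, ADD_VALUES); }
    if (j > 0)     { PetscInt J = I - 1; MatSetValues(A, 1, &I, 1, &J, &v, ADD_VALUES); }
    if (j < n - 1) { PetscInt J = I + 1; MatSetValues(A, 1, &I, 1, &J, &v, ADD_VALUES); }
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  PetscLogStagePop();

  MatDestroy(&A);
  PetscFinalize();
  return 0;
}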

> Now I don't believe the numbers. If you preallocate, we do not do any extra
> processing outside of sorting the column indices (which every format must do
> for efficient operations). Thus, how would you save any time? If these are
> all in seconds, I will run a 2D Poisson here and tell you what I get. It
> would help to specify sizes with this benchmark :)
>
>  Matt
>
>> Garth
>>
>>
>>
>>>> How was the MTL4 matrix initialised? I don't know if it does anything
>>>> with the sparsity pattern yet. I've been initialising MTL4 matrices by
>>>> hand so far with a guess as to the max number of nonzeros per row.
>>>> Without setting this, the performance is nearly identical to uBLAS. When
>>>> it is set, I observe at least a factor of two speed-up.
>>>>
>>>> Garth
>>>
>>> The same way as all other backends, which is by a precomputed
>>> sparsity pattern. It looks like this is currently ignored in the
>>> MTL4Matrix implementation:
>>>
>>> void MTL4Matrix::init(const GenericSparsityPattern& sparsity_pattern)
>>> {
>>>   init(sparsity_pattern.size(0), sparsity_pattern.size(1));
>>> }
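[Editor's note: a hedged sketch of what that overload could do instead: take the worst-case row length from the precomputed pattern and keep it for the inserter's slot size. The accessor max_nonzeros_per_row() and the member nnz_row are hypothetical names for illustration, not the actual GenericSparsityPattern/MTL4Matrix interface.]

void MTL4Matrix::init(const GenericSparsityPattern& sparsity_pattern)
{
  // Resize as before
  init(sparsity_pattern.size(0), sparsity_pattern.size(1));

  // Remember an upper bound on nonzeros per row so that the
  // mtl::matrix::inserter can later be constructed with a sensible
  // slot size instead of the default.
  nnz_row = sparsity_pattern.max_nonzeros_per_row();  // hypothetical accessor
}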
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> DOLFIN-dev mailing list
>>> DOLFIN-dev@xxxxxxxxxx
>>> http://www.fenics.org/mailman/listinfo/dolfin-dev
>>
>>
>> _______________________________________________
>> DOLFIN-dev mailing list
>> DOLFIN-dev@xxxxxxxxxx
>> http://www.fenics.org/mailman/listinfo/dolfin-dev
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

