
dolfin team mailing list archive

Re: assemble of Matrix with Real spaces slow

 

On Friday March 4 2011 03:29:32 Garth N. Wells wrote:
> On 03/03/11 19:48, Johan Hake wrote:
> > On Thursday March 3 2011 11:20:03 Marie E. Rognes wrote:
> >> On 03/03/2011 08:03 PM, Johan Hake wrote:
> >>> Hello!
> >>> 
> >>> I am using mixed spaces with Reals quite a lot. It turns out that
> >>> assembling forms with functions from MixedFunctionSpaces containing
> >>> Real spaces is dead slow. The time spent also increases with the
> >>> number of included Real spaces, even if none of them are included in
> >>> the form which is assembled.
> >>> 
> >>> The attached test script illustrates this.
> >> 
> >> By replacing "CG", 1 by "R", 0 or?
> > 
> > OMG!! Yes, *flush*
> > 
> > That explains the memory usage :P
> > 
> >>> The test script also reveals that a disproportionate amount of time
> >>> is spent in FFC generating the code. This time also increases with
> >>> the number of Real spaces included. Turning off FErari helped a bit
> >>> on this point.
> >> 
> >> I can take a look at the FFC side, but not today.
> > 
> > Nice!
> > 
> > With the correction from Marie, the numbers now look like:
> > 
> > With PETSc backend
> > 
> > Tensor without Mixed space  |       0.11211     0.11211     1
> > With 1 global dofs          |        1.9482      1.9482     1
> > With 2 global dofs          |        2.8725      2.8725     1
> > With 4 global dofs          |        5.1959      5.1959     1
> > With 8 global dofs          |        10.524      10.524     1
> > With 16 global dofs         |        25.574      25.574     1
> > 
> > With Epetra backend
> > 
> > Tensor without Mixed space  |       0.87544     0.87544     1
> > With 1 global dofs          |        1.7089      1.7089     1
> > With 2 global dofs          |        2.6868      2.6868     1
> > With 4 global dofs          |          4.28        4.28     1
> > With 8 global dofs          |         8.123       8.123     1
> > With 16 global dofs         |        17.394      17.394     1
> > 
> > Still a pretty big increase in time for just adding 16 scalar dofs to a
> > system of 274625 dofs in the first place.
> 
> I have seen this big slow down for large problems. The first issue,
> which was the computation of the sparsity pattern, has been 'resolved'
> by using boost::unordered_set. This comes at the expense of a small
> slowdown for regular problems.
> 
> I also noticed that Epetra performs much better for these problems than
> PETSc does. We need to check the matrix initialisation, but it could
> ultimately be a limitation of the backends. Each matrix row
> corresponding to a global dof is full, and it may be that backends
> designed for large sparse matrices do not handle this well.
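
The effect of such full rows on the sparsity pattern can be illustrated with a toy model (plain Python, not DOLFIN code; the 1D three-point stencil and the sizes are made-up for illustration). Each global dof that couples to every other dof contributes a full row and a full column, i.e. roughly 2N extra pattern entries for a system with N dofs:

```python
# Toy model (plain Python, not DOLFIN code) of sparsity-pattern growth
# when global (Real) dofs with full rows/columns are added to a system.
# The 1D three-point stencil and the sizes are made-up illustrations.

def pattern_nnz(n_dofs, n_global):
    """Total nonzeros: n_dofs regular dofs with a 1D three-point
    stencil, plus n_global global dofs coupling to every dof."""
    n_total = n_dofs + n_global
    pattern = {i: set() for i in range(n_total)}
    for i in range(n_dofs):                      # sparse block
        for j in range(max(0, i - 1), min(n_dofs, i + 2)):
            pattern[i].add(j)
    for g in range(n_dofs, n_total):             # global dofs
        for j in range(n_total):
            pattern[g].add(j)                    # full row
            pattern[j].add(g)                    # full column
    return sum(len(row) for row in pattern.values())

base = pattern_nnz(1000, 0)
# each extra global dof adds roughly 2 * n_dofs entries to the pattern
growth = [pattern_nnz(1000, g) - base for g in (1, 2, 4)]
```

For N = 1000 the growth per global dof is about 2N entries, which is consistent with the roughly linear increase in the timings above; if each row of the pattern is stored in its own hash set, memory grows accordingly.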

How could inserting into the matrix be the bottleneck? In the test script I
attached, I do not assemble any global dofs.
 
> The best approach is probably to add the entire row at once for global
> dofs. This would require a modified assembler.
> 
> There is a UFC Blueprint to identify global dofs:
> 
>     https://blueprints.launchpad.net/ufc/+spec/global-dofs
> 
> If we can identify global dofs, we have a better chance of dealing with
> the problem properly. This includes running in parallel with global dofs.

Do you envision having integrals over global dofs separated into their own
tabulate_tensor function? Then, in DOLFIN, we can assemble the whole
row/column in one loop and insert it into the Matrix in one go.

Do you think we also need to recognise global dofs in UFL to properly flesh 
out these integrals?
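
The row-at-once idea could look something like the following toy sketch (plain Python, not the DOLFIN assembler; `assemble_global_row` and `insert_row` are made-up names): accumulate a global dof's contributions in a dense buffer during the cell loop, then hand the finished row to the backend in a single call instead of one small insertion per cell.

```python
# Toy sketch (plain Python, not the DOLFIN assembler; names are made up):
# accumulate a global dof's full row in a dense buffer during the cell
# loop, then insert it into the matrix backend in one call.

def assemble_global_row(cells, n_dofs, insert_row):
    row = [0.0] * n_dofs
    for cell_dofs, cell_values in cells:     # local accumulation only
        for j, v in zip(cell_dofs, cell_values):
            row[j] += v
    insert_row(row)                          # single backend insertion

# Stand-in for a sparse backend storing one row as {column: value}.
matrix_row = {}

def insert_row(row):
    matrix_row.update((j, v) for j, v in enumerate(row) if v != 0.0)

# Two 1D "cells" sharing dof 1, with made-up local contributions.
cells = [([0, 1], [1.0, 2.0]), ([1, 2], [3.0, 4.0])]
assemble_global_row(cells, 3, insert_row)
# matrix_row is now {0: 1.0, 1: 5.0, 2: 4.0}
```

The backend then sees one dense-row insertion per global dof rather than one tiny insertion per cell, which is presumably friendlier to backends tuned for large sparse matrices.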

Johan

> Garth
> 
> > Johan
> > 
> >> --
> >> Marie
> >> 
> >>> I have not profiled any of this, but I am just throwing it out
> >>> there. I do not see any difference between, for example, the Epetra
> >>> and PETSc backends, as suggested in the fixed bug about building the
> >>> sparsity pattern with global dofs.
> >>> 
> >>> My test was done on DOLFIN 0.9.9+. I haven't profiled it yet.
> >>> 
> >>> Output from summary:
> >>>    Tensor without Mixed space  |       0.11401     0.11401     1
> >>>    With 1 global dofs          |       0.40725     0.40725     1
> >>>    With 2 global dofs          |       0.94694     0.94694     1
> >>>    With 4 global dofs          |         2.763       2.763     1
> >>>    With 8 global dofs          |        9.6149      9.6149     1
> >>> 
> >>> Also, the amount of memory used to build the sparsity pattern seems
> >>> to double for each step. The memory footprint for a 32x32x32 unit
> >>> cube with 16 global dofs was 1.6 GB(!?).
> >>> 
> >>> Johan
> >>> 
> >>> 
> >>> 
> >>> _______________________________________________
> >>> Mailing list: https://launchpad.net/~dolfin
> >>> Post to     : dolfin@xxxxxxxxxxxxxxxxxxx
> >>> Unsubscribe : https://launchpad.net/~dolfin
> >>> More help   : https://help.launchpad.net/ListHelp
> >> 


