dolfin team mailing list archive: Message #21800
Re: assemble of Matrix with Real spaces slow
On Friday March 4 2011 09:23:58 Garth N. Wells wrote:
> On 04/03/11 17:11, Johan Hake wrote:
> > On Friday March 4 2011 08:48:14 Garth N. Wells wrote:
> >> On 04/03/11 16:38, Johan Hake wrote:
> >>> On Friday March 4 2011 03:29:32 Garth N. Wells wrote:
> >>>> On 03/03/11 19:48, Johan Hake wrote:
> >>>>> On Thursday March 3 2011 11:20:03 Marie E. Rognes wrote:
> >>>>>> On 03/03/2011 08:03 PM, Johan Hake wrote:
> >>>>>>> Hello!
> >>>>>>>
> >>>>>>> I am using Mixed spaces with Reals quite a lot. It turns out that
> >>>>>>> assembling forms with functions from MixedFunctionSpaces containing
> >>>>>>> Real spaces is dead slow. The time spent also increases with the
> >>>>>>> number of included Real spaces, even if none of them are included
> >>>>>>> in the form which is assembled.
> >>>>>>>
> >>>>>>> The attached test script illustrates this.
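
[The attached script is not preserved in the archive. Below is a minimal
sketch of such a timing test using the DOLFIN 0.9.x Python interface; the
mesh size, the u*v*dx form and the loop over Real-space counts are
assumptions, not the original attachment.]

from dolfin import *
import time

# Sketch only: a CG1 space is augmented with a varying number of Real
# ("R", 0) spaces, but only the CG part appears in the assembled form.
mesh = UnitCube(32, 32, 32)
V = FunctionSpace(mesh, "CG", 1)

for n_real in [0, 1, 2, 4, 8, 16]:
    if n_real == 0:
        W, u, v = V, TrialFunction(V), TestFunction(V)
    else:
        W = MixedFunctionSpace([V] + [FunctionSpace(mesh, "R", 0)
                                      for i in range(n_real)])
        u, v = TrialFunctions(W)[0], TestFunctions(W)[0]
    a = u*v*dx                     # only the CG block appears in the form
    t0 = time.time()
    A = assemble(a)
    print "With %2d global dofs | %.5g s" % (n_real, time.time() - t0)
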
> >>>>>>
> >>>>>> By replacing "CG", 1 by "R", 0 or?
> >>>>>
> >>>>> OMG!! Yes, *flush*
> >>>>>
> >>>>> That explains the memory usage :P
> >>>>>
> >>>>>>> The test script also reveals that a disproportionate amount of time
> >>>>>>> is spent in FFC generating the code. This time also increases with
> >>>>>>> the number of Real spaces included. Turning off FErari helped a bit
> >>>>>>> on this point.
> >>>>>>
> >>>>>> I can take a look on the FFC side, but not today.
> >>>>>
> >>>>> Nice!
> >>>>>
> >>>>> With the correction from Marie, the numbers now look like:
> >>>>>
> >>>>> With PETSc backend
> >>>>>
> >>>>> Tensor without Mixed space | 0.11211 0.11211 1
> >>>>> With 1 global dofs | 1.9482 1.9482 1
> >>>>> With 2 global dofs | 2.8725 2.8725 1
> >>>>> With 4 global dofs | 5.1959 5.1959 1
> >>>>> With 8 global dofs | 10.524 10.524 1
> >>>>> With 16 global dofs | 25.574 25.574 1
> >>>>>
> >>>>> With Epetra backend
> >>>>>
> >>>>> Tensor without Mixed space | 0.87544 0.87544 1
> >>>>> With 1 global dofs | 1.7089 1.7089 1
> >>>>> With 2 global dofs | 2.6868 2.6868 1
> >>>>> With 4 global dofs | 4.28 4.28 1
> >>>>> With 8 global dofs | 8.123 8.123 1
> >>>>> With 16 global dofs | 17.394 17.394 1
> >>>>>
> >>>>> Still a pretty big increase in time for just adding 16 scalar dofs to
> >>>>> a system of 274625 dofs in the first place.
> >>>>
> >>>> I have seen this big slowdown for large problems. The first issue,
> >>>> which was the computation of the sparsity pattern, has been 'resolved'
> >>>> by using boost::unordered_set. This comes at the expense of a small
> >>>> slowdown for regular problems.
> >>>>
> >>>> I also noticed that Epetra performs much better for these problems
> >>>> than PETSc does. We need to check the matrix initialisation, but it
> >>>> could ultimately be a limitation of the backends. Each matrix row
> >>>> corresponding to a global dof is full, and it may be that backends
> >>>> designed for large sparse matrices do not handle this well.
> >>>
> >>> How could inserting into the matrix be the bottleneck? In the test
> >>> script I attached I do not assemble any global dofs.
> >>
> >> I think that you'll find that it is. It will be assembling zeroes in the
> >> global dof positions.
> >
> > I guess you are right. Are the sparsity pattern and the tabulated
> > tensor based only on the MixedSpace formulation, and not on the actual
> > integral?
>
> The sparsity pattern is based on the dof map, which depends on the
> function spaces.
>
> > Is this a bug or feature?
>
> I would say it is just the natural approach. There is/was a Blueprint to
> avoid
> computing and assembling the zeroes in problems like Stokes, but I'm not
> sure that this would be worthwhile, since it would involve assembling
> non-matrix values, and most backends want to assemble dense local
> matrices into sparse global matrices.
Makes sense. I guess the bottleneck is then the insertion into the global
matrix, where your suggested approach might improve the performance.
Sounds like a pretty hard fix though...
Johan
> Garth
>
> >>>> The best approach is probably to add the entire row at once for global
> >>>> dofs. This would require a modified assembler.
> >>>>
> >>>> There is a UFC Blueprint to identify global dofs:
> >>>> https://blueprints.launchpad.net/ufc/+spec/global-dofs
> >>>>
> >>>> If we can identify global dofs, we have a better chance of dealing
> >>>> with the problem properly. This includes running in parallel with
> >>>> global dofs.
> >>>
> >>> Do you envision integrals over global dofs being separated into their
> >>> own tabulate_tensor function? Then in DOLFIN we could assemble the
> >>> whole row/column in one loop and insert it into the Matrix in one
> >>> go.
> >>
> >> No - I had in mind possibly adding only cell-based dofs to the matrix,
> >> and adding the global rows into a global vector, which is then inserted
> >> at the end (as one row) into the matrix. I'm not advocating a change to
> >> the tabulate_foo interface at this stage.
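
[A toy sketch of the insertion strategy described above, in plain numpy
rather than DOLFIN or UFC code; all sizes and names are illustrative.
Cell-based dofs are inserted per cell as usual, while contributions to a
global dof's row are accumulated in a dense vector and inserted once at
the end.]

import numpy

n_dofs = 10                    # ordinary (cell-based) dofs in a toy problem
global_dof = n_dofs            # one global (Real) dof, numbered last
A = numpy.zeros((n_dofs + 1, n_dofs + 1))  # stand-in for the backend matrix
global_row = numpy.zeros(n_dofs + 1)       # dense accumulator for the global row

cells = [(0, 1, 2, 3), (3, 4, 5, 6), (6, 7, 8, 9)]   # toy "dofmap"
for dofs in cells:
    rows = list(dofs) + [global_dof]
    A_cell = numpy.ones((len(rows), len(rows)))      # stand-in element tensor
    # cell-based block: normal per-cell insertion into the sparse matrix
    A[numpy.ix_(dofs, dofs)] += A_cell[:len(dofs), :len(dofs)]
    # global-dof row: accumulate instead of inserting cell by cell
    global_row[rows] += A_cell[len(dofs), :]
    # (the corresponding column would be handled the same way; omitted here)

A[global_dof, :] += global_row  # single insertion of the dense row at the end
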
> >
> > Ok, but would you need a separate tabulate_tensor function for the global
> > dofs?
> >
> >>> Do you think we also need to recognise global dofs in UFL to properly
> >>> flesh out these integrals?
> >>
> >> Yes. For one, to handle them properly in parallel, since global dofs do
> >> not reside at a mesh entity, and the domains are parallelised mesh-wise.
> >
> > Ok
> >
> > Johan
> >
> >> Garth
> >>
> >>> Johan
> >>>
> >>>> Garth
> >>>>
> >>>>> Johan
> >>>>>
> >>>>>> --
> >>>>>> Marie
> >>>>>>
> >>>>>>> I have not profiled any of this, I am just throwing it out there. I
> >>>>>>> do not see any difference between, for example, the Epetra and
> >>>>>>> PETSc backends, as was suggested in the fixed bug on building the
> >>>>>>> sparsity pattern with global dofs.
> >>>>>>>
> >>>>>>> My test was done on DOLFIN 0.9.9+. I haven't profiled it
> >>>>>>> yet.
> >>>>>>>
> >>>>>>> Output from summary:
> >>>>>>> Tensor without Mixed space | 0.11401 0.11401 1
> >>>>>>> With 1 global dofs | 0.40725 0.40725 1
> >>>>>>> With 2 global dofs | 0.94694 0.94694 1
> >>>>>>> With 4 global dofs | 2.763 2.763 1
> >>>>>>> With 8 global dofs | 9.6149 9.6149 1
> >>>>>>>
> >>>>>>> Also, the amount of memory used to build the sparsity pattern seems
> >>>>>>> to double for each step. The memory footprint for a 32x32x32 unit
> >>>>>>> cube with 16 global dofs was 1.6 GB(!?).
> >>>>>>>
> >>>>>>> Johan
> >>>>>>>