dolfin team mailing list archive

Re: assemble of Matrix with Real spaces slow

To: "Garth N. Wells" <gnw20@xxxxxxxxx>
From: Johan Hake <johan.hake@xxxxxxxxx>
Date: Fri, 4 Mar 2011 09:11:19 -0800
Cc: dolfin@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4D7117CE.9000406@cam.ac.uk>
Reply-to: johan.hake@xxxxxxxxx
User-agent: KMail/1.13.5 (Linux/2.6.35-27-generic; KDE/4.6.0; x86_64; ; )
On Friday March 4 2011 08:48:14 Garth N. Wells wrote:
> On 04/03/11 16:38, Johan Hake wrote:
> > On Friday March 4 2011 03:29:32 Garth N. Wells wrote:
> >> On 03/03/11 19:48, Johan Hake wrote:
> >>> On Thursday March 3 2011 11:20:03 Marie E. Rognes wrote:
> >>>> On 03/03/2011 08:03 PM, Johan Hake wrote:
> >>>>> Hello!
> >>>>> 
> >>>>> I am using Mixed spaces with Reals quite a lot. It turns out that
> >>>>> assembling forms with functions from MixedFunctionSpaces containing
> >>>>> Real spaces is dead slow. The time spent also increases with the
> >>>>> number of included Real spaces, even if none of them are included in
> >>>>> the form which is assembled.
> >>>>> 
> >>>>> The attached test script illustrates this.
> >>>> 
> >>>> By replacing "CG", 1 by "R", 0 or?
> >>> 
> >>> OMG!! Yes, *flush*
> >>> 
> >>> That explains the memory usage :P
> >>> 
> >>>>> The test script also reveals that a disproportionate amount of time
> >>>>> is spent in FFC generating the code. This time also increases with
> >>>>> the number of Real spaces included. Turning off FErari helped a bit
> >>>>> with this point.
> >>>> 
> >>>> I can take a look on the FFC side, but not today.
> >>> 
> >>> Nice!
> >>> 
> >>> With the correction from Marie the numbers now look like:
> >>> 
> >>> With PETSc backend
> >>> 
> >>> Tensor without Mixed space  |       0.11211     0.11211     1
> >>> With 1 global dofs          |        1.9482      1.9482     1
> >>> With 2 global dofs          |        2.8725      2.8725     1
> >>> With 4 global dofs          |        5.1959      5.1959     1
> >>> With 8 global dofs          |        10.524      10.524     1
> >>> With 16 global dofs         |        25.574      25.574     1
> >>> 
> >>> With Epetra backend
> >>> 
> >>> Tensor without Mixed space  |       0.87544     0.87544     1
> >>> With 1 global dofs          |        1.7089      1.7089     1
> >>> With 2 global dofs          |        2.6868      2.6868     1
> >>> With 4 global dofs          |          4.28        4.28     1
> >>> With 8 global dofs          |         8.123       8.123     1
> >>> With 16 global dofs         |        17.394      17.394     1
> >>> 
> >>> Still a pretty big increase in time for just adding 16 scalar dofs to a
> >>> system of 274625 dofs in the first place.
> >> 
> >> I have seen this big slowdown for large problems. The first issue,
> >> which was the computation of the sparsity pattern, has been 'resolved'
> >> by using boost::unordered_set. This comes at the expense of a small
> >> slowdown for regular problems.
> >> 
> >> I also noticed that Epetra performs much better for these problems than
> >> PETSc does. We need to check the matrix initialisation, but it could
> >> ultimately be a limitation of the backends. Each matrix row
> >> corresponding to a global dof is full, and it may be that backends
> >> designed for large sparse matrices do not handle this well.
> > 
> > How could inserting into the matrix be the bottleneck? In the test
> > script I attached I do not assemble any global dofs.
> 
> I think that you'll find that it is. It will be assembling zeroes in the
> global dof positions.

I guess you are right. Are the sparsity pattern and the tabulated tensor
based only on the MixedSpace formulation, and not on the actual integral?

Is this a bug or feature?
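
To make the question concrete, here is a minimal sketch in plain Python (not
DOLFIN or UFC code) of a sparsity pattern built purely from cell dof lists.
It assumes a toy 1D mesh with P1 dofs and a hypothetical mixed dofmap that
appends every Real dof to every cell; the function names are made up for
illustration. Under that assumption each global dof row becomes full,
whatever the integrand contains.

```python
def cell_dofs(cell, n_cells, n_global):
    """Hypothetical mixed dofmap: two P1 dofs plus all global (Real) dofs."""
    local = [cell, cell + 1]
    first_global = n_cells + 1  # global dofs are numbered after the mesh dofs
    return local + [first_global + g for g in range(n_global)]


def build_sparsity_pattern(n_cells, n_global):
    """Union of (dofs x dofs) over all cells; the form itself is never consulted."""
    n_dofs = n_cells + 1 + n_global
    pattern = [set() for _ in range(n_dofs)]
    for cell in range(n_cells):
        dofs = cell_dofs(cell, n_cells, n_global)
        for i in dofs:
            pattern[i].update(dofs)
    return pattern


def count_nonzeros(pattern):
    """Total number of entries the pattern reserves in the matrix."""
    return sum(len(row) for row in pattern)


if __name__ == "__main__":
    n_cells = 1000
    for n_global in (0, 1, 2, 4, 8, 16):
        pattern = build_sparsity_pattern(n_cells, n_global)
        longest_row = max(len(row) for row in pattern)
        print(n_global, count_nonzeros(pattern), longest_row)
```

In this toy setting every Real dof adds one entry to each mesh row and one
completely dense row of its own, so the longest row has as many entries as
there are dofs in the system.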

> >> The best approach is probably to add the entire row at once for global
> >> dofs. This would require a modified assembler.
> >> 
> >> There is a UFC Blueprint to identify global dofs:
> >>     https://blueprints.launchpad.net/ufc/+spec/global-dofs
> >> 
> >> If we can identify global dofs, we have a better chance of dealing with
> >> the problem properly. This includes running in parallel with global
> >> dofs.
> > 
> > Do you envision integrals over global dofs being separated into their
> > own tabulate tensor function? Then in DOLFIN we can assemble the
> > whole row/column in one loop and insert it into the Matrix in one go.
> 
> No - I had in mind possibly adding only cell-based dofs to the matrix,
> and adding the global rows into a global vector, which is then inserted
> at the end (as one row) into the matrix. I'm not advocating at this stage
> a change to the tabulate_foo interface.

Ok, but would you need a separate tabulate_tensor function for the global dofs?
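
As I understand the suggestion, it could look roughly like the sketch below.
This is plain Python, not the DOLFIN assembler: the dict-of-dicts matrix
stands in for a backend sparse matrix, the callback signature
tabulate_tensor(cell, dofs) is a hypothetical simplification of the UFC
call, and global dofs are assumed to be numbered after all mesh dofs.
Dropping the zero entries of a global row before insertion is also my
assumption, not something a backend necessarily allows.

```python
def assemble_with_dense_global_rows(cell_dofs_list, n_mesh_dofs, n_global,
                                    tabulate_tensor):
    """Assemble cell blocks, buffering rows of global dofs in dense vectors.

    cell_dofs_list  -- list of dof index lists, one per cell
    n_mesh_dofs     -- number of ordinary (mesh-based) dofs
    n_global        -- number of global (Real) dofs, numbered last
    tabulate_tensor -- hypothetical callback returning the dense element
                       block for a cell as a list of lists
    """
    n_dofs = n_mesh_dofs + n_global
    matrix = {}  # row index -> {column index: value}
    dense_rows = [[0.0] * n_dofs for _ in range(n_global)]

    for cell, dofs in enumerate(cell_dofs_list):
        block = tabulate_tensor(cell, dofs)
        for a, i in enumerate(dofs):
            for b, j in enumerate(dofs):
                if i >= n_mesh_dofs:
                    # Row of a global dof: accumulate in the dense buffer.
                    dense_rows[i - n_mesh_dofs][j] += block[a][b]
                else:
                    # Ordinary row: add straight into the sparse structure.
                    row = matrix.setdefault(i, {})
                    row[j] = row.get(j, 0.0) + block[a][b]

    # Insert each global row exactly once, skipping entries that stayed zero.
    for g, values in enumerate(dense_rows):
        matrix[n_mesh_dofs + g] = {j: v for j, v in enumerate(values) if v != 0.0}
    return matrix


if __name__ == "__main__":
    # Three 1D cells with P1 dofs 0..3 and one global dof with index 4.
    cells = [[c, c + 1, 4] for c in range(3)]
    ones = lambda cell, dofs: [[1.0] * len(dofs) for _ in dofs]
    print(assemble_with_dense_global_rows(cells, 4, 1, ones)[4])
```

With this layout the element tensor can stay as it is; only the insertion
step tells global rows apart from cell-based ones, which seems to match the
point about not changing the tabulate_foo interface.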

> > Do you think we also need to recognise global dofs in UFL to properly
> > flesh out these integrals?
> 
> Yes. For one, to handle them properly in parallel, since global dofs do
> not reside at a mesh entity, and the domains are parallelised mesh-wise.

Ok

Johan

> Garth
> 
> > Johan
> > 
> >> Garth
> >> 
> >>> Johan
> >>> 
> >>>> --
> >>>> Marie
> >>>> 
> >>>>> I have not profiled any of this, but I am just throwing it out there.
> >>>>> I do not see any difference between, for example, the Epetra and
> >>>>> PETSc backends, as suggested in the fixed bug for building the
> >>>>> sparsity pattern with global dofs.
> >>>>> 
> >>>>> My test was run on DOLFIN 0.9.9+. I haven't profiled it yet.
> >>>>> 
> >>>>> Output from summary:
> >>>>>    Tensor without Mixed space  |       0.11401     0.11401     1
> >>>>>    With 1 global dofs          |       0.40725     0.40725     1
> >>>>>    With 2 global dofs          |       0.94694     0.94694     1
> >>>>>    With 4 global dofs          |         2.763       2.763     1
> >>>>>    With 8 global dofs          |        9.6149      9.6149     1
> >>>>> 
> >>>>> Also the amount of memory used to build the sparsity pattern seems to
> >>>>> double for each step. The memory footprint for a 32x32x32 unit cube with
> >>>>> 16 global dofs was 1.6 GB(!?).
> >>>>> 
> >>>>> Johan
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> _______________________________________________
> >>>>> Mailing list: https://launchpad.net/~dolfin
> >>>>> Post to     : dolfin@xxxxxxxxxxxxxxxxxxx
> >>>>> Unsubscribe : https://launchpad.net/~dolfin
> >>>>> More help   : https://help.launchpad.net/ListHelp
> >>>> 
Follow ups

Re: assemble of Matrix with Real spaces slow
From: Garth N. Wells, 2011-03-04
References

assemble of Matrix with Real spaces slow
From: Johan Hake, 2011-03-03
Re: assemble of Matrix with Real spaces slow
From: Johan Hake, 2011-03-04
Re: assemble of Matrix with Real spaces slow
From: Garth N. Wells, 2011-03-04