
dolfin team mailing list archive

Re: assemble of Matrix with Real spaces slow

On 04/03/11 17:11, Johan Hake wrote:
> On Friday March 4 2011 08:48:14 Garth N. Wells wrote:
>> On 04/03/11 16:38, Johan Hake wrote:
>>> On Friday March 4 2011 03:29:32 Garth N. Wells wrote:
>>>> On 03/03/11 19:48, Johan Hake wrote:
>>>>> On Thursday March 3 2011 11:20:03 Marie E. Rognes wrote:
>>>>>> On 03/03/2011 08:03 PM, Johan Hake wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> I am using mixed spaces with Reals quite a lot. It turns out that
>>>>>>> assembling forms with functions from MixedFunctionSpaces containing
>>>>>>> Real spaces is dead slow. The time spent also increases with the
>>>>>>> number of included Real spaces, even if none of them appear in the
>>>>>>> form which is assembled.
>>>>>>>
>>>>>>> The attached test script illustrates this.
>>>>>>
>>>>>> By replacing "CG", 1 with "R", 0, or?
>>>>>
>>>>> OMG!! Yes, *flush*
>>>>>
>>>>> That explains the memory usage :P
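>>>>>
>>>>> For reference, a minimal sketch of the kind of timing script under
>>>>> discussion, with the "R", 0 correction applied (the mesh size, the
>>>>> form and all names below are assumptions):
>>>>>
>>>>>   from dolfin import *
>>>>>
>>>>>   mesh = UnitCube(64, 64, 64)      # 65**3 = 274625 CG1 dofs
>>>>>   num_reals = 16                   # number of global Real spaces
>>>>>
>>>>>   V = FunctionSpace(mesh, "CG", 1)
>>>>>   R = FunctionSpace(mesh, "R", 0)  # one global dof
>>>>>   W = MixedFunctionSpace([V] + [R]*num_reals)
>>>>>
>>>>>   # Only the CG components enter the form; no Real component
>>>>>   # appears in the integrand.
>>>>>   u = TrialFunctions(W)[0]
>>>>>   v = TestFunctions(W)[0]
>>>>>   a = inner(grad(u), grad(v))*dx
>>>>>
>>>>>   tic()
>>>>>   A = assemble(a)
>>>>>   print("Assembled with %d global dofs in %g s" % (num_reals, toc()))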
>>>>>
>>>>>>> The test script also reveals that a disproportionate amount of time
>>>>>>> is spent in FFC generating the code. This time also increases with
>>>>>>> the number of Real spaces included. Turning off FErari helped a bit
>>>>>>> on this point.
>>>>>>
>>>>>> I can take a look on the FFC side, but not today.
>>>>>
>>>>> Nice!
>>>>>
>>>>> With the correction from Marie applied, the numbers now look like:
>>>>>
>>>>> With PETSc backend
>>>>>
>>>>> Tensor without Mixed space  |       0.11211     0.11211     1
>>>>> With 1 global dofs          |        1.9482      1.9482     1
>>>>> With 2 global dofs          |        2.8725      2.8725     1
>>>>> With 4 global dofs          |        5.1959      5.1959     1
>>>>> With 8 global dofs          |        10.524      10.524     1
>>>>> With 16 global dofs         |        25.574      25.574     1
>>>>>
>>>>> With Epetra backend
>>>>>
>>>>> Tensor without Mixed space  |       0.87544     0.87544     1
>>>>> With 1 global dofs          |        1.7089      1.7089     1
>>>>> With 2 global dofs          |        2.6868      2.6868     1
>>>>> With 4 global dofs          |          4.28        4.28     1
>>>>> With 8 global dofs          |         8.123       8.123     1
>>>>> With 16 global dofs         |        17.394      17.394     1
>>>>>
>>>>> Still a pretty big increase in time for just adding 16 scalar dofs to a
>>>>> system of 274625 dofs in the first place.
>>>>
>>>> I have seen this big slowdown for large problems. The first issue,
>>>> which was the computation of the sparsity pattern, has been 'resolved'
>>>> by using boost::unordered_set. This comes at the expense of a small
>>>> slowdown for regular problems.
>>>>
>>>> I also noticed that Epetra performs much better for these problems than
>>>> PETSc does. We need to check the matrix initialisation, but it could
>>>> ultimately be a limitation of the backends. Each matrix row
>>>> corresponding to a global dof is full, and it may be that backends
>>>> designed for large sparse matrices do not handle this well.
>>>
>>> How could inserting into the matrix be the bottleneck? In the test
>>> script I attached I do not assemble any global dofs.
>>
>> I think that you'll find that it is. It will be assembling zeroes in the
>> global dof positions.
> 
> I guess you are right. Are the sparsity pattern and the tabulated tensor
> based only on the MixedSpace formulation, and not on the actual integral?
> 

The sparsity pattern is based on the dof map, which depends on the
function spaces.
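
To see the scale of the effect, a back-of-the-envelope count (the dof
count is taken from the numbers above; everything else is assumed): each
global dof shows up in the dof list of every cell, so the pattern gains a
dense row and a dense column per Real space.

  N = 274625                  # mesh dofs, as in the timings above
  for n in (1, 2, 4, 8, 16):  # number of global Real dofs
      extra = 2*n*N + n*n     # n dense rows + n dense columns + coupling block
      print("%2d global dofs: ~%.1e extra pattern entries" % (n, extra))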

> Is this a bug or a feature?
>

I would say it is just the natural approach. There is/was a Blueprint to avoid
computing and assembling the zeroes in problems like Stokes, but I'm not
sure that this would be worthwhile, since it would involve assembling
non-matrix values, and most backends want to assemble dense local
matrices into sparse global matrices.

Garth


>>>> The best approach is probably to add the entire row at once for global
>>>> dofs. This would require a modified assembler.
>>>>
>>>> There is a UFC Blueprint to identify global dofs:
>>>>     https://blueprints.launchpad.net/ufc/+spec/global-dofs
>>>>
>>>> If we can identify global dofs, we have a better chance of dealing with
>>>> the problem properly. This includes running in parallel with global
>>>> dofs.
>>>
>>> Do you envision having the integrals over global dofs separated into
>>> their own tabulate_tensor function? Then in DOLFIN we could assemble
>>> the whole row/column in one loop and insert it into the Matrix in one go.
>>
>> No - I had in mind possibly adding only cell-based dofs to the matrix,
>> and accumulating the global rows in a global vector, which is then
>> inserted at the end (as one row each) into the matrix. I'm not
>> advocating a change to the tabulate_foo interface at this stage.
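>>
>> A toy sketch of that strategy in plain numpy/scipy (not DOLFIN code;
>> all names below are made up for illustration):
>>
>>   import numpy as np
>>   from scipy.sparse import lil_matrix
>>
>>   N, n_glob = 1000, 2        # mesh dofs, global (Real) dofs
>>   A = lil_matrix((N + n_glob, N + n_glob))
>>   global_rows = np.zeros((n_glob, N + n_glob))
>>
>>   def add_cell_tensor(A_cell, dofs):
>>       # Mesh-dof rows go straight into the sparse matrix; rows that
>>       # belong to a global dof are accumulated in a dense vector.
>>       for i, gi in enumerate(dofs):
>>           if gi >= N:
>>               global_rows[gi - N, dofs] += A_cell[i]
>>           else:
>>               for j, gj in enumerate(dofs):
>>                   A[gi, gj] += A_cell[i, j]
>>
>>   # ... cell loop calling add_cell_tensor(A_cell, dofs) ...
>>
>>   for k in range(n_glob):    # insert each global row in one go
>>       A[N + k, :] = global_rows[k]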
> 
> OK, but would you need a separate tabulate_tensor function for the global dofs?
> 
>>> Do you think we also need to recognise global dofs in UFL to properly
>>> flesh out these integrals?
>>
>> Yes. For one, to handle them properly in parallel, since global dofs do
>> not reside at a mesh entity and the domains are parallelised mesh-wise.
> 
> Ok
> 
> Johan
> 
>> Garth
>>
>>> Johan
>>>
>>>> Garth
>>>>
>>>>> Johan
>>>>>
>>>>>> --
>>>>>> Marie
>>>>>>
>>>>>>> I have not profiled any of this, I am just throwing it out there. I
>>>>>>> do not see any difference between, for example, the Epetra and PETSc
>>>>>>> backends, as suggested in the fixed bug for building the sparsity
>>>>>>> pattern with global dofs.
>>>>>>>
>>>>>>> My test has been done on DOLFIN 0.9.9+. I haven't profiled it yet.
>>>>>>>
>>>>>>> Output from summary:
>>>>>>>    Tensor without Mixed space  |       0.11401     0.11401     1
>>>>>>>    With 1 global dofs          |       0.40725     0.40725     1
>>>>>>>    With 2 global dofs          |       0.94694     0.94694     1
>>>>>>>    With 4 global dofs          |         2.763       2.763     1
>>>>>>>    With 8 global dofs          |        9.6149      9.6149     1
>>>>>>>
>>>>>>> Also, the amount of memory used to build the sparsity pattern seems
>>>>>>> to double for each step. The memory footprint for a 32x32x32 unit
>>>>>>> cube with 16 global dofs was 1.6 GB (!?).
>>>>>>>
>>>>>>> Johan