dolfin team mailing list archive

Re: multi-thread assembly

To: "Garth N. Wells" <gnw20@xxxxxxxxx>
From: Andy Ray Terrel <andy.terrel@xxxxxxxxx>
Date: Wed, 10 Nov 2010 10:08:17 -0600
Cc: DOLFIN Mailing List <dolfin@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4CDAC08F.2050406@cam.ac.uk>

On Wed, Nov 10, 2010 at 9:55 AM, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>
>
> On 10/11/10 15:53, Andy Ray Terrel wrote:
>>
>> On Wed, Nov 10, 2010 at 9:47 AM, Anders Logg <logg@xxxxxxxxx> wrote:
>>>
>>> On Wed, Nov 10, 2010 at 02:47:30PM +0000, Garth N. Wells wrote:
>>>>
>>>> Nice to see multi-thread assembly being added. We should look at
>>>> adding support for the multi-threaded version of SuperLU. What other
>>>> multi-thread solvers are out there?
>>>
>>> Yes, that would be good, but I don't know which solvers are available.
>>
>> SuperLU tends to die on large problems.  MUMPS is a much better option.
>>
>
> MUMPS is MPI-based. SuperLU has a multi-threaded version for shared memory
> machines.
>
> Garth

Yes, but you can compile it to take advantage of MPI's shared-memory message passing.

>
>>>
>>>> I haven't looked at the code in great detail, but are element
>>>> tensors being added to the global tensor in a thread-safe fashion?
>>>> Neither PETSc nor Trilinos is thread-safe.
>>>
>>> Yes, they should be. That's the main point. It's a very simple algorithm
>>> which just partitions the matrix row by row and makes each process
>>> responsible for a chunk of rows. During assembly, all processes
>>> iterate over the entire mesh and, for each cell, do one of three things:
>>>
>>>  1. all_in_range:  tabulate_tensor as usual and add
>>>  2. none_in_range: skip tabulate_tensor (continue)
>>>  3. some_in_range: tabulate_tensor and insert only rows in range
>>>
>>> Didem Unat (PhD student at UCLA/Simula) tried this in a simple
>>> prototype code and got very good speedups (up to a factor 7 on an
>>> eight-core machine) so it's just a matter of doing the same thing as
>>> part of DOLFIN (which is a bit trickier since some of the data access
>>> is hidden). The current implementation in DOLFIN seems to work and
>>> give some small speedup but I need to do some more testing.
>>>
>>>> Rather than having two assembly classes, would it be worth using
>>>> OpenMP instead? I experimented with OpenMP some time ago, but never
>>>> added it since at the time it required a very recent version of gcc.
>>>> This shouldn't be a problem now.
>>>
>>> I don't think this would work with OpenMP since we need to control how
>>> the rows are inserted.
>>>
>>> If this works out and we get good speedups, we could consider
>>> replacing Assembler by MulticoreAssembler. It's not that much extra
>>> code and it's pretty clean. I haven't tried yet, but it should also
>>> work in combination with MPI (each node has a part of the mesh and
>>> does multi-core assembly).
>>>
>>> --
>>> Anders
>>>
>
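
For reference, the row-partitioned scheme Anders describes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not DOLFIN's MulticoreAssembler: the names row_ranges, tabulate_tensor, assemble_rows and assemble are hypothetical, the "mesh" is a list of cells given by their dof indices, the element tensor is a fixed 2x2 matrix, and the global matrix is a plain list of lists. Each thread owns a contiguous chunk of rows and only ever writes to those rows, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def row_ranges(num_rows, num_threads):
    """Split rows [0, num_rows) into contiguous half-open chunks, one per thread."""
    base, extra = divmod(num_rows, num_threads)
    ranges, start = [], 0
    for t in range(num_threads):
        stop = start + base + (1 if t < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def tabulate_tensor(cell_dofs):
    """Hypothetical element tensor (assumption): 1D P1 stiffness on a unit cell."""
    return [[1.0, -1.0], [-1.0, 1.0]]


def assemble_rows(A, cells, row_range):
    """Visit every cell, but add only the rows inside this thread's range."""
    start, stop = row_range
    for cell_dofs in cells:
        owned = [start <= dof < stop for dof in cell_dofs]
        if not any(owned):
            continue  # none_in_range: skip tabulate_tensor entirely
        Ae = tabulate_tensor(cell_dofs)
        # all_in_range and some_in_range share this loop: the mask simply
        # selects every row in the first case and a subset in the second.
        for i, row in enumerate(cell_dofs):
            if owned[i]:
                for j, col in enumerate(cell_dofs):
                    A[row][col] += Ae[i][j]


def assemble(cells, num_dofs, num_threads):
    """Assemble a dense global matrix with one worker per row chunk."""
    A = [[0.0] * num_dofs for _ in range(num_dofs)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(assemble_rows, A, cells, r)
                   for r in row_ranges(num_dofs, num_threads)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return A


if __name__ == "__main__":
    cells = [(i, i + 1) for i in range(4)]  # four 1D cells, five dofs
    for row in assemble(cells, num_dofs=5, num_threads=3):
        print(row)
```

Python threads will not show the speedups mentioned above because of the interpreter lock; the sketch is only meant to show who writes which rows. Cells that straddle a chunk boundary are tabulated by more than one thread, which is the redundant work the some_in_range case implies.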
