← Back to team overview

dolfin team mailing list archive

Re: multi-thread assembly

 

On Wed, Nov 10, 2010 at 9:47 AM, Anders Logg <logg@xxxxxxxxx> wrote:
> On Wed, Nov 10, 2010 at 02:47:30PM +0000, Garth N. Wells wrote:
>> Nice to see multi-thread assembly being added. We should look at
>> adding support for the multi-threaded version of SuperLU. What other
>> multi-thread solvers are out there?
>
> Yes, that would be good, but I don't know which solvers are available.

SuperLU tends to die on large problems.  Mumps is a much better option.

>
>> I haven't looked at the code in great detail, but are element
>> tensors being added to the global tensor is a thread-safe fashion?
>> Both PETSc and Trilinos are not thread-safe.
>
> Yes, they should. That's the main point. It's a very simple algorithm
> which just partitions the matrix row by row and makes each process
> responsible for a chunk of rows. During assembly, all processes
> iterate over the entire mesh and on each cell does one of three things:
>
>  1. all_in_range:  tabulate_tensor as usual and add
>  2. none_in_range: skip tabulate_tensor (continue)
>  3. some_in_range: tabulate_tensor and insert only rows in range
>
> Didem Unat (PhD student at UCLA/Simula) tried this in a simple
> prototype code and got very good speedups (up to a factor 7 on an
> eight-core machine) so it's just a matter of doing the same thing as
> part of DOLFIN (which is a bit trickier since some of the data access
> is hidden). The current implementation in DOLFIN seems to work and
> give some small speedup but I need to do some more testing.
>
>> Rather than having two assembly classes, would it be worth using
>> OpenMP instead? I experimented with OpenMP some time ago, but never
>> added it since at the time it required a very recent version of gcc.
>> This shouldn't be a problem now.
>
> I don't think this would work with OpenMP since we need to control how
> the rows are inserted.
>
> If this works out and we get good speedups, we could consider
> replacing Assembler by MulticoreAssembler. It's not that much extra
> code and it's pretty clean. I haven't tried yet, but it should also
> work in combination with MPI (each node has a part of the mesh and
> does multi-core assembly).
>
> --
> Anders
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dolfin
> Post to     : dolfin@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dolfin
> More help   : https://help.launchpad.net/ListHelp
>



Follow ups

References