dolfin team mailing list archive

Failed multicore assembly

 

I've tried to implement multicore assembly using OpenMP in dolfin, but
it was a big failure. I've attached the hg bundle. The code is
protected by #ifdef _OPENMP, so it should be safe to merge into dolfin
if anyone wants to pursue this further (I won't).

To compile with OpenMP, I did:
CXX=g++-4.2 CXXFLAGS='-fopenmp -O3' ./configure ....
(I didn't manage to import pydolfin with this build; it was missing some symbol.)

The problem is that the matrix insertion "A.add(...)" must be in a
critical section so that only one thread inserts at a time. Since
matrix insertion is the dominant part of assembly, this introduces a
lot of overhead. I didn't expect much of a speedup for the stiffness
matrix I tested with, but the result was a surprisingly large slowdown
when running two threads (even though both cores were active).
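
To make the problem concrete, here is a minimal self-contained sketch of
the pattern (illustration only, with a toy std::map "matrix" standing in
for the real backend; this is not the code in the attached bundle). The
element computation parallelizes fine, but every insertion goes through a
critical section, so the threads mostly wait for each other:

#include <cstddef>
#include <map>
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#endif

struct ToyMatrix
{
  // Stand-in for the sparse backend; add() is not thread-safe, so
  // concurrent calls from several threads would corrupt the map.
  std::map<std::pair<std::size_t, std::size_t>, double> values;
  void add(std::size_t i, std::size_t j, double v)
  { values[std::make_pair(i, j)] += v; }
};

int main()
{
  const long num_cells = 100000;
  ToyMatrix A;

#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (long c = 0; c < num_cells; ++c)
  {
    // The element tensor computation is embarrassingly parallel.
    double Ae = 1.0*c;  // stand-in for tabulate_tensor()

    // The global insertion must be serialized.  When insertion dominates
    // the assembly time, the threads spend most of it waiting here.
#ifdef _OPENMP
#pragma omp critical
#endif
    A.add(c % 100, (c + 1) % 100, Ae);
  }
  return 0;
}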

To fix this, one might split the matrix into one data structure per
thread, and do "communication" between the matrix structures as in the
MPI-based assembly. The difference is that the matrices would all live
in the memory of the same process, so the communication overhead would
be much smaller.
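
A rough sketch of one such scheme (my guess at how it could look, not
anything from the bundle): each thread appends its contributions to a
private triplet buffer, so no locking is needed inside the assembly loop,
and the buffers are merged into the global matrix afterwards:

#include <cstddef>
#include <map>
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

struct Triplet { std::size_t i, j; double v; };

int main()
{
  const long num_cells = 100000;
  std::map<std::pair<std::size_t, std::size_t>, double> A;

  int num_threads = 1;
#ifdef _OPENMP
  num_threads = omp_get_max_threads();
#endif

  // One private staging buffer per thread: no critical section in the loop.
  std::vector<std::vector<Triplet> > staging(num_threads);

#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (long c = 0; c < num_cells; ++c)
  {
    int t = 0;
#ifdef _OPENMP
    t = omp_get_thread_num();
#endif
    double Ae = 1.0*c;  // stand-in for the element tensor
    Triplet entry = { std::size_t(c % 100), std::size_t((c + 1) % 100), Ae };
    staging[t].push_back(entry);  // thread-local, no locking
  }

  // "Communication" step: merge the per-thread buffers into the global
  // matrix.  All buffers live in the same address space, so this is plain
  // memory traffic rather than MPI messages.
  for (int t = 0; t < num_threads; ++t)
    for (std::size_t k = 0; k < staging[t].size(); ++k)
      A[std::make_pair(staging[t][k].i, staging[t][k].j)] += staging[t][k].v;

  return 0;
}

The merge could also be done pairwise in parallel, but even a serial merge
is just memory traffic within one process.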

--
Martin

Attachment: openmp.bundle
Description: Binary data

