dolfin team mailing list archive

Failed multicore assembly

 

I've tried to implement multicore assembly using OpenMP in dolfin, but
it was a big failure. I've attached the hg bundle. The code is
protected by #ifdef _OPENMP, so it should be safe to merge into dolfin
if anyone wants to pursue this further (I won't).

To compile with OpenMP, I did:
CXX=g++-4.2 CXXFLAGS='-fopenmp -O3' ./configure ....
(I didn't manage to import pydolfin with this build; it was missing some symbol.)

The problem is that the matrix insertion "A.add(...)" must be in a
critical section so that only one thread inserts at a time. Since
matrix insertion is the dominant part of assembly, this introduces a
lot of overhead. I didn't expect much of a speedup for the stiffness
matrix I tested with, but the result was a surprisingly large slowdown
when running two threads (even though both cores were active).
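
To make the problem concrete, here is a minimal self-contained sketch of
the pattern (illustration only, with a toy std::map "matrix" standing in
for the real backend; this is not the code in the attached bundle). The
element computation parallelizes fine, but every insertion goes through a
critical section, so the threads mostly wait for each other:

#include <cstddef>
#include <map>
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#endif

struct ToyMatrix
{
  // Stand-in for the sparse backend; add() is not thread-safe, so
  // concurrent calls from several threads would corrupt the map.
  std::map<std::pair<std::size_t, std::size_t>, double> values;
  void add(std::size_t i, std::size_t j, double v)
  { values[std::make_pair(i, j)] += v; }
};

int main()
{
  const long num_cells = 100000;
  ToyMatrix A;

#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (long c = 0; c < num_cells; ++c)
  {
    // The element tensor computation is embarrassingly parallel.
    double Ae = 1.0*c;  // stand-in for tabulate_tensor()

    // The global insertion must be serialized.  When insertion dominates
    // the assembly time, the threads spend most of it waiting here.
#ifdef _OPENMP
#pragma omp critical
#endif
    A.add(c % 100, (c + 1) % 100, Ae);
  }
  return 0;
}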

To fix this, one might split the matrix into one data structure per
thread, and do "communication" between the matrix structures as in the
MPI-based assembly. The difference is that the matrices would all live
in the memory of the same process, so the communication overhead would
be much smaller.
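
A rough sketch of one such scheme (my guess at how it could look, not
anything from the bundle): each thread appends its contributions to a
private triplet buffer, so no locking is needed inside the assembly loop,
and the buffers are merged into the global matrix afterwards:

#include <cstddef>
#include <map>
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

struct Triplet { std::size_t i, j; double v; };

int main()
{
  const long num_cells = 100000;
  std::map<std::pair<std::size_t, std::size_t>, double> A;

  int num_threads = 1;
#ifdef _OPENMP
  num_threads = omp_get_max_threads();
#endif

  // One private staging buffer per thread: no critical section in the loop.
  std::vector<std::vector<Triplet> > staging(num_threads);

#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (long c = 0; c < num_cells; ++c)
  {
    int t = 0;
#ifdef _OPENMP
    t = omp_get_thread_num();
#endif
    double Ae = 1.0*c;  // stand-in for the element tensor
    Triplet entry = { std::size_t(c % 100), std::size_t((c + 1) % 100), Ae };
    staging[t].push_back(entry);  // thread-local, no locking
  }

  // "Communication" step: merge the per-thread buffers into the global
  // matrix.  All buffers live in the same address space, so this is plain
  // memory traffic rather than MPI messages.
  for (int t = 0; t < num_threads; ++t)
    for (std::size_t k = 0; k < staging[t].size(); ++k)
      A[std::make_pair(staging[t][k].i, staging[t][k].j)] += staging[t][k].v;

  return 0;
}

The merge could also be done pairwise in parallel, but even a serial merge
is just memory traffic within one process.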

--
Martin

Attachment: openmp.bundle
Description: Binary data

