I've tried to implement multicore assembly in DOLFIN using OpenMP, but it was a big failure. I've attached the hg bundle. The code is protected by #ifdef _OPENMP, so it should be safe to merge into DOLFIN if anyone wants to pursue this further (I won't). To compile with OpenMP, I did:

CXX=g++-4.2 CXXFLAGS='-fopenmp -O3' ./configure ...

(I didn't manage to import PyDOLFIN with this build; it was missing some symbol.)

The problem is that the matrix insertion A.add(...) must be in a critical section, so that only one thread inserts at a time. Since matrix insertion dominates the assembly time, this only introduces a lot of overhead. Although I didn't expect much for the stiffness matrix I tested with, the result was a surprisingly large slowdown when running two threads (even though both cores were active).

To fix this, one might split the matrix into one data structure per thread and do "communication" between the matrix structures, as in the MPI-based program. The difference is that the matrices would live in the memory of the same process, so the communication overhead would be much smaller.

-- Martin
Attachment:
openmp.bundle
Description: Binary data