dolfin team mailing list archive
Message #21050
[Bug 709149] [NEW] Potential deadlock in parallel vector resize
Public bug reported:
This may not be a problem in practice, but I'm reporting it because
failures of this kind tend to show up sporadically (only with a specific
number of processors and a specific mesh).
In PETScVector::resize (and probably other backends), there is a slight
risk that this test triggers on only a subset of the processors (say if
a vector with distribution 4-3-3 is resized to distribution 3-4-3, where
the local range of the third processor is unchanged):
  // Check if resizing is required
  if (x && (this->local_range().first == range.first
            && this->local_range().second == range.second))
    return;
Then, not all processors participate in the collective resizing, and a
deadlock results. Similar problems may exist elsewhere, wherever the
code path depends on local data.
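
To make the 4-3-3 to 3-4-3 example concrete, here is a small serial sketch
(no MPI, no PETSc) that evaluates the local-range test quoted above for
every process. The helper names ownership_ranges and returns_early are made
up for illustration and are not part of DOLFIN; the sketch also assumes the
vector already exists on every process, so the "x &&" part of the test is
always true.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open local ownership range [first, second) of one process.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: turn per-process local sizes (e.g. {4, 3, 3})
// into contiguous ownership ranges, one per process.
std::vector<Range> ownership_ranges(const std::vector<std::size_t>& local_sizes)
{
  std::vector<Range> ranges;
  std::size_t offset = 0;
  for (std::size_t n : local_sizes)
  {
    ranges.push_back({offset, offset + n});
    offset += n;
  }
  return ranges;
}

// Hypothetical helper: for each process, evaluate the purely local test
// "old range == new range". A true entry means that process would return
// early and never reach the collective resize.
std::vector<bool> returns_early(const std::vector<std::size_t>& old_sizes,
                                const std::vector<std::size_t>& new_sizes)
{
  // Assumption: the number of processes does not change between layouts.
  assert(old_sizes.size() == new_sizes.size());
  const std::vector<Range> old_r = ownership_ranges(old_sizes);
  const std::vector<Range> new_r = ownership_ranges(new_sizes);
  std::vector<bool> skip;
  for (std::size_t p = 0; p < old_r.size(); ++p)
    skip.push_back(old_r[p] == new_r[p]);
  return skip;
}
```

Calling returns_early({4, 3, 3}, {3, 4, 3}) marks only the last process as
returning early: its range is [7, 10) in both layouts, while the first two
processes see their ranges change and go on to the collective call.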
The easiest fix is to just skip the test (for distributed vectors at
least). But this may be a performance issue? An alternative would be to
let the vector "know" which distribution it has (i.e., give it a
"mapping id" at creation/resize time), but getting that without parallel
overhead might require changes in the interface.
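
A rough sketch of the "mapping id" idea, again serial and purely
illustrative: LocalLayout, mapping_id, unchanged_by_range and
unchanged_by_id are hypothetical names, not existing DOLFIN or PETSc API.
The sketch assumes that every process is handed the same id for a given
distribution; how to produce such ids without communication is exactly the
open interface question raised above.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical per-process view of a distributed vector's layout.
struct LocalLayout
{
  // Half-open local ownership range [first, second).
  std::pair<std::size_t, std::size_t> range;
  // Assumed to be identical on all processes for a given distribution.
  unsigned long mapping_id;
};

// The existing style of test: depends on local data only, so the answer
// can differ from one process to the next.
bool unchanged_by_range(const LocalLayout& current, const LocalLayout& requested)
{
  return current.range == requested.range;
}

// The proposed style of test: compares distribution ids, so every process
// reaches the same answer as long as ids are assigned consistently.
bool unchanged_by_id(const LocalLayout& current, const LocalLayout& requested)
{
  return current.mapping_id == requested.mapping_id;
}
```

For the third process in the 4-3-3 to 3-4-3 case, both layouts have range
[7, 10), so unchanged_by_range is true, whereas unchanged_by_id is false
once the two distributions carry different ids, and that process would take
part in the collective resize like the others.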
** Affects: dolfin
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/709149