dolfin team mailing list archive

Re: [Bug 709149] [NEW] Potential deadlock in parallel vector resize

 

On 28/01/11 10:19, Joachim Haga wrote:
> Public bug reported:
> 
> This may not be a problem in practice, but I'm reporting it because
> these things may fail very randomly (only with a specific number of
> processors and specific mesh).
> 
> In PETScVector::resize (and probably other backends), there is a slight
> risk that this test triggers on only a subset of the processors (say if
> a vector with distribution 4-3-3 is resized to distribution 3-4-3):
> 
>   // Check if resizing is required
>   if (x && (this->local_range().first == range.first && this->local_range().second == range.second))
>     return;
> 
> Then, not all processors participate in the collective resizing, and a
> deadlock results. Similar problems may exist elsewhere, wherever the
> code path depends on local data.
> 
> The easiest is to just skip the test (for distributed vectors at least).
> But this may be a performance issue? An alternative would be to let the
> vector "know" which distribution it has (i.e., give it a "mapping id" at
> creation/resize time), but to get that without parallel overhead might
> require changes in the interface.
>
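
To make the 4-3-3 to 3-4-3 case concrete, here is a small serial sketch of the test above. It uses no MPI and no DOLFIN code; the names ranges_from_sizes, local_skip and collective_skip are made up for illustration. The last function stands in for what a logical-AND reduction over all processes would return (with MPI, an MPI_Allreduce with MPI_LAND), which is one possible way to make every process take the same branch, at the cost of one extra collective call.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open ownership range [first, second) of one process.
using Range = std::pair<std::size_t, std::size_t>;

// Build contiguous ownership ranges from per-process local sizes,
// e.g. {4, 3, 3} -> [0,4), [4,7), [7,10).
std::vector<Range> ranges_from_sizes(const std::vector<std::size_t>& sizes)
{
  std::vector<Range> ranges;
  std::size_t offset = 0;
  for (std::size_t n : sizes)
  {
    ranges.emplace_back(offset, offset + n);
    offset += n;
  }
  return ranges;
}

// The purely local test from the bug report, evaluated for every rank:
// true means "my range is unchanged, so I would return early".
std::vector<bool> local_skip(const std::vector<Range>& old_ranges,
                             const std::vector<Range>& new_ranges)
{
  assert(old_ranges.size() == new_ranges.size());
  std::vector<bool> skip;
  for (std::size_t p = 0; p < old_ranges.size(); ++p)
    skip.push_back(old_ranges[p] == new_ranges[p]);
  return skip;
}

// Stand-in for a logical-AND reduction over all ranks: every rank would
// receive this same value, so all ranks skip or all ranks resize.
bool collective_skip(const std::vector<bool>& skip)
{
  for (bool s : skip)
    if (!s)
      return false;
  return true;
}
```

For sizes {4, 3, 3} and {3, 4, 3}, local_skip reports that only the last process sees an unchanged range, so it alone would return early while the other two enter the collective resize. collective_skip is false in that case, so all three would resize together.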

I think the easiest (in a number of respects) would be to not allow
resizing once the underlying object has been created. The backends
don't allow it because it is tricky.
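
A rough sketch of what "no resizing after creation" could look like. FixedLayoutVector is a hypothetical class written for this example, not an existing DOLFIN or PETSc type. The point is that the check depends only on whether the object has been initialised, which is the same on every process provided creation itself is collective, and not on the local range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical wrapper (not a DOLFIN class): the parallel layout is set
// once and any later attempt to change it is rejected.
class FixedLayoutVector
{
public:
  using Range = std::pair<std::size_t, std::size_t>;

  bool initialized() const { return _initialized; }

  // Set the local ownership range [first, second) exactly once.
  void init(Range range)
  {
    // This test does not look at the local range, so it gives the same
    // answer on every process if all processes created the vector together.
    if (_initialized)
      throw std::logic_error("vector layout is fixed after creation");
    if (range.second < range.first)
      throw std::invalid_argument("invalid local range");
    _range = range;
    _values.assign(range.second - range.first, 0.0);
    _initialized = true;
  }

  std::size_t local_size() const { return _values.size(); }

  Range local_range() const
  {
    assert(_initialized);
    return _range;
  }

private:
  bool _initialized = false;
  Range _range{0, 0};
  std::vector<double> _values;
};
```

Calling init({4, 7}) on a fresh object gives a local size of 3; a second call such as init({3, 7}) throws std::logic_error instead of silently resizing on some processes only.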


> ** Affects: dolfin
>      Importance: Undecided
>          Status: New
>

-- 
You received this bug notification because you are a member of DOLFIN
Team, which is subscribed to DOLFIN.
https://bugs.launchpad.net/bugs/709149

Title:
  Potential deadlock in parallel vector resize

Status in DOLFIN:
  Fix Committed
