
dolfin team mailing list archive

Re: [Bug 705401] Re: When PyTrilinos is imported after dolfin, bad things happen

 

On 22/01/11 19:25, Anders Logg wrote:
> On Sat, Jan 22, 2011 at 05:18:48PM -0000, Garth Wells wrote:
>>
>> On 22/01/11 17:07, Anders Logg wrote:
>>> On Sat, Jan 22, 2011 at 03:19:40PM -0000, Garth Wells wrote:
>>>>
>>>> On 22/01/11 14:10, Joachim Haga wrote:
>>>>> I looked a bit more, and it seems more complicated than that. You're
>>>>> right about atexit(), but PyTrilinos actually does the right thing wrt
>>>>> initialisation (it checks MPI_Initialized, and does nothing if already
>>>>> initialised). Hence, this does not explain the failures before exit().
>>>>
>>>> PyTrilinos doesn't do the right thing at the end of a program. It does
>>>> check at initialisation, but it calls finalise irrespective of whether
>>>> or not it did the initialisation.
>>>>
>>>> PyTrilinos calls the MPI finalise function from atexit, but this is
>>>> called before the destructors for linear algebra objects are called.
>>>> Therefore, 'MPI' objects (e.g. PETScFoo) are still in scope when
>>>> PyTrilinos incorrectly finalises MPI. When DOLFIN tries to destroy the
>>>> MPI-based objects, an error pops up because MPI has been prematurely
>>>> finalised.
>>>
>>> Would it help to check that in cases where we think we should call
>>> MPI::Finalize, we first check whether we have initialized Trilinos and
>>> in that case just issue a warning and skip MPI::Finalize?
>>>
>>
>> No - it doesn't matter what we do. The problem is that PyTrilinos
>> finalises MPI while other objects, e.g. PETScFoo, are still about. It
>> also screws up the PETSc finalise call because PyTrilinos has killed MPI
>> too early.
> 
> What triggers the finalization from Trilinos? Perhaps there's a trick
> to delay it until after linear algebra objects have gone out of scope.
> 

The trick is the import order.

Garth

> --
> Anders
> 
> 
>> Garth
>>
>>>> Garth
>>>>
>>>>> I managed to tease out this error message from trilinos with the
>>>>> "wrong" import order, but it's not helpful:
>>>>>
>>>>>   Error!  An attempt was made to access parameter "aggregation: type" of type "string"
>>>>>   in the parameter (sub)list "ML preconditioner"
>>>>>   using the incorrect type "string"!
>>>>>
>>>>> The visible error, after the above is caught and re-thrown, is
>>>>>
>>>>>   *********************************************************
>>>>>   ML failed to compute the multigrid preconditioner. The
>>>>>   most common problem is an incorrect  data type in ML's
>>>>>   parameter list (e.g. 'int' instead of 'bool').
>>>>>
>>>>>   Note: List.set("ML print initial list",X) might help
>>>>>   figure out the bad one on pid X.
>>>>>   *********************************************************
>>>>>
>>>>>   ML::ERROR:: -1,
>>>>> /home/jobh/src/fenics/trilinos-10.6.2-Source/packages/ml/src/Utils/ml_MultiLevelPreconditioner.cpp,
>>>>> line 1694
>>>>>
>>>>> It ran clean under valgrind, so any stack or heap smash is subtle. I
>>>>> don't think I'll dig any deeper, given that the workaround is so simple.
>>>>>
>>>>> (I added the following in dolfin, to get rid of the MPI abort, but it
>>>>> didn't help with the problem above of course:)
>>>>>
>>>>> diff --git a/dolfin/main/SubSystemsManager.cpp b/dolfin/main/SubSystemsManager.cpp
>>>>> index 52c8982..6a19e4f 100644
>>>>> --- a/dolfin/main/SubSystemsManager.cpp
>>>>> +++ b/dolfin/main/SubSystemsManager.cpp
>>>>> @@ -126,7 +126,10 @@ void SubSystemsManager::finalize_mpi()
>>>>>    //Finalise MPI if required
>>>>>    if (MPI::Is_initialized() and sub_systems_manager.control_mpi)
>>>>>    {
>>>>> -    MPI::Finalize();
>>>>> +    if (MPI::Is_finalized())
>>>>> +        warning("MPI::Finalize has been called by someone else (how rude)");
>>>>> +    else
>>>>> +        MPI::Finalize();
>>>>>      sub_systems_manager.control_mpi = false;
>>>>>    }
>>>>>
>>>>
>>>
>>
>
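
To make the ordering problem discussed above concrete: a handler registered with Python's atexit module runs before module-level objects are torn down at interpreter shutdown, so a finalise call made from atexit can pull MPI away while PETScFoo-style objects still exist. Below is a minimal sketch of just that ordering; the class names are made up for illustration, no real MPI is involved, and output printed from destructors during interpreter shutdown can vary slightly between Python versions.

import atexit

class FakeMPI:
    """Stand-in for the MPI runtime; nothing here touches real MPI."""
    def __init__(self):
        self.finalized = False

class FakeMPIObject:
    """Stand-in for a PETScFoo-style object whose destructor needs MPI alive."""
    def __init__(self, mpi):
        self.mpi = mpi
    def __del__(self):
        # At interpreter shutdown this runs *after* the atexit handlers.
        if self.mpi.finalized:
            print("destructor: MPI already finalised -- the failure DOLFIN sees")
        else:
            print("destructor: MPI still alive, clean shutdown")

mpi = FakeMPI()
obj = FakeMPIObject(mpi)

# What PyTrilinos effectively does: finalise MPI from atexit even though
# it did not initialise it, and before 'obj' has been destroyed.
atexit.register(lambda: setattr(mpi, "finalized", True))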

-- 
You received this bug notification because you are a member of DOLFIN
Team, which is subscribed to DOLFIN.
https://bugs.launchpad.net/bugs/705401

Title:
  When PyTrilinos is imported after dolfin, bad things happen

Status in DOLFIN:
  New

Bug description:
  When using PyTrilinos (ML in particular), the order of imports is
  important. If dolfin is imported first, it crashes at exit, and there
  are also problems with constructing preconditioners, etc.

  It looks like it has to do with MPI initialisation, but I haven't
  looked at it closely.

  A simple workaround may be to try importing ML in dolfin/__init__.py
  (just import, not expose) so that it gets initialised. I don't know if
  the performance hit is worth it. It would of course be better to find
  a proper fix.

  Otherwise, it's nice to have it documented here. For Google:
  >>> import dolfin
  >>> from PyTrilinos import ML
  >>> exit()
  *** An error occurred in MPI_Finalize
  *** after MPI was finalized
  *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
  [rodin:17864] Abort after MPI_FINALIZE completed successfully; not able to guarantee that all other processes were killed!
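
  As the thread above suggests, the workaround is simply to reverse the
  import order, i.e. import the Trilinos modules before dolfin. A minimal
  sketch of the session that is expected to exit cleanly (assuming the
  import order is the only thing that matters here, as discussed above):

  >>> from PyTrilinos import ML
  >>> import dolfin
  >>> exit()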




