dolfin team mailing list archive
Message #22384
Re: Unit test problem in parallel
Johan Hake <johan.hake@xxxxxxxxx> writes:
> On Tuesday March 29 2011 23:27:27 Anders Logg wrote:
>> On Tue, Mar 29, 2011 at 11:10:17PM -0700, Johan Hake wrote:
>> > What triggers the error? Is it writing to and/or reading from the file,
>> > or is it the assignment of data from within the read function in the test?
>> >
>> > johan
>>
>> It's the next line following the read:
>>
>> std::string filename(p1["filename"]);
>
> It took some time to find the parameter unit test ;)
>
>> So something goes wrong for at least one of the processes when the
>> parameters are read back from file. Here's what happens:
>>
>> 1. All processes create parameter set p0
>>
>> 2. Process 0 writes p0 to file
>>
>> 3. Everyone waits (barrier)
>>
>> 4. All processes read from the file into p1
>>
>> 5. All processes access parameters from p1 and compare to p0
>
> I guess it is step 4 that goes wrong. I have tried googling variations of
> "open shared file fstream". It looks like others have had the same problem.
>
> Johan
>
I don't think a barrier is enough. It only synchronizes the processes;
it doesn't guarantee that the data has been flushed to disk.
One option is of course to use MPI I/O, but that would mean a
painful rewrite of most I/O routines...
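
To illustrate the point about flushing, here is a minimal sketch of one way the writing process could publish the file before the barrier, without going all the way to MPI I/O. The helper names write_file_atomically and read_file are hypothetical and not part of DOLFIN. The sketch assumes a POSIX system, where rename() replaces the target atomically, and it only pushes the data to the operating system: it does not force it onto the physical disk, and on a network file system other nodes may still see stale data.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of DOLFIN): write `contents` to `path` so
// that a reader sees either the complete file or the previous state,
// never a half-written file.
bool write_file_atomically(const std::string& path, const std::string& contents)
{
  assert(!path.empty());
  const std::string tmp = path + ".tmp";
  {
    std::ofstream out(tmp.c_str(), std::ios::binary | std::ios::trunc);
    if (!out)
      return false;
    out << contents;
    out.flush();     // push the stream buffer to the operating system
    if (!out)
      return false;  // the write or the flush failed
  }                  // the destructor closes the temporary file here

  // Assumption: POSIX semantics, where rename() replaces the target atomically.
  return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Hypothetical helper: read the whole file into `contents`.
// Returns false if the file cannot be opened.
bool read_file(const std::string& path, std::string& contents)
{
  std::ifstream in(path.c_str(), std::ios::binary);
  if (!in)
    return false;
  std::ostringstream buffer;
  buffer << in.rdbuf();
  contents = buffer.str();
  return true;
}
```

In the unit test, process 0 would call something like write_file_atomically() before the barrier, and the other processes would read only after the barrier. Whether this is sufficient on the buildbot's file system is an open question.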
Niclas
>>
>> --
>> Anders
>>
>> > On Tuesday March 29 2011 22:53:01 Anders Logg wrote:
>> > > The parameter unit test is sometimes failing in parallel. On my local
>> > > machine it always seems to work with 2 or 3 processes, but sometimes
>> > > it fails with 4, giving the same error message as the buildbot:
>> > >
>> > > ##Failure Location unknown## : Error
>> > > Test name: InputOutput::test_simple
>> > > uncaught exception of type St13runtime_error
>> > > - *** Error: Unable to access parameter "filename" in parameter set
>> > > "test", parameter not defined.
>> > >
>> > > Failures !!!
>> > > Run: 2 Failure total: 1 Failures: 0 Errors: 1
>> > >
>> > > There is a check for which process writes to file and a barrier that
>> > > should make sure everyone waits until the file gets written.
>> > >
>> > > // Save to file
>> > > if (dolfin::MPI::process_number() == 0)
>> > > {
>> > >   File f0("test_parameters.xml");
>> > >   f0 << p0;
>> > > }
>> > > dolfin::MPI::barrier();
>> > >
>> > > // Read from file
>> > > Parameters p1;
>> > > File f1("test_parameters.xml");
>> > > f1 >> p1;
>> > >
>> > > I thought that should do the trick, but apparently not.
>> > >
>> > > Any ideas what goes wrong?
>