dolfin team mailing list archive

Re: [HG DOLFIN] merge

 

On Mon, Aug 17, 2009 at 11:20:08PM +0200, Johan Hake wrote:
> On Monday 17 August 2009 19:19:40 Anders Logg wrote:
> > On Mon, Aug 17, 2009 at 07:09:11PM +0200, DOLFIN wrote:
> > > changeset:   6762:ca407204632a1b0430099c243c915a151b2bd941
> > > parent:      6759:efc24a341e41e9e0c83616be4613d819fe95ccb6
> > > user:        Anders Logg <logg@xxxxxxxxx>
> > > date:        Mon Aug 17 19:08:56 2009 +0200
> > > files:       site-packages/dolfin/compile_function.py site-packages/dolfin/jit.py
> > > description:
> > > Make JIT compiler work in parallel. The process number is added to the
> > > signature to create a unique signature for each process. This means that
> > > each process will compile its own form. This may not be optimal and could
> > > possibly be handled by Instant. On the other hand, it seems to work
> > > nicely and might also be advantageous when processes don't share a common
> > > cache.
> >
> > The Poisson Python demo now runs as is without the need for first
> > running it in serial (to handle JIT compilation):
>
> Did it not work before this change? I know Martin added some file locks to
> prevent simultaneous compilations of the same module.

No, it didn't work before. I got errors like this:

In instant.build_module: Path
'/home/logg/.instant/cache/form_f38430af401fbeddb9be4091a6fcde37cef9fa35'
already exists, but module wasn't found in cache previously. Not
overwriting, assuming this module is valid.
Traceback (most recent call last):
  File "demo.py", line 23, in <module>
    V = FunctionSpace(mesh, "CG", 1)
  File "/home/logg/scratch/src/fenics-dev/dolfin-dev/local/lib/python2.6/site-packages/dolfin/functionspace.py", line 181, in __init__
    FunctionSpaceBase.__init__(self, mesh, element)
  File "/home/logg/scratch/src/fenics-dev/dolfin-dev/local/lib/python2.6/site-packages/dolfin/functionspace.py", line 43, in __init__
    ufc_element, ufc_dofmap = jit(self._element)
  File "/home/logg/scratch/src/fenics-dev/dolfin-dev/local/lib/python2.6/site-packages/dolfin/jit.py", line 67, in jit
    return jit_compile(form, options)
  File "/home/logg/scratch/lib/fenics-dev/lib/python2.6/site-packages/ffc/jit/jit.py", line 56, in jit
    return jit_element(object, options)
  File "/home/logg/scratch/lib/fenics-dev/lib/python2.6/site-packages/ffc/jit/jit.py", line 125, in jit_element
    (compiled_form, module, form_data) = jit_form(form, options)
  File "/home/logg/scratch/lib/fenics-dev/lib/python2.6/site-packages/ffc/jit/jit.py", line 102, in jit_form
    os.unlink(signature + ".h")
OSError: [Errno 2] No such file or directory: 'form_f38430af401fbeddb9be4091a6fcde37cef9fa35.h'


I guess the second process tries to read the generated file but
it's not ready yet (still being generated by the first process).
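The changeset sidesteps this race by putting the process number into the signature. Since the generated files are named after the signature (the traceback unlinks `signature + ".h"`), no two processes then generate, compile, or delete the same file. Below is a minimal sketch of that idea, assuming the process number is available as an integer rank (for example from MPI). The helper names and the `_p<rank>` suffix are hypothetical, not DOLFIN's actual code.

```python
import os


def per_process_signature(signature: str, rank: int) -> str:
    """Return a form signature that is unique to one process.

    Hypothetical helper: the "_p<rank>" suffix format is an assumption,
    not the format DOLFIN actually uses.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return "%s_p%d" % (signature, rank)


def generated_header(signature: str, rank: int, workdir: str = ".") -> str:
    """Path of the header generated for this process.

    Because the file name includes the rank, one process can never unlink
    a file that another process is still generating.
    """
    return os.path.join(workdir, per_process_signature(signature, rank) + ".h")
```

With `mpirun -n 4`, ranks 0 to 3 produce four distinct keys for the same form. That is why each process compiles its own copy, which is the cost the changeset description mentions.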

It would be good to handle the parallel JIT compilation as part of
Instant, but I don't know what the best solution is.
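One alternative is to handle the race inside the cache, along the lines of the file locks Johan mentions. Under this approach one process compiles and the others wait and reuse the result. Below is a minimal sketch, assuming a POSIX system with `fcntl.flock` and a cache directory shared by all processes. `compile_once` and the `generate` callback are hypothetical names. This is not Instant's API, and it is not the locking Martin added.

```python
import fcntl
import os


def compile_once(cache_dir: str, signature: str, generate) -> str:
    """Generate the file for `signature` at most once across processes.

    Hypothetical sketch: an exclusive lock on a per-signature lock file
    serialises the check-then-build step, so a second process blocks
    instead of touching a half-written file. `generate(path)` is any
    callable that writes the generated source to `path`.
    """
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, signature + ".h")
    lock_path = os.path.join(cache_dir, signature + ".lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            if not os.path.exists(target):
                # Write under a temporary name, then rename atomically so
                # readers never see a partially written file.
                tmp = "%s.tmp.%d" % (target, os.getpid())
                generate(tmp)
                os.replace(tmp, target)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return target
```

Compared with per-process signatures, this approach compiles each form only once. In exchange, every process must see the same cache directory and the file system must support the lock.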

> >   mpirun -n 4 python demo.py
>
> Do I have to set some environment variables to make this work? I can't get
> it to work (probably some stupid error) :P

No, nothing. It should work out of the box.

> Johan
>
> When running the above command I get:
>
> ssh: connect to host hake-laptop port 22: Connection refused

Can you run other processes in parallel?

  mpirun -n 4 ls

Maybe you need to install sshd? I didn't know it was required.
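One way to test the sshd hypothesis is to check whether anything accepts connections on the port named in the error (port 22 on hake-laptop). Below is a minimal sketch using only Python's standard `socket` module. The function name is hypothetical, and a `False` result only shows that nothing is listening. It does not prove that this is why mpirun fails.

```python
import socket


def port_accepts_connections(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if something is listening on host:port.

    A refused connection (as in "ssh: connect to host ... port 22:
    Connection refused"), an unknown host, or a timeout all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, socket.timeout, gaierror, ...
        return False
```

For example, `port_accepts_connections("hake-laptop")` returning `False` would be consistent with the error above.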

--
Anders

> --------------------------------------------------------------------------
> A daemon (pid 32065) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
