dolfin team mailing list archive
Message #18869
Re: [Bug 612579] Re: Huge performance problem in Python interface
On Thursday August 5 2010 10:08:15 Anders Logg wrote:
> On Thu, Aug 05, 2010 at 02:27:51AM -0000, Johan Hake wrote:
> > On Monday August 2 2010 15:00:47 Johan Hake wrote:
> > > On Monday August 2 2010 12:05:38 Garth Wells wrote:
> > > > On Mon, 2010-08-02 at 18:28 +0000, Anders Logg wrote:
> > > > > On Mon, Aug 02, 2010 at 04:15:54PM -0000, Johan Hake wrote:
> > > > > > It looks like there is something fishy with the caching. I can
> > > > > > have a look at it.
> > > > > >
> > > > > > Johan
> > > > >
> > > > > There have been some regressions in the speed of caching, probably
> > > > > as a result of the FFC rewrite earlier this year. See fem-jit-python
> > > > > here: http://www.fenics.org/bench/
> > > > >
> > > > > I haven't bothered to examine it in detail since I thought it was
> > > > > "good enough", but apparently not.
> > >
> > > Yes, the problem is probably not in DOLFIN, but who knows. Looking at
> > > the code that is provided, I saw that commenting out the creation of
> > > the DOLFIN Form within the assemble routine, and instead wrapping the
> > > Form before calling assemble, made the difference that is reported. So
> > > I thought I would have a look at that first.
> > >
> > > > It probably is "good enough" in practice. There may be some issues
> > > > following the fix of some memory leaks in Instant earlier this year.
> > >
> > > Yes I hope I do not have to go that far. But we'll see.
> >
> > Looks like this works fine. The module is read from memory.
> >
> > With some profiling it looks like most time is spent in
> >
> > ufl.preprocess (with a lot of different ufl algorithms called)
> > instant.check_swig_version (with some file io)
> >
> > which are all called during ffc.jit.
> >
> > I guess these are functions we need to run for each jit call? We might be
> > able to cache the swig check. However we need to do the preprocessing as
> > it is here that we figure out the signature of the form (if I am not
> > mistaken...).
>
> Maybe we shouldn't need to call preprocess (and I think we didn't do
> this at some point in the past). Instead we can have 3 levels of caching:
>
> 1. Check the id() of the incoming form and directly return the
> compiled module from memory cache (should be super fast)
>
> 2. Preprocess and check the signature of the incoming form and return
> the compiled module from memory cache (can take some time)
>
> 3. If not in the memory cache, check disk cache (will take more time)
>
> 4. Otherwise build the module
>
> Maybe we have lost step (1) along the way.
Yes I think so. Memory cache is used but _after_ the preprocessing.
The problem with this is that jit returns the form_data, which can only be
accessed through a preprocessed form. I am not sure why you need to return the
form_data?
We could also cache the preprocessed form. Then we should be safe :) But the
final result might be a bit convoluted?
Also, as Garth pointed out previously, this memory caching can result in
memory leaks. To handle that I suggest we check the reference count of the
original forms, and if a form does not exist anywhere other than in the cache,
we can just remove it.
Johan
> --
> Anders
>
> > If someone knows of a good line-by-line profiler for Python, please shout
> > out. cProfile just gives me the total time spent in different functions.
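For reference, cProfile's per-function totals can at least be narrowed down by sorting and truncating the report; a minimal, self-contained example (fake_assemble is a made-up stand-in for the real assembly call):

```python
import cProfile
import io
import pstats

def fake_assemble(n):
    # Stand-in for the repeated form assembly in the bug report.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    fake_assemble(1000)
profiler.disable()

# Sort by cumulative time and print only the top five entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

For actual per-line timings, Robert Kern's line_profiler package is one option.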
> >
> > Johan
> >
> > > Johan
> > >
> > > > Garth
> > > >
> > > > > It would be very welcome if you had a look at it. If you need my
> > > > > help I can jump in.
--
Huge performance problem in Python interface
https://bugs.launchpad.net/bugs/612579
You received this bug notification because you are a member of DOLFIN
Team, which is subscribed to DOLFIN.
Status in DOLFIN: Confirmed
Bug description:
In my Python code I need to evaluate a nonlinear functional many times. This was rather slow, and after a profile run I noticed that 90% of the time was spent in the __init__ routine of form.py, compiling the form. As far as I can survey the code, this should be necessary only once.
I have attached a simple example that illustrates the effect. In my test, the second code is roughly 40 times faster.
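The reported speedup is consistent with hoisting the compile step out of the evaluation loop, which is what wrapping the Form before calling assemble achieves. A generic sketch of the pattern, with toy stand-ins rather than the DOLFIN API:

```python
import time

def compile_form(signature):
    # Stand-in for the expensive JIT step in form.py's __init__.
    time.sleep(0.001)  # simulate compilation cost
    return ("compiled", signature)

def evaluate(compiled_form):
    # Stand-in for the cheap functional evaluation.
    return 1.0

def evaluate_naive(signature, n):
    # Slow: recompiles (re-wraps) the form on every evaluation.
    return sum(evaluate(compile_form(signature)) for _ in range(n))

def evaluate_hoisted(signature, n):
    # Fast: compile once, reuse the compiled form n times.
    form = compile_form(signature)
    return sum(evaluate(form) for _ in range(n))
```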