← Back to team overview

ffc team mailing list archive

Re: [Bug 769811] [NEW] JIT cache problem with id(form)

 

On Mon, Apr 25, 2011 at 11:10:59PM +0100, Garth N. Wells wrote:
>
>
> On 25/04/11 23:04, Anders Logg wrote:
> > On Mon, Apr 25, 2011 at 10:56:25PM +0100, Garth N. Wells wrote:
> >>
> >>
> >> On 25/04/11 22:48, Anders Logg wrote:
> >>> On Mon, Apr 25, 2011 at 10:41:58PM +0100, Garth N. Wells wrote:
> >>>>
> >>>>
> >>>> On 25/04/11 22:33, Anders Logg wrote:
> >>>>> On Mon, Apr 25, 2011 at 10:26:18PM +0100, Garth N. Wells wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 25/04/11 22:08, Anders Logg wrote:
> >>>>>>> On Mon, Apr 25, 2011 at 07:40:21PM -0000, Garth Wells wrote:
> >>>>>>>> On 25/04/11 20:00, Johan Hake wrote:
> >>>>>>>>> On Monday April 25 2011 11:26:36 Garth Wells wrote:
> >>>>>>>>>> On 25/04/11 18:51, Anders Logg wrote:
> >>>>>>>>>>> On Mon, Apr 25, 2011 at 05:11:41PM -0000, Garth Wells wrote:
> >>>>>>>>>>>> On 25/04/11 17:53, Johan Hake wrote:
> >>>>>>>>>>>>> On Monday April 25 2011 08:59:18 Garth Wells wrote:
> >>>>>>>>>>>>>> On 25/04/11 16:47, Johan Hake wrote:
> >>>>>>>>>>>>>>> Commenting out the cache is really not a fix. The problem is within
> >>>>>>>>>>>>>>> dolfin. Isn't there another way to deal with this?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It is a fix if the cache isn't needed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sure.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> First: How much penalty are there with a disabled memory cache. Maybe
> >>>>>>>>>>>>>>> the problem isn't that bad?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don't get the point of this cache. The way it is now, a form is only
> >>>>>>>>>>>>>> preprocessed if it hasn't already been preprocessed, which seems ok to
> >>>>>>>>>>>>>> me. The old code tried to avoid some preprocessing, but it was highly
> >>>>>>>>>>>>>> dubious and I doubt that it was effective.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think the preprocessing stage actually do take some time. AFAIK the
> >>>>>>>>>>>>> preproces stage essentially do two things. It creates a canonical
> >>>>>>>>>>>>> version of the Form so two Forms that are the same, but constructed at
> >>>>>>>>>>>>> different times are beeing treated equal wrt form generation. Then are
> >>>>>>>>>>>>> DOLFIN specific guys extracted. I am not sure what takes the most
> >>>>>>>>>>>>> time. We should probably profiel it... But if it is the latter we
> >>>>>>>>>>>>> could consider putting another cache in place which is more robust wrt
> >>>>>>>>>>>>> changing DOLFIN objects.
> >>>>>>>>>>>>
> >>>>>>>>>>>> It should be easy to avoid the overhead of preprocessing by keeping the
> >>>>>>>>>>>> object in scope. If the object changes, the only robust way to make sure
> >>>>>>>>>>>> that the form is the same as one in the cache is to compare all the
> >>>>>>>>>>>> data. This requires preprocessing the form, which then defeats the
> >>>>>>>>>>>> purpose of a cache. It may be possible to add a lightweight preprocess
> >>>>>>>>>>>> to UFL, but I don't think that it's worth the effort or extra
> >>>>>>>>>>>> complication.
> >>>>>>>>>
> >>>>>>>>> I think a light weight version might be the way to go. This is then stored in
> >>>>>>>>> memory cache. If we are able to strip such a form for all DOLFIN specific
> >>>>>>>>> things we would also prevent huge memory leaks with mesh beeing kept.
> >>>>>>>>>
> >>>>>>>>> Then we always grab DOLFIN specific data from the passed form instead of
> >>>>>>>>> grabbing from the cache. Not sure how easy this will be to implement, but I
> >>>>>>>>> think we need to explore it, as the DOLFIN specific part of the form really
> >>>>>>>>> has nothing to do with the generated Form.
> >>>>>>>>>
> >>>>>>>>> Martin:
> >>>>>>>>> Why is it important to have the _count in the repr of the form? I guess that
> >>>>>>>>> is used in ufl algorithms? Would it be possible to include a second repr
> >>>>>>>>> function, which did not include the count? This would then be used when the
> >>>>>>>>> signature is checked for. We could then use that repr to generate a form which
> >>>>>>>>> is stored in the memory cache. This would then be tripped for any DOLFIN
> >>>>>>>>> specific objects. This should work as the _count attribute has nothing to do
> >>>>>>>>> with what code gets generated, but it is essential for internal UFL
> >>>>>>>>> algorithms, right?
> >>>>>>>>>
> >>>>>>>>>>> I'm not very happy with this change.
> >>>>>>>>>>
> >>>>>>>>>> The bright side is that slow and correct is a better starting point than
> >>>>>>>>>> fast but wrong ;).
> >>>>>>>>>>
> >>>>>>>>>> An easy fix is to attach the preprocessed form to a Form object. This
> >>>>>>>>>> would work robustly if we can make forms immutable once they've been
> >>>>>>>>>> compiled. Is it possible to make a Python object immutable?
> >>>>>>>>>
> >>>>>>>>> We can probably overload all setattribtue methods which prohibits a user to
> >>>>>>>>> write to these but it might not be possible to prohibit a user to change
> >>>>>>>>> attributes on instances owned by the Form. I guess this is similare to the
> >>>>>>>>> difficulties of preserving constness in C++, but I think it is even harder in
> >>>>>>>>> Python.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> What if we have the FFC jit compiler return the preprocessed form, and
> >>>>>>>> inside dolfin.Form simply do
> >>>>>>>>
> >>>>>>>>     class Form(cpp.Form):
> >>>>>>>>         def __init__(self, form, . . .. )
> >>>>>>>>         ....
> >>>>>>>>
> >>>>>>>>         (...., preprocessed_form) = jit(form, . . . . )
> >>>>>>>>
> >>>>>>>>         form = preprocessed_form
> >>>>>>>>
> >>>>>>>>         .....
> >>>>>>>>
> >>>>>>>> This way, form will have form_data, and the FFC jit function will know
> >>>>>>>> not to call ufl.preprocess.
> >>>>>>>
> >>>>>>> Here's another strange thing. In the JITObject class, we have two
> >>>>>>> functions: __hash__ and signature. As far as I understand, the first
> >>>>>>> is used to located objects (generated code/modules) in the Instant
> >>>>>>> in-memory cache, while the second is used for the on-disk cache.
> >>>>>>>
> >>>>>>> >From some simple tests I did now, it looks like the __hash__ function
> >>>>>>> does not need to any significant speedup. The JIT benchmark runs just
> >>>>>>> as fast if I call signature from within __hash__.
> >>>>>>>
> >>>>>>> Furthermore, the __hash__ function must also be broken since it relies
> >>>>>>> on calling id on the form.
> >>>>>>>
> >>>>>>> Ideally, we should get Instant to handle the caching, both in-memory
> >>>>>>> and on-disk, by providing two functions __hash__ (fast, for in-memory
> >>>>>>> cache) and signature (slow, for on-disk cache).
> >>>>>>>
> >>>>>>> Since __hash__ cannot call id, it must be able to attach a unique
> >>>>>>> string to the form (perhaps based on an internal counter in FFC).
> >>>>>>> My suggestion would be to add this to UFL, something like set_hash
> >>>>>>> and hash (which would return None if set_hash has not been called).
> >>>>>>> If Martin does not like that, we should be able to handle it on the
> >>>>>>> DOLFIN side.
> >>>>>>>
> >>>>>>> So in conclusion: no in-memory cache in FFC (handled by Instant) and
> >>>>>>> FFC attaches a hash to incoming forms so that Instant may recognize
> >>>>>>> them later.
> >>>>>>>
> >>>>>>
> >>>>>> The code that I disabled was caching preprocessed forms, so I don't see
> >>>>>> how this can be handled by Instant.
> >>>>>
> >>>>> The point would be that one could check that "hash" of the form (some
> >>>>> unique string) instead of computing the signature which involves
> >>>>> preprocessing the form.
> >>>>>
> >>>>
> >>>> How would the hash be computed? To check if the mesh has changed, my
> >>>> limited understanding is that the entire object would have to be
> >>>> serialised, and then a hash computed. How expensive is that?
> >>>>
> >>>> The issue that I ran into was not related to signatures. It was related
> >>>> to the non-UFL data that is attached to arguments.
> >>>
> >>> The hash would be unique to each form. It could just be a counter
> >>> value and the counter would be increased inside Instant for each
> >>> object it gets as input.
> >>
> >> But how does Instant know if a form is new? I also don't see why Instant
> >> should need to know if the mesh associated with a form has changed, but
> >> is for the rest the same. Wouldn't Instant need to be DOLFIN-aware?
> >
> > The hash() function would play the same role as the id() function
> > before with the difference that we can't get the same id for a new
> > form as for an old form that's gone out of scope.
> >
> > Instant should not need to know anything it just does this:
> >
> >    check if object has a set_hash() function
> >    if so, calls hash() to get the hash value
> >      checks the cache for that hash value
> >    if not, assign unique value by calling set_hash on the object
> >
> > We would need to make sure from the DOLFIN side that when we change a
> > Form, we also change the hash value (for example by setting it to
> > None) which would trigger the Instant disk cache.
> >
>
> This is the problem - how do we know that it's changed?

Because we change it. When we modify a Form, we need to invalidate the
hash, by making sure that we modify the Form by calling member
functions that invalidate the hash (setting it to None).

> The original issue is not related to disk vs memory cache, or Instant.
> It is how to avoid calling ufl.preprocess unnecessarily when the form
> repr() is unchanged but the mesh has been changed.

Exactly, and that's why I suggest introducing the hash as something to
be checked instead of id() which turned out not to be robust.

And furthermore, the call to id/hash should be handled by Instant, not
FFC. Since Instant handles JIT compiling and caching, FFC should only
need to call Instant, not handle extra caching.

--
Anders



Follow ups

References