← Back to team overview

ffc team mailing list archive

Re: [Ufl] [Bug 769811] [NEW] JIT cache problem with id(form)

 

On Monday April 25 2011 15:34:55 Anders Logg wrote:
> On Mon, Apr 25, 2011 at 03:28:30PM -0700, Johan Hake wrote:
> > On Monday April 25 2011 15:19:20 Anders Logg wrote:
> > > On Mon, Apr 25, 2011 at 03:14:45PM -0700, Johan Hake wrote:
> > > > On Monday April 25 2011 15:04:43 Anders Logg wrote:
> > > > > On Mon, Apr 25, 2011 at 10:56:25PM +0100, Garth N. Wells wrote:
> > > > > > On 25/04/11 22:48, Anders Logg wrote:
> > > > > > > On Mon, Apr 25, 2011 at 10:41:58PM +0100, Garth N. Wells wrote:
> > > > > > >> On 25/04/11 22:33, Anders Logg wrote:
> > > > > > >>> On Mon, Apr 25, 2011 at 10:26:18PM +0100, Garth N. Wells 
wrote:
> > > > > > >>>> On 25/04/11 22:08, Anders Logg wrote:
> > > > > > >>>>> On Mon, Apr 25, 2011 at 07:40:21PM -0000, Garth Wells wrote:
> > > > > > >>>>>> On 25/04/11 20:00, Johan Hake wrote:
> > > > > > >>>>>>> On Monday April 25 2011 11:26:36 Garth Wells wrote:
> > > > > > >>>>>>>> On 25/04/11 18:51, Anders Logg wrote:
> > > > > > >>>>>>>>> On Mon, Apr 25, 2011 at 05:11:41PM -0000, Garth Wells 
wrote:
> > > > > > >>>>>>>>>> On 25/04/11 17:53, Johan Hake wrote:
> > > > > > >>>>>>>>>>> On Monday April 25 2011 08:59:18 Garth Wells wrote:
> > > > > > >>>>>>>>>>>> On 25/04/11 16:47, Johan Hake wrote:
> > > > > > >>>>>>>>>>>>> Commenting out the cache is really not a fix. The
> > > > > > >>>>>>>>>>>>> problem is within dolfin. Isn't there another way
> > > > > > >>>>>>>>>>>>> to deal with this?
> > > > > > >>>>>>>>>>>> 
> > > > > > >>>>>>>>>>>> It is a fix if the cache isn't needed.
> > > > > > >>>>>>>>>>> 
> > > > > > >>>>>>>>>>> Sure.
> > > > > > >>>>>>>>>>> 
> > > > > > >>>>>>>>>>>>> First: How much penalty are there with a disabled
> > > > > > >>>>>>>>>>>>> memory cache. Maybe the problem isn't that bad?
> > > > > > >>>>>>>>>>>> 
> > > > > > >>>>>>>>>>>> I don't get the point of this cache. The way it is
> > > > > > >>>>>>>>>>>> now, a form is only preprocessed if it hasn't
> > > > > > >>>>>>>>>>>> already been preprocessed, which seems ok to me.
> > > > > > >>>>>>>>>>>> The old code tried to avoid some preprocessing, but
> > > > > > >>>>>>>>>>>> it was highly dubious and I doubt that it was
> > > > > > >>>>>>>>>>>> effective.
> > > > > > >>>>>>>>>>> 
> > > > > > >>>>>>>>>>> I think the preprocessing stage actually do take some
> > > > > > >>>>>>>>>>> time. AFAIK the preproces stage essentially do two
> > > > > > >>>>>>>>>>> things. It creates a canonical version of the Form so
> > > > > > >>>>>>>>>>> two Forms that are the same, but constructed at
> > > > > > >>>>>>>>>>> different times are beeing treated equal wrt form
> > > > > > >>>>>>>>>>> generation. Then are DOLFIN specific guys extracted.
> > > > > > >>>>>>>>>>> I am not sure what takes the most time. We should
> > > > > > >>>>>>>>>>> probably profiel it... But if it is the latter we
> > > > > > >>>>>>>>>>> could consider putting another cache in place which
> > > > > > >>>>>>>>>>> is more robust wrt changing DOLFIN objects.
> > > > > > >>>>>>>>>> 
> > > > > > >>>>>>>>>> It should be easy to avoid the overhead of
> > > > > > >>>>>>>>>> preprocessing by keeping the object in scope. If the
> > > > > > >>>>>>>>>> object changes, the only robust way to make sure that
> > > > > > >>>>>>>>>> the form is the same as one in the cache is to
> > > > > > >>>>>>>>>> compare all the data. This requires preprocessing the
> > > > > > >>>>>>>>>> form, which then defeats the purpose of a cache. It
> > > > > > >>>>>>>>>> may be possible to add a lightweight preprocess to
> > > > > > >>>>>>>>>> UFL, but I don't think that it's worth the effort or
> > > > > > >>>>>>>>>> extra complication.
> > > > > > >>>>>>> 
> > > > > > >>>>>>> I think a light weight version might be the way to go.
> > > > > > >>>>>>> This is then stored in memory cache. If we are able to
> > > > > > >>>>>>> strip such a form for all DOLFIN specific things we
> > > > > > >>>>>>> would also prevent huge memory leaks with mesh beeing
> > > > > > >>>>>>> kept.
> > > > > > >>>>>>> 
> > > > > > >>>>>>> Then we always grab DOLFIN specific data from the passed
> > > > > > >>>>>>> form instead of grabbing from the cache. Not sure how
> > > > > > >>>>>>> easy this will be to implement, but I think we need to
> > > > > > >>>>>>> explore it, as the DOLFIN specific part of the form
> > > > > > >>>>>>> really has nothing to do with the generated Form.
> > > > > > >>>>>>> 
> > > > > > >>>>>>> Martin:
> > > > > > >>>>>>> Why is it important to have the _count in the repr of the
> > > > > > >>>>>>> form? I guess that is used in ufl algorithms? Would it be
> > > > > > >>>>>>> possible to include a second repr function, which did not
> > > > > > >>>>>>> include the count? This would then be used when the
> > > > > > >>>>>>> signature is checked for. We could then use that repr to
> > > > > > >>>>>>> generate a form which is stored in the memory cache. This
> > > > > > >>>>>>> would then be tripped for any DOLFIN specific objects.
> > > > > > >>>>>>> This should work as the _count attribute has nothing to
> > > > > > >>>>>>> do with what code gets generated, but it is essential
> > > > > > >>>>>>> for internal UFL algorithms, right?
> > > > > > >>>>>>> 
> > > > > > >>>>>>>>> I'm not very happy with this change.
> > > > > > >>>>>>>> 
> > > > > > >>>>>>>> The bright side is that slow and correct is a better
> > > > > > >>>>>>>> starting point than fast but wrong ;).
> > > > > > >>>>>>>> 
> > > > > > >>>>>>>> An easy fix is to attach the preprocessed form to a Form
> > > > > > >>>>>>>> object. This would work robustly if we can make forms
> > > > > > >>>>>>>> immutable once they've been compiled. Is it possible to
> > > > > > >>>>>>>> make a Python object immutable?
> > > > > > >>>>>>> 
> > > > > > >>>>>>> We can probably overload all setattribtue methods which
> > > > > > >>>>>>> prohibits a user to write to these but it might not be
> > > > > > >>>>>>> possible to prohibit a user to change attributes on
> > > > > > >>>>>>> instances owned by the Form. I guess this is similare to
> > > > > > >>>>>>> the difficulties of preserving constness in C++, but I
> > > > > > >>>>>>> think it is even harder in Python.
> > > > > > >>>>>> 
> > > > > > >>>>>> What if we have the FFC jit compiler return the
> > > > > > >>>>>> preprocessed form, and inside dolfin.Form simply do
> > > > > > >>>>>> 
> > > > > > >>>>>>     class Form(cpp.Form):
> > > > > > >>>>>>         def __init__(self, form, . . .. )
> > > > > > >>>>>>         ....
> > > > > > >>>>>>         
> > > > > > >>>>>>         (...., preprocessed_form) = jit(form, . . . . )
> > > > > > >>>>>>         
> > > > > > >>>>>>         form = preprocessed_form
> > > > > > >>>>>>         
> > > > > > >>>>>>         .....
> > > > > > >>>>>> 
> > > > > > >>>>>> This way, form will have form_data, and the FFC jit
> > > > > > >>>>>> function will know not to call ufl.preprocess.
> > > > > > >>>>> 
> > > > > > >>>>> Here's another strange thing. In the JITObject class, we
> > > > > > >>>>> have two functions: __hash__ and signature. As far as I
> > > > > > >>>>> understand, the first is used to located objects
> > > > > > >>>>> (generated code/modules) in the Instant in-memory cache,
> > > > > > >>>>> while the second is used for the on-disk cache.
> > > > > > >>>>> 
> > > > > > >>>>> >From some simple tests I did now, it looks like the
> > > > > > >>>>> >__hash__ function
> > > > > > >>>>> 
> > > > > > >>>>> does not need to any significant speedup. The JIT benchmark
> > > > > > >>>>> runs just as fast if I call signature from within __hash__.
> > > > > > >>>>> 
> > > > > > >>>>> Furthermore, the __hash__ function must also be broken
> > > > > > >>>>> since it relies on calling id on the form.
> > > > > > >>>>> 
> > > > > > >>>>> Ideally, we should get Instant to handle the caching, both
> > > > > > >>>>> in-memory and on-disk, by providing two functions __hash__
> > > > > > >>>>> (fast, for in-memory cache) and signature (slow, for
> > > > > > >>>>> on-disk cache).
> > > > > > >>>>> 
> > > > > > >>>>> Since __hash__ cannot call id, it must be able to attach a
> > > > > > >>>>> unique string to the form (perhaps based on an internal
> > > > > > >>>>> counter in FFC). My suggestion would be to add this to UFL,
> > > > > > >>>>> something like set_hash and hash (which would return None
> > > > > > >>>>> if set_hash has not been called). If Martin does not like
> > > > > > >>>>> that, we should be able to handle it on the DOLFIN side.
> > > > > > >>>>> 
> > > > > > >>>>> So in conclusion: no in-memory cache in FFC (handled by
> > > > > > >>>>> Instant) and FFC attaches a hash to incoming forms so that
> > > > > > >>>>> Instant may recognize them later.
> > > > > > >>>> 
> > > > > > >>>> The code that I disabled was caching preprocessed forms, so
> > > > > > >>>> I don't see how this can be handled by Instant.
> > > > > > >>> 
> > > > > > >>> The point would be that one could check that "hash" of the
> > > > > > >>> form (some unique string) instead of computing the signature
> > > > > > >>> which involves preprocessing the form.
> > > > > > >> 
> > > > > > >> How would the hash be computed? To check if the mesh has
> > > > > > >> changed, my limited understanding is that the entire object
> > > > > > >> would have to be serialised, and then a hash computed. How
> > > > > > >> expensive is that?
> > > > > > >> 
> > > > > > >> The issue that I ran into was not related to signatures. It
> > > > > > >> was related to the non-UFL data that is attached to
> > > > > > >> arguments.
> > > > > > > 
> > > > > > > The hash would be unique to each form. It could just be a
> > > > > > > counter value and the counter would be increased inside
> > > > > > > Instant for each object it gets as input.
> > > > > > 
> > > > > > But how does Instant know if a form is new? I also don't see why
> > > > > > Instant should need to know if the mesh associated with a form
> > > > > > has changed, but is for the rest the same. Wouldn't Instant need
> > > > > > to be DOLFIN-aware?
> > > > > 
> > > > > The hash() function would play the same role as the id() function
> > > > > before with the difference that we can't get the same id for a new
> > > > > form as for an old form that's gone out of scope.
> > > > > 
> > > > > Instant should not need to know anything it just does this:
> > > > >    check if object has a set_hash() function
> > > > >    if so, calls hash() to get the hash value
> > > > >    
> > > > >      checks the cache for that hash value
> > > > >    
> > > > >    if not, assign unique value by calling set_hash on the object
> > > > > 
> > > > > We would need to make sure from the DOLFIN side that when we change
> > > > > a Form, we also change the hash value (for example by setting it
> > > > > to None) which would trigger the Instant disk cache.
> > > > 
> > > > Sounds complicated...
> > > 
> > > I think it sounds very easy. Everything we need is there: Instant
> > > already has memory and disk cache. We just need to provide the proper
> > > input.
> > 
> > The set and get hash introduce another level of complexity.
> > 
> > > > Now the preprocessed form is stored in the original form. This will
> > > > never change. Whenever a form does not go out of scope the
> > > > preprocessed form will live.
> > > > 
> > > > Also Martin made it impossible to change a form without returning a
> > > > new instance. This prevents any changing of the original form while
> > > > keeping a preprocesses form attached to it.
> > > > 
> > > > If a form has a preprocessed form that will be used for code
> > > > generation. The preprocessed form will be used in instants memory
> > > > cache. The preprocessed form has nothing to do the any DOLFIN
> > > > objects that comes with the original form, such as mesh, expressions
> > > > and such.
> > > > 
> > > > Anything I have missed?
> > > 
> > > What about the __hash__ function in jitobject.py? It still calls
> > > id(). Isn't that a problem?
> > 
> > No. The __hash__ is only used to retreive the memory cached compiled
> > module in instant. This so we can retreive the compiled form. The
> > form_data including all DOLFIN dependent data are stored in form_data
> > which is just retrieved from the preprocessed form.
> 
> Why is this not problematic if a new form is created which happens to
> get the same id as an old destroyed form? I thought that was the
> original issue.

Good point. The hash is only used to retreive the compiled module from instant 
memory cache, right? Then I think we can safely base the hash on the 
signature.
 
Johan


> Anders (offline until tomorrow)
> 
> > A user can change this, but it seems that form_data is correctly updated
> > with the new mesh.
> > 
> > Btw: We could probably store the compiled form also in form_data. We
> > could then avoid going through instant memory cache :)
> > 
> > Johan



References