← Back to team overview

ufl team mailing list archive

Re: [Ffc] [Bug 769811] [NEW] JIT cache problem with id(form)

 

On Monday April 25 2011 15:04:43 Anders Logg wrote:
> On Mon, Apr 25, 2011 at 10:56:25PM +0100, Garth N. Wells wrote:
> > On 25/04/11 22:48, Anders Logg wrote:
> > > On Mon, Apr 25, 2011 at 10:41:58PM +0100, Garth N. Wells wrote:
> > >> On 25/04/11 22:33, Anders Logg wrote:
> > >>> On Mon, Apr 25, 2011 at 10:26:18PM +0100, Garth N. Wells wrote:
> > >>>> On 25/04/11 22:08, Anders Logg wrote:
> > >>>>> On Mon, Apr 25, 2011 at 07:40:21PM -0000, Garth Wells wrote:
> > >>>>>> On 25/04/11 20:00, Johan Hake wrote:
> > >>>>>>> On Monday April 25 2011 11:26:36 Garth Wells wrote:
> > >>>>>>>> On 25/04/11 18:51, Anders Logg wrote:
> > >>>>>>>>> On Mon, Apr 25, 2011 at 05:11:41PM -0000, Garth Wells wrote:
> > >>>>>>>>>> On 25/04/11 17:53, Johan Hake wrote:
> > >>>>>>>>>>> On Monday April 25 2011 08:59:18 Garth Wells wrote:
> > >>>>>>>>>>>> On 25/04/11 16:47, Johan Hake wrote:
> > >>>>>>>>>>>>> Commenting out the cache is really not a fix. The problem
> > >>>>>>>>>>>>> is within dolfin. Isn't there another way to deal with
> > >>>>>>>>>>>>> this?
> > >>>>>>>>>>>> 
> > >>>>>>>>>>>> It is a fix if the cache isn't needed.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> Sure.
> > >>>>>>>>>>> 
> > >>>>>>>>>>>>> First: How much penalty are there with a disabled memory
> > >>>>>>>>>>>>> cache. Maybe the problem isn't that bad?
> > >>>>>>>>>>>> 
> > >>>>>>>>>>>> I don't get the point of this cache. The way it is now, a
> > >>>>>>>>>>>> form is only preprocessed if it hasn't already been
> > >>>>>>>>>>>> preprocessed, which seems ok to me. The old code tried to
> > >>>>>>>>>>>> avoid some preprocessing, but it was highly dubious and I
> > >>>>>>>>>>>> doubt that it was effective.
> > >>>>>>>>>>> 
> > >>>>>>>>>>> I think the preprocessing stage actually do take some time.
> > >>>>>>>>>>> AFAIK the preproces stage essentially do two things. It
> > >>>>>>>>>>> creates a canonical version of the Form so two Forms that
> > >>>>>>>>>>> are the same, but constructed at different times are beeing
> > >>>>>>>>>>> treated equal wrt form generation. Then are DOLFIN specific
> > >>>>>>>>>>> guys extracted. I am not sure what takes the most time. We
> > >>>>>>>>>>> should probably profiel it... But if it is the latter we
> > >>>>>>>>>>> could consider putting another cache in place which is more
> > >>>>>>>>>>> robust wrt changing DOLFIN objects.
> > >>>>>>>>>> 
> > >>>>>>>>>> It should be easy to avoid the overhead of preprocessing by
> > >>>>>>>>>> keeping the object in scope. If the object changes, the only
> > >>>>>>>>>> robust way to make sure that the form is the same as one in
> > >>>>>>>>>> the cache is to compare all the data. This requires
> > >>>>>>>>>> preprocessing the form, which then defeats the purpose of a
> > >>>>>>>>>> cache. It may be possible to add a lightweight preprocess to
> > >>>>>>>>>> UFL, but I don't think that it's worth the effort or extra
> > >>>>>>>>>> complication.
> > >>>>>>> 
> > >>>>>>> I think a light weight version might be the way to go. This is
> > >>>>>>> then stored in memory cache. If we are able to strip such a form
> > >>>>>>> for all DOLFIN specific things we would also prevent huge memory
> > >>>>>>> leaks with mesh beeing kept.
> > >>>>>>> 
> > >>>>>>> Then we always grab DOLFIN specific data from the passed form
> > >>>>>>> instead of grabbing from the cache. Not sure how easy this will
> > >>>>>>> be to implement, but I think we need to explore it, as the
> > >>>>>>> DOLFIN specific part of the form really has nothing to do with
> > >>>>>>> the generated Form.
> > >>>>>>> 
> > >>>>>>> Martin:
> > >>>>>>> Why is it important to have the _count in the repr of the form? I
> > >>>>>>> guess that is used in ufl algorithms? Would it be possible to
> > >>>>>>> include a second repr function, which did not include the count?
> > >>>>>>> This would then be used when the signature is checked for. We
> > >>>>>>> could then use that repr to generate a form which is stored in
> > >>>>>>> the memory cache. This would then be tripped for any DOLFIN
> > >>>>>>> specific objects. This should work as the _count attribute has
> > >>>>>>> nothing to do with what code gets generated, but it is essential
> > >>>>>>> for internal UFL algorithms, right?
> > >>>>>>> 
> > >>>>>>>>> I'm not very happy with this change.
> > >>>>>>>> 
> > >>>>>>>> The bright side is that slow and correct is a better starting
> > >>>>>>>> point than fast but wrong ;).
> > >>>>>>>> 
> > >>>>>>>> An easy fix is to attach the preprocessed form to a Form object.
> > >>>>>>>> This would work robustly if we can make forms immutable once
> > >>>>>>>> they've been compiled. Is it possible to make a Python object
> > >>>>>>>> immutable?
> > >>>>>>> 
> > >>>>>>> We can probably overload all setattribtue methods which prohibits
> > >>>>>>> a user to write to these but it might not be possible to
> > >>>>>>> prohibit a user to change attributes on instances owned by the
> > >>>>>>> Form. I guess this is similare to the difficulties of preserving
> > >>>>>>> constness in C++, but I think it is even harder in Python.
> > >>>>>> 
> > >>>>>> What if we have the FFC jit compiler return the preprocessed form,
> > >>>>>> and inside dolfin.Form simply do
> > >>>>>> 
> > >>>>>>     class Form(cpp.Form):
> > >>>>>>         def __init__(self, form, . . .. )
> > >>>>>>         ....
> > >>>>>>         
> > >>>>>>         (...., preprocessed_form) = jit(form, . . . . )
> > >>>>>>         
> > >>>>>>         form = preprocessed_form
> > >>>>>>         
> > >>>>>>         .....
> > >>>>>> 
> > >>>>>> This way, form will have form_data, and the FFC jit function will
> > >>>>>> know not to call ufl.preprocess.
> > >>>>> 
> > >>>>> Here's another strange thing. In the JITObject class, we have two
> > >>>>> functions: __hash__ and signature. As far as I understand, the
> > >>>>> first is used to located objects (generated code/modules) in the
> > >>>>> Instant in-memory cache, while the second is used for the on-disk
> > >>>>> cache.
> > >>>>> 
> > >>>>> >From some simple tests I did now, it looks like the __hash__
> > >>>>> >function
> > >>>>> 
> > >>>>> does not need to any significant speedup. The JIT benchmark runs
> > >>>>> just as fast if I call signature from within __hash__.
> > >>>>> 
> > >>>>> Furthermore, the __hash__ function must also be broken since it
> > >>>>> relies on calling id on the form.
> > >>>>> 
> > >>>>> Ideally, we should get Instant to handle the caching, both
> > >>>>> in-memory and on-disk, by providing two functions __hash__ (fast,
> > >>>>> for in-memory cache) and signature (slow, for on-disk cache).
> > >>>>> 
> > >>>>> Since __hash__ cannot call id, it must be able to attach a unique
> > >>>>> string to the form (perhaps based on an internal counter in FFC).
> > >>>>> My suggestion would be to add this to UFL, something like set_hash
> > >>>>> and hash (which would return None if set_hash has not been called).
> > >>>>> If Martin does not like that, we should be able to handle it on the
> > >>>>> DOLFIN side.
> > >>>>> 
> > >>>>> So in conclusion: no in-memory cache in FFC (handled by Instant)
> > >>>>> and FFC attaches a hash to incoming forms so that Instant may
> > >>>>> recognize them later.
> > >>>> 
> > >>>> The code that I disabled was caching preprocessed forms, so I don't
> > >>>> see how this can be handled by Instant.
> > >>> 
> > >>> The point would be that one could check that "hash" of the form (some
> > >>> unique string) instead of computing the signature which involves
> > >>> preprocessing the form.
> > >> 
> > >> How would the hash be computed? To check if the mesh has changed, my
> > >> limited understanding is that the entire object would have to be
> > >> serialised, and then a hash computed. How expensive is that?
> > >> 
> > >> The issue that I ran into was not related to signatures. It was
> > >> related to the non-UFL data that is attached to arguments.
> > > 
> > > The hash would be unique to each form. It could just be a counter
> > > value and the counter would be increased inside Instant for each
> > > object it gets as input.
> > 
> > But how does Instant know if a form is new? I also don't see why Instant
> > should need to know if the mesh associated with a form has changed, but
> > is for the rest the same. Wouldn't Instant need to be DOLFIN-aware?
> 
> The hash() function would play the same role as the id() function
> before with the difference that we can't get the same id for a new
> form as for an old form that's gone out of scope.
> 
> Instant should not need to know anything it just does this:
> 
>    check if object has a set_hash() function
>    if so, calls hash() to get the hash value
>      checks the cache for that hash value
>    if not, assign unique value by calling set_hash on the object
> 
> We would need to make sure from the DOLFIN side that when we change a
> Form, we also change the hash value (for example by setting it to
> None) which would trigger the Instant disk cache.

Sounds complicated...

Now the preprocessed form is stored in the original form. This will never 
change. Whenever a form does not go out of scope the preprocessed form will 
live. 

Also Martin made it impossible to change a form without returning a new 
instance. This prevents any changing of the original form while keeping a 
preprocesses form attached to it.

If a form has a preprocessed form that will be used for code generation. The 
preprocessed form will be used in instants memory cache. The preprocessed form 
has nothing to do the any DOLFIN objects that comes with the original form, 
such as mesh, expressions and such.

Anything I have missed?

Johan

> --
> Anders
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~ufl
> Post to     : ufl@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~ufl
> More help   : https://help.launchpad.net/ListHelp



Follow ups

References