← Back to team overview

ffc team mailing list archive

Re: [Bug 769811] [NEW] JIT cache problem with id(form)

 


On 25/04/11 23:04, Anders Logg wrote:
> On Mon, Apr 25, 2011 at 10:56:25PM +0100, Garth N. Wells wrote:
>>
>>
>> On 25/04/11 22:48, Anders Logg wrote:
>>> On Mon, Apr 25, 2011 at 10:41:58PM +0100, Garth N. Wells wrote:
>>>>
>>>>
>>>> On 25/04/11 22:33, Anders Logg wrote:
>>>>> On Mon, Apr 25, 2011 at 10:26:18PM +0100, Garth N. Wells wrote:
>>>>>>
>>>>>>
>>>>>> On 25/04/11 22:08, Anders Logg wrote:
>>>>>>> On Mon, Apr 25, 2011 at 07:40:21PM -0000, Garth Wells wrote:
>>>>>>>> On 25/04/11 20:00, Johan Hake wrote:
>>>>>>>>> On Monday April 25 2011 11:26:36 Garth Wells wrote:
>>>>>>>>>> On 25/04/11 18:51, Anders Logg wrote:
>>>>>>>>>>> On Mon, Apr 25, 2011 at 05:11:41PM -0000, Garth Wells wrote:
>>>>>>>>>>>> On 25/04/11 17:53, Johan Hake wrote:
>>>>>>>>>>>>> On Monday April 25 2011 08:59:18 Garth Wells wrote:
>>>>>>>>>>>>>> On 25/04/11 16:47, Johan Hake wrote:
>>>>>>>>>>>>>>> Commenting out the cache is really not a fix. The problem is within
>>>>>>>>>>>>>>> dolfin. Isn't there another way to deal with this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is a fix if the cache isn't needed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> First: How much penalty are there with a disabled memory cache. Maybe
>>>>>>>>>>>>>>> the problem isn't that bad?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't get the point of this cache. The way it is now, a form is only
>>>>>>>>>>>>>> preprocessed if it hasn't already been preprocessed, which seems ok to
>>>>>>>>>>>>>> me. The old code tried to avoid some preprocessing, but it was highly
>>>>>>>>>>>>>> dubious and I doubt that it was effective.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the preprocessing stage actually do take some time. AFAIK the
>>>>>>>>>>>>> preproces stage essentially do two things. It creates a canonical
>>>>>>>>>>>>> version of the Form so two Forms that are the same, but constructed at
>>>>>>>>>>>>> different times are beeing treated equal wrt form generation. Then are
>>>>>>>>>>>>> DOLFIN specific guys extracted. I am not sure what takes the most
>>>>>>>>>>>>> time. We should probably profiel it... But if it is the latter we
>>>>>>>>>>>>> could consider putting another cache in place which is more robust wrt
>>>>>>>>>>>>> changing DOLFIN objects.
>>>>>>>>>>>>
>>>>>>>>>>>> It should be easy to avoid the overhead of preprocessing by keeping the
>>>>>>>>>>>> object in scope. If the object changes, the only robust way to make sure
>>>>>>>>>>>> that the form is the same as one in the cache is to compare all the
>>>>>>>>>>>> data. This requires preprocessing the form, which then defeats the
>>>>>>>>>>>> purpose of a cache. It may be possible to add a lightweight preprocess
>>>>>>>>>>>> to UFL, but I don't think that it's worth the effort or extra
>>>>>>>>>>>> complication.
>>>>>>>>>
>>>>>>>>> I think a light weight version might be the way to go. This is then stored in
>>>>>>>>> memory cache. If we are able to strip such a form for all DOLFIN specific
>>>>>>>>> things we would also prevent huge memory leaks with mesh beeing kept.
>>>>>>>>>
>>>>>>>>> Then we always grab DOLFIN specific data from the passed form instead of
>>>>>>>>> grabbing from the cache. Not sure how easy this will be to implement, but I
>>>>>>>>> think we need to explore it, as the DOLFIN specific part of the form really
>>>>>>>>> has nothing to do with the generated Form.
>>>>>>>>>
>>>>>>>>> Martin:
>>>>>>>>> Why is it important to have the _count in the repr of the form? I guess that
>>>>>>>>> is used in ufl algorithms? Would it be possible to include a second repr
>>>>>>>>> function, which did not include the count? This would then be used when the
>>>>>>>>> signature is checked for. We could then use that repr to generate a form which
>>>>>>>>> is stored in the memory cache. This would then be tripped for any DOLFIN
>>>>>>>>> specific objects. This should work as the _count attribute has nothing to do
>>>>>>>>> with what code gets generated, but it is essential for internal UFL
>>>>>>>>> algorithms, right?
>>>>>>>>>
>>>>>>>>>>> I'm not very happy with this change.
>>>>>>>>>>
>>>>>>>>>> The bright side is that slow and correct is a better starting point than
>>>>>>>>>> fast but wrong ;).
>>>>>>>>>>
>>>>>>>>>> An easy fix is to attach the preprocessed form to a Form object. This
>>>>>>>>>> would work robustly if we can make forms immutable once they've been
>>>>>>>>>> compiled. Is it possible to make a Python object immutable?
>>>>>>>>>
>>>>>>>>> We can probably overload all setattribtue methods which prohibits a user to
>>>>>>>>> write to these but it might not be possible to prohibit a user to change
>>>>>>>>> attributes on instances owned by the Form. I guess this is similare to the
>>>>>>>>> difficulties of preserving constness in C++, but I think it is even harder in
>>>>>>>>> Python.
>>>>>>>>>
>>>>>>>>
>>>>>>>> What if we have the FFC jit compiler return the preprocessed form, and
>>>>>>>> inside dolfin.Form simply do
>>>>>>>>
>>>>>>>>     class Form(cpp.Form):
>>>>>>>>         def __init__(self, form, . . .. )
>>>>>>>>         ....
>>>>>>>>
>>>>>>>>         (...., preprocessed_form) = jit(form, . . . . )
>>>>>>>>
>>>>>>>>         form = preprocessed_form
>>>>>>>>
>>>>>>>>         .....
>>>>>>>>
>>>>>>>> This way, form will have form_data, and the FFC jit function will know
>>>>>>>> not to call ufl.preprocess.
>>>>>>>
>>>>>>> Here's another strange thing. In the JITObject class, we have two
>>>>>>> functions: __hash__ and signature. As far as I understand, the first
>>>>>>> is used to located objects (generated code/modules) in the Instant
>>>>>>> in-memory cache, while the second is used for the on-disk cache.
>>>>>>>
>>>>>>> >From some simple tests I did now, it looks like the __hash__ function
>>>>>>> does not need to any significant speedup. The JIT benchmark runs just
>>>>>>> as fast if I call signature from within __hash__.
>>>>>>>
>>>>>>> Furthermore, the __hash__ function must also be broken since it relies
>>>>>>> on calling id on the form.
>>>>>>>
>>>>>>> Ideally, we should get Instant to handle the caching, both in-memory
>>>>>>> and on-disk, by providing two functions __hash__ (fast, for in-memory
>>>>>>> cache) and signature (slow, for on-disk cache).
>>>>>>>
>>>>>>> Since __hash__ cannot call id, it must be able to attach a unique
>>>>>>> string to the form (perhaps based on an internal counter in FFC).
>>>>>>> My suggestion would be to add this to UFL, something like set_hash
>>>>>>> and hash (which would return None if set_hash has not been called).
>>>>>>> If Martin does not like that, we should be able to handle it on the
>>>>>>> DOLFIN side.
>>>>>>>
>>>>>>> So in conclusion: no in-memory cache in FFC (handled by Instant) and
>>>>>>> FFC attaches a hash to incoming forms so that Instant may recognize
>>>>>>> them later.
>>>>>>>
>>>>>>
>>>>>> The code that I disabled was caching preprocessed forms, so I don't see
>>>>>> how this can be handled by Instant.
>>>>>
>>>>> The point would be that one could check that "hash" of the form (some
>>>>> unique string) instead of computing the signature which involves
>>>>> preprocessing the form.
>>>>>
>>>>
>>>> How would the hash be computed? To check if the mesh has changed, my
>>>> limited understanding is that the entire object would have to be
>>>> serialised, and then a hash computed. How expensive is that?
>>>>
>>>> The issue that I ran into was not related to signatures. It was related
>>>> to the non-UFL data that is attached to arguments.
>>>
>>> The hash would be unique to each form. It could just be a counter
>>> value and the counter would be increased inside Instant for each
>>> object it gets as input.
>>
>> But how does Instant know if a form is new? I also don't see why Instant
>> should need to know if the mesh associated with a form has changed, but
>> is for the rest the same. Wouldn't Instant need to be DOLFIN-aware?
> 
> The hash() function would play the same role as the id() function
> before with the difference that we can't get the same id for a new
> form as for an old form that's gone out of scope.
> 
> Instant should not need to know anything it just does this:
> 
>    check if object has a set_hash() function
>    if so, calls hash() to get the hash value
>      checks the cache for that hash value
>    if not, assign unique value by calling set_hash on the object
> 
> We would need to make sure from the DOLFIN side that when we change a
> Form, we also change the hash value (for example by setting it to
> None) which would trigger the Instant disk cache.
> 

This is the problem - how do we know that it's changed?

The original issue is not related to disk vs memory cache, or Instant.
It is how to avoid calling ufl.preprocess unnecessarily when the form
repr() is unchanged but the mesh has been changed.

Garth


> --
> Anders



Follow ups

References