← Back to team overview

ffc team mailing list archive

Re: [Ufl] [Bug 769811] [NEW] JIT cache problem with id(form)

 


On 25/04/11 23:14, Johan Hake wrote:
> On Monday April 25 2011 15:04:43 Anders Logg wrote:
>> On Mon, Apr 25, 2011 at 10:56:25PM +0100, Garth N. Wells wrote:
>>> On 25/04/11 22:48, Anders Logg wrote:
>>>> On Mon, Apr 25, 2011 at 10:41:58PM +0100, Garth N. Wells wrote:
>>>>> On 25/04/11 22:33, Anders Logg wrote:
>>>>>> On Mon, Apr 25, 2011 at 10:26:18PM +0100, Garth N. Wells wrote:
>>>>>>> On 25/04/11 22:08, Anders Logg wrote:
>>>>>>>> On Mon, Apr 25, 2011 at 07:40:21PM -0000, Garth Wells wrote:
>>>>>>>>> On 25/04/11 20:00, Johan Hake wrote:
>>>>>>>>>> On Monday April 25 2011 11:26:36 Garth Wells wrote:
>>>>>>>>>>> On 25/04/11 18:51, Anders Logg wrote:
>>>>>>>>>>>> On Mon, Apr 25, 2011 at 05:11:41PM -0000, Garth Wells wrote:
>>>>>>>>>>>>> On 25/04/11 17:53, Johan Hake wrote:
>>>>>>>>>>>>>> On Monday April 25 2011 08:59:18 Garth Wells wrote:
>>>>>>>>>>>>>>> On 25/04/11 16:47, Johan Hake wrote:
>>>>>>>>>>>>>>>> Commenting out the cache is really not a fix. The problem
>>>>>>>>>>>>>>>> is within dolfin. Isn't there another way to deal with
>>>>>>>>>>>>>>>> this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is a fix if the cache isn't needed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> First: How much penalty are there with a disabled memory
>>>>>>>>>>>>>>>> cache. Maybe the problem isn't that bad?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't get the point of this cache. The way it is now, a
>>>>>>>>>>>>>>> form is only preprocessed if it hasn't already been
>>>>>>>>>>>>>>> preprocessed, which seems ok to me. The old code tried to
>>>>>>>>>>>>>>> avoid some preprocessing, but it was highly dubious and I
>>>>>>>>>>>>>>> doubt that it was effective.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think the preprocessing stage actually do take some time.
>>>>>>>>>>>>>> AFAIK the preproces stage essentially do two things. It
>>>>>>>>>>>>>> creates a canonical version of the Form so two Forms that
>>>>>>>>>>>>>> are the same, but constructed at different times are beeing
>>>>>>>>>>>>>> treated equal wrt form generation. Then are DOLFIN specific
>>>>>>>>>>>>>> guys extracted. I am not sure what takes the most time. We
>>>>>>>>>>>>>> should probably profiel it... But if it is the latter we
>>>>>>>>>>>>>> could consider putting another cache in place which is more
>>>>>>>>>>>>>> robust wrt changing DOLFIN objects.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It should be easy to avoid the overhead of preprocessing by
>>>>>>>>>>>>> keeping the object in scope. If the object changes, the only
>>>>>>>>>>>>> robust way to make sure that the form is the same as one in
>>>>>>>>>>>>> the cache is to compare all the data. This requires
>>>>>>>>>>>>> preprocessing the form, which then defeats the purpose of a
>>>>>>>>>>>>> cache. It may be possible to add a lightweight preprocess to
>>>>>>>>>>>>> UFL, but I don't think that it's worth the effort or extra
>>>>>>>>>>>>> complication.
>>>>>>>>>>
>>>>>>>>>> I think a light weight version might be the way to go. This is
>>>>>>>>>> then stored in memory cache. If we are able to strip such a form
>>>>>>>>>> for all DOLFIN specific things we would also prevent huge memory
>>>>>>>>>> leaks with mesh beeing kept.
>>>>>>>>>>
>>>>>>>>>> Then we always grab DOLFIN specific data from the passed form
>>>>>>>>>> instead of grabbing from the cache. Not sure how easy this will
>>>>>>>>>> be to implement, but I think we need to explore it, as the
>>>>>>>>>> DOLFIN specific part of the form really has nothing to do with
>>>>>>>>>> the generated Form.
>>>>>>>>>>
>>>>>>>>>> Martin:
>>>>>>>>>> Why is it important to have the _count in the repr of the form? I
>>>>>>>>>> guess that is used in ufl algorithms? Would it be possible to
>>>>>>>>>> include a second repr function, which did not include the count?
>>>>>>>>>> This would then be used when the signature is checked for. We
>>>>>>>>>> could then use that repr to generate a form which is stored in
>>>>>>>>>> the memory cache. This would then be tripped for any DOLFIN
>>>>>>>>>> specific objects. This should work as the _count attribute has
>>>>>>>>>> nothing to do with what code gets generated, but it is essential
>>>>>>>>>> for internal UFL algorithms, right?
>>>>>>>>>>
>>>>>>>>>>>> I'm not very happy with this change.
>>>>>>>>>>>
>>>>>>>>>>> The bright side is that slow and correct is a better starting
>>>>>>>>>>> point than fast but wrong ;).
>>>>>>>>>>>
>>>>>>>>>>> An easy fix is to attach the preprocessed form to a Form object.
>>>>>>>>>>> This would work robustly if we can make forms immutable once
>>>>>>>>>>> they've been compiled. Is it possible to make a Python object
>>>>>>>>>>> immutable?
>>>>>>>>>>
>>>>>>>>>> We can probably overload all setattribtue methods which prohibits
>>>>>>>>>> a user to write to these but it might not be possible to
>>>>>>>>>> prohibit a user to change attributes on instances owned by the
>>>>>>>>>> Form. I guess this is similare to the difficulties of preserving
>>>>>>>>>> constness in C++, but I think it is even harder in Python.
>>>>>>>>>
>>>>>>>>> What if we have the FFC jit compiler return the preprocessed form,
>>>>>>>>> and inside dolfin.Form simply do
>>>>>>>>>
>>>>>>>>>     class Form(cpp.Form):
>>>>>>>>>         def __init__(self, form, . . .. )
>>>>>>>>>         ....
>>>>>>>>>         
>>>>>>>>>         (...., preprocessed_form) = jit(form, . . . . )
>>>>>>>>>         
>>>>>>>>>         form = preprocessed_form
>>>>>>>>>         
>>>>>>>>>         .....
>>>>>>>>>
>>>>>>>>> This way, form will have form_data, and the FFC jit function will
>>>>>>>>> know not to call ufl.preprocess.
>>>>>>>>
>>>>>>>> Here's another strange thing. In the JITObject class, we have two
>>>>>>>> functions: __hash__ and signature. As far as I understand, the
>>>>>>>> first is used to located objects (generated code/modules) in the
>>>>>>>> Instant in-memory cache, while the second is used for the on-disk
>>>>>>>> cache.
>>>>>>>>
>>>>>>>> >From some simple tests I did now, it looks like the __hash__
>>>>>>>>> function
>>>>>>>>
>>>>>>>> does not need to any significant speedup. The JIT benchmark runs
>>>>>>>> just as fast if I call signature from within __hash__.
>>>>>>>>
>>>>>>>> Furthermore, the __hash__ function must also be broken since it
>>>>>>>> relies on calling id on the form.
>>>>>>>>
>>>>>>>> Ideally, we should get Instant to handle the caching, both
>>>>>>>> in-memory and on-disk, by providing two functions __hash__ (fast,
>>>>>>>> for in-memory cache) and signature (slow, for on-disk cache).
>>>>>>>>
>>>>>>>> Since __hash__ cannot call id, it must be able to attach a unique
>>>>>>>> string to the form (perhaps based on an internal counter in FFC).
>>>>>>>> My suggestion would be to add this to UFL, something like set_hash
>>>>>>>> and hash (which would return None if set_hash has not been called).
>>>>>>>> If Martin does not like that, we should be able to handle it on the
>>>>>>>> DOLFIN side.
>>>>>>>>
>>>>>>>> So in conclusion: no in-memory cache in FFC (handled by Instant)
>>>>>>>> and FFC attaches a hash to incoming forms so that Instant may
>>>>>>>> recognize them later.
>>>>>>>
>>>>>>> The code that I disabled was caching preprocessed forms, so I don't
>>>>>>> see how this can be handled by Instant.
>>>>>>
>>>>>> The point would be that one could check that "hash" of the form (some
>>>>>> unique string) instead of computing the signature which involves
>>>>>> preprocessing the form.
>>>>>
>>>>> How would the hash be computed? To check if the mesh has changed, my
>>>>> limited understanding is that the entire object would have to be
>>>>> serialised, and then a hash computed. How expensive is that?
>>>>>
>>>>> The issue that I ran into was not related to signatures. It was
>>>>> related to the non-UFL data that is attached to arguments.
>>>>
>>>> The hash would be unique to each form. It could just be a counter
>>>> value and the counter would be increased inside Instant for each
>>>> object it gets as input.
>>>
>>> But how does Instant know if a form is new? I also don't see why Instant
>>> should need to know if the mesh associated with a form has changed, but
>>> is for the rest the same. Wouldn't Instant need to be DOLFIN-aware?
>>
>> The hash() function would play the same role as the id() function
>> before with the difference that we can't get the same id for a new
>> form as for an old form that's gone out of scope.
>>
>> Instant should not need to know anything it just does this:
>>
>>    check if object has a set_hash() function
>>    if so, calls hash() to get the hash value
>>      checks the cache for that hash value
>>    if not, assign unique value by calling set_hash on the object
>>
>> We would need to make sure from the DOLFIN side that when we change a
>> Form, we also change the hash value (for example by setting it to
>> None) which would trigger the Instant disk cache.
> 
> Sounds complicated...
> 
> Now the preprocessed form is stored in the original form. This will never 
> change. Whenever a form does not go out of scope the preprocessed form will 
> live. 
> 
> Also Martin made it impossible to change a form without returning a new 
> instance. This prevents any changing of the original form while keeping a 
> preprocesses form attached to it.
> 

Yes, and Martin wasn't too happy with some UFL changes in his absence
that broke this design. It is a lot cleaner not modifying objects
post-construction.

Garth

> If a form has a preprocessed form that will be used for code generation. The 
> preprocessed form will be used in instants memory cache. The preprocessed form 
> has nothing to do the any DOLFIN objects that comes with the original form, 
> such as mesh, expressions and such.
> 
> Anything I have missed?
> 
> Johan
> 
>> --
>> Anders
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~ufl
>> Post to     : ufl@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~ufl
>> More help   : https://help.launchpad.net/ListHelp



References