yade-dev team mailing list archive

Re: Note on optimized compilation / optimized coding / profiling results

 

Václav S wrote:
>> The damping dispatcher spends 89% of CPU time doing 
>> something else, namely:
>> - LocateMultivirtualFunctor : 50%
>>     
> Good to know.
>   

Let me clarify: I'm not sure the dispatching mechanism is slow in 
itself. What takes a lot of time in the dispatcher is, for instance, 
BodyMacroParameters::getBaseClassIndex() (around 50%), or the find() 
function I mentioned before (10%). I'm attaching a file listing the 
cost of each function called from LocateMultivirtualFunctor.

Note that getBaseClassIndex() is 10 times slower than getClassIndex(); I 
wonder why...

Bruno





> Another way would be to use boost::ublas::compressed_matrix for the 2D
> case, if a 2D array of some 10000 elements is too much (OK, if we ever have
> 1000 classes, that would mean 1e6 elements, which is not nice). Should be
> reasonably fast as well. No 3D functors, though ;-).
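
For comparison with the sparse-matrix idea, here is a minimal sketch of the dense 2D lookup being discussed, assuming each class carries a small integer index. DispatchTable2D, Functor, sumFunctor and productFunctor are hypothetical names for illustration, not the actual Yade dispatcher types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor signature; a real dispatcher would store functor objects.
using Functor = int (*)(int, int);

// Dense 2D lookup table indexed by the class indices of the two arguments.
class DispatchTable2D {
public:
    explicit DispatchTable2D(std::size_t numClasses)
        : n_(numClasses), table_(numClasses * numClasses, nullptr) {}

    // Register the functor that handles the class pair (i, j).
    void add(std::size_t i, std::size_t j, Functor f) {
        assert(i < n_ && j < n_);
        table_[i * n_ + j] = f;
    }

    // Hot path: one multiplication, one addition, one load.
    Functor locate(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);  // debug builds only
        return table_[i * n_ + j];
    }

private:
    std::size_t n_;
    std::vector<Functor> table_;  // row-major, n_ * n_ entries
};

inline int sumFunctor(int a, int b) { return a + b; }
inline int productFunctor(int a, int b) { return a * b; }
```

With 100 classes the table holds 10000 pointers, which is the size mentioned above. A sparse structure would trade part of that memory for a slower lookup.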
>
> Will play with that if I find some time. That sounds like a big speed
> improvement (there are quite a few dispatchers in the simulation; if they
> take 30% of the time, that means 15% for dispatching... wow).
>
> I thought originally this stuff was part of Loki and was well optimized.
>
>   
>> A simple example :
>>
>> shared_ptr<PhysicalAction>& PhysicalActionVectorVector::find(unsigned int id, int actionIndex)
>> {
>>     // This is very rarely executed, only at the beginning.
>>     // Somebody is accessing out of bounds; make sure he finds what he
>>     // needs: a reset PhysicalAction of his type.
>>     if( current_size <= id )
>>     {
>>         ....
>>     }
>>     usedIds[id] = true;
>>     return physicalActions[id][actionIndex];
>> }
>>
>> 1. There is this test at the beginning, which is useless all the time 
>> except at iteration 1.
>> 2. There is this usedIds[id] assignment (same remark again: it only 
>> changes anything at iteration 1).
>> 3. Then it returns a reference to a shared pointer (while shared_ptr 
>> operations are slower than plain pointer operations).
>>   
>>     
> You're right, I thought that 1. and 2. were the business of ::prepare,
> which is called from PhysicalActionContainerInitializer. It should be
> the user's responsibility to use valid indices, right?
>
> UsedIds, that's a different story; it says what index contains a
> non-null action so that we can iterate over non-null actions. But this
> flag could be put into the action instance itself. Then
> physicalActions[id][actionIndex] could be used directly.
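
The two ideas above (sizing done once in a prepare step, and the "used" flag stored in the action itself) could look roughly like the sketch below. Action, ActionTable, prepare() and unchecked() are hypothetical stand-ins, not the real PhysicalAction or PhysicalActionVectorVector interfaces, and the layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for PhysicalAction; the flag lives in the instance.
struct Action {
    double value = 0.0;
    bool used = false;
};

// Hypothetical stand-in for PhysicalActionVectorVector.
class ActionTable {
public:
    // Called once before the simulation loop: allocate every slot up front.
    void prepare(std::size_t numBodies, std::size_t numActionTypes) {
        actions_.resize(numBodies);
        for (auto& row : actions_) {
            while (row.size() < numActionTypes) {
                row.push_back(std::make_shared<Action>());
            }
        }
    }

    // Hot path: direct indexing. The bounds check exists only in debug builds.
    std::shared_ptr<Action>& unchecked(std::size_t id, std::size_t actionIndex) {
        assert(id < actions_.size() && actionIndex < actions_[id].size());
        return actions_[id][actionIndex];
    }

private:
    std::vector<std::vector<std::shared_ptr<Action>>> actions_;
};
```

Whoever writes to an action sets its used flag, so the accessor itself no longer does a test or a write on every call.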
>   
>> Of course, this function "find" is called many, many times. The result: 3.81% of 
>> total CPU time just for it, while we could just use 
>> physicalActions[id][actionIndex] instead of find() and cut that time 
>> considerably. But of course physicalActions is private...
>>   
>>     
> Just change that in the header and make it public, no problem. If the
> "user" screws things up, it is his problem, not mine. I think Olivier liked
> having a lot of stuff private, but then we have overhead for
> accessors. Sadly, C++ has no way to say: this member is public for
> reading and private for writing, which could help in many cases.
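
A common approximation of "public for reading, private for writing" is a const accessor defined in the class body. Such a member function is implicitly inline, so an optimizing compiler can usually remove the call overhead. A minimal sketch follows; Container, values(), value() and append() are hypothetical names, not Yade code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: readable by everyone, writable only through append().
class Container {
public:
    // Read-only view of the data. Defined in the class body, hence implicitly
    // inline; callers cannot modify the vector through this reference.
    const std::vector<double>& values() const { return values_; }

    // Read a single element; the bounds check is active in debug builds only.
    double value(std::size_t i) const {
        assert(i < values_.size());
        return values_[i];
    }

    // The only way to modify the data from outside the class.
    void append(double v) { values_.push_back(v); }

private:
    std::vector<double> values_;
};
```

An attempt such as c.values().push_back(1.0) fails to compile, because push_back is not a const member function.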
>   
>> Janek: I know that containers will be changed anyway, and that is why I 
>> think this is the right moment to cry about speed. :)
>> Users don't care about Godwin's law, they need SPEED!!!
>>   
>>     
> Cosurgi, should I fiddle with PhysicalActionContainer or is it going to
> be changed anyway?
>   
>> Yes, I thought of that too, but I am not expert enough to be sure. 
>> Can't the compiler check that nothing will be affected in some situations?
>>   
>>     
> It can in some, but I don't know in which ones. Putting const everywhere
> also checks that there are no side-effects of the method on its
> instance, for example.
>
> Another thing is that many methods should be inlined, but grepping
> through headers reveals that only a very small subset is: for the most
> part, the OpenGL wrapper. Inlining functions is turned on by the -O3
> flag to g++, so we probably gain nothing here. But the gcc manpage
> says for -finline-limit: " This option is particularly useful for
> programs that use inlining heavily such as those based on recursive
> templates with C++." which is just our case.
>
> And by the way, I just found out that assert(...) is not compiled out in
> optimized scons builds, since we don't define NDEBUG. Will be fixed
> in next commit. No big deal ;-)
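
To illustrate the NDEBUG point: the standard assert macro is redefined each time <cassert> is included, according to whether NDEBUG is defined at that moment. The sketch below uses that to put both behaviours in one file. An optimized build would normally pass -DNDEBUG on the compiler command line instead. The counter and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counter: records how often the checked expression is evaluated.
int boundsChecksEvaluated = 0;

inline bool inBounds(std::size_t id, std::size_t size) {
    ++boundsChecksEvaluated;
    return id < size;
}

#define NDEBUG       // what an optimized build would get from -DNDEBUG
#include <cassert>   // assert(expr) now expands to ((void)0)

inline double readOptimized(const std::vector<double>& v, std::size_t id) {
    assert(inBounds(id, v.size()));  // compiled out: inBounds is never called
    return v[id];
}

#undef NDEBUG
#include <cassert>   // re-including restores the checking assert

inline double readDebug(const std::vector<double>& v, std::size_t id) {
    assert(inBounds(id, v.size()));  // evaluated on every call
    return v[id];
}
```

Without NDEBUG, every assert in a hot accessor costs its test on each call, which matters for functions called as often as find().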
> _______________________________________________
> yade-dev mailing list
> yade-dev@xxxxxxxxxxxxxxxx
> https://lists.berlios.de/mailman/listinfo/yade-dev
>
>   


-- 
 
_______________
Chareyre Bruno
Maitre de conference

Institut National Polytechnique de Grenoble
Laboratoire 3S (Soils Solids Structures) - bureau E145
BP 53 - 38041, Grenoble cedex 9 - France
Tél : 33 4 56 52 86 21
Fax : 33 4 76 82 70 43
________________
