yade-dev team mailing list archive
Message #00301
Re: Note on optimized compilation / optimized coding / profiling results
-
To:
Yade Development Group <yade-dev@xxxxxxxxxxxxxxxx>
-
From:
Janek Kozicki <janek_listy@xxxxx>
-
Date:
Mon, 10 Mar 2008 03:21:50 +0100
-
In-reply-to:
<47C3CAC4.2070609@arcig.cz>
-
Reply-to:
Yade Development Group <yade-dev@xxxxxxxxxxxxxxxx>
-
Sender:
yade-dev-bounces@xxxxxxxxxxxxxxxx
Václav Šmilauer said: (by the date of Tue, 26 Feb 2008 09:16:04 +0100)
> OK, a few ideas for low-hanging optimizations. Tell me if you think
> they will not work.
>
> 1. LocateMultivirtualFunctor can cache last arguments as well as the
> return value at successful lookup. Since it typically does the same
> lookup over and over (takes 50% of Cundall damping), it could be reduced
> to almost zero (I think):
>
> if(index1==cachedIndex1 && index2==cachedIndex2) return cachedResult;
> /* at the beginning of LocateMultivirtualFunctor */
>
> Given c++ short-circuits conditions, it will add one integer comparison
> if we have a "cache miss", no big deal.
Yeah. I can't tell right now, because I don't remember how I did this.
I've put weeks of effort into the MultiMethods stuff (and I tried to make
it as fast as possible). So I'm curious why it now takes 40% of
execution time. After digging into it a bit I will remember what I was
doing there.
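
To make the caching idea from point 1 concrete, here is a minimal sketch. It does not reproduce the real LocateMultivirtualFunctor code; the names Functor, CachedDispatchTable, add, locate and hits are hypothetical, and the flat 2D table is only an assumption about how a lookup by two class indices might be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a functor that is looked up by two class indices.
struct Functor { int id; };

// Illustrative 2D lookup table that remembers the last successful lookup.
class CachedDispatchTable {
public:
    CachedDispatchTable(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), table_(rows * cols, nullptr) {}

    void add(std::size_t i1, std::size_t i2, const Functor* f) {
        assert(i1 < rows_ && i2 < cols_);
        table_[i1 * cols_ + i2] = f;
        cachedResult_ = nullptr;  // any change to the table invalidates the cache
    }

    const Functor* locate(std::size_t i1, std::size_t i2) {
        // Fast path: same indices as the last successful lookup.
        if (cachedResult_ && i1 == cachedIndex1_ && i2 == cachedIndex2_) {
            ++hits_;
            return cachedResult_;
        }
        // Slow path: the ordinary table lookup.
        assert(i1 < rows_ && i2 < cols_);
        const Functor* f = table_[i1 * cols_ + i2];
        if (f) {  // cache successful lookups only
            cachedIndex1_ = i1;
            cachedIndex2_ = i2;
            cachedResult_ = f;
        }
        return f;
    }

    std::size_t hits() const { return hits_; }

private:
    std::size_t rows_, cols_;
    std::vector<const Functor*> table_;
    std::size_t cachedIndex1_ = 0, cachedIndex2_ = 0;
    const Functor* cachedResult_ = nullptr;
    std::size_t hits_ = 0;
};
```

Note that a cache like this is mutable state inside the lookup, so it is not safe to share between threads without extra care, and it only pays off if the slow path is noticeably more expensive than the extra comparisons.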
> (BTW, Janek, couldn't we use some template and virtual functions magic
> to have the compiler do the dispatching code? There is plenty of stuff
> like that in boost, have a look at extending boost::range with new
> classes. Don't know how it works internally, though.
> We also probably would have to know all candidates at compile-time.)
Yes. That's the issue. If we know all candidates at compile time
this and other stuff [*] can be optimized using templates.
Unfortunately the initial design was to allow adding plugins to
already compiled binaries. If we decided now to change this
(and let the compiler do the dispatching), it would require
recompiling more stuff whenever you add a new EngineUnit.
Maybe we should do this? I dunno. I'm sure there will be some speed gain.
And lots of problems with dynamic plugin loading will disappear.
A serious drawback is that later it won't be possible to use yade as a
library (well, it's also not possible now ;). By using as a library I
mean: only headers and binaries are installed. You write some
engines in your homedir (outside of the yade source directory, which by
the way does not exist anywhere on the computer), compile them and they
work when you execute the /usr/bin/yade binary (assuming that yade is
installed on the system just like any other software).
> 2. I will create a new class CundallForceAndMomentumDamping, which will
> act on RigidBodyParameters and will be inside the same dispatcher as
> CundallnonViscousForceDamping (for ParticleParameters). This way, one
yeah, Bruno already did this :)
> loop over bodies will be gotten rid of. I know the force damping code
> will be duplicated at 2 places, but it is just a few lines of code.
About code duplication: Bruno told me that he doesn't care. And he
can copy hundreds of lines (single file) to write a new contact Law.
I consider this very bad, because when you find a bug, or improve
something so it works faster/better only a single copy is changed, and
all the others get out of sync. And finally you get a mess in which
you don't know what is fast and/or correct.
I want to avoid code duplication whenever possible. I believe there
is a way to do that without any speed penalty, we only need to think
about this a bit. And I'm willing to "shrink" yade code by removing
all duplicates.
The first, obvious way is to correctly inherit classes from each other,
and call virtual functions which contain the code (previously
duplicated).
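
A minimal sketch of that first approach, under stated assumptions: Body, ForceDamping and ForceAndMomentumDamping are hypothetical names, everything is reduced to scalars, and the damping rule is a simplified Cundall-style non-viscous formula, not the actual yade code. The point is only that the formula lives in one place and the derived class reuses it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar body state, just enough for the illustration.
struct Body { double force, velocity, torque, angularVelocity; };

class ForceDamping {
public:
    explicit ForceDamping(double coeff) : coeff_(coeff) {
        assert(coeff >= 0.0 && coeff <= 1.0);
    }
    virtual ~ForceDamping() = default;

    // Damps the force only.
    virtual void apply(Body& b) const { b.force = damp(b.force, b.velocity); }

protected:
    // The single copy of the damping formula:
    // reduce the load when it acts along the motion, increase it otherwise.
    double damp(double load, double rate) const {
        const double sign = (rate > 0.0) - (rate < 0.0);
        return load - coeff_ * std::fabs(load) * sign;
    }

private:
    double coeff_;
};

class ForceAndMomentumDamping : public ForceDamping {
public:
    using ForceDamping::ForceDamping;

    // Reuses the base class for the force and the shared formula for the torque.
    void apply(Body& b) const override {
        ForceDamping::apply(b);
        b.torque = damp(b.torque, b.angularVelocity);
    }
};
```

A fix to damp() then reaches both classes at once. The price is one virtual call per body when apply() is invoked through a base reference.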
Another way, if we decide to go the template way (see above [*]), is to
use template inlines, which would even remove the cost of calling
a virtual function (the compiler will insert the right code directly
into the spot). I like this solution a bit. But all inheritance will
have to be done by templates, without virtual functions.
Static inheritance (by templates, at compile time) is mutually
exclusive with dynamic inheritance (by virtual calls, at runtime).
But it is also faster. Due to C++ limitations we cannot mix both.
Oh, you may say that in yade they are mixed. So OK, to be more
correct: when you go from top to bottom of inheritance tree - once
you start dynamic inheritances you cannot go back and use static
inheritances in levels below. The solution I mentioned above assumes
that we have static inheritance at the very bottom of inheritance
tree.
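
One common way to get static inheritance at the bottom of the tree is the curiously recurring template pattern (CRTP). The sketch below is only an illustration of that technique, not a proposal for the actual yade classes: Body, DampingCRTP, StaticForceDamping, StaticForceAndMomentumDamping and dampAll are hypothetical names, and the scalar damping rule is a simplified assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar body state, just enough for the illustration.
struct Body { double force, velocity, torque, angularVelocity; };

// Static (compile-time) base: the shared formula lives here, and
// apply() forwards to the derived class without any virtual call.
template <class Derived>
class DampingCRTP {
public:
    explicit DampingCRTP(double coeff) : coeff_(coeff) {
        assert(coeff >= 0.0 && coeff <= 1.0);
    }
    void apply(Body& b) const {
        static_cast<const Derived*>(this)->applyImpl(b);
    }

protected:
    // Single copy of the damping formula, eligible for inlining.
    double damp(double load, double rate) const {
        const double sign = (rate > 0.0) - (rate < 0.0);
        return load - coeff_ * std::fabs(load) * sign;
    }

private:
    double coeff_;
};

class StaticForceDamping : public DampingCRTP<StaticForceDamping> {
public:
    using DampingCRTP<StaticForceDamping>::DampingCRTP;
    void applyImpl(Body& b) const { b.force = damp(b.force, b.velocity); }
};

class StaticForceAndMomentumDamping
    : public DampingCRTP<StaticForceAndMomentumDamping> {
public:
    using DampingCRTP<StaticForceAndMomentumDamping>::DampingCRTP;
    void applyImpl(Body& b) const {
        b.force = damp(b.force, b.velocity);
        b.torque = damp(b.torque, b.angularVelocity);
    }
};

// The caller is a template too: the concrete type must be known at compile time.
template <class Damping>
void dampAll(const DampingCRTP<Damping>& d, Body* bodies, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) d.apply(bodies[i]);
}
```

The trade-off matches the one described above: there is no common runtime base type, so code that uses these classes has to be compiled against the concrete type, which does not fit well with plugins loaded into an already compiled binary.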
regards
--
Janek Kozicki
_______________________________________________
yade-dev mailing list
yade-dev@xxxxxxxxxxxxxxxx
https://lists.berlios.de/mailman/listinfo/yade-dev