Message #01835
Re: Triax profiling on cluster
-
To:
yade-users@xxxxxxxxxxxxxxxxxxx
-
From:
Janek Kozicki <janek_listy@xxxxx>
-
Date:
Thu, 24 Sep 2009 10:54:22 +0200
-
In-reply-to:
<4ABB3100.8050009@arcig.cz>
Václav Šmilauer said: (by the date of Thu, 24 Sep 2009 10:42:40 +0200)
> L1 cache is certainly not useless even for DEM, it's just that all your
> data will not fit inside. But still if one part of your data is at one
> memory location (not chain of shared_ptr's jumping all over the RAM), it
> makes the computation much faster (e.g. Dem3Dof classes have comparable
> speed to SpheresContactGeometry even if they copy extra
> Vector3r+Quaternionr (=28b of data) at every step). There are some papers
> [1] on that; speeds of the L1 cache are orders of magnitude higher than
> speed of CPU-RAM bus and of the RAM modules themselves.
>
> [1] http://people.redhat.com/drepper/cpumemory.pdf
Yes, I know that. But cluster benchmarks show that when a 16-CPU
machine has all 16 cores at 100% load, it calculates at half the
speed it reaches when only 4 CPUs are used and the remaining 12 sit
idle. This must come down to RAM speed, or call me crazy.
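
The RAM-speed argument above can be pictured with a toy model, sketched
here in Python. It assumes that "speed" means the rate of a single core,
that every busy core asks for the same memory traffic, and that the
machine has one fixed bandwidth limit shared by all cores. The function
name and all numbers are hypothetical and chosen only for illustration;
they are not measurements from the cluster.

```python
# Toy model of a shared memory bus (a sketch, not a measurement).
# Each active core wants `demand` GB/s of memory traffic, but the whole
# machine can deliver at most `bus_limit` GB/s.

def per_core_speed(active_cores, demand, bus_limit):
    """Relative speed of one core (1.0 means it never waits for RAM)."""
    if active_cores <= 0:
        raise ValueError("active_cores must be positive")
    wanted = active_cores * demand      # total traffic the cores ask for
    if wanted <= bus_limit:
        return 1.0                      # bus keeps up, full speed
    return bus_limit / wanted           # bus saturated, cores share it


if __name__ == "__main__":
    DEMAND = 1.0      # hypothetical GB/s requested per core
    BUS_LIMIT = 8.0   # hypothetical GB/s for the whole machine
    for n in (1, 4, 8, 16):
        s = per_core_speed(n, DEMAND, BUS_LIMIT)
        print(f"{n:2d} cores: per-core speed {s:.2f}, total {n * s:.1f}")
```

With these made-up numbers the bus saturates at 8 cores, so each of 16
busy cores runs at half the speed that each of 4 busy cores reaches,
while the total throughput stops growing beyond 8 cores. A real machine
with several memory controllers or NUMA nodes would need a finer model.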
--
Janek Kozicki |