yade-users team mailing list archive

Re: Triax profiling on cluster

To: yade-users@xxxxxxxxxxxxxxxxxxx
From: Janek Kozicki <janek_listy@xxxxx>
Date: Wed, 23 Sep 2009 16:16:02 +0200
In-reply-to: <4ABA2910.7060304@hmg.inpg.fr>

Bruno Chareyre said:     (by the date of Wed, 23 Sep 2009 15:56:32 +0200)

> 
> >
> > So if you use 5 threads on 16 CPU machine, and 11 other CPUs are
> > idle, you will be twice faster than when you use 5 CPUs and 11 other
> > CPUs are used by someone else for other calculations. That's because
> > Intel has very slow RAM access (contrary to AMD).
> >
> >   
> Oh, I get it! I'm not alone on a machine. The workaround would be to 
> require 16 threads for the full test to keep the other threads idle 
> (still no guarantee since the grid engine could give me 8 proc. on one 
> core + 8 proc. on another...).
> 
> > Therefore making benchmarks on cluster makes little sense, you will
> > get random results. (Unless you talk with everyone else to stop doing
> > their calculations ;). I was doing calculations for several months,
> > and I had 'htop' running of every node and I was noting down whether
> > I am calculating alone, or if others are also using CPU. And those
> > above were my observations.
> >
> >   
> Did you try the "big cache" option? It allows you to skip RAM access and 
> use the internal memory attached to the node. Remi said it can speed up 
> "some" jobs a lot.

Remi told me to use the local HDD /tmp directory, which I did. I don't
know what this "big cache" might mean.

All nodes ARE using the internal memory attached to the node. You are
confusing something here. Each node is an *ordinary* PC. They are
just connected to each other with a mechanism which allows submitting
jobs (Sun Grid Engine). There is nothing special about them.

Maybe Remi was talking about the big CPU cache, which is always used
automatically. And it is indeed relatively big: a whole 6 MB! Which
confirms what I already said: if you don't use RAM, then Intels are
much faster than AMDs (this is the case when you are calculating
digits of pi or the Mandelbrot set, or such). If you do use RAM, then
AMDs are faster. In our calculations we use RAM. The smallest
simulation uses 100 MB. Another simulation could be 1 GB. There is no
point in 6 MB of CPU cache then.
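
A rough way to check this on a given node is to time plain memory
copies for a buffer that fits in the CPU cache and for one that does
not. The sketch below is only an illustration and not part of Yade:
the name copy_bandwidth and the buffer sizes are made up for the
example, and Python overhead makes the numbers approximate.

```python
import time

def copy_bandwidth(size_bytes, repeats=5):
    """Return approximate copy throughput in GB/s for one buffer size.

    Hypothetical helper: relies on CPython copying bytearray slices in bulk.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero timer reading
    # Each copy reads size_bytes and writes size_bytes.
    return 2 * size_bytes / best / 1e9

if __name__ == "__main__":
    # 1 MB and 4 MB may fit in a 6 MB cache; 64 MB and 128 MB will not.
    for mb in (1, 4, 64, 128):
        gbs = copy_bandwidth(mb * 1024 * 1024)
        print(f"{mb:4d} MB buffer: {gbs:.1f} GB/s")
```

If the throughput drops sharply once the buffer no longer fits in
cache, the job is limited by RAM speed. Running it while other users
load the node should also show how much they share that bandwidth.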

AMD had a small CPU cache, only 1 MB. But 21 GB/sec RAM speed.
Intels had a big CPU cache 6 MB, and 2.1 GB/sec RAM speed.

I write "had" because currently both AMD and Intel have caches of up
to 12 MB.
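
To put the two RAM speeds in perspective, here is a back-of-envelope
calculation of how long one pass over a simulation's memory takes. It
is a sketch only: it takes the 21 GB/sec and 2.1 GB/sec figures quoted
above at face value, assumes 1 GB = 1000 MB, and ignores caches,
latency and the actual computation. The function name sweep_time_ms is
made up for the example.

```python
def sweep_time_ms(working_set_mb, bandwidth_gb_s):
    """Milliseconds needed to read working_set_mb once at bandwidth_gb_s.

    MB divided by GB/s gives milliseconds when 1 GB is taken as 1000 MB.
    """
    return working_set_mb / bandwidth_gb_s

if __name__ == "__main__":
    # Working-set sizes and bandwidths are the ones quoted in this message.
    for size_mb in (100, 1000):
        fast = sweep_time_ms(size_mb, 21.0)
        slow = sweep_time_ms(size_mb, 2.1)
        print(f"{size_mb:5d} MB: {fast:6.1f} ms at 21 GB/s, "
              f"{slow:6.1f} ms at 2.1 GB/s")
```

With these assumed figures every pass over the data is ten times
slower on the slower RAM, whatever the simulation size.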

The hallmark of stupidity for me was deciding to use Intels with such
low RAM speed, and then deciding to add more *slow* RAM! This explains
why Intels with 32 GB of RAM weren't in the default offer from SUN.
Guys at SUN realized that 32 GB of RAM with Intel makes no sense at
all. And they only offered AMDs with 32 GB.

If you want to take advantage of 6 MB of CPU cache, then you
shouldn't be adding another 16 GB of *slow* RAM! You should rather
buy only 2 GB of RAM.

If you want to take advantage of lots of RAM, you want RAM to be
fast, and don't care about little 1 MB cache. And you should take AMD.

Who at the university decided to buy Intels with 32 GB... I don't get it.


Man, why am I so upset? Better to forget this.
-- 
Janek Kozicki
