
yade-users team mailing list archive

Re: Triax profiling on cluster


Bruno Chareyre said:     (by the date of Wed, 23 Sep 2009 14:53:17 +0200)

> https://yade.hmg.inpg.fr/index.php/Triaxial_Test_Parallel
> I got interesting results on the cluster. The fastest run is obtained 
> with 5 cores (trying with more now).
> I included results for 200 steps on a dense packing (10k + 200 steps),
> which confirm the great improvement in collider time thanks to Vaclav.
> Strange part: the single-thread base time is longer than the ones
> obtained by previous testers on other machines, and doesn't look very
> fast compared to my old PC. I suspect non-optimized compilation or
> something.

Here goes my rant:

Those processors aren't fast; there are simply a lot of them. That was
the cheapest option available with 16 cores. What do you expect when
you buy the cheapest laptop? For sure you don't expect it to be
fast :)

When we were buying those I made a table with their prices:

Model   RAM speed/core (GB/s)  Cache (MB)  RAM (GB)  Clock (GHz)  Vendor  Cores  Price
E5440   21/4 = 5.3             2x6         8         2.83         Intel   8      5800
X5460   21/4 = 5.3             2x6         8         3.16         Intel   8      7000
X5355   21/4 = 5.3             2x4         8         2.66         Intel   8      4300
E5345   21/4 = 5.3             2x4         8         2.33         Intel   8      5400
L5310   21/4 = 5.3             2x4         8         1.6          Intel   8      2850
E7320   17/4 = 4.3             2x2         4         2.13         Intel   8      6900
E7220   17/4 = 4.3             2x2         8         2.93         Intel   8      10700
E7340   17/8 = 2.1             2x4         16        2.40         Intel   16     17200
X7350   17/8 = 2.1             2x4         16        2.93         Intel   16     18500
8220    42/2 = 21              2x1         16        2.80         AMD     8      11700
8218    42/2 = 21              2x1         16        2.60         AMD     8      17000
8220    42/2 = 21              2x1         32        2.80         AMD     16     21500

And which ones did they buy?
3 nodes: 16-core Intel X7350, 32 GB RAM, RAM speed 2.1 GB/s per core
2 nodes:  8-core Intel E5440, 16 GB RAM, RAM speed 5.3 GB/s per core
2 nodes:  8-core Intel E5440,  8 GB RAM, RAM speed 5.3 GB/s per core

And the (cheaper) E5440 nodes (8 CPUs) are faster than the X7350 nodes
(16 CPUs) because of the RAM speed difference. The 16-core nodes are
twice as slow when all 16 CPUs are under 100% load; my benchmarks
confirm this. And indeed a RAM speed of 2.1 GB/s is roughly half of
5.3 GB/s (about 2.5 times lower)....

The AMDs available at that time had a RAM speed of 21 GB/s per core.

The divisor in 42/2=21 or 21/4=5.3 is the number of CPUs that share
the same bridge connecting the CPUs to memory. E.g., the Intel has 8
cores but only 2 bridges, so 4 cores use the same bridge simultaneously
and must share its RAM speed among themselves. I have some .pdfs about
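The per-core figures in the table follow from this division. A small sketch of the arithmetic, assuming (as the explanation implies) that the numerator is the bandwidth of one bridge and the divisor is the number of cores sharing it; the helper name is hypothetical:

```python
# Per-core RAM bandwidth = bandwidth of one memory bridge / cores sharing it.
# Figures copied from the price table; per_core_bandwidth is a hypothetical helper.

def per_core_bandwidth(bridge_gbs: float, cores_per_bridge: int) -> float:
    """RAM bandwidth (GB/s) each core gets when every core on the bridge is busy."""
    return bridge_gbs / cores_per_bridge

# (model, bridge bandwidth in GB/s, cores sharing one bridge)
MACHINES = [
    ("E5440", 21.0, 4),
    ("E7320", 17.0, 4),
    ("X7350", 17.0, 8),
    ("8220", 42.0, 2),
]

for model, bridge_gbs, sharing in MACHINES:
    share = per_core_bandwidth(bridge_gbs, sharing)
    print(f"{model:6s} {bridge_gbs:g}/{sharing} = {share} GB/s per core")
```

The unrounded values are 5.25, 4.25, 2.125 and 21.0 GB/s; the table rounds them to one decimal.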

Nobody listened to my ramblings that AMD is faster. Intel has a
monopoly, just like Microsoft :P

Also, I see that in the end they bought the cheapest option, but with
more RAM (the 32 GB Intel configuration was not in my analysis, because
I didn't know it was available). If you add the price of the extra
16 GB of RAM to that X7350, you reach the same price as the AMD 8220
with 32 GB. The clock difference (2.93 vs. 2.80 GHz) doesn't matter,
because AMDs are well known to be faster clock-for-clock than Intel.
If your calculations are RAM-intensive, AMD is faster. If they aren't
(for example, computing prime numbers or digits of pi, which hardly
touch RAM), Intel is faster.
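One way to make the RAM-intensive vs. compute-intensive distinction concrete is a roofline-style estimate (a standard simplification swapped in here, not something from this thread): a job takes at least as long as its arithmetic and at least as long as its memory traffic. The sketch uses the X7350 and 8220 clock and per-core bandwidth figures from the table; the workloads are invented, and it assumes both chips do the same work per clock cycle, which deliberately ignores the clock-for-clock advantage mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    clock_ghz: float              # clock speed from the price table
    bandwidth_gbs: float          # per-core RAM bandwidth from the price table
    flops_per_cycle: float = 1.0  # ASSUMPTION: identical for both vendors

    def time_s(self, flops: float, bytes_moved: float) -> float:
        """Roofline-style lower bound: the slower of compute and memory traffic."""
        compute = flops / (self.clock_ghz * 1e9 * self.flops_per_cycle)
        memory = bytes_moved / (self.bandwidth_gbs * 1e9)
        return max(compute, memory)

intel = Core("X7350", 2.93, 2.1)
amd = Core("8220", 2.80, 21.0)

# Hypothetical workloads, numbers invented for illustration only.
memory_bound = dict(flops=1e8, bytes_moved=1e9)     # streams lots of data
compute_bound = dict(flops=1e10, bytes_moved=1e6)   # e.g. digits of pi

for label, work in [("memory-bound", memory_bound), ("compute-bound", compute_bound)]:
    faster = min((intel, amd), key=lambda c: c.time_s(**work))
    print(f"{label}: {faster.name} is faster")
# prints:
# memory-bound: 8220 is faster
# compute-bound: X7350 is faster
```

With a tenfold bandwidth gap, the memory term dominates on the X7350 as soon as a job moves much data, while the small clock advantage only shows when memory traffic is negligible.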

That table with prices is not confidential. I simply copied the prices
from Sun's website on promotional pricing for universities. I checked
the RAM speeds and clock speeds on the Intel and AMD websites.

Things are different now: Intel has reached a RAM speed of about
40 GB/s and AMD 60 GB/s, as I heard, but don't take my word for it,
as I didn't do deep research to write this sentence ;>
I just know that they got a lot better, and Intel learned that it
should compete with AMD: having a monopoly doesn't always help.
Having a good product does.

> Could it be linked to these messages we get when starting the jobs?
> WARNING:root:WARNING: job #8 wants 4 slots but only 1 are available
> WARNING:root:WARNING: job #9 wants 5 slots but only 1 are available

Yes, and not only that. Speed also depends on how many other jobs are
currently running on a node.

So if you use 5 threads on a 16-CPU machine and the other 11 CPUs are
idle, you will be twice as fast as when you use 5 CPUs and the other 11
CPUs are used by someone else for other calculations. That's because
Intel has very slow RAM access (unlike AMD).
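The effect of other users' jobs can be sketched with a crude model of one 16-core X7350 node: two bridges with 8 cores each and 17 GB/s per bridge (the 17/8 in the table). It assumes, hypothetically, that a bridge's bandwidth is split evenly among its busy cores and that a parallel run is only as fast as its slowest thread; caches, the scheduler and real thread placement are ignored.

```python
# Crude contention model for one X7350 node (figures from the price table).
BRIDGE_GBS = 17.0      # bandwidth of one memory bridge
CORES_PER_BRIDGE = 8   # 16 cores / 2 bridges

def slowest_thread_bw(busy_per_bridge: list[int]) -> float:
    """Bandwidth (GB/s) of the worst-off thread.

    busy_per_bridge[i] is the number of busy cores (ours plus other users')
    on bridge i; pass 0 for a bridge that holds none of our threads.
    ASSUMPTION: a bridge's bandwidth is split evenly among its busy cores.
    """
    return min(BRIDGE_GBS / n for n in busy_per_bridge if n > 0)

full_node = slowest_thread_bw([CORES_PER_BRIDGE, CORES_PER_BRIDGE])  # 5 ours + 11 others
alone_packed = slowest_thread_bw([5, 0])   # our 5 threads on one bridge, rest idle
alone_spread = slowest_thread_bw([3, 2])   # our 5 threads split 3 + 2, rest idle

print(f"full node:     {full_node:.2f} GB/s per thread")
print(f"alone, packed: {alone_packed:.2f} GB/s ({alone_packed / full_node:.1f}x)")
print(f"alone, spread: {alone_spread:.2f} GB/s ({alone_spread / full_node:.1f}x)")
```

Under these assumptions the two placements give speedups of 1.6x and about 2.7x over a fully loaded node, which bracket the factor of two observed in practice; the real number depends on where the threads end up.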

Therefore benchmarking on the cluster makes little sense: you will
get random results (unless you ask everyone else to stop their
calculations ;). I was doing calculations for several months, I had
'htop' running on every node, and I noted down whether I was
calculating alone or others were also using the CPUs. The observations
above come from that.
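The manual htop bookkeeping could be partly automated by logging the node's load next to each benchmark result. A minimal sketch, assuming a Unix node (os.getloadavg is not available on Windows) and treating the 1-minute load average minus your own thread count as the load from other jobs; that is only a rough approximation, and the function names and the 0.5 threshold are hypothetical choices.

```python
import os
import time

def other_users_load(load1: float, my_threads: int) -> float:
    """Estimate how many cores other jobs keep busy (never negative)."""
    return max(0.0, load1 - my_threads)

def log_node_state(my_threads: int) -> str:
    """One log line recording whether we seem to be alone on this node."""
    load1, _, _ = os.getloadavg()   # Unix only: 1-, 5- and 15-minute averages
    cores = os.cpu_count() or 1
    others = other_users_load(load1, my_threads)
    alone = others < 0.5            # ASSUMPTION: arbitrary threshold
    return (f"{time.strftime('%Y-%m-%d %H:%M:%S')} cores={cores} "
            f"load1={load1:.2f} others~{others:.1f} alone={alone}")

print(log_node_state(my_threads=5))
```

Calling it while the benchmark runs, and storing the line with the timing, makes it possible to discard runs that shared a node with someone else.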

Best regards
Janek Kozicki
