yade-users team mailing list archive
Re: Triax profiling on cluster
Janek Kozicki <janek_listy@xxxxx>
Wed, 23 Sep 2009 15:42:51 +0200
Bruno Chareyre said: (by the date of Wed, 23 Sep 2009 14:53:17 +0200)
> I got interesting results on the cluster. The fastest run is obtained
> with 5 cores (trying with more now).
> I included results on 200 steps on a dense packing (10k + 200 steps),
> which confirm the great improvement on collider time thanks to Vaclav.
> Strange part : the base time in single thread is bigger than the one
> obtained by previous testers on other machines, and looks not really
> fast compared to my old PC. I suspect not optimized compilation or
There goes my rant:
Those processors aren't fast. There's simply a lot of them. That was
the cheapest option available which had 16 cores in it. What do you
expect when you buy the cheapest laptop? Surely you don't expect that
it will be fast :)
When we were buying those I made a table with their prices:
Model  RAM speed GB/s  Cache  RAM GB  GHz   Vendor  cores  price
E5440  21/4=5.3        2x6    8       2.83  Intel   8      5800
X5460  21/4=5.3        2x6    8       3.16  Intel   8      7000
X5355  21/4=5.3        2x4    8       2.66  Intel   8      4300
E5345  21/4=5.3        2x4    8       2.33  Intel   8      5400
L5310  21/4=5.3        2x4    8       1.6   Intel   8      2850
E7320  17/4=4.3        2x2    4       2.13  Intel   8      6900
E7220  17/4=4.3        2x2    8       2.93  Intel   8      10700
E7340  17/8=2.1        2x4    16      2.40  Intel   16     17200
X7350  17/8=2.1        2x4    16      2.93  Intel   16     18500
8220   42/2=21         2x1    16      2.80  AMD     8      11700
8218   42/2=21         2x1    16      2.60  AMD     8      17000
8220   42/2=21         2x1    32      2.80  AMD     16     21500
And which did they take?
16 cores Intel X7350 with 32GB RAM * 3 nodes, RAM speed 2.1 GB/sec
8 cores Intel E5440 with 16GB RAM * 2 nodes, RAM speed 5.3 GB/sec
8 cores Intel E5440 with 8GB RAM * 2 nodes, RAM speed 5.3 GB/sec
And the (cheaper) E5440 (8 CPUs) are faster than the X7350 (16 CPUs) due
to the RAM speed difference. Those 16 core nodes run at about half the
speed when all 16 CPUs are under 100% load. This is confirmed by my
benchmarks. It matches the RAM speed: 2.1 GB/sec is less than half of
5.3 GB/sec.
RAM speed on the AMD machines available at that time was 21 GB/sec.
The divisor in 42/2=21 or 21/4=5.3 is the number of cores that share
the same bridge connecting CPUs to memory. E.g. the Intel has 8 cores,
but only 2 bridges, so 4 cores use the same bridge simultaneously, and
must share the RAM speed among themselves. I have some .pdfs about
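The division described above can be written out as a small Python sketch. The figures are the ones from the table; the function and dictionary names are made up for illustration, and the model (every core on a bridge is busy and the bandwidth is split evenly) is a simplifying assumption, not a measurement.

```python
# Per-core memory bandwidth, following the "total/sharing" column of the table.
# Assumption: all cores on one bridge are busy and split the bandwidth evenly.

def per_core_bandwidth(total_gb_s, cores_sharing):
    """Bandwidth in GB/s left for each core sharing one memory bridge."""
    return total_gb_s / cores_sharing

# (total GB/s per bridge, cores sharing that bridge), as quoted in the table
nodes = {
    "Intel E5440, 8 cores": (21, 4),
    "Intel X7350, 16 cores": (17, 8),
    "AMD 8220": (42, 2),
}

for name, (total, sharing) in nodes.items():
    print(f"{name}: {per_core_bandwidth(total, sharing):.2f} GB/s per core")

# How much more bandwidth a core gets on the E5440 than on the X7350
ratio = per_core_bandwidth(21, 4) / per_core_bandwidth(17, 8)
print(f"E5440 vs X7350: {ratio:.2f}x")
```

Under this model the E5440 gives each core roughly 2.5 times the bandwidth of the X7350, which is in line with the 16 core nodes being about half as fast under full load.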
Nobody listened to my ramblings that AMD is faster. Intel has a
monopoly just like Microsoft :P
Also I see that in the end they bought the cheapest option, but with
more RAM (the 32GB Intel was not included in my analysis, because
I didn't know it was available). And if you add the price of an extra
16GB of RAM to that X7350 you will reach the same price as the AMD 8220
with 32 GB.
The clock difference of 2.93 vs. 2.80 GHz does not matter, because AMDs
are well known to be faster clock-for-clock when compared to Intel.
If you use RAM for your calculations then AMD is faster. If you don't
(for example you calculate prime numbers or digits of pi, which don't
need RAM) then Intel is faster.
That table with prices is not confidential. I simply copied them
from Sun's website about promotional pricing for universities.
The RAM speeds and clock speeds I checked on Intel and AMD websites.
Things are different now, as Intel has reached a RAM speed of
about 40 GB/sec, and AMD 60 GB/sec, as I heard, but don't take my
word for it, as I didn't do any deep research to write this sentence ;>
I just know that they got a lot better, and Intel learned that they
should compete with AMD; having a monopoly doesn't always help.
Having a good product does.
> Could it be linked with these messages we have when starting the jobs? :
> WARNING:root:WARNING: job #8 wants 4 slots but only 1 are available
> WARNING:root:WARNING: job #9 wants 5 slots but only 1 are available
Yes, and not only that. Speed also depends on how many other jobs are
currently being run on a node.
So if you use 5 threads on a 16 CPU machine, and the 11 other CPUs are
idle, you will be twice as fast as when you use 5 CPUs and the 11 other
CPUs are used by someone else for other calculations. That's because
Intel has very slow RAM access (contrary to AMD).
Therefore making benchmarks on the cluster makes little sense, you will
get random results (unless you talk everyone else into stopping
their calculations ;). I was doing calculations for several months,
and I had 'htop' running on every node and I was noting down whether
I was calculating alone, or if others were also using the CPUs. And
those above were my observations.
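The note-taking described above (watching 'htop' and writing down whether the node is shared) could be partly automated. Below is a minimal Python sketch, assuming a Unix node where os.getloadavg() is available; the function names, the thread count of 5 and the 0.5 threshold are all hypothetical, and load average is only a rough stand-in for what htop shows.

```python
import os
import time

def other_load(load_1min, my_threads):
    """Rough load caused by other users: 1-minute load minus our own threads."""
    return max(0.0, load_1min - my_threads)

def describe_node(load_1min, my_threads, threshold=0.5):
    """Return 'alone' if nobody else seems to be computing, else 'shared'."""
    if other_load(load_1min, my_threads) < threshold:
        return "alone"
    return "shared"

if __name__ == "__main__":
    my_threads = 5                      # hypothetical: threads of our own job
    load_1min, _, _ = os.getloadavg()   # Unix only
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} {os.cpu_count()} CPUs, load {load_1min:.2f}, "
          f"{describe_node(load_1min, my_threads)}")
```

Printing one such line next to every benchmark result would at least show which timings were taken on a shared node.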
Janek Kozicki