Re: libbls read benchmarks
On Mon, Mar 23, 2009 at 01:38:33AM +0200, Michael Iatrou wrote:
> On Monday 23 March 2009, Alexandros Frantzis wrote:
>
> > I have uploaded a new branch at lp:~libbls/libbls/bench-vs-plain
> > containing (for the time being) two new benchmarks. These benchmark the read
> > performance from file and memory using multiple implementations
> > (bless_buffer_read, mmap, read, mmap+memcpy etc). For the file
> > benchmarks it is necessary to have a file named 'bigfile.bin' in the
> > root directory of the branch.
> >
> > I ran the file benchmark using a ~72MiB file containing random data and
> > the times I got were (hot cache, more or less constant through multiple
> > runs):
> >
> > Elapsed time [file_bless_buffer]: 0.250000 (hash: ffffffe9)
> > Elapsed time [file_read]: 0.150000 (hash: ffffffe9)
> > Elapsed time [file_mmap]: 0.160000 (hash: ffffffe9)
> > Elapsed time [file_mmap+memcpy]: 0.240000 (hash: ffffffe9)
> >
> > The times for file_bless_buffer and file_mmap+memcpy were expectedly
> > similar, as bless_buffer_read() is essentially an mmap+memcpy internally
> > when reading data from files.
> >
> > So it seems that the avoidable overhead is that of the memcpy.
> > One idea to work around it is a new API function, e.g.
> > bless_buffer_read_foreach(), that would act just like segcol_foreach().
> >
> > For the memory benchmarks I used a 100 MiB malloc-ed memory area. I
> > wrote to every byte to make sure the area was actually physically
> > allocated. The results (once again more or less constant through
> > multiple runs):
> >
> > Elapsed time [mem_bless_buffer]: 0.280000 (hash: fce00000)
> > Elapsed time [mem_plain]: 0.200000 (hash: fce00000)
> > Elapsed time [mem_memcpy]: 0.210000 (hash: fce00000)
> >
> > These results seem a little strange. First of all, it seems unlikely
> > that in the mem_memcpy case there was any physical copy of the data.
> > Probably the physical pages were just mapped again with copy-on-write.
> >
> > The question is why do we have a 40% overhead in bless_buffer_read(),
> > considering that we just do a memcpy, too?
> >
> > Another question is why don't we get a similar memcpy optimization in
> > the case of files?
>
> For both file and memory, more data points are required in order to gain
> insight into the performance characteristics and scalability constraints. I
> suppose that tests in the range of 50-500 MiB, using a 25 MiB step, should
> suffice.
>
> oprofile would provide good insight into what exactly is going on.
>
> --
> Μιχάλης Ιατρού (rjzu)
>
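
A side note on the quoted memory-benchmark setup: writing to every byte
after malloc() is what forces the kernel to actually back the area with
physical pages. A minimal sketch of that preparation step (sizes and
names are illustrative, not the actual benchmark code):

#include <stdlib.h>
#include <string.h>

/* Allocate 'size' bytes and write to every byte so that each page is
 * faulted in and physically allocated; malloc() alone only reserves
 * virtual address space. */
static unsigned char *alloc_touched(size_t size)
{
    unsigned char *area = malloc(size);

    if (area != NULL)
        memset(area, 0xaa, size);

    return area;
}
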
I tried the benchmarks again, using 25 MiB increments from 25 MiB to 350 MiB
for both file and memory, and the results were much saner this time.
The results in the data file are the average of three runs for each
benchmark.
I am attaching the data file and four gnuplot script files for ease of
visualization.
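For reference, each number measures one full pass over the data, with
every byte folded into a checksum (presumably what the 'hash' in the
benchmark output verifies: that all implementations read the same
bytes). A rough sketch of such a measurement loop, with illustrative
names and clock()-based timing assumed rather than taken from the
branch:

#include <stdio.h>
#include <time.h>

/* Illustrative harness: time one pass over 'data' and print a checksum
 * so that the compiler cannot optimize the reads away. */
static unsigned int time_pass(const char *name,
        const unsigned char *data, size_t size)
{
    unsigned int hash = 0;
    size_t i;
    clock_t start = clock();

    for (i = 0; i < size; i++)
        hash += data[i];

    printf("Elapsed time [%s]: %f (hash: %x)\n", name,
            (double)(clock() - start) / CLOCKS_PER_SEC, hash);

    return hash;
}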
It seems that for the file case we are 60%-70% slower than read/mmap
and about 10% slower than mmap+memcpy.
For the memory case we are 70%-90% slower than plain memory access (ick!)
and about 30%-40% slower than memcpy.
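(Taking the 250 MiB row of the data below as an example: 0.863/0.520 ≈
1.66 against read and 0.863/0.800 ≈ 1.08 against mmap+memcpy for the
file case; 0.660/0.380 ≈ 1.74 against plain access and 0.660/0.500 ≈
1.32 against memcpy for the memory case.)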
In both cases, getting rid of the memcpy (a la segcol_foreach) should help
tremendously, although (especially in the memory case) there will still
be some significant overhead which we will have to look into.
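To make that concrete, here is a rough sketch of what such a
foreach-style read API could look like. The name and callback signature
are hypothetical, modeled on segcol_foreach(), not a committed design:

#include <stddef.h>
#include <sys/types.h>  /* off_t */

/* bless_buffer_t: the buffer type used by bless_buffer_read() */

/* Hypothetical zero-copy read API: instead of copying data into a
 * caller-supplied buffer, the callback gets a pointer straight into
 * each underlying segment. */
typedef int (*bless_buffer_read_func)(bless_buffer_t *buf, off_t offset,
        const void *data, size_t len, void *user_data);

int bless_buffer_read_foreach(bless_buffer_t *buf, off_t offset,
        size_t length, bless_buffer_read_func func, void *user_data);

/* Example callback: fold the data into a checksum in place, with no
 * intermediate memcpy. */
static int hash_func(bless_buffer_t *buf, off_t offset,
        const void *data, size_t len, void *user_data)
{
    unsigned int *hash = user_data;
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++)
        *hash += p[i];

    return 0; /* a non-zero return could abort the iteration */
}

The benchmark (or any other consumer) would then process each segment's
data in place instead of first copying it out.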
As a final note it seems that all my speculations about memcpy
optimization were plain wrong :)
--
Alexandros
#MiB file_buffer file_read file_mmap file_mmap_memcpy mem_buffer mem_plain mem_memcpy
25 0.086667 0.053333 0.053333 0.080000 0.066667 0.040000 0.050000
50 0.170000 0.106667 0.106667 0.156667 0.133333 0.076667 0.100000
75 0.256667 0.153333 0.170000 0.230000 0.200000 0.116667 0.150000
100 0.343333 0.210000 0.216667 0.320000 0.260000 0.160000 0.196667
125 0.433333 0.256667 0.266667 0.400000 0.333333 0.196667 0.246667
150 0.510000 0.310000 0.326667 0.476667 0.403333 0.233333 0.350000
175 0.600000 0.360000 0.380000 0.553333 0.466667 0.266667 0.350000
200 0.683333 0.416667 0.430000 0.640000 0.550000 0.310000 0.396667
225 0.766667 0.470000 0.493333 0.713333 0.596667 0.343333 0.450000
250 0.863333 0.520000 0.540000 0.800000 0.660000 0.380000 0.500000
275 0.943333 0.570000 0.593333 0.880000 0.736667 0.420000 0.553333
300 1.046667 0.626667 0.640000 0.960000 0.843333 0.456667 0.600000
325 1.136667 0.673333 0.703333 1.033333 0.906667 0.500000 0.650000
350 1.220000 0.733333 0.753333 1.110000 1.000000 0.533333 0.700000
plot "bench_read.data" using 1:2 with linespoints title "file:bless_buffer_read" ,\
"bench_read.data" using 1:3 with linespoints title "file:read" ,\
"bench_read.data" using 1:4 with linespoints title "file:mmap" ,\
"bench_read.data" using 1:5 with linespoints title "file:mmap+memcpy"
pause -1
plot "bench_read.data" using 1:($2/$3) with linespoints title "file:bless_buffer_read/file:read" ,\
"bench_read.data" using 1:($2/$4) with linespoints title "file:bless_buffer_read/file:mmap" ,\
"bench_read.data" using 1:($2/$5) with linespoints title "file:bless_buffer_read/file:mmap+memcpy"
pause -1
plot "bench_read.data" using 1:6 with linespoints title "mem:bless_buffer_read" ,\
"bench_read.data" using 1:7 with linespoints title "mem:plain" ,\
"bench_read.data" using 1:8 with linespoints title "mem:memcpy"
pause -1
plot "bench_read.data" using 1:($6/$7) with linespoints title "mem:bless_buffer_read/mem:plain", \
"bench_read.data" using 1:($6/$8) with linespoints title "mem:bless_buffer_read/mem:memcpy"
pause -1
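To reproduce the plots, save the data above as "bench_read.data" and run
each script through gnuplot; the "pause -1" at the end of each script
keeps the plot on screen until Enter is pressed. The second and fourth
scripts plot the ratios from which the percentages above were read.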