Re: libbls read benchmarks
On Mon, Mar 23, 2009 at 01:38:33AM +0200, Michael Iatrou wrote:
> On Monday 23 March 2009, Alexandros Frantzis wrote:
>
> > I have uploaded a new branch at lp:~libbls/libbls/bench-vs-plain
> > containing (for the time being) two new benchmarks. These benchmark the read
> > performance from file and memory using multiple implementations
> > (bless_buffer_read, mmap, read, mmap+memcpy etc). For the file
> > benchmarks it is necessary to have a file named 'bigfile.bin' in the
> > root directory of the branch.
> >
> > I ran the file benchmark using a ~72MiB file containing random data and
> > the times I got were (hot cache, more or less constant through multiple
> > runs):
> >
> > Elapsed time [file_bless_buffer]: 0.250000 (hash: ffffffe9)
> > Elapsed time [file_read]: 0.150000 (hash: ffffffe9)
> > Elapsed time [file_mmap]: 0.160000 (hash: ffffffe9)
> > Elapsed time [file_mmap+memcpy]: 0.240000 (hash: ffffffe9)
> >
> > The times for file_bless_buffer and file_mmap+memcpy were expectedly
> > similar, as bless_buffer_read() is essentially an mmap+memcpy internally
> > when reading data from files.
> >
> > So it seems that the avoidable overhead is that of the memcpy.
> > One idea to work around it is a new API function, e.g.
> > bless_buffer_read_foreach(), that would act just like segcol_foreach().
> >
> > For the memory benchmarks I used a 100 MiB malloc-ed memory area. I
> > wrote to every byte to make sure the area was actually physically
> > allocated. The results (once again more or less constant through
> > multiple runs):
> >
> > Elapsed time [mem_bless_buffer]: 0.280000 (hash: fce00000)
> > Elapsed time [mem_plain]: 0.200000 (hash: fce00000)
> > Elapsed time [mem_memcpy]: 0.210000 (hash: fce00000)
> >
> > These results seem a little strange. First of all, it seems unlikely
> > that in the mem_memcpy case there was any physical copy of the data.
> > Probably the physical pages were just mapped again with copy-on-write.
> >
> > The question is why do we have a 40% overhead in bless_buffer_read(),
> > considering that we just do a memcpy, too?
> >
> > Another question is why don't we get a similar memcpy optimization in
> > the case of files?
>
> For both file and memory, more data points are required in order to gain
> insight into the performance characteristics and scalability constraints. I
> suppose that tests in the range of 50-500 MiB, using a 25 MiB step, should
> suffice.
>
> oprofile would provide good insight into what exactly is going on.
>
> --
> Μιχάλης Ιατρού (rjzu)
>
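
A side note on the quoted memory-benchmark setup: writing to every byte
after malloc() is what forces the kernel to actually back the area with
physical pages. A minimal sketch of that preparation step (sizes and
names are illustrative, not the actual benchmark code):

#include <stdlib.h>
#include <string.h>

/* Allocate 'size' bytes and write to every byte so that each page is
 * faulted in and physically allocated; malloc() alone only reserves
 * virtual address space. */
static unsigned char *alloc_touched(size_t size)
{
    unsigned char *area = malloc(size);

    if (area != NULL)
        memset(area, 0xaa, size);

    return area;
}
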
I tried the benchmarks again, using 25 MiB increments from 25 MiB to 350 MiB
for both file and memory, and the results were much saner this time.
The results in the data file are the average of three runs for each
benchmark.
I am attaching the data file and four gnuplot script files for ease of
visualization.
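For reference, each number measures one full pass over the data, with
every byte folded into a checksum (presumably what the 'hash' in the
benchmark output verifies: that all implementations read the same
bytes). A rough sketch of such a measurement loop, with illustrative
names and clock()-based timing assumed rather than taken from the
branch:

#include <stdio.h>
#include <time.h>

/* Illustrative harness: time one pass over 'data' and print a checksum
 * so that the compiler cannot optimize the reads away. */
static unsigned int time_pass(const char *name,
        const unsigned char *data, size_t size)
{
    unsigned int hash = 0;
    size_t i;
    clock_t start = clock();

    for (i = 0; i < size; i++)
        hash += data[i];

    printf("Elapsed time [%s]: %f (hash: %x)\n", name,
            (double)(clock() - start) / CLOCKS_PER_SEC, hash);

    return hash;
}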
It seems that for the file case we are 60%-70% slower than read/mmap
and about 10% slower than mmap+memcpy.
For the memory case we are 70%-90% slower than plain memory access (ick!)
and about 30%-40% slower than memcpy.
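(Taking the 250 MiB row of the data below as an example: 0.863/0.520 ≈
1.66 against read and 0.863/0.800 ≈ 1.08 against mmap+memcpy for the
file case; 0.660/0.380 ≈ 1.74 against plain access and 0.660/0.500 ≈
1.32 against memcpy for the memory case.)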
In both cases, getting rid of the memcpy (a la segcol_foreach) should help
tremendously, although (especially in the memory case) there will still
be some significant overhead which we will have to look into.
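To make that concrete, here is a rough sketch of what such a
foreach-style read API could look like. The name and callback signature
are hypothetical, modeled on segcol_foreach(), not a committed design:

#include <stddef.h>
#include <sys/types.h>  /* off_t */

/* bless_buffer_t: the buffer type used by bless_buffer_read() */

/* Hypothetical zero-copy read API: instead of copying data into a
 * caller-supplied buffer, the callback gets a pointer straight into
 * each underlying segment. */
typedef int (*bless_buffer_read_func)(bless_buffer_t *buf, off_t offset,
        const void *data, size_t len, void *user_data);

int bless_buffer_read_foreach(bless_buffer_t *buf, off_t offset,
        size_t length, bless_buffer_read_func func, void *user_data);

/* Example callback: fold the data into a checksum in place, with no
 * intermediate memcpy. */
static int hash_func(bless_buffer_t *buf, off_t offset,
        const void *data, size_t len, void *user_data)
{
    unsigned int *hash = user_data;
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; i++)
        *hash += p[i];

    return 0; /* a non-zero return could abort the iteration */
}

The benchmark (or any other consumer) would then process each segment's
data in place instead of first copying it out.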
As a final note it seems that all my speculations about memcpy
optimization were plain wrong :)
--
Alexandros
#MiB file_buffer file_read file_mmap file_mmap_memcpy mem_buffer mem_plain mem_memcpy
25 0.086667 0.053333 0.053333 0.080000 0.066667 0.040000 0.050000
50 0.170000 0.106667 0.106667 0.156667 0.133333 0.076667 0.100000
75 0.256667 0.153333 0.170000 0.230000 0.200000 0.116667 0.150000
100 0.343333 0.210000 0.216667 0.320000 0.260000 0.160000 0.196667
125 0.433333 0.256667 0.266667 0.400000 0.333333 0.196667 0.246667
150 0.510000 0.310000 0.326667 0.476667 0.403333 0.233333 0.350000
175 0.600000 0.360000 0.380000 0.553333 0.466667 0.266667 0.350000
200 0.683333 0.416667 0.430000 0.640000 0.550000 0.310000 0.396667
225 0.766667 0.470000 0.493333 0.713333 0.596667 0.343333 0.450000
250 0.863333 0.520000 0.540000 0.800000 0.660000 0.380000 0.500000
275 0.943333 0.570000 0.593333 0.880000 0.736667 0.420000 0.553333
300 1.046667 0.626667 0.640000 0.960000 0.843333 0.456667 0.600000
325 1.136667 0.673333 0.703333 1.033333 0.906667 0.500000 0.650000
350 1.220000 0.733333 0.753333 1.110000 1.000000 0.533333 0.700000
plot "bench_read.data" using 1:2 with linespoints title "file:bless_buffer_read" ,\
"bench_read.data" using 1:3 with linespoints title "file:read" ,\
"bench_read.data" using 1:4 with linespoints title "file:mmap" ,\
"bench_read.data" using 1:5 with linespoints title "file:mmap+memcpy"
pause -1
plot "bench_read.data" using 1:($2/$3) with linespoints title "file:bless_buffer_read/file:read" ,\
"bench_read.data" using 1:($2/$4) with linespoints title "file:bless_buffer_read/file:mmap" ,\
"bench_read.data" using 1:($2/$5) with linespoints title "file:bless_buffer_read/file:mmap+memcpy"
pause -1
plot "bench_read.data" using 1:6 with linespoints title "mem:bless_buffer_read" ,\
"bench_read.data" using 1:7 with linespoints title "mem:plain" ,\
"bench_read.data" using 1:8 with linespoints title "mem:memcpy"
pause -1
plot "bench_read.data" using 1:($6/$7) with linespoints title "mem:bless_buffer_read/mem:plain", \
"bench_read.data" using 1:($6/$8) with linespoints title "mem:bless_buffer_read/mem:memcpy"
pause -1
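To reproduce the plots, save the data above as "bench_read.data" and run
each script through gnuplot; the "pause -1" at the end of each script
keeps the plot on screen until Enter is pressed. The second and fourth
scripts plot the ratios from which the percentages above were read.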