
libbls-dev-list team mailing list archive

Re: libbls read benchmarks


On Monday 23 March 2009, Alexandros Frantzis wrote:

> I have uploaded a new branch at lp:~libbls/libbls/bench-vs-plain
> containing (for the time being) two new benchmarks. These measure read
> performance from file and from memory using multiple implementations
> (bless_buffer_read, mmap, read, mmap+memcpy etc). For the file
> benchmarks it is necessary to have a file named 'bigfile.bin' in the
> root directory of the branch.
>
> I ran the file benchmark using a ~72MiB file containing random data and
> the times I got were (hot cache, more or less constant through multiple
> runs):
>
> Elapsed time [file_bless_buffer]: 0.250000 (hash: ffffffe9)
> Elapsed time [file_read]: 0.150000 (hash: ffffffe9)
> Elapsed time [file_mmap]: 0.160000 (hash: ffffffe9)
> Elapsed time [file_mmap+memcpy]: 0.240000 (hash: ffffffe9)
>
> The times for file_bless_buffer and file_mmap+memcpy were, as expected,
> similar, since bless_buffer_read() is essentially an mmap+memcpy
> internally when reading data from files.
>
> So it seems that the avoidable overhead is that of the use of memcpy.
> One idea to get around it is to add a new API function, e.g.
> bless_buffer_read_foreach(), that would act just like segcol_foreach().
>
> For the memory benchmarks I used a 100 MiB malloc-ed memory area. I
> wrote to every byte to make sure the area was actually physically
> allocated. The results (once again more or less constant through
> multiple runs):
>
> Elapsed time [mem_bless_buffer]: 0.280000 (hash: fce00000)
> Elapsed time [mem_plain]: 0.200000 (hash: fce00000)
> Elapsed time [mem_memcpy]: 0.210000 (hash: fce00000)
>
> These results seem a little strange. First of all, it seems unlikely
> that in the mem_memcpy case there was any actual physical copy of the
> data. Probably the physical pages were just remapped with copy-on-write.
>
> The question is why do we have a 40% overhead in bless_buffer_read(),
> considering that we just do a memcpy, too?
>
> Another question is why don't we get a similar memcpy optimization in
> the case of files?

For both file and memory, more data points are required to gain insight 
into the performance characteristics and scalability constraints. I 
suppose that tests in the range of 50-500 MiB, with a 25 MiB step, should 
suffice.

oprofile would provide good insight into what exactly is going on.

-- 
 Μιχάλης Ιατρού (rjzu)


