libbls-dev-list team mailing list archive

libbls read benchmarks


I have uploaded a new branch at lp:~libbls/libbls/bench-vs-plain
containing (for the time) two new benchmarks. These benchmark the read
performance from file and memory using multiple implementations
(bless_buffer_read, mmap, read, mmap+memcpy etc). For the file
benchmarks it is necessary to have a file named 'bigfile.bin' in the
root directory of the branch.

I ran the file benchmark using a ~72 MiB file containing random data,
and the times I got were (hot cache, more or less constant through
multiple runs):

Elapsed time [file_bless_buffer]: 0.250000 (hash: ffffffe9)
Elapsed time [file_read]: 0.150000 (hash: ffffffe9)
Elapsed time [file_mmap]: 0.160000 (hash: ffffffe9)
Elapsed time [file_mmap+memcpy]: 0.240000 (hash: ffffffe9)

The times for file_bless_buffer and file_mmap+memcpy were, as expected,
similar, since bless_buffer_read() is essentially an mmap followed by a
memcpy internally when reading data from files.

So it seems that the avoidable overhead is that of the memcpy. One idea
to get around it is a new API function, e.g.
bless_buffer_read_foreach(), that would act just like segcol_foreach().

For the memory benchmarks I used a 100 MiB malloc-ed memory area. I
wrote to every byte to make sure the area was actually physically
allocated. The results (once again more or less constant through
multiple runs):

Elapsed time [mem_bless_buffer]: 0.280000 (hash: fce00000)
Elapsed time [mem_plain]: 0.200000 (hash: fce00000)
Elapsed time [mem_memcpy]: 0.210000 (hash: fce00000)

These results seem a little strange. First of all, it seems unlikely
that any physical copying of data took place in the mem_memcpy case;
probably the physical pages were just mapped again with copy-on-write.

The question is: why does bless_buffer_read() have a 40% overhead,
considering that it, too, just does a memcpy?

Another question is why we don't get a similar memcpy optimization in
the case of files.

Awaiting any insights!
