Message #00468
Re: introduction
On Fri, Mar 9, 2018 at 6:29 PM, Ignotus Peverell
<igno.peverell@xxxxxxxxxxxxxx> wrote:
> I'm not sure why but RocksDb seems really unpopular and lmdb
> very popular these days.
could have something to do with rocksdb's legacy from leveldb, which is
known to cause data corruption when put into real-world scenarios and
was abandoned by the google developers... maybe that has something to
do with it? :)
lmdb is popular in part because in an extremely
challenging-to-understand technical way it guarantees not to corrupt
the key-value store *without requiring a write-log* [as long as you
enable fsync mode... which in turn can hammer SSDs... which is the
focus of some improvements in lmdb right now].
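to be concrete about the fsync thing: it's just flags when you open the
environment. a rough, untested sketch against the C API (the path, map
size and omitted error handling are all mine, not anything grin uses):

    #include <lmdb.h>

    static MDB_env *open_env(void)
    {
        MDB_env *env;
        mdb_env_create(&env);
        mdb_env_set_mapsize(env, (size_t)1 << 30);   /* 1 GiB map; grow as needed */

        /* no MDB_NOSYNC / MDB_NOMETASYNC flags here, so every commit is
         * fsync'd - that is the mode that gives the no-corruption-on-power-loss
         * guarantee.  the directory must already exist. */
        mdb_env_open(env, "./chain_data", 0, 0664);

        /* passing MDB_NOSYNC as the flags argument instead skips the fsyncs:
         * kinder to SSDs, but you lose that guarantee. */
        return env;
    }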
also the compression in leveldb / rocksdb... yyeaah, how does that work
out on a low-cost android phone with a 1ghz ARM core, a 32-bit-wide
DDR3 bus and 32k 1st-level instruction and data caches, as opposed to a
hyper-threaded 3.5ghz 12-core with 1mb 1st-level cache per core, 12mb
cache-coherent 2nd-level cache, and 256-bit-wide 2.4ghz DDR4 multi-DIMM
memory?
> One often overlooked aspect of a database is the quality of the bindings
> in your PL, because poorly written bindings can make all the database
> guarantees go away.
ok, one very very important thing to know about lmdb is: because it's
memory-mapped (shm with copy-on-write semantics) it returns *direct*
pointers to the values. this is extremely important to know because
most key-value stores return entire memory-copies of the values, even
if you're storing 100 megabyte files.
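roughly what that looks like with the C API - untested sketch, error
handling omitted, and env, dbi and process_block are made-up names for
things set up elsewhere:

    #include <lmdb.h>
    #include <stddef.h>
    #include <stdint.h>

    void process_block(const void *data, size_t len);   /* hypothetical callback */

    static void read_block(MDB_env *env, MDB_dbi dbi, uint64_t height)
    {
        MDB_txn *txn;
        MDB_val key, val;

        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);

        key.mv_size = sizeof(height);
        key.mv_data = &height;

        if (mdb_get(txn, dbi, &key, &val) == 0) {
            /* val.mv_data points straight into the memory-mapped file:
             * no copy is made, even for a 100-megabyte value.  it is only
             * valid until this read transaction is closed. */
            process_block(val.mv_data, val.mv_size);
        }

        mdb_txn_abort(txn);   /* read-only txns are simply aborted when done */
    }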
there do exist "safe-i-fied" variations of the lmdb go bindings... it's
up to you, just bear in mind that if you use those you'll be losing one
of the main benefits of lmdb.
i remember someone, roger binns, a long looong time ago, telling me
the wonderful phrase, "if you make software idiot-proof only idiots
will use it" :) basically lmdb encourages and invites people to be...
intelligent :)
> And I was a lot more worried about the cryptography
> and the size of range proofs back then.
yehyeh.
> I know the opinions of the lmdb author and others regarding atomicity
> in storages and frankly, I think they're a little too storage-focused
yyyeah it's kiinda reasonable to assume that the underlying storage
is reliable? :) and that you're running suitable backups. what
howard's pointing out is that many of these new key-value stores, even
if the underlying storage *is* reliable, simply corrupt the data
anyway, particularly when it comes to power-loss events.
lmdb was *literally* the only key-value store that did not corrupt
data, in one comprehensive study. the only reason it appeared to
corrupt data in the preliminary report was that the PhD researcher did
not know about the fsync option in lmdb (he'd disabled it). when he
switched it on and re-ran the tests: *zero* data corruption.
> They're append-only for the most part so dealing with failure is
> also very easy
ok so one very useful feature of lmdb is: not only does it have
range-search capability, but if you can guarantee that the key being
inserted is larger than any key inserted up to that point, you can call
a special "insert at end" function. this i believe only requires
something like 2 writes to disk, which is mad.
if the key is the block number and that's guaranteed to be
incrementing, you're good to go (see the sketch after the step list
further down).
oh: it also has atomic write transactions, *without* locking out
readers, because of the copy-on-write semantics. the writer locks the
root node (beginning a transaction), starts preparing the new version
of the database (each write to a memory-block makes a COPY of that
memory block...), and finally, once done, there's a bit of
arseing-about locking all readers out for a bit whilst the root node is
updated, and you're done. i say "arseing about", but actually all
readers have their own "transaction" - i.e. they'll be running off of
their own root-node during their open transaction, so the
"arseing-about" to get readers sync'd up only occurs when the reader
closes the read transaction. opening the *next* read transaction is
when they get the *new* (latest) root block.
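roughly, with the same caveats (untested sketch, env/dbi assumed open
as before), a reader and a writer in separate threads look like this:

    #include <lmdb.h>
    #include <stddef.h>
    #include <stdint.h>

    extern MDB_env *env;    /* opened elsewhere, exactly as above */
    extern MDB_dbi  dbi;

    static void *writer(void *arg)
    {
        MDB_txn *txn;
        uint64_t height = 12346;
        char block[] = "block-bytes";                 /* made-up payload */
        MDB_val key = { sizeof(height), &height };
        MDB_val val = { sizeof(block), block };
        (void)arg;

        mdb_txn_begin(env, NULL, 0, &txn);            /* only one writer at a time */
        mdb_put(txn, dbi, &key, &val, 0);             /* copy-on-write behind the scenes */
        mdb_txn_commit(txn);                          /* publishes the new root node */
        return NULL;
    }

    static void *reader(void *arg)
    {
        MDB_txn *txn;
        uint64_t height = 12346;
        MDB_val key = { sizeof(height), &height };
        MDB_val val;
        (void)arg;

        /* never blocked by the writer: this txn runs off whichever
         * root node was current at the moment it began... */
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        if (mdb_get(txn, dbi, &key, &val) == MDB_NOTFOUND) {
            /* the writer committed after this txn began, so this
             * snapshot simply does not contain the new block yet */
        }
        mdb_txn_abort(txn);
        /* ...only the *next* read txn picks up the new (latest) root. */
        return NULL;
    }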
my point in mentioning this is: to do a guaranteed (fast)
last-key-insert you do this:
* open write transaction
* seek to end of store
* read last key
* add one (or whatever)
* write new value under new key with the "insert-at-end" function.
* close write transaction.
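pulling those steps together - again an untested sketch, env/dbi
assumed as before, and the dbi assumed to have been opened with
MDB_INTEGERKEY so native uint64 keys sort numerically:

    #include <lmdb.h>
    #include <stddef.h>
    #include <stdint.h>

    static void append_block(MDB_env *env, MDB_dbi dbi)
    {
        MDB_txn *txn;
        MDB_cursor *cur;
        MDB_val key, val;
        uint64_t next_height = 0;
        char block[] = "serialised-block-bytes";      /* made-up payload */

        mdb_txn_begin(env, NULL, 0, &txn);            /* open write transaction */
        mdb_cursor_open(txn, dbi, &cur);

        /* seek to end of store, read the last key, add one */
        if (mdb_cursor_get(cur, &key, &val, MDB_LAST) == 0)
            next_height = *(uint64_t *)key.mv_data + 1;

        key.mv_size = sizeof(next_height);
        key.mv_data = &next_height;
        val.mv_size = sizeof(block);
        val.mv_data = block;

        /* MDB_APPEND is the "insert at end" path: the caller promises the
         * key sorts after everything already stored, so lmdb skips the
         * b-tree search and packs the new pages sequentially */
        mdb_put(txn, dbi, &key, &val, MDB_APPEND);

        mdb_cursor_close(cur);
        mdb_txn_commit(txn);                          /* close write transaction */
    }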
> So anyway, I'm definitely not married to RocksDb, but I don't think it
> matters enormously either. My biggest beef with it at this point is that
> it's a pain to build and has probably 10x the number of features we need.
> But swapping it out is just 200 LOC [1]. So maybe it's worth doing it just for this reason.
yehyeh. howard chu would almost certainly be interested to help, and
look things over.
> Now I'm going to link to this email on the 10 other places where I've been asked about this :-)
:)
l.