mimblewimble team mailing list archive

Re: introduction

 

I'm not sure why, but RocksDb seems really unpopular and lmdb very popular these days. Honestly, I didn't put that much thought into RocksDb originally. When I started on grin, I looked at the code of other Rust blockchain implementations. Parity was the more advanced one (on Ethereum) and they were using RocksDb, so I figured it would work out okay and the bindings would at least be decent. One often overlooked aspect of a database is the quality of the bindings in your programming language, because poorly written bindings can make all the database guarantees go away. And I was a lot more worried about the cryptography and the size of range proofs back then.

I know the opinions of the lmdb author and others regarding atomicity in storage and frankly, I think they're a little too storage-focused (I've known some Oracle DBAs with similar positions). In my experience, from an application standpoint, putting too much trust in storage guarantees is a bad idea. Everything fails eventually, and when it does, storage people are pretty quick to put the blame on disks (gotta do RAID 60), networks, language bindings, or you. Btw I'm guilty as well, I have implemented some simple storage engines in the past.

Truth is, it's actually rather easy to write a resilient blockchain node on a not-so-resilient storage (note: I'm talking about a node here, not wallets). The data is immutable and can be replayed at will. You messed up on the last block? Fine, restart on the one before that and just make sure it's all idempotent. If you're dealing with balances it's a little more complicated, but a node doesn't deal with them. And with careful design, you can make a lot of things idempotent. It's also practically impossible for grin to rely on atomic storage because we have separate state (Merkle Mountain Ranges) that is specifically designed to be easy to store in a flat file, while being very unwieldy and slow to store in a k/v db. MMRs are append-only for the most part, so dealing with failure is also very easy (note: does not preclude bugs, but those get fixed). And when you squint right, the whole blockchain storage is append-only. From a storage standpoint, it's hard to find a more fault-tolerant use case.
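To make the "restart on the one before and keep it idempotent" idea concrete, here is a minimal sketch in Rust. It is not grin's code: `Block`, `AppendOnlyStore`, `apply`, `rewind_to` and `replay` are hypothetical names, and the store is an in-memory vector standing in for an append-only flat file.

```rust
/// Hypothetical, simplified block: a height (starting at 0) and an opaque payload.
#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    pub height: u64,
    pub payload: Vec<u8>,
}

/// In-memory stand-in for an append-only store such as a flat file.
/// The block at height `h` lives at index `h`.
#[derive(Default)]
pub struct AppendOnlyStore {
    entries: Vec<Block>,
}

impl AppendOnlyStore {
    /// Height of the last stored block, if any.
    pub fn tip_height(&self) -> Option<u64> {
        self.entries.last().map(|b| b.height)
    }

    /// Idempotent apply: storing a block we already hold is a no-op.
    /// Returns Ok(true) if the block was appended, Ok(false) if it was already there.
    pub fn apply(&mut self, block: &Block) -> Result<bool, String> {
        let next = self.entries.len() as u64;
        if block.height < next {
            // Already stored: accept only if it is the very same block.
            return if self.entries[block.height as usize] == *block {
                Ok(false)
            } else {
                Err(format!("conflicting block at height {}", block.height))
            };
        }
        if block.height > next {
            return Err(format!("gap: expected height {}, got {}", next, block.height));
        }
        self.entries.push(block.clone());
        Ok(true)
    }

    /// Recovery step: drop everything above `height` and restart from there.
    pub fn rewind_to(&mut self, height: u64) {
        self.entries.truncate((height as usize).saturating_add(1));
    }
}

/// Replay a run of blocks; safe to call again after a crash or a rewind.
/// Returns how many blocks were actually appended.
pub fn replay(store: &mut AppendOnlyStore, blocks: &[Block]) -> Result<usize, String> {
    let mut appended = 0;
    for block in blocks {
        if store.apply(block)? {
            appended += 1;
        }
    }
    Ok(appended)
}
```

Replaying the same blocks twice appends nothing the second time, and after `rewind_to(n)` a replay only re-appends what sits above `n`. That is all the recovery logic this toy model needs, with no atomic commit from the storage layer.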

So anyway, I'm definitely not married to RocksDb, but I don't think it matters enormously either. My biggest beef with it at this point is that it's a pain to build and has probably 10x the number of features we need. But swapping it out is just 200 LOC [1]. So maybe it's worth doing it just for this reason.
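As a rough illustration of why the swap can stay small: if the rest of the node only talks to a narrow key/value interface, replacing the backend means reimplementing that one interface. The sketch below is an assumption, not the actual API in grin's store crate; `KvStore`, `MemStore`, `header_key`, `save_header` and `load_header` are hypothetical names, and the in-memory backend stands in for RocksDb, lmdb or anything else.

```rust
use std::collections::BTreeMap;

/// Hypothetical narrow interface the rest of the node would depend on.
pub trait KvStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn delete(&mut self, key: &[u8]);
}

/// In-memory backend; a RocksDb or lmdb backend would implement the same trait.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvStore for MemStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }
}

/// Build a prefixed key for a header at a given height (illustrative key layout).
pub fn header_key(height: u64) -> Vec<u8> {
    let mut key = b"h:".to_vec();
    key.extend_from_slice(&height.to_be_bytes());
    key
}

/// Application code is generic over the backend, so it does not change on a swap.
pub fn save_header<S: KvStore>(store: &mut S, height: u64, header: &[u8]) {
    store.put(&header_key(height), header);
}

pub fn load_header<S: KvStore>(store: &S, height: u64) -> Option<Vec<u8>> {
    store.get(&header_key(height))
}
```

With this shape, the cost of changing databases is roughly the size of one `impl KvStore for ...` block plus its error handling.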

Now I'm going to link to this email in the 10 other places where I've been asked about this :-)

- Igno

[1] https://github.com/mimblewimble/grin/blob/master/store/src/lib.rs

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On 8 March 2018 10:44 PM, Luke Kenneth Casson Leighton <lkcl@xxxxxxxx> wrote:

> On Thu, Mar 8, 2018 at 8:03 PM, Ignotus Peverell
> <igno.peverell@xxxxxxxxxxxxxx> wrote:
>
> > > > There is a denial-of-service option when a user downloads the chain,
> > > > the peer can give gigabytes of data and list the wrong unspent outputs.
> > > > The user will see that the result do not add up to 0, but cannot tell where
> > > > the problem is.
> >
> > > which to be honest I do not quite understand. The user normally downloads
> > > the chain by requesting blocks from peers, starting with just the headers
> > > which can be checked for proof-of-work.
> >
> > The paper here refers to the MimbleWimble-style fast sync (IBD),
>
> hiya igno,
>
> lots of techie TLAs here that clearly tell me you're on the case and
> know what you're doing. it'll take me a while to catch up / get to
> the point where i could usefully contribute, i must apologise.
>
> in the meantime (switching tracks), one way i can definitely
> contribute to the underlying reliability is to ask why rocksdb has
> been chosen?
>
> https://www.reddit.com/r/Monero/comments/4rdnrg/lmdb_vs_rocksdb/
>
> https://github.com/AltSysrq/lmdb-zero
>
> rocksdb is based on leveldb, which was designed to hammer both the
> CPU and the storage, on the assumption by google engineers that
> everyone will be using leveldb in google data centres, with google's
> money, and with google's resources, i.e. CPU is cheap and there will
> be nothing else going on. they also didn't do their homework in many
> other ways, resulting in an unstable pile of poo. and rocksdb is
> based on that.
>
> many people carrying out benchmark tests forget to switch off the
> compression, or they forget to compress the key and/or the value being
> stored when comparing against lmdb, or bdb, and so on.
>
> so. why was rocksdb chosen?
>
> l.



