mimblewimble team mailing list archive
In this discussion, I'm only interested in the size of a block when sent across the wire during propagation. In this context it's desirable to have small block sizes (in bytes) so that blocks can traverse the network quickly without causing too much additional traffic. As most nodes will generally receive the data included in a transaction twice (once as the transaction is propagated, and again in a block), some obvious optimizations are possible.
As a reminder, a Grin block is composed of the following:
- A header, including the proof of work, some Merkle-ish roots, a reference to the previous block hash, etc. The full header is always required.
- A set of inputs as a direct link to previous outputs.
- A set of outputs, containing a commitment and a range proof.
- Transaction kernels, containing a signature, a public key and the explicit fee.
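To make the pieces concrete, here's a minimal Rust sketch of that block structure. All type and field names (Commitment, TxKernel, etc.) and sizes are illustrative assumptions, not Grin's actual definitions:

```rust
/// A Pedersen commitment (33 bytes in secp256k1 compressed form, assumed here).
type Commitment = [u8; 33];

/// Header: proof of work, Merkle-ish roots, previous block hash, etc.
/// Always sent in full.
struct BlockHeader {
    previous: [u8; 32], // hash of the previous block
    // ... proof of work, roots, timestamp, etc.
}

/// An input is a direct link to a previously created output.
struct Input {
    commit: Commitment,
}

/// An output carries a commitment plus its large (~5 KB) range proof.
struct Output {
    commit: Commitment,
    range_proof: Vec<u8>,
}

/// A kernel carries a public key, a signature and the explicit fee (~100 bytes).
struct TxKernel {
    excess: [u8; 33],     // public key
    signature: [u8; 64],
    fee: u64,
}

struct Block {
    header: BlockHeader,
    inputs: Vec<Input>,
    outputs: Vec<Output>,
    kernels: Vec<TxKernel>,
}
```

The range proofs inside `outputs` dominate the serialized size, which motivates the optimizations below.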
The largest piece of data by a good margin right now is the range proof (~5 KB each). Therefore we can introduce a first optimization:
- Block gets propagated sans range proof.
- Upon reception, a node checks whether it has all the range proofs for all outputs included in the block.
- If any are missing, the node requests them from the peer that sent the block.
- Validation can proceed, skipping the range proofs that were already known (assuming they were already validated by the transaction pool).
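The receive-side check could look roughly like the following Rust sketch; `ProofPool`, `OutputId` and `missing_proofs` are hypothetical names, and outputs are identified by their commitment for simplicity:

```rust
use std::collections::HashMap;

/// An output is identified by its commitment (assumption for this sketch).
type OutputId = [u8; 33];
type RangeProof = Vec<u8>;

/// Range proofs already received (and validated) via transaction relay.
struct ProofPool {
    proofs: HashMap<OutputId, RangeProof>,
}

impl ProofPool {
    /// Given the outputs of a freshly received proof-less block, return the
    /// ids whose range proofs must still be requested from the sending peer.
    fn missing_proofs(&self, block_outputs: &[OutputId]) -> Vec<OutputId> {
        block_outputs
            .iter()
            .filter(|id| !self.proofs.contains_key(*id))
            .cloned()
            .collect()
    }
}
```

Validation would then run only the range proofs returned by `missing_proofs`, trusting the pool's earlier validation for the rest.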
The second largest piece of data is the set of transaction kernels (~100 bytes each). We can't fully avoid sending those, as they aren't "anchored" to anything the way a range proof is anchored to an output. However, we could send a shortened version of their hash. In this scheme, when sent across the wire, a block would only contain the first 16 bytes (or maybe even 12) of the hash of each whole kernel. Upon reception, an exchange similar to the one for range proofs would occur. Note that this requires that the transaction pool maintain a mapping from those truncated hashes to the actual kernels.
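A rough Rust sketch of that mapping, with `ShortKernelId`, `KernelPool` and `resolve` as hypothetical names and kernels kept as raw bytes for simplicity:

```rust
use std::collections::HashMap;

/// First 16 bytes of the kernel's hash, as proposed above.
type ShortKernelId = [u8; 16];

/// Truncate a full 32-byte kernel hash to its short wire id.
fn short_id(kernel_hash: &[u8; 32]) -> ShortKernelId {
    let mut id = [0u8; 16];
    id.copy_from_slice(&kernel_hash[..16]);
    id
}

/// Pool-side mapping from truncated hashes to the full kernels.
struct KernelPool {
    by_short_id: HashMap<ShortKernelId, Vec<u8>>,
}

impl KernelPool {
    /// Split a block's short ids into kernels we already have and ids
    /// that must be requested from the sending peer.
    fn resolve(&self, short_ids: &[ShortKernelId]) -> (Vec<Vec<u8>>, Vec<ShortKernelId>) {
        let mut known = Vec::new();
        let mut missing = Vec::new();
        for id in short_ids {
            match self.by_short_id.get(id) {
                Some(kernel) => known.push(kernel.clone()),
                None => missing.push(*id),
            }
        }
        (known, missing)
    }
}
```

At 16 (or 12) bytes per kernel instead of ~100, this cuts the kernel portion of a propagated block by roughly 85-90%, at the cost of the extra round trip when a short id is unknown.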
Beyond this, optimizations become non-trivial. We do *not* want to introduce any transaction-level hash, as transactions are deliberately disaggregated in blocks by design. We could introduce something like invertible Bloom lookup tables for inputs or outputs, but those items are still rather small (32 to 40 bytes each), so the trade-off of space savings against added complexity doesn't seem worth it at this point.
Both optimizations add a couple of protocol-level messages (a request/response pair for each data type) plus some additional interfaces and logic for the existence checks on the transaction pool. That creates a little additional complexity, but it stays contained and fairly manageable. The first optimization is by far the more interesting; I think of the second more as a "nice to have".
Does it all sound reasonable? Am I missing something?