openstack team mailing list archive
Message #08528
Re: Enabling data deduplication on Swift
Joe Gordon asked:
________________________________
> Can SHA-1 collisions be generated? If so can you point me to the article?
Check Wikipedia on cryptographic hashing, and especially "preimage attack".
To summarize, SHA-256 is effectively immune to a preimage attack. Even MD5, with its 128-bit digest, is effectively immune to
coincidental collisions until you have something on the order of 2**64 fingerprinted items. Amazon S3 has a
mere 2**40.
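The scale argument can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2**(b+1)) for n items and a b-bit digest. The sketch below is only illustrative: the function name is made up for this example, and it assumes the hash output behaves like a uniformly random value, which says nothing about deliberately constructed collisions.

```python
import math

def collision_probability(items, bits):
    """Approximate chance that at least two of `items` uniformly random
    `bits`-bit fingerprints coincide (birthday approximation)."""
    pairs = items * (items - 1) // 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2**bits)

if __name__ == "__main__":
    # 2**40 items is the S3-scale figure mentioned above.
    for name, bits in (("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)):
        p = collision_probability(2**40, bits)
        print(f"{name:8s} {bits:3d}-bit digest, 2**40 items: p ~ {p:.3e}")
```

For a b-bit digest the probability only becomes appreciable once the item count approaches 2**(b/2).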
> Also why compare hashes in the first place? Linux 'Kernel Samepage Merging', which does page deduplication for KVM, does a full
> compare to be safe [1]. Even if collisions can't be generated, what are the odds of a collision (for SHA-1 and SHA-256) happening by
> chance when using Swift at scale?
The point of distributed deduplication is to avoid transferring the data in the first place. If you did a safety check using a full compare,
you would have to do the transfer first. A kernel looking for identical pages does not have that issue.
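As a rough sketch of that idea, the sender can compute the fingerprint locally and ship the payload only when the store has not seen it. Everything here is hypothetical: `FingerprintStore`, `has`, `put`, and `upload` are names invented for this illustration and do not correspond to any Swift API.

```python
import hashlib

class FingerprintStore:
    """Hypothetical in-memory stand-in for a remote object store."""

    def __init__(self):
        self._blobs = {}
        self.bytes_transferred = 0  # counts payload bytes actually sent

    def has(self, fingerprint):
        # In a real system this would be a small metadata request.
        return fingerprint in self._blobs

    def put(self, fingerprint, data):
        # Only this call moves the full payload across the network.
        self.bytes_transferred += len(data)
        self._blobs[fingerprint] = data

def upload(store, data):
    """Send `data` only if its SHA-256 fingerprint is not already stored."""
    fingerprint = hashlib.sha256(data).hexdigest()
    if not store.has(fingerprint):
        store.put(fingerprint, data)
    return fingerprint

if __name__ == "__main__":
    store = FingerprintStore()
    payload = b"x" * 1000
    upload(store, payload)
    upload(store, payload)  # duplicate: fingerprint matches, nothing is sent
    print(store.bytes_transferred)
```

A full byte-for-byte compare on the receiving side would require the second payload to be sent anyway, which is exactly the cost the fingerprint check is meant to avoid.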
As for the chance of an accidental collision, you have to stop and think about how large a number 2**128 is (that's the
threshold for risking a birthday collision with a 256-bit hash). Having multiple undetected network transmission errors back
to back is more likely.
Also I would recommend that SHA-256 be the *minimum* algorithm. Truly paranoid customers could select SHA-512.
ZFS has had SHA-256 for the purpose of detecting bit rot as the default for some time. It was a bit ambitious when ZFS
was designed, but with today's processors the computational overhead is negligible.
References