openstack team mailing list archive

Re: Enabling data deduplication on Swift

 

Paulo, Caitlin,


Can SHA-1 collisions be generated?  If so, can you point me to the article?

Also, why compare hashes in the first place?  Linux 'Kernel Samepage
Merging', which does page deduplication for KVM, does a full compare to be
safe [1].  Even if collisions can't be generated, what are the odds of a
collision (for SHA-1 and SHA-256) happening by chance when using Swift at
scale?
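
For the by-chance half of that question, the birthday bound gives a rough
estimate. The sketch below assumes hash outputs behave like uniformly random
values, and the object count is a hypothetical figure chosen only for
illustration. It says nothing about deliberately constructed collisions.

```python
import math


def collision_probability(num_objects, hash_bits):
    """Approximate chance that any two of num_objects distinct inputs
    share a digest, using the birthday bound:
        p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1))
    Assumes digests are uniformly distributed (an idealization)."""
    exponent = -(num_objects * (num_objects - 1)) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


# Hypothetical cluster size: one trillion stored objects.
HYPOTHETICAL_OBJECT_COUNT = 10 ** 12

p_sha1 = collision_probability(HYPOTHETICAL_OBJECT_COUNT, 160)
p_sha256 = collision_probability(HYPOTHETICAL_OBJECT_COUNT, 256)

print("SHA-1   (160 bits): %.3e" % p_sha1)
print("SHA-256 (256 bits): %.3e" % p_sha256)
```

Under these assumptions both probabilities are extremely small, and the
SHA-256 figure is smaller than the SHA-1 figure by many orders of magnitude.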


best,
Joe Gordon


[1] http://www.linux-kvm.com/sites/default/files/KvmForum2008_KSM.pdf
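
A minimal sketch of the "use the hash to find candidates, then compare the
bytes" idea raised above, applied to an in-memory object store. This is not
Swift code: the class and method names are hypothetical, and SHA-256 is used
only as the candidate lookup key.

```python
import hashlib


class VerifyingDedupStore:
    """Toy in-memory store: the digest narrows the search, and a full
    byte comparison decides whether two objects are really identical."""

    def __init__(self):
        # digest -> list of distinct payloads that share that digest
        self._by_digest = {}

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        bucket = self._by_digest.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == data:  # full compare, not just a digest match
                return (digest, index)
        # Either a new digest or a genuine collision: store separately.
        bucket.append(data)
        return (digest, len(bucket) - 1)

    def get(self, key):
        digest, index = key
        return self._by_digest[digest][index]


store = VerifyingDedupStore()
first = store.put(b"contents of some ISO image")
second = store.put(b"contents of some ISO image")
other = store.put(b"different contents")
print(first == second, first == other)
```

With this design a colliding payload would be stored as a second entry under
the same digest rather than silently replacing or aliasing the first one, at
the cost of reading the existing object on every digest match.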


On Fri, Mar 9, 2012 at 4:44 PM, Caitlin Bestler <Caitlin.Bestler@xxxxxxxxxxx> wrote:

> Paulo,
>
> I believe you’ll find that we’re thinking along the same lines. Please
> review my proposal at http://etherpad.openstack.org/P9MMYSWE6U
>
> One quick observation is that SHA-1 is totally inadequate for
> fingerprinting objects in a public object store. An attacker could easily
> predict the fingerprint of content likely to be posted, generate alternate
> content that had the same SHA-1 fingerprint and pre-empt the signature.
> For example: an ISO of an open source OS distribution. If I get my false
> content with the same fingerprint into the repository first then everyone
> who downloads that ISO will get my altered copy.
>
> SHA-256 is really needed to make this type of attack infeasible.
>
> I also think that distributed deduplication works very well with object
> versioning. Your comments on the proposal cited above would be great to
> hear.
>
> *From:* openstack-bounces+caitlin.bestler=nexenta.com@xxxxxxxxxxxxxxxxxxx [mailto:openstack-bounces+caitlin.bestler=nexenta.com@xxxxxxxxxxxxxxxxxxx] *On Behalf Of* Paulo Ricardo Motta Gomes
> *Sent:* Thursday, March 08, 2012 1:19 PM
> *To:* openstack@xxxxxxxxxxxxxxxxxxx
> *Subject:* [Openstack] Enabling data deduplication on Swift
>
> Hello everyone,
>
> I'm a student of the European Master in Distributed Computing (EMDC)
> currently working on my master's thesis on distributed content-addressable
> storage/deduplication.
>
> I'm happy to announce I will be contributing the outcome of my thesis work
> to OpenStack by enabling both object-level and block-level deduplication
> functionality on Swift
> (https://answers.launchpad.net/swift/+question/156862).
>
> I have written a detailed blog post where I describe the initial
> architecture of my solution:
> http://paulormg.com/2012/03/05/enabling-deduplication-in-a-distributed-object-storage/
>
> Feedback from the OpenStack/Swift community would be very much appreciated.
>
> Cheers,
>
> Paulo
>
> --
> European Master in Distributed Computing - www.kth.se/emdc
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
>
