openstack team mailing list archive

Swift block-level deduplication

To: openstack@xxxxxxxxxxxxxxxxxxx
From: Eoghan Glynn <eglynn@xxxxxxxxxx>
Date: Thu, 12 Apr 2012 12:09:25 -0400 (EDT)
In-reply-to: <00b89ce8-25ae-4170-a874-ef778e93fb75@zmail11.collab.prod.int.phx2.redhat.com>


Folks,

From previous posts on the ML, it seems there are a couple of
efforts in train to add distributed content deduping to Swift.

My question is whether either or both of these approaches involve
active client participation in enabling duplicate chunk
detection.

One could see a spectrum ranging between:

1. Client actively breaks the object into chunks, selects the
   hashing algorithm, calculates fingerprint and then only uploads
   if Swift reports that fingerprint is unknown.

2. Client determines which objects are worth deduping, maybe has
   some influence on chunk size and/or hashing, but fingerprint
   calculation is all handled internally by Swift.

3. Client is entirely uninvolved, deduplication is handled
   transparently in the object storage layer and enabled either
   globally or per-container.
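
To make option 1 concrete, here is a minimal sketch of what the
client-side flow might look like. It assumes fixed-size chunks and
SHA-256 fingerprints, and it uses a hypothetical in-memory
FakeChunkStore as a stand-in for the server side. None of these
names correspond to an existing Swift API; they are purely for
illustration.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical fixed chunk size in bytes, tiny for illustration


class FakeChunkStore:
    """In-memory stand-in for a dedupe-aware store (not a real Swift API)."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes
        self.uploads = 0   # number of chunk bodies actually transferred

    def has(self, fingerprint):
        # The "is this fingerprint known?" query from option 1.
        return fingerprint in self.chunks

    def put(self, fingerprint, data):
        self.chunks[fingerprint] = data
        self.uploads += 1


def upload_deduped(store, payload, chunk_size=CHUNK_SIZE):
    """Client chunks, fingerprints, and uploads only unknown chunks.

    Returns a manifest: the ordered fingerprints needed to rebuild the object.
    """
    manifest = []
    for offset in range(0, len(payload), chunk_size):
        chunk = payload[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if not store.has(fingerprint):
            store.put(fingerprint, chunk)
        manifest.append(fingerprint)
    return manifest


def download(store, manifest):
    """Rebuild the object by concatenating chunks in manifest order."""
    return b"".join(store.chunks[fp] for fp in manifest)
```

In this sketch, uploading b"AAAABBBBAAAA" transfers only two chunk
bodies even though the manifest has three entries, since the first
and third chunks share a fingerprint. Options 2 and 3 would move the
chunking and hashing loop behind the storage API, so the client would
see an ordinary PUT.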

If anyone involved has insight into the above, I'd be interested
in hearing your thoughts (the context is leveraging dedupe in Glance).

Cheers,
Eoghan

Follow ups

Re: Swift block-level deduplication
From: Caitlin Bestler, 2012-04-12