openstack team mailing list archive

Re: Enabling data deduplication on Swift

To: andi abes <andi.abes@xxxxxxxxx>
From: Caitlin Bestler <Caitlin.Bestler@xxxxxxxxxxx>
Date: Mon, 12 Mar 2012 16:46:22 +0000
Accept-language: en-US
Cc: "openstack@xxxxxxxxxxxxxxxxxxx" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CA+KYVfgsS8SdiZ-RqVsrb9SmOhxdk_GSTkMB0gMgJ3KXnPmRcg@mail.gmail.com>
Thread-index: AQHM/XHIK82GyRBoPUefj650yAW135Zij9MwgAGoRICAAAUhgIAAMaeAgABg5YCAALS2+IAAm1wAgADCY1A=
Thread-topic: [Openstack] Enabling data deduplication on Swift


Andi abes asked: 

> Doesn't that depend on the ratios of read vs write?
> In a read tilted environment (e.g. CDN's, image stores etc), being able to dedup at the block level in the
> relatively rare write case seems a boon. The simplification this could allow - performing localized dedup
> (i.e. each object server deduping just its local storage) seems worth while.

For the most part, deduplication has no impact on read performance. The same chunks will be fetched
whether or not they were deduplicated.
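A toy illustration of why the read path is unchanged: a minimal sketch, assuming fixed-size chunks fingerprinted with SHA-256 and a per-object manifest listing the fingerprints in order. `ChunkStore`, `put_object`, `get_object` and the tiny `CHUNK_SIZE` are hypothetical names and values for illustration, not Swift APIs.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical tiny chunk size so the example is easy to follow


class ChunkStore:
    """Toy content-addressed store: chunks are keyed by their SHA-256 fingerprint."""

    def __init__(self):
        self.chunks = {}     # fingerprint -> chunk bytes (each unique chunk stored once)
        self.manifests = {}  # object name -> ordered list of fingerprints

    def put_object(self, name, data):
        fingerprints = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            # Deduplication: a chunk that is already present is not stored again.
            self.chunks.setdefault(fp, chunk)
            fingerprints.append(fp)
        self.manifests[name] = fingerprints

    def get_object(self, name):
        # The read path fetches one chunk per manifest entry, exactly as it
        # would if every object kept private copies of its chunks.
        return b"".join(self.chunks[fp] for fp in self.manifests[name])


store = ChunkStore()
store.put_object("a", b"AAAABBBBCCCC")
store.put_object("b", b"AAAABBBBDDDD")
print(len(store.chunks))      # 4 unique chunks stored instead of 6
print(store.get_object("b"))  # b'AAAABBBBDDDD'
```

Deduplication only changes how many copies of `AAAA` and `BBBB` exist on disk; reading object `b` still fetches three chunks either way.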

If you have a central metadata system (like GFS or HDFS), then deduplication can interfere with optimizing
the location of chunks for streaming reads. With hash-driven algorithms, however, the choice is fixed: either
you place the entire object on one server, which precludes parallelizing the fetch, or you distribute the
object's chunks across multiple servers, which impairs the efficiency of a slow streaming read.
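The two hash-driven options can be contrasted in a few lines. This is a minimal sketch, assuming a hypothetical four-server cluster and plain hash-modulo placement (a simplification swapped in for illustration; Swift itself uses a consistent-hashing ring). `SERVERS`, `place_whole_object` and `place_by_chunk` are hypothetical names.

```python
import hashlib

SERVERS = ["server-0", "server-1", "server-2", "server-3"]  # hypothetical cluster


def _server_for(key: str) -> str:
    # Simple hash-mod placement; a real system would use consistent hashing.
    digest = hashlib.sha256(key.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def place_whole_object(name, fingerprints):
    """Option 1: hash the object name, so every chunk lands on the same server."""
    server = _server_for(name)
    return [server for _ in fingerprints]


def place_by_chunk(name, fingerprints):
    """Option 2: hash each chunk fingerprint, so chunks spread across servers."""
    return [_server_for(fp) for fp in fingerprints]


fps = [hashlib.sha256(bytes([i])).hexdigest() for i in range(8)]
print(set(place_whole_object("photos/cat.jpg", fps)))  # a single server
print(place_by_chunk("photos/cat.jpg", fps))           # typically several servers
```

In the per-chunk variant the object name plays no part in placement, so identical chunks from different objects always map to the same server, which is what lets that server find the duplicate; the price is that a sequential read of one object has to visit several servers.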

Because distributed deduplication relies on fingerprinted chunks, it has the advantage of allowing unrestricted
chunk caching: a chunk named by its fingerprint never changes, so a cached copy can never go stale. That caching
is the real solution to optimizing reads of extremely popular data.
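A fingerprint-keyed cache needs no invalidation protocol, which is what makes it "unrestricted". The following is a minimal sketch, assuming an LRU policy and SHA-256 fingerprints; `ChunkCache` and the `fetch` callable standing in for a storage-server request are hypothetical.

```python
import hashlib
from collections import OrderedDict


class ChunkCache:
    """LRU cache keyed by chunk fingerprint.

    A fingerprint names exactly one byte sequence, so a cached entry can never
    become stale and any node may cache any chunk without coordination.
    """

    def __init__(self, fetch, capacity=128):
        self.fetch = fetch          # hypothetical callable: fingerprint -> bytes
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, fp):
        if fp in self.entries:
            self.entries.move_to_end(fp)  # mark as most recently used
            return self.entries[fp]
        self.misses += 1
        chunk = self.fetch(fp)
        # Integrity check: the content must hash to its own name.
        if hashlib.sha256(chunk).hexdigest() != fp:
            raise ValueError("chunk does not match its fingerprint")
        self.entries[fp] = chunk
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used chunk
        return chunk


fp = hashlib.sha256(b"popular").hexdigest()
backend = {fp: b"popular"}
cache = ChunkCache(backend.__getitem__, capacity=2)
for _ in range(1000):
    cache.get(fp)
print(cache.misses)  # 1
```

A thousand reads of a popular chunk reach the storage server once; since the key is the content hash, there is nothing to invalidate when other objects are written.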

References

Enabling data deduplication on Swift
From: Paulo Ricardo Motta Gomes, 2012-03-08
Re: Enabling data deduplication on Swift
From: Caitlin Bestler, 2012-03-09
Re: Enabling data deduplication on Swift
From: Joe Gordon, 2012-03-10
Re: Enabling data deduplication on Swift
From: Maru Newby, 2012-03-10
Re: Enabling data deduplication on Swift
From: andi abes, 2012-03-10
Re: Enabling data deduplication on Swift
From: Maru Newby, 2012-03-11
Re: Enabling data deduplication on Swift
From: Caitlin Bestler, 2012-03-11
Re: Enabling data deduplication on Swift
From: andi abes, 2012-03-12