[SWIFT] Change the partition power to recreate the RING


In effect, this would be a complete replacement of your rings, which essentially makes it a whole new cluster. All of the existing data would need to be rehashed into the new ring before it is available.

There is no process that rehashes the data to ensure that it is still in the correct partition. Replication only ensures that the partitions are on the right drives.
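To make that concrete, here is a rough sketch of how an object's partition falls out of its hash and the partition power. It is modeled on the ring's hashing scheme but is not Swift's own code: the prefix/suffix values, the account/container/object names, and the two partition powers (18 and 14) are all placeholders I made up for illustration.

```python
import hashlib
import struct

# Assumption: placeholder salt values. A real cluster reads these from the
# swift_hash_path_prefix / swift_hash_path_suffix settings in swift.conf.
HASH_PATH_PREFIX = b""
HASH_PATH_SUFFIX = b"changeme"


def object_hash(account, container, obj):
    """MD5 digest of the object path, salted with the cluster prefix/suffix."""
    path = "/" + "/".join([account, container, obj])
    salted = HASH_PATH_PREFIX + path.encode("utf-8") + HASH_PATH_SUFFIX
    return hashlib.md5(salted).digest()


def partition_for(digest, part_power):
    """Take the top `part_power` bits of the first 4 digest bytes."""
    part_shift = 32 - part_power
    return struct.unpack_from(">I", digest)[0] >> part_shift


# Hypothetical object and hypothetical partition powers.
digest = object_hash("AUTH_test", "photos", "cat.jpg")
old_part = partition_for(digest, 18)  # partition under the current ring
new_part = partition_for(digest, 14)  # partition under a lower-power ring
print(old_part, new_part)
```

The same object lands in a different partition number under each power, and the data on disk is stored under the old partition number, so nothing in the cluster will find it under the new ring until it has been moved.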

To change the number of partitions, you will need to GET all of the data from the old ring and PUT it to the new ring. A more complicated (but perhaps more efficient) solution may include something like walking each drive, rehashing and moving the data to the right partition, and then letting replication settle it down.
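The GET-then-PUT approach could look roughly like the sketch below. It assumes `src` and `dst` are connection objects for the old and new clusters that expose `get_container`, `get_object`, `put_container`, and `put_object` in the style of python-swiftclient's `Connection` class. The function name `copy_container` is mine, and this is only an outline, not a tested migration tool.

```python
def copy_container(src, dst, container):
    """Copy every object in `container` from the old cluster to the new one.

    Assumption: `src` and `dst` behave like python-swiftclient Connection
    objects (get_container/get_object/put_container/put_object).
    Returns the number of objects copied.
    """
    # Make sure the target container exists on the new cluster.
    dst.put_container(container)

    # full_listing=True asks the client to page through the whole listing.
    _headers, listing = src.get_container(container, full_listing=True)

    copied = 0
    for entry in listing:
        name = entry["name"]
        # GET from the old ring. This reads the whole object into memory.
        obj_headers, body = src.get_object(container, name)
        # PUT to the new ring, carrying over the content type only.
        dst.put_object(
            container,
            name,
            contents=body,
            content_type=obj_headers.get("content-type"),
        )
        copied += 1
    return copied
```

As written this holds each object fully in memory and carries over only the content type, so large objects, custom metadata, and manifest objects would need extra handling before you could rely on it.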

Either way, 100% of your existing data will need to at least be rehashed (and probably moved). Your CPU (hashing), disks (read+write), RAM (directory walking), and network (replication) may all be limiting factors in how long it will take to do this. Your per-disk free space may also determine what method you choose.

I would not expect any data loss while doing this, but you will probably have availability issues, depending on the data access patterns.

I'd like to eventually see something in Swift that allows for changing the partition power in existing rings, but that will be hard/tricky/non-trivial.

Good luck.


On Jan 11, 2013, at 1:17 PM, Alejandro Comisario wrote:

> Hi guys.
> We created a Swift cluster several months ago. The thing is that right now we can't add hardware, and we configured lots of partitions with the final picture of the cluster in mind.
> Today each data node has 2500+ partitions per device, and even after tuning the background processes (replicator, auditor & updater) we really want to try to lower the partition power.
> Since it's not possible to do that without recreating the ring, we have the luxury of being able to recreate it with a much lower partition power, and rebalance / deploy the new ring.
> The question is: with a working cluster holding *existing data*, is it possible to do this and wait for the data to move around *without data loss*?
> If so, is it reasonable to expect an improvement in the overall cluster performance?
> We have no problem with having a non-working cluster (while moving the data), even for an entire weekend.
> Cheers.

