
openstack team mailing list archive

Re: [SWIFT] Change the partition power to recreate the RING

 

Yes, I think it would be a great topic for the summit.

--John


On Jan 14, 2013, at 7:54 AM, Tong Li <litong01@xxxxxxxxxx> wrote:

> John and swifters,
> I see this as a big problem, and I think the scenario Alejandro describes is a very common one. I wonder whether it is possible to have two rings: one with the new, extended partition power and one with the existing power. When significant changes are made to the hardware or partitioning, a command would start the new ring. New data written to Swift would use the new ring, while existing data on the old ring would stay available and be moved to the new ring slowly and automatically, without impacting normal use. Once the old ring shrinks to zero, it can be removed. The idea is to have two virtual Swift systems working side by side, so the migration from the existing ring to the new ring happens without interrupting service. Can we put this topic/feature on the agenda for the next summit and consider it a high-priority feature for the coming releases?
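The two-ring idea can be illustrated with a toy model. This is a minimal sketch under loud assumptions: plain dicts stand in for the two clusters, and `DualRingStore` and its methods are hypothetical names, not a Swift API. Writes go to the new ring, reads try the new ring first and fall back to the old one, and a background step drains the old ring in small batches until it can be retired.

```python
from itertools import islice


class DualRingStore:
    """Hypothetical toy model of running an old and a new ring side by side."""

    def __init__(self, old: dict):
        self.old = old  # objects still placed by the existing ring
        self.new = {}   # objects placed by the new ring

    def put(self, name, data):
        # New writes always land on the new ring; drop any stale old copy.
        self.new[name] = data
        self.old.pop(name, None)

    def get(self, name):
        # Prefer the new ring, fall back to the old one (KeyError if absent).
        if name in self.new:
            return self.new[name]
        return self.old[name]

    def migrate_step(self, batch: int = 100) -> int:
        """Move up to `batch` objects from the old ring; return how many moved."""
        names = list(islice(self.old, batch))
        for name in names:
            self.new[name] = self.old.pop(name)
        return len(names)

    def old_ring_retired(self) -> bool:
        # The old ring can be removed once it holds nothing.
        return not self.old


if __name__ == "__main__":
    store = DualRingStore({"a": b"1", "b": b"2", "c": b"3"})
    store.put("d", b"4")
    while store.migrate_step(batch=2):
        pass
    print(store.old_ring_retired(), store.get("a"))  # True b'1'
```

The toy ignores concurrency: in a real cluster, a PUT racing with a background copy would need object timestamps to decide which version wins.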
> 
> Thanks.
> 
> Tong Li
> Emerging Technologies & Standards
> Building 501/B205
> litong01@xxxxxxxxxx
> 
> 
> From:	John Dickinson <me@xxxxxx>
> To:	Alejandro Comisario <alejandro.comisario@xxxxxxxxxxxxxxxx>, 
> Cc:	"openstack-operators@xxxxxxxxxxxxxxxxxxx" <openstack-operators@xxxxxxxxxxxxxxxxxxx>, openstack <openstack@xxxxxxxxxxxxxxxxxxx>
> Date:	01/11/2013 04:28 PM
> Subject:	Re: [Openstack] [SWIFT] Change the partition power to recreate the RING
> Sent by:	openstack-bounces+litong01=us.ibm.com@xxxxxxxxxxxxxxxxxxx
> 
> 
> 
> In effect, this would be a complete replacement of your rings, and that is essentially a whole new cluster. All of the existing data would need to be rehashed into the new ring before it is available.
> 
> There is no process that rehashes the data to ensure that it is still in the correct partition. Replication only ensures that the partitions are on the right drives.
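To see why a rehash is unavoidable, recall how a ring maps an object to a partition: Swift takes the MD5 of the object's salted path and keeps only the top `part_power` bits of the first 4 bytes of the digest. The sketch below is a simplified stand-in for that lookup, not Swift's actual code; the prefix and suffix values are placeholders for the secrets a real cluster reads from swift.conf.

```python
import struct
from hashlib import md5

# Placeholder salts; a real cluster reads these from swift.conf.
HASH_PATH_PREFIX = b""
HASH_PATH_SUFFIX = b"changeme"


def get_part(path: str, part_power: int) -> int:
    """Map a path like '/account/container/object' to a partition number.

    Simplified sketch: MD5 the salted path, read the first 4 bytes as a
    big-endian unsigned int, and keep only the top `part_power` bits.
    """
    digest = md5(HASH_PATH_PREFIX + path.encode("utf-8") + HASH_PATH_SUFFIX).digest()
    part_shift = 32 - part_power
    return struct.unpack_from(">I", digest)[0] >> part_shift


if __name__ == "__main__":
    path = "/AUTH_test/photos/cat.jpg"  # hypothetical object path
    old_part = get_part(path, 18)
    new_part = get_part(path, 14)
    # The partition number (and hence the on-disk objects/<part>/ directory)
    # generally changes with the power; here new == old >> 4.
    print(old_part, new_part, old_part >> 4 == new_part)
```

Because only the top bits are kept, lowering the power from 18 to 14 maps every old partition `p` to `p >> 4`. Replication never revisits this mapping, which is why the data has to be rehashed explicitly.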
> 
> To change the number of partitions, you will need to GET all of the data from the old ring and PUT it to the new ring. A more complicated (but perhaps more efficient) solution might involve walking each drive, rehashing and moving the data to the right partition, and then letting replication settle it down.
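The "walk each drive" approach can start with a dry-run planning pass. This sketch assumes the usual object layout `objects/<part>/<suffix>/<hash>/`, where `<hash>` is the 32-character hex MD5 of the salted path; under that assumption the new partition can be read off the hash directory name without opening any object. `plan_moves` is a hypothetical helper: it only groups hashes by (old, new) partition and does not move data or trigger replication.

```python
import os
import tempfile
from collections import defaultdict


def plan_moves(objects_dir: str, new_power: int) -> dict:
    """Group hash directories on one drive by (old_part, new_part).

    Assumes objects/<part>/<suffix>/<hash>/ with <hash> = hex MD5 digest.
    Dry run only: returns {(old_part, new_part): [hash, ...]}.
    """
    shift = 32 - new_power
    plan = defaultdict(list)
    for part in os.listdir(objects_dir):
        part_dir = os.path.join(objects_dir, part)
        if not part.isdigit() or not os.path.isdir(part_dir):
            continue  # skip anything that is not a partition directory
        for suffix in os.listdir(part_dir):
            suffix_dir = os.path.join(part_dir, suffix)
            if not os.path.isdir(suffix_dir):
                continue  # skip per-partition files such as hash caches
            for obj_hash in os.listdir(suffix_dir):
                # First 8 hex digits = the 4 digest bytes used for placement.
                new_part = int(obj_hash[:8], 16) >> shift
                plan[(int(part), new_part)].append(obj_hash)
    return dict(plan)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        h = "f" * 32  # fake hash that sits in the last partition for power 18
        os.makedirs(os.path.join(root, "262143", h[-3:], h))
        print(plan_moves(root, 14))
```

A real tool would then rename each hash directory into its new partition directory and let replication push it to the devices the new ring assigns.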
> 
> Either way, 100% of your existing data will need to at least be rehashed (and probably moved). Your CPU (hashing), disks (read+write), RAM (directory walking), and network (replication) may all be limiting factors in how long it will take to do this. Your per-disk free space may also determine what method you choose.
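A back-of-envelope way to see which of these factors dominates is to treat each resource as a pipe with a sustained throughput for this job and find the slowest one. The throughput figures below are invented placeholders, and `estimate_hours` is a hypothetical helper. Since real migrations rarely overlap all work perfectly, the result is at best a lower bound on elapsed time.

```python
def estimate_hours(total_bytes: float, rates: dict) -> tuple:
    """Return (bottleneck, hours) for pushing total_bytes through every resource.

    `rates` maps a resource name to an assumed sustained throughput in
    bytes per second. With perfect overlap, the slowest resource sets the pace.
    """
    bottleneck = min(rates, key=rates.get)
    return bottleneck, total_bytes / rates[bottleneck] / 3600


if __name__ == "__main__":
    # Hypothetical: 100 TB of data and made-up aggregate throughputs.
    name, hours = estimate_hours(100e12, {"cpu": 2e9, "disk": 8e8, "network": 1.25e9})
    print(name, round(hours, 1))
```

With these made-up numbers the disks are the bottleneck at roughly 35 hours; different hardware or access patterns will shift both the bottleneck and the total.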
> 
> I would not expect any data loss while doing this, but you will probably have availability issues, depending on the data access patterns.
> 
> I'd like to eventually see something in swift that allows for changing the partition power in existing rings, but that will be hard/tricky/non-trivial.
> 
> Good luck.
> 
> --John
> 
> 
> On Jan 11, 2013, at 1:17 PM, Alejandro Comisario <alejandro.comisario@xxxxxxxxxxxxxxxx> wrote:
> 
> > Hi guys.
> > We created a swift cluster several months ago. The thing is that right now we can't add hardware, and we configured lots of partitions with the final picture of the cluster in mind.
> > 
> > Today each data node has 2500+ partitions per device, and even after tuning the background processes (replicator, auditor & updater), we really want to try to lower the partition power.
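The partitions-per-device figure follows from simple arithmetic: a balanced ring spreads `2**part_power * replicas` partition replicas over its devices. The sketch below computes that average and the smallest power that still meets a per-device target at a planned maximum device count. The function names, the example ring (power 18, 3 replicas, 300 devices) and the target of 100 per device are illustrative assumptions, not Alejandro's actual configuration.

```python
def partitions_per_device(part_power: int, replicas: int, devices: int) -> float:
    """Average number of partition replicas per device in a balanced ring."""
    return (2 ** part_power) * replicas / devices


def smallest_power(min_per_device: float, replicas: int, max_devices: int) -> int:
    """Smallest part_power giving each device at least `min_per_device`
    partition replicas once the cluster reaches `max_devices` devices."""
    power = 0
    while partitions_per_device(power, replicas, max_devices) < min_per_device:
        power += 1
    return power


if __name__ == "__main__":
    print(partitions_per_device(18, 3, 300))  # 2621.44
    print(smallest_power(100, 3, 300))        # 14
```

Because the power cannot be changed later without the rehash discussed above, it is normally sized for the largest cluster you expect, not the current one.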
> > 
> > Since it's not possible to do that without recreating the ring, we can afford to recreate it with a much lower partition power, then rebalance and deploy the new ring.
> > 
> > The question is: with a working cluster holding *existing data*, is it possible to do this and wait for the data to move around *without data loss*?
> > If so, is it reasonable to expect an improvement in overall cluster performance?
> > 
> > We have no problem with the cluster being unavailable while the data moves, even for an entire weekend.
> > 
> > Cheers.
> > 
> > 
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 
