openstack team mailing list archive

Re: [SWIFT] Change the partition power to recreate the RING

To: John Dickinson <me@xxxxxx>
From: Tong Li <litong01@xxxxxxxxxx>
Date: Mon, 14 Jan 2013 10:54:53 -0500
Cc: openstack-bounces+litong01=us.ibm.com@xxxxxxxxxxxxxxxxxxx, "openstack-operators@xxxxxxxxxxxxxxxxxxx" <openstack-operators@xxxxxxxxxxxxxxxxxxx>, openstack <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1758F814-1C49-46A9-9016-9A5BC5393476@not.mn>

John and swifters,
	I see this as a serious problem, and I think the scenario Alejandro
described is a very common one. I am wondering whether it would be possible
to have two rings: one with the new partition power and one with the
existing partition power. When significant changes are made to the hardware
or the partitions, a command would start a new ring. New data written to
Swift would use the new ring, while existing data would remain available on
the existing ring and move to the new ring slowly (without impacting normal
use) but automatically. Once the existing ring shrinks to size zero, it can
be removed. The idea is to have, in effect, two virtual Swift systems
working side by side, with the migration from the existing ring to the new
ring done without interrupting service. Can we put this topic on the agenda
for the next summit and consider it as a high-priority feature for coming
releases?
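
To make the two-ring idea concrete, here is a minimal in-memory sketch. It
is not how Swift works today, and every name in it (DualRingStore,
load_existing, migrate_some, the "changeme" suffix) is hypothetical. It only
shows the intended behaviour: writes go to the new ring, reads try the new
ring first and fall back to the old one, and a background step drains the
old ring a little at a time.

```python
import hashlib
import struct


def partition_for(name: str, part_power: int) -> int:
    """Top part_power bits of the MD5 of the name (simplified hashing)."""
    digest = hashlib.md5(name.encode("utf-8") + b"changeme").digest()
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


class DualRingStore:
    """Hypothetical store that serves data from an old and a new ring."""

    def __init__(self, old_power: int, new_power: int):
        self.old_power = old_power
        self.new_power = new_power
        self.old = {}  # (partition, name) -> data, placed with old_power
        self.new = {}  # (partition, name) -> data, placed with new_power

    def load_existing(self, name: str, data: bytes) -> None:
        """Simulate data that was written before the new ring existed."""
        self.old[(partition_for(name, self.old_power), name)] = data

    def put(self, name: str, data: bytes) -> None:
        """New writes always land in the new ring."""
        self.new[(partition_for(name, self.new_power), name)] = data
        # Drop any stale copy so it cannot be migrated over the new data.
        self.old.pop((partition_for(name, self.old_power), name), None)

    def get(self, name: str) -> bytes:
        """Read from the new ring first, then fall back to the old ring."""
        key = (partition_for(name, self.new_power), name)
        if key in self.new:
            return self.new[key]
        return self.old[(partition_for(name, self.old_power), name)]

    def migrate_some(self, limit: int) -> int:
        """Move at most `limit` objects from the old ring to the new one."""
        moved = 0
        for key in list(self.old)[:limit]:
            _, name = key
            new_key = (partition_for(name, self.new_power), name)
            self.new[new_key] = self.old.pop(key)
            moved += 1
        return moved


store = DualRingStore(old_power=18, new_power=14)
for i in range(5):
    store.load_existing(f"obj{i}", b"old data")
store.put("fresh", b"new data")

# Drain the old ring in small batches; reads keep working throughout.
while store.old:
    store.migrate_some(limit=2)
    assert store.get("obj0") == b"old data"
```

Once the old dictionary is empty, the old ring can be retired, which is the
"shrinks to size zero" condition described above.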

Thanks.

Tong Li
Emerging Technologies & Standards
Building 501/B205
litong01@xxxxxxxxxx

From:	John Dickinson <me@xxxxxx>
To:	Alejandro Comisario <alejandro.comisario@xxxxxxxxxxxxxxxx>,
Cc:	"openstack-operators@xxxxxxxxxxxxxxxxxxx"
            <openstack-operators@xxxxxxxxxxxxxxxxxxx>, openstack
            <openstack@xxxxxxxxxxxxxxxxxxx>
Date:	01/11/2013 04:28 PM
Subject:	Re: [Openstack] [SWIFT] Change the partition power to recreate
            the RING
Sent by:	openstack-bounces+litong01=us.ibm.com@xxxxxxxxxxxxxxxxxxx

In effect, this would be a complete replacement of your rings, and that is
essentially a whole new cluster. All of the existing data would need to be
rehashed into the new ring before it is available.

There is no process that rehashes the data to ensure that it is still in
the correct partition. Replication only ensures that the partitions are on
the right drives.
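
A minimal sketch of why the partition power matters for placement,
assuming the usual scheme where the partition is the top part_power bits of
the MD5 of the object path. The hashing here is simplified (a real cluster
mixes in its own configured hash path suffix; "changeme" and the example
path are placeholders), so treat it as an illustration rather than Swift's
exact code.

```python
import hashlib
import struct


def partition_for(path: str, part_power: int,
                  hash_suffix: bytes = b"changeme") -> int:
    """Map an object path to a partition number.

    Takes the first 4 bytes of the MD5 digest as a big-endian unsigned
    integer and keeps only its top part_power bits.
    """
    digest = hashlib.md5(path.encode("utf-8") + hash_suffix).digest()
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


path = "/AUTH_demo/photos/cat.jpg"  # placeholder account/container/object
old_part = partition_for(path, part_power=18)
new_part = partition_for(path, part_power=14)

# The same object lands in a different partition under each power, so data
# stored under the old number is not where the new ring will look for it.
print("partition at power 18:", old_part)
print("partition at power 14:", new_part)
```

Replication works with whole partitions, which is why it cannot repair this
on its own: the partition number itself has to be recomputed per object.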

To change the number of partitions, you will need to GET all of the data
from the old ring and PUT it to the new ring. A more complicated (but
perhaps more efficient) solution may include something like walking each
drive and rehashing+moving the data to the right partition and then letting
replication settle it down.
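
The drive-walking approach could start from something like the sketch
below. It assumes that each object's directory on disk is named after the
hex MD5 of its path, so the new partition can be recomputed from that name
alone. It only builds a relocation plan in memory and touches no files; the
function names and the sample object paths are made up for illustration.

```python
import hashlib


def partition_from_hash(hex_hash: str, part_power: int) -> int:
    """Partition = top part_power bits of the first 32 bits of the hash."""
    return int(hex_hash[:8], 16) >> (32 - part_power)


def relocation_plan(hex_hashes, old_power: int, new_power: int):
    """Return (hash, old_partition, new_partition) for every object whose
    partition number differs between the two partition powers."""
    plan = []
    for hex_hash in hex_hashes:
        old_part = partition_from_hash(hex_hash, old_power)
        new_part = partition_from_hash(hex_hash, new_power)
        if old_part != new_part:
            plan.append((hex_hash, old_part, new_part))
    return plan


# Stand-ins for hash directory names found while walking one drive.
hashes = [hashlib.md5(f"/AUTH_demo/c/obj{i}".encode()).hexdigest()
          for i in range(1000)]

plan = relocation_plan(hashes, old_power=18, new_power=14)
print(f"{len(plan)} of {len(hashes)} objects need a new partition directory")

# When the power is lowered by k bits, the new partition is simply the old
# one shifted right by k, so 2**k old partitions merge into each new one.
assert all(new == old >> 4 for _, old, new in plan)
```

A local move like this only puts the data under the right partition number.
The new ring may still assign that partition to a different drive, which is
the part replication would then settle.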

Either way, 100% of your existing data will need to at least be rehashed
(and probably moved). Your CPU (hashing), disks (read+write), RAM
(directory walking), and network (replication) may all be limiting factors
in how long it will take to do this. Your per-disk free space may also
determine what method you choose.

I would not expect any data loss while doing this, but you will probably
have availability issues, depending on the data access patterns.

I'd like to eventually see something in swift that allows for changing the
partition power in existing rings, but that will be
hard/tricky/non-trivial.

Good luck.

--John

On Jan 11, 2013, at 1:17 PM, Alejandro Comisario
<alejandro.comisario@xxxxxxxxxxxxxxxx> wrote:

> Hi guys.
> We created a Swift cluster several months ago. The thing is that right
> now we can't add hardware, and we configured a lot of partitions with the
> final size of the cluster in mind.
>
> Today each data node has 2500+ partitions per device, and even after
> tuning the background processes (replicator, auditor & updater) we really
> want to try to lower the partition power.
>
> Since it's not possible to do that without recreating the ring, we can
> afford the luxury of recreating it with a much lower partition power and
> then rebalancing / deploying the new ring.
>
> The question is: with a working cluster holding *existing data*, is it
> possible to do this and wait for the data to move around *without data
> loss*?
> If so, is it realistic to expect an improvement in overall cluster
> performance?
>
> We have no problem with having a non-working cluster (while the data
> moves) even for an entire weekend.
>
> Cheers.
>
>
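
For reference, the "2500+ partitions per device" figure in the quoted
message follows from simple arithmetic: the ring has 2**part_power
partitions, each stored replicas times, spread over all devices. The sketch
below uses made-up numbers (power 18, 3 replicas, 300 devices), since the
thread does not state the actual cluster's values, and assumes equally
weighted devices.

```python
def partitions_per_device(part_power: int, replicas: int,
                          devices: int) -> float:
    """Average number of partition replicas each device holds."""
    return (2 ** part_power) * replicas / devices


# Hypothetical cluster: compare the current power with a lower one.
for power in (18, 14):
    per_device = partitions_per_device(power, replicas=3, devices=300)
    print(f"part_power={power}: about {per_device:.0f} partitions per device")
```

Each step down in partition power halves the per-device count, which is why
a much lower power reduces the work the replicator and auditor have to do.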

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

Follow ups

Re: [SWIFT] Change the partition power to recreate the RING
From: John Dickinson, 2013-01-14

References

[SWIFT] Change the partition power to recreate the RING
From: Alejandro Comisario, 2013-01-11
Re: [SWIFT] Change the partition power to recreate the RING
From: John Dickinson, 2013-01-11