Re: Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)
Hi Samuel,
Thanks for the quick reply.
They're all 100. Here's the output of swift-ring-builder:
root@proxy1:/etc/swift# swift-ring-builder account.builder
account.builder, build version 13
1048576 partitions, 3 replicas, 3 zones, 12 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices: id zone ip address port name weight partitions balance meta
0 1 192.168.1.3 6002 c0d1p1 100.00 262144 0.00
1 1 192.168.1.3 6002 c0d2p1 100.00 262144 0.00
2 1 192.168.1.3 6002 c0d3p1 100.00 262144 0.00
3 2 192.168.1.4 6002 c0d1p1 100.00 262144 0.00
4 2 192.168.1.4 6002 c0d2p1 100.00 262144 0.00
5 2 192.168.1.4 6002 c0d3p1 100.00 262144 0.00
6 3 192.168.1.5 6002 c0d1p1 100.00 262144 0.00
7 3 192.168.1.5 6002 c0d2p1 100.00 262144 0.00
8 3 192.168.1.5 6002 c0d3p1 100.00 262144 0.00
9 1 192.168.1.3 6002 c0d4p1 100.00 262144 0.00
10 2 192.168.1.4 6002 c0d4p1 100.00 262144 0.00
11 3 192.168.1.5 6002 c0d4p1 100.00 262144 0.00
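As a quick sanity check (plain shell arithmetic, nothing Swift-specific): with
3 replicas of 1048576 partitions spread across 12 equally weighted devices,
each device should hold 262144 partition-replicas, which is exactly what the
partitions column above shows:

# replicas * partitions / devices, all weights equal
echo $(( 3 * 1048576 / 12 ))    # prints 262144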
On Mon, Oct 22, 2012 at 12:03 PM, Samuel Merritt <sam@xxxxxxxxxxxxxx> wrote:
> On 10/22/12 9:38 AM, Emre Sokullu wrote:
>>
>> Hi folks,
>>
>> At GROU.PS, we've been an OpenStack SWIFT user for more than 1.5 years
>> now. Currently, we hold about 18TB of data on 3 storage nodes. Since
>> we hit 84% utilization, we recently decided to expand the storage
>> with more disks.
>>
>> In order to do that, after creating a new c0d4p1 partition in each of
>> the storage nodes, we ran the following commands on our proxy server:
>>
>> swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100
>> swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100
>> swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100
>> swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100
>> swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100
>> swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100
>> swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100
>> swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100
>> swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100
>>
>> [snip]
>
>>
>> So right now, the problem is that the disk growth on each of the storage
>> nodes seems to have stalled,
>
> So you've added 3 new devices to each ring and assigned a weight of 100 to
> each one. What are the weights of the other devices in the ring? If they're
> much larger than 100, then that will cause the new devices to end up with a
> small fraction of the data you want on them.
>
> Running "swift-ring-builder <thing>.builder" will show you information,
> including weights, of all the devices in the ring.
>
>
>
>> * Bonus question: why do we copy the ring.gz files to the storage nodes,
>> and how critical are they? It's not clear to me how Swift can afford to
>> wait (even if it's just a few seconds) for the .ring.gz files to reach
>> the storage nodes after rebalancing, if those files are so critical.
>
>
> The ring.gz files contain the mapping from Swift partitions to disks. As you
> know, the proxy server uses it to determine which backends have the data for
> a given request. The replicators also use the ring to determine where data
> belongs so that they can ensure the right number of replicas, etc.
>
> When two storage nodes have different versions of a ring.gz file, you can
> get replicator fights. They look like this:
>
> - node1's (old) ring says that the partition for a replica of /cof/fee/cup
> belongs on node2's /dev/sdf.
> - node2's (new) ring says that the same partition belongs on node1's
> /dev/sdd.
>
> When the replicator on node1 runs, it will see that it has the partition for
> /cof/fee/cup on its disk. It will then consult the ring, push that
> partition's contents to node2, and then delete its local copy (since node1's
> ring says that this data does not belong on node1).
>
> When the replicator on node2 runs, it will do the converse: push to node1,
> then delete its local copy.
>
> If you leave the rings out of sync for a long time, then you'll end up
> consuming disk and network IO ping-ponging a set of data around. If they're
> out of sync for a few seconds, then it's not a big deal.
>
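
And to see the partition-to-device mapping for a concrete object, the
swift-get-nodes tool reads a ring.gz directly (the account, container and
object names below are just placeholders):

# show which partition and which storage devices this object maps to
swift-get-nodes /etc/swift/object.ring.gz AUTH_test some_container some_object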