
Re: Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)

On 10/22/12 9:38 AM, Emre Sokullu wrote:
Hi folks,

At GROU.PS, we've been an OpenStack SWIFT user for more than 1.5 years
now. Currently, we hold about 18TB of data on 3 storage nodes. Since
we hit 84% utilization, we recently decided to expand our storage with
more disks.

To do that, after creating a new c0d4p1 partition on each of the
storage nodes, we ran the following commands on our proxy server:

swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100
swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100
swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100

> [snip]
>
> So right now, the problem is: the disk growth on each of the storage
> nodes seems to have stalled,

So you've added 3 new devices to each ring and assigned a weight of 100 to each one. What are the weights of the other devices in the ring? If they're much larger than 100, then that will cause the new devices to end up with a small fraction of the data you want on them.
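For example (making up numbers, since I don't know what's in your ring): if the three existing devices in a zone each have a weight of 1000 and the new device has a weight of 100, the new device will only be assigned roughly 100 / (3*1000 + 100), or about 3%, of that zone's partitions - which would look a lot like stalled growth.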

Running "swift-ring-builder <thing>.builder" will show you information, including weights, of all the devices in the ring.


* Bonus question: why do we copy ring.gz files to the storage nodes, and
how critical are they? To me it's not clear how Swift can afford to
wait (even though it's just a few seconds) for .ring.gz files to be
on the storage nodes after rebalancing - if those files are so critical.

The ring.gz files contain the mapping from Swift partitions to disks. As you know, the proxy server uses this mapping to determine which backends have the data for a given request. The replicators also use the ring to determine where data belongs so that they can ensure the right number of replicas, etc.
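If you want to see that mapping in action, swift-get-nodes will show you, for a given ring and path, which devices a request would go to (the account/container/object names here are made up):

swift-get-nodes /etc/swift/object.ring.gz AUTH_myaccount mycontainer myobject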

When two storage nodes have different versions of a ring.gz file, you can get replicator fights. They look like this:

- node1's (old) ring says that the partition for a replica of /cof/fee/cup belongs on node2's /dev/sdf.
- node2's (new) ring says that the same partition belongs on node1's /dev/sdd.

When the replicator on node1 runs, it will see that it has the partition for /cof/fee/cup on its disk. It will then consult the ring, push that partition's contents to node2, and then delete its local copy (since node1's ring says that this data does not belong on node1).

When the replicator on node2 runs, it will do the converse: push to node1, then delete its local copy.

If you leave the rings out of sync for a long time, then you'll end up consuming disk and network IO ping-ponging a set of data around. If they're out of sync for a few seconds, then it's not a big deal.
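That's why the usual practice is to rebalance in one place and then push the resulting .ring.gz files out to every node in quick succession; for example, something like this (using the IPs from your earlier commands - adjust paths and user as needed):

for node in 192.168.1.3 192.168.1.4 192.168.1.5; do
    scp /etc/swift/*.ring.gz root@${node}:/etc/swift/
done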

