openstack team mailing list archive

Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)

To: openstack@xxxxxxxxxxxxxxxxxxx
From: Emre Sokullu <emre@xxxxxxxxxxxxxx>
Date: Mon, 22 Oct 2012 09:38:57 -0700

Hi folks,

At GROU.PS, we've been OpenStack Swift users for more than 1.5 years
now. Currently, we hold about 18TB of data on 3 storage nodes. Since
we hit 84% utilization, we recently decided to expand the storage
with more disks.

To do that, after creating a new c0d4p1 partition on each of the
storage nodes, we ran the following commands on our proxy server:

swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100
swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100
swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100

swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance

scp account.ring.gz storage1:/etc/swift/account.ring.gz
scp container.ring.gz storage1:/etc/swift/container.ring.gz
scp object.ring.gz storage1:/etc/swift/object.ring.gz
scp account.ring.gz storage2:/etc/swift/account.ring.gz
scp container.ring.gz storage2:/etc/swift/container.ring.gz
scp object.ring.gz storage2:/etc/swift/object.ring.gz
scp account.ring.gz storage3:/etc/swift/account.ring.gz
scp container.ring.gz storage3:/etc/swift/container.ring.gz
scp object.ring.gz storage3:/etc/swift/object.ring.gz

So in summary:
* we add each of these disks to the rings first,
* then rebalance,
* then copy the ring.gz files to each storage node.
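
The three steps above can also be written as loops. This is only a minimal sketch: DRY_RUN and the run helper are illustrative names that are not part of Swift, and the zones, IPs, port 6002, device name and weight are copied unchanged from the commands above. By default it only prints the commands; setting DRY_RUN=0 would execute them.

```shell
#!/bin/sh
# Sketch: the add / rebalance / copy steps as loops.
# DRY_RUN and run() are illustrative helpers, not part of Swift.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

expand_rings() {
    # zone:ip pairs, same values as in the commands above
    for pair in z1:192.168.1.3 z2:192.168.1.4 z3:192.168.1.5; do
        zone=${pair%%:*}
        ip=${pair#*:}
        for ring in account container object; do
            run swift-ring-builder "$ring.builder" add "$zone-$ip:6002/c0d4p1" 100
        done
    done
    # Rebalance each ring once, after all devices have been added.
    for ring in account container object; do
        run swift-ring-builder "$ring.builder" rebalance
    done
    # Copy the resulting ring files to every storage node.
    for node in storage1 storage2 storage3; do
        for ring in account container object; do
            run scp "$ring.ring.gz" "$node:/etc/swift/$ring.ring.gz"
        done
    done
}

expand_rings
```

Run as-is, it prints the same 21 commands in the same order as listed above, which makes it easy to compare against what was actually typed.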

The problem right now is that disk usage on the new partitions seems
to have stalled.

It has been 2 days and they remain at the same levels; they have grown
by a few megabytes or not at all.

When I run df -h on each of these servers, here's what I see:

storage1 > /dev/cciss/c0d4p1     1.9T   11M  1.9T   1% /srv/node/c0d4p1
storage2 > /dev/cciss/c0d4p1     1.9T   12M  1.9T   1% /srv/node/c0d4p1
storage3 > /dev/cciss/c0d4p1     1.9T  829M  1.9T   1% /srv/node/c0d4p1

The imbalance between storage3 and the others is interesting too.
storage3 grew to 828M almost immediately, then grew only a few MB more
over the 2 days, just like the others.
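
The manual df check can be turned into a small logging loop so that growth over time is easier to see. A rough sketch, assuming passwordless ssh from the proxy to the hosts named above; used_column and poll_nodes are made-up names, not Swift tools.

```shell
#!/bin/sh
# Sketch: log the "Used" column of df for the new partition on each node.
# used_column and poll_nodes are illustrative names, not Swift tools.

used_column() {
    # Read df data lines on stdin and print the third field (Used).
    awk '{ print $3 }'
}

poll_nodes() {
    # Assumes passwordless ssh to the hosts named in this thread.
    for node in storage1 storage2 storage3; do
        used=$(ssh "$node" df -Ph /srv/node/c0d4p1 | tail -n 1 | used_column)
        echo "$(date '+%F %T') $node $used"
    done
}

# Example on a line copied from the df output above:
echo "/dev/cciss/c0d4p1     1.9T   11M  1.9T   1% /srv/node/c0d4p1" | used_column
```

Calling poll_nodes from cron or a sleep loop and appending the output to a file gives a timestamped record per node.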

Also, let me admit a mistake I made at the beginning:
rather than running
scp object.ring.gz storage2:/etc/swift/object.ring.gz
I ran
scp object.ring.gz storage2:/etc/swift/object.ring.gz
at the beginning, but I fixed it very quickly afterwards, about 2-3
minutes later.

I'd be happy to hear your experience.

* Is this growth normal?
** Is it because utilization has already hit 84%, which is too high AFAIK?
** Is there any way I can make it faster?

* If not,
** am I missing something,
** or does it have something to do with the mistake I made?
*** If so, is there a way I can fix it?

* Is there a way I can monitor Swift rebalance progress / logs other than df -h?

* Bonus question: why do we copy the ring.gz files to the storage nodes,
and how critical are they? It's not clear to me how Swift can afford to
wait (even if only for a few seconds) for the .ring.gz files to reach
the storage nodes after rebalancing, if those files are so critical.

Many thanks.

Follow ups

Re: Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)
From: Samuel Merritt, 2012-10-22
Re: Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)
From: Emre Sokullu, 2012-10-22