openstack team mailing list archive
Message #17602
Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)
Hi folks,
At GROU.PS, we've been an OpenStack Swift user for more than 1.5 years
now. Currently, we hold about 18TB of data on 3 storage nodes. Since
we hit 84% utilization, we recently decided to expand the
storage with more disks.
In order to do that, after creating a new c0d4p1 partition in each of
the storage nodes, we ran the following commands on our proxy server:
swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100
swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100
swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100
swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100
swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100
swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance
scp account.ring.gz storage1:/etc/swift/account.ring.gz
scp container.ring.gz storage1:/etc/swift/container.ring.gz
scp object.ring.gz storage1:/etc/swift/object.ring.gz
scp account.ring.gz storage2:/etc/swift/account.ring.gz
scp container.ring.gz storage2:/etc/swift/container.ring.gz
scp object.ring.gz storage2:/etc/swift/object.ring.gz
scp account.ring.gz storage3:/etc/swift/account.ring.gz
scp container.ring.gz storage3:/etc/swift/container.ring.gz
scp object.ring.gz storage3:/etc/swift/object.ring.gz
So in summary:
* we're adding each of these disks to the system first
* then rebalance
* then copy the ring.gz files to each storage node.
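For reference, the same three steps can be written as loops. This is only a sketch: it prints the commands rather than running them, the IPs, port, device name and weight are copied from the commands above, and the function name print_ring_commands is made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints the ring-update commands listed in this message
# instead of executing them. print_ring_commands is an illustrative name.

print_ring_commands() {
    # Step 1: add the new device on each node (zone number follows node order).
    zone=1
    for ip in 192.168.1.3 192.168.1.4 192.168.1.5; do
        for ring in account container object; do
            echo "swift-ring-builder ${ring}.builder add z${zone}-${ip}:6002/c0d4p1 100"
        done
        zone=$((zone + 1))
    done

    # Step 2: rebalance each ring.
    for ring in account container object; do
        echo "swift-ring-builder ${ring}.builder rebalance"
    done

    # Step 3: copy the resulting ring files to every storage node.
    for host in storage1 storage2 storage3; do
        for ring in account container object; do
            echo "scp ${ring}.ring.gz ${host}:/etc/swift/${ring}.ring.gz"
        done
    done
}

print_ring_commands
```

Printing the commands first makes it easy to review them before piping the output to sh.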
The problem now is that disk growth on each of the storage nodes
seems to have stalled.
For 2 days the new disks have stayed at the same levels; they grew by a
few megabytes or not at all.
When I run df -h on each of these servers, here's what I see:
storage1 > /dev/cciss/c0d4p1 1.9T 11M 1.9T 1% /srv/node/c0d4p1
storage2 > /dev/cciss/c0d4p1 1.9T 12M 1.9T 1% /srv/node/c0d4p1
storage3 > /dev/cciss/c0d4p1 1.9T 829M 1.9T 1% /srv/node/c0d4p1
The imbalance between storage3 and the others is interesting too. But
storage3 grew to 828M almost immediately, then grew only a few MB more
over the 2 days, just like the others.
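Since df -h rounds to the nearest unit, small growth is hard to see. A rough sketch for recording exact usage over time follows; the helper name used_kb and the MOUNT_POINT variable are made up for illustration, and the default of / is only there so the script runs anywhere.

```shell
#!/bin/sh
# Sketch: print one timestamped "used KB" sample for a mount point, so that
# successive runs (e.g. from cron) can be compared. used_kb is an
# illustrative helper name, not part of Swift.

used_kb() {
    # df -P forces the POSIX one-line-per-filesystem layout and -k reports
    # 1024-byte blocks; column 3 of the second line is the used space.
    df -P -k "$1" | awk 'NR == 2 { print $3 }'
}

# On the storage nodes this would be run with MOUNT_POINT=/srv/node/c0d4p1.
MOUNT_POINT=${MOUNT_POINT:-/}
echo "$(date '+%Y-%m-%d %H:%M:%S') ${MOUNT_POINT} $(used_kb "${MOUNT_POINT}")"
```

Appending the output to a file on each node gives a simple growth history to compare across storage1, storage2 and storage3.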
Also let me admit a mistake I've made at the beginning:
rather than running
scp object.ring.gz storage2:/etc/swift/object.ring.gz
I ran
scp object.ring.gz storage2:/etc/swift/object.ring.gz
at the beginning, but I fixed it very quickly afterwards, about 2-3 minutes later.
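One way to confirm that a node ended up with the right file after such a mix-up is to compare checksums of the local ring file and a copy fetched back from the node. The sketch below assumes md5sum is available; the function name same_checksum and the /tmp path in the comment are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether two copies of a ring file are byte-identical by
# comparing their MD5 checksums. same_checksum is an illustrative name.

same_checksum() {
    # Reading via stdin keeps the file name out of md5sum's output,
    # so the two results can be compared as plain strings.
    a=$(md5sum < "$1") || return 2
    b=$(md5sum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (hypothetical local path for the copy fetched back from the node):
#   scp storage2:/etc/swift/object.ring.gz /tmp/storage2-object.ring.gz
#   sh this_script.sh object.ring.gz /tmp/storage2-object.ring.gz
if [ "$#" -eq 2 ]; then
    if same_checksum "$1" "$2"; then
        echo "identical"
    else
        echo "different or unreadable"
    fi
fi
```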
I'd be happy to hear your experience.
* Is this growth normal?
** Is it because utilization has already hit 84%, which is too high afaik?
** Is there any way I can make it faster?
* If not,
** am I missing something,
** or does it have something to do with the mistake I made?
*** If so, is there a way I can fix it?
* Is there a way I can monitor Swift rebalance progress / logs other than df -h?
* Bonus question: why do we copy the ring.gz files to the storage nodes, and
how critical are they? To me it's not clear how Swift can afford to
wait (even if it's just a few seconds) for the .ring.gz files to be
on the storage nodes after rebalancing, if those files are so critical.
Many thanks.