openstack team mailing list archive
Message #17611
Re: Expanding Storage - Rebalance Extreeemely Slow (or Stalled?)
Also, FYI:
I didn't run any "swift-ring-builder create" command before running
the set of "swift-ring-builder account.builder add" commands below,
because I thought I wasn't recreating the rings, just adding new devices
to the existing system. I'm not sure that was the right approach, though.
On Mon, Oct 22, 2012 at 9:38 AM, Emre Sokullu <emre@xxxxxxxxxxxxxx> wrote:
> Hi folks,
>
> At GROU.PS, we've been an OpenStack Swift user for more than 1.5 years
> now. Currently, we hold about 18TB of data on 3 storage nodes. Since
> we hit 84% utilization, we recently decided to expand the
> storage with more disks.
>
> In order to do that, after creating a new c0d4p1 partition in each of
> the storage nodes, we ran the following commands on our proxy server:
>
> swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100
> swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100
> swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100
> swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100
> swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100
> swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100
> swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100
> swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100
> swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100
>
> swift-ring-builder account.builder rebalance
> swift-ring-builder container.builder rebalance
> swift-ring-builder object.builder rebalance
>
> scp account.ring.gz storage1:/etc/swift/account.ring.gz
> scp container.ring.gz storage1:/etc/swift/container.ring.gz
> scp object.ring.gz storage1:/etc/swift/object.ring.gz
> scp account.ring.gz storage2:/etc/swift/account.ring.gz
> scp container.ring.gz storage2:/etc/swift/container.ring.gz
> scp object.ring.gz storage2:/etc/swift/object.ring.gz
> scp account.ring.gz storage3:/etc/swift/account.ring.gz
> scp container.ring.gz storage3:/etc/swift/container.ring.gz
> scp object.ring.gz storage3:/etc/swift/object.ring.gz
>
> So in summary:
> * we're adding each of these disks to the system first
> * then rebalance
> * then copy the ring.gz files to each storage node.
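The three steps summarized above can be sketched as a dry-run loop that only prints the commands, so they can be reviewed before anything touches the builder files. The zones, IPs, port, device name, weight and hostnames are copied from the message; the function name is made up for illustration. One thing worth checking while reviewing: the posted commands use port 6002 for all three rings, whereas Swift's stock configs of this era conventionally bind the object server to 6000, the container server to 6001 and the account server to 6002. Whether that matters here depends on each server's bind_port, so treat it as something to verify rather than a diagnosis.

```shell
#!/bin/sh
# Dry run: print the ring-expansion commands instead of executing them.
# Values below are copied from the original message.
DEVICE=c0d4p1
WEIGHT=100
PORT=6002   # as posted for all three rings; compare with each server's bind_port

# print_expand_commands is a hypothetical helper name, not part of Swift.
print_expand_commands() {
    zone=1
    # Step 1: add the new device on every node to every builder.
    for ip in 192.168.1.3 192.168.1.4 192.168.1.5; do
        for ring in account container object; do
            echo "swift-ring-builder $ring.builder add z$zone-$ip:$PORT/$DEVICE $WEIGHT"
        done
        zone=$((zone + 1))
    done
    # Step 2: rebalance each builder, which writes the new .ring.gz files.
    for ring in account container object; do
        echo "swift-ring-builder $ring.builder rebalance"
    done
    # Step 3: copy every ring file to every storage node.
    for host in storage1 storage2 storage3; do
        for ring in account container object; do
            echo "scp $ring.ring.gz $host:/etc/swift/$ring.ring.gz"
        done
    done
}

print_expand_commands
```

Piping the output to "sh" would execute the commands; printing first makes a typo in a host or file name easier to spot.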
>
> The problem right now is that disk growth on each of the storage
> nodes seems to have stalled.
>
> For 2 days they have stayed at the same levels; they grew by a few
> megabytes or not at all.
>
> When I run df -h in each of these servers, here's what I see:
>
> storage1 > /dev/cciss/c0d4p1 1.9T 11M 1.9T 1% /srv/node/c0d4p1
> storage2 > /dev/cciss/c0d4p1 1.9T 12M 1.9T 1% /srv/node/c0d4p1
> storage3 > /dev/cciss/c0d4p1 1.9T 829M 1.9T 1% /srv/node/c0d4p1
>
> The imbalance between storage3 and the others is interesting too:
> storage3 grew to 828M almost immediately, then grew only a few MB more
> over the 2 days, just like the others.
>
> Also let me admit a mistake I've made at the beginning:
> rather than running
> scp object.ring.gz storage2:/etc/swift/object.ring.gz
> I ran
> scp object.ring.gz storage2:/etc/swift/object.ring.gz
> at the beginning, but fixed it very quickly afterwards, about 2-3 minutes later.
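One way to confirm that a miscopied ring really was corrected is to pull each node's copy back and compare it byte for byte with the file on the proxy. A minimal sketch follows; same_ring and check_nodes are hypothetical helper names, and the hostnames and /etc/swift path are the ones used in the scp commands above.

```shell
#!/bin/sh
# same_ring: succeed only if both files exist and are byte-for-byte identical.
same_ring() {
    cmp -s "$1" "$2"
}

# check_nodes: fetch every node's ring files into a scratch directory and
# report any that differ from the local copy. Defined here but not called.
check_nodes() {
    scratch=$(mktemp -d)
    for host in storage1 storage2 storage3; do
        for ring in account container object; do
            scp -q "$host:/etc/swift/$ring.ring.gz" "$scratch/$host.$ring.ring.gz"
            same_ring "$ring.ring.gz" "$scratch/$host.$ring.ring.gz" \
                || echo "MISMATCH: $ring.ring.gz on $host"
        done
    done
    rm -rf "$scratch"
}
```

Running check_nodes from the directory holding the builder files prints nothing when every node matches. If the recon middleware happens to be enabled, "swift-recon --md5" is meant for the same kind of check.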
>
> I'd be happy to hear your experience.
>
> * Is this growth normal?
> ** Is it because utilization has already hit 84%, which is too high AFAIK?
> ** Is there any way I can make it faster?
>
> * If not,
> ** am I missing something,
> ** or does it have something to do with the mistake I made?
> *** If so, is there a way I can fix it?
>
> * Is there a way I can monitor Swift rebalance progress / logs other than df -h?
>
> * Bonus question: why do we copy ring.gz files to storage nodes, and
> how critical are they? To me it's not clear how Swift can afford to
> wait (even if it's just a few seconds) for the .ring.gz files to reach
> the storage nodes after rebalancing, if those files are so critical.
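On the monitoring question, a rough sketch: sampling the used space twice and turning the difference into a rate gives a number to compare between nodes, instead of eyeballing "df -h". The helper names are made up for illustration; the mount point in the usage comment is the one from the df output above. The object-replicator's own periodic log output, and "swift-recon -r" if the recon middleware is enabled, are other places to look.

```shell
#!/bin/sh
# used_kb PATH: print kilobytes in use on the filesystem holding PATH.
# -P selects POSIX output (one line per filesystem), -k forces 1024-byte units.
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# growth_kb_per_hour BEFORE_KB AFTER_KB SECONDS: average growth rate
# between two samples taken SECONDS apart (integer arithmetic).
growth_kb_per_hour() {
    echo $(( ($2 - $1) * 3600 / $3 ))
}

# Hypothetical usage on a storage node:
#   before=$(used_kb /srv/node/c0d4p1)
#   sleep 600
#   after=$(used_kb /srv/node/c0d4p1)
#   growth_kb_per_hour "$before" "$after" 600
```

A rate near zero on all three nodes at once would suggest replication is not moving data to the new devices at all, rather than moving it slowly.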
>
> Many thanks.