maas-devel team mailing list archive

Re: Scaling to 72k nodes

 

Thanks for the write up John!

On 16/10/2012, John Arbash Meinel <john@xxxxxxxxxxxxxxxxx> wrote:
>
> Earlier in the week I had 10 Cluster Controllers, each on a c1.medium,
> which is 2-cores of the same speed as the c1.xlarge. Today I added an
> additional 8 Cluster Controllers (because I'm currently limited to 20
> EC2 instances, and I need to figure out how to change that).

You might want to poke James Page on IRC, as he needed his limit bumped
up in the past. Unfortunately, I think it's not very easy to navigate
the Amazon bureaucracy.

> One interesting bit is the 'fairness' of the system. It appears that
> all the cluster controllers get the job request at approximately the
> same time, but I end up getting most of them completing the job in
> about 3.5s, but then a second wave of them take 13s to complete. My
> guess is that it has something to do with keep-alive. The Region
> controller currently has 12 'wsgi' workers, (though I really only see
> 8 of them with active CPU at any one time).

Was it clear at which stage the second batch was delayed: while
waiting for the hardware details, or when posting back the matching
nodes? Fiddling with the number of workers does sound like it may
affect things.
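
As a rough illustration of why the worker count could matter, here is a
toy model (not measured from MAAS, and not how the Region controller
actually schedules requests). It assumes all 18 cluster controllers
post back at the same instant, that only 8 workers are really active,
and that each request holds a worker for a fixed 3.5s. The function
name and the fixed service time are hypothetical.

```python
from collections import Counter


def completion_waves(num_requests, num_workers, service_time):
    """Completion time of each request when all arrive at once.

    Toy model: a fixed pool of workers, each request occupying one
    worker for exactly service_time seconds, served first come first
    served. Real request handling will be messier than this.
    """
    return [service_time * (i // num_workers + 1)
            for i in range(num_requests)]


# Assumed numbers: 18 cluster controllers, 8 active workers, 3.5s each.
times = completion_waves(18, 8, 3.5)
for finish, count in sorted(Counter(times).items()):
    print(f"{count} requests finish at {finish}s")
# 8 requests finish at 3.5s
# 8 requests finish at 7.0s
# 2 requests finish at 10.5s
```

Even this simple model produces distinct waves once there are more
simultaneous requests than active workers, though it does not reproduce
the 13s figure, so something else (keep-alive, perhaps) may also be
involved.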

Martin

