maas-devel team mailing list archive

Scaling to 72k nodes

To: maas-devel@xxxxxxxxxxxxxxxxxxx
From: John Arbash Meinel <john@xxxxxxxxxxxxxxxxx>
Date: Tue, 16 Oct 2012 18:13:52 +0400
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121010 Thunderbird/16.0.1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So here are my results so far trying to scale MAAS up to 100,000 nodes
wrt tag rebuilding.

The basic layout now is that I have 1 MAAS Region Controller, set up
on a c1.xlarge machine in EC2 (so it has 8 cores).

Earlier in the week I had 10 Cluster Controllers, each on a c1.medium,
which has 2 cores of the same speed as the c1.xlarge. Today I added 8
more Cluster Controllers (only 8 because I'm currently limited to 20
EC2 instances, and I need to figure out how to change that).

After declaring and commissioning everything, I created 4000 nodes in
each node group. So I was at 40,000 nodes, and I'm currently at
72,000. Note that I specifically have no nodes in the 'master'
nodegroup, because I want to remove that load from the machine that
is also running the Postgres db and the Region Controller.
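
As a quick sanity check on those node counts, here is a small sketch.
The function and constant names are illustrative only, not anything
from MAAS; the only inputs are the figures quoted above.

```python
NODES_PER_CLUSTER = 4000  # nodes created in each node group (from the message)


def total_nodes(cluster_controllers, nodes_per_cluster=NODES_PER_CLUSTER):
    """Total node count when every cluster controller holds the same
    number of nodes. The 'master' nodegroup is left empty, so it adds
    nothing to the total."""
    return cluster_controllers * nodes_per_cluster


print(total_nodes(10))  # earlier in the week: 40000
print(total_nodes(18))  # today: 72000
```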


When I issue a request to rebuild a tag, it fires off the celery job
to each cluster controller, which then gets the data from the MAAS
Region controller and posts back the tag results.

One interesting bit is the 'fairness' of the system. It appears that
all the cluster controllers get the job request at approximately the
same time. Most of them complete the job in about 3.5s, but then a
second wave of them takes 13s to complete. My guess is that it has
something to do with keep-alive. The Region controller currently has
12 'wsgi' workers (though I really only see 8 of them with active CPU
at any one time).

Yesterday with 10 clusters I was seeing:

rebuild 1 tag    9s
rebuild 2 tags  12s
rebuild 4 tags  25s

So you can see that we can do 2 tags at one time, but beyond that we
just serialize the requests. (Which makes sense, because each cluster
only has 2 workers for 2 cpus. I do wonder if we could overcommit,
i.e. run more workers than cpus, because we should be spending some
time waiting for the other side to think.)
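
The serialization can be sketched as a simple "wave" model: with 2
workers per cluster, N tags need ceil(N / 2) back-to-back waves. This
is only a rough model under that assumption (the names below are
made up for illustration), using the 10-cluster timings as input.

```python
import math

WORKERS_PER_CLUSTER = 2  # 2 workers for 2 cpus on each cluster controller


def waves(tags, workers=WORKERS_PER_CLUSTER):
    """Number of sequential rounds needed if each worker rebuilds one
    tag at a time."""
    return math.ceil(tags / workers)


def predicted_seconds(tags, seconds_per_wave, workers=WORKERS_PER_CLUSTER):
    """Predicted wall time, assuming every wave costs the same."""
    return waves(tags, workers) * seconds_per_wave


measured = {1: 9, 2: 12, 4: 25}  # tags -> seconds, 10 clusters
full_wave = measured[2]          # cost of one fully loaded wave

for tags in (2, 4):
    print(tags, "tags: predicted", predicted_seconds(tags, full_wave),
          "s, measured", measured[tags], "s")
```

Under this model 4 tags are 2 waves of 12s, so 24s predicted against
25s measured, which is consistent with the requests simply being
serialized beyond 2 at a time.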

Today with 18 clusters I get:
rebuild 1 tag   11s
rebuild 2 tags  15s
rebuild 4 tags  33s

So far, it doesn't look like we're absolutely saturated, since doing 2
tags in parallel is still faster than doing them sequentially (15s
versus 2 x 11s = 22s), which is a good sign that we can do 18 clusters
on a single (higher-powered) region controller.
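
To make the parallel-versus-sequential comparison concrete, here is a
small sketch that computes the speedup from the timings in this
message. The helper name is hypothetical; "sequential" is assumed to
mean N times the single-tag time.

```python
def parallel_speedup(one_tag_seconds, n_tags, n_tags_seconds):
    """Ratio of the estimated sequential time (n_tags single rebuilds,
    one after another) to the measured time for rebuilding them
    together. A value above 1.0 means parallel rebuilds still help."""
    return (one_tag_seconds * n_tags) / n_tags_seconds


timings = {
    "10 clusters": {1: 9, 2: 12, 4: 25},
    "18 clusters": {1: 11, 2: 15, 4: 33},
}

for label, t in timings.items():
    for n in (2, 4):
        ratio = parallel_speedup(t[1], n, t[n])
        print(f"{label}, {n} tags: {ratio:.2f}x faster than sequential")
```

A ratio that stays above 1.0 at 18 clusters is what suggests the
region controller is not yet the hard bottleneck.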

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Cygwin)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlB9a6AACgkQJdeBCYSNAAPJ3ACfeLTR1P728CR2dwJjtyXCXcn5
cRwAoNYk/cWGW9/AeYH/VPpAIfJOQg7e
=HWEW
-----END PGP SIGNATURE-----

Follow ups

Re: Scaling to 72k nodes
From: Stuart Bishop, 2012-10-17
Re: Scaling to 72k nodes
From: Martin Packman, 2012-10-16