
maas-devel team mailing list archive

Re: Scaling to 72k nodes

 


On 18/10/12 18:58, John Arbash Meinel wrote:
> On 10/18/2012 6:00 PM, Francis J. Lacoste wrote:
>> On 12-10-17 05:02 AM, John Arbash Meinel wrote:
>>> 18 nodegroups, each nodegroup has 4,000 nodes for 72,000 total 
>>> nodes. Each node has a ~25kB XML string associated with it.
> 
>> That a single cluster controller can host 4k nodes is an assumption
>> we haven't empirically validated.
> 
>> We should probably either validate it, or try variations on the
>> nodes-to-cluster ratio (say 500 nodes per cluster, 1000, 2000),
>> just to see how the scaling behaves differently.
> 
>> Cheers
> 
> For the purposes of what these clusters are doing, having more
> cluster controllers and fewer nodes each can really only be faster,
> rather than slower.
> 
> I'm willing to test that out a bit.
> 
> In the end, I did get HAProxy (c1.medium = 2cpu) running in front
> of 2 Region Controllers (c1.xlarge = 8cpu) backed by 1 DB host
> (c1.xlarge). After getting that working, I ran iftop and noticed
> that the bottleneck is not region controller CPUs, but networking
> bandwidth. Specifically, we saturate 600+Mbps (most of a 1Gbit
> link) while these tags are updating. That means we might be able to
> scale further if the cluster workers round-robin their requests
> across the region controllers (the bandwidth is switched, so in
> theory each server gets its own 1Gbps), but if we hide them behind
> an HAProxy, then the HAProxy host's bandwidth becomes the
> bottleneck.
> 
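A side note on the numbers: 72,000 nodes at ~25kB of XML each is
roughly 1.8GB per full pass, so saturating a 1Gbit link is not
surprising. For the round-robin idea, a minimal sketch of the cluster
side (the endpoint list and helper name here are hypothetical, not
the actual cluster worker code) could be as simple as:

    from itertools import cycle

    # Hypothetical region controller endpoints; a real deployment would
    # read these from cluster configuration rather than hard-coding them.
    REGION_CONTROLLERS = [
        "http://region-a.example.com/MAAS/api/1.0/",
        "http://region-b.example.com/MAAS/api/1.0/",
    ]

    _endpoints = cycle(REGION_CONTROLLERS)

    def next_region_url(path):
        """Pick the next region controller in round-robin order.

        With switched bandwidth, each controller can fill its own
        1Gbit link, so spreading requests avoids funnelling everything
        through a single HAProxy front end.
        """
        return next(_endpoints) + path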
> 
> I do have a patch for that which just landed: it enables Django's
> gzip middleware and patches the cluster controllers to send an
> 'Accept-Encoding: gzip' header. This should also help web
> responsiveness for any large pages, so it seemed like a generally
> good thing.
> 
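For reference, the gzip change is small on both ends. A rough sketch
(the exact settings module and client code in MAAS may differ; the
helper below is just an illustration):

    # settings.py: GZipMiddleware compresses responses for clients that
    # advertise gzip support.  Listing it first means it compresses the
    # response after the other middleware have run.
    MIDDLEWARE_CLASSES = (
        'django.middleware.gzip.GZipMiddleware',
        # ... existing middleware ...
    )

    # Cluster controller side: ask for, and unpack, a gzipped response.
    import gzip
    import urllib2
    from StringIO import StringIO

    def fetch_gzipped(url):
        request = urllib2.Request(url, headers={'Accept-Encoding': 'gzip'})
        response = urllib2.urlopen(request)
        body = response.read()
        if response.info().get('Content-Encoding') == 'gzip':
            body = gzip.GzipFile(fileobj=StringIO(body)).read()
        return body
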
> I hacked some of the nodes to test it, and on nodes that had 
> compression enabled the time dropped from 22s down to 15s. I'm
> waiting to get it packaged so I don't have to manually hack ~20
> instances.
> 
> 
> I can certainly come back to play with the specific layouts more, 
> though I'll need to submit the request for more instances to
> Amazon. All indications are that scaling would get worse with fewer
> clusters holding more nodes each (say 10,000 nodes per cluster),
> not better.
> 
> At least for tags, the actual processing is "embarrassingly
> parallel". So if you want to throw more workers into the mix, it
> will happily accept them. If we felt it was critical, we could split
> the work at a granularity smaller than a whole cluster and have the
> processing operate on that (we already process in 1,000-node
> batches, we would just need a way to divvy that work up
> differently).
> 
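On the finer-grained splitting: since the batches are already
independent, dividing them among workers is straightforward. A rough
sketch (the function and variable names here are made up for
illustration, not the real tag code):

    from multiprocessing import Pool

    BATCH_SIZE = 1000  # matches the existing 1000-node batch size

    def batches(system_ids, size=BATCH_SIZE):
        """Yield consecutive slices of at most `size` node ids."""
        for start in range(0, len(system_ids), size):
            yield system_ids[start:start + size]

    def evaluate_batch(batch):
        # Placeholder for the real work: fetch each node's ~25kB XML
        # and run the tag definition against it.
        return [(system_id, True) for system_id in batch]

    def rebuild_tag(system_ids, workers=4):
        # Each batch is independent, so any number of workers can chew
        # through them concurrently; the work divides at batch
        # granularity rather than whole-cluster granularity.
        pool = Pool(processes=workers)
        try:
            return pool.map(evaluate_batch, list(batches(system_ids)))
        finally:
            pool.close()
            pool.join()
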
> I'll poke one more time, but being able to say "rebuilding tags
> from 1 Region Controller to 20 Cluster Controllers" is network
> bandwidth bound, rather than CPU bound or DB load bound is a pretty
> good place to be.
> 
> John =:->


This is all very useful stuff!

It would be really good to have some sort of recommended
configuration for this. I want to add a "hyperscale" section to the
docs, and it would be great to include some idea of what is possible
and what is most efficient.

-- 
Nick Veitch,
Technical Writer, Cloud Engineering, Canonical
https://launchpad.net/~evilnick

