Re: Appserver & DB Scaling

On 10/24/2012 02:33 PM, Julian Edwards wrote:
> On Wednesday 24 Oct 2012 13:15:51 John Arbash Meinel wrote:
>> So we are able to throw a little bit more hardware at the system,
>> but we aren't able to scale out the region controller appservers
>> much more horizontally than the DB itself. And I don't think MaaS
>> currently would 'just work' with a master & slave postgres
>> setup.
> 
> What is the split between read/write transactions on the DB for tag
>  rebuilding?
> 
> One option we thought of ages ago but didn't think we'd need for a
> long time was to shard the master DB.
> 

For tag rebuilding it is about 95% read, with a burst of writes right
at the end.

So the 'load' here is mostly about time. We've roughly reached
saturation at 64k nodes and 16 clusters, such that adding more
clusters and nodes makes rebuilding tags take longer.
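
To make the master/slave question concrete, here is a minimal sketch
(not MAAS code) of the sort of Django database router such a split
would need: the read-heavy tag evaluation goes to a replica, while the
burst of writes at the end stays on the primary. The "default" and
"replica" aliases are assumptions, and MAAS would still need work
before this 'just works':

    # Hypothetical Django database router illustrating the read/write
    # split: tag evaluation is ~95% reads, which a replica could absorb,
    # while the final burst of tag writes goes to the primary.
    class ReadReplicaRouter:
        def db_for_read(self, model, **hints):
            # Send reads (node/hardware lookups during tag matching)
            # to the replica; the "replica" alias is an assumption.
            return "replica"

        def db_for_write(self, model, **hints):
            # Writes (the tag associations written at the end) stay on
            # the primary so replication lag cannot lose them.
            return "default"

        def allow_relation(self, obj1, obj2, **hints):
            # Primary and replica hold the same data set.
            return True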

Some numbers again:

4,000 nodes (1 cluster) takes 12.5s to rebuild.
40,000 nodes (10 clusters) takes 14s to rebuild, a 12% slowdown for
10x the data.
64,000 nodes (16 clusters) takes 17s to rebuild, a 36% slowdown for
16x the data, or a 21% incremental slowdown for 1.6x more data.
128,000 "nodes" (16 clusters * 2 tags) takes 30s to rebuild, or 75%
incremental overhead for 2x the data.
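
The percentages are just ratios of the rebuild times; a quick way to
reproduce them (the last one comes out at about 76%, rounded to 75%
above):

    # Quick check of the slowdown figures quoted above.
    timings = {            # nodes -> rebuild time in seconds
        4_000: 12.5,       # 1 cluster
        40_000: 14.0,      # 10 clusters
        64_000: 17.0,      # 16 clusters
        128_000: 30.0,     # 16 clusters * 2 tags
    }

    baseline = timings[4_000]
    print(f"40k vs 4k:   {timings[40_000] / baseline - 1:.0%}")          # 12%
    print(f"64k vs 4k:   {timings[64_000] / baseline - 1:.0%}")          # 36%
    print(f"64k vs 40k:  {timings[64_000] / timings[40_000] - 1:.0%}")   # 21%
    print(f"128k vs 64k: {timings[128_000] / timings[64_000] - 1:.0%}")  # 76%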


So while things aren't perfectly flat, 1 8-CPU DB + 1 8-CPU Appserver
scales pretty flat up to about 40-50,000 nodes. (Adding 2x the
clusters causes the time to rebuild tags to scale sublinearly, and
almost not at all.) After about 40-50k nodes, adding clusters starts
increasing the rebuild time roughly linearly. (e.g. 1 cluster takes
8s, 5 clusters take 9s, 10 clusters take 10s, but N>10 clusters takes
N seconds.)
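
If it helps to see the shape of that, a rough fit to those example
numbers (purely illustrative, not a measured model) looks like:

    def rebuild_seconds(clusters):
        # Nearly flat below ~10 clusters (8s -> 10s), then roughly one
        # second per cluster once the DB is the bottleneck.
        if clusters <= 10:
            return 8 + 0.2 * (clusters - 1)
        return float(clusters)

    # rebuild_seconds(1) ~ 8, rebuild_seconds(5) ~ 8.8,
    # rebuild_seconds(10) ~ 9.8, rebuild_seconds(16) = 16.0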

However, if you throw more hardware at the DB server, I think a 32-CPU
DB machine should be able to support ~256,000 nodes while still in the
'flat' scaling regime.
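
The ~256,000 figure is back-of-the-envelope: it assumes capacity grows
roughly linearly with DB CPU count from the observed 64k saturation
point on 8 CPUs, which is an assumption rather than a measurement:

    # Rough extrapolation behind the 32-CPU estimate, assuming node
    # capacity scales linearly with DB CPU count (an assumption).
    saturation_nodes = 64_000  # observed saturation with an 8-CPU DB
    db_cpus_now = 8
    db_cpus_big = 32

    print(saturation_nodes * db_cpus_big // db_cpus_now)  # 256000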


As for doing things differently, the design that we had originally
looked at for 12.10 (but discarded as taking too long) is to have the
hardware information stored on each cluster controller rather than in
the central database, which is essentially a sharding scheme.

If you do that, I'm quite confident that things would scale even
better than what we have now, but I'm not sure that it is actually
worth the effort. I don't know what people consider 'reasonable
rebuild times', but 30s is probably still reasonable, since it isn't
blocking an HTTP request. You can rebuild 1 tag across 128,000 nodes
(32 cluster controllers) in 30s with the hardware I was using.
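
As a rough illustration of that design (the endpoint and payload below
are hypothetical, not an API MAAS exposes), rebuilding a tag would
amount to fanning the evaluation out to the cluster controllers, which
hold their own nodes' hardware details, and only aggregating the
matching node IDs centrally:

    # Hypothetical fan-out for the sharded design: each cluster
    # controller evaluates the tag definition against its own nodes'
    # hardware details and returns only the matching node IDs.
    from concurrent.futures import ThreadPoolExecutor

    import requests


    def evaluate_tag_on_cluster(cluster_url, tag_name, definition):
        # Ask one cluster controller which of its nodes match the tag.
        # The /api/evaluate-tag/ endpoint is made up for illustration.
        resp = requests.post(
            f"{cluster_url}/api/evaluate-tag/",
            json={"tag": tag_name, "definition": definition},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["matching_node_ids"]


    def rebuild_tag(cluster_urls, tag_name, definition):
        # Evaluate on all clusters in parallel; only the final write of
        # tag associations would touch the central DB.
        with ThreadPoolExecutor(max_workers=16) as pool:
            results = pool.map(
                lambda url: evaluate_tag_on_cluster(url, tag_name, definition),
                cluster_urls,
            )
        return set().union(*results)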


So yes, this isn't perfectly flat scaling and the central DB server
has become a bottleneck, but it is still well within the load that we
want to support.

John
=:->
