EOD Tag Performance Summary (and napkin math for 100k nodes)

I figured it would be nice to post the results of tweaking
tag-building performance this week.

Once we finally landed the patch to make tag building asynchronous,
performance dropped quite a bit. I've been testing on my personal
machine, with a database with just the master cluster, but with 10,000
nodes enlisted.

The times when the code landed were:
 5.7s To run xpath_exists(...) inside PostgreSQL with 10k nodes
 37s  To do the rebuild asynchronously
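For reference, the in-database path boils down to evaluating a tag's
XPath expression against each node's stored hardware XML with
PostgreSQL's xpath_exists(). Here is a minimal sketch of that check,
driven from Python with psycopg2; the table and column names and the
XPath itself are illustrative, not the actual MAAS schema or a real tag
definition:

  # Minimal sketch of the "do it directly in the database" check.  The
  # table/column names (maasserver_node, hardware_details) and the XPath
  # are illustrative only.
  import psycopg2

  TAG_XPATH = "//node[@id='display']"  # hypothetical tag expression

  def nodes_matching_tag(conn, xpath=TAG_XPATH):
      """Return ids of nodes whose hardware XML matches the tag's XPath."""
      with conn.cursor() as cur:
          cur.execute(
              "SELECT id FROM maasserver_node"
              " WHERE xpath_exists(%s, hardware_details::xml)",
              [xpath])
          return [row[0] for row in cur.fetchall()]

  if __name__ == "__main__":
      conn = psycopg2.connect("dbname=maasdb")
      print(len(nodes_matching_tag(conn)))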

We ended up tweaking a lot of little things, mostly around how we
serialize the requests and responses. By the middle of this week, we
were at:
 27s To rebuild asynchronously

It turned out that Piston's default JSON serialization was
particularly bad for the large content blobs we were returning
(ensure_ascii=False means that simplejson produces a Unicode string,
but does so in pure Python; the C accelerator is only used for 'str'
output). Changing that, and tweaking how many nodes we request in a
batch, got us down to:
 12s To rebuild
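If you want to see how much the serializer settings matter on your own
setup, here is a rough stand-alone benchmark. The payload shape and
sizes are invented, and the results depend heavily on the simplejson
version in use and whether its C speedups are compiled in, so treat it
as a way to measure rather than a statement of which mode is faster:

  # Rough stand-alone timing of the serializer settings discussed above.
  import timeit

  try:
      import simplejson as json  # what Piston uses
  except ImportError:
      import json

  # A payload shaped roughly like a batch of nodes, each carrying a
  # large XML-ish text blob (the sizes here are made up).
  payload = {
      "nodes": [
          {"system_id": "node-%d" % i,
           "hardware_details": "<node>" + "x" * 20000 + "</node>"}
          for i in range(500)
      ]
  }

  for ensure_ascii in (True, False):
      elapsed = timeit.timeit(
          lambda: json.dumps(payload, ensure_ascii=ensure_ascii),
          number=10)
      print("ensure_ascii=%-5s %.2fs for 10 dumps" % (ensure_ascii, elapsed))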

At which point I'm going to stop trying to tweak it. The goal was to
get to <2x the cost of doing it directly in the database.

Also, on a machine with 4 CPUs, I can run 3 rebuilds simultaneously
and each takes <15s; running them serially in the DB would take
~17s.[1] If I try to run 4 concurrently, we run into CPU contention,
because the MAAS process needs one of the CPUs (so they slow down to
20s per tag).
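For the curious, the concurrency test is just a matter of kicking off
several rebuilds at once and timing each one. A hypothetical harness
might look like the following; rebuild_tag() is a stand-in for whatever
triggers a rebuild on your setup, not a real MAAS function:

  # Hypothetical harness for the concurrency measurement above: run one
  # rebuild per tag in parallel and report how long each took.
  import time
  from concurrent.futures import ProcessPoolExecutor

  def rebuild_tag(tag_name):
      """Placeholder: trigger a rebuild of tag_name and wait for it."""
      start = time.time()
      # ... call the MAAS API / management command for this tag here ...
      return tag_name, time.time() - start

  def time_concurrent_rebuilds(tags):
      with ProcessPoolExecutor(max_workers=len(tags)) as pool:
          for tag, elapsed in pool.map(rebuild_tag, tags):
              print("%s rebuilt in %.1fs" % (tag, elapsed))

  if __name__ == "__main__":
      time_concurrent_rebuilds(["gpu", "big-ram", "ssd"])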

My napkin math says that 1 MAAS API server can feed ~8 cluster
machines. It takes the DB about 1s to extract the raw strings off
disk (measured with psql), plus some amount of time to push the
updated node<=>tag information back in, so DB time should be <2s out
of the 12s. That means DB contention could still be a bottleneck on
my hardware. Note, though, that the rows being requested and the rows
being updated are different for each cluster, so scaling the DB
hardware should help. If my simple setup can sustain ~10 cluster
workers, a 32-processor machine should be able to handle >100
clusters.
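Spelling that out as a toy calculation, using only the figures above:

  # The DB share of one rebuild: ~12s of wall clock per rebuild, of
  # which <2s (an upper bound) is database time.
  rebuild_seconds = 12.0
  db_seconds = 2.0  # ~1s to read the XML blobs, plus the node<=>tag writes

  print("DB busy for ~%.0f%% of each rebuild"
        % (100 * db_seconds / rebuild_seconds))           # ~17%
  print("~%.0f rebuilds' worth of DB work fits in one rebuild's wall clock"
        % (rebuild_seconds / db_seconds))                 # ~6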

So to scale to 100k nodes, we expect ~4k nodes per cluster. That gives
us 25 cluster controllers. At 8:1, we'll need ~3 MAAS API servers (is
that 3 processes on 1 machine, or 3 machines, or... ?), and then
something like an 8-processor DB.
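The same sizing written out, using the estimates above (~4k nodes per
cluster controller and the ~8 clusters per API server figure):

  # Napkin math for 100k nodes, using the estimates above.
  total_nodes = 100000
  nodes_per_cluster = 4000
  clusters_per_api_server = 8

  clusters = total_nodes // nodes_per_cluster              # 25 controllers
  api_servers = float(clusters) / clusters_per_api_server  # ~3.1, call it 3

  print("%d cluster controllers, ~%.1f API servers" % (clusters, api_servers))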

10,000 nodes is a 192MB database, so 100,000 should be in the 1-2GB
range, which can trivially be cached in memory.

So while there is a goal to move the data out of the central DB and
onto the cluster workers (and the APIs as written should handle that
easily), it doesn't seem strictly necessary in order to scale to 100k
nodes.

John
=:->



[1]: Not that PostgreSQL couldn't run it in parallel too, but doing it
this way moves all that load off the DB machine and onto the
horizontally scalable cluster workers.