mosquitto-users team mailing list archive

mosquitto scaling results

 

Hi,

(tl;dr: ~3000 simultaneous bridges, QoS 1, TLS-PSK, 0.166 Hz messages published per client, ~500
messages per second aggregate was about the stable limit.  TLS has almost no impact.  How did
anyone get 100,000?!  Or even 20,000?!)

I've been working on testing out the scalability of the mosquitto broker over the last few
weeks, and would like to present our current figures, and ask for comments/feedback from other
people who have tested mosquitto in any sort of large-scale system.

Anecdotally, we had heard that mosquitto was doing "many 10s of thousands of connections"
"unless you turn on SSL" which all seemed reasonable.

The particular test scenario we were looking at was many (as many as possible) standalone
mosquitto brokers, each configured to _bridge_ traffic into a single central mosquitto broker.
Each edge broker would publish one message (QoS 1) every six seconds into the central broker,
and those connections would be TLS-PSK.  On the central broker would be a single QoS 1 subscriber.
We were not interested in how many messages per second a single client could send/receive, but
in how many clients we could sustain at the relatively low rate per client.
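The relationship between bridge count and aggregate load is simple arithmetic, sketched below. The function name and the list of client counts are illustrative only; the six-second publish interval is the one from the test scenario.

```python
def aggregate_rate(clients: int, seconds_between_messages: float) -> float:
    """Messages per second arriving at the central broker."""
    return clients / seconds_between_messages


# One message every six seconds per bridge, as in the test scenario.
for clients in (500, 3000, 3500):
    rate = aggregate_rate(clients, 6.0)
    print(f"{clients} bridges -> {rate:.1f} msg/s at the central broker")
```

At 3000 bridges this gives the ~500 messages per second quoted in the summary.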

We looked at existing tools for this, but found very little, and certainly nothing that
handled setting up all the bridges and offering TLS-PSK support.  We ended up writing some
Python tools, named mqtt-malaria, to handle generating all the load, as well as monitoring the
central server.  These tools are not perfect.  Because they use Python's multiprocessing, they
are very simple, but quite memory heavy.  It's all open source though, under the permissive
2-clause "new" BSD license.  You can get the tools here:
https://github.com/remakeelectric/mqtt-malaria

The test method largely follows
https://github.com/remakeelectric/mqtt-malaria/blob/master/README-swarm.md
and runs through a series of test runs with different numbers of clients.

Initial testing was against mosquitto 1.2.2, but quickly moved to the 1.2.3 tip to get the fix
in: https://bitbucket.org/oojah/mosquitto/commits/19c2f13905539bf20645f76e6aebbad7db9a2e2b

With a target machine of a dual-core Xeon with 2 GB of RAM, we are only getting ~3000
clients connected.  We can get 3500 or so, but the slightest disturbance pushes it into
queueing mode.  Our mosquitto configuration has longish queues (20,000 messages) to allow the
central subscriber to be switched off for upgrades.  If the queue ever grows in normal
operation, it's game over.  Once it starts queuing, it's only a matter of time before it
starts to drop, and most importantly, it starts to churn the cpu quite a lot storing and
sorting messages.  Once that has happened, even reducing the number of clients is not enough
to bring the situation back under control.  See the images in the 3000 client section of the
results pdf.

Capturing statistics with "sudo perf record -p $(pidof mosquitto) sleep 20" and "sudo perf
report" shows that a _lot_ of time is spent in the kernel, on tcp_poll.  (Given that mosquitto
uses poll(), which is O(n) for the number of connections, this is not terribly surprising.)
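To illustrate the O(n) point, here is a minimal sketch using Python's select.poll rather than mosquitto's own C code. The function name and the figure of 200 idle connections are made up for the example, and it needs a Unix-like system. Every registered descriptor is handed to the kernel on each poll() call, even though only one of them has anything to read.

```python
import select
import socket


def poll_one_ready(n_idle: int = 200):
    """Register n_idle quiet connections plus one active one, then poll once."""
    poller = select.poll()
    pairs = [socket.socketpair() for _ in range(n_idle + 1)]
    try:
        for reader, _writer in pairs:
            poller.register(reader, select.POLLIN)
        # Only the last connection has data waiting to be read.
        pairs[-1][1].send(b"x")
        # The kernel checks every registered descriptor on this call,
        # but only the ready ones are returned.
        events = poller.poll(1000)  # timeout in milliseconds
        return len(pairs), events
    finally:
        for reader, writer in pairs:
            reader.close()
            writer.close()


registered, events = poll_one_ready()
print(f"{registered} descriptors registered, {len(events)} reported ready")
```

The work done per call grows with the number of registered connections, not with the number that are actually active, which fits a profile dominated by tcp_poll.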

Note also, this means that SSL was _not_ the problem here.  Indeed, turning off SSL changed
CPU usage by only 1-2%.  For 3000 clients, at ~80% CPU usage, switching from QoS 1 down to
QoS 0 gave a noticeable improvement, dropping CPU usage to ~55%, a far greater change than SSL.
Again, this is in line with the perf reports showing where time was being spent.

Note that the aggregate packet rate received here is only ~500 messages per second.  With 500
clients, receiving exactly the same total aggregate message rate only uses ~15% cpu, far less.
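A back-of-envelope model of why the connection count matters more than the message rate is sketched below. It rests on two assumptions that were not measured: that the broker makes roughly one poll() call per incoming message, and that each call costs time proportional to the number of open connections. The function name is hypothetical.

```python
def poll_scan_work(connections: int, messages_per_second: float) -> float:
    """Descriptor checks per second, assuming one poll() wakeup per message."""
    return connections * messages_per_second


many = poll_scan_work(3000, 500)  # 3000 clients, ~500 msg/s aggregate
few = poll_scan_work(500, 500)    # 500 clients, same aggregate rate

print("modelled work ratio:", many / few)
print("reported CPU ratio: ", round(80 / 15, 1))
```

The modelled ratio is in the same range as the ratio of the reported CPU figures (~80% against ~15%), but this is only a sanity check under the stated assumptions, not a measurement.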

I've attached a pdf with some graphs and numbers for the most relevant set of tests:
https://github.com/remakeelectric/mqtt-malaria/blob/master/results.500bytes.xClients.0.166mps.pdf

I'm very curious what other people see, and how they manage to achieve it.  This is not at _all_ in line with the
anecdotal reports of running as many as 100k connections to a mosquitto broker.  Is there some
basic setting here somewhere that I'm missing to make this magically run further?  The results
on earlier mailing list topics are an order of magnitude better than I'm seeing, so there's a
major disconnect there :(

I'd like to thank Roger especially for his patience while I was testing all this, and his
extremely prompt responses with fixes and tweaks to investigate.  The numbers we got, while
not what we'd hoped, are perfectly acceptable for us (from 1.2.3 onwards).


Sincerely,
Karl Palsson
ReMake Electric ehf

Footnotes:
Payload size is ~500 bytes for all tests.  A small amount of testing was done with 5000 bytes,
and 5 bytes.  5 bytes was virtually identical, but at 5000 bytes the ssl encryption did
increase the cpu usage a noticeable amount.

The only _system_ tuning done was to raise the file handle limits for the mosquitto broker.
The only tweak to mosquitto's config (other than the required TLS listener) was to set
max_queued_messages to 20,000.  All other settings were default.
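For anyone reproducing this, a small sketch for checking whether a file handle limit leaves room for a target number of connections. It inspects the limit of the Python process that runs it, not of a running mosquitto, so it only helps if run in the same environment the broker is started from. The function name and the headroom of 64 descriptors are assumptions for the example.

```python
import resource


def fd_limit_allows(connections: int, headroom: int = 64) -> bool:
    """True if the soft open-file limit covers the connections plus headroom."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= connections + headroom


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit {soft}, hard limit {hard}")
for target in (3000, 20000, 100000):
    print(f"{target} connections fit: {fd_limit_allows(target)}")
```

Each client connection takes one descriptor on the broker, so the limit has to sit above the intended client count.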

Anecdotal reports of "many thousands" for reference:
https://lists.launchpad.net/mosquitto-users/msg00242.html
https://lists.launchpad.net/mosquitto-users/msg00163.html
https://lists.launchpad.net/mosquitto-users/msg00106.html