(phew - not my anecdotes! but certainly numbers I've quoted myself in the past,
and similar numbers reported in IBM's performance reports for their implementation)
So one question I would ask is whether you made any OS-level changes to your
system? I know that typically the network stack on Linux, Windows etc may limit
the number of connections to a socket. I believe that was one reason why IBM's
best numbers were against a 64-bit Linux host rather than Windows, which has a
less configurable / more limited network stack (although I may be mistaken on
this). You mention file handles - that will have an impact too. Actually I just
checked
ftp://public.dhe.ibm.com/software/integration/support/supportpacs/individual/mp0a.pdf
(the IBM report) and it only specifies file handle limit changes on Linux.
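
As a rough illustration of the file handle point, here is a minimal Python sketch, assuming a Unix-like host. The helper name and the headroom of 64 descriptors are made up for illustration and are not taken from the IBM report or from mosquitto. It compares the soft open-file limit of the current process with a target connection count:

```python
import resource


def enough_file_handles(target_connections, soft_limit, headroom=64):
    """Return True if soft_limit leaves room for the target connection count.

    Assumes one descriptor per client connection; headroom is an arbitrary
    allowance for listeners, log files and similar.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= target_connections + headroom


# Limits of the *current* process, not of a running broker.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
print("room for 3000 connections:", enough_file_handles(3000, soft))
```

Note that this only inspects the process that runs the script; the limit that matters is the one in effect for the broker process itself.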
I'd also be curious to know more about the specific configurations of
large-number-of-client-connection scenarios - we should document them on
http://mqtt.org/wiki in fact :-)
On Mon, Nov 11, 2013 at 5:24 PM, Karl Palsson <karlp@xxxxxxxxxxxx> wrote:
Hi,
(tl;dr: ~3000 simultaneous bridges, QoS 1, TLS-PSK, 0.166 Hz messages published
per client, ~500 messages per second aggregate was about the stable limit. TLS
has no impact. How did anyone get 100,000?! Or even 20,000?!)
I've been working on testing out the scalability of the mosquitto broker over
the last few weeks, and would like to present our current figures, and ask for
comments/feedback from other people who have tested out mosquitto in any sort
of large scale system.
Anecdotally, we had heard that mosquitto was doing "many 10s of thousands of
connections" "unless you turn on SSL", which all seemed reasonable.
The particular test scenario we were looking at was many (as many as possible)
standalone mosquitto brokers, each configured to _bridge_ traffic into a single
central mosquitto broker. Each edge broker would publish one message (QoS 1)
every six seconds into the central broker, and those connections would be
TLS-PSK. On the central broker would be a single QoS 1 subscriber.
We were not interested in how many messages per second a single client could
send/receive, but in how many clients we could sustain at the relatively low
rate per client.
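
The load implied by this scenario is simple arithmetic: one message every six seconds per client is about 0.166 Hz, so the aggregate rate at the central broker scales linearly with the number of bridges. A minimal sketch (the function name is illustrative only):

```python
def aggregate_rate(clients, seconds_between_messages):
    """Messages per second arriving at the central broker."""
    return clients / seconds_between_messages


# Client counts from this thread: a small run, the observed stable limit,
# and the anecdotal figures.
for clients in (500, 3000, 20000, 100000):
    rate = aggregate_rate(clients, 6.0)
    print(clients, "clients ->", round(rate, 1), "msg/s")
```

At 3000 clients this gives the ~500 messages per second quoted in the summary.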
We looked at existing tools for this, but found very little, and certainly
nothing that handled setting up all the bridges and offering TLS-PSK support.
We ended up writing some Python tools, named mqtt-malaria, to handle generating
all the load, as well as monitoring the central server. These tools are not
perfect. By using Python's multiprocessing, they are very simple, but quite
memory heavy. It's all open source though, under the permissive 2-clause "new"
BSD license. You can get the tools here:
https://github.com/remakeelectric/mqtt-malaria
The test method largely follows
https://github.com/remakeelectric/mqtt-malaria/blob/master/README-swarm.md
and runs through a series of test runs with different numbers of clients.
Initial testing was against mosquitto 1.2.2, but was quickly moved to 1.2.3 tip
to get the fix in:
https://bitbucket.org/oojah/mosquitto/commits/19c2f13905539bf20645f76e6aebbad7db9a2e2b
With a target machine of a dual-core Xeon with 2 GB of RAM, we are only getting
~3000 clients connected. We can get 3500 or so, but the slightest disturbance
pushes it into queueing mode. Our mosquitto configuration has longish queues
(20,000 messages) to allow the central subscriber to be switched off for
upgrades. If the queue ever grows in normal operation, it's game over. Once it
starts queueing, it's only a matter of time before it starts to drop, and most
importantly, it starts to churn the CPU quite a lot storing and sorting
messages. Once that has happened, even reducing the number of clients is not
enough to bring the situation back under control. See the images in the 3000
client section of the results pdf.
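
To put the queue length in perspective, here is a back-of-the-envelope sketch of how long a 20,000 message queue lasts once the subscriber stops keeping up. It assumes a steady inflow and a single queue for the one central subscriber, and ignores in-flight messages; the function name is illustrative only:

```python
def seconds_until_queue_full(max_queued, inflow_per_s, outflow_per_s=0.0):
    """Time until the queue is full, given steady inflow and outflow rates."""
    backlog_growth = inflow_per_s - outflow_per_s
    if backlog_growth <= 0:
        # Subscriber keeps up, so the queue never fills.
        return float("inf")
    return max_queued / backlog_growth


# Subscriber switched off entirely at ~500 msg/s aggregate.
print(seconds_until_queue_full(20000, 500.0))
# Subscriber falling behind by 50 msg/s.
print(seconds_until_queue_full(20000, 500.0, 450.0))
```

Under these assumptions a fully stopped subscriber leaves well under a minute before messages start to drop at the 3000 client load.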
Capturing statistics with "sudo perf record -p $(pidof mosquitto) sleep 20" and
"sudo perf report" shows that a _lot_ of time is spent in the kernel, on
tcp_poll. (Given that mosquitto uses poll(), which is O(n) for the number of
connections, this is not terribly surprising.)
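
The O(n) behaviour of poll() can be illustrated with a small Python sketch. This is not mosquitto's code: it uses Python's select.poll wrapper around the same system call, on a Unix-like host, with local socket pairs standing in for client connections. Every descriptor is registered, only one has data, and each poll() call still hands the whole set to the kernel to check:

```python
import select
import socket


def ready_after_one_write(n_connections):
    """Register n socket pairs with poll(), write to one, report what is ready.

    Returns (list of ready fds, fd that was written to).
    """
    pairs = [socket.socketpair() for _ in range(n_connections)]
    poller = select.poll()
    for reader, _writer in pairs:
        poller.register(reader.fileno(), select.POLLIN)
    try:
        # Only the last "client" sends anything.
        pairs[-1][1].send(b"x")
        # The kernel checks all n registered descriptors on this call,
        # even though only one of them is readable.
        events = poller.poll(1000)
        return [fd for fd, _mask in events], pairs[-1][0].fileno()
    finally:
        for reader, writer in pairs:
            reader.close()
            writer.close()


ready, expected = ready_after_one_write(50)
print("ready descriptors:", len(ready), "of 50 registered")
```

Interfaces such as epoll on Linux keep the registered set in the kernel and return only the ready descriptors, which avoids the per-call scan. Whether that would change the numbers reported here was not tested.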
Note also, this means that SSL was _not_ the problem here. Indeed, turning off
SSL had only a 1-2% change in CPU usage. For 3000 clients, at ~80% CPU usage,
switching from QoS 1 down to QoS 0 has a noticeable CPU improvement, dropping
down to ~55%, a far greater change than SSL. Again, this is in line with the
perf reports showing where time was being spent.
Note that the aggregate packet rate received here is only ~500 messages per
second. With 500 clients, receiving exactly the same total aggregate message
rate only uses ~15% CPU, far less.
I've attached a pdf with some graphs and numbers for the most relevant set
of tests:
https://github.com/remakeelectric/mqtt-malaria/blob/master/results.500bytes.xClients.0.166mps.pdf
I'm very curious what other people see, and how they manage to achieve it. This
is not at _all_ in line with the anecdotal reports of running as many as 100k
connections to a mosquitto broker. Is there some basic setting here somewhere
that I'm missing to make this magically run further? The results on earlier
mailing list topics are an order of magnitude better than I'm seeing, so
there's a major disconnect there :(
I'd like to thank Roger especially for his patience while I was testing all
this, and his extremely prompt responses with fixes and tweaks to investigate.
The numbers we got, while not what we'd hoped, are perfectly acceptable for us
(from 1.2.3 onwards).
Sincerely,
Karl Palsson
ReMake Electric ehf
Footnotes:
Payload size is ~500 bytes for all tests. A small amount of testing was done
with 5000 bytes, and 5 bytes. 5 bytes was virtually identical, but at 5000
bytes the SSL encryption did increase the CPU usage a noticeable amount.
The only _system_ tuning done was to raise the file handle limits for the
mosquitto broker.
The only tweak to mosquitto's config (other than the required TLS listener) was
to set max_queued_messages to 20,000. All other settings were default.
Anecdotal reports of "many thousands" for reference:
https://lists.launchpad.net/mosquitto-users/msg00242.html
https://lists.launchpad.net/mosquitto-users/msg00163.html
https://lists.launchpad.net/mosquitto-users/msg00106.html
--
Andy Piper | Kingston upon Thames, London (UK)
blog: http://andypiper.co.uk | skype: andypiperuk
twitter: @andypiper | images: http://www.flickr.com/photos/andypiper