mosquitto-users team mailing list archive

Re: mosquitto scaling results

 


Reposted to the list with Andy's permission.

----- original message below ----

Did you mean to reply only to me? I can re-reply to the list with this answer if you'd like :)

As I mentioned in the footnotes, I _only_ changed the open files limits, as I'm led to believe that's all that matters on Linux, since a socket is a "file", and none of the other Linux scaling documentation (for scaling nginx/node.js etc) seemed to apply.
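
As a rough illustration of the open files point, here is a minimal Python sketch that compares a process's soft open-file limit with a target connection count. The helper name `fd_headroom` and the `reserve` allowance are hypothetical, and the check applies to the Python process running it, not to mosquitto itself; on Linux the limits of a running broker can be read from /proc/<pid>/limits.

```python
import resource


def fd_headroom(wanted_connections, reserve=64):
    """Return (soft_limit, enough) for this process's open-file limit.

    `reserve` is a hypothetical allowance for listeners, log files and
    other descriptors that are not client connections.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # No soft limit configured, so any connection count fits.
        return soft, True
    return soft, soft >= wanted_connections + reserve


soft, enough = fd_headroom(3000)
print(f"soft limit: {soft}, enough for 3000 bridges: {enough}")
```

Each bridge connection uses one descriptor on the broker, so the limit needs to sit comfortably above the expected number of bridges.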

I've ignored things like tcp reuse time, local port ranges and tcp keepalive, as the 3000 bridges each maintain a single stable, long-lived connection with a stream of packets, unlike web clients, which make brand new requests continuously.

The edge bridges configuration is in the mqtt-malaria project, at https://github.com/remakeelectric/mqtt-malaria/blob/master/beem/bridge.py#L43

The actual production configuration is virtually identical, without the juggling of the local listen port, which is there to run the hundreds of bridges on each test machine, and the outbound topic map is (obviously) a little different :)

I didn't include the central bridge config either, but I can easily share it if people would find it useful. It's a model often discussed though: a TLS-PSK listener with a keyfile, plus the queue extensions I mentioned in the first email.

Sincerely,
Karl Palsson




On 11/11/2013 07:54 PM, Andy Piper wrote:
(phew - not my anecdotes! but certainly numbers I've quoted myself in the past,
and similar numbers reported in IBM's performance reports for their implementation)

So one question I would ask is whether you made any OS-level changes to your
system? I know that typically the network stack on Linux, Windows etc may limit
the number of connections to a socket. I believe that was one reason why IBM's
best numbers were against a 64-bit Linux host rather than Windows, which has a
less configurable / more limited network stack (although I may be mistaken on
this). You mention file handles - that will have an impact too. Actually I just
checked
ftp://public.dhe.ibm.com/software/integration/support/supportpacs/individual/mp0a.pdf
(the IBM report) and it only specifies file handle limit changes on Linux.

I'd also be curious to know more about the specific configurations of
large-number-of-client-connection scenarios - we should document them on
mqtt.org/wiki <http://mqtt.org/wiki> in fact :-)



On Mon, Nov 11, 2013 at 5:24 PM, Karl Palsson <karlp@xxxxxxxxxxxx> wrote:


    Hi,

    (tl;dr: ~3000 simultaneous bridges, qos1, tls-psk, 0.166Hz messages published
    per client, ~500 messages per second aggregate was about the stable limit.
    TLS has no impact, how did anyone get 100,000?! or even 20,000?!)

    I've been working on testing out the scalability of the mosquitto broker over
    the last few weeks, and would like to present our current figures, and ask for
    comments/feedback from other people who have tested out mosquitto in any sort
    of large scale system.

    Anecdotally, we had heard that mosquitto was doing "many 10s of thousands of
    connections" "unless you turn on SSL" which all seemed reasonable.

    The particular test scenario we were looking at was many (as many as possible)
    standalone mosquitto brokers, each configured to _bridge_ traffic into a single
    central mosquitto broker.  Each edge broker would publish one message (QoS 1)
    every six seconds into the central broker, and those connections would be
    TLS-PSK.  On the central broker would be a single QoS 1 subscriber.  We were not
    interested in how many messages per second a single client could send/receive,
    but in how many clients we could sustain at the relatively low rate per client.
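
The per-client and aggregate figures in this scenario follow from simple arithmetic, sketched below in Python. The function name `aggregate_rate` is illustrative only; the client counts and the six second interval are the ones quoted in this message.

```python
def aggregate_rate(clients, interval_s):
    """Messages per second arriving at the central broker when each
    client publishes one message every `interval_s` seconds."""
    return clients / interval_s


# One message every six seconds is roughly 0.166 Hz per client.
per_client_hz = 1 / 6
print(f"per client: {per_client_hz:.3f} Hz")

for clients in (3000, 3500):
    print(f"{clients} clients: {aggregate_rate(clients, 6):.1f} msg/s")
```

At 3000 clients this gives the ~500 messages per second aggregate mentioned in the summary.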

    We looked at existing tools for this, but found very little, and certainly
    nothing that handled setting up all the bridges and offering TLS-PSK support.
    We ended up writing some Python tools, named mqtt-malaria, to handle generating
    all the load, as well as monitoring the central server.  These tools are not
    perfect.  By using Python's multiprocessing, they are very simple, but quite
    memory heavy.  It's all open source though, under the permissive 2 clause "new"
    BSD license.  You can get the tools here:
    https://github.com/remakeelectric/mqtt-malaria

    The test method largely follows
    https://github.com/remakeelectric/mqtt-malaria/blob/master/README-swarm.md
    and runs through a series of test runs with different numbers of clients.

    Initial testing was against mosquitto 1.2.2, but quickly moved to the 1.2.3 tip
    to get the fix in:
    https://bitbucket.org/oojah/mosquitto/commits/19c2f13905539bf20645f76e6aebbad7db9a2e2b

    With a target machine of a dual core Xeon, with 2 GB of RAM, we are only
    getting ~3000 clients connected.  We can get 3500 or so, but the slightest
    disturbance pushes it into queueing mode.  Our mosquitto configuration has
    longish queues (20,000 messages) to allow the central subscriber to be switched
    off for upgrades.  If the queue ever grows in normal operation, it's game over.
    Once it starts queueing, it's only a matter of time before it starts to drop
    messages, and most importantly, it starts to churn the cpu quite a lot storing
    and sorting messages.  Once that has happened, even reducing the number of
    clients is not enough to bring the situation back under control.  See the images
    in the 3000 client section of the results pdf.
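
To get a feel for how quickly a 20,000 message queue can fill once arrivals outpace delivery, here is a crude fluid-model sketch in Python. It assumes a fixed drain rate of 500 messages per second (taken from the stable limit quoted in this message) and a hypothetical helper named `seconds_until_full`. A real broker's drain rate falls as the cpu starts churning, so this estimate is optimistic.

```python
def seconds_until_full(arrival_rate, drain_rate, capacity=20_000, backlog=0):
    """Estimate the time until a bounded queue overflows.

    Rates are in messages per second.  Returns None when the drain
    rate keeps up with arrivals and the queue never fills.
    """
    surplus = arrival_rate - drain_rate
    if surplus <= 0:
        return None
    return (capacity - backlog) / surplus


# Assumed figures: 3500 clients at one message per six seconds,
# against a drain rate of 500 messages per second.
arrival = 3500 / 6
print(seconds_until_full(arrival, 500))
print(seconds_until_full(500, 500))
```

Under these assumed figures the queue fills in roughly four minutes, after which messages start to be dropped.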

    Capturing statistics with "sudo perf record -p $(pidof mosquitto) sleep 20" and
    "sudo perf report" shows that a _lot_ of time is spent in the kernel, on
    tcp_poll.  (Given that mosquitto uses poll(), which is O(n) for the number of
    connections, this is not terribly surprising.)
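
The O(n) behaviour of poll() can be illustrated with a small Python sketch that uses the standard `select.poll` interface on local socket pairs. This is not mosquitto's code, and the helper name `poll_one_active` is hypothetical; it only shows that every registered descriptor is handed to the kernel on each call, however few of them are active.

```python
import select
import socket


def poll_one_active(n_idle=100):
    """Register n_idle quiet sockets plus one active one with poll()."""
    pairs = [socket.socketpair() for _ in range(n_idle + 1)]
    poller = select.poll()
    for reader, _writer in pairs:
        poller.register(reader, select.POLLIN)

    # Only the last connection has data waiting.
    pairs[-1][1].send(b"x")

    # The kernel checks every registered descriptor on each call,
    # even though only one of them is readable.
    ready = poller.poll(1000)  # timeout in milliseconds
    active_fd = pairs[-1][0].fileno()

    for reader, writer in pairs:
        reader.close()
        writer.close()
    return len(pairs), ready, active_fd


registered, ready, active_fd = poll_one_active()
print(f"{registered} descriptors scanned, {len(ready)} ready")
```

With thousands of mostly idle connections, that per-call scan is repeated for every wakeup, which is consistent with the time seen in tcp_poll.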

    Note also, this means that SSL was _not_ the problem here.  Indeed, turning off
    SSL made only a 1-2% change in CPU usage.  For 3000 clients, at ~80% cpu usage,
    switching from qos1 down to qos0 has a noticeable cpu improvement, dropping
    down to ~55%, a far greater change than SSL.  Again, this is in line with the
    perf reports showing where time was being spent.

    Note that the aggregate packet rate received here is only ~500 messages per
    second.  With 500 clients, receiving exactly the same total aggregate message
    rate only uses ~15% cpu, far less.

    I've attached a pdf with some graphs and numbers for the most relevant set of
    tests:
    https://github.com/remakeelectric/mqtt-malaria/blob/master/results.500bytes.xClients.0.166mps.pdf

    I'm very curious what other people see, and how they manage to achieve it.  This
    is not at _all_ in line with the anecdotal reports of running as many as 100k
    connections to a mosquitto broker.  Is there some basic setting here somewhere
    that I'm missing to make this magically run further?  The results on earlier
    mailing list topics are an order of magnitude better than I'm seeing, so there's
    a major disconnect there :(

    I'd like to thank Roger especially for his patience while I was testing all
    this, and his extremely prompt responses with fixes and tweaks to investigate.
    The numbers we got, while not what we'd hoped, are perfectly acceptable for us
    (from 1.2.3 onwards).


    Sincerely,
    Karl Palsson
    ReMake Electric ehf

    Footnotes:
    Payload size is ~500 bytes for all tests.  A small amount of testing was done
    with 5000 bytes, and 5 bytes.  5 bytes was virtually identical, but at 5000
    bytes the ssl encryption did increase the cpu usage a noticeable amount.

    The only _system_ tuning done was to raise the file handle limits for the
    mosquitto broker.  The only tweak to mosquitto's config (other than the required
    TLS listener) was to set max_queued_messages to 20,000.  All other settings
    were default.

    Anecdotal reports of "many thousands" for reference:
    https://lists.launchpad.net/mosquitto-users/msg00242.html
    https://lists.launchpad.net/mosquitto-users/msg00163.html
    https://lists.launchpad.net/mosquitto-users/msg00106.html






--
Andy Piper | Kingston upon Thames, London (UK)
blog: http://andypiper.co.uk   |   skype: andypiperuk
twitter: @andypiper  |  images: http://www.flickr.com/photos/andypiper

