mysql-proxy-discuss team mailing list archive

Re: funnel - a multiplexer plugin for mysql-proxy

 

Hi Kay

On Thu, Feb 5, 2009 at 8:54 PM, Kay Röpke <Kay.Roepke@xxxxxxx> wrote:
> Hi Nick!
>
> On Feb 5, 2009, at 6:23 PM, nick loeve wrote:
>
>> Hi,
>>
>> I have created an experimental branch of the mysql-proxy code on
>> launchpad, in order to show what I have done to implement a connection
>> multiplexer with backlog for mysql-proxy, and hopefully get some
>> feedback on our implementation/design. We called the plugin 'funnel'.
>> [...]
>> Our existing solution works, but we are looking at how to get more
>> performance using a plugin to mysql-proxy. The plugin in the branch I
>> posted accomplishes the main tasks I described above, but there are
>> more features that we would like to implement, mainly support for more
>> statistics via the admin plugin. I had to make a few changes to the
>> core state-machine that handles the front-end/back-end connection
>> state in order to achieve the backlog.
>
> Very interesting! I will take a closer look at what the actual differences
> to the proxy plugin are later (Launchpad sadly doesn't make it easy to diff
> two files...).
>
>> Some assumptions:
>>
>> We have some hardcoded 'assumptions' in the code base, such as only
>> ever using one backend (as we always have the funnel sitting in front
>> of a mysqld on the same host) and we have a single user/database for
>> most of our slave architectures, so multi-user and/or complex permissions
>> may not work correctly. I would like to eventually remove these
>> limitations/assumptions.
>
> Fair enough for a first version, I'd say.
>
>> We are currently testing our plugin in a live environment, and our
>> benchmarks are proving that the mysql-proxy design is giving us better
>> capacity and lower average query times at peak traffic times.
>
> I'd be interested in some of the boundary conditions of your setup:
>  - how many queries/sec do you have?
>  - what is the average/stddev of the execution time of those queries?
>  - how large are the resultsets (approx in bytes or something)?
>  - how many clients are generally active at the same time (IOW what's the
> concurrency)?

Well, some rough stats for the past week on one of our main
read-only slave architectures:

~42K selects per second at peak times and, coming in via replication,
~5K updates, ~5K inserts and ~1K deletes per second. This
is one of our heaviest architectures, with close to 50 replication
slaves. During off-peak

I do not have average/stddev times for the replicated statements at
hand, but for our selects they vary greatly depending on the
architecture. For the architecture I mentioned above, average query
time is about 0.02 seconds and stddev is around 0.04 seconds. Peak
times see our stddev increase, sometimes greatly.

Result sets are extremely varied. The architecture above is optimized
for returning very small result sets (usually at most a few rows
with small fields). We try to push large result sets into their own
architectures. I don't have hard numbers at hand.

Connected clients for the architecture above number around 500 per slave,
and can increase slightly at peak times. Those 500 client connections
are doing an average of 1K-1.5K queries per second per slave (at peak
times). Depending on slave hardware, sometimes up to 20% of queries
reach the backlog. We use persistent connections on that arch, so
average new connections per second is pretty low.
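
To make the backlog idea concrete, here is a toy sketch of a funnel:
many client queries share a small number of backend slots, and anything
that arrives while all slots are busy waits in a FIFO backlog. This is
not the plugin code. The class name, the method names and the slot
count are all made up for illustration, and real connection handling is
left out entirely.

```python
from collections import deque


class Funnel:
    """Toy multiplexer: many clients share a few backend slots."""

    def __init__(self, backend_slots):
        self.free_slots = backend_slots  # idle backend connections
        self.backlog = deque()           # queries waiting for a slot (FIFO)
        self.running = []                # queries currently on a backend
        self.total = 0                   # all queries ever submitted
        self.backlogged = 0              # queries that had to wait

    def submit(self, query):
        """Run the query now if a slot is free, otherwise queue it."""
        self.total += 1
        if self.free_slots > 0:
            self.free_slots -= 1
            self.running.append(query)
            return True
        self.backlog.append(query)
        self.backlogged += 1
        return False

    def finish(self, query):
        """Release the slot; hand it straight to the oldest waiter, if any."""
        self.running.remove(query)
        if self.backlog:
            self.running.append(self.backlog.popleft())
        else:
            self.free_slots += 1

    def backlog_ratio(self):
        """Fraction of submitted queries that reached the backlog."""
        return self.backlogged / self.total if self.total else 0.0


# Hypothetical usage: two backend slots, three queries arriving together.
funnel = Funnel(backend_slots=2)
started = [funnel.submit(q) for q in ("q1", "q2", "q3")]
funnel.finish("q1")  # q3 leaves the backlog and takes the freed slot
```

The ratio returned by backlog_ratio() is the kind of number I mean by
"up to 20% of queries reach the backlog".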

We have around 10 slave architectures similar in ratio of
slaves/clients/queries/timings to the one mentioned above, and quite a
few more that have different replication setups, and are tuned for a
particular purpose.

>
> The reason I'm asking is because I've seen situations where the relative
> timing of query arrival, the average size of the resultsets and the
> execution times were favorable to the current single thread implementation
> in that you would not really notice a slowdown compared to going to mysqld
> directly.
> In general, I think it would be a safe statement to say that for high
> concurrency with very small execution times and tiny resultsets the current
> single threaded Proxy has the most trouble (all because the added latency is
> much larger than the time spent for the query itself).
> It would be nice to see if this theory holds true for what you are seeing,
> as well.

Yes, that is exactly what we are seeing in our main slave
architectures. We have some beefy hardware for our database slaves, but
we struggle to push the queries in and out quickly enough to really make
the database work hard and take advantage of the number of cores and
memory available. Across all our arches our biggest bottleneck is
connection handling and network I/O.  We do not see this problem so
much with the architectures tuned for returning larger result sets.

>
>> I'm particularly interested in the blueprint on launchpad about threaded
>> I/O. We did have an attempt at adding a thread pool to our plugin in
>> order to handle some backlog clearing and some I/O, but without large
>> changes to the main proxy engine we did not succeed in getting stable
>> enough to really test out in our high traffic environment.
>
> In fact, Jan and I have met today and talked on this very topic.
> Soon we will pick up our efforts in adding multithreading (mostly
> revitalizing old patches).
> The current plan is the following (and we need to add these to the
> blueprints after our team meeting next week):
> Step 1:
> - accept connections on one thread
> - have multiple worker threads the accepted filedescriptor gets handed off
> to (via a queue)
> - all subsequent events on this filedescriptor will be handled by the thread
> it was handed to
>  this essentially means that all network traffic will be handled by multiple
> threads
>  since we still have a global lock around the Lua state, everything that
> needs to go into Lua will run as before in a single thread
>
> Step 2:
> - give each thread its own local Lua state (still sharing the script)
> - remove the global mutex
> - access to global structures (backends, usually) will need some kind of
> synchronization
>  we would like to use a shared-nothing approach, basically making copies of
> global structures and versioning them (checks can be done with atomic ints,
> for example)
>  LuaLanes is another alternative.
>
> Step 1 is relatively easy compared to Step 2.
> There are few things to take into consideration, of course, even with step
> 1.
> My initial prototype picked a worker thread on every event, which proved to
> be extremely heavyweight under high load, mostly because the queue used was
> under high contention (reading data from a socket tended to be not much
> slower than the overhead of putting the event into a queue and letting a
> worker thread pull it out again. It was one hot queue...)
> The danger with making connections stick to one thread for their entire
> lifetime is that one thread might end up with getting all the active
> connections, and leave the other threads idling, thus turning the entire
> thing into a more or less heavyweight single thread implementation. I'm not
> yet sure how to solve this efficiently, but I guess we will try different
> approaches before we pick a winner.
>
> Step 2 is not without complications either. Since a copied global state
> would only be "mostly up to date", it's fairly important to pick the places
> where we update it. In most cases the global state is relatively static, but
> in some applications it might not be, e.g. in load balancing situations
> where backend weights are a function of backend system load, number of
> queries executed or something along those lines.
> In those situations, it might actually be cheaper to use a mutex to access
> global state rather than copying a lot, but that can lead to high lock
> contention, too. Maybe a non-locking alternative would be better, using
> atomic operations where they are available (currently glib is lacking them
> on HP/UX for PA-RISC iirc, maybe some AIX, too). Atomic ops are not without
> traps either, of course.
> In either case, we need to have an implementation to make these kinds of
> decisions, otherwise the effects are pure speculation.
> (As an aside: We cannot get away with defining the lua_lock/lua_unlock
> macros to acquire/release a mutex because those only make the Lua
> interpreter itself threadsafe, not what we built on top of it...sadly)

Step one sounds similar to what we tried to do within our plugin, but
we increasingly had to re-implement parts of the core engine within
our plugin to accommodate multiple threads using libevent. I would be
interested in helping out where possible to achieve what you described
above.
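
For reference, a minimal sketch of how I read Step 1: one acceptor
hands each new connection to a worker thread through a queue, and every
later event on that connection goes to the same worker. All names here
are hypothetical. The round-robin choice of worker is my own
assumption, and dispatch() only stands in for each worker running its
own event loop over its own file descriptors.

```python
import queue
import threading

NUM_WORKERS = 2  # hypothetical pool size


class Worker(threading.Thread):
    """Owns a set of connections and handles all of their events."""

    def __init__(self, index):
        super().__init__(daemon=True)
        self.index = index
        self.inbox = queue.Queue()  # per-worker queue, so no shared hot queue
        self.handled = []           # (conn_id, event) in processing order

    def run(self):
        while True:
            item = self.inbox.get()
            if item is None:        # sentinel: shut down
                break
            self.handled.append(item)


class Acceptor:
    """Accepts connections and pins each one to a single worker."""

    def __init__(self, workers):
        self.workers = workers
        self.owner = {}             # conn_id -> Worker
        self.next_worker = 0

    def accept(self, conn_id):
        # Assumption: plain round-robin assignment at accept time.
        worker = self.workers[self.next_worker % len(self.workers)]
        self.next_worker += 1
        self.owner[conn_id] = worker
        return worker

    def dispatch(self, conn_id, event):
        # Every event for a connection goes to the worker that owns it.
        self.owner[conn_id].inbox.put((conn_id, event))


workers = [Worker(i) for i in range(NUM_WORKERS)]
for w in workers:
    w.start()

acceptor = Acceptor(workers)
for conn_id in ("c1", "c2", "c3"):
    acceptor.accept(conn_id)
for conn_id, event in [("c1", "query"), ("c2", "query"),
                       ("c1", "result"), ("c3", "query")]:
    acceptor.dispatch(conn_id, event)

for w in workers:
    w.inbox.put(None)
    w.join()
```

Because assignment happens once at accept time, this also shows the
imbalance risk: if c1 and c3 turn out to be the busy connections, the
first worker does nearly all the work while the second idles.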

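The versioned-copy idea from Step 2 in the quoted plan could look
roughly like the sketch below: each worker keeps a private copy of the
backend list and only re-copies it when a shared version counter has
moved on. Class and method names are invented for illustration. Python
has no atomic integer type, so a plain int read stands in for the
atomic check, and a lock guards the publish and copy paths.

```python
import threading


class SharedBackends:
    """Global backend list plus a version counter bumped on every change."""

    def __init__(self, backends):
        self._lock = threading.Lock()
        self._version = 0
        self._backends = tuple(backends)

    def publish(self, backends):
        # Writers replace the whole list and bump the version together.
        with self._lock:
            self._backends = tuple(backends)
            self._version += 1

    def version(self):
        # Cheap check; stands in for an atomic int read.
        return self._version

    def snapshot(self):
        # Take a consistent (version, backends) pair.
        with self._lock:
            return self._version, self._backends


class WorkerView:
    """Per-thread copy that is refreshed only when the version changed."""

    def __init__(self, shared):
        self._shared = shared
        self._version, self._backends = shared.snapshot()
        self.refreshes = 0

    def backends(self):
        if self._shared.version() != self._version:
            self._version, self._backends = self._shared.snapshot()
            self.refreshes += 1
        return self._backends


# Hypothetical usage with made-up backend addresses.
shared = SharedBackends(["10.0.0.1:3306"])
view = WorkerView(shared)
before = view.backends()
shared.publish(["10.0.0.1:3306", "10.0.0.2:3306"])
after = view.backends()
again = view.backends()  # version unchanged, so no second copy
```

The copy is only "mostly up to date" between checks, which is the
trade-off described for Step 2.
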

>
> Thanks for sharing!
>

Np, I look forward to more :)


-- 
Nick Loeve


