← Back to team overview

mysql-proxy-discuss team mailing list archive

Re: funnel - a multiplexer plugin for mysql-proxy

 

Hi Kay

On Thu, Mar 5, 2009 at 3:39 AM, Kay Röpke <Kay.Roepke@xxxxxxx> wrote:
>
> On Mar 4, 2009, at 4:07 PM, nick loeve wrote:
>
>> Im tearing my hair out to spot the point where concurrent memory
>> alloc/free is happening. If anyone cares to take a look that would be
>> awesome!
>
>
> just a quick note on this one: have you tried running it in valgrind? you
> seem to be running linux judging from the earlier stacktrace.

Yeah Im on Linux. I have run it through valgrind+memcheck, but I think
the overhead of memcheck it too much to trigger what i suspect is a
race condition.

>
> [...lot's of core reading interlude...]
>
> ok, i've gave it a more thorough look:
> i'm seeing that network_backlog_add() will add a network_mysqld_con* to the
> backlog queue in the global chassis struct.
> further, chassis_event_threads_init_thread will use EV_PERSIST for listening
> to events on that file descriptor, causing the event to not be discarded
> after it fired.
> then, when checking for stuff in the backlog, we pull one item out of the
> global queue and associate it with our thread.
> two questions come to mind:
> 1) are the two events now potentially delivered to two different threads? is
> this causing the problem?

Hmm, I assumed not, but I will have a deeper look today. The two
events are on two different file descriptors, and we push the timer
event that restarts a connections state machine from backlog into the
current thread's event_base, using the chassis_event_add_timer_local.

> 2) is this comment & code founded in dependable behavior of libevent?
>        /* the timer will fire immediately */
>        event_set(&(backlog_item->client->event), -1, 0,
> network_mysqld_con_handle, backlog_item);
>   especially the 0 as the events we are interested in. i couldn't find in
> the docs and libevent sources are too hard to follow for a cursory glance
> (at least right now).

Well the timer will fire at the next libevent event dispatch run. At
least that is my understanding. Agreed, the libevent docs are pretty
simple. I set the events to 0, as I do not want to inadvertantly
corrupt the network queue structures by modifying the socket->to_read
variable. So by not setting a file descriptor and using events == 0 I
was hoping to resume from the same spot we went into backlog. Thats
why I also had to introduce the READ_QUERY_FROM_BACKLOG connection
state, as we have already read the query from the client before going
to backlog, and we don't want to wait for another READ event that will
not be coming.

>
> with regard to question one: it should be easy to find this out, by adding a
> mutex to the connection struct and doing a trylock/lock duo first thing in
> network_mysqld_con_handle and logging a failed trylock. this should of
> course never fail, because only one thread is supposed to be touching that
> connection at any given time. (this would make a good addition to 'debug'
> builds, btw. i suspect this will not be the last time we run into this kind
> of trouble. we could do it with an atomic int to be less intrusive but
> that's somewhat besides the point)
> you should not see crashes of this sort with the mutex, since then it's
> guaranteed to be safe.
>
> note that it's a PITA to add the unlock, because there are return statements
> in the middle of con_handle. if you are using gcc and wanna have some fun,
> try adding a local variable for the mutex with __attribute__ ((cleanup
> (some_function))) as its attribute. in some_function you'll get the address
> of the variable as your parameter. then call unlock on the mutex...
> refer to
> http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html#Variable-Attributes :)
> (neat hack, i might add. although describing it took longer than to define a
> macro replacing 'return' with 'g_unlock(con->mutex); return' for the scope
> of con_handle :P)

I did try exactly what you described, except i did not bork if I
couldn't get the lock. Ill have to try that out.

>
> anyway, i suspect you will see output from a failing trylock at some point.
> if that happens i'd try to make the backlog queue thread-local. that should
> also cure the problem.
> this appears to be another unfortunate side effect of pinning connections to
> one thread, although the implications of not doing so aren't exactly giving
> me pleasant thoughts either...

Ok, ill have a deeper dig, and try out the thread-local backlog.


Thanks for taking a look!

Cheers

-- 
Nick Loeve



Follow ups

References