maria-discuss team mailing list archive

Re: MariaDB non-blocking with EPOLL

To: Dave C <dave.zap@xxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Mon, 20 Jun 2016 11:52:08 +0200
Cc: maria-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CAKVp00Ub5i9pmNzeAogXXVD-hpSnkDEJS-kCtnTo2J3sddmphg@mail.gmail.com> (Dave C.'s message of "Mon, 20 Jun 2016 07:36:51 +0100")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

Dave C <dave.zap@xxxxxxxxx> writes:

> Originally posted to :
> http://stackoverflow.com/questions/37909652/mariadb-non-blocking-with-epoll
> Edited for context.
>
> I have single threaded server written in C that accepts TCP/UDP connections
> based on EPOLL and supports plugins for the multitude of protocol layers we
> need to support. That bit is fine.

> We use MariaDB and the MariaDB C connector that supports non blocking
> functions in it's API as described here.

Cool, always glad to see users of this API.

> Otherwise I fetch the file descriptor that seems to be immediate and
> register it with EPOLL and bail back to the main EPOLL loop waiting for
> events.
>
> s = mysql_get_socket(mysql);
> if(s > 0){
>     brt_socket_set_fds(endpoint, s);
>     struct epoll_event event;
>     event.data.fd = s;
>     event.events = EPOLLRDHUP | EPOLLIN | EPOLLET | EPOLLOUT;
>     s = epoll_ctl(efd, EPOLL_CTL_ADD, s, &event);

So, in principle here you should/need only register the events that
mysql_real_connect_start() requests in the return value. For
mysql_real_connect_start(), this would typically be MYSQL_WAIT_WRITE,
corresponding to EPOLLOUT. (EPOLLOUT marks that an async socket connect has
completed).

If it is easier for you to always register for all events, I am not sure it
will be a problem. Normally I suppose only the events that the API requests
will be possible to trigger. Though if you somehow got into a situation
where, e.g., there is an EPOLLIN pending but the non-blocking API is requesting
only MYSQL_WAIT_WRITE, maybe you could end up in a busy-loop calling the API
with the EPOLLIN being ignored. But I doubt this is related to your problem
at hand, just wanted to mention it as something maybe worth considering.
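
A minimal sketch of registering only the requested events, assuming Linux
epoll. The helper names wait_status_to_epoll() and rearm_fd() are
hypothetical and not part of the MariaDB API. The MYSQL_WAIT_* values are
copied from <mysql.h> only so the sketch compiles on its own; real code
should include <mysql.h> instead.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Wait flags as defined in MariaDB's <mysql.h>; repeated here only so the
   sketch is self-contained. Real code should include <mysql.h>. */
#ifndef MYSQL_WAIT_READ
#define MYSQL_WAIT_READ    1
#define MYSQL_WAIT_WRITE   2
#define MYSQL_WAIT_EXCEPT  4
#define MYSQL_WAIT_TIMEOUT 8
#endif

/* Hypothetical helper: translate the status returned by a _start() or
   _cont() call into the epoll events to wait for. MYSQL_WAIT_TIMEOUT has
   no epoll equivalent and must be handled with a timer. */
uint32_t wait_status_to_epoll(int status)
{
    uint32_t events = 0;
    if (status & MYSQL_WAIT_READ)   events |= EPOLLIN;
    if (status & MYSQL_WAIT_WRITE)  events |= EPOLLOUT;
    if (status & MYSQL_WAIT_EXCEPT) events |= EPOLLPRI;
    return events;
}

/* Hypothetical helper: (re)register fd for exactly the requested events.
   EPOLLET is deliberately left out, so the registration is
   level-triggered. */
int rearm_fd(int efd, int fd, int status, int already_added)
{
    struct epoll_event ev = {0};
    ev.data.fd = fd;
    ev.events = wait_status_to_epoll(status);
    return epoll_ctl(efd, already_added ? EPOLL_CTL_MOD : EPOLL_CTL_ADD,
                     fd, &ev);
}
```

The idea would be to call rearm_fd() after every _start() or _cont() call
that returns non-zero, since the requested events can change from one call
to the next.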

Note also that if MYSQL_WAIT_TIMEOUT is included in the return from
mysql_real_connect_start(), your code is expected to set up a timeout
handler that will call back into the API if the time period returned by
mysql_get_timeout_value() elapses. Without this, timeout values from the
mysql api will not work (though other things should be fine).
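
As a rough sketch of the timeout side, assuming a single pending
operation: the value from mysql_get_timeout_value() (in seconds) can be
turned into the timeout argument of epoll_wait(). The helper name
epoll_timeout_ms() is hypothetical, and a server with many connections
would more likely track one deadline per connection.

```c
#include <limits.h>

/* Wait flags as defined in MariaDB's <mysql.h>; repeated here only so the
   sketch is self-contained. Real code should include <mysql.h>. */
#ifndef MYSQL_WAIT_READ
#define MYSQL_WAIT_READ    1
#define MYSQL_WAIT_WRITE   2
#define MYSQL_WAIT_EXCEPT  4
#define MYSQL_WAIT_TIMEOUT 8
#endif

/* Hypothetical helper: compute the epoll_wait() timeout in milliseconds.
   status is the value returned by _start()/_cont(); timeout_secs is what
   mysql_get_timeout_value(mysql) returned. */
int epoll_timeout_ms(int status, unsigned int timeout_secs)
{
    if (!(status & MYSQL_WAIT_TIMEOUT))
        return -1;                       /* no timeout requested: block */
    if (timeout_secs > (unsigned int)(INT_MAX / 1000))
        return INT_MAX;                  /* clamp instead of overflowing */
    return (int)(timeout_secs * 1000u);
}

/* Intended use in the event loop (sketch only):
 *
 *   int n = epoll_wait(efd, events, maxevents,
 *                      epoll_timeout_ms(status,
 *                                       mysql_get_timeout_value(mysql)));
 *   if (n == 0)   // timer expired before any socket event
 *       status = mysql_real_connect_cont(&ret, mysql, MYSQL_WAIT_TIMEOUT);
 */
```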

>     if (s == -1) {
>         syslog(LOG_ERR, "brd_db : epoll error.");
>         // handle error.
>     }...
>
> So then some time later I do get the EPOLLOUT indicating the socket has
> been opened.
>
> And I dutifully call mysql_real_connect_cont() but at this stage it is
> still returning a non-zero value, indicating I must wait longer?

Yes. At this stage, data needs to be sent back and forth between server and
client to handle login and such. So the return value should probably be
MYSQL_WAIT_READ, indicating that the API is waiting for a response from the
server.

> But then that is the last EPOLL event I get, except for the EPOLLRDHUP when
> I guess the MariaDB hangs up after 10 seconds.

Hm.

So I noticed you added a post on Stack Overflow saying that you solved your
problem by passing the return value from mysql_real_connect_start() as the
third parameter to mysql_real_connect_cont()?

But in fact this is not correct. What you need to pass in is a value
indicating the events that _actually_ occurred.

So mysql_real_connect_start() probably returns MYSQL_WAIT_WRITE |
MYSQL_WAIT_TIMEOUT. If you get EPOLLOUT, you should then pass
MYSQL_WAIT_WRITE as the third parameter to mysql_real_connect_cont().
Otherwise, if the timeout triggers, you would pass MYSQL_WAIT_TIMEOUT.
The idea is that the API returns the events it wants to be notified about,
and you then pass back the subset of those events that actually occurred.
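
A small sketch of that reverse mapping, modelled on the poll()-based
handling in client/async_example.c but using epoll flags. The helper name
occurred_wait_status() is hypothetical, and masking the result with the
requested flags is an assumption about how to keep it a strict subset.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Wait flags as defined in MariaDB's <mysql.h>; repeated here only so the
   sketch is self-contained. Real code should include <mysql.h>. */
#ifndef MYSQL_WAIT_READ
#define MYSQL_WAIT_READ    1
#define MYSQL_WAIT_WRITE   2
#define MYSQL_WAIT_EXCEPT  4
#define MYSQL_WAIT_TIMEOUT 8
#endif

/* Hypothetical helper: build the status to pass to a _cont() call.
   revents   - events reported by epoll_wait() for the socket
   timed_out - non-zero if the timer expired instead
   requested - what the previous _start()/_cont() call returned */
int occurred_wait_status(uint32_t revents, int timed_out, int requested)
{
    int status = 0;
    if (revents & EPOLLIN)  status |= MYSQL_WAIT_READ;
    if (revents & EPOLLOUT) status |= MYSQL_WAIT_WRITE;
    if (revents & EPOLLPRI) status |= MYSQL_WAIT_EXCEPT;
    if (timed_out)          status |= MYSQL_WAIT_TIMEOUT;
    return status & requested;   /* only report what was asked for */
}

/* Intended use (sketch only; mysql, ret and wanted are caller state):
 *
 *   int occurred = occurred_wait_status(ev.events, 0, wanted);
 *   wanted = mysql_real_connect_cont(&ret, mysql, occurred);
 *   // wanted == 0 means the connect attempt has finished
 */
```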

If you pass MYSQL_WAIT_TIMEOUT to mysql_real_connect_cont(), then it will
actually fail the connection with a timeout. And looking in current code,
all other values happen to be currently ignored. So I am puzzled that this
would solve your problem, though the timeout might mean your code no longer
hangs (but also does not connect successfully)?

> Can anyone help me understand if this idea is even workable?

It should definitely be workable. One thing I notice is that you are using
the EPOLLET flag, which uses edge-triggered epoll. Edge-triggered can be
tricky to use, and it is easy to introduce a bug that will cause an event to
be lost. But I do not quite see anything wrong - the non-blocking API
shouldn't return non-zero unless it is actually waiting for something
(e.g. recv() returns EAGAIN or EINTR). But there might be a bug, maybe the
API has not been well tested in edge-triggered mode? Or an event gets lost
between _start() and _cont() somehow?

Can you try to run your program under strace? Here is what strace gives for
the example program client/async_example.c :

  socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
  fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
  fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
  connect(3, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.1.7")}, 16) = -1 EINPROGRESS (Operation now in progress)
  poll([{fd=3, events=POLLOUT}], 1, 4294967295) = 1 ([{fd=3, revents=POLLOUT}])
  getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
  setsockopt(3, SOL_IP, IP_TOS, [8], 4)   = 0
  setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
  setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
  recvfrom(3, 0x55a0bf5ff1d0, 16384, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
  poll([{fd=3, events=POLLIN}], 1, 4294967295) = 1 ([{fd=3, revents=POLLIN}])
  recvfrom(3, "S\0\0\0\n5.5.49-0+deb7u1\0\220\0\0\0009^OR+~]"..., 16384, MSG_DONTWAIT, NULL, NULL) = 87
  stat("/usr/local/mysql/share/charsets/Index.xml", 0x55a0bf5f8cd0) = -1 ENOENT (No such file or directory)
  futex(0x55a0be039740, FUTEX_WAKE_PRIVATE, 2147483647) = 0
  sendto(3, "V\0\0\1\5\242>\0\0\0\0@\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 90, MSG_DONTWAIT, NULL, 0) = 90
  recvfrom(3, 0x55a0bf5ff1d0, 16384, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
  poll([{fd=3, events=POLLIN}], 1, 4294967295) = 1 ([{fd=3, revents=POLLIN}])
  recvfrom(3, "\7\0\0\2\0\0\0\2\0\0\0", 16384, MSG_DONTWAIT, NULL, NULL) = 11

Here we see how first the connect() call returns EINPROGRESS, and then a
POLLOUT event arrives when the connection completes. After that, recvfrom()
fails with EAGAIN twice, followed by POLLIN events when the expected reply
arrives from the server. If you compare with an strace from when your
program hangs, maybe it will show what the problem is.

(Now I'm wondering whether there might actually be a bug in case of EINTR? Seems
the API will return in this case, while there might still be data pending?
So the API should actually retry the recv on EINTR, rather than return with
MYSQL_WAIT_READ, for edge-triggered to work? But I don't suppose you're
getting EINTR in your application, or if you do, the strace should show.)
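
For illustration only, the generic retry-on-EINTR pattern looks roughly like
this. It is not the library's actual code, and recv_retry_eintr() is a
hypothetical name.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: repeat recv() while it is interrupted by a signal.
   With edge-triggered epoll, returning to the event loop on EINTR could
   miss data that has already arrived, because no new edge is generated
   for it. EAGAIN and all other results are passed through unchanged. */
ssize_t recv_retry_eintr(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;
    do {
        n = recv(fd, buf, len, flags);
    } while (n == -1 && errno == EINTR);
    return n;
}
```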

I would also try to run a tcpdump in parallel with your tests, just to make
sure that packets are being sent and received on the network, and that your
problem is not being caused by something outside your application (firewall
or whatever) - just in case...

Hope this helps, and if not do ask again with more info.

 - Kristian.