← Back to team overview

maria-discuss team mailing list archive

Re: connection handling is buggy (HY000/2002): Resource temporarily unavailable)

 

i can reprduce that also on the 12-core production sevrer, in that case only 250 connections are failing and also here only with Keep-Alive requests to the webserver, otherwise not enough load

i guess "futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)" is the culprit but why needs that to be exposed to the client as connection error instead just let him wait and try internally again?

fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 95
fcntl(88, F_SETFL, O_RDWR)              = 0
fcntl(95, F_SETFD, FD_CLOEXEC)          = 0
futex(0x7f65e917eae0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f65e917eae0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f656bf20ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f656bf20ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1
futex(0x55ce6e382888, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}])
fcntl(88, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 104
fcntl(88, F_SETFL, O_RDWR)              = 0
fcntl(104, F_SETFD, FD_CLOEXEC)         = 0
futex(0x7f65be075ca4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f65be075ca0, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1
futex(0x55ce6e382948, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=87, events=POLLIN}, {fd=88, events=POLLIN}], 2, -1) = 1 ([{fd=88, revents=POLLIN}])
fcntl(88, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(88, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept4(88, {sa_family=AF_UNIX}, [128->2], SOCK_CLOEXEC) = 101
fcntl(88, F_SETFL, O_RDWR)              = 0
fcntl(101, F_SETFD, FD_CLOEXEC)         = 0

Am 16.07.2017 um 07:33 schrieb Reindl Harald:
simple reproducer script below, obviously it needs the '-k' (keep-alive) flag, otherwise not enough contention on the database server

ab -c 200 -n 500000 -k http://corecms/connect-bench.php

[root@srv-rhsoft:~]$ cat php_error.log | wc -l
312326

<?php declare(strict_types=1);
require __DIR__ . '/php/serverconf.inc.php';
$conn = mysqli_init();
mysqli_options($conn, MYSQLI_OPT_INT_AND_FLOAT_NATIVE, true);
if(mysqli_real_connect($conn, 'localhost', $sql_user, $sql_pwd, $sql_db, 3600, '', 0) === true)
{
  echo 'OK';
}
else
{
  echo 'FAILED';
}
?>

[harry@srv-rhsoft:~]$ php -v
PHP 7.1.8-dev (cli) (built: Jul 13 2017 17:26:17) ( NTS )

[harry@srv-rhsoft:~]$ rpm -q mariadb
mariadb-10.2.7-5.fc25.20170714.rh.x86_64

Am 16.07.2017 um 06:55 schrieb Reindl Harald:
i started to play around with "thread_handling = pool-of-threads" with 10.2.7 and removed at that time the @mysqli_real_connect() error supression of my database-layer which also has a usleep() and retry-loop in case connection failed on so completly burried the issue

PHP Warning: mysqli_real_connect() [<a href='http://at.php.net/manual/de/function.mysqli-real-connect.php'>function.mysqli-real-connect.php</a>]: (HY000/2002): Resource temporarily unavailable

you should not see such messages when run a "ab -c 200 -n 500000 -k http://corecms/show_content.php?sid=2"; with "max_connections = 300"
_____________________________________

thread_handling  = one-thread-per-connection
[root@srv-rhsoft:~]$ cat php_error.log | wc -l
52596

thread_handling = pool-of-threads
thread_pool_idle_timeout = 900
[root@srv-rhsoft:~]$ cat php_error.log | wc -l
39282

thread_handling = pool-of-threads
thread_pool_oversubscribe = 10
thread_pool_idle_timeout = 900
thread_pool_priority = high
[root@srv-rhsoft:~]$ cat php_error.log | wc -l
24849

since my database-layer makes a usleep(100000) before each retry and the retry-lop still has error-supression that means the cms waits 10% of all requests at least 0.1 seconds for the mariadb server which means the 4300 Requests/Second could be much higher if every connection suceeds at the first try (at least the thread pool seems to work slightly better then without)
_____________________________________

what makes me really worry here is while the load is running (mysqld each time restartet before start apache-benchmark)

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 181   |
| Threadpool_threads      | 189   |
+-------------------------+-------+

after the benchmark has finished and the machine is idle:
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Threadpool_idle_threads | 207   |
| Threadpool_threads      | 208   |
+-------------------------+-------+

frankly i would expect at least that this numbers are going up while the load is running to at least 200 or not going up that high at all but don't refuse conncetions which is IMHO the point of a pool
_____________________________________

the core-cms itself makes exactly two queries per request over 3 MyISAM tables, one cache-table with a pirmary key and the second one is a simple join on two tables with only 2 records, so not really something one should call real load

select SQL_NO_CACHE * from cms1_cache where `hash`='fullmenu_1_2_0de0_0'
select SQL_NO_CACHE * from `cms1_sub` join `cms1_haupt` on sid=2 and hid=shid

mysqld as well as httpd had "LimitNOFILE=infinity" in the systemd-unit, the connection type is unix socket, so that all should not be a problem on a i7-3770 CPU @ 3.40GHz with 32 GB RAM

it takes some time until the errors start to appear in the log, likely after httpd (mod_prefork) had forked enough worker to introduce real concurrency to the database



Follow ups

References