← Back to team overview

aims team mailing list archive

[Bug 1073478] Re: squid3 logrotate fails when squid is not running

 

The underlying issue here is that the upstart script does not wait long
enough for squid3 to finish.

By default, when squid3 receives a SIGTERM it will close the socket for
incoming connections and will wait for existing connections to complet.
How long it waits before forcefully closing those connections is
configured by the shutdown_lifetime directive - 30 seconds by default.

Current setting, SIGKILL after 5 seconds:
=====
# strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
Process 20865 attached - interrupt to quit
     0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
     0.000440 rt_sigreturn(0xf)         = -1 EINTR (Interrupted system call)
     0.000418 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     5.006483 +++ killed by SIGKILL +++

With "kill timeout 40" in upstart script, notice the pid file being unlinked (the ~30 seconds wait is due to the default value of shutdown_lifetime directive):
=====
# strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
Process 20805 attached - interrupt to quit
     0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
     0.000284 rt_sigreturn(0xf)         = -1 EINTR (Interrupted system call)
     0.000402 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
    31.611726 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     0.001792 --- SIGCHLD (Child exited) @ 0 (0) ---
     0.000247 rt_sigreturn(0xffffffffffffffff) = 63
     0.006027 unlink("/var/run/squid3.pid") = 0
     0.000539 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 0
     0.000387 open("/etc/group", O_RDONLY|O_CLOEXEC) = 0
Process 20805 detached

Note that the current behavior is also quite bad and can corrupt cache
index files, which is a really bad thing as squid will take a long time
to start the next time when it detects a corrupted cache index file [1].

I'm willing to submit a SRU for this, but I wonder if we should replace
SIGTERM with SIGINT. Using SIGINT means that squid will drop outstanding
connections immediately and then proceed to write cache index files.
This will avoid squid slowing down shutdown times for users [2] and we
somewhat keep the current behavior (ie. the current 5-sec SIGKILL
already does that and I don't see anyone complaining).

With "kill signal SIGINT", which does not wait on outstanding connections:
=====
# strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
Process 20891 attached - interrupt to quit
     0.000000 --- SIGINT (Interrupt) @ 0 (0) ---
     0.000312 rt_sigreturn(0x2)         = -1 EINTR (Interrupted system call)
     0.000237 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     1.123564 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
     0.000754 --- SIGCHLD (Child exited) @ 0 (0) ---
     0.000103 rt_sigreturn(0xffffffffffffffff) = 0
     0.002031 unlink("/var/run/squid3.pid") = 0
     0.000154 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 0
     0.000142 open("/etc/group", O_RDONLY|O_CLOEXEC) = 0
Process 20891 detached

Still, let me reiterate that even with SIGINT enabled I believe we
should still increase "kill timeout" somewhat as we surely want to avoid
any cache corruption. According to [1] we don't need to wait that much,
so I guess anything between 20~40 should be fine. Again, since issuing a
SIGKILL during cache write *WILL* corrupt it, it might be sane to wait
even longer for cases when the system is under high load and/or has a
lot of cache dirs - this would/should be quite rare anyway: in most
cases squid shutdown time with SIGINT should be under 5 secs.

Of course, since there is still a chance that SIGKILL does get issued,
we will also need the fix proposed in #1 because there's a chance the
pid file was not removed.

[1] http://lists.squid-cache.org/pipermail/squid-users/2015-July/004752.html
[2] http://askubuntu.com/questions/18127/squid3-starts-and-stops-slowly

-- 
You received this bug notification because you are a member of AIMS,
which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1073478

Title:
  squid3 logrotate fails when squid is not running

Status in squid3 package in Ubuntu:
  Triaged

Bug description:
  Squid3 on Precise does not clean up its PID file, causing logrotate to 
  spit out an error each night when cron runs it. This is on a server where
  we want squid3 installed and ready to go but not running by default.

  0 root@rackspace:/etc/mysql#start squid3 
  squid3 start/running, process 10392
  0 root@rackspace:/etc/mysql#cat /var/run/squid3.pid 
  10392
  0 root@rackspace:/etc/mysql#stop squid3 
  squid3 stop/waiting
  0 root@rackspace:/etc/mysql#cat /var/run/squid3.pid 
  10392
  0 root@rackspace:/etc/mysql#logrotate -f /etc/logrotate.d/squid3 
  squid: ERROR: Could not send signal 10 to process 10392: (3) No such process
  error: error running shared postrotate script for '/var/log/squid3/*.log '
  1 root@rackspace:/etc/mysql#lsb_release -d
  Description:    Ubuntu 12.04.1 LTS
  0 root@rackspace:/etc/mysql#dpkg -l squid3
  Desired=Unknown/Install/Remove/Purge/Hold
  | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
  |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
  ||/ Name                                     Version                                  Description
  +++-========================================-========================================-================================================================================================
  ii  squid3                                   3.1.19-1ubuntu3.12.04.1                  Full featured Web Proxy cache (HTTP proxy)
  0 root@rackspace:/etc/mysql#

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/squid3/+bug/1073478/+subscriptions