group.of.nepali.translators team mailing list archive
Message #11560
[Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
What I think is happening in our case:
Since no ExecStop= was specified, systemd will send SIGTERM [...]
Details: https://www.freedesktop.org/software/systemd/man/systemd.kill.html#
KillMode is "process" in the service file.
That means "If set to process, only the main process itself is killed."
So in this case the service relies on the main process forwarding the signal to its child processes.
That takes time.
If the following restart does not wait for this to complete, it sends the next SIGTERM, which eliminates the main process (already in cleanup) before it can distribute the TERM to its children/siblings. This is our error state.
In this broken state systemd reports:
Main PID: 10600 (code=exited, status=0/SUCCESS)
With no main process left to kill, KillMode=process might have special handling and kill all remaining processes. That would be the cleanup which gets it back to work again.
Since the service files are the same in both cases (Xenial/Zesty), I wonder if
there is a systemd change which fixes this by some sort of waiting for
the signal to be handled (e.g. waiting for the MainPid to go away on its
own).
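To make the idea of "waiting for the MainPid to go away" concrete, here is a minimal shell sketch. It is not what systemd does internally; the function name wait_for_exit and the 10 second default timeout are made up for illustration. It polls with kill -0, so it has to run as root or as the user owning the process.

```shell
#!/bin/sh
# wait_for_exit PID [TIMEOUT_SECONDS]   (hypothetical helper, illustration only)
# Polls until PID no longer exists.
# Returns 0 once the process is gone, 1 if it is still there after the timeout.
wait_for_exit() {
    pid=$1
    timeout=${2:-10}
    elapsed=0
    # kill -0 delivers no signal; it only checks that the process can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A restart wrapper could call this with the old Main PID after sending SIGTERM and only start the new instance once it returns 0.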
Systemd versions:
Xenial: 229-4ubuntu16
Zesty: 232-18ubuntu1
** Description changed:
Because "PIDFile=" directive is missing in the systemd unit file,
keepalived sometimes fails to kill all old processes. The old processes
remain with old settings and cause unexpected behaviors. The detail of
this bug is described in this ticket in upstream:
https://github.com/acassen/keepalived/issues/443.
The official systemd unit file is available since version 1.2.24 by this
commit:
https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15
This includes "PIDFile" directive correctly:
PIDFile=/var/run/keepalived.pid
We should go the same way.
I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic.
Package: keepalived
Version: 1.2.19-1
=======================================================================
How to reproduce:
I used two instances of Ubuntu 16.04.2 on DigitalOcean:
Configurations
--------------
MASTER server's /etc/keepalived/keepalived.conf:
vrrp_script chk_nothing {
    script "/bin/true"
    interval 2
}
vrrp_instance G1 {
    interface eth1
    state BACKUP
    priority 100
    virtual_router_id 123
    unicast_src_ip <primal IP>
    unicast_peer {
        <secondal IP>
    }
    track_script {
        chk_nothing
    }
}
BACKUP server's /etc/keepalived/keepalived.conf:
vrrp_script chk_nothing {
    script "/bin/true"
    interval 2
}
vrrp_instance G1 {
    interface eth1
    state MASTER
    priority 200
    virtual_router_id 123
    unicast_src_ip <secondal IP>
    unicast_peer {
        <primal IP>
    }
    track_script {
        chk_nothing
    }
}
- Procedures
- ----------
+ Loop based probing for the Error to exist:
+ ------------------------------------------
+ After the setup above start keepalived on both servers:
+ $ sudo systemctl start keepalived.service
+ Then run the following loop
+ $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done; done
+
+ Expected: no error, only time reports
+ Error case: Showing Main PID exited, details below
+
+ Step by Step Procedures
+ -----------------------
1) Start keepalived on both servers
$ sudo systemctl start keepalived.service
2) Restart keepalived on either one
$ sudo systemctl restart keepalived.service
3) Check status and PID
$ systemctl status -n0 keepalived.service
Result
------
0) Before restart
The ExecStart process 3402 has exited; Main PID is 3403 and the other
processes are 3405 and 3406. So far so good.
root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2017-03-04 01:37:12 UTC; 14min ago
Process: 3402 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
Main PID: 3403 (keepalived)
Tasks: 3
Memory: 1.7M
CPU: 1.900s
CGroup: /system.slice/keepalived.service
├─3403 /usr/sbin/keepalived
├─3405 /usr/sbin/keepalived
└─3406 /usr/sbin/keepalived
1) First restart
Now Main PID is 3403, which was one of the previous processes and has
actually exited. Something is wrong. Yet the previous processes have all
exited, so we are not likely to see any weird behavior here.
root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2017-03-04 01:51:45 UTC; 1s ago
Process: 4782 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
Main PID: 3403 (code=exited, status=0/SUCCESS)
Tasks: 3
Memory: 1.7M
CPU: 11ms
CGroup: /system.slice/keepalived.service
├─4783 /usr/sbin/keepalived
├─4784 /usr/sbin/keepalived
└─4785 /usr/sbin/keepalived
2) Second restart
Now Main PID is 4783 and the processes in the cgroup are 4783-4785. This
is problematic as 4783 is the old process, which should have exited before
the new processes started. Therefore, keepalived keeps running with the old
settings while users believe it uses the new ones.
root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2017-03-04 01:51:49 UTC; 1s ago
Process: 4796 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
Main PID: 4783 (keepalived)
Tasks: 3
Memory: 1.7M
CPU: 6ms
CGroup: /system.slice/keepalived.service
├─4783 /usr/sbin/keepalived
├─4784 /usr/sbin/keepalived
└─4785 /usr/sbin/keepalived
** Also affects: systemd (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1644530
Title:
keepalived fails to restart cleanly due to the wrong systemd settings
Status in keepalived package in Ubuntu:
Fix Released
Status in systemd package in Ubuntu:
New
Status in keepalived source package in Xenial:
Confirmed
Status in systemd source package in Xenial:
New
Bug description:
Because "PIDFile=" directive is missing in the systemd unit file,
keepalived sometimes fails to kill all old processes. The old
processes remain with old settings and cause unexpected behaviors. The
detail of this bug is described in this ticket in upstream:
https://github.com/acassen/keepalived/issues/443.
The official systemd unit file is available since version 1.2.24 by
this commit:
https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15
This includes "PIDFile" directive correctly:
PIDFile=/var/run/keepalived.pid
We should go the same way.
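One way to try the PIDFile= setting without editing the packaged unit is a systemd drop-in. The sketch below is only an illustration: the helper name write_pidfile_dropin and the file name pidfile.conf are hypothetical, and it assumes keepalived writes its PID to /var/run/keepalived.pid, the path quoted above.

```shell
#!/bin/sh
# write_pidfile_dropin DIR   (hypothetical helper, illustration only)
# Creates DIR if needed and writes a drop-in that adds the PIDFile= line.
write_pidfile_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/pidfile.conf" <<'EOF'
[Service]
PIDFile=/var/run/keepalived.pid
EOF
}

# Example (needs root; the directory follows systemd's <unit>.d convention):
#   write_pidfile_dropin /etc/systemd/system/keepalived.service.d
#   systemctl daemon-reload
#   systemctl restart keepalived.service
```

A drop-in leaves the packaged unit file untouched, so it can be removed again once a fixed package is installed.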
I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic.
Package: keepalived
Version: 1.2.19-1
=======================================================================
How to reproduce:
I used two instances of Ubuntu 16.04.2 on DigitalOcean:
Configurations
--------------
MASTER server's /etc/keepalived/keepalived.conf:
vrrp_script chk_nothing {
    script "/bin/true"
    interval 2
}
vrrp_instance G1 {
    interface eth1
    state BACKUP
    priority 100
    virtual_router_id 123
    unicast_src_ip <primal IP>
    unicast_peer {
        <secondal IP>
    }
    track_script {
        chk_nothing
    }
}
BACKUP server's /etc/keepalived/keepalived.conf:
vrrp_script chk_nothing {
    script "/bin/true"
    interval 2
}
vrrp_instance G1 {
    interface eth1
    state MASTER
    priority 200
    virtual_router_id 123
    unicast_src_ip <secondal IP>
    unicast_peer {
        <primal IP>
    }
    track_script {
        chk_nothing
    }
}
Loop based probing for the Error to exist:
------------------------------------------
After the setup above start keepalived on both servers:
$ sudo systemctl start keepalived.service
Then run the following loop
$ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done; done
Expected: no error, only time reports
Error case: Showing Main PID exited, details below
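The one-line loop above can be unrolled into a more readable form. This is a sketch rather than a replacement: the function names main_pid_exited and probe_restarts are invented here, and the caller is expected to pace the batches itself, as the original loop does with "sleep 11s".

```shell
#!/bin/sh
# main_pid_exited: reads `systemctl status` output on stdin and succeeds if the
# Main PID line reports an exited process (the error signature of this bug).
main_pid_exited() {
    grep -Eq 'Main PID: [0-9]+ \(code=exited'
}

# probe_restarts [ROUNDS]: restarts keepalived ROUNDS times (default 5) and
# reports how many restarts ended with an exited Main PID.
probe_restarts() {
    rounds=${1:-5}
    bad=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        sudo systemctl restart keepalived.service
        if sudo systemctl status keepalived.service | main_pid_exited; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad of $rounds restarts left an exited Main PID"
}
```

Running probe_restarts 5 on an affected system should report a non-zero count sooner or later; on a fixed system it should stay at 0.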
Step by Step Procedures
-----------------------
1) Start keepalived on both servers
$ sudo systemctl start keepalived.service
2) Restart keepalived on either one
$ sudo systemctl restart keepalived.service
3) Check status and PID
$ systemctl status -n0 keepalived.service
Result
------
0) Before restart
The ExecStart process 3402 has exited; Main PID is 3403 and the other
processes are 3405 and 3406. So far so good.
root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2017-03-04 01:37:12 UTC; 14min ago
Process: 3402 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
Main PID: 3403 (keepalived)
Tasks: 3
Memory: 1.7M
CPU: 1.900s
CGroup: /system.slice/keepalived.service
├─3403 /usr/sbin/keepalived
├─3405 /usr/sbin/keepalived
└─3406 /usr/sbin/keepalived
1) First restart
Now Main PID is 3403, which was one of the previous processes and
has actually exited. Something is wrong. Yet the previous processes
have all exited, so we are not likely to see any weird behavior here.
root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2017-03-04 01:51:45 UTC; 1s ago
Process: 4782 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
Main PID: 3403 (code=exited, status=0/SUCCESS)
Tasks: 3
Memory: 1.7M
CPU: 11ms
CGroup: /system.slice/keepalived.service
├─4783 /usr/sbin/keepalived
├─4784 /usr/sbin/keepalived
└─4785 /usr/sbin/keepalived
2) Second restart
Now Main PID is 4783 and the processes in the cgroup are 4783-4785.
This is problematic as 4783 is the old process, which should have
exited before the new processes started. Therefore, keepalived keeps
running with the old settings while users believe it uses the new ones.
root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2017-03-04 01:51:49 UTC; 1s ago
Process: 4796 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
Main PID: 4783 (keepalived)
Tasks: 3
Memory: 1.7M
CPU: 6ms
CGroup: /system.slice/keepalived.service
├─4783 /usr/sbin/keepalived
├─4784 /usr/sbin/keepalived
└─4785 /usr/sbin/keepalived
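The tell-tale sign in the two listings above is that the CGroup tree shows the same PIDs after the second restart as after the first. A small sketch for spotting this automatically follows; the function names cgroup_pids and surviving_pids are hypothetical, and the parsing assumes status output shaped like the listings in this report.

```shell
#!/bin/sh
# cgroup_pids: reads `systemctl status` output on stdin and prints the PIDs
# listed for /usr/sbin/keepalived in the CGroup tree, one per line, sorted.
cgroup_pids() {
    sed -nE 's/^[^0-9]*([0-9]+) \/usr\/sbin\/keepalived.*$/\1/p' | sort
}

# surviving_pids BEFORE_FILE AFTER_FILE: prints the PIDs that appear in both
# saved status outputs, i.e. processes that outlived the restart.
surviving_pids() {
    before=$(mktemp)
    after=$(mktemp)
    cgroup_pids < "$1" > "$before"
    cgroup_pids < "$2" > "$after"
    # comm -12 keeps only the lines common to both sorted inputs.
    comm -12 "$before" "$after"
    rm -f "$before" "$after"
}
```

To use it, save "systemctl status -n0 keepalived" to a file before and after a restart and pass both files to surviving_pids. Any PID it prints belongs to a process that should have been replaced.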