debcrafters-packages team mailing list archive
Message #00284
[Bug 2099676] Re: Network connectivity loss after systemctl daemon-reexec
[Expired for systemd (Ubuntu) because there has been no activity for 60
days.]
** Changed in: systemd (Ubuntu)
Status: Incomplete => Expired
--
You received this bug notification because you are a member of
Debcrafters packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2099676
Title:
Network connectivity loss after systemctl daemon-reexec
Status in systemd package in Ubuntu:
Expired
Bug description:
# Our problem
We are running multiple K8S clusters on Ubuntu 24.04.1 LTS nodes.
On one of these clusters, we have noticed at least twice that most of the nodes (~5 out of 8) went offline without any action on our side.
To restore connectivity, we tried ifdown/ifup, disconnecting and reconnecting the network from the hypervisor, and restarting the networking service, but nothing helped. We had to reboot the nodes from the console.
After some investigation, we were able to correlate this outage with a run of the `apt-daily-upgrade` service, triggered by the `apt-daily-upgrade` timer.
Somehow, the `apt-daily-upgrade` service updated a package which triggered a `systemctl daemon-reexec`, cutting network connectivity in the process.
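For anyone wanting to check the same correlation on other nodes, here is a minimal sketch that scans journal text for reexec requests and reports which unit issued them. It assumes the journal has already been captured as plain text in the same format as the excerpts in this report; the function name `find_reexec_requests` and the variable `sample` are made up for illustration and are not part of any systemd tooling.

```python
import re

# Matches the line systemd logs when a client asks for a re-execution, e.g.
# "systemd[1]: Reexecuting requested from client PID 2711048 ('systemctl')
#  (unit apt-daily-upgrade.service)..."
REEXEC_RE = re.compile(
    r"Reexecuting requested from client PID (?P<pid>\d+) "
    r"\('(?P<comm>[^']*)'\) \(unit (?P<unit>[^)]+)\)"
)


def find_reexec_requests(journal_text):
    """Return a (pid, command, unit) tuple for each reexec request found."""
    requests = []
    for line in journal_text.splitlines():
        match = REEXEC_RE.search(line)
        if match:
            requests.append((int(match["pid"]), match["comm"], match["unit"]))
    return requests


# Sample lines copied from the journal excerpts in this report.
sample = """\
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Reexecuting requested from client PID 2711048 ('systemctl') (unit apt-daily-upgrade.service)...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Reexecuting.
févr. 21 11:01:06 lylux0634kdp004 systemd[1]: Reexecuting requested from client PID 23296 ('systemctl') (unit session-2.scope)...
"""

for pid, comm, unit in find_reexec_requests(sample):
    print(f"PID {pid} ({comm}) requested a reexec from unit {unit}")
```

A request coming from `apt-daily-upgrade.service` rather than from a login session scope is what points at the unattended upgrade run.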
# Symptoms
Node is flagged as `NotReady` by K8s
SSH connection to node is not working
From the node, we can't ping the gateway
The output of `systemctl daemon-reexec` in `journalctl` is far more verbose than usual:
```
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Reexecuting requested from client PID 2711048 ('systemctl') (unit apt-daily-upgrade.service)...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Reexecuting.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: systemd 255.4-1ubuntu8.5 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Detected virtualization vmware.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Detected architecture x86-64.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Starting man-db.service - Daily man-db regeneration...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping containerd.service - containerd container runtime...
févr. 21 06:06:55 lylux0634kdp004 ntpd[1106]: ERR: ntpd exiting on signal 15 (Terminated)
févr. 21 06:06:55 lylux0634kdp004 ntpd[1106]: PROTO: 172.16.10.254 unlink local addr 172.16.34.4 -> <null>
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping ntpsec.service - Network Time Service...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping open-vm-tools.service - Service for virtual machines hosted on VMware...
févr. 21 06:06:55 lylux0634kdp004 systemd-journald[504]: Journal stopped
févr. 21 06:06:55 lylux0634kdp004 systemd-journald[504]: Received SIGTERM from PID 1 (systemd).
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping systemd-journald.service - Journal Service...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: ntpsec.service: Deactivated successfully.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopped ntpsec.service - Network Time Service.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: ntpsec.service: Consumed 1min 12.819s CPU time, 12.4M memory peak, 0B memory swap peak.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Deactivated successfully.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3374 (containerd-shim) remains running after unit stopped.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3375 (containerd-shim) remains running after unit stopped.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3475 (containerd-shim) remains running after unit stopped.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3512 (containerd-shim) remains running after unit stopped.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3545 (containerd-shim) remains running after unit stopped.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 3618 (containerd-shim) remains running after unit stopped.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Unit process 2574706 (containerd-shim) remains running after unit stopped.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopped containerd.service - containerd container runtime.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Consumed 9min 54.298s CPU time, 3.4G memory peak, 0B memory swap peak.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3374 (containerd-shim) in control group while starting unit. Ignoring.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3375 (containerd-shim) in control group while starting unit. Ignoring.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3475 (containerd-shim) in control group while starting unit. Ignoring.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3512 (containerd-shim) in control group while starting unit. Ignoring.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3545 (containerd-shim) in control group while starting unit. Ignoring.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 3618 (containerd-shim) in control group while starting unit. Ignoring.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: Found left-over process 2574706 (containerd-shim) in control group while starting unit. Ignoring.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: containerd.service: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Starting containerd.service - containerd container runtime...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: netplan-ovs-cleanup.service - OpenVSwitch configuration for cleanup was skipped because of an unmet condition check (ConditionFileIsExecutable=/usr/bin/ovs-vsctl).
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Starting ntpsec.service - Network Time Service...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: systemd-networkd-wait-online.service: Deactivated successfully.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopped systemd-networkd-wait-online.service - Wait for Network to be Configured.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping systemd-networkd-wait-online.service - Wait for Network to be Configured...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping systemd-networkd.service - Network Configuration...
```
The `Found left-over process` lines reminded me of bug
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2013543 but,
as I understand it, Noble hosts should not be affected by that bug.
# Testcase
Here is the catch: we can't reproduce the issue on demand.
When manually running `systemctl daemon-reexec`, we do not
experience the same outage, and journalctl logs only 5 lines:
```
févr. 21 11:01:06 lylux0634kdp004 systemd[1]: Reexecuting requested from client PID 23296 ('systemctl') (unit session-2.scope)...
févr. 21 11:01:06 lylux0634kdp004 systemd[1]: Reexecuting.
févr. 21 11:01:06 lylux0634kdp004 systemd[1]: systemd 255.4-1ubuntu8.5 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT >
févr. 21 11:01:06 lylux0634kdp004 systemd[1]: Detected virtualization vmware.
févr. 21 11:01:06 lylux0634kdp004 systemd[1]: Detected architecture x86-64.
```
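To compare an affected run with a manual one more systematically, the unit state changes in each excerpt can be tallied. The following is only a sketch under the assumption that systemd logs transitions as `<Verb> <unit> - <description>`, as seen in the excerpts above; the helper names `unit_transitions` and `stopped_at_end` are hypothetical.

```python
import re
from collections import defaultdict

# Unit state changes logged by PID 1, e.g.
# "systemd[1]: Stopping systemd-networkd.service - Network Configuration..."
# "Started" does not occur in the excerpts but is accepted for completeness.
TRANSITION_RE = re.compile(
    r"systemd\[1\]: (?P<verb>Stopping|Stopped|Starting|Started) "
    r"(?P<unit>\S+\.\w+)"
)


def unit_transitions(journal_text):
    """Map each unit name to the ordered list of verbs logged for it."""
    transitions = defaultdict(list)
    for line in journal_text.splitlines():
        match = TRANSITION_RE.search(line)
        if match:
            transitions[match["unit"]].append(match["verb"])
    return dict(transitions)


def stopped_at_end(transitions):
    """Units whose last logged transition in the text is a stop."""
    return sorted(
        unit
        for unit, verbs in transitions.items()
        if verbs[-1] in ("Stopping", "Stopped")
    )


# A few lines taken from the verbose excerpt in this report.
sample = """\
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Reexecuting.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping containerd.service - containerd container runtime...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopped containerd.service - containerd container runtime.
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Starting containerd.service - containerd container runtime...
févr. 21 06:06:55 lylux0634kdp004 systemd[1]: Stopping systemd-networkd.service - Network Configuration...
"""

transitions = unit_transitions(sample)
print(transitions)
print(stopped_at_end(transitions))
```

On the manual run, which logs no unit transitions at all, `unit_transitions` returns an empty mapping. Note that a unit listed by `stopped_at_end` was only stopped as of the end of the captured text; that alone does not prove it never came back.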
# Some additional details
root@lylux0634kdp004:~# lsb_release -d
No LSB modules are available.
Description: Ubuntu 24.04.1 LTS
root@lylux0634kdp004:~# apt-cache policy systemd
systemd:
  Installé : 255.4-1ubuntu8.5
  Candidat : 255.4-1ubuntu8.5
 Table de version :
 *** 255.4-1ubuntu8.5 500
        500 https://XXXXXX/ubuntu-fr noble-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     255.4-1ubuntu8 500
        500 https://XXXXX/ubuntu-fr noble/main amd64 Packages
root@lylux0634kdp004:~# uname -a
Linux lylux0634kdp004 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Feel free to request any additional details that would help
in troubleshooting this issue.
Antoine
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2099676/+subscriptions