group.of.nepali.translators team mailing list archive

[Bug 1847512] [NEW] xenial: leftover scope units for Kubernetes transient mounts

 

Public bug reported:

[Impact]

When running Kubernetes on Xenial, a scope unit for the transient
mounts used by a pod (e.g., a secret volume mount) is left over,
together with its associated cgroup directories, after the pod
completes. This happens almost every time such a pod is created:

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/(...)/volumes/kubernetes.io~secret/(...)

    /sys/fs/cgroup/devices/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/pids/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/blkio/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/memory/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/cpu,cpuacct/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
    /sys/fs/cgroup/systemd/system.slice/run-r560fcf858630417780c258c55fa21c8b.scope
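To see which cgroup directories a given leftover unit still holds, a small helper along these lines may help. It is only a sketch, not part of the original report: the function name `scope_cgroup_dirs` and its optional root argument are hypothetical, and it assumes the legacy layout shown above (`<root>/<controller>/system.slice/<unit>`).

```shell
# List the cgroup directories that belong to one scope unit.
# $1: scope unit name, e.g. run-r560fcf858630417780c258c55fa21c8b.scope
# $2: cgroup root to search (hypothetical parameter, defaults to /sys/fs/cgroup)
scope_cgroup_dirs() {
    unit=$1
    root=${2:-/sys/fs/cgroup}
    # Controller directories sit one level below the root, and the unit
    # directory sits under system.slice, so matches are exactly 3 levels deep.
    find "$root" -mindepth 3 -maxdepth 3 -type d \
        -path "*/system.slice/$unit" | sort
}
```

On an affected host, `scope_cgroup_dirs run-r560fcf858630417780c258c55fa21c8b.scope` would be expected to print one line per controller for as long as the unit is left over.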

This problem becomes noticeable over time with Kubernetes CronJobs,
which repeatedly create pods to run the cronjob task.

Over time, the leftover units (and associated cgroup directories)
pile up in significant numbers and start to cause problems for
other components, with effects on /sys/fs/cgroup/ scanning:

- Kubelet CPU/Memory Usage linearly increases using CronJob [1]

and systemd commands time out, breaking things like Ansible:

- failed: [...] (item=apt-daily-upgrade.service) => {[...]
  "msg": "Unable to disable service apt-daily-upgrade.service:
  Failed to execute operation: Connection timed out\n"}

The problem seems to be related to empty cgroup notification
on the legacy/classic hierarchy; it doesn't happen on hybrid
or unified hierarchies.
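
One way to tell which hierarchy a host uses is sketched below. It assumes the usual systemd layout, where /sys/fs/cgroup is a tmpfs in legacy and hybrid modes and a cgroup2 filesystem in unified mode, and where hybrid mode additionally provides /sys/fs/cgroup/unified. The function name `cgroup_mode` is hypothetical, and older `stat` versions may not report `cgroup2fs` by name.

```shell
# Classify the cgroup setup from two observations.
# $1: filesystem type of /sys/fs/cgroup, as printed by `stat -fc %T`
# $2: "yes" if /sys/fs/cgroup/unified is a directory, otherwise "no"
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo unified ;;
        tmpfs)
            if [ "$2" = yes ]; then echo hybrid; else echo legacy; fi ;;
        *) echo unknown ;;
    esac
}

# Example call on a live host (not executed here):
#   cgroup_mode "$(stat -fc %T /sys/fs/cgroup)" \
#               "$([ -d /sys/fs/cgroup/unified ] && echo yes || echo no)"
```

According to the report, only hosts on which this prints `legacy` should show the leftover units.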

The fix is upstream systemd commit d8fdc62037b5 ("core: use
an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification").

That patch is already in progress/review in bug 1846787 [2].
It is present in Bionic and later, so only Xenial needs the fix.

[Test Case]

    Create K8s pods with secret volume mounts (example below)
    on Xenial/4.15 kernel, and check this after it completes:

    $ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'

    (With the fix applied, there are zero units reported)
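
The check above can be turned into a pass/fail step for scripted runs. This is a minimal sketch: the function name `no_leftover_scopes` is hypothetical, and it reads the `systemctl` output from standard input so that it can be exercised without a live systemd.

```shell
# Succeed (exit status 0) only when stdin lists no leftover
# "Kubernetes transient mount" scope units.
no_leftover_scopes() {
    ! grep -q 'Kubernetes transient mount for'
}

# Example call on a live host (not executed here):
#   sudo systemctl list-units --type=scope | no_leftover_scopes \
#       && echo "PASS: no leftover units" \
#       || echo "FAIL: leftover units found"
```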

    Steps:
    -----

    Create a Xenial VM:

    $ uvt-simplestreams-libvirt sync release=xenial arch=amd64
    $ uvt-kvm create --memory 8192 --cpu 8 --disk 8 vm-xenial release=xenial arch=amd64

    Install the HWE/4.15 kernel and MicroK8s on it:

    $ uvt-kvm wait vm-xenial
    $ uvt-kvm ssh vm-xenial

    $ sudo apt update
    $ sudo apt install linux-image-4.15.0-65-generic
    $ sudo reboot

    $ uvt-kvm wait vm-xenial
    $ uvt-kvm ssh vm-xenial

    $ sudo snap install microk8s --channel=1.16/stable --classic
    $ sudo snap alias microk8s.kubectl kubectl
    $ sudo usermod -a -G microk8s $USER
    $ exit

    Check package versions:

    $ uvt-kvm ssh vm-xenial

    $ lsb_release -cs
    xenial

    $ uname -rv
    4.15.0-65-generic #74~16.04.1-Ubuntu SMP Wed Sep 18 09:51:44 UTC 2019

    $ snap list microk8s
    Name      Version  Rev  Tracking  Publisher   Notes
    microk8s  v1.16.0  920  1.16      canonical✓  classic

    $ dpkg -s systemd | grep ^Version:
    Version: 229-4ubuntu21.22

    Create a pod with a secret/volume:

    $ cat <<EOF > pod-with-secret.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-with-secret
    spec:
      containers:
      - name: container
        image: debian:stretch
        args: ["/bin/true"]
        volumeMounts:
        - name: secret
          mountPath: /secret
      volumes:
      - name: secret
        secret:
          secretName: secret-for-pod
      restartPolicy: Never
    EOF

    $ kubectl create secret generic secret-for-pod --from-literal=key=value

    Notice that a transient scope unit is left running even after the pod completes:

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    $

    $ kubectl create -f pod-with-secret.yaml

    $ kubectl get pods
    NAME              READY   STATUS      RESTARTS   AGE
    pod-with-secret   0/1     Completed   0          30s

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/7baa3896-a4ef-4c11-a2c2-09f94ca565f7/volumes/kubernetes.io~secret/default-token-24k4f

    More transient scope units are left running each time
    the pod is created again (as a CronJob would do).

    $ kubectl delete pods pod-with-secret
    $ kubectl create -f pod-with-secret.yaml

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/7baa3896-a4ef-4c11-a2c2-09f94ca565f7/volumes/kubernetes.io~secret/default-token-24k4f
    run-rb947fb640fbc41cf9a50b1ceb4ccbf78.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/b61d553c-e50c-4dca-905a-d82e3bc3c3a4/volumes/kubernetes.io~secret/secret

    $ kubectl delete pods pod-with-secret
    $ kubectl create -f pod-with-secret.yaml

    $ systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r560fcf858630417780c258c55fa21c8b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/7baa3896-a4ef-4c11-a2c2-09f94ca565f7/volumes/kubernetes.io~secret/default-token-24k4f
    run-ra5caa6aa3bb0426795ce991f178649f3.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/6a74c2cd-3029-4f30-8dc2-131de44d6625/volumes/kubernetes.io~secret/secret
    run-rb947fb640fbc41cf9a50b1ceb4ccbf78.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/b61d553c-e50c-4dca-905a-d82e3bc3c3a4/volumes/kubernetes.io~secret/secret

    $ kubectl delete pods pod-with-secret

    Repeating the test with a CronJob:

    $ cat <<EOF > cronjob-with-secret.yaml
    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: cronjob-with-secret
    spec:
      schedule: "*/1 * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              nodeSelector:
                kubernetes.io/hostname: sf219578xt
              containers:
              - name: container
                image: debian:stretch
                args: ["/bin/true"]
                volumeMounts:
                - name: secret
                  mountPath: /secret
              volumes:
              - name: secret
                secret:
                  secretName: secret-for-pod
              restartPolicy: OnFailure
    EOF

    $ kubectl create secret generic secret-for-pod --from-literal=key=value

    $ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    $

    $ kubectl create -f cronjob-with-secret.yaml
    cronjob.batch/cronjob-with-secret created

    (wait ~5 minutes)

    $ kubectl get cronjobs
    NAME                  SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    cronjob-with-secret   */1 * * * *   False     0        42s             5m54s

    $ sudo systemctl list-units --type=scope | grep 'Kubernetes transient mount for'
    run-r022aebe0c9944f6fbd6cd989a2c2b819.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/9df44dc4-de5b-4586-9948-2930f6bc47fa/volumes/kubernetes.io~secret/default-token-24k4f
    run-r2123bea060344165b7b13320d68f1fd5.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/5e95c6ef-f04f-47c5-b479-d3a5b1830106/volumes/kubernetes.io~secret/secret
    run-rb8605acad9e54c3d965b2cba965b593b.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/933f41b5-a566-432e-86bb-897549675403/volumes/kubernetes.io~secret/secret
    run-rbbaa670a270a41238d019e08a1aba400.scope loaded active running Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/2e942cf5-3b1d-48f2-8758-5f1875dc05f7/volumes/kubernetes.io~secret/default-token-24k4f
    $
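
To follow how the leftovers accumulate across CronJob runs, the listing above can be summarized per volume name. The following is a sketch under the assumption that the mount path is the last whitespace-separated field of each line, as in the output shown; the function name `leftover_by_volume` is hypothetical.

```shell
# Summarize leftover scope units by volume name (last path component).
# Reads `systemctl list-units --type=scope` output on stdin and prints
# "<count> <volume-name>" lines, sorted by volume name.
leftover_by_volume() {
    awk '/Kubernetes transient mount for/ {
             n = split($NF, parts, "/")   # $NF is the mount path
             count[parts[n]]++
         }
         END { for (name in count) print count[name], name }' | sort -k2
}

# Example call on a live host (not executed here):
#   sudo systemctl list-units --type=scope | leftover_by_volume
```

Running it periodically while the CronJob is active gives a rough view of the growth rate.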

[1] https://github.com/kubernetes/kubernetes/issues/64137
[2] https://bugs.launchpad.net/bugs/1846787

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: Invalid

** Affects: systemd (Ubuntu Xenial)
     Importance: Medium
     Assignee: Mauricio Faria de Oliveira (mfo)
         Status: In Progress

** Also affects: systemd (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: systemd (Ubuntu)
       Status: New => Incomplete

** Changed in: systemd (Ubuntu Xenial)
       Status: New => In Progress

** Changed in: systemd (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: systemd (Ubuntu Xenial)
     Assignee: (unassigned) => Mauricio Faria de Oliveira (mfo)

** Changed in: systemd (Ubuntu)
       Status: Incomplete => Invalid

https://bugs.launchpad.net/bugs/1847512

Title:
  xenial: leftover scope units for Kubernetes transient mounts

Status in systemd package in Ubuntu:
  Invalid
Status in systemd source package in Xenial:
  In Progress
