
touch-packages team mailing list archive

[Bug 1536021] [NEW] [xenial/armhf] lxc-stop --kill hangs forever, container pid 1 in 'D' state

 

Public bug reported:

Since I upgraded our armhf autopkgtest boxes from wily to xenial,
lxc-stop very often hangs forever:

adt-virt-lxc-egctlo  RUNNING  10.0.3.154  -     -       NO

root     15766  0.0  0.0   5044  1488 ?        S    Jan19   0:00 lxc-stop --kill --name adt-virt-lxc-egctlo
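The lxc-stop process above has been sitting in 'S' state since Jan19. A
test runner can at least notice such a hang instead of blocking forever
by putting a deadline on the call. A minimal Python sketch, assuming a
hypothetical helper `run_with_deadline` (not part of lxc or
autopkgtest); note that killing lxc-stop this way does not unstick the
container itself:

```python
import subprocess
from typing import Sequence


def run_with_deadline(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run cmd; return True if it exits within timeout_s, False if it hangs.

    Hypothetical helper for illustration only. The exit status is ignored
    (check=False); only "finished in time" vs. "still running" is reported.
    """
    try:
        subprocess.run(
            list(cmd),
            timeout=timeout_s,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here
        return False
    return True
```

For example, `run_with_deadline(["lxc-stop", "--kill", "--name",
"adt-virt-lxc-egctlo"], 120)` would return False if lxc-stop is still
running after 120 seconds (an arbitrary value chosen for illustration).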

I can still attach to the container, and pid 1 appears to be in
uninterruptible sleep (the 'D' in its STAT column below):

$ sudo lxc-attach -n adt-virt-lxc-egctlo ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   5060  2344 ?        Ds   Jan19   0:00 /sbin/init
root       230  0.0  0.1  12112  2224 ?        Ss   Jan19   0:00 /lib/systemd/systemd-journald
root       263  0.0  0.0   3372  1060 ?        Ss   Jan19   0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp
root       321  0.0  0.0   4896   912 ?        Ss   Jan19   0:00 /usr/sbin/cron -f
syslog     329  0.0  0.0  31148  1424 ?        Ssl  Jan19   0:00 /usr/sbin/rsyslogd -n
message+   349  0.0  0.0   4860  1540 ?        Ss   Jan19   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopid
root       358  0.0  0.0      0     0 ?        Zs   Jan19   0:00 [systemd-logind] <defunct>
root       384  0.0  0.0   3848   692 pts/3    Ss+  Jan19   0:00 /sbin/agetty --noclear --keep-baud pts/3 115200 38400 9600 vt220
root       386  0.0  0.0   3848   692 pts/0    Ss+  Jan19   0:00 /sbin/agetty --noclear --keep-baud pts/0 115200 38400 9600 vt220
root       389  0.0  0.0   3848   692 pts/1    Ss+  Jan19   0:00 /sbin/agetty --noclear --keep-baud pts/1 115200 38400 9600 vt220
root       391  0.0  0.0      0     0 ?        Zs   Jan19   0:00 [agetty] <defunct>
root       393  0.0  0.0   5064  1028 ?        Ss   Jan19   0:00 (agetty)
root      4907  0.0  0.0   5652  1176 ?        S    Jan19   0:00 reboot
root      4917  0.0  0.0      0     0 ?        Zs   Jan19   0:00 [ondemand] <defunct>
root      5747  0.0  0.0      0     0 ?        Z    Jan19   0:00 [bash] <defunct>
root      5748  0.0  0.0      0     0 ?        Z    Jan19   0:00 [bash] <defunct>
root      7168  0.0  0.0      0     0 ?        Z    Jan19   0:00 [dkms] <defunct>
root      8516  0.0  0.0      0     0 ?        Z    Jan19   0:00 [dkms] <defunct>
root     21174  0.0  0.0   6788  1304 pts/3    R+   07:20   0:00 ps aux
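In a listing like the one above, the first letter of the STAT column is
the process state: 'D' is uninterruptible sleep and 'Z' is a zombie
(<defunct>), i.e. a process that has exited but has not yet been reaped
by its parent. A minimal Python sketch for pulling these out of
captured `ps aux` text, assuming the standard 11-column `ps aux` layout
in which only COMMAND may contain spaces (`classify_ps_aux` is a
hypothetical name):

```python
from collections import defaultdict


def classify_ps_aux(output: str) -> dict:
    """Group processes from `ps aux` text by the first letter of STAT.

    Returns {state_letter: [(pid, command), ...]}, e.g. "D" for
    uninterruptible sleep, "Z" for zombies, "S" for sleeping, "R" for running.
    """
    groups = defaultdict(list)
    lines = output.strip().splitlines()
    for line in lines[1:]:  # skip the "USER PID %CPU ..." header
        # 10 fixed columns precede COMMAND; maxsplit keeps COMMAND intact
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # not a regular process line
        pid, stat, command = int(fields[1]), fields[7], fields[10].strip()
        groups[stat[0]].append((pid, command))
    return dict(groups)
```

Applied to the listing above, the "D" group contains only
`(1, "/sbin/init")`, while the "Z" group collects the seven <defunct>
entries.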

The journal in the container still works, but does not show anything
interesting. systemctl hangs because pid 1 is stuck in this 'D' state,
and for the same reason stracing pid 1 is useless.
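Since strace cannot show where a 'D'-state task is blocked, the
kernel's own view can help instead: the state field of /proc/<pid>/stat,
/proc/<pid>/wchan (the kernel function the task is waiting in; depending
on the kernel configuration it may be uninformative) and
/proc/<pid>/stack (the full kernel stack, readable only by root and only
on kernels built with CONFIG_STACKTRACE). A Python sketch that collects
these, with hypothetical helper names; which PID to pass (pid 1 inside
the container via lxc-attach, or the container init's PID as seen from
the host) is an assumption left to the reader:

```python
from pathlib import Path


def parse_stat_state(stat_line: str) -> tuple:
    """Return (comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state


def describe_blocked_task(pid: int, proc: Path = Path("/proc")) -> dict:
    """Collect scheduler state, wait channel and kernel stack of a task.

    Reading these files does not attach to or stop the task, unlike strace.
    Files that cannot be read are reported instead of raising.
    """
    base = proc / str(pid)
    comm, state = parse_stat_state((base / "stat").read_text())
    info = {"comm": comm, "state": state}
    for name in ("wchan", "stack"):
        try:
            info[name] = (base / name).read_text().strip()
        except OSError as exc:  # e.g. PermissionError on "stack" as non-root
            info[name] = f"<unreadable: {exc.strerror}>"
    return info
```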

These boxes are still running the trusty 3.13 kernel, as newer kernels
don't boot on them (the block devices are missing, probably due to a
missing block driver), so this regression is not caused by a kernel
change.

So the culprit is somewhere among lxc, lxcfs, systemd and cgmanager.
I'll bisect these packages over the next few days, as so far I have no
way to reproduce this reliably.
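With an intermittent hang, a plain bisection can go wrong: one clean
run does not prove a build is good. One generic way to handle this (not
necessarily the procedure used here) is to count a build as good only
after several clean runs. The Python sketch assumes that candidate
builds of one package can be ordered from a known-good wily version to
a known-bad xenial version, and that `reproduces` is a hypothetical
callback that installs a build, runs the tests and reports whether
lxc-stop hung:

```python
from typing import Callable, Sequence


def bisect_flaky(
    versions: Sequence[str],
    reproduces: Callable[[str], bool],
    attempts: int = 5,
) -> str:
    """Return the first version judged bad.

    Assumes versions[0] is good, versions[-1] is bad, and every version after
    the first bad one is also bad. A single hang marks a version bad; it is
    only judged good after `attempts` runs without a hang.
    """
    def is_bad(version: str) -> bool:
        # any() stops at the first run that reproduces the hang
        return any(reproduces(version) for _ in range(attempts))

    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

The `attempts` value trades test time against the risk of misjudging a
bad build: if a bad build hangs with probability p on each independent
run, it is wrongly judged good with probability (1 - p)^attempts.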

** Affects: lxc (Ubuntu)
     Importance: Undecided
         Status: New

** Summary changed:

- [xenial/armhf] lxc-stop --kill hangs forever
+ [xenial/armhf] lxc-stop --kill hangs forever, container pid 1 in 'D' state

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1536021