kernel-packages team mailing list archive
Message #00427
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
This looks to be the third case we have hit with the same symptom (and
likely for yet another reason). The complicating issue is that lxc
containers make use of net namespaces, and when those are released,
references to (I think) the netdevice structures are temporarily moved
over to the loopback device. So anything going wrong with respect to
references will show up like what you see (a note for the record: bug
1021471 and bug 1065434).
Since you say it takes 10-15 hours to hit, it feels like this could again
be a case of something that rarely happens when the container is shut
down, which then causes a reference not to be dropped. Right now the range
between 3.5.0-27 (maybe?) and 3.8.0-25 is quite vast. And at least up to
3.10 we can assume it has not been detected/fixed. So it is unlikely to be
something that will be easy to spot. I know it is a lot of effort, but it
would be really important to narrow down the version delta. If possible, I
would suggest using the mainline kernels to start off a rough manual
bisection.
http://kernel.ubuntu.com/~kernel-ppa/mainline/
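As a rough sketch of how such a manual bisection could be organised (in
Python; the function name, the version list and the is_bad callback are
illustrative assumptions, not an existing tool): keep one known-good and
one known-bad kernel, test the one in the middle, and halve the range each
time. This assumes that the problem, once introduced, is present in every
later version, and that the oldest kernel in the list really is good.

```python
def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad(version) is True.

    Assumptions: versions is ordered oldest to newest, versions[0] is
    known good and versions[-1] is known bad. is_bad is whatever manual
    check you run on a booted kernel (hypothetical callback).
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid   # problem already present, look at older kernels
        else:
            lo = mid   # still fine, look at newer kernels
    return versions[hi]
```

Each call to is_bad here stands for booting that kernel and running the
container workload long enough to be reasonably sure, so the point of
halving the range is to keep the number of such runs small.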
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1196295
Title:
lxc-start enters uninterruptible sleep
Status in The Linux Kernel:
Confirmed
Status in “linux” package in Ubuntu:
Confirmed
Status in “lxc” package in Ubuntu:
Confirmed
Bug description:
After running and terminating around 6000 containers overnight,
something happened on my box that is affecting every new LXC container
I try to start. The DEBUG log file looks like:
lxc-start 1372615570.399 WARN lxc_start - inherited fd 9
lxc-start 1372615570.399 INFO lxc_apparmor - aa_enabled set to 1
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/302' (5/6)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/303' (7/8)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/304' (10/11)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/305' (12/13)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/306' (14/15)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/307' (16/17)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/308' (18/19)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/309' (20/21)
lxc-start 1372615570.399 INFO lxc_conf - tty's configured
lxc-start 1372615570.399 DEBUG lxc_start - sigchild handler set
lxc-start 1372615570.399 INFO lxc_start - 'vm-59' is initialized
lxc-start 1372615570.404 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp
lxc-start 1372615570.404 INFO lxc_start - stored saved_nic #0 idx 12392 name vethP59
lxc-start 1372615570.404 INFO lxc_conf - opened /home/x/vm/vm-59.hold as fd 25
It stops there. In 'ps faux', it looks like:
root 31621 0.0 0.0 25572 1272 ? D 14:06 0:00 \_ lxc-start -n vm-59 -f /tmp/tmp.fG6T6ERZpS -l DEBUG -o /home/x/lxcdebug/vm-59.txt -- /usr/sbin/dropbear -F -E -m
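The "D" in the state column above is the uninterruptible sleep from the
bug title. For anyone wanting to watch for such tasks without reading
'ps' output by eye, here is a minimal Python sketch that reads the state
field from /proc/<pid>/stat. The function names are made up for
illustration; only the /proc layout is taken from the kernel.

```python
import os

def parse_state(stat_line):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is parenthesised and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, right = stat_line.rpartition(")")
    pid, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid), comm, state

def uninterruptible_tasks(proc="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away while we were looking
        if state == "D":
            tasks.append((pid, comm))
    return tasks
```

A task showing up here briefly is normal (disk I/O, for example); it is
only one that stays in the list, like lxc-start above, that is of
interest.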
On a successful LXC run (prior to the server getting into this state),
the log continues with the lines below, so the hang happens just before:
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (rootfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys' (sysfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/proc' (proc)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev' (devtmpfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev/pts' (devpts)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/run' (tmpfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (btrfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup' (tmpfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup/cpuset' (cgroup)
lxc-start 1372394092.208 INFO lxc_cgroup - [1] found cgroup mounted at '/sys/fs/cgroup/cpuset',opts='rw,relatime,cpuset,clone_children'
lxc-start 1372394092.208 DEBUG lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /
It looks like a resource leak, but I'm not yet sure of what that would
be.
If it matters, I SIGKILL my lxc-start processes instead of using
lxc-stop. Could that have any negative implications?
Oh, and cgroups had almost 6000 entries for VMs that are long dead (I'm guessing it's due to my SIGKILL). I've run cgclear and my /sys/fs/cgroup/*/ dirs are now totally empty, but the new containers still hang.
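For reference, a small Python sketch of how leftover cgroup directories
could be counted per controller. It assumes the cgroup hierarchies are
mounted under /sys/fs/cgroup as in the log above; the function name is
made up for illustration.

```python
import os

def count_child_cgroups(root="/sys/fs/cgroup"):
    """Map each controller directory under root to the number of
    cgroup directories found anywhere below it."""
    counts = {}
    for controller in sorted(os.listdir(root)):
        path = os.path.join(root, controller)
        if not os.path.isdir(path):
            continue
        # Every directory below the controller mount is one cgroup.
        counts[controller] = sum(len(dirs) for _, dirs, _ in os.walk(path))
    return counts
```

A count that keeps growing while containers come and go would point at
entries not being cleaned up, though as noted above, clearing them did
not make the hang go away.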
---
Architecture: amd64
DistroRelease: Ubuntu 13.04
MarkForUpload: True
Package: lxc 0.9.0-0ubuntu3.3
PackageArchitecture: amd64
ProcEnviron:
TERM=screen
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
Uname: Linux 3.8.0-25-generic x86_64
UserGroups:
To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1196295/+subscriptions