
group.of.nepali.translators team mailing list archive

[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"

 

** Also affects: lxd (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Description changed:

- Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-
- core/+bug/1593828/comments/18
+ == SRU
+ === Rationale
+ LXD containers using systemd use a very large number of inotify watches. This means that a system will typically run out of watches globally with as few as 15 Ubuntu 16.04 containers.
+ 
+ An easy fix for the issue is to bump the number of user watches up to
+ 1024, making it possible to run around 100 containers before hitting the
+ limit again.
+ 
+ To do so, LXD is now shipping a sysctl.d file which bumps that
+ particular limit on systems that have LXD installed.
+ 
+ === Testcase
+ 1) Upgrade LXD
+ 2) Spawn about 50 Ubuntu 16.04 containers ("lxc launch ubuntu:16.04")
+ 3) Check that they all get an IP address ("lxc list"); that's a pretty good sign that they booted properly
+ 
+ === Regression potential
+ No regression is expected. Juju has shipped a similar configuration for a while now, and so have the LXD feature releases.
+ 
+ We pretty much just forgot to include this particular change in our LTS
+ packaging branch.
+ 
+ 
+ == Original bug report
+ Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-core/+bug/1593828/comments/18
  
  "...
  However, if you bootstrap LXD and do:
  
  juju bootstrap localxd lxd --upload-tools
  for i in {1..30}; do juju deploy ubuntu ubuntu$i; sleep 90; done
  
  Somewhere between 10-20-th deploy fails with machine in pending state
  (nothin useful in logs) and none of the new deploys after that first
  pending succeeds. Might be a different bug, but it's easy to verify with
  running that for loop.
  
  So, this particular error was not in my logs, but the controller still
  ends up unable to provision at least 30 machines ..."
  
  I can reproduce this. Looking on the failed machine I can see that jujud
  isn't running, which is why juju considers the machine not up, and in
  fact nothing of juju seems to be installed. There's nothing about juju
  in /var/log.
  
  Comparing cloud-init-output.log between a stuck-pending machine and one
  which has started up fine, they both start with some key-generation
  messages, but the successful machine then has the line:
  
  Cloud-init v. 0.7.7 running 'init' at Tue, 12 Jul 2016 08:32:00 +0000.
  Up 4.0 seconds.
  
  ...and then a whole lot of juju-installation gubbins, while the failed
  machine log just stops.

** Changed in: lxd (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: lxd (Ubuntu Xenial)
       Status: Triaged => In Progress

** Changed in: lxd (Ubuntu Xenial)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1602192

Title:
  when starting many LXD containers, they start failing to boot with
  "Too many open files"

Status in lxd package in Ubuntu:
  Fix Released
Status in lxd source package in Xenial:
  In Progress

Bug description:
  == SRU
  === Rationale
  LXD containers using systemd use a very large number of inotify watches. This means that a system will typically run out of watches globally with as few as 15 Ubuntu 16.04 containers.
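  The per-user inotify limits are exposed under /proc on any Linux
  host; a quick way to see where they currently sit (standard kernel
  paths, values vary per system):

  ```shell
  # Read the current per-user inotify limits (standard Linux /proc paths).
  cat /proc/sys/fs/inotify/max_user_instances
  cat /proc/sys/fs/inotify/max_user_watches
  ```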

  An easy fix for the issue is to bump the number of user watches up to
  1024, making it possible to run around 100 containers before hitting
  the limit again.

  To do so, LXD is now shipping a sysctl.d file which bumps that
  particular limit on systems that have LXD installed.
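  The drop-in is a one-line file along these lines (the file name and
  exact sysctl key shown here are assumptions; the 1024 value is the
  figure from the text above):

  ```
  # /etc/sysctl.d/10-lxd-inotify.conf (assumed file name)
  fs.inotify.max_user_instances = 1024
  ```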

  === Testcase
  1) Upgrade LXD
  2) Spawn about 50 Ubuntu 16.04 containers ("lxc launch ubuntu:16.04")
  3) Check that they all get an IP address ("lxc list"); that's a pretty good sign that they booted properly
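  Steps 2 and 3 can be sketched as a loop. This version only prints
  the commands rather than running them, since the real thing needs a
  live LXD daemon (the "watch-test-N" container names are
  illustrative):

  ```shell
  # Dry-run sketch of the test case: 50 launches, then a listing.
  # Drop the "echo" to actually run it against a working LXD install.
  for i in $(seq 1 50); do
      echo lxc launch ubuntu:16.04 "watch-test-$i"
  done
  echo lxc list
  ```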

  === Regression potential
  No regression is expected. Juju has shipped a similar configuration for a while now, and so have the LXD feature releases.

  We pretty much just forgot to include this particular change in our
  LTS packaging branch.

  
  == Original bug report
  Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-core/+bug/1593828/comments/18

  "...
  However, if you bootstrap LXD and do:

  juju bootstrap localxd lxd --upload-tools
  for i in {1..30}; do juju deploy ubuntu ubuntu$i; sleep 90; done

  Somewhere between 10-20-th deploy fails with machine in pending state
  (nothin useful in logs) and none of the new deploys after that first
  pending succeeds. Might be a different bug, but it's easy to verify
  with running that for loop.

  So, this particular error was not in my logs, but the controller still
  ends up unable to provision at least 30 machines ..."

  I can reproduce this. Looking on the failed machine I can see that
  jujud isn't running, which is why juju considers the machine not up,
  and in fact nothing of juju seems to be installed. There's nothing
  about juju in /var/log.

  Comparing cloud-init-output.log between a stuck-pending machine and
  one which has started up fine, they both start with some key-
  generation messages, but the successful machine then has the line:

  Cloud-init v. 0.7.7 running 'init' at Tue, 12 Jul 2016 08:32:00 +0000.
  Up 4.0 seconds.

  ...and then a whole lot of juju-installation gubbins, while the failed
  machine log just stops.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1602192/+subscriptions