
group.of.nepali.translators team mailing list archive

[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"

 

** Also affects: lxd (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Description changed:

- Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-
- core/+bug/1593828/comments/18
+ == SRU
+ === Rationale
+ LXD containers using systemd use a very large number of inotify watches. This means that a system will typically run out of watches globally with as few as 15 Ubuntu 16.04 containers.
+ 
+ An easy fix for the issue is to bump the number of user watches up to
+ 1024, making it possible to run around 100 containers before hitting the
+ limit again.
+ 
+ To do so, LXD is now shipping a sysctl.d file which bumps that
+ particular limit on systems that have LXD installed.
+ 
+ === Testcase
+ 1) Upgrade LXD
+ 2) Spawn about 50 Ubuntu 16.04 containers ("lxc launch ubuntu:16.04")
+ 3) Check that they all get an IP address ("lxc list"); that's a pretty good sign that they booted properly
+ 
+ === Regression potential
+ No regression is expected. Juju has shipped a similar configuration for a while now, and so have the LXD feature releases.
+ 
+ We pretty much just forgot to include this particular change in our LTS
+ packaging branch.
+ 
+ 
+ == Original bug report
+ Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-core/+bug/1593828/comments/18
  
  "...
  However, if you bootstrap LXD and do:
  
  juju bootstrap localxd lxd --upload-tools
  for i in {1..30}; do juju deploy ubuntu ubuntu$i; sleep 90; done
  
  Somewhere between 10-20-th deploy fails with machine in pending state
  (nothin useful in logs) and none of the new deploys after that first
  pending succeeds. Might be a different bug, but it's easy to verify with
  running that for loop.
  
  So, this particular error was not in my logs, but the controller still
  ends up unable to provision at least 30 machines ..."
  
  I can reproduce this. Looking on the failed machine I can see that jujud
  isn't running, which is why juju considers the machine not up, and in
  fact nothing of juju seems to be installed. There's nothing about juju
  in /var/log.
  
  Comparing cloud-init-output.log between a stuck-pending machine and one
  which has started up fine, they both start with some key-generation
  messages, but the successful machine then has the line:
  
  Cloud-init v. 0.7.7 running 'init' at Tue, 12 Jul 2016 08:32:00 +0000.
  Up 4.0 seconds.
  
  ...and then a whole lot of juju-installation gubbins, while the failed
  machine log just stops.

** Changed in: lxd (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: lxd (Ubuntu Xenial)
       Status: Triaged => In Progress

** Changed in: lxd (Ubuntu Xenial)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1602192

Title:
  when starting many LXD containers, they start failing to boot with
  "Too many open files"

Status in lxd package in Ubuntu:
  Fix Released
Status in lxd source package in Xenial:
  In Progress

Bug description:
  == SRU
  === Rationale
  LXD containers using systemd use a very large number of inotify watches. This means that a system will typically run out of watches globally with as few as 15 Ubuntu 16.04 containers.
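  The per-user inotify limits are exposed under /proc on any Linux
  host; a quick way to see where they currently sit (standard kernel
  paths, values vary per system):

  ```shell
  # Read the current per-user inotify limits (standard Linux /proc paths).
  cat /proc/sys/fs/inotify/max_user_instances
  cat /proc/sys/fs/inotify/max_user_watches
  ```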

  An easy fix for the issue is to bump the number of user watches up to
  1024, making it possible to run around 100 containers before hitting
  the limit again.

  To do so, LXD is now shipping a sysctl.d file which bumps that
  particular limit on systems that have LXD installed.
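  The drop-in is a one-line file along these lines (the file name and
  exact sysctl key shown here are assumptions; the 1024 value is the
  figure from the text above):

  ```
  # /etc/sysctl.d/10-lxd-inotify.conf (assumed file name)
  fs.inotify.max_user_instances = 1024
  ```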

  === Testcase
  1) Upgrade LXD
  2) Spawn about 50 Ubuntu 16.04 containers ("lxc launch ubuntu:16.04")
  3) Check that they all get an IP address ("lxc list"); that's a pretty good sign that they booted properly
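  Steps 2 and 3 can be sketched as a loop. This version only prints
  the commands rather than running them, since the real thing needs a
  live LXD daemon (the "watch-test-N" container names are
  illustrative):

  ```shell
  # Dry-run sketch of the test case: 50 launches, then a listing.
  # Drop the "echo" to actually run it against a working LXD install.
  for i in $(seq 1 50); do
      echo lxc launch ubuntu:16.04 "watch-test-$i"
  done
  echo lxc list
  ```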

  === Regression potential
  No regression is expected. Juju has shipped a similar configuration for a while now, and so have the LXD feature releases.

  We pretty much just forgot to include this particular change in our
  LTS packaging branch.

  
  == Original bug report
  Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-core/+bug/1593828/comments/18

  "...
  However, if you bootstrap LXD and do:

  juju bootstrap localxd lxd --upload-tools
  for i in {1..30}; do juju deploy ubuntu ubuntu$i; sleep 90; done

  Somewhere between 10-20-th deploy fails with machine in pending state
  (nothin useful in logs) and none of the new deploys after that first
  pending succeeds. Might be a different bug, but it's easy to verify
  with running that for loop.

  So, this particular error was not in my logs, but the controller still
  ends up unable to provision at least 30 machines ..."

  I can reproduce this. Looking on the failed machine I can see that
  jujud isn't running, which is why juju considers the machine not up,
  and in fact nothing of juju seems to be installed. There's nothing
  about juju in /var/log.

  Comparing cloud-init-output.log between a stuck-pending machine and
  one which has started up fine, they both start with some key-
  generation messages, but the successful machine then has the line:

  Cloud-init v. 0.7.7 running 'init' at Tue, 12 Jul 2016 08:32:00 +0000.
  Up 4.0 seconds.

  ...and then a whole lot of juju-installation gubbins, while the failed
  machine log just stops.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1602192/+subscriptions