group.of.nepali.translators team mailing list archive

[Bug 1571209] Re: Sockfile check retries too short for a busy system boot

 

This bug was fixed in the package libvirt - 1.2.2-0ubuntu13.1.23

---------------
libvirt (1.2.2-0ubuntu13.1.23) trusty; urgency=medium

  * d/libvirt-bin.init, d/libvirt-bin.upstart: fix waiting for the libvirt
    socket (LP: #1571209)
    - avoid timing out on slow systems (only stop when service is stopped)
    - fix whitespace damage formerly added to d/libvirt-bin.init
    - no more long sleep without announcing to log
    - check socket and service status more often for lower latency on changes
    - fix check if unix_sock_dir path is set in /etc/libvirt/libvirtd.conf
    - fix the upstart service name that is checked

 -- Christian Ehrhardt <christian.ehrhardt@xxxxxxxxxxxxx>  Thu, 07 Sep 2017 14:22:45 +0200

** Changed in: libvirt (Ubuntu Trusty)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1571209

Title:
  Sockfile check retries too short for a busy system boot

Status in libvirt package in Ubuntu:
  Fix Released
Status in libvirt source package in Precise:
  Won't Fix
Status in libvirt source package in Trusty:
  Fix Released
Status in libvirt source package in Wily:
  Won't Fix
Status in libvirt source package in Xenial:
  Fix Released
Status in libvirt source package in Zesty:
  Fix Released
Status in libvirt source package in Artful:
  Fix Released

Bug description:
  [Impact]

   * The libvirt service reports that it is ready before it has created the
     libvirt socket, so dependent services fail. An earlier SRU (#1455608)
     was meant to fix that, but it has several deficiencies: it does not
     consider the config, gives up after 10 seconds, uses an unconditional
     "sleep 2", and can delay a service stop by up to 2 seconds while in
     post-start.

   * This is a backport and improvement of a change that already landed in
     Yakkety, where it matters less because of systemd.

  [Test Case]

   * There are two very different ways to "test" this, because the issue
     only really matters in an overload-based scenario.

   * Version #1 - being lame
     One can just modify the upstart script and exchange the check for the 
     socket with /bin/true.
     That way it waits forever which allows you to check the log entries, 
     the abort responsiveness and similar.

   * Version #2 - recreating the case
     - This mostly means the system has to be very slow and overloaded.
       You can either slow down the system directly (e.g. run a qemu with
       nice MAX) or stress your host with other things burning
       CPU/memory/disk.
     - We tried adding autostart guests (see comment #35), but that
       actually takes place after the socket is created. The reported case
       had a RAID rebuilding.
     - TL;DR: get your system slow enough that libvirt takes more than
       10 seconds to start properly (the old limit is 5*2 seconds).

  [Regression Potential]

   * I'd think that there might be (super rare) cases where the post-start
     now spins forever. But by the definition at
     http://upstart.ubuntu.com/cookbook/#post-start this is correct: the
     service is started but not yet ready. Still, this might appear as a
     regression to some.
   * Other than that, this should clearly fix more issues than it
     (hopefully not) causes.
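
   * A minimal sketch of the waiting behaviour discussed here, not the
     packaged post-start script: it has no retry limit and only gives up
     when the service is reported as stopped. The function name
     wait_until_ready and its two check arguments are hypothetical, and
     the default interval of 1 second is an assumption.

```shell
# Hypothetical helper, not taken from d/libvirt-bin.upstart.
# Usage: wait_until_ready READY_CHECK STOPPED_CHECK [INTERVAL]
#   READY_CHECK    command or function returning 0 once the socket exists
#   STOPPED_CHECK  command or function returning 0 once the service is stopped
#   INTERVAL       seconds between checks (assumed default: 1)
wait_until_ready() (
    ready_check=$1
    stopped_check=$2
    interval=${3:-1}
    # No retry limit: only a stopped service ends the wait early.
    while ! "$ready_check"; do
        if "$stopped_check"; then
            echo "service stopped while waiting for socket"
            exit 1
        fi
        # Announce every wait so a long delay is visible in the log.
        echo "socket not ready yet, checking again in ${interval}s"
        sleep "$interval"
    done
    exit 0
)
```

     In a real script READY_CHECK would presumably be a function that
     tests the socket path with `[ -S ... ]`; with a check that never
     succeeds and a service that never stops, the loop spins forever,
     which is the case described above.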

  [Other Info]
   
   * n/a

  
  --- END SRU Template ---


  [ problem description ]

  sockfile_check_retries was first introduced by #1455608 to prevent
  failures when the sockfile is not ready, but it defaults to a hard-coded
  value of "5", which might be too low for a busy system boot.

  #1455608 -
  https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1455608
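
  As a rough illustration of such a bounded retry loop (5 checks, 2
  seconds apart, as described in this report), the sketch below shows why
  the wait can never exceed 10 seconds. It is not the actual script from
  #1455608; the function names wait_for_socket and max_wait are
  hypothetical.

```shell
# Illustrative only; the real post-start script is not quoted in this report.
# Usage: wait_for_socket SOCKFILE [RETRIES] [INTERVAL]
wait_for_socket() (
    sockfile=$1
    retries=${2:-5}     # hard-coded "5" from the report
    interval=${3:-2}    # 2 seconds per cycle
    while [ "$retries" -gt 0 ]; do
        [ -S "$sockfile" ] && exit 0    # socket file exists: done
        sleep "$interval"
        retries=$((retries - 1))
    done
    exit 1              # gave up after retries * interval seconds
)

# Worst-case wait in seconds for a given retry count and interval.
max_wait() {
    echo $(( $1 * $2 ))
}
```

  With the defaults, `max_wait 5 2` prints 10, which matches the 10 second
  timeout seen on the slow boot.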

  [ step to reproduce ]

  Set up a clean install (Ubuntu Server 14.04.4 LTS) with the OS disk
  assembled as RAID-1 and boot some guest instances (count > 10,
  start-at-boot). Force a shutdown of the host by pressing the power
  button for 3s ~ 5s, or via an IPMI command, then power it on again. The
  "post-start" script may then sometimes fail to find the sockfile ready,
  leaving an error line in /var/log/syslog:

  ==> kernel: [ 313.059830] init: libvirt-bin post-start process (2430)
  terminated with status 1 <==

  Since multiple VMs were reading and writing before the non-graceful
  shutdown, the RAID devices need to re-sync after boot, which leads to
  slow responses. The start-up script for libvirt-bin only waits 5 cycles
  of 2 seconds each, so it times out after 10s and exits with "1".

  [ possible solution ]

  Extend the number of retries when waiting for the sockfile, and make it
  configurable by editing the `/etc/default/libvirt-bin` file.
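
  The attached patch is not reproduced in this mail. As a rough sketch of
  the idea only, a script could read sockfile_check_retries from the
  defaults file and fall back to the old value when the file or the
  variable is missing. The helper name load_retries is hypothetical.

```shell
# Hypothetical helper; not the attached patch.
# Usage: load_retries DEFAULTS_FILE
# Prints the retry count, taking sockfile_check_retries from DEFAULTS_FILE
# when that file is readable and sets the variable.
load_retries() (
    defaults_file=$1
    sockfile_check_retries=5    # fallback: the previous hard-coded value
    [ -r "$defaults_file" ] && . "$defaults_file"
    echo "$sockfile_check_retries"
)
```

  A call such as `load_retries /etc/default/libvirt-bin` would then print
  whatever the administrator configured, or 5 if nothing was set.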

  <please see the patch file as attachment>

  [ sysinfo ]

  $ lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description: Ubuntu 14.04.4 LTS
  Release: 14.04
  Codename: trusty

  $ uname -a
  Linux host2 4.2.0-35-generic #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

  [ related issue ]

  #1386465 -
  https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1386465

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1571209/+subscriptions