group.of.nepali.translators team mailing list archive
Message #16773
[Bug 1571209] Re: Sockfile check retries too short for a busy system boot
This bug was fixed in the package libvirt - 1.2.2-0ubuntu13.1.23
---------------
libvirt (1.2.2-0ubuntu13.1.23) trusty; urgency=medium
* d/libvirt-bin.init, d/libvirt-bin.upstart: fix waiting for the libvirt
socket (LP: #1571209)
- avoid timing out on slow systems (only stop when service is stopped)
- fix whitespace damage formerly added to d/libvirt-bin.init
- no more long sleep without announcing to log
- check socket and service status more often for lower latency on changes
- fix check if unix_sock_dir path is set in /etc/libvirt/libvirtd.conf
- fix the upstart service name that is checked
-- Christian Ehrhardt <christian.ehrhardt@xxxxxxxxxxxxx> Thu, 07 Sep 2017 14:22:45 +0200
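The unix_sock_dir item in the changelog can be illustrated with a minimal
sketch. It is not the code from d/libvirt-bin.init: the function name
get_sock_dir is hypothetical, and it assumes the usual libvirtd.conf
syntax (unix_sock_dir = "/path") and the usual default directory
/var/run/libvirt.

```shell
#!/bin/sh
# Sketch only: get_sock_dir is a hypothetical helper, not the packaged code.
# Assumes lines of the form: unix_sock_dir = "/some/dir"
# Commented-out lines (leading '#') are ignored; the last active line wins.
get_sock_dir() {
    gs_conf=$1
    gs_dir=$(sed -n 's/^[[:space:]]*unix_sock_dir[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$gs_conf" 2>/dev/null | tail -n 1)
    # Fall back to the assumed default when nothing is configured.
    echo "${gs_dir:-/var/run/libvirt}"
}
```

A caller would then wait for a socket inside the returned directory
rather than a hard-coded path.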
** Changed in: libvirt (Ubuntu Trusty)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1571209
Title:
Sockfile check retries too short for a busy system boot
Status in libvirt package in Ubuntu:
Fix Released
Status in libvirt source package in Precise:
Won't Fix
Status in libvirt source package in Trusty:
Fix Released
Status in libvirt source package in Wily:
Won't Fix
Status in libvirt source package in Xenial:
Fix Released
Status in libvirt source package in Zesty:
Fix Released
Status in libvirt source package in Artful:
Fix Released
Bug description:
[Impact]
* The libvirt service reports that it is ready before it has created the
libvirt socket, so dependent services fail. An earlier SRU (#1455608)
was meant to fix that but has many deficiencies: it does not consider
the config, gives up after 10 seconds, uses an unconditional sleep 2,
and takes up to 2 seconds to react to a service stop while in
post-start.
* This is the backport and improvement of a change that was already
brought to Yakkety, where it matters less because of systemd.
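The intended behaviour (keep waiting until the socket exists, stop only
when the service itself is stopped, log every wait) might look roughly
like the sketch below. It is not the shipped upstart script: wait_until,
its arguments and the log messages are hypothetical.

```shell
#!/bin/sh
# Sketch only: wait_until and its messages are hypothetical, not taken
# from d/libvirt-bin.upstart.
# Usage: wait_until READY_CMD RUNNING_CMD [INTERVAL_SECONDS]
wait_until() {
    wu_ready=$1
    wu_running=$2
    wu_interval=${3:-1}
    # Commands are left unquoted on purpose so "cmd arg" strings split.
    until $wu_ready; do
        # Stop only when the service itself is gone, never on a timer.
        if ! $wu_running; then
            echo "service stopped before it became ready" >&2
            return 1
        fi
        # Announce every wait so the log shows why start-up is slow.
        echo "still waiting, next check in ${wu_interval}s" >&2
        sleep "$wu_interval"
    done
    return 0
}
```

A caller could pass something like "test -S /var/run/libvirt/libvirt-sock"
as READY_CMD; that path is the usual default and is an assumption here.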
[Test Case]
* There are two very different ways to "test" this due to the overload
based scenario where this really becomes important.
* Version #1 - being lame
One can just modify the upstart script and exchange the check for the
socket with /bin/true.
That way it waits forever which allows you to check the log entries,
the abort responsiveness and similar.
* Version #2 - recreating the case
- This mostly means the system has to be very slow and overloaded.
You can either just slow down the system (e.g. run a qemu with nice
MAX) or stress your host with other things burning CPU/memory/disk.
- We worked with adding autostart guests (see comment #35), but that
actually takes place after the socket is created. The reported case
had a RAID rebuilding.
- TL;DR get your system slow enough so that libvirt exceeds 10 seconds
to start properly (the old limit is 5*2 seconds)
[Regression Potential]
* I'd think that there might exist (super rare) cases where the
post-start now spins forever. But by the definition in
http://upstart.ubuntu.com/cookbook/#post-start this is correct: the
service is started (yes) but not yet ready. Yet this might appear as a
regression to some.
* Other than that, this should clearly fix more issues than it (hopefully
not) causes.
[Other Info]
* n/a
--- END SRU Template ---
[ problem description ]
sockfile_check_retries was first introduced by #1455608 to prevent the
failure case where the sockfile is not ready, but it defaults to a
hard-coded value of "5", which might be too short for a busy system boot.
#1455608 -
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1455608
[ step to reproduce ]
Set up a clean install (Ubuntu Server 14.04.4 LTS) with the OS disk
assembled as RAID-1, and boot up some guest instances (count > 10,
start-at-boot). Force a shutdown of the host by pressing the power
button for 3s ~ 5s, or via an IPMI command, then power on afterward.
The "post-start" script may sometimes fail to get the sockfile ready,
leaving a line of error in /var/log/syslog:
==> kernel: [ 313.059830] init: libvirt-bin post-start process (2430) terminated with status 1 <==
Since multiple VMs were reading and writing before the non-graceful
shutdown, the RAID devices need to re-sync after boot, which leads to a
slow response. The start-up script for libvirt-bin can only wait 5
cycles of 2 seconds each, so it times out after 10s and exits with
"1".
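The fixed-retry wait described above can be sketched as follows. This is
an illustration of the pattern, not the original post-start script: the
function names are hypothetical, and only the values 5 and 2 come from
the report.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; 5 and 2 are from the report.
# Usage: wait_with_retries READY_CMD RETRIES DELAY_SECONDS
wait_with_retries() {
    wr_ready=$1
    wr_left=$2
    wr_delay=$3
    while [ "$wr_left" -gt 0 ]; do
        if $wr_ready; then
            return 0
        fi
        sleep "$wr_delay"
        wr_left=$((wr_left - 1))
    done
    # Retries exhausted: this is the "terminated with status 1" case.
    return 1
}

# Worst-case wait in seconds for a given retry count and delay.
max_wait() {
    echo $(($1 * $2))
}

max_wait 5 2    # the old limit: prints 10
```

Any boot where the socket takes longer than that product to appear ends
in the failure shown in the syslog line.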
[ possible solution ]
Extend the number of retries for the sockfile wait, and make it possible
to change it by editing the `/etc/default/libvirt-bin` file.
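A minimal sketch of how such an override could be read is given below.
It is not the attached patch: load_retries is a hypothetical helper, and
it assumes the defaults file contains plain shell assignments using the
sockfile_check_retries name from this report.

```shell
#!/bin/sh
# Sketch only: load_retries is hypothetical. The variable name and the
# default of 5 follow the report; the file format is an assumption.
# The function body runs in a subshell so sourcing does not leak variables.
load_retries() (
    sockfile_check_retries=5
    if [ -r "$1" ]; then
        . "$1"
    fi
    echo "$sockfile_check_retries"
)
```

With a defaults file containing sockfile_check_retries=30, the call
load_retries /etc/default/libvirt-bin would print 30; without the file it
falls back to 5.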
<please see the patch file as attachment>
[ sysinfo ]
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.4 LTS
Release: 14.04
Codename: trusty
$ uname -a
Linux host2 4.2.0-35-generic #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[ related issue ]
#1386465 -
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1386465
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1571209/+subscriptions