yahoo-eng-team team mailing list archive
Message #58174
[Bug 1621818] Re: nova-compute unexpected input/output errors on starting instances (NFS + image-cache)
Reviewed: https://review.openstack.org/386956
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=53da313a86e81bf1df75119ca0e8f857e7b2909c
Submitter: Jenkins
Branch: master
commit 53da313a86e81bf1df75119ca0e8f857e7b2909c
Author: Joris S'heeren <joris.sheeren@xxxxxxxxxxxxxxxx>
Date: Fri Sep 9 15:40:58 2016 +0200
Catch error and log warning when not able to update mtimes
When we launch an instance, nova updates the mtime for the _base image
to let the image cache manager know the image is actively used. This
can lead to unexpected I/O errors when launching a large number of
instances at once coming from the same _base image.
This commit puts the execute call in a try/except block to catch
possible errors. It also logs a warning with the path and error message.
With this, the update will succeed at least once.
Closes-Bug: #1621818
Co-Authored-By: Matt Riedemann <mriedem@xxxxxxxxxx>
Change-Id: I2fd1700aa4563a906eb574cbbe16caa63abae0d6
Signed-off-by: Joris S'heeren <joris.sheeren@xxxxxxxxxxxxxxxx>
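The approach described in the commit message can be sketched roughly as follows. This is not the nova code itself: the function name update_mtime, the return value, and the log wording are illustrative assumptions, and the sketch uses os.utime from the standard library in place of the "touch" execute call that nova runs, so that the example is self-contained.

```python
import logging
import os

LOG = logging.getLogger(__name__)


def update_mtime(path):
    """Refresh the mtime of `path`, logging a warning instead of raising.

    Hypothetical helper for illustration only. Returns True if the
    timestamps were updated and False if the update failed.
    """
    try:
        # times=None sets both atime and mtime to the current time,
        # which is what "touch" does for an existing file.
        os.utime(path, None)
    except OSError as exc:
        # An I/O error on a shared NFS export would surface here as OSError.
        LOG.warning("Failed to update mtime on path %(path)s. "
                    "Error: %(error)s", {"path": path, "error": exc})
        return False
    return True
```

With a helper like this, a failed timestamp update no longer aborts the instance launch; the caller can ignore the return value, since another concurrent launch from the same _base image is expected to refresh the mtime.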
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1621818
Title:
nova-compute unexpected input/output errors on starting instances (NFS
+ image-cache)
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) mitaka series:
In Progress
Status in OpenStack Compute (nova) newton series:
New
Bug description:
Our setup consists of multiple controllers and multiple hypervisors.
Our shared storage for the instances is on an NFS 4.1 export. We are
using Ubuntu 16.04 LTS and OpenStack Mitaka.
When we launch an instance, nova updates the mtime for the _base image to let the image cache manager know the image is actively used. I think this was added here: https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=fb6ca3e7c8a38328d384cd41c061ded6623dac90
Because of this, in our setup, we are seeing unexpected input/output errors:
Stderr: u"/bin/touch: setting times of
'/var/lib/nova/instances/_base/79e34519bacb47ad6f64c4baca4d33fd5c57d34d':
Input/output error
A full trace can be found here:
http://paste.openstack.org/show/570161/
This error particularly shows itself when launching multiple instances
at once.
Also, because of this error, the instances are rescheduled. The assigned neutron ports, however, are not deleted. This results in multiple IPs assigned to the instances, with only one of them UP. It also results in attached floating IPs not working.
This is similar to https://bugs.launchpad.net/nova/+bug/1609526: nova should tell neutron either to delete the unused port or to update it instead of creating a new one.
Some more info on our environment:
----------------------------------
Using libvirt + kvm, neutron with openvswitch L3 HA
# dpkg -l | grep nova
ii nova-common 2:13.0.0-0ubuntu2 all OpenStack Compute - common files
ii nova-compute 2:13.0.0-0ubuntu2 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:13.0.0-0ubuntu2 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:13.0.0-0ubuntu2 all OpenStack Compute - compute node libvirt support
ii python-nova 2:13.0.0-0ubuntu2 all OpenStack Compute Python libraries
ii python-novaclient 2:3.3.1-2 all client library for OpenStack Compute API - Python 2.7
# dpkg -l |grep libvirt
ii libvirt-bin 1.3.1-1ubuntu10.1 amd64 programs for the libvirt library
ii libvirt0:amd64 1.3.1-1ubuntu10.1 amd64 library for interfacing with different virtualization systems
ii nova-compute-libvirt 2:13.0.0-0ubuntu2 all OpenStack Compute - compute node libvirt support
ii python-libvirt 1.3.1-1ubuntu1 amd64 libvirt Python bindings
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1621818/+subscriptions