yahoo-eng-team mailing list archive: Message #59379
[Bug 1646896] [NEW] System hangs when using NFS storage backend with loopback mounts
Public bug reported:
Description
===========
When using high-speed disks with NFS as the storage backend, the NFS mounts hang indefinitely under high load.
Steps to reproduce
==================
A chronological list of steps that reproduce the issue:
* Spin up a VM with a mounted cinder volume from an NFS backend
* Generate some read/write load
* Occasionally the loopback NFS mounts will hang; the host and everything else using that mount hang as well (see the load-generation sketch below)
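For illustration, a sustained synchronous write workload along the following lines is enough to generate the load. This is only a sketch: the target path, block size, and total size are placeholders, not values taken from our environment.

#!/usr/bin/env python3
# Sketch of a synchronous write load generator, run inside the guest against
# a file on the attached Cinder volume. All paths and sizes are placeholders.
import os

TARGET = "/mnt/volume/loadtest.bin"   # hypothetical mount point of the attached volume
BLOCK = b"\0" * (1 << 20)             # 1 MiB per write
TOTAL_MB = 4096                       # roughly 4 GiB of data in total

with open(TARGET, "wb") as f:
    for _ in range(TOTAL_MB):
        f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())          # push every block to the backend, keeping I/O pressure high

Any comparable generator (dd or fio with synchronous writes) should produce a similar load pattern.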
Expected result
===============
The system should remain stable under load.
Actual result
=============
Occasionally, usually under higher load, the system hangs.
Environment
===========
1. Exact version of OpenStack you are running:
OpenStack Kilo
openstack-nova-compute-2015.1.1-1.el7.noarch
openstack-nova-cert-2015.1.1-1.el7.noarch
python-nova-2015.1.1-1.el7.noarch
openstack-nova-console-2015.1.1-1.el7.noarch
openstack-nova-novncproxy-2015.1.1-1.el7.noarch
openstack-nova-common-2015.1.1-1.el7.noarch
python-novaclient-2.23.0-1.el7.noarch
openstack-nova-scheduler-2015.1.1-1.el7.noarch
openstack-nova-api-2015.1.1-1.el7.noarch
openstack-nova-conductor-2015.1.1-1.el7.noarch
2. Which hypervisor did you use?
Libvirt + KVM
3. Which storage type did you use?
NFS
4. Which networking type did you use?
Neutron with Open vSwitch
Logs & Configs
==============
Nova.conf:
[DEFAULT]
notification_driver=ceilometer.compute.nova_notifier
notification_driver=nova.openstack.common.notifier.rpc_notifier
notification_driver =
notification_topics=notifications
rpc_backend=rabbit
internal_service_availability_zone=internal
default_availability_zone=nova
notify_api_faults=False
state_path=/openstack/nova
report_interval=10
enabled_apis=ec2,osapi_compute,metadata
ec2_listen=0.0.0.0
ec2_workers=2
osapi_compute_listen=0.0.0.0
osapi_compute_workers=2
metadata_listen=0.0.0.0
metadata_workers=2
compute_manager=nova.compute.manager.ComputeManager
service_down_time=60
rootwrap_config=/etc/nova/rootwrap.conf
auth_strategy=keystone
use_forwarded_for=False
novncproxy_host=192.168.0.1
novncproxy_port=6080
allow_resize_to_same_host=true
block_device_allocate_retries=1560
heal_instance_info_cache_interval=60
reserved_host_memory_mb=512
network_api_class=nova.network.neutronv2.api.API
default_floating_pool=public
force_snat_range=0.0.0.0/0
metadata_host=192.168.0.1
dhcp_domain=novalocal
security_group_api=neutron
debug=True
verbose=True
log_dir=/var/log/nova
use_syslog=False
cpu_allocation_ratio=16.0
ram_allocation_ratio=1.5
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
compute_driver=libvirt.LibvirtDriver
vif_plugging_is_fatal=True
vif_plugging_timeout=300
firewall_driver=nova.virt.firewall.NoopFirewallDriver
remove_unused_base_images=true
force_raw_images=True
novncproxy_base_url=http://0.0.0.0:6080/vnc_auto.html
vncserver_listen=192.168.0.1
vncserver_proxyclient_address=127.0.0.1
vnc_enabled=True
vnc_keymap=en-us
volume_api_class=nova.volume.cinder.API
amqp_durable_queues=False
sql_connection=mysql:XXXXXXXXXXX
lock_path=/openstack/nova/tmp
osapi_volume_listen=0.0.0.0
[api_database]
[barbican]
[cells]
[cinder]
[conductor]
workers=2
[database]
[ephemeral_storage_encryption]
[glance]
api_servers=192.168.0.1:9292
[guestfs]
[hyperv]
[image_file_url]
[ironic]
[keymgr]
[keystone_authtoken]
auth_uri=http://192.168.0.1:5000/v2.0
identity_uri=http://192.168.0.1:35357
admin_user=nova
admin_password=XXXXXXx
[libvirt]
virt_type=kvm
inject_password=False
inject_key=False
inject_partition=-1
live_migration_uri=qemu+tcp://nova@%s/system
cpu_mode=host-model
disk_cachemodes=file=writethrough,block=writethrough
nfs_mount_options=rw,hard,intr,nolock,vers=4.1,timeo=10
vif_driver=nova.virt.libvirt.vif.LibvirtGenericVIFDriver
[metrics]
[neutron]
......
Cinder.conf:
[nfs_ssd]
nfs_used_ratio=0.95
nfs_oversub_ratio=10.0
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config=/etc/cinder/nfs_shares_ssd.conf
volume_backend_name=nfs_ssd
quota_volumes = -1
nfs_mount_options=rw,hard,intr,nolock
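For completeness, the shares file referenced by nfs_shares_config lists one NFS export per line. An illustrative example follows; the server address and export path are placeholders, not our real values:

/etc/cinder/nfs_shares_ssd.conf:
192.168.0.2:/export/cinder_ssd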
- No notable output in the nova logs
- System log (dmesg) after a hang:
Nov 24 04:10:41 openstack1.itgix.com kernel: INFO: task qemu-kvm:11726 blocked for more than 120 seconds.
Nov 24 04:10:41 openstack1.itgix.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 24 04:10:41 openstack1.itgix.com kernel: qemu-kvm D ffff88118b1b1f60 0 11726 1 0x00000080
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff880da4c77c40 0000000000000082 ffff881184b86780 ffff880da4c77fd8
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff880da4c77fd8 ffff880da4c77fd8 ffff881184b86780 ffff88118b1b1f58
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff88118b1b1f5c ffff881184b86780 00000000ffffffff ffff88118b1b1f60
Nov 24 04:10:41 openstack1.itgix.com kernel: Call Trace:
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163bf29>] schedule_preempt_disabled+0x29/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81639c25>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff812fbff4>] ? timerqueue_del+0x24/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163908f>] mutex_lock+0x1f/0x2f
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8116b60a>] generic_file_aio_write+0x4a/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffffa06dc03b>] nfs_file_write+0xbb/0x1d0 [nfs]
Nov 24 04:10:41 openstack1.itgix.com kernel: Call Trace:
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163bf29>] schedule_preempt_disabled+0x29/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81639c25>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163908f>] mutex_lock+0x1f/0x2f
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8116b60a>] generic_file_aio_write+0x4a/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffffa06dc03b>] nfs_file_write+0xbb/0x1d0 [nfs]
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811dde5d>] do_sync_write+0x8d/0xd0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811de67d>] vfs_write+0xbd/0x1e0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811df2d2>] SyS_pwrite64+0x92/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81645ec9>] system_call_fastpath+0x16/0x1b
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1646896
Title:
System hangs when using NFS storage backend with loopback mounts
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1646896/+subscriptions