kernel-packages team mailing list archive
Message #172038
[Bug 1570195] Re: Net tools cause kernel soft lockup after DPDK touched VirtIO-pci devices
Before going into discussions about how it "should" be, I added more debug code
and gathered some good case vs bad case data.
First of all it is "ok" to have no more buffers.
I had a printk in a code path that is only hit when !more_used is true,
and I've seen plenty of hits for all kinds of idx values.
On adding virtio traffic it triggers a few times as well.
After all, that is what the loop is for: to wait until there is a buffer that it can get.
So things aren't broken if this ever triggers - but of course they are if the index never changes.
IIRC: last_used_idx == vring.used->idx just means nothing happened since
our last interaction (to be confirmed).
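To make that condition concrete, here is a tiny userspace model of the check. It is only a sketch and not the kernel code: the class VirtQueueModel and its method names are hypothetical, and the 16-bit wraparound of the two indices is my assumption about how the ring counters behave.

```python
class VirtQueueModel:
    """Toy model of the driver/device index pair; not kernel code."""

    def __init__(self):
        self.last_used_idx = 0  # driver side: next used entry to consume
        self.used_idx = 0       # device side: bumped by the host per completed buffer

    def more_used(self):
        # Buffers are pending only while the two indices differ.
        return self.last_used_idx != self.used_idx

    def host_completes(self, n=1):
        # The host hands back n buffers (assumed 16-bit counter).
        self.used_idx = (self.used_idx + n) & 0xFFFF

    def get_buf(self):
        if not self.more_used():
            return None  # corresponds to the "No more buffers" debug printk
        idx = self.last_used_idx
        self.last_used_idx = (self.last_used_idx + 1) & 0xFFFF
        return idx


vq = VirtQueueModel()
print(vq.get_buf())   # None: nothing happened since the last interaction
vq.host_completes()
print(vq.get_buf())   # 0: the host advanced its index, one buffer is available
print(vq.get_buf())   # None: the indices are equal again
```

In this model a healthy host eventually calls host_completes(); in the bad case below that never seems to happen for the affected vq.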
Good case:
Some !more_used hits might occur, but they are unrelated and do not repeat infinitely
[ 393.542550] __virtqueue_get_buf: No more buffers in vq ffff8801b74b3000 - vq->last_used_idx 303 == vq->vring.used->idx 303
[ 394.097117] __virtqueue_get_buf: No more buffers in vq ffff8801b74b3000 - vq->last_used_idx 304 == vq->vring.used->idx 304
[ 394.097413] __virtqueue_get_buf: No more buffers in vq ffff8801b74b4000 - vq->last_used_idx 125 == vq->vring.used->idx 125
[...]
[ 394.449672] __virtqueue_get_buf: Entry checks passed - vq ffff8800bbaef000 from _vq ffff8800bbaef000
[ 394.452734] __virtqueue_get_buf: Exit checks passed - ffff8801b74b5840 vq->data[i]
[ 394.455087] __virtqueue_get_buf: Returning ret ffff8801b74b5840
Done
Bad case (after DPDK ran):
Now both debug printk's trigger
I get a LOT of
[ 552.018862] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
with sequences like this in between:
[ 554.157376] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.158916] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[ 554.160135] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.161583] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[ 554.162776] __virtqueue_get_buf: No more buffers in vq ffff8800bbaef000 - vq->last_used_idx 2 == vq->vring.used->idx 2
[ 554.164189] __virtqueue_is_broken: - vq ffff8800bbaef000 from _vq ffff8800bbaef000 -> broken 0
[...] (infinite loop)
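To tell the two cases apart in a long dmesg capture, a small script along these lines could help. It is a sketch under assumptions: the regular expression matches only the format of my own debug printk shown above, and find_stuck_queues and its threshold are hypothetical names and values.

```python
import re
from collections import defaultdict

# Matches only the custom "No more buffers" debug line shown in this mail.
NO_BUF = re.compile(
    r"__virtqueue_get_buf: No more buffers in vq (\w+) - "
    r"vq->last_used_idx (\d+) == vq->vring\.used->idx (\d+)"
)


def find_stuck_queues(lines, threshold=3):
    """Return the vqs that report the same index `threshold` times in a row."""
    last_idx = {}
    repeats = defaultdict(int)
    stuck = set()
    for line in lines:
        m = NO_BUF.search(line)
        if not m:
            continue  # e.g. the __virtqueue_is_broken lines
        vq, idx = m.group(1), int(m.group(2))
        if last_idx.get(vq) == idx:
            repeats[vq] += 1
        else:
            repeats[vq] = 1
            last_idx[vq] = idx
        if repeats[vq] >= threshold:
            stuck.add(vq)
    return stuck
```

Fed with the good-case lines it should return an empty set, since the index moves from 303 to 304; fed with the bad-case lines it should report ffff8800bbaef000, which stays at index 2.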
Current assumption: DPDK disables something in the host part of the virtio device, so that the host no longer responds "correctly".
By unbinding and rebinding the driver we can reinitialize the device, but without that we run into this hang.
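For reference, the unbind/bind cycle can be scripted via the generic sysfs driver interface. This is a minimal sketch, assuming the device is rebound at the PCI level to the virtio-pci driver; the helper name rebind and its sysfs_root parameter (there only so it can be tried against a scratch directory) are hypothetical.

```python
import os


def rebind(device, driver, bus="pci", sysfs_root="/sys"):
    """Detach `device` from `driver` and attach it again (needs root on a real system)."""
    driver_dir = os.path.join(sysfs_root, "bus", bus, "drivers", driver)
    # Writing the device name to "unbind" releases it, writing it to "bind" re-probes it.
    for action in ("unbind", "bind"):
        with open(os.path.join(driver_dir, action), "w") as f:
            f.write(device)


# Example (assumption: 0000:00:04.0 is the vNIC behind ens4, as in the bug description):
# rebind("0000:00:04.0", "virtio-pci")
```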
Remember: we only initialize DPDK with testpmd, no load whatsoever is driven by it.
We likely need two fixes:
1. find what DPDK does "to" the device and avoid it
2. the kernel should give up after some number of retries and return a failure (not good, but much better than hanging)
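The idea behind the second fix, sketched as a userspace model rather than an actual kernel patch. The names poll_with_limit and make_device, the retry budget of 1000 and the choice of TimeoutError are all assumptions made for illustration.

```python
def poll_with_limit(get_buf, max_retries=1000):
    """Call get_buf() until it returns a buffer or the retry budget is spent."""
    for _ in range(max_retries):
        buf = get_buf()
        if buf is not None:
            return buf
    # Give up with an error instead of spinning forever.
    raise TimeoutError("no buffer after %d polls" % max_retries)


def make_device(respond_after):
    """Fake device: returns a buffer on poll number `respond_after`, or never if None."""
    calls = {"n": 0}

    def get_buf():
        calls["n"] += 1
        if respond_after is not None and calls["n"] >= respond_after:
            return "buf"
        return None

    return get_buf


print(poll_with_limit(make_device(3)))  # buf
try:
    poll_with_limit(make_device(None), max_retries=10)
except TimeoutError as err:
    print("gave up:", err)
```

The caller then gets an error it can report to ethtool or ip instead of a soft lockup; what a sensible retry count or timeout would be in the kernel is an open question.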
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1570195
Title:
Net tools cause kernel soft lockup after DPDK touched VirtIO-pci
devices
Status in dpdk package in Ubuntu:
Confirmed
Status in linux package in Ubuntu:
Confirmed
Bug description:
Guys,
I'm facing an issue here with both "ethtool" and "ip", while trying
to manage PCI VirtIO devices that are black-listed by DPDK.
You'll need an Ubuntu Xenial KVM guest, with 4 VirtIO vNIC cards, to
run those tests
PCI device example from inside a Xenial guest:
---
# lspci | grep Ethernet
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
00:06.0 Ethernet controller: Red Hat, Inc Virtio network device
---
Where "ens3" is the first / default interface, attached to Libvirt's
"default" network. The "ens4" is reserved for "ethtool / ip" tests
(attached to another Libvirt's network without IPs or DHCP), "ens5"
will be "dpdk0" and "ens6" "dpdk1"...
---
*** How it works?
1- For example, try to enable multi-queue on DPDK's devices, boot
your Xenial guest, and run:
ethtool -L ens5 combined 4
ethtool -L ens6 combined 4
2- Install openvswitch-switch-dpdk, configure DPDK and OVS, and fire it
up.
https://help.ubuntu.com/16.04/serverguide/DPDK.html
service openvswitch-switch stop
service dpdk stop
OVS DPDK Options (/etc/default/openvswitch-switch):
--
DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 1024 --pci-blacklist 0000:00:03.0,0000:00:04.0'
--
service dpdk start
service openvswitch-switch start
- Enable multi-queue on OVS+DPDK inside of the VM:
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xff00
* Multi-queue apparently works! ovs-vswitchd consumes more than 100%
of CPU, meaning that multi-queue is there...
*** Where it fails?
1- Reboot the VM and try to run ethtool again (or go straight to 2
below):
ethtool -L ens5 combined 4
2- Try to fire up ens4:
ip link set dev ens4 up
# FAIL! Both commands hang, consuming 100% of the guest's CPU...
So, it looks like a Linux fault, because it is "allowing" the DPDK
VirtIO App (a user land App), to interfere with kernel devices in a
strange way...
Best,
Thiago
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-18-generic 4.4.0-18.34
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Uname: Linux 4.4.0-18-generic x86_64
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Apr 14 00:35 seq
crw-rw---- 1 root audio 116, 33 Apr 14 00:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
CRDA: N/A
Date: Thu Apr 14 01:27:27 2016
HibernationDevice: RESUME=UUID=833e999c-e066-433c-b8a2-4324bb8d56de
InstallationDate: Installed on 2016-04-07 (7 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Beta amd64 (20160406)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=9911604e-353b-491f-a0a9-804724350592 ro
RelatedPackageVersions:
linux-restricted-modules-4.4.0-18-generic N/A
linux-backports-modules-4.4.0-18-generic N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-wily
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-wily:cvnQEMU:ct1:cvrpc-i440fx-wily:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-wily
dmi.sys.vendor: QEMU
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1570195/+subscriptions