← Back to team overview

kernel-packages team mailing list archive

[Bug 1570195] Re: Net tools cause kernel soft lockup after DPDK touched VirtIO-pci devices

 

pull-lp-source linux 4.4.0-18.34
Build from source with oldconfig and such
Enable all kind of debug for virtio
Add some checks where we expect it to fail
mkdir /home/ubuntu/4.4.0-debug
# not needed make INSTALL_MOD_PATH=/home/ubuntu/4.4.0-debug modules_install
make INSTALL_PATH=/home/ubuntu/4.4.0-debug install

    <kernel>/home/ubuntu/linux-4.4.0/vmlinuz-4.4.6</kernel>
    <cmdline>root=/dev/vda1 console=tty1 console=ttyS0 net.ifnames=0</cmdline>

Attach debugger as before and retrigger the bug
Ensure /home/ubuntu/linux-4.4.0/scripts/gdb/vmlinux-gdb.py gets loaded properly for helpers

On boot my debug starts to work on the one device that gets innitialized on boot:
[    3.557697] __virtqueue_get_buf: Entry checks passed - vq ffff8800bbae6400 from _vq ffff8800bbae6400
[    3.559320] __virtqueue_get_buf: Exit checks passed - ffff8801b74b2840 vq->data[i]
[    3.560515] __virtqueue_get_buf: Returning ret ffff8801b74b2840

Prep issue:
sudo /usr/bin/testpmd --pci-blacklist 0000:00:03.0 --socket-mem 2048 -- --interactive --total-num-mbufs=2048

* it might be worth to mention that nothing regarding the queues came by
running testpmd - neither in console nor in gdb

Trigger hang:
sudo ethtool -L eth1 combined 3

__virtqueue_is_broken: - vq ffff8800bbae7000 from _vq ffff8800bbae7000 -> broken 0
__virtqueue_is_broken: - vq ffff8800bbae7000 from _vq ffff8800bbae7000 -> broken 0
[...]

With the debug we have we can check the vvq's status
BTW - the offset of that container_of is 0 - so we can just cast it :-/

$4 = {vq = {list = {next = 0xffff8800bb892b00, prev = 0xffff8801b7518000}, callback = 0x0 <irq_stack_union>, name = 0xffffffff81d0f164 "control", 
    vdev = 0xffff8800bb892800, index = 8, num_free = 63, priv = 0x1c010}, vring = {num = 64, desc = 0xffff8801b7514000, avail = 0xffff8801b7514400, 
    used = 0xffff8801b7515000}, weak_barriers = true, broken = false, indirect = true, event = true, free_head = 1, num_added = 0, last_used_idx = 0, 
  avail_flags_shadow = 1, avail_idx_shadow = 1, notify = 0xffffffff814bca40 <vp_notify>, data = 0xffff8800bbae7078}

So it considers itself not broken.

But I've seen it run over the usually disabled (so we don't see it by default):
pr_debug("No more buffers in queue\n");

That depends on !more_used(vq)
Which is:
  return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
               0           !=        0

(gdb) p ((struct vring_virtqueue *)0xffff8800bbae7000)->vring
$19 = {num = 64, desc = 0xffff8801b7514000, avail = 0xffff8801b7514400, used = 0xffff8801b7515000}
(gdb) p *((struct vring_virtqueue *)0xffff8800bbae7000)->vring.used
$21 = {flags = 0, idx = 0, ring = 0xffff8801b7515004}
(gdb) p *((struct vring_virtqueue *)0xffff8800bbae7000)->vring.avail
$22 = {flags = 1, idx = 1, ring = 0xffff8801b7514404}
(gdb) p *((struct vring_virtqueue *)0xffff8800bbae7000)->vring.desc
$23 = {addr = 3140568064, len = 48, flags = 4, next = 1}

0!=0 => false -> so more_used returns fals
But the call said !more_used, so virtqueue_get_buf returns NULL - and that is all it does "forever".

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1570195

Title:
  Net tools cause kernel soft lockup after DPDK touched  VirtIO-pci
  devices

Status in dpdk package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Guys,

   I'm facing an issue here with both "ethtool" and "ip", while trying
  to manage black-listed by DPDK PCI VirtIO devices.

   You'll need an Ubuntu Xenial KVM guest, with 4 VirtIO vNIC cards, to
  run those tests

   PCI device example from inside a Xenial guest:

  ---
  # lspci | grep Ethernet
  00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:06.0 Ethernet controller: Red Hat, Inc Virtio network device
  ---

  Where "ens3" is the first / default interface, attached to Libvirt's
  "default" network. The "ens4" is reserved for "ethtool / ip" tests
  (attached to another Libvirt's network without IPs or DHCP), "ens5"
  will be "dpdk0" and "ens6" "dpdk1"...

  ---
   *** How it works?

   1- For example, try to enable multi-queue on DPDK's devices, boot
  your Xenial guest, and run:

   ethtool -L ens5 combined 4
   ethtool -L ens6 combined 4

   2- Install openvswitch-switch-dpdk configure DPDK and OVS and fire it
  up.

   https://help.ubuntu.com/16.04/serverguide/DPDK.html

   service openvswitch-switch stop
   service dpdk stop

   OVS DPDK Options (/etc/default/openvswitch-switch):

  --
  DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 1024 --pci-blacklist 0000:00:03.0,0000:00:04.0'
  --

   service dpdk start
   service openvswitch-switch start

   - Enable multi-queue on OVS+DPDK inside of the VM:

   ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xff00

   * Multi-queue apparently works! ovs-vswitchd consumes more that 100%
  of CPU, meaning that it multi-queue is there...

   *** Where it fails?

   1- Reboot the VM and try to run ethtool again (or go straight to 2
  below):

   ethtool -L ens5 combined 4

   2- Try to fire up ens4:

   ip link set dev ens4 up

  
   # FAIL! Both commands hangs, consuming 100% of guest's CPU...

   So, it looks like a Linux fault, because it is "allowing" the DPDK
  VirtIO App (a user land App), to interfere with kernel devices in a
  strange way...

  Best,
  Thiago

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-18-generic 4.4.0-18.34
  ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
  Uname: Linux 4.4.0-18-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Apr 14 00:35 seq
   crw-rw---- 1 root audio 116, 33 Apr 14 00:35 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
  CRDA: N/A
  Date: Thu Apr 14 01:27:27 2016
  HibernationDevice: RESUME=UUID=833e999c-e066-433c-b8a2-4324bb8d56de
  InstallationDate: Installed on 2016-04-07 (7 days ago)
  InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Beta amd64 (20160406)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
  PciMultimedia:
   
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=9911604e-353b-491f-a0a9-804724350592 ro
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-18-generic N/A
   linux-backports-modules-4.4.0-18-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 04/01/2014
  dmi.bios.vendor: SeaBIOS
  dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
  dmi.chassis.type: 1
  dmi.chassis.vendor: QEMU
  dmi.chassis.version: pc-i440fx-wily
  dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-wily:cvnQEMU:ct1:cvrpc-i440fx-wily:
  dmi.product.name: Standard PC (i440FX + PIIX, 1996)
  dmi.product.version: pc-i440fx-wily
  dmi.sys.vendor: QEMU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1570195/+subscriptions


References