kernel-packages team mailing list archive

[Bug 1570195] Re: Network tools like "ethtool" or "ip" freezes when DPDK Apps are running with VirtIO


The process appears to be running:
4     0 26330 26263  20   0   7588   980 -      R+   pts/2     33:52 \_ ethtool -L eth1 combined 3

Anything that touches the process seems to be affected; for example, an
ltrace or strace attached to it gets stuck as well.

Meanwhile, the guest's log on the virsh console shows soft lockups:
[  568.394870] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ethtool:26330]
[  575.418868] INFO: rcu_sched self-detected stall on CPU
[  575.419674]  0-...: (14999 ticks this GP) idle=66d/140000000000001/0 softirq=21127/21127 fqs=14994 
[  575.420779]   (t=15000 jiffies g=11093 c=11092 q=9690)
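
The task named in the watchdog lines can be pulled out of a saved log mechanically. This is a minimal sketch that assumes the message format shown above; the function name lockup_tasks is made up for illustration.

```shell
# Hypothetical helper: read kernel log text on stdin and print the
# "comm pid" pair named in each soft-lockup line, de-duplicated.
lockup_tasks() {
    sed -n 's/.*BUG: soft lockup - CPU#[0-9]* stuck for [0-9]*s! \[\(.*\):\([0-9]*\)\][[:space:]]*$/\1 \2/p' |
        sort -u
}
```

For example, "dmesg | lockup_tasks" or "journalctl -k | lockup_tasks" inside the guest should list each stuck command once, however many times the watchdog fired.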

More info in the journal:
 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ethtool:26330]
 Modules linked in: openvswitch nf_defrag_ipv6 nf_conntrack isofs ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul parport_pc parport joydev serio_raw iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd floppy
 CPU: 0 PID: 26330 Comm: ethtool Not tainted 4.4.0-18-generic #34-Ubuntu
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
 task: ffff8801b747d280 ti: ffff8800ba58c000 task.ti: ffff8800ba58c000
 RIP: 0010:[<ffffffff815f1a43>]  [<ffffffff815f1a43>] virtnet_send_command+0xf3/0x150
 RSP: 0018:ffff8800ba58fb60  EFLAGS: 00000246
 RAX: 0000000000000000 RBX: ffff8800bba62840 RCX: ffff8801b64a9000
 RDX: 000000000000c010 RSI: ffff8800ba58fb64 RDI: ffff8800bba6c400
 RBP: ffff8800ba58fbf8 R08: 0000000000000004 R09: ffff8801b9001b00
 R10: ffff8801b671b080 R11: 0000000000000246 R12: 0000000000000002
 R13: ffff8800ba58fb88 R14: 0000000000000000 R15: 0000000000000004
 FS:  00007fb57d56c700(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fb57cd7b680 CR3: 00000000ba85a000 CR4: 00000000001406f0
  ffff8800ba58fc28 ffffea0002ee9882 0000000200000940 0000000000000000
  0000000000000000 ffffea0002ee9882 0000000100000942 0000000000000000
  0000000000000000 ffff8800ba58fb68 ffff8800ba58fc10 ffff8800ba58fb88
 Call Trace:
  [<ffffffff815f1d9a>] virtnet_set_queues+0x9a/0x100
  [<ffffffff815f1e52>] virtnet_set_channels+0x52/0xa0
  [<ffffffff8171fc3c>] ethtool_set_channels+0xfc/0x140
  [<ffffffff81720afd>] dev_ethtool+0x40d/0x1d70
  [<ffffffff811cafc5>] ? page_add_file_rmap+0x25/0x60
  [<ffffffff8172f8d5>] ? __rtnl_unlock+0x15/0x20
  [<ffffffff8171ec61>] ? netdev_run_todo+0x61/0x320
  [<ffffffff8118d8a9>] ? unlock_page+0x69/0x70
  [<ffffffff81733b42>] dev_ioctl+0x182/0x580
  [<ffffffff811bf9f4>] ? handle_mm_fault+0xe44/0x1820
  [<ffffffff816fb932>] sock_do_ioctl+0x42/0x50
  [<ffffffff816fbe32>] sock_ioctl+0x1d2/0x290
  [<ffffffff8121ff9f>] do_vfs_ioctl+0x29f/0x490
  [<ffffffff8106b554>] ? __do_page_fault+0x1b4/0x400
  [<ffffffff81220209>] SyS_ioctl+0x79/0x90
  [<ffffffff818243b2>] entry_SYSCALL_64_fastpath+0x16/0x71
 Code: 44 89 e2 4c 89 6c c5 b0 e8 3b dc ec ff 48 8b 7b 08 e8 f2 db ec ff 84 c0 75 11 eb 24 48 8b 7b 08 e8 d3 d6 ec ff 84 c0 75 17 f3 90 <48> 8b 7b 08 48 8d b5 6c ff ff ff e8 4d e0 ec ff 48 85 c0 74 dc

Sometimes this frame appears on top of the call trace:
 [<ffffffff815f1a53>] ? virtnet_send_command+0x103/0x150

Need to check if there is a loop in virtnet_set_queues that could call
virtnet_send_command infinitely.
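
One cheap hint before reading the source: in the "Code:" line of the trace, the byte in angle brackets is the one at RIP, so the bytes just before it belong to the instruction(s) executed right before the CPU was sampled. The sketch below (bytes_before_rip is a made-up helper name) prints the two bytes preceding that marker. For the trace above they are "f3 90", the x86 PAUSE instruction, which is what cpu_relax() compiles to on x86. That would be consistent with a busy-wait inside virtnet_send_command itself rather than repeated calls from virtnet_set_queues, but this is only an inference from the dump and still needs to be confirmed against the code.

```shell
# Hypothetical helper: read oops text on stdin and, for each "Code:" line,
# print the two opcode bytes immediately before the <xx> byte at RIP.
bytes_before_rip() {
    sed -n 's/.*Code:\(.*\) <[0-9a-f][0-9a-f]>.*/\1/p' |
        awk 'NF >= 2 { print $(NF-1), $NF }'
}
```

Usage: "journalctl -k | bytes_before_rip", or feed it a saved copy of the trace.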

Being stuck in the kernel explains why signals have no effect and
tracers can't attach.

Note: we are already on today's kernel:
Linux guest-virtio-dpdk 4.4.0-18-generic #34-Ubuntu SMP Wed Apr 6 14:01:02 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I seem to be able to keep working in existing ssh sessions, but new
sessions get stuck as well, so I need to open more sessions in advance
next time :-)

Next Steps:
 - analyze code pointed out by hangs

  Network tools like "ethtool" or "ip" freezes when DPDK Apps are
  running with VirtIO

Status in dpdk package in Ubuntu:
Status in linux package in Ubuntu:

Bug description:

   I'm facing an issue with both "ethtool" and "ip" while trying to
  manage PCI VirtIO devices that are blacklisted in DPDK.

   You'll need an Ubuntu Xenial KVM guest with 4 VirtIO vNICs to run
  these tests.

   PCI device example from inside a Xenial guest:

  # lspci | grep Ethernet
  00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:06.0 Ethernet controller: Red Hat, Inc Virtio network device

  Here "ens3" is the first (default) interface, attached to Libvirt's
  "default" network; "ens4" is reserved for the "ethtool / ip" tests
  (attached to another Libvirt network without IPs or DHCP); "ens5"
  will be "dpdk0" and "ens6" will be "dpdk1".

   *** How does it work?

   1- For example, to enable multi-queue on the devices that DPDK will
  use, boot your Xenial guest and run:

   ethtool -L ens5 combined 4
   ethtool -L ens6 combined 4

   2- Install openvswitch-switch-dpdk, configure DPDK and OVS, and start
  them:


   service openvswitch-switch stop
   service dpdk stop

   OVS DPDK Options (/etc/default/openvswitch-switch):

  DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 1024 --pci-blacklist 0000:00:03.0,0000:00:04.0'

   service dpdk start
   service openvswitch-switch start
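
To double-check which devices the DPDK_OPTS line above keeps away from DPDK, the blacklist can be split out of the option string. This is a sketch that assumes the comma-separated form used in this report; blacklisted_pci is a made-up helper name, not part of DPDK or OVS.

```shell
# Hypothetical helper: print one PCI address per line from the
# --pci-blacklist value of the option string passed as $1.
# Prints nothing if the option is absent.
blacklisted_pci() {
    printf '%s\n' "$1" |
        sed -n 's/.*--pci-blacklist[ =]\([^ ]*\).*/\1/p' |
        tr ',' '\n'
}
```

With the options from this report, blacklisted_pci "$DPDK_OPTS" prints 0000:00:03.0 and 0000:00:04.0, which correspond to the lspci entries 00:03.0 and 00:04.0 (presumably ens3 and ens4, the two interfaces meant to stay with the kernel driver).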

   - Enable multi-queue on OVS+DPDK inside of the VM:

   ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xff00

   * Multi-queue apparently works! ovs-vswitchd consumes more than 100%
  of CPU, which suggests that multi-queue is active...
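
The hex masks used in the steps above are easier to reason about as CPU lists. A small sketch (mask_to_cpus is a made-up helper name) that expands a mask into the CPU numbers whose bits are set:

```shell
# Hypothetical helper: expand a CPU bitmask such as 0xff00 into the
# space-separated list of CPU numbers whose bits are set.
mask_to_cpus() {
    mask=$(( $1 ))
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$out"
}

mask_to_cpus 0xff00   # prints: 8 9 10 11 12 13 14 15
mask_to_cpus 0x1      # prints: 0
```

So pmd-cpu-mask=0xff00 asks for PMD threads on CPUs 8 to 15, while the "-c 0x1" in DPDK_OPTS selects CPU 0. The report does not say how many vCPUs the guest has, so it is worth checking that CPUs 8 to 15 actually exist there.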

   *** Where does it fail?

   1- Reboot the VM and try to run ethtool again (or go straight to 2):

   ethtool -L ens5 combined 4

   2- Try to fire up ens4:

   ip link set dev ens4 up

   # FAIL! Both commands hang, consuming 100% of the guest's CPU...

   So it looks like a Linux fault, because the kernel is "allowing" the
  DPDK VirtIO app (a userland app) to interfere with kernel devices in a
  strange way...


  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-18-generic 4.4.0-18.34
  ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
  Uname: Linux 4.4.0-18-generic x86_64
   total 0
   crw-rw---- 1 root audio 116,  1 Apr 14 00:35 seq
   crw-rw---- 1 root audio 116, 33 Apr 14 00:35 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
  Date: Thu Apr 14 01:27:27 2016
  HibernationDevice: RESUME=UUID=833e999c-e066-433c-b8a2-4324bb8d56de
  InstallationDate: Installed on 2016-04-07 (7 days ago)
  InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Beta amd64 (20160406)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=9911604e-353b-491f-a0a9-804724350592 ro
   linux-restricted-modules-4.4.0-18-generic N/A
   linux-backports-modules-4.4.0-18-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 04/01/2014
  dmi.bios.vendor: SeaBIOS
  dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
  dmi.chassis.type: 1
  dmi.chassis.vendor: QEMU
  dmi.chassis.version: pc-i440fx-wily
  dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-wily:cvnQEMU:ct1:cvrpc-i440fx-wily:
  dmi.product.name: Standard PC (i440FX + PIIX, 1996)
  dmi.product.version: pc-i440fx-wily
  dmi.sys.vendor: QEMU
