
kernel-packages team mailing list archive

[Bug 1570195] Re: Network tools like "ethtool" or "ip" freezes when DPDK Apps are running with VirtIO

 

The process appears to be running (state R+):
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0 26330 26263  20   0   7588   980 -      R+   pts/2     33:52 \_ ethtool -L eth1 combined 3

Anything that touches it seems to be affected; for example, an ltrace or
strace on it gets stuck as well.

Meanwhile the guest's log on the virsh console shows soft lockups:
[  568.394870] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ethtool:26330]
[  575.418868] INFO: rcu_sched self-detected stall on CPU
[  575.419674]  0-...: (14999 ticks this GP) idle=66d/140000000000001/0 softirq=21127/21127 fqs=14994 
[  575.420779]   (t=15000 jiffies g=11093 c=11092 q=9690)

More Info in the journal:
 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ethtool:26330]
 Modules linked in: openvswitch nf_defrag_ipv6 nf_conntrack isofs ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul parport_pc parport joydev serio_raw iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd floppy
 CPU: 0 PID: 26330 Comm: ethtool Not tainted 4.4.0-18-generic #34-Ubuntu
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
 task: ffff8801b747d280 ti: ffff8800ba58c000 task.ti: ffff8800ba58c000
 RIP: 0010:[<ffffffff815f1a43>]  [<ffffffff815f1a43>] virtnet_send_command+0xf3/0x150
 RSP: 0018:ffff8800ba58fb60  EFLAGS: 00000246
 RAX: 0000000000000000 RBX: ffff8800bba62840 RCX: ffff8801b64a9000
 RDX: 000000000000c010 RSI: ffff8800ba58fb64 RDI: ffff8800bba6c400
 RBP: ffff8800ba58fbf8 R08: 0000000000000004 R09: ffff8801b9001b00
 R10: ffff8801b671b080 R11: 0000000000000246 R12: 0000000000000002
 R13: ffff8800ba58fb88 R14: 0000000000000000 R15: 0000000000000004
 FS:  00007fb57d56c700(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fb57cd7b680 CR3: 00000000ba85a000 CR4: 00000000001406f0
 Stack:
  ffff8800ba58fc28 ffffea0002ee9882 0000000200000940 0000000000000000
  0000000000000000 ffffea0002ee9882 0000000100000942 0000000000000000
  0000000000000000 ffff8800ba58fb68 ffff8800ba58fc10 ffff8800ba58fb88
 Call Trace:
  [<ffffffff815f1d9a>] virtnet_set_queues+0x9a/0x100
  [<ffffffff815f1e52>] virtnet_set_channels+0x52/0xa0
  [<ffffffff8171fc3c>] ethtool_set_channels+0xfc/0x140
  [<ffffffff81720afd>] dev_ethtool+0x40d/0x1d70
  [<ffffffff811cafc5>] ? page_add_file_rmap+0x25/0x60
  [<ffffffff8172f8d5>] ? __rtnl_unlock+0x15/0x20
  [<ffffffff8171ec61>] ? netdev_run_todo+0x61/0x320
  [<ffffffff8118d8a9>] ? unlock_page+0x69/0x70
  [<ffffffff81733b42>] dev_ioctl+0x182/0x580
  [<ffffffff811bf9f4>] ? handle_mm_fault+0xe44/0x1820
  [<ffffffff816fb932>] sock_do_ioctl+0x42/0x50
  [<ffffffff816fbe32>] sock_ioctl+0x1d2/0x290
  [<ffffffff8121ff9f>] do_vfs_ioctl+0x29f/0x490
  [<ffffffff8106b554>] ? __do_page_fault+0x1b4/0x400
  [<ffffffff81220209>] SyS_ioctl+0x79/0x90
  [<ffffffff818243b2>] entry_SYSCALL_64_fastpath+0x16/0x71
 Code: 44 89 e2 4c 89 6c c5 b0 e8 3b dc ec ff 48 8b 7b 08 e8 f2 db ec ff 84 c0 75 11 eb 24 48 8b 7b 08 e8 d3 d6 ec ff 84 c0 75 17 f3 90 <48> 8b 7b 08 48 8d b5 6c ff ff ff e8 4d e0 ec ff 48 85 c0 74 dc

Sometimes this entry appears at the top of the call trace:
 [<ffffffff815f1a53>] ? virtnet_send_command+0x103/0x150

Need to check whether there is a loop in virtnet_set_queues that could
call virtnet_send_command indefinitely, or whether virtnet_send_command
itself is spinning, since the RIP points inside it.

Being stuck in the kernel explains why signals and traces can't attach.
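
Since strace and ltrace cannot attach, the task can still be inspected
through /proc, which needs no ptrace. A minimal sketch, assuming a Linux
/proc filesystem; the function name task_state is made up for
illustration:

```shell
# Print the scheduler state letter (R, S, D, ...) of a PID,
# read from /proc/<pid>/status without attaching to the task.
task_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Example: state of the current shell.
task_state "$$"
```

For the hung process above this would be "task_state 26330". As root,
"cat /proc/26330/stack" may additionally show the kernel-side stack on
kernels built with CONFIG_STACKTRACE, although for a task that is
actively spinning on a CPU the result may not be reliable.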

Note: we are already on today's kernel:
Linux guest-virtio-dpdk 4.4.0-18-generic #34-Ubuntu SMP Wed Apr 6 14:01:02 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Existing ssh sessions still seem to work, but new sessions get stuck as
well. I need to prepare more sessions next time :-)

Next Steps:
 - analyze the code paths that the hangs point to
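
As a starting point for that analysis, the function the CPU was executing
can be pulled from the RIP line of each lockup report. A minimal sketch,
assuming the 4.4-style "RIP: ... symbol+0xoff/0xlen" format shown in the
journal excerpt; rip_symbol is a hypothetical helper name:

```shell
# Read kernel log text on stdin and print the symbol named on each RIP line.
rip_symbol() {
    sed -n 's/.*RIP: .*\] \([A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]*.*/\1/p'
}

# Example with the RIP line from the report above;
# prints: virtnet_send_command
printf '%s\n' ' RIP: 0010:[<ffffffff815f1a43>]  [<ffffffff815f1a43>] virtnet_send_command+0xf3/0x150' | rip_symbol
```

Fed from "journalctl -k" or "dmesg" and followed by "sort | uniq -c",
this would show whether every lockup lands in the same function.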

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1570195

Title:
  Network tools like "ethtool" or "ip" freezes when DPDK Apps are
  running with VirtIO

Status in dpdk package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Guys,

   I'm facing an issue with both "ethtool" and "ip" while trying to
  manage PCI VirtIO devices that are blacklisted in DPDK.

   You'll need an Ubuntu Xenial KVM guest with 4 VirtIO vNICs to run
  these tests.

   PCI device example from inside a Xenial guest:

  ---
  # lspci | grep Ethernet
  00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
  00:06.0 Ethernet controller: Red Hat, Inc Virtio network device
  ---

  Here "ens3" is the first / default interface, attached to Libvirt's
  "default" network. "ens4" is reserved for the "ethtool / ip" tests
  (attached to another Libvirt network without IPs or DHCP), "ens5"
  will be "dpdk0" and "ens6" will be "dpdk1".

  ---
   *** How does it work?

   1- For example, to enable multi-queue on the devices intended for
  DPDK, boot your Xenial guest and run:

   ethtool -L ens5 combined 4
   ethtool -L ens6 combined 4

   2- Install openvswitch-switch-dpdk, configure DPDK and OVS, and fire
  them up.

   https://help.ubuntu.com/16.04/serverguide/DPDK.html

   service openvswitch-switch stop
   service dpdk stop

   OVS DPDK Options (/etc/default/openvswitch-switch):

  --
  DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 1024 --pci-blacklist 0000:00:03.0,0000:00:04.0'
  --

   service dpdk start
   service openvswitch-switch start

   - Enable multi-queue on OVS+DPDK inside of the VM:

   ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=4
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xff00

   * Multi-queue apparently works! ovs-vswitchd consumes more than 100%
  of CPU, meaning that multi-queue is active...
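
For an interface that is still driven by the kernel (for example right
after step 1, before DPDK is started), a more direct check than CPU usage
is to read the channel count back with "ethtool -l" (lowercase, show
channels). A minimal sketch that pulls the current "Combined" value out
of that output; the helper name current_combined and the sample text are
illustrative and were not captured from this guest:

```shell
# Print the "Combined" value from the "Current hardware settings" section
# of `ethtool -l <dev>` output supplied on stdin.
current_combined() {
    awk '/^Current hardware settings/ { cur = 1 }
         cur && /^Combined:/ { print $2 }'
}

# Illustrative sample in the usual `ethtool -l` layout (not from this guest).
current_combined <<'EOF'
Channel parameters for ens4:
Pre-set maximums:
RX:             0
TX:             0
Other:          0
Combined:       8
Current hardware settings:
RX:             0
TX:             0
Other:          0
Combined:       4
EOF
```

On a live system the call would be "ethtool -l ens4 | current_combined".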

   *** Where does it fail?

   1- Reboot the VM and try to run ethtool again (or go straight to 2
  below):

   ethtool -L ens5 combined 4

   2- Try to fire up ens4:

   ip link set dev ens4 up

  
   # FAIL! Both commands hang, consuming 100% of the guest's CPU...

   So, it looks like a Linux fault, because it is "allowing" the DPDK
  VirtIO app (a userland app) to interfere with kernel devices in a
  strange way...

  Best,
  Thiago

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-18-generic 4.4.0-18.34
  ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
  Uname: Linux 4.4.0-18-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Apr 14 00:35 seq
   crw-rw---- 1 root audio 116, 33 Apr 14 00:35 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.1-0ubuntu1
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
  CRDA: N/A
  Date: Thu Apr 14 01:27:27 2016
  HibernationDevice: RESUME=UUID=833e999c-e066-433c-b8a2-4324bb8d56de
  InstallationDate: Installed on 2016-04-07 (7 days ago)
  InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Beta amd64 (20160406)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb:
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
  PciMultimedia:
   
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-18-generic root=UUID=9911604e-353b-491f-a0a9-804724350592 ro
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-18-generic N/A
   linux-backports-modules-4.4.0-18-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 04/01/2014
  dmi.bios.vendor: SeaBIOS
  dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
  dmi.chassis.type: 1
  dmi.chassis.vendor: QEMU
  dmi.chassis.version: pc-i440fx-wily
  dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-wily:cvnQEMU:ct1:cvrpc-i440fx-wily:
  dmi.product.name: Standard PC (i440FX + PIIX, 1996)
  dmi.product.version: pc-i440fx-wily
  dmi.sys.vendor: QEMU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1570195/+subscriptions

