kernel-packages team mailing list archive

[Bug 1476244] Re: openvswitch, ppc64el: oops when calling kmem_cache_free from flow_free

 

** Description changed:

  [Impact]
  
  Users of openvswitch on ppc64el 4.1+ kernels may run into the following
  kernel oops:
  
- [  168.987013] Faulting instruction address: 0xc000000000291d60
- [  168.987032] Oops: Kernel access of bad area, sig: 11 [#1]
- [  168.987086] SMP NR_CPUS=2048 NUMA PowerNV
- [  168.987134] Modules linked in: veth openvswitch libcrc32c ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs xor raid6_pq powernv_rng uio_pdrv_genirq uio autofs4 ses enclosure ipr
- [  168.987473] CPU: 16 PID: 996 Comm: kworker/16:1 Not tainted 4.1.0-1-generic #1~rc2-Ubuntu
- [  168.987546] Workqueue: events od_dbs_timer
- [  168.987592] task: c000000fe49ea400 ti: c000000fe1380000 task.ti: c000000fe1380000
- [  168.987659] NIP: c000000000291d60 LR: c000000000292254 CTR: c000000000292140
- [  168.987725] REGS: c000000fe1383170 TRAP: 0300   Not tainted  (4.1.0-1-generic)
- [  168.987791] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28028082  XER: 20000000
- [  168.987959] CFAR: c000000000008468 DAR: 00000000ffffffff DSISR: 42000000 SOFTE: 1 
- GPR00: c000000000292254 c000000fe13833f0 c0000000014bda00 c000000ff701f800 
- GPR04: f0000000003fffc0 00000000ffffffff d0000000120cd694 0000000000002fb2 
- GPR08: 0000000000000000 0000000000000000 0000000000210d00 d0000000120d2520 
- GPR12: c000000000292140 c00000000fb89000 c0000000000de5f8 000000000000000a 
- GPR16: c000000febbe3828 0000000000000001 0000000000000000 c0000000013d0c80 
- GPR20: c000000000aa36f8 7fffffffffffffff 0000000000000000 0000000000000001 
- GPR24: c0000000013c9200 0000000000210d00 00000000ffffffff c000000ff701f800 
- GPR28: 0000000000000001 0000000000000000 0000000000000000 f0000000003fffc0 
- [  168.988845] NIP [c000000000291d60] __slab_free+0x90/0x470
- [  168.988891] LR [c000000000292254] kmem_cache_free+0x114/0x2d0
- [  168.988947] Call Trace:
- [  168.988971] [c000000fe13833f0] [c000000fe1383500] 0xc000000fe1383500 (unreliable)
- [  168.989049] [c000000fe13834f0] [c000000000292254] kmem_cache_free+0x114/0x2d0
- [  168.989118] [c000000fe1383570] [d0000000120cd694] flow_free+0xa4/0x120 [openvswitch]
- [  168.989196] [c000000fe13835b0] [c000000000136090] rcu_process_callbacks+0x360/0x730
- [  168.989275] [c000000fe1383660] [c0000000000b824c] __do_softirq+0x19c/0x3b0
- [  168.989342] [c000000fe1383760] [c0000000000b86d8] irq_exit+0xc8/0x100
- [  168.989409] [c000000fe1383780] [c00000000003ed78] doorbell_exception+0xa8/0xe0
- [  168.989488] [c000000fe13837b0] [c000000000003314] h_doorbell_common+0x114/0x180
- [  168.989567] --- interrupt: e81 at osq_lock+0xb8/0x1f0
- [  168.989567]     LR = mutex_optimistic_spin+0xdc/0x280
- [  168.989656] [c000000fe1383aa0] [c00000000013e280] add_timer_on+0xc0/0x160 (unreliable)
- [  168.989734] [c000000fe1383ad0] [c000000000116f4c] mutex_optimistic_spin+0xdc/0x280
- [  168.989813] [c000000fe1383b30] [c000000000a60d04] __mutex_lock_slowpath+0x54/0x1f0
- [  168.989891] [c000000fe1383bb0] [c000000000a60f18] mutex_lock+0x78/0xa0
- [  168.989959] [c000000fe1383be0] [c0000000008965bc] od_dbs_timer+0x7c/0x1d0
- [  168.990026] [c000000fe1383c50] [c0000000000d6434] process_one_work+0x1a4/0x4c0
- [  168.990104] [c000000fe1383ce0] [c0000000000d68e4] worker_thread+0x194/0x600
- [  168.990171] [c000000fe1383d80] [c0000000000de700] kthread+0x110/0x130
- [  168.990239] [c000000fe1383e30] [c0000000000094f4] ret_from_kernel_thread+0x5c/0x68
- [  168.990316] Instruction dump:
- [  168.990349] 614a0d00 7d484838 2fa80000 409e029c 3f200021 3b800001 63390d00 408200a4 
- [  168.990461] e93b0022 82ff0018 ebdf0010 92e10078 <7fda492a> a1210078 3929ffff 79290420 
- [  168.990618] ---[ end trace bd509c1e05c7f71f ]---
+ Faulting instruction address: 0xc000000000291d60
+ Oops: Kernel access of bad area, sig: 11 [#1]
+ SMP NR_CPUS=2048 NUMA PowerNV
+ Modules linked in: veth openvswitch libcrc32c ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs xor raid6_pq powernv_rng uio_pdrv_genirq uio autofs4 ses enclosure ipr
+ CPU: 16 PID: 996 Comm: kworker/16:1 Not tainted 4.1.0-1-generic #1~rc2-Ubuntu
+ Workqueue: events od_dbs_timer
+ task: c000000fe49ea400 ti: c000000fe1380000 task.ti: c000000fe1380000
+ NIP: c000000000291d60 LR: c000000000292254 CTR: c000000000292140
+ REGS: c000000fe1383170 TRAP: 0300   Not tainted  (4.1.0-1-generic)
+ MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28028082  XER: 20000000
+ CFAR: c000000000008468 DAR: 00000000ffffffff DSISR: 42000000 SOFTE: 1
+ GPR00: c000000000292254 c000000fe13833f0 c0000000014bda00 c000000ff701f800
+ GPR04: f0000000003fffc0 00000000ffffffff d0000000120cd694 0000000000002fb2
+ GPR08: 0000000000000000 0000000000000000 0000000000210d00 d0000000120d2520
+ GPR12: c000000000292140 c00000000fb89000 c0000000000de5f8 000000000000000a
+ GPR16: c000000febbe3828 0000000000000001 0000000000000000 c0000000013d0c80
+ GPR20: c000000000aa36f8 7fffffffffffffff 0000000000000000 0000000000000001
+ GPR24: c0000000013c9200 0000000000210d00 00000000ffffffff c000000ff701f800
+ GPR28: 0000000000000001 0000000000000000 0000000000000000 f0000000003fffc0
+ NIP [c000000000291d60] __slab_free+0x90/0x470
+ LR [c000000000292254] kmem_cache_free+0x114/0x2d0
+ Call Trace:
+ [c000000fe13833f0] [c000000fe1383500] 0xc000000fe1383500 (unreliable)
+ [c000000fe13834f0] [c000000000292254] kmem_cache_free+0x114/0x2d0
+ [c000000fe1383570] [d0000000120cd694] flow_free+0xa4/0x120 [openvswitch]
+ [c000000fe13835b0] [c000000000136090] rcu_process_callbacks+0x360/0x730
+ [c000000fe1383660] [c0000000000b824c] __do_softirq+0x19c/0x3b0
+ [c000000fe1383760] [c0000000000b86d8] irq_exit+0xc8/0x100
+ [c000000fe1383780] [c00000000003ed78] doorbell_exception+0xa8/0xe0
+ [c000000fe13837b0] [c000000000003314] h_doorbell_common+0x114/0x180
+ --- interrupt: e81 at osq_lock+0xb8/0x1f0
+     LR = mutex_optimistic_spin+0xdc/0x280
+ [c000000fe1383aa0] [c00000000013e280] add_timer_on+0xc0/0x160 (unreliable)
+ [c000000fe1383ad0] [c000000000116f4c] mutex_optimistic_spin+0xdc/0x280
+ [c000000fe1383b30] [c000000000a60d04] __mutex_lock_slowpath+0x54/0x1f0
+ [c000000fe1383bb0] [c000000000a60f18] mutex_lock+0x78/0xa0
+ [c000000fe1383be0] [c0000000008965bc] od_dbs_timer+0x7c/0x1d0
+ [c000000fe1383c50] [c0000000000d6434] process_one_work+0x1a4/0x4c0
+ [c000000fe1383ce0] [c0000000000d68e4] worker_thread+0x194/0x600
+ [c000000fe1383d80] [c0000000000de700] kthread+0x110/0x130
+ [c000000fe1383e30] [c0000000000094f4] ret_from_kernel_thread+0x5c/0x68
+ Instruction dump:
+ 614a0d00 7d484838 2fa80000 409e029c 3f200021 3b800001 63390d00 408200a4
+ e93b0022 82ff0018 ebdf0010 92e10078 <7fda492a> a1210078 3929ffff 79290420
+ ---[ end trace bd509c1e05c7f71f ]---
  
- 
- This seems to not happen in VMs running on ppc64el, and doesn't occur on x86_64. I've also tested latest mainline tree and it also occurs.
+ This seems to not happen in VMs running on ppc64el, and doesn't occur on
+ x86_64. I've also tested latest mainline tree and it also occurs.
  
  [Test Case]
  
  # apt-get install openvswitch-switch
  # ip link add type veth peer name testveth0
  # ovs-vsctl add-br integbr

** Description changed:

  [Impact]
  
  Users of openvswitch on ppc64el 4.1+ kernels may run into the following
  kernel oops:
  
  Faulting instruction address: 0xc000000000291d60
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: veth openvswitch libcrc32c ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs xor raid6_pq powernv_rng uio_pdrv_genirq uio autofs4 ses enclosure ipr
  CPU: 16 PID: 996 Comm: kworker/16:1 Not tainted 4.1.0-1-generic #1~rc2-Ubuntu
  Workqueue: events od_dbs_timer
  task: c000000fe49ea400 ti: c000000fe1380000 task.ti: c000000fe1380000
  NIP: c000000000291d60 LR: c000000000292254 CTR: c000000000292140
  REGS: c000000fe1383170 TRAP: 0300   Not tainted  (4.1.0-1-generic)
  MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28028082  XER: 20000000
  CFAR: c000000000008468 DAR: 00000000ffffffff DSISR: 42000000 SOFTE: 1
  GPR00: c000000000292254 c000000fe13833f0 c0000000014bda00 c000000ff701f800
  GPR04: f0000000003fffc0 00000000ffffffff d0000000120cd694 0000000000002fb2
  GPR08: 0000000000000000 0000000000000000 0000000000210d00 d0000000120d2520
  GPR12: c000000000292140 c00000000fb89000 c0000000000de5f8 000000000000000a
  GPR16: c000000febbe3828 0000000000000001 0000000000000000 c0000000013d0c80
  GPR20: c000000000aa36f8 7fffffffffffffff 0000000000000000 0000000000000001
  GPR24: c0000000013c9200 0000000000210d00 00000000ffffffff c000000ff701f800
  GPR28: 0000000000000001 0000000000000000 0000000000000000 f0000000003fffc0
  NIP [c000000000291d60] __slab_free+0x90/0x470
  LR [c000000000292254] kmem_cache_free+0x114/0x2d0
  Call Trace:
  [c000000fe13833f0] [c000000fe1383500] 0xc000000fe1383500 (unreliable)
  [c000000fe13834f0] [c000000000292254] kmem_cache_free+0x114/0x2d0
  [c000000fe1383570] [d0000000120cd694] flow_free+0xa4/0x120 [openvswitch]
  [c000000fe13835b0] [c000000000136090] rcu_process_callbacks+0x360/0x730
  [c000000fe1383660] [c0000000000b824c] __do_softirq+0x19c/0x3b0
  [c000000fe1383760] [c0000000000b86d8] irq_exit+0xc8/0x100
  [c000000fe1383780] [c00000000003ed78] doorbell_exception+0xa8/0xe0
  [c000000fe13837b0] [c000000000003314] h_doorbell_common+0x114/0x180
  --- interrupt: e81 at osq_lock+0xb8/0x1f0
-     LR = mutex_optimistic_spin+0xdc/0x280
+     LR = mutex_optimistic_spin+0xdc/0x280
  [c000000fe1383aa0] [c00000000013e280] add_timer_on+0xc0/0x160 (unreliable)
  [c000000fe1383ad0] [c000000000116f4c] mutex_optimistic_spin+0xdc/0x280
  [c000000fe1383b30] [c000000000a60d04] __mutex_lock_slowpath+0x54/0x1f0
  [c000000fe1383bb0] [c000000000a60f18] mutex_lock+0x78/0xa0
  [c000000fe1383be0] [c0000000008965bc] od_dbs_timer+0x7c/0x1d0
  [c000000fe1383c50] [c0000000000d6434] process_one_work+0x1a4/0x4c0
  [c000000fe1383ce0] [c0000000000d68e4] worker_thread+0x194/0x600
  [c000000fe1383d80] [c0000000000de700] kthread+0x110/0x130
  [c000000fe1383e30] [c0000000000094f4] ret_from_kernel_thread+0x5c/0x68
  Instruction dump:
  614a0d00 7d484838 2fa80000 409e029c 3f200021 3b800001 63390d00 408200a4
  e93b0022 82ff0018 ebdf0010 92e10078 <7fda492a> a1210078 3929ffff 79290420
  ---[ end trace bd509c1e05c7f71f ]---
  
  This seems to not happen in VMs running on ppc64el, and doesn't occur on
  x86_64. I've also tested latest mainline tree and it also occurs.
  
+ With numa=off this problem doesn't occur.
+ 
  [Test Case]
  
  # apt-get install openvswitch-switch
  # ip link add type veth peer name testveth0
  # ovs-vsctl add-br integbr

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1476244

Title:
  openvswitch, ppc64el: oops when calling kmem_cache_free from flow_free

Status in linux package in Ubuntu:
  In Progress

Bug description:
  [Impact]

  Users of openvswitch on ppc64el 4.1+ kernels may run into the
  following kernel oops:

  Faulting instruction address: 0xc000000000291d60
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: veth openvswitch libcrc32c ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs xor raid6_pq powernv_rng uio_pdrv_genirq uio autofs4 ses enclosure ipr
  CPU: 16 PID: 996 Comm: kworker/16:1 Not tainted 4.1.0-1-generic #1~rc2-Ubuntu
  Workqueue: events od_dbs_timer
  task: c000000fe49ea400 ti: c000000fe1380000 task.ti: c000000fe1380000
  NIP: c000000000291d60 LR: c000000000292254 CTR: c000000000292140
  REGS: c000000fe1383170 TRAP: 0300   Not tainted  (4.1.0-1-generic)
  MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28028082  XER: 20000000
  CFAR: c000000000008468 DAR: 00000000ffffffff DSISR: 42000000 SOFTE: 1
  GPR00: c000000000292254 c000000fe13833f0 c0000000014bda00 c000000ff701f800
  GPR04: f0000000003fffc0 00000000ffffffff d0000000120cd694 0000000000002fb2
  GPR08: 0000000000000000 0000000000000000 0000000000210d00 d0000000120d2520
  GPR12: c000000000292140 c00000000fb89000 c0000000000de5f8 000000000000000a
  GPR16: c000000febbe3828 0000000000000001 0000000000000000 c0000000013d0c80
  GPR20: c000000000aa36f8 7fffffffffffffff 0000000000000000 0000000000000001
  GPR24: c0000000013c9200 0000000000210d00 00000000ffffffff c000000ff701f800
  GPR28: 0000000000000001 0000000000000000 0000000000000000 f0000000003fffc0
  NIP [c000000000291d60] __slab_free+0x90/0x470
  LR [c000000000292254] kmem_cache_free+0x114/0x2d0
  Call Trace:
  [c000000fe13833f0] [c000000fe1383500] 0xc000000fe1383500 (unreliable)
  [c000000fe13834f0] [c000000000292254] kmem_cache_free+0x114/0x2d0
  [c000000fe1383570] [d0000000120cd694] flow_free+0xa4/0x120 [openvswitch]
  [c000000fe13835b0] [c000000000136090] rcu_process_callbacks+0x360/0x730
  [c000000fe1383660] [c0000000000b824c] __do_softirq+0x19c/0x3b0
  [c000000fe1383760] [c0000000000b86d8] irq_exit+0xc8/0x100
  [c000000fe1383780] [c00000000003ed78] doorbell_exception+0xa8/0xe0
  [c000000fe13837b0] [c000000000003314] h_doorbell_common+0x114/0x180
  --- interrupt: e81 at osq_lock+0xb8/0x1f0
      LR = mutex_optimistic_spin+0xdc/0x280
  [c000000fe1383aa0] [c00000000013e280] add_timer_on+0xc0/0x160 (unreliable)
  [c000000fe1383ad0] [c000000000116f4c] mutex_optimistic_spin+0xdc/0x280
  [c000000fe1383b30] [c000000000a60d04] __mutex_lock_slowpath+0x54/0x1f0
  [c000000fe1383bb0] [c000000000a60f18] mutex_lock+0x78/0xa0
  [c000000fe1383be0] [c0000000008965bc] od_dbs_timer+0x7c/0x1d0
  [c000000fe1383c50] [c0000000000d6434] process_one_work+0x1a4/0x4c0
  [c000000fe1383ce0] [c0000000000d68e4] worker_thread+0x194/0x600
  [c000000fe1383d80] [c0000000000de700] kthread+0x110/0x130
  [c000000fe1383e30] [c0000000000094f4] ret_from_kernel_thread+0x5c/0x68
  Instruction dump:
  614a0d00 7d484838 2fa80000 409e029c 3f200021 3b800001 63390d00 408200a4
  e93b0022 82ff0018 ebdf0010 92e10078 <7fda492a> a1210078 3929ffff 79290420
  ---[ end trace bd509c1e05c7f71f ]---

  This does not seem to happen in VMs running on ppc64el, and it does
  not occur on x86_64. I have also tested the latest mainline tree, and
  the oops occurs there as well.

  With numa=off this problem doesn't occur.

  [Test Case]

  # apt-get install openvswitch-switch
  # ip link add type veth peer name testveth0
  # ovs-vsctl add-br integbr
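
When re-running the test case, it can help to check a saved kernel log for the same signature and to confirm whether the machine was booted with numa=off. The following is only a minimal sketch and not part of the original report: the function names are hypothetical, and treating the three matched strings as identifying this particular oops is an assumption based on the trace above.

```shell
#!/bin/sh
# Hypothetical helpers (not from the bug report) for triaging this oops.

# Succeeds if the kernel log saved in file $1 contains a "bad area" oops
# together with kmem_cache_free and openvswitch's flow_free frames.
# Assumption: these three strings are enough to recognise the reported trace.
has_flow_free_oops() {
    log_file=$1
    grep -q 'Oops: Kernel access of bad area' "$log_file" &&
        grep -q 'kmem_cache_free+0x' "$log_file" &&
        grep -q 'flow_free+0x.*\[openvswitch\]' "$log_file"
}

# Succeeds if the kernel command line stored in file $1
# (default: /proc/cmdline) contains the exact parameter numa=off.
booted_with_numa_off() {
    cmdline_file=${1:-/proc/cmdline}
    # Put one parameter per line so grep -x can match the whole word.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'numa=off'
}
```

For example, after running the test case one could save the log with `dmesg > /tmp/dmesg.txt`, call `has_flow_free_oops /tmp/dmesg.txt`, and call `booted_with_numa_off` to record which boot configuration the result applies to.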

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1476244/+subscriptions

