kernel-packages team mailing list archive

[Bug 1467912] [NEW] error "ib_dealloc_pd failed" when load unload ib_ipoib module

 

Public bug reported:

Loading the ib_ipoib driver and then unloading it produces an error
message.

Steps to reproduce:

load ib_ipoib: modprobe -v ib_ipoib
configure ip: ifconfig ib0 11.135.196.7/16
unload driver: modprobe -rv ib_ipoib

The following message then appears in dmesg:

[  709.652944] ib0: ib_dealloc_pd failed
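
The steps above could be scripted roughly as follows. This is only a sketch: the function names has_dealloc_pd_error and reproduce are made up for illustration, the script assumes root privileges and an IPoIB-capable device that shows up as ib0, and the address is simply the one used in this report.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Succeeds (exit 0) if the log text on stdin contains the error from this bug.
has_dealloc_pd_error() {
    grep -q 'ib_dealloc_pd failed'
}

# Runs the three steps from the report, then checks the kernel log.
# Needs root and an IPoIB-capable device; the address is the one from the report.
reproduce() {
    modprobe -v ib_ipoib &&
    ifconfig ib0 11.135.196.7/16 &&
    modprobe -rv ib_ipoib
    if dmesg | has_dealloc_pd_error; then
        echo "bug reproduced: ib_dealloc_pd failed"
    else
        echo "no ib_dealloc_pd failure in dmesg"
    fi
}

# Only touch the system when explicitly asked to: sh repro.sh --run
if [ "${1:-}" = "--run" ]; then
    reproduce
fi
```

Note that dmesg also keeps messages from earlier runs, so a match may predate the current load/unload cycle.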

uname output:

root@qa-h-vrt-035:~# uname -a
Linux qa-h-vrt-035 3.19.0-16-generic #16-Ubuntu SMP Thu Apr 30 16:09:58 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The following two upstream commits fix this issue:

commit 9ab874b6593045886b699df2bc3ff803d88a9f7c
Author: Doug Ledford <dledford@xxxxxxxxxx>
Date:   Sat Feb 21 19:27:00 2015 -0500

    IB/ipoib: change init sequence ordering
    
    In preparation for using per device work queues, we need to move the
    start of the neighbor thread task to after ipoib_ib_dev_init and move
    the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
    Otherwise we will end up freeing our workqueue with work possibly
    still on it.
    
    Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>


commit 6387d8d5896536b904ba6937fe019a29548e3a86
Author: Doug Ledford <dledford@xxxxxxxxxx>
Date:   Sat Feb 21 19:26:59 2015 -0500

    IB/ipoib: factor out ah flushing
    
    Create ipoib_flush_ah and ipoib_stop_ah routines to use at
    appropriate times to flush out all remaining ah entries before we shut
    the device down.
    
    Because neighbors and mcast entries can each have a reference on any
    given ah, we must make sure to free all of those first before our ah
    will actually have a 0 refcount and be able to be reaped.
    
    This factoring is needed in preparation for having per-device work
    queues.  The original per-device workqueue code resulted in the following
    error message:
    
    <ibdev>: ib_dealloc_pd failed
    
    That error was tracked down to this issue.  With the changes to which
    workqueues were flushed when, there were no flushes of the per device
    workqueue after the last ah's were freed, resulting in an attempt to
    dealloc the pd with outstanding resources still allocated.  This code
    puts the explicit flushes in the needed places to avoid that problem.
    
    Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>
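
To check whether a given kernel git tree already carries these two fixes, something like the following might help. It is a sketch under assumptions: has_fix is a hypothetical helper, the tree path is supplied by the caller, and the match is done on the commit subject rather than the hash, since a backported commit usually gets a new hash but keeps its subject line.

```shell
#!/bin/sh
# Sketch only: has_fix is a hypothetical helper, not part of any tool.

# Usage: has_fix <path-to-git-tree> <commit-subject>
# Succeeds if some commit reachable from HEAD has the subject in its message.
has_fix() {
    [ -n "$(git -C "$1" log --oneline --fixed-strings --grep="$2" -n 1)" ]
}

# Report on both fixes when given a tree: sh check-fixes.sh /path/to/linux
if [ -n "${1:-}" ]; then
    for subject in \
        'IB/ipoib: change init sequence ordering' \
        'IB/ipoib: factor out ah flushing'
    do
        if has_fix "$1" "$subject"; then
            echo "present: $subject"
        else
            echo "missing: $subject"
        fi
    done
fi
```

A subject match only shows that a commit with that title exists in the history; it does not prove the backport is complete or identical to the upstream change.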

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: vivid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1467912

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1467912/+subscriptions