← Back to team overview

kernel-packages team mailing list archive

[Bug 1557172] Re: khubd/usbhid deadlock(?) creates processes in state D

 

Hi Joseph

I'll see if I can nab some extra hardware to try it out. Keep in mind
that I haven't found a way to reproduce this reliably yet, with our
usual setup it can take a few weeks before it happens. Basically we have
a cron job that periodically exports the state of our machines, which
includes calling lsusb. Eventually this bug happens, all future calls to
the job get stuck in D, and the machine eventually runs out of
resources, requiring it to be rebooted.

I did browse kernel sources upstream and didn't notice anything that
would appear to fix this issue, but I'm certainly no kernel-guru.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1557172

Title:
  khubd/usbhid deadlock(?) creates processes in state D

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  (Note that the attached logs are not a machine exhibiting the issue)

  I've observed this issue specifically on Dell's PowerEdge R410 with
  its DRAC. I manage quite a few other machines with DRACs and haven't
  seen this issue so it may be limited to the R410. I've seen cases of
  this both with the precise and trusty kernel.

  Basically, what happens is that things that read any info about the
  DRAC's usbhid device will block in an uninterruptible state. A quick
  way to do this is with "cat
  /sys/devices/pci0000:00/0000:00:1d.0/usb5/5-2/manufacturer". This
  means that commands like lspci will block as well. The thing is that
  it doesn't happen every time, but once it happens the first time all
  future reads on the device will block in state D too.

  This is also associated with the following from the kernel (first from a precise machine):
  [197852.595820] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197857.716899] usb 5-2: device descriptor read/64, error -71
  [197857.944966] usb 5-2: device descriptor read/64, error -71
  [197858.165020] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197858.285090] usb 5-2: device descriptor read/64, error -71
  [197858.513108] usb 5-2: device descriptor read/64, error -71
  [197858.733154] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197859.145214] usb 5-2: device not accepting address 2, error -71
  [197859.261234] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197859.669322] usb 5-2: device not accepting address 2, error -71
  [197859.675357] usb 5-2: USB disconnect, device number 2
  [198047.530714] INFO: task khubd:203 blocked for more than 120 seconds.
  [198047.537147]       Not tainted 3.13.0-74-generic #118~precise1-Ubuntu
  [198047.543691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [198047.551792] khubd           D ffff880803fe18a0     0   203      2 0x00000000
  [198047.551803]  ffff8810036b1c18 0000000000000046 ffff88102fc53180 ffff8810036b1fd8
  [198047.551809]  0000000000013180 0000000000013180 ffff881003f7c800 ffff880804634800
  [198047.551814]  ffff8810036b1c38 ffff8810020908e8 ffff8810020908ec 00000000ffffffff
  [198047.551819] Call Trace:
  [198047.551828]  [<ffffffff81764b59>] schedule+0x29/0x70
  [198047.551833]  [<ffffffff81764e7e>] schedule_preempt_disabled+0xe/0x10
  [198047.551837]  [<ffffffff81766cb4>] __mutex_lock_slowpath+0x114/0x1b0
  [198047.551845]  [<ffffffff8156e66e>] ? usb_alloc_urb+0x1e/0x50
  [198047.551848]  [<ffffffff81766d73>] mutex_lock+0x23/0x37
  [198047.551852]  [<ffffffff8156597a>] usb_disconnect+0x5a/0x210
  [198047.551857]  [<ffffffff8156ff6a>] ? usb_control_msg+0xea/0x110
  [198047.551860]  [<ffffffff8156833f>] hub_port_connect_change+0xcf/0x9c0
  [198047.551864]  [<ffffffff81562715>] ? hub_port_status+0xd5/0x120
  [198047.551868]  [<ffffffff81569094>] hub_events+0x464/0x8c0
  [198047.551871]  [<ffffffff8176472e>] ? __schedule+0x38e/0x700
  [198047.551875]  [<ffffffff81569525>] hub_thread+0x35/0x150
  [198047.551881]  [<ffffffff810b0410>] ? __wake_up_sync+0x20/0x20
  [198047.551885]  [<ffffffff815694f0>] ? hub_events+0x8c0/0x8c0
  [198047.551890]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [198047.551893]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [198047.551898]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [198047.551902]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [198047.551908] INFO: task kworker/9:1:246 blocked for more than 120 seconds.
  [198047.558892]       Not tainted 3.13.0-74-generic #118~precise1-Ubuntu
  [198047.565491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [198047.573524] kworker/9:1     D ffff880803fe18a0     0   246      2 0x00000000
  [198047.573547] Workqueue: events hid_reset [usbhid]
  [198047.573550]  ffff8808011f9788 0000000000000046 ffff88080fc93180 ffff8808011f9fd8
  [198047.573555]  0000000000013180 0000000000013180 ffff881003f94800 ffff8808011f0000
  [198047.573559]  ffff8808011f9798 ffff8808011f98d8 7fffffffffffffff 7fffffffffffffff
  [198047.573564] Call Trace:
  [198047.573579]  [<ffffffff81764b59>] schedule+0x29/0x70
  [198047.573582]  [<ffffffff81763e25>] schedule_timeout+0x1e5/0x250
  [198047.573587]  [<ffffffff810a4b1d>] ? wake_affine+0x16d/0x2d0
  [198047.573591]  [<ffffffff81765df7>] wait_for_completion+0xa7/0x160
  [198047.573596]  [<ffffffff8109fe00>] ? try_to_wake_up+0x210/0x210
  [198047.573602]  [<ffffffff81089509>] flush_work+0x29/0x40
  [198047.573605]  [<ffffffff81085170>] ? start_worker+0x40/0x40
  [198047.573609]  [<ffffffff8108962c>] __cancel_work_timer+0x9c/0x1b0
  [198047.573623]  [<ffffffff81089770>] cancel_work_sync+0x10/0x20
  [198047.573632]  [<ffffffffa00a6a19>] hid_cancel_delayed_stuff+0x29/0x30 [usbhid]
  [198047.573640]  [<ffffffffa00a88cc>] usbhid_close+0xcc/0x110 [usbhid]
  [198047.573650]  [<ffffffffa0036772>] hidinput_close+0x22/0x30 [hid]
  [198047.573659]  [<ffffffff815b3bf1>] input_close_device+0x61/0x90
  [198047.573663]  [<ffffffff815b9b7f>] evdev_cleanup+0xbf/0xd0
  [198047.573670]  [<ffffffff815b9bc6>] evdev_disconnect+0x36/0x70
  [198047.573674]  [<ffffffff815b5b75>] __input_unregister_device+0xc5/0x1a0
  [198047.573678]  [<ffffffff815b5ca5>] input_unregister_device+0x55/0x80
  [198047.573687]  [<ffffffffa0036105>] hidinput_disconnect+0x95/0xc0 [hid]
  [198047.573695]  [<ffffffffa0033548>] hid_disconnect+0x68/0x70 [hid]
  [198047.573708]  [<ffffffffa0033625>] hid_device_remove+0xd5/0xf0 [hid]
  [198047.573720]  [<ffffffff814b7c7f>] __device_release_driver+0x7f/0xf0
  [198047.573724]  [<ffffffff814b7d1c>] device_release_driver+0x2c/0x40
  [198047.573728]  [<ffffffff814b76c4>] bus_remove_device+0x104/0x170
  [198047.573736]  [<ffffffff814b46e8>] device_del+0x118/0x1a0
  [198047.573744]  [<ffffffffa0032af0>] hid_destroy_device+0x30/0x70 [hid]
  [198047.573754]  [<ffffffffa00a6a4b>] usbhid_disconnect+0x2b/0x50 [usbhid]
  [198047.573758]  [<ffffffff81573380>] usb_unbind_interface+0x60/0x1b0
  [198047.573764]  [<ffffffff814b7c7f>] __device_release_driver+0x7f/0xf0
  [198047.573768]  [<ffffffff814b7d1c>] device_release_driver+0x2c/0x40
  [198047.573772]  [<ffffffff81573560>] usb_driver_release_interface+0x90/0xa0
  [198047.573780]  [<ffffffffa00a8c51>] ? hid_post_reset+0x51/0x200 [usbhid]
  [198047.573785]  [<ffffffff815735a7>] usb_forced_unbind_intf+0x37/0x60
  [198047.573788]  [<ffffffff81573629>] unbind_marked_interfaces.isra.8+0x59/0x80
  [198047.573792]  [<ffffffff815737dd>] usb_unbind_and_rebind_marked_interfaces+0x1d/0x30
  [198047.573796]  [<ffffffff81567c1b>] usb_reset_device+0x15b/0x1d0
  [198047.573804]  [<ffffffffa00a7e2f>] hid_reset+0x16f/0x1e0 [usbhid]
  [198047.573808]  [<ffffffff81087c5f>] process_one_work+0x17f/0x4c0
  [198047.573811]  [<ffffffff81088dbb>] worker_thread+0x11b/0x3d0
  [198047.573815]  [<ffffffff81088ca0>] ? manage_workers.isra.21+0x190/0x190
  [198047.573819]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [198047.573823]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [198047.573827]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [198047.573830]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0

  And these from a trusty machine (which includes info from one of the stuck processes too):
  [1367979.659164] SysRq : Show Blocked State
  [1367979.663114]   task                        PC stack   pid father
  [1367979.669300] khubd           D ffff88100054f160     0   203      2 0x00000000
  [1367979.676574]  ffff88100352fc18 0000000000000046 ffff88080fc93180 ffff88100352ffd8
  [1367979.684216]  0000000000013180 0000000000013180 ffff8808049f4800 ffff8808043f4800
  [1367979.691855]  ffff88100352fc38 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
  [1367979.699495] Call Trace:
  [1367979.702134]  [<ffffffff81764b69>] schedule+0x29/0x70
  [1367979.707288]  [<ffffffff81764e8e>] schedule_preempt_disabled+0xe/0x10
  [1367979.713831]  [<ffffffff81766cc4>] __mutex_lock_slowpath+0x114/0x1b0
  [1367979.720288]  [<ffffffff8156e67e>] ? usb_alloc_urb+0x1e/0x50
  [1367979.726049]  [<ffffffff81766d83>] mutex_lock+0x23/0x37
  [1367979.731376]  [<ffffffff8156598a>] usb_disconnect+0x5a/0x210
  [1367979.737138]  [<ffffffff8156ff7a>] ? usb_control_msg+0xea/0x110
  [1367979.743160]  [<ffffffff8156834f>] hub_port_connect_change+0xcf/0x9c0
  [1367979.749703]  [<ffffffff81562725>] ? hub_port_status+0xd5/0x120
  [1367979.755724]  [<ffffffff815690a4>] hub_events+0x464/0x8c0
  [1367979.761223]  [<ffffffff8176473e>] ? __schedule+0x38e/0x700
  [1367979.766897]  [<ffffffff81569535>] hub_thread+0x35/0x150
  [1367979.772313]  [<ffffffff810b0410>] ? __wake_up_sync+0x20/0x20
  [1367979.778160]  [<ffffffff81569500>] ? hub_events+0x8c0/0x8c0
  [1367979.784942]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [1367979.790008]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367979.796378]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [1367979.801965]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367979.808345] kworker/8:1     D ffff88100054f160     0   329      2 0x00000000
  [1367979.815632] Workqueue: events hid_reset [usbhid]
  [1367979.820445]  ffff881002625788 0000000000000046 ffff88102fc93180 ffff881002625fd8
  [1367979.828088]  0000000000013180 0000000000013180 ffff8808049f3000 ffff881002639800
  [1367979.835726]  ffff881002625798 ffff8810026258d8 7fffffffffffffff 7fffffffffffffff
  [1367979.843367] Call Trace:
  [1367979.845999]  [<ffffffff81764b69>] schedule+0x29/0x70
  [1367979.851151]  [<ffffffff81763e35>] schedule_timeout+0x1e5/0x250
  [1367979.857174]  [<ffffffff810aa9cf>] ? enqueue_entity+0x2bf/0x760
  [1367979.863196]  [<ffffffff81765e07>] wait_for_completion+0xa7/0x160
  [1367979.869392]  [<ffffffff8109fe00>] ? try_to_wake_up+0x210/0x210
  [1367979.875414]  [<ffffffff81089509>] flush_work+0x29/0x40
  [1367979.880741]  [<ffffffff81085170>] ? start_worker+0x40/0x40
  [1367979.886415]  [<ffffffff8108962c>] __cancel_work_timer+0x9c/0x1b0
  [1367979.892609]  [<ffffffff81089770>] cancel_work_sync+0x10/0x20
  [1367979.898461]  [<ffffffffa0072a19>] hid_cancel_delayed_stuff+0x29/0x30 [usbhid]
  [1367979.905801]  [<ffffffffa00748cc>] usbhid_close+0xcc/0x110 [usbhid]
  [1367979.912176]  [<ffffffffa0042772>] hidinput_close+0x22/0x30 [hid]
  [1367979.918374]  [<ffffffff815b3c01>] input_close_device+0x61/0x90
  [1367979.924397]  [<ffffffff815b9b8f>] evdev_cleanup+0xbf/0xd0
  [1367979.929983]  [<ffffffff815b9bd6>] evdev_disconnect+0x36/0x70
  [1367979.935832]  [<ffffffff815b5b85>] __input_unregister_device+0xc5/0x1a0
  [1367979.942547]  [<ffffffff815b5cb5>] input_unregister_device+0x55/0x80
  [1367979.949007]  [<ffffffffa0042105>] hidinput_disconnect+0x95/0xc0 [hid]
  [1367979.955638]  [<ffffffffa003f548>] hid_disconnect+0x68/0x70 [hid]
  [1367979.961837]  [<ffffffffa003f625>] hid_device_remove+0xd5/0xf0 [hid]
  [1367979.968298]  [<ffffffff814b7c8f>] __device_release_driver+0x7f/0xf0
  [1367979.974752]  [<ffffffff814b7d2c>] device_release_driver+0x2c/0x40
  [1367979.981095]  [<ffffffff814b76d4>] bus_remove_device+0x104/0x170
  [1367979.987202]  [<ffffffff814b46f8>] device_del+0x118/0x1a0
  [1367979.992707]  [<ffffffffa003eaf0>] hid_destroy_device+0x30/0x70 [hid]
  [1367979.999253]  [<ffffffffa0072a4b>] usbhid_disconnect+0x2b/0x50 [usbhid]
  [1367980.005968]  [<ffffffff81573390>] usb_unbind_interface+0x60/0x1b0
  [1367980.012252]  [<ffffffff814b7c8f>] __device_release_driver+0x7f/0xf0
  [1367980.018710]  [<ffffffff814b7d2c>] device_release_driver+0x2c/0x40
  [1367980.024992]  [<ffffffff81573570>] usb_driver_release_interface+0x90/0xa0
  [1367980.031885]  [<ffffffffa0074c51>] ? hid_post_reset+0x51/0x200 [usbhid]
  [1367980.038603]  [<ffffffff815735b7>] usb_forced_unbind_intf+0x37/0x60
  [1367980.044971]  [<ffffffff81573639>] unbind_marked_interfaces.isra.8+0x59/0x80
  [1367980.052121]  [<ffffffff815737ed>] usb_unbind_and_rebind_marked_interfaces+0x1d/0x30
  [1367980.059981]  [<ffffffff81567c2b>] usb_reset_device+0x15b/0x1d0
  [1367980.066004]  [<ffffffffa0073e2f>] hid_reset+0x16f/0x1e0 [usbhid]
  [1367980.072199]  [<ffffffff81087c5f>] process_one_work+0x17f/0x4c0
  [1367980.078221]  [<ffffffff81088dbb>] worker_thread+0x11b/0x3d0
  [1367980.083982]  [<ffffffff81088ca0>] ? manage_workers.isra.21+0x190/0x190
  [1367980.090699]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [1367980.095765]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367980.102132]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [1367980.107719]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367980.114098] python2.7       D ffff88100054f160     0 18303      1 0x00000004
  [1367980.121369]  ffff881001497d08 0000000000000082 ffff88080fc53180 ffff881001497fd8
  [1367980.129007]  0000000000013180 0000000000013180 ffff881003f86000 ffff881001688000
  [1367980.136648]  0000881000000024 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
  [1367980.144286] Call Trace:
  [1367980.146921]  [<ffffffff81764b69>] schedule+0x29/0x70
  [1367980.152073]  [<ffffffff81764e8e>] schedule_preempt_disabled+0xe/0x10
  [1367980.158615]  [<ffffffff81766cc4>] __mutex_lock_slowpath+0x114/0x1b0
  [1367980.165069]  [<ffffffff81766d83>] mutex_lock+0x23/0x37
  [1367980.170397]  [<ffffffff81576a8b>] manufacturer_show+0x2b/0x60
  [1367980.176331]  [<ffffffff814b4890>] dev_attr_show+0x20/0x60
  [1367980.181918]  [<ffffffff81766d76>] ? mutex_lock+0x16/0x37
  [1367980.187418]  [<ffffffff811eec6b>] ? seq_buf_alloc+0x1b/0x50
  [1367980.193180]  [<ffffffff81242875>] sysfs_seq_show+0x145/0x260
  [1367980.199027]  [<ffffffff811ef3fb>] seq_read+0xfb/0x3c0
  [1367980.204267]  [<ffffffff811cbc8b>] vfs_read+0xab/0x180
  [1367980.209506]  [<ffffffff811cbf12>] SyS_read+0x52/0xa0
  [1367980.214657]  [<ffffffff8177181d>] system_call_fastpath+0x1a/0x1f

  lsusb for the device in question:
  $ sudo lsusb -v -d 0624:0248

  Bus 005 Device 002: ID 0624:0248 Avocent Corp. Virtual Hub
  Device Descriptor:
    bLength                18
    bDescriptorType         1
    bcdUSB               1.10
    bDeviceClass            0 (Defined at Interface level)
    bDeviceSubClass         0 
    bDeviceProtocol         0 
    bMaxPacketSize0        64
    idVendor           0x0624 Avocent Corp.
    idProduct          0x0248 Virtual Hub
    bcdDevice            0.00
    iManufacturer           1 Avocent
    iProduct                2 USB Composite Device-0
    iSerial                 3 20080519
    bNumConfigurations      1
    Configuration Descriptor:
      bLength                 9
      bDescriptorType         2
      wTotalLength           59
      bNumInterfaces          2
      bConfigurationValue     1
      iConfiguration          0 
      bmAttributes         0xc0
        Self Powered
      MaxPower                2mA
      Interface Descriptor:
        bLength                 9
        bDescriptorType         4
        bInterfaceNumber        0
        bAlternateSetting       0
        bNumEndpoints           1
        bInterfaceClass         3 Human Interface Device
        bInterfaceSubClass      1 Boot Interface Subclass
        bInterfaceProtocol      1 Keyboard
        iInterface              4 Keyboard
          HID Device Descriptor:
            bLength                 9
            bDescriptorType        33
            bcdHID               1.00
            bCountryCode            0 Not supported
            bNumDescriptors         1
            bDescriptorType        34 Report
            wDescriptorLength      65
           Report Descriptors: 
             ** UNAVAILABLE **
        Endpoint Descriptor:
          bLength                 7
          bDescriptorType         5
          bEndpointAddress     0x81  EP 1 IN
          bmAttributes            3
            Transfer Type            Interrupt
            Synch Type               None
            Usage Type               Data
          wMaxPacketSize     0x0008  1x 8 bytes
          bInterval               8
      Interface Descriptor:
        bLength                 9
        bDescriptorType         4
        bInterfaceNumber        1
        bAlternateSetting       0
        bNumEndpoints           1
        bInterfaceClass         3 Human Interface Device
        bInterfaceSubClass      1 Boot Interface Subclass
        bInterfaceProtocol      2 Mouse
        iInterface              5 Mouse
          HID Device Descriptor:
            bLength                 9
            bDescriptorType        33
            bcdHID               1.00
            bCountryCode            0 Not supported
            bNumDescriptors         1
            bDescriptorType        34 Report
            wDescriptorLength      63
           Report Descriptors: 
             ** UNAVAILABLE **
        Endpoint Descriptor:
          bLength                 7
          bDescriptorType         5
          bEndpointAddress     0x82  EP 2 IN
          bmAttributes            3
            Transfer Type            Interrupt
            Synch Type               None
            Usage Type               Data
          wMaxPacketSize     0x0008  1x 8 bytes
          bInterval               8
  Device Status:     0x0000
    (Bus Powered)

  And now for my purely speculative analysis of what might be going on
  after digging through the code...

  There are two devices interacting here, the usbhid that is the
  keyboard for the DRAC and the hub that it's connected to.

  As a part of the usbhid initialization (usbhid_probe) it creates a
  tasklet for hid_reset (I have no idea why, but perhaps it needs to
  reset the underlying hid/usb devices to start working) which unloads
  and reloads all the parts of the hidusb driver (usb, hid, input,
  evdev). As a part of the unload for the usbhid it calls
  hid_cancel_delayed_stuff (don't let it fool you, it actually works
  with a usbhid device, not a hid device) iff it is no longer opened,
  which normally it should already be opened, but...

  I haven't been able to confirm this, but I think that if the hub
  closes it can close the devices associated with it, which gives us a
  fun race condition where if we start the probe for the usbhid device
  and then the underlying hub gets removed it will have no instances
  open, which causes usbhid to call hid_cancel_delayed_stuff, from its
  own cancel tasklet. Unsurprisingly when it does this it ends up
  waiting on itself forever.

  I'm currently working around the issue by adding a udev rule to
  prevent the device from being used by setting authorized=0 for it.
  This obviously prevents the DRAC from actually working, though.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1557172/+subscriptions


References