kernel-packages team mailing list archive
  
  - 
     kernel-packages team kernel-packages team
- 
    Mailing list archive
  
- 
    Message #181372
  
 [Bug 1557172] Re: khubd/usbhid deadlock(?) creates processes in state D
  
I'm running Debian stable with the 4.5 kernel for 8 days without restart, now.
First I did some 5 suspend/resumes without any error. Then I used it as my desktop machine,
doing at least one suspend/resume every day. I'm convinced the issue I had is fixed in the 4.5 kernel.
Attached the difference between the Debian 3.16 and 4.5 hid-core.c (hid-core.c-3.16-to-4.5.diff).
It's quite a bit more than a one-line fix.
** Patch added: "Difference between Debian 3.16 and 4.5 hid-core.c"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1557172/+attachment/4673143/+files/hid-core.c-3.16-to-4.5.diff
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1557172
Title:
  khubd/usbhid deadlock(?) creates processes in state D
Status in linux package in Ubuntu:
  Incomplete
Bug description:
  (Note that the attached logs are not a machine exhibiting the issue)
  I've observed this issue specifically on Dell's PowerEdge R410 with
  its DRAC. I manage quite a few other machines with DRACs and haven't
  seen this issue so it may be limited to the R410. I've seen cases of
  this both with the precise and trusty kernel.
  Basically, what happens is that things that read any info about the
  DRAC's usbhid device will block in an uninterruptible state. A quick
  way to do this is with "cat
  /sys/devices/pci0000:00/0000:00:1d.0/usb5/5-2/manufacturer". This
  means that commands like lspci will block as well. The thing is that
  it doesn't happen every time, but once it happens the first time all
  future reads on the device will block in state D too.
  This is also associated with the following from the kernel (first from a precise machine):
  [197852.595820] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197857.716899] usb 5-2: device descriptor read/64, error -71
  [197857.944966] usb 5-2: device descriptor read/64, error -71
  [197858.165020] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197858.285090] usb 5-2: device descriptor read/64, error -71
  [197858.513108] usb 5-2: device descriptor read/64, error -71
  [197858.733154] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197859.145214] usb 5-2: device not accepting address 2, error -71
  [197859.261234] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
  [197859.669322] usb 5-2: device not accepting address 2, error -71
  [197859.675357] usb 5-2: USB disconnect, device number 2
  [198047.530714] INFO: task khubd:203 blocked for more than 120 seconds.
  [198047.537147]       Not tainted 3.13.0-74-generic #118~precise1-Ubuntu
  [198047.543691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [198047.551792] khubd           D ffff880803fe18a0     0   203      2 0x00000000
  [198047.551803]  ffff8810036b1c18 0000000000000046 ffff88102fc53180 ffff8810036b1fd8
  [198047.551809]  0000000000013180 0000000000013180 ffff881003f7c800 ffff880804634800
  [198047.551814]  ffff8810036b1c38 ffff8810020908e8 ffff8810020908ec 00000000ffffffff
  [198047.551819] Call Trace:
  [198047.551828]  [<ffffffff81764b59>] schedule+0x29/0x70
  [198047.551833]  [<ffffffff81764e7e>] schedule_preempt_disabled+0xe/0x10
  [198047.551837]  [<ffffffff81766cb4>] __mutex_lock_slowpath+0x114/0x1b0
  [198047.551845]  [<ffffffff8156e66e>] ? usb_alloc_urb+0x1e/0x50
  [198047.551848]  [<ffffffff81766d73>] mutex_lock+0x23/0x37
  [198047.551852]  [<ffffffff8156597a>] usb_disconnect+0x5a/0x210
  [198047.551857]  [<ffffffff8156ff6a>] ? usb_control_msg+0xea/0x110
  [198047.551860]  [<ffffffff8156833f>] hub_port_connect_change+0xcf/0x9c0
  [198047.551864]  [<ffffffff81562715>] ? hub_port_status+0xd5/0x120
  [198047.551868]  [<ffffffff81569094>] hub_events+0x464/0x8c0
  [198047.551871]  [<ffffffff8176472e>] ? __schedule+0x38e/0x700
  [198047.551875]  [<ffffffff81569525>] hub_thread+0x35/0x150
  [198047.551881]  [<ffffffff810b0410>] ? __wake_up_sync+0x20/0x20
  [198047.551885]  [<ffffffff815694f0>] ? hub_events+0x8c0/0x8c0
  [198047.551890]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [198047.551893]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [198047.551898]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [198047.551902]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [198047.551908] INFO: task kworker/9:1:246 blocked for more than 120 seconds.
  [198047.558892]       Not tainted 3.13.0-74-generic #118~precise1-Ubuntu
  [198047.565491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [198047.573524] kworker/9:1     D ffff880803fe18a0     0   246      2 0x00000000
  [198047.573547] Workqueue: events hid_reset [usbhid]
  [198047.573550]  ffff8808011f9788 0000000000000046 ffff88080fc93180 ffff8808011f9fd8
  [198047.573555]  0000000000013180 0000000000013180 ffff881003f94800 ffff8808011f0000
  [198047.573559]  ffff8808011f9798 ffff8808011f98d8 7fffffffffffffff 7fffffffffffffff
  [198047.573564] Call Trace:
  [198047.573579]  [<ffffffff81764b59>] schedule+0x29/0x70
  [198047.573582]  [<ffffffff81763e25>] schedule_timeout+0x1e5/0x250
  [198047.573587]  [<ffffffff810a4b1d>] ? wake_affine+0x16d/0x2d0
  [198047.573591]  [<ffffffff81765df7>] wait_for_completion+0xa7/0x160
  [198047.573596]  [<ffffffff8109fe00>] ? try_to_wake_up+0x210/0x210
  [198047.573602]  [<ffffffff81089509>] flush_work+0x29/0x40
  [198047.573605]  [<ffffffff81085170>] ? start_worker+0x40/0x40
  [198047.573609]  [<ffffffff8108962c>] __cancel_work_timer+0x9c/0x1b0
  [198047.573623]  [<ffffffff81089770>] cancel_work_sync+0x10/0x20
  [198047.573632]  [<ffffffffa00a6a19>] hid_cancel_delayed_stuff+0x29/0x30 [usbhid]
  [198047.573640]  [<ffffffffa00a88cc>] usbhid_close+0xcc/0x110 [usbhid]
  [198047.573650]  [<ffffffffa0036772>] hidinput_close+0x22/0x30 [hid]
  [198047.573659]  [<ffffffff815b3bf1>] input_close_device+0x61/0x90
  [198047.573663]  [<ffffffff815b9b7f>] evdev_cleanup+0xbf/0xd0
  [198047.573670]  [<ffffffff815b9bc6>] evdev_disconnect+0x36/0x70
  [198047.573674]  [<ffffffff815b5b75>] __input_unregister_device+0xc5/0x1a0
  [198047.573678]  [<ffffffff815b5ca5>] input_unregister_device+0x55/0x80
  [198047.573687]  [<ffffffffa0036105>] hidinput_disconnect+0x95/0xc0 [hid]
  [198047.573695]  [<ffffffffa0033548>] hid_disconnect+0x68/0x70 [hid]
  [198047.573708]  [<ffffffffa0033625>] hid_device_remove+0xd5/0xf0 [hid]
  [198047.573720]  [<ffffffff814b7c7f>] __device_release_driver+0x7f/0xf0
  [198047.573724]  [<ffffffff814b7d1c>] device_release_driver+0x2c/0x40
  [198047.573728]  [<ffffffff814b76c4>] bus_remove_device+0x104/0x170
  [198047.573736]  [<ffffffff814b46e8>] device_del+0x118/0x1a0
  [198047.573744]  [<ffffffffa0032af0>] hid_destroy_device+0x30/0x70 [hid]
  [198047.573754]  [<ffffffffa00a6a4b>] usbhid_disconnect+0x2b/0x50 [usbhid]
  [198047.573758]  [<ffffffff81573380>] usb_unbind_interface+0x60/0x1b0
  [198047.573764]  [<ffffffff814b7c7f>] __device_release_driver+0x7f/0xf0
  [198047.573768]  [<ffffffff814b7d1c>] device_release_driver+0x2c/0x40
  [198047.573772]  [<ffffffff81573560>] usb_driver_release_interface+0x90/0xa0
  [198047.573780]  [<ffffffffa00a8c51>] ? hid_post_reset+0x51/0x200 [usbhid]
  [198047.573785]  [<ffffffff815735a7>] usb_forced_unbind_intf+0x37/0x60
  [198047.573788]  [<ffffffff81573629>] unbind_marked_interfaces.isra.8+0x59/0x80
  [198047.573792]  [<ffffffff815737dd>] usb_unbind_and_rebind_marked_interfaces+0x1d/0x30
  [198047.573796]  [<ffffffff81567c1b>] usb_reset_device+0x15b/0x1d0
  [198047.573804]  [<ffffffffa00a7e2f>] hid_reset+0x16f/0x1e0 [usbhid]
  [198047.573808]  [<ffffffff81087c5f>] process_one_work+0x17f/0x4c0
  [198047.573811]  [<ffffffff81088dbb>] worker_thread+0x11b/0x3d0
  [198047.573815]  [<ffffffff81088ca0>] ? manage_workers.isra.21+0x190/0x190
  [198047.573819]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [198047.573823]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [198047.573827]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [198047.573830]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  And these from a trusty machine (which includes info from one of the stuck processes too):
  [1367979.659164] SysRq : Show Blocked State
  [1367979.663114]   task                        PC stack   pid father
  [1367979.669300] khubd           D ffff88100054f160     0   203      2 0x00000000
  [1367979.676574]  ffff88100352fc18 0000000000000046 ffff88080fc93180 ffff88100352ffd8
  [1367979.684216]  0000000000013180 0000000000013180 ffff8808049f4800 ffff8808043f4800
  [1367979.691855]  ffff88100352fc38 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
  [1367979.699495] Call Trace:
  [1367979.702134]  [<ffffffff81764b69>] schedule+0x29/0x70
  [1367979.707288]  [<ffffffff81764e8e>] schedule_preempt_disabled+0xe/0x10
  [1367979.713831]  [<ffffffff81766cc4>] __mutex_lock_slowpath+0x114/0x1b0
  [1367979.720288]  [<ffffffff8156e67e>] ? usb_alloc_urb+0x1e/0x50
  [1367979.726049]  [<ffffffff81766d83>] mutex_lock+0x23/0x37
  [1367979.731376]  [<ffffffff8156598a>] usb_disconnect+0x5a/0x210
  [1367979.737138]  [<ffffffff8156ff7a>] ? usb_control_msg+0xea/0x110
  [1367979.743160]  [<ffffffff8156834f>] hub_port_connect_change+0xcf/0x9c0
  [1367979.749703]  [<ffffffff81562725>] ? hub_port_status+0xd5/0x120
  [1367979.755724]  [<ffffffff815690a4>] hub_events+0x464/0x8c0
  [1367979.761223]  [<ffffffff8176473e>] ? __schedule+0x38e/0x700
  [1367979.766897]  [<ffffffff81569535>] hub_thread+0x35/0x150
  [1367979.772313]  [<ffffffff810b0410>] ? __wake_up_sync+0x20/0x20
  [1367979.778160]  [<ffffffff81569500>] ? hub_events+0x8c0/0x8c0
  [1367979.784942]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [1367979.790008]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367979.796378]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [1367979.801965]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367979.808345] kworker/8:1     D ffff88100054f160     0   329      2 0x00000000
  [1367979.815632] Workqueue: events hid_reset [usbhid]
  [1367979.820445]  ffff881002625788 0000000000000046 ffff88102fc93180 ffff881002625fd8
  [1367979.828088]  0000000000013180 0000000000013180 ffff8808049f3000 ffff881002639800
  [1367979.835726]  ffff881002625798 ffff8810026258d8 7fffffffffffffff 7fffffffffffffff
  [1367979.843367] Call Trace:
  [1367979.845999]  [<ffffffff81764b69>] schedule+0x29/0x70
  [1367979.851151]  [<ffffffff81763e35>] schedule_timeout+0x1e5/0x250
  [1367979.857174]  [<ffffffff810aa9cf>] ? enqueue_entity+0x2bf/0x760
  [1367979.863196]  [<ffffffff81765e07>] wait_for_completion+0xa7/0x160
  [1367979.869392]  [<ffffffff8109fe00>] ? try_to_wake_up+0x210/0x210
  [1367979.875414]  [<ffffffff81089509>] flush_work+0x29/0x40
  [1367979.880741]  [<ffffffff81085170>] ? start_worker+0x40/0x40
  [1367979.886415]  [<ffffffff8108962c>] __cancel_work_timer+0x9c/0x1b0
  [1367979.892609]  [<ffffffff81089770>] cancel_work_sync+0x10/0x20
  [1367979.898461]  [<ffffffffa0072a19>] hid_cancel_delayed_stuff+0x29/0x30 [usbhid]
  [1367979.905801]  [<ffffffffa00748cc>] usbhid_close+0xcc/0x110 [usbhid]
  [1367979.912176]  [<ffffffffa0042772>] hidinput_close+0x22/0x30 [hid]
  [1367979.918374]  [<ffffffff815b3c01>] input_close_device+0x61/0x90
  [1367979.924397]  [<ffffffff815b9b8f>] evdev_cleanup+0xbf/0xd0
  [1367979.929983]  [<ffffffff815b9bd6>] evdev_disconnect+0x36/0x70
  [1367979.935832]  [<ffffffff815b5b85>] __input_unregister_device+0xc5/0x1a0
  [1367979.942547]  [<ffffffff815b5cb5>] input_unregister_device+0x55/0x80
  [1367979.949007]  [<ffffffffa0042105>] hidinput_disconnect+0x95/0xc0 [hid]
  [1367979.955638]  [<ffffffffa003f548>] hid_disconnect+0x68/0x70 [hid]
  [1367979.961837]  [<ffffffffa003f625>] hid_device_remove+0xd5/0xf0 [hid]
  [1367979.968298]  [<ffffffff814b7c8f>] __device_release_driver+0x7f/0xf0
  [1367979.974752]  [<ffffffff814b7d2c>] device_release_driver+0x2c/0x40
  [1367979.981095]  [<ffffffff814b76d4>] bus_remove_device+0x104/0x170
  [1367979.987202]  [<ffffffff814b46f8>] device_del+0x118/0x1a0
  [1367979.992707]  [<ffffffffa003eaf0>] hid_destroy_device+0x30/0x70 [hid]
  [1367979.999253]  [<ffffffffa0072a4b>] usbhid_disconnect+0x2b/0x50 [usbhid]
  [1367980.005968]  [<ffffffff81573390>] usb_unbind_interface+0x60/0x1b0
  [1367980.012252]  [<ffffffff814b7c8f>] __device_release_driver+0x7f/0xf0
  [1367980.018710]  [<ffffffff814b7d2c>] device_release_driver+0x2c/0x40
  [1367980.024992]  [<ffffffff81573570>] usb_driver_release_interface+0x90/0xa0
  [1367980.031885]  [<ffffffffa0074c51>] ? hid_post_reset+0x51/0x200 [usbhid]
  [1367980.038603]  [<ffffffff815735b7>] usb_forced_unbind_intf+0x37/0x60
  [1367980.044971]  [<ffffffff81573639>] unbind_marked_interfaces.isra.8+0x59/0x80
  [1367980.052121]  [<ffffffff815737ed>] usb_unbind_and_rebind_marked_interfaces+0x1d/0x30
  [1367980.059981]  [<ffffffff81567c2b>] usb_reset_device+0x15b/0x1d0
  [1367980.066004]  [<ffffffffa0073e2f>] hid_reset+0x16f/0x1e0 [usbhid]
  [1367980.072199]  [<ffffffff81087c5f>] process_one_work+0x17f/0x4c0
  [1367980.078221]  [<ffffffff81088dbb>] worker_thread+0x11b/0x3d0
  [1367980.083982]  [<ffffffff81088ca0>] ? manage_workers.isra.21+0x190/0x190
  [1367980.090699]  [<ffffffff8108ff09>] kthread+0xc9/0xe0
  [1367980.095765]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367980.102132]  [<ffffffff81771768>] ret_from_fork+0x58/0x90
  [1367980.107719]  [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
  [1367980.114098] python2.7       D ffff88100054f160     0 18303      1 0x00000004
  [1367980.121369]  ffff881001497d08 0000000000000082 ffff88080fc53180 ffff881001497fd8
  [1367980.129007]  0000000000013180 0000000000013180 ffff881003f86000 ffff881001688000
  [1367980.136648]  0000881000000024 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
  [1367980.144286] Call Trace:
  [1367980.146921]  [<ffffffff81764b69>] schedule+0x29/0x70
  [1367980.152073]  [<ffffffff81764e8e>] schedule_preempt_disabled+0xe/0x10
  [1367980.158615]  [<ffffffff81766cc4>] __mutex_lock_slowpath+0x114/0x1b0
  [1367980.165069]  [<ffffffff81766d83>] mutex_lock+0x23/0x37
  [1367980.170397]  [<ffffffff81576a8b>] manufacturer_show+0x2b/0x60
  [1367980.176331]  [<ffffffff814b4890>] dev_attr_show+0x20/0x60
  [1367980.181918]  [<ffffffff81766d76>] ? mutex_lock+0x16/0x37
  [1367980.187418]  [<ffffffff811eec6b>] ? seq_buf_alloc+0x1b/0x50
  [1367980.193180]  [<ffffffff81242875>] sysfs_seq_show+0x145/0x260
  [1367980.199027]  [<ffffffff811ef3fb>] seq_read+0xfb/0x3c0
  [1367980.204267]  [<ffffffff811cbc8b>] vfs_read+0xab/0x180
  [1367980.209506]  [<ffffffff811cbf12>] SyS_read+0x52/0xa0
  [1367980.214657]  [<ffffffff8177181d>] system_call_fastpath+0x1a/0x1f
  lsusb for the device in question:
  $ sudo lsusb -v -d 0624:0248
  Bus 005 Device 002: ID 0624:0248 Avocent Corp. Virtual Hub
  Device Descriptor:
    bLength                18
    bDescriptorType         1
    bcdUSB               1.10
    bDeviceClass            0 (Defined at Interface level)
    bDeviceSubClass         0 
    bDeviceProtocol         0 
    bMaxPacketSize0        64
    idVendor           0x0624 Avocent Corp.
    idProduct          0x0248 Virtual Hub
    bcdDevice            0.00
    iManufacturer           1 Avocent
    iProduct                2 USB Composite Device-0
    iSerial                 3 20080519
    bNumConfigurations      1
    Configuration Descriptor:
      bLength                 9
      bDescriptorType         2
      wTotalLength           59
      bNumInterfaces          2
      bConfigurationValue     1
      iConfiguration          0 
      bmAttributes         0xc0
        Self Powered
      MaxPower                2mA
      Interface Descriptor:
        bLength                 9
        bDescriptorType         4
        bInterfaceNumber        0
        bAlternateSetting       0
        bNumEndpoints           1
        bInterfaceClass         3 Human Interface Device
        bInterfaceSubClass      1 Boot Interface Subclass
        bInterfaceProtocol      1 Keyboard
        iInterface              4 Keyboard
          HID Device Descriptor:
            bLength                 9
            bDescriptorType        33
            bcdHID               1.00
            bCountryCode            0 Not supported
            bNumDescriptors         1
            bDescriptorType        34 Report
            wDescriptorLength      65
           Report Descriptors: 
             ** UNAVAILABLE **
        Endpoint Descriptor:
          bLength                 7
          bDescriptorType         5
          bEndpointAddress     0x81  EP 1 IN
          bmAttributes            3
            Transfer Type            Interrupt
            Synch Type               None
            Usage Type               Data
          wMaxPacketSize     0x0008  1x 8 bytes
          bInterval               8
      Interface Descriptor:
        bLength                 9
        bDescriptorType         4
        bInterfaceNumber        1
        bAlternateSetting       0
        bNumEndpoints           1
        bInterfaceClass         3 Human Interface Device
        bInterfaceSubClass      1 Boot Interface Subclass
        bInterfaceProtocol      2 Mouse
        iInterface              5 Mouse
          HID Device Descriptor:
            bLength                 9
            bDescriptorType        33
            bcdHID               1.00
            bCountryCode            0 Not supported
            bNumDescriptors         1
            bDescriptorType        34 Report
            wDescriptorLength      63
           Report Descriptors: 
             ** UNAVAILABLE **
        Endpoint Descriptor:
          bLength                 7
          bDescriptorType         5
          bEndpointAddress     0x82  EP 2 IN
          bmAttributes            3
            Transfer Type            Interrupt
            Synch Type               None
            Usage Type               Data
          wMaxPacketSize     0x0008  1x 8 bytes
          bInterval               8
  Device Status:     0x0000
    (Bus Powered)
  And now for my purely speculative analysis of what might be going on
  after digging through the code...
  There are two devices interacting here, the usbhid that is the
  keyboard for the DRAC and the hub that it's connected to.
  As a part of the usbhid initialization (usbhid_probe) it creates a
  tasklet for hid_reset (I have no idea why, but perhaps it needs to
  reset the underlying hid/usb devices to start working) which unloads
  and reloads all the parts of the hidusb driver (usb, hid, input,
  evdev). As a part of the unload for the usbhid it calls
  hid_cancel_delayed_stuff (don't let it fool you, it actually works
  with a usbhid device, not a hid device) iff it is no longer opened,
  which normally it should already be opened, but...
  I haven't been able to confirm this, but I think that if the hub
  closes it can close the devices associated with it, which gives us a
  fun race condition where if we start the probe for the usbhid device
  and then the underlying hub gets removed it will have no instances
  open, which causes usbhid to call hid_cancel_delayed_stuff, from its
  own cancel tasklet. Unsurprisingly when it does this it ends up
  waiting on itself forever.
  I'm currently working around the issue by adding a udev rule to
  prevent the device from being used by setting authorized=0 for it.
  This obviously prevents the DRAC from actually working, though.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1557172/+subscriptions
References