kernel-packages team mailing list archive
-
kernel-packages team
-
Mailing list archive
-
Message #168073
[Bug 1557172] Re: khubd/usbhid deadlock(?) creates processes in state D
Hi Joseph
I'll see if I can nab some extra hardware to try it out. Keep in mind
that I haven't found a way to reproduce this reliably yet, with our
usual setup it can take a few weeks before it happens. Basically we have
a cron job that periodically exports the state of our machines, which
includes calling lsusb. Eventually this bug happens, all future calls to
the job get stuck in D, and the machine eventually runs out of
resources, requiring it to be rebooted.
I did browse kernel sources upstream and didn't notice anything that
would appear to fix this issue, but I'm certainly no kernel-guru.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1557172
Title:
khubd/usbhid deadlock(?) creates processes in state D
Status in linux package in Ubuntu:
Incomplete
Bug description:
(Note that the attached logs are not a machine exhibiting the issue)
I've observed this issue specifically on Dell's PowerEdge R410 with
its DRAC. I manage quite a few other machines with DRACs and haven't
seen this issue so it may be limited to the R410. I've seen cases of
this both with the precise and trusty kernel.
Basically, what happens is that things that read any info about the
DRAC's usbhid device will block in an uninterruptible state. A quick
way to do this is with "cat
/sys/devices/pci0000:00/0000:00:1d.0/usb5/5-2/manufacturer". This
means that commands like lspci will block as well. The thing is that
it doesn't happen every time, but once it happens the first time all
future reads on the device will block in state D too.
This is also associated with the following from the kernel (first from a precise machine):
[197852.595820] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197857.716899] usb 5-2: device descriptor read/64, error -71
[197857.944966] usb 5-2: device descriptor read/64, error -71
[197858.165020] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197858.285090] usb 5-2: device descriptor read/64, error -71
[197858.513108] usb 5-2: device descriptor read/64, error -71
[197858.733154] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197859.145214] usb 5-2: device not accepting address 2, error -71
[197859.261234] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197859.669322] usb 5-2: device not accepting address 2, error -71
[197859.675357] usb 5-2: USB disconnect, device number 2
[198047.530714] INFO: task khubd:203 blocked for more than 120 seconds.
[198047.537147] Not tainted 3.13.0-74-generic #118~precise1-Ubuntu
[198047.543691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[198047.551792] khubd D ffff880803fe18a0 0 203 2 0x00000000
[198047.551803] ffff8810036b1c18 0000000000000046 ffff88102fc53180 ffff8810036b1fd8
[198047.551809] 0000000000013180 0000000000013180 ffff881003f7c800 ffff880804634800
[198047.551814] ffff8810036b1c38 ffff8810020908e8 ffff8810020908ec 00000000ffffffff
[198047.551819] Call Trace:
[198047.551828] [<ffffffff81764b59>] schedule+0x29/0x70
[198047.551833] [<ffffffff81764e7e>] schedule_preempt_disabled+0xe/0x10
[198047.551837] [<ffffffff81766cb4>] __mutex_lock_slowpath+0x114/0x1b0
[198047.551845] [<ffffffff8156e66e>] ? usb_alloc_urb+0x1e/0x50
[198047.551848] [<ffffffff81766d73>] mutex_lock+0x23/0x37
[198047.551852] [<ffffffff8156597a>] usb_disconnect+0x5a/0x210
[198047.551857] [<ffffffff8156ff6a>] ? usb_control_msg+0xea/0x110
[198047.551860] [<ffffffff8156833f>] hub_port_connect_change+0xcf/0x9c0
[198047.551864] [<ffffffff81562715>] ? hub_port_status+0xd5/0x120
[198047.551868] [<ffffffff81569094>] hub_events+0x464/0x8c0
[198047.551871] [<ffffffff8176472e>] ? __schedule+0x38e/0x700
[198047.551875] [<ffffffff81569525>] hub_thread+0x35/0x150
[198047.551881] [<ffffffff810b0410>] ? __wake_up_sync+0x20/0x20
[198047.551885] [<ffffffff815694f0>] ? hub_events+0x8c0/0x8c0
[198047.551890] [<ffffffff8108ff09>] kthread+0xc9/0xe0
[198047.551893] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
[198047.551898] [<ffffffff81771768>] ret_from_fork+0x58/0x90
[198047.551902] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
[198047.551908] INFO: task kworker/9:1:246 blocked for more than 120 seconds.
[198047.558892] Not tainted 3.13.0-74-generic #118~precise1-Ubuntu
[198047.565491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[198047.573524] kworker/9:1 D ffff880803fe18a0 0 246 2 0x00000000
[198047.573547] Workqueue: events hid_reset [usbhid]
[198047.573550] ffff8808011f9788 0000000000000046 ffff88080fc93180 ffff8808011f9fd8
[198047.573555] 0000000000013180 0000000000013180 ffff881003f94800 ffff8808011f0000
[198047.573559] ffff8808011f9798 ffff8808011f98d8 7fffffffffffffff 7fffffffffffffff
[198047.573564] Call Trace:
[198047.573579] [<ffffffff81764b59>] schedule+0x29/0x70
[198047.573582] [<ffffffff81763e25>] schedule_timeout+0x1e5/0x250
[198047.573587] [<ffffffff810a4b1d>] ? wake_affine+0x16d/0x2d0
[198047.573591] [<ffffffff81765df7>] wait_for_completion+0xa7/0x160
[198047.573596] [<ffffffff8109fe00>] ? try_to_wake_up+0x210/0x210
[198047.573602] [<ffffffff81089509>] flush_work+0x29/0x40
[198047.573605] [<ffffffff81085170>] ? start_worker+0x40/0x40
[198047.573609] [<ffffffff8108962c>] __cancel_work_timer+0x9c/0x1b0
[198047.573623] [<ffffffff81089770>] cancel_work_sync+0x10/0x20
[198047.573632] [<ffffffffa00a6a19>] hid_cancel_delayed_stuff+0x29/0x30 [usbhid]
[198047.573640] [<ffffffffa00a88cc>] usbhid_close+0xcc/0x110 [usbhid]
[198047.573650] [<ffffffffa0036772>] hidinput_close+0x22/0x30 [hid]
[198047.573659] [<ffffffff815b3bf1>] input_close_device+0x61/0x90
[198047.573663] [<ffffffff815b9b7f>] evdev_cleanup+0xbf/0xd0
[198047.573670] [<ffffffff815b9bc6>] evdev_disconnect+0x36/0x70
[198047.573674] [<ffffffff815b5b75>] __input_unregister_device+0xc5/0x1a0
[198047.573678] [<ffffffff815b5ca5>] input_unregister_device+0x55/0x80
[198047.573687] [<ffffffffa0036105>] hidinput_disconnect+0x95/0xc0 [hid]
[198047.573695] [<ffffffffa0033548>] hid_disconnect+0x68/0x70 [hid]
[198047.573708] [<ffffffffa0033625>] hid_device_remove+0xd5/0xf0 [hid]
[198047.573720] [<ffffffff814b7c7f>] __device_release_driver+0x7f/0xf0
[198047.573724] [<ffffffff814b7d1c>] device_release_driver+0x2c/0x40
[198047.573728] [<ffffffff814b76c4>] bus_remove_device+0x104/0x170
[198047.573736] [<ffffffff814b46e8>] device_del+0x118/0x1a0
[198047.573744] [<ffffffffa0032af0>] hid_destroy_device+0x30/0x70 [hid]
[198047.573754] [<ffffffffa00a6a4b>] usbhid_disconnect+0x2b/0x50 [usbhid]
[198047.573758] [<ffffffff81573380>] usb_unbind_interface+0x60/0x1b0
[198047.573764] [<ffffffff814b7c7f>] __device_release_driver+0x7f/0xf0
[198047.573768] [<ffffffff814b7d1c>] device_release_driver+0x2c/0x40
[198047.573772] [<ffffffff81573560>] usb_driver_release_interface+0x90/0xa0
[198047.573780] [<ffffffffa00a8c51>] ? hid_post_reset+0x51/0x200 [usbhid]
[198047.573785] [<ffffffff815735a7>] usb_forced_unbind_intf+0x37/0x60
[198047.573788] [<ffffffff81573629>] unbind_marked_interfaces.isra.8+0x59/0x80
[198047.573792] [<ffffffff815737dd>] usb_unbind_and_rebind_marked_interfaces+0x1d/0x30
[198047.573796] [<ffffffff81567c1b>] usb_reset_device+0x15b/0x1d0
[198047.573804] [<ffffffffa00a7e2f>] hid_reset+0x16f/0x1e0 [usbhid]
[198047.573808] [<ffffffff81087c5f>] process_one_work+0x17f/0x4c0
[198047.573811] [<ffffffff81088dbb>] worker_thread+0x11b/0x3d0
[198047.573815] [<ffffffff81088ca0>] ? manage_workers.isra.21+0x190/0x190
[198047.573819] [<ffffffff8108ff09>] kthread+0xc9/0xe0
[198047.573823] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
[198047.573827] [<ffffffff81771768>] ret_from_fork+0x58/0x90
[198047.573830] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
And these from a trusty machine (which includes info from one of the stuck processes too):
[1367979.659164] SysRq : Show Blocked State
[1367979.663114] task PC stack pid father
[1367979.669300] khubd D ffff88100054f160 0 203 2 0x00000000
[1367979.676574] ffff88100352fc18 0000000000000046 ffff88080fc93180 ffff88100352ffd8
[1367979.684216] 0000000000013180 0000000000013180 ffff8808049f4800 ffff8808043f4800
[1367979.691855] ffff88100352fc38 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
[1367979.699495] Call Trace:
[1367979.702134] [<ffffffff81764b69>] schedule+0x29/0x70
[1367979.707288] [<ffffffff81764e8e>] schedule_preempt_disabled+0xe/0x10
[1367979.713831] [<ffffffff81766cc4>] __mutex_lock_slowpath+0x114/0x1b0
[1367979.720288] [<ffffffff8156e67e>] ? usb_alloc_urb+0x1e/0x50
[1367979.726049] [<ffffffff81766d83>] mutex_lock+0x23/0x37
[1367979.731376] [<ffffffff8156598a>] usb_disconnect+0x5a/0x210
[1367979.737138] [<ffffffff8156ff7a>] ? usb_control_msg+0xea/0x110
[1367979.743160] [<ffffffff8156834f>] hub_port_connect_change+0xcf/0x9c0
[1367979.749703] [<ffffffff81562725>] ? hub_port_status+0xd5/0x120
[1367979.755724] [<ffffffff815690a4>] hub_events+0x464/0x8c0
[1367979.761223] [<ffffffff8176473e>] ? __schedule+0x38e/0x700
[1367979.766897] [<ffffffff81569535>] hub_thread+0x35/0x150
[1367979.772313] [<ffffffff810b0410>] ? __wake_up_sync+0x20/0x20
[1367979.778160] [<ffffffff81569500>] ? hub_events+0x8c0/0x8c0
[1367979.784942] [<ffffffff8108ff09>] kthread+0xc9/0xe0
[1367979.790008] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
[1367979.796378] [<ffffffff81771768>] ret_from_fork+0x58/0x90
[1367979.801965] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
[1367979.808345] kworker/8:1 D ffff88100054f160 0 329 2 0x00000000
[1367979.815632] Workqueue: events hid_reset [usbhid]
[1367979.820445] ffff881002625788 0000000000000046 ffff88102fc93180 ffff881002625fd8
[1367979.828088] 0000000000013180 0000000000013180 ffff8808049f3000 ffff881002639800
[1367979.835726] ffff881002625798 ffff8810026258d8 7fffffffffffffff 7fffffffffffffff
[1367979.843367] Call Trace:
[1367979.845999] [<ffffffff81764b69>] schedule+0x29/0x70
[1367979.851151] [<ffffffff81763e35>] schedule_timeout+0x1e5/0x250
[1367979.857174] [<ffffffff810aa9cf>] ? enqueue_entity+0x2bf/0x760
[1367979.863196] [<ffffffff81765e07>] wait_for_completion+0xa7/0x160
[1367979.869392] [<ffffffff8109fe00>] ? try_to_wake_up+0x210/0x210
[1367979.875414] [<ffffffff81089509>] flush_work+0x29/0x40
[1367979.880741] [<ffffffff81085170>] ? start_worker+0x40/0x40
[1367979.886415] [<ffffffff8108962c>] __cancel_work_timer+0x9c/0x1b0
[1367979.892609] [<ffffffff81089770>] cancel_work_sync+0x10/0x20
[1367979.898461] [<ffffffffa0072a19>] hid_cancel_delayed_stuff+0x29/0x30 [usbhid]
[1367979.905801] [<ffffffffa00748cc>] usbhid_close+0xcc/0x110 [usbhid]
[1367979.912176] [<ffffffffa0042772>] hidinput_close+0x22/0x30 [hid]
[1367979.918374] [<ffffffff815b3c01>] input_close_device+0x61/0x90
[1367979.924397] [<ffffffff815b9b8f>] evdev_cleanup+0xbf/0xd0
[1367979.929983] [<ffffffff815b9bd6>] evdev_disconnect+0x36/0x70
[1367979.935832] [<ffffffff815b5b85>] __input_unregister_device+0xc5/0x1a0
[1367979.942547] [<ffffffff815b5cb5>] input_unregister_device+0x55/0x80
[1367979.949007] [<ffffffffa0042105>] hidinput_disconnect+0x95/0xc0 [hid]
[1367979.955638] [<ffffffffa003f548>] hid_disconnect+0x68/0x70 [hid]
[1367979.961837] [<ffffffffa003f625>] hid_device_remove+0xd5/0xf0 [hid]
[1367979.968298] [<ffffffff814b7c8f>] __device_release_driver+0x7f/0xf0
[1367979.974752] [<ffffffff814b7d2c>] device_release_driver+0x2c/0x40
[1367979.981095] [<ffffffff814b76d4>] bus_remove_device+0x104/0x170
[1367979.987202] [<ffffffff814b46f8>] device_del+0x118/0x1a0
[1367979.992707] [<ffffffffa003eaf0>] hid_destroy_device+0x30/0x70 [hid]
[1367979.999253] [<ffffffffa0072a4b>] usbhid_disconnect+0x2b/0x50 [usbhid]
[1367980.005968] [<ffffffff81573390>] usb_unbind_interface+0x60/0x1b0
[1367980.012252] [<ffffffff814b7c8f>] __device_release_driver+0x7f/0xf0
[1367980.018710] [<ffffffff814b7d2c>] device_release_driver+0x2c/0x40
[1367980.024992] [<ffffffff81573570>] usb_driver_release_interface+0x90/0xa0
[1367980.031885] [<ffffffffa0074c51>] ? hid_post_reset+0x51/0x200 [usbhid]
[1367980.038603] [<ffffffff815735b7>] usb_forced_unbind_intf+0x37/0x60
[1367980.044971] [<ffffffff81573639>] unbind_marked_interfaces.isra.8+0x59/0x80
[1367980.052121] [<ffffffff815737ed>] usb_unbind_and_rebind_marked_interfaces+0x1d/0x30
[1367980.059981] [<ffffffff81567c2b>] usb_reset_device+0x15b/0x1d0
[1367980.066004] [<ffffffffa0073e2f>] hid_reset+0x16f/0x1e0 [usbhid]
[1367980.072199] [<ffffffff81087c5f>] process_one_work+0x17f/0x4c0
[1367980.078221] [<ffffffff81088dbb>] worker_thread+0x11b/0x3d0
[1367980.083982] [<ffffffff81088ca0>] ? manage_workers.isra.21+0x190/0x190
[1367980.090699] [<ffffffff8108ff09>] kthread+0xc9/0xe0
[1367980.095765] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
[1367980.102132] [<ffffffff81771768>] ret_from_fork+0x58/0x90
[1367980.107719] [<ffffffff8108fe40>] ? flush_kthread_worker+0xb0/0xb0
[1367980.114098] python2.7 D ffff88100054f160 0 18303 1 0x00000004
[1367980.121369] ffff881001497d08 0000000000000082 ffff88080fc53180 ffff881001497fd8
[1367980.129007] 0000000000013180 0000000000013180 ffff881003f86000 ffff881001688000
[1367980.136648] 0000881000000024 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
[1367980.144286] Call Trace:
[1367980.146921] [<ffffffff81764b69>] schedule+0x29/0x70
[1367980.152073] [<ffffffff81764e8e>] schedule_preempt_disabled+0xe/0x10
[1367980.158615] [<ffffffff81766cc4>] __mutex_lock_slowpath+0x114/0x1b0
[1367980.165069] [<ffffffff81766d83>] mutex_lock+0x23/0x37
[1367980.170397] [<ffffffff81576a8b>] manufacturer_show+0x2b/0x60
[1367980.176331] [<ffffffff814b4890>] dev_attr_show+0x20/0x60
[1367980.181918] [<ffffffff81766d76>] ? mutex_lock+0x16/0x37
[1367980.187418] [<ffffffff811eec6b>] ? seq_buf_alloc+0x1b/0x50
[1367980.193180] [<ffffffff81242875>] sysfs_seq_show+0x145/0x260
[1367980.199027] [<ffffffff811ef3fb>] seq_read+0xfb/0x3c0
[1367980.204267] [<ffffffff811cbc8b>] vfs_read+0xab/0x180
[1367980.209506] [<ffffffff811cbf12>] SyS_read+0x52/0xa0
[1367980.214657] [<ffffffff8177181d>] system_call_fastpath+0x1a/0x1f
lsusb for the device in question:
$ sudo lsusb -v -d 0624:0248
Bus 005 Device 002: ID 0624:0248 Avocent Corp. Virtual Hub
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x0624 Avocent Corp.
idProduct 0x0248 Virtual Hub
bcdDevice 0.00
iManufacturer 1 Avocent
iProduct 2 USB Composite Device-0
iSerial 3 20080519
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 59
bNumInterfaces 2
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 2mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 3 Human Interface Device
bInterfaceSubClass 1 Boot Interface Subclass
bInterfaceProtocol 1 Keyboard
iInterface 4 Keyboard
HID Device Descriptor:
bLength 9
bDescriptorType 33
bcdHID 1.00
bCountryCode 0 Not supported
bNumDescriptors 1
bDescriptorType 34 Report
wDescriptorLength 65
Report Descriptors:
** UNAVAILABLE **
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0008 1x 8 bytes
bInterval 8
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 3 Human Interface Device
bInterfaceSubClass 1 Boot Interface Subclass
bInterfaceProtocol 2 Mouse
iInterface 5 Mouse
HID Device Descriptor:
bLength 9
bDescriptorType 33
bcdHID 1.00
bCountryCode 0 Not supported
bNumDescriptors 1
bDescriptorType 34 Report
wDescriptorLength 63
Report Descriptors:
** UNAVAILABLE **
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0008 1x 8 bytes
bInterval 8
Device Status: 0x0000
(Bus Powered)
And now for my purely speculative analysis of what might be going on
after digging through the code...
There are two devices interacting here, the usbhid that is the
keyboard for the DRAC and the hub that it's connected to.
As a part of the usbhid initialization (usbhid_probe) it creates a
tasklet for hid_reset (I have no idea why, but perhaps it needs to
reset the underlying hid/usb devices to start working) which unloads
and reloads all the parts of the hidusb driver (usb, hid, input,
evdev). As a part of the unload for the usbhid it calls
hid_cancel_delayed_stuff (don't let it fool you, it actually works
with a usbhid device, not a hid device) iff it is no longer opened,
which normally it should already be opened, but...
I haven't been able to confirm this, but I think that if the hub
closes it can close the devices associated with it, which gives us a
fun race condition where if we start the probe for the usbhid device
and then the underlying hub gets removed it will have no instances
open, which causes usbhid to call hid_cancel_delayed_stuff, from its
own cancel tasklet. Unsurprisingly when it does this it ends up
waiting on itself forever.
I'm currently working around the issue by adding a udev rule to
prevent the device from being used by setting authorized=0 for it.
This obviously prevents the DRAC from actually working, though.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1557172/+subscriptions
References