kernel-packages team mailing list archive

[Bug 1597908] Re: linux-kernel: Freeing IRQ from IRQ context

 

The system crashes, so running apport-collect after the fact is not
possible, but I can confirm this is a bug. As the upstream nvme driver
maintainer, I can recommend either which driver commits to revert or
which kernel commit to cherry-pick (I prefer the latter :)).

Here is a snippet of the stack trace:

<3>[51827.132142] BUG: scheduling while atomic: swapper/19/0/0x00000100
<4>[51827.242686] Modules linked in: nvme binfmt_misc PlxSvc(OE) ipmi_devintf intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass input_leds joydev sb_edac ipmi_ssif edac_core mei_me mei lpc_ich ioatdma shpchp ipmi_si ipmi_msghandler 8250_fintek acpi_pad acpi_power_meter mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear igb dca ptp ahci crct10dif_pclmul crc32_pclmul hid_generic mxm_wmi aesni_intel aes_x86_64 lrw gf128mul usbhid glue_helper ablk_helper pps_core cryptd hid libahci i2c_algo_bit fjes wmi
<4>[51827.242743] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G        W  OE   4.4.0-24-generic #43-Ubuntu
<4>[51827.242746] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.11.01.0132.060620160917 06/06/2016
<4>[51827.242748]  0000000000000286 374975818f2884ca ffff88105de43a98 ffffffff813eab23
<4>[51827.242752]  ffff88105de56d00 0000000000000000 ffff88105de43aa8 ffffffff810a5ceb
<4>[51827.242762]  ffff88105de43af8 ffffffff818217d6 ffff88105de43ac8 3749758100000013
<4>[51827.242765] Call Trace:
<4>[51827.242768]  <IRQ>  [<ffffffff813eab23>] dump_stack+0x63/0x90
<4>[51827.242781]  [<ffffffff810a5ceb>] __schedule_bug+0x4b/0x60
<4>[51827.242788]  [<ffffffff818217d6>] __schedule+0x726/0xa30
<4>[51827.242792]  [<ffffffff81821b15>] schedule+0x35/0x80
<4>[51827.242797]  [<ffffffff81824ba9>] schedule_timeout+0x129/0x270
<4>[51827.242802]  [<ffffffff810ec480>] ? trace_event_raw_event_tick_stop+0x120/0x120
<4>[51827.242807]  [<ffffffff810ec89d>] msleep+0x2d/0x40
<4>[51827.242813]  [<ffffffffc02cd470>] nvme_wait_ready+0x90/0x100 [nvme]
<4>[51827.242818]  [<ffffffffc02cee70>] nvme_disable_ctrl+0x40/0x50 [nvme]
<4>[51827.242823]  [<ffffffffc02d1b3d>] nvme_disable_admin_queue+0x8d/0x90 [nvme]
<4>[51827.242828]  [<ffffffffc02d1dde>] nvme_dev_disable+0x29e/0x2c0 [nvme]
<4>[51827.242833]  [<ffffffffc02d03a0>] ? __nvme_process_cq+0x200/0x200 [nvme]
<4>[51827.242838]  [<ffffffff8154955c>] ? dev_warn+0x6c/0x90
<4>[51827.242843]  [<ffffffffc02d1ff0>] nvme_timeout+0x110/0x1d0 [nvme]
<4>[51827.242847]  [<ffffffff813ea92f>] ? cpumask_next_and+0x2f/0x40
<4>[51827.242850]  [<ffffffff810bd4bc>] ? load_balance+0x18c/0x980
<4>[51827.242854]  [<ffffffff813c5cdf>] blk_mq_rq_timed_out+0x2f/0x70
<4>[51827.242857]  [<ffffffff813c5d6e>] blk_mq_check_expired+0x4e/0x80
<4>[51827.242861]  [<ffffffff813c86c8>] bt_for_each+0xd8/0xe0
<4>[51827.242864]  [<ffffffff813c5d20>] ? blk_mq_rq_timed_out+0x70/0x70
<4>[51827.242868]  [<ffffffff813c5d20>] ? blk_mq_rq_timed_out+0x70/0x70
<4>[51827.242871]  [<ffffffff813c8ed7>] blk_mq_queue_tag_busy_iter+0x47/0xc0
<4>[51827.242875]  [<ffffffff813c4a80>] ? blk_mq_attempt_merge+0xb0/0xb0
<4>[51827.242878]  [<ffffffff813c4ac1>] blk_mq_rq_timer+0x41/0xf0
<4>[51827.242882]  [<ffffffff810ec4c5>] call_timer_fn+0x35/0x120
<4>[51827.242885]  [<ffffffff813c4a80>] ? blk_mq_attempt_merge+0xb0/0xb0
<4>[51827.242890]  [<ffffffff810ece7a>] run_timer_softirq+0x23a/0x2f0
<4>[51827.242894]  [<ffffffff81085b11>] __do_softirq+0x101/0x290
<4>[51827.242899]  [<ffffffff81085e13>] irq_exit+0xa3/0xb0
<4>[51827.242902]  [<ffffffff818286a2>] smp_apic_timer_interrupt+0x42/0x50
<4>[51827.242905]  [<ffffffff81826962>] apic_timer_interrupt+0x82/0x90
<4>[51827.242907]  <EOI>  [<ffffffff816bcd21>] ? cpuidle_enter_state+0x111/0x2b0
<4>[51827.242914]  [<ffffffff816bcef7>] cpuidle_enter+0x17/0x20
<4>[51827.242918]  [<ffffffff810c3ec2>] call_cpuidle+0x32/0x60
<4>[51827.242921]  [<ffffffff816bced3>] ? cpuidle_select+0x13/0x20
<4>[51827.242925]  [<ffffffff810c4180>] cpu_startup_entry+0x290/0x350
<4>[51827.242929]  [<ffffffff81051714>] start_secondary+0x154/0x190
<3>[51827.242934] bad: scheduling from the idle thread!
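
The first line of the trace reports a preempt_count of 0x00000100. As a rough aid for reading that value, the sketch below splits it into its fields. It assumes the bit layout from include/linux/preempt.h in 4.4-era kernels; the struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed field layout of preempt_count (include/linux/preempt.h,
 * 4.4-era kernels). Verify against the header of the tree in use.
 */
#define PREEMPT_MASK  0x000000ffu
#define SOFTIRQ_MASK  0x0000ff00u
#define HARDIRQ_MASK  0x000f0000u
#define NMI_MASK      0x00100000u

#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16
#define NMI_SHIFT     20

/* Hypothetical helper type, not a kernel structure. */
struct preempt_fields {
    unsigned int preempt;  /* preempt_disable() nesting depth */
    unsigned int softirq;  /* softirq field of the counter */
    unsigned int hardirq;  /* hardirq nesting depth */
    unsigned int nmi;      /* nonzero while in NMI */
};

/* Split a raw preempt_count value into its fields. */
static struct preempt_fields decode_preempt_count(uint32_t pc)
{
    struct preempt_fields f;

    f.preempt = pc & PREEMPT_MASK;
    f.softirq = (pc & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
    f.hardirq = (pc & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
    f.nmi     = (pc & NMI_MASK) >> NMI_SHIFT;
    return f;
}

/* Print the decoded fields on one line. */
static void print_preempt_count(uint32_t pc)
{
    struct preempt_fields f = decode_preempt_count(pc);

    printf("preempt=%u softirq=%u hardirq=%u nmi=%u\n",
           f.preempt, f.softirq, f.hardirq, f.nmi);
}
```

Calling print_preempt_count(0x00000100u) gives a softirq field of 1 with every other field zero. That is consistent with the call chain above, where nvme_timeout reaches msleep while running under run_timer_softirq.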


** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1597908

Title:
  linux-kernel: Freeing IRQ from IRQ context

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  It looks like Ubuntu 16.04 took the nvme driver from the 4.5 kernel
  but is missing some critical block layer updates that the driver
  depends on. Specifically, this one, which moves the timeout handler to
  a work queue instead of an IRQ-context timer task:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=287922eb0b186e2a5bf54fdd04b734c25c90035c

  This mismatch causes lots of warnings and errors during recovery from
  failure.
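
To illustrate why the context of the timeout handler matters, here is a single-threaded userspace toy model, not kernel code. It assumes a flag standing in for "running in timer softirq" and a tiny array standing in for a work queue; every name in it is hypothetical. Calling the recovery routine directly from the timer trips the sleep check, while queueing it and running it later does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* All names here are hypothetical; this is a userspace toy, not kernel code. */

static bool in_atomic_context;  /* stand-in for "running in softirq" */
static int sleep_violations;    /* stand-in for "scheduling while atomic" */

/* Stand-in for a call that may sleep, such as msleep(). */
static void might_sleep_here(const char *caller)
{
    if (in_atomic_context) {
        sleep_violations++;
        printf("BUG (toy): %s tried to sleep in atomic context\n", caller);
    }
}

typedef void (*work_fn)(void);

#define MAX_PENDING 8
static work_fn pending[MAX_PENDING];
static size_t npending;

/* Record a function to run later; returns false if the queue is full. */
static bool queue_work_item(work_fn fn)
{
    if (npending == MAX_PENDING)
        return false;
    pending[npending++] = fn;
    return true;
}

/* Run queued items in "process context", where sleeping is allowed. */
static void run_pending_work(void)
{
    size_t i;

    for (i = 0; i < npending; i++)
        pending[i]();
    npending = 0;
}

/* Recovery path that needs to sleep, like the controller disable wait. */
static void timeout_recovery(void)
{
    might_sleep_here("timeout_recovery");
}

/* Mismatched behaviour: the timer calls the sleeping handler directly. */
static void timer_fires_direct(void)
{
    in_atomic_context = true;
    timeout_recovery();
    in_atomic_context = false;
}

/* Deferred behaviour: the timer only queues the handler. */
static void timer_fires_deferred(void)
{
    in_atomic_context = true;
    queue_work_item(timeout_recovery);
    in_atomic_context = false;
}
```

In this model, timer_fires_direct() increments sleep_violations, whereas timer_fires_deferred() followed by run_pending_work() leaves it unchanged, because the recovery routine then runs outside the atomic section.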
