group.of.nepali.translators team mailing list archive

[Bug 1670634] Re: blk-mq: possible deadlock on CPU hot(un)plug

 

This bug was fixed in the package linux - 4.4.0-97.120

---------------
linux (4.4.0-97.120) xenial; urgency=low

  * linux: 4.4.0-97.120 -proposed tracker (LP: #1718149)

  * blk-mq: possible deadlock on CPU hot(un)plug (LP: #1670634)
    - [Config] s390x -- disable CONFIG_{DM, SCSI}_MQ_DEFAULT

  * Xenial update to 4.4.87 stable release (LP: #1715678)
    - irqchip: mips-gic: SYNC after enabling GIC region
    - i2c: ismt: Don't duplicate the receive length for block reads
    - i2c: ismt: Return EMSGSIZE for block reads with bogus length
    - ceph: fix readpage from fscache
    - cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
    - cpuset: Fix incorrect memory_pressure control file mapping
    - alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
    - CIFS: remove endian related sparse warning
    - wl1251: add a missing spin_lock_init()
    - xfrm: policy: check policy direction value
    - drm/ttm: Fix accounting error when fail to get pages for pool
    - kvm: arm/arm64: Fix race in resetting stage2 PGD
    - kvm: arm/arm64: Force reading uncached stage2 PGD
    - epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
    - crypto: algif_skcipher - only call put_page on referenced and used pages
    - Linux 4.4.87

  * Xenial update to 4.4.86 stable release (LP: #1715430)
    - scsi: isci: avoid array subscript warning
    - ALSA: au88x0: Fix zero clear of stream->resources
    - btrfs: remove duplicate const specifier
    - i2c: jz4780: drop superfluous init
    - gcov: add support for gcc version >= 6
    - gcov: support GCC 7.1
    - lightnvm: initialize ppa_addr in dev_to_generic_addr()
    - p54: memset(0) whole array
    - lpfc: Fix Device discovery failures during switch reboot test.
    - arm64: mm: abort uaccess retries upon fatal signal
    - x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
    - arm64: fpsimd: Prevent registers leaking across exec
    - scsi: sg: protect accesses to 'reserved' page array
    - scsi: sg: reset 'res_in_use' after unlinking reserved array
    - drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
    - Linux 4.4.86

  * Xenial update to 4.4.85 stable release (LP: #1714298)
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - ipv4: better IP_MAX_MTU enforcement
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - mei: me: add broxton pci device ids
    - mei: me: add lewisburg device ids
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
    - ALSA: core: Fix unexpected error at replacing user TLV
    - ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
    - ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
    - i2c: designware: Fix system suspend
    - drm: Release driver tracking before making the object available again
    - drm/atomic: If the atomic check fails, return its value first
    - drm: rcar-du: lvds: Fix PLL frequency-related configuration
    - drm: rcar-du: lvds: Rename PLLEN bit to PLLON
    - drm: rcar-du: Fix crash in encoder failure error path
    - drm: rcar-du: Fix display timing controller parameter
    - drm: rcar-du: Fix H/V sync signal polarity configuration
    - tracing: Fix freeing of filter in create_filter() when set_str is false
    - cifs: Fix df output for users with quota limits
    - cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
    - nfsd: Limit end of page list when decoding NFSv4 WRITE
    - perf/core: Fix group {cpu,task} validation
    - Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
    - Bluetooth: cmtp: fix possible might sleep error in cmtp_session
    - Bluetooth: bnep: fix possible might sleep error in bnep_session
    - binder: use group leader instead of open thread
    - binder: Use wake up hint for synchronous transactions.
    - ANDROID: binder: fix proc->tsk check.
    - iio: imu: adis16480: Fix acceleration scale factor for adis16480
    - iio: hid-sensor-trigger: Fix the race with user space powering up sensors
    - staging: rtl8188eu: add RNX-N150NUB support
    - ASoC: simple-card: don't fail if sysclk setting is not supported
    - ASoC: rsnd: disable SRC.out only when stop timing
    - ASoC: rsnd: avoid pointless loop in rsnd_mod_interrupt()
    - ASoC: rsnd: Add missing initialization of ADG req_rate
    - ASoC: rsnd: ssi: 24bit data needs right-aligned settings
    - ASoC: rsnd: don't call update callback if it was NULL
    - ntb_transport: fix qp count bug
    - ntb_transport: fix bug calculating num_qps_mw
    - ACPI: ioapic: Clear on-stack resource before using it
    - ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal
    - Linux 4.4.85

  * Xenial update to 4.4.84 stable release (LP: #1713729)
    - audit: Fix use after free in audit_remove_watch_rule()
    - parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
    - crypto: x86/sha1 - Fix reads beyond the number of blocks passed
    - Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB
    - ALSA: seq: 2nd attempt at fixing race creating a queue
    - Revert "UBUNTU: SAUCE: (no-up) ALSA: usb-audio: Add quirk for sennheiser
      officerunner"
    - ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
    - ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
    - mm/mempolicy: fix use after free when calling get_mempolicy
    - xen: fix bio vec merging
    - x86/asm/64: Clear AC on NMI entries
    - irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
    - irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
    - Sanitize 'move_pages()' permission checks
    - pids: make task_tgid_nr_ns() safe
    - perf/x86: Fix LBR related crashes on Intel Atom
    - usb: optimize acpi companion search for usb port devices
    - usb: qmi_wwan: add D-Link DWM-222 device ID
    - Linux 4.4.84

  * Intel i40e PF reset due to incorrect MDD detection (LP: #1713553)
    - i40e: Limit TX descriptor count in cases where frag size is greater than 16K

  * Neighbour confirmation broken, breaks ARP cache aging (LP: #1715812)
    - sock: add sk_dst_pending_confirm flag
    - net: add dst_pending_confirm flag to skbuff
    - sctp: add dst_pending_confirm flag
    - tcp: replace dst_confirm with sk_dst_confirm
    - net: add confirm_neigh method to dst_ops
    - net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP
    - net: pending_confirm is not used anymore

  * CVE-2017-14106
    - tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

  * [CIFS] Fix maximum SMB2 header size (LP: #1713884)
    - CIFS: Fix maximum SMB2 header size

  * Middle button of trackpoint doesn't work (LP: #1715271)
    - Input: trackpoint - assume 3 buttons when buttons detection fails

  * kernel BUG at /build/linux-lts-xenial-_hWfOZ/linux-lts-xenial-4.4.0/security/apparmor/include/context.h:69! (LP: #1626984)
    - SAUCE: fix oops when disabled and module parameters, are accessed

  * Touchpad not detected (LP: #1708852)
    - Input: elan_i2c - add ELAN0608 to the ACPI table

 -- Kleber Sacilotto de Souza <kleber.souza@xxxxxxxxxxxxx>  Tue, 19 Sep 2017 17:55:11 +0200

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2017-14106

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1670634

Title:
  blk-mq: possible deadlock on CPU hot(un)plug

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Released

Bug description:
  == Comment: #0 - Carsten Jacobi <jacobi@xxxxxxxxxx> - 2017-03-07 03:35:31 ==
  I'm evaluating Ubuntu Xenial on z for development purposes. The test system is installed in an LPAR with one FCP LUN that is accessible via 4 paths (all paths are configured).
  The system hangs regularly when I build packages with "pdebuild" from the pbuilder packaging suite.
  The local Linux development team helped me out with a pre-analysis that I can post here (thanks a lot for that):

  With the default settings and under a certain workload,
  blk_mq seems to get into a presumed "deadlock".
  Possibly this happens on CPU hot(un)plug.

  After the I/O stalled, a dump was pulled manually.
  The following information is from the crash dump pre-analysis.

  $ zgetdump -i dump.0
  General dump info:
    Dump format........: elf
    Version............: 1
    UTS node name......: mclint
    UTS kernel release.: 4.4.0-65-generic
    UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
    System arch........: s390x (64 bit)
    CPU count (online).: 2
    Dump memory range..: 8192 MB
  Memory map:
    0000000000000000 - 00000001b831afff (7043 MB)
    00000001b831b000 - 00000001ffffffff (1149 MB)

  Things look similar with the HWE kernel ubuntu16.04-4.8.0-34.36~16.04.1.

        KERNEL: vmlinux.full
      DUMPFILE: dump.0
          CPUS: 2
          DATE: Fri Mar  3 14:31:07 2017
        UPTIME: 02:11:20
  LOAD AVERAGE: 13.00, 12.92, 11.37
         TASKS: 411
      NODENAME: mclint
       RELEASE: 4.4.0-65-generic
       VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
       MACHINE: s390x  (unknown Mhz)
        MEMORY: 7.8 GB
         PANIC: ""
           PID: 0
       COMMAND: "swapper/0"
          TASK: bad528  (1 of 2)  [THREAD_INFO: b78000]
           CPU: 0
         STATE: TASK_RUNNING (ACTIVE)
          INFO: no panic task found

  crash> dev -d
  MAJOR GENDISK            NAME       REQUEST_QUEUE      TOTAL ASYNC  SYNC   DRV
  ...
      8 1e1d6d800          sda        1e1d51210              0 23151 4294944145 N/A(MQ)
      8 1e4e06800          sdc        2081b18                0 23148 4294944148 N/A(MQ)
      8 1f07800            sdb        20c7568                0 23195 4294944101 N/A(MQ)
      8 1e4e06000          sdd        1e4e31210              0 23099 4294944197 N/A(MQ)
    252 1e1d6c800          dm-0       1e1d51b18              9     1     8 N/A(MQ)
  ...

  So both dm-mpath and sd have requests pending in their block multiqueue.
  The large SYNC values for the sd devices look strange; they appear to be the ASYNC values multiplied by -1 and printed as unsigned integers.
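
  The observation about the sd counters can be checked with a little arithmetic. The following is a minimal sketch, assuming the counters are printed as 32-bit unsigned integers (the width is an assumption, it is not stated in the dump output). The helper name as_unsigned32 is made up for this illustration.

```python
# Assumption: the counters are printed as 32-bit unsigned values.
MASK32 = 0xFFFFFFFF


def as_unsigned32(value: int) -> int:
    """Reinterpret a (possibly negative) integer as a 32-bit unsigned value."""
    return value & MASK32


# (ASYNC, SYNC) pairs copied from the "dev -d" output above.
pairs = {
    "sda": (23151, 4294944145),
    "sdc": (23148, 4294944148),
    "sdb": (23195, 4294944101),
    "sdd": (23099, 4294944197),
}

for name, (async_count, sync_count) in pairs.items():
    wrapped = as_unsigned32(-async_count)
    print(f"{name}: -{async_count} as u32 = {wrapped}, "
          f"matches SYNC: {wrapped == sync_count}")
```

  Under that assumption each SYNC value is exactly the negated ASYNC value modulo 2^32, which is also consistent with the TOTAL column showing 0 for these devices.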

  [    0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49)
  [    0.798262] setup: Linux is running natively in 64-bit mode
  [    0.798290] setup: Max memory size: 8192MB
  [    0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB)

  [    0.836923] Kernel command line: root=/dev/mapper/mclint_vg-root rootflags=subvol=@ crashkernel=196M BOOT_IMAGE=0

  [ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds.
  [ 5281.179437]       Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179440] xfsaild/dm-11   D 00000000007bcf52     0  1604      2 0x00000000
  [ 5281.179444]        00000001e931c230 00000000001a6964 00000001e6f9b958 00000001e6f9b9d8
                        00000001e15795f0 00000001e6f9b988 0000000000ce8c00 00000001ea805c70
                        00000001ea805c00 0000000000ba5ed0 00000001e931c1d0 00000001e1579b20
                        00000001ea805c00 00000001e15795f0 00000001ea805c00 0000000000000000
                        00000000007d3978 00000000007bc9f8 00000001e6f9b9d8 00000001e6f9ba40
  [ 5281.179454] Call Trace:
  [ 5281.179461] ([<00000000007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179462]  [<00000000007bcf52>] schedule+0x4a/0xb0
  [ 5281.179465]  [<00000000007c02aa>] schedule_timeout+0x232/0x2a8
  [ 5281.179466]  [<00000000007bde50>] wait_for_common+0x110/0x1c8
  [ 5281.179472]  [<000000000017b602>] flush_work+0x42/0x58
  [ 5281.179564]  [<000003ff805e14ba>] xlog_cil_force_lsn+0x7a/0x238 [xfs]
  [ 5281.179589]  [<000003ff805dee82>] _xfs_log_force+0x9a/0x2e8 [xfs]
  [ 5281.179615]  [<000003ff805df114>] xfs_log_force+0x44/0x100 [xfs]
  [ 5281.179640]  [<000003ff805ec668>] xfsaild+0x170/0x798 [xfs]
  [ 5281.179643]  [<000000000018335a>] kthread+0x10a/0x110
  [ 5281.179645]  [<00000000007c0ff6>] kernel_thread_starter+0x6/0xc
  [ 5281.179646]  [<00000000007c0ff0>] kernel_thread_starter+0x0/0xc

  See the kworker/0:2 trace below, which runs the xfs-cil/dm-11 work item.

  [ 5281.179664] INFO: task cpuplugd:2260 blocked for more than 120 seconds.
  [ 5281.179665]       Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179668] cpuplugd        D 00000000007bcf52     0  2260      1 0x00000000
  [ 5281.179670]        00000001e782e318 00000000001a6964 000000007bc4ba58 000000007bc4bad8
                        00000001d4076db0 000000007bc4ba88 0000000000ce8c00 00000001ea805c70
                        00000001ea805c00 0000000000ba5ed0 00000001e782e2b8 00000001d40772e0
                        00000001ea805c00 00000001d4076db0 00000001ea805c00 0000000000000000
                        00000000007d3978 00000000007bc9f8 000000007bc4bad8 000000007bc4bb40
  [ 5281.179678] Call Trace:
  [ 5281.179680] ([<00000000007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179681]  [<00000000007bcf52>] schedule+0x4a/0xb0
  [ 5281.179685]  [<0000000000516cc2>] blk_mq_freeze_queue_wait+0x62/0xc8
  [ 5281.179687]  [<0000000000519412>] blk_mq_queue_reinit_notify+0x11a/0x240
  [ 5281.179690]  [<00000000001844c6>] notifier_call_chain+0x56/0x98
  [ 5281.179692]  [<000000000018466a>] __raw_notifier_call_chain+0x2a/0x38
  [ 5281.179696]  [<00000000001605ac>] _cpu_up+0x10c/0x1b0
  [ 5281.179698]  [<0000000000160738>] cpu_up+0xe8/0x108
  [ 5281.179700]  [<00000000005d08be>] cpu_subsys_online+0x56/0xb0
  [ 5281.179703]  [<00000000005ca1c2>] device_online+0x82/0xc0
  [ 5281.179704]  [<00000000005ca28a>] online_store+0x8a/0x98
  [ 5281.179710]  [<00000000003a4d12>] kernfs_fop_write+0x13a/0x190
  [ 5281.179712]  [<000000000031218c>] vfs_write+0x94/0x1a0
  [ 5281.179714]  [<0000000000312e9e>] SyS_write+0x66/0xd8
  [ 5281.179715]  [<00000000007c0e3e>] system_call+0xd6/0x264
  [ 5281.179718]  [<000003ff803df478>] zlib_tr_flush_block+0x650/0x830 [zlib_deflate]

  Cpuplugd performs CPU hot(un)plug based on configurable rules.
  https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_r_cpuplugdcmd.html
  https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_cpu_act.html
  https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_r_numa_know_cpu.html
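
  For orientation, the cpuplugd call trace above enters the kernel through a write to a sysfs "online" attribute (kernfs_fop_write, online_store, device_online, cpu_up). The following is a minimal sketch of that kind of write, assuming the usual sysfs layout /sys/devices/system/cpu/cpuN/online. The helper name set_cpu_online and its sysfs_root parameter are hypothetical, and this is not cpuplugd's actual implementation.

```python
from pathlib import Path


def set_cpu_online(cpu: int, online: bool, sysfs_root: str = "/sys") -> Path:
    """Write 1 or 0 to a CPU's sysfs 'online' attribute.

    On a real system this needs root privileges and triggers CPU
    hot(un)plug. sysfs_root is a hypothetical parameter so the helper
    can be pointed at a scratch directory instead of the live /sys.
    """
    attr = (Path(sysfs_root) / "devices" / "system" / "cpu"
            / f"cpu{cpu}" / "online")
    attr.write_text("1" if online else "0")
    return attr
```

  Calling set_cpu_online(1, True) as root would request that CPU 1 be brought online, which is the operation that blocks in the trace above.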

  [ 5281.179769] INFO: task kworker/0:2:23669 blocked for more than 120 seconds.
  [ 5281.179770]       Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179771] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179773] kworker/0:2     D 00000000007bcf52     0 23669      2 0x00000000
  [ 5281.179801] Workqueue: xfs-cil/dm-11 xlog_cil_push_work [xfs]
  [ 5281.179802]        00000001c925e318 00000000001a6964 0000000068c5f9f8 0000000068c5fa78
                        00000001da99b6d8 0000000068c5fa10 0000000000ce8c00 00000001ea805c70
                        00000001ea805c00 00000001e782e94c 00000001c925e2b8 00000001da99bc08
                        00000001ea805c00 00000001da99b6d8 00000001ea805c00 0000000000000000
                        00000000007d3978 00000000007bc9f8 0000000068c5fa78 0000000068c5fae0
  [ 5281.179810] Call Trace:
  [ 5281.179812] ([<00000000007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179813]  [<00000000007bcf52>] schedule+0x4a/0xb0
  [ 5281.179839]  [<000003ff805de144>] xlog_state_get_iclog_space+0x124/0x338 [xfs]
  [ 5281.179864]  [<000003ff805de702>] xlog_write+0x1ea/0x800 [xfs]
  [ 5281.179890]  [<000003ff805e09a6>] xlog_cil_push+0x286/0x508 [xfs]
  [ 5281.179891]  [<000000000017c400>] process_one_work+0x1a0/0x4f8
  [ 5281.179893]  [<000000000017c7a2>] worker_thread+0x4a/0x530
  [ 5281.179894]  [<000000000018335a>] kthread+0x10a/0x110
  [ 5281.179896]  [<00000000007c0ff6>] kernel_thread_starter+0x6/0xc
  [ 5281.179898]  [<00000000007c0ff0>] kernel_thread_starter+0x0/0xc

  While the kworker above executes its work item for a long time, other
  processes in turn block in flush_work for a long time.

  [ 5281.179728] INFO: task kworker/0:1:4454 blocked for more than 120 seconds.
  [ 5281.179730]       Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179731] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179732] kworker/0:1     D 00000000007bcf52     0  4454      2 0x00000000
  [ 5281.179738] Workqueue: events vmstat_shepherd
  [ 5281.179739]        00000001c925ac40 00000000001a6964 000000007eb8bb38 000000007eb8bbb8
                        00000001e782e2b8 000000007eb8bb50 0000000000ce8c00 00000001ea805c70
                        00000001c925abe0 00000001c925b274 00000001c925abe0 00000001e782e7e8
                        00000001ea805c00 00000001e782e2b8 00000001ea805c00 0000000000000000
                        00000000007d3978 00000000007bc9f8 000000007eb8bbb8 000000007eb8bc20
  [ 5281.179747] Call Trace:
  [ 5281.179749] ([<00000000007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179750]  [<00000000007bcf52>] schedule+0x4a/0xb0
  [ 5281.179752]  [<00000000007bd39a>] schedule_preempt_disabled+0x2a/0x38
  [ 5281.179753]  [<00000000007becc4>] __mutex_lock_slowpath+0xcc/0x170
  [ 5281.179755]  [<00000000007bedc6>] mutex_lock+0x5e/0x78
  [ 5281.179756]  [<000000000015fba0>] get_online_cpus+0x40/0x68
  [ 5281.179757]  [<00000000002a3ccc>] vmstat_shepherd+0x44/0x168
  [ 5281.179759]  [<000000000017c400>] process_one_work+0x1a0/0x4f8
  [ 5281.179761]  [<000000000017c7a2>] worker_thread+0x4a/0x530
  [ 5281.179762]  [<000000000018335a>] kthread+0x10a/0x110
  [ 5281.179764]  [<00000000007c0ff6>] kernel_thread_starter+0x6/0xc
  [ 5281.179765]  [<00000000007c0ff0>] kernel_thread_starter+0x0/0xc

  This work item probably cannot make progress because cpuplugd:2260
  above "hangs" in the CPU hotplug notifier chain.

  The low-level device driver (here zfcp) is completely idle without any
  pending I/O after the lockup happened. All its paths are in good state
  and could service I/O, but it simply does not get any new I/O requests
  from the upper layers (scsi / block). zfcp does not implement blk_mq
  itself, so dm or scsi translate, which works in general but fails with
  the above workload. There were no other undesired events, i.e. neither
  path interruptions nor any recovery in zfcp.
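
  Whether scsi and dm currently use blk-mq can be checked at run time. The following is a minimal sketch, assuming the kernel exposes a use_blk_mq module parameter for scsi_mod and dm_mod under /sys/module (this is an assumption about the running kernel, so the sketch reports None when a file is absent). The helper name blk_mq_settings and its sysfs_root parameter are hypothetical.

```python
from pathlib import Path


def blk_mq_settings(sysfs_root: str = "/sys") -> dict:
    """Return the use_blk_mq value per module, or None if it is not exposed."""
    settings = {}
    for module in ("scsi_mod", "dm_mod"):
        param = (Path(sysfs_root) / "module" / module
                 / "parameters" / "use_blk_mq")
        # Boolean module parameters are usually shown as "Y" or "N".
        settings[module] = param.read_text().strip() if param.exists() else None
    return settings


if __name__ == "__main__":
    for module, value in blk_mq_settings().items():
        print(f"{module}.use_blk_mq = {value}")
```

  The changelog entry for this bug disables CONFIG_{DM, SCSI}_MQ_DEFAULT on s390x, so a check like this may help confirm which I/O path a given system actually uses.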

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1670634/+subscriptions