← Back to team overview

group.of.nepali.translators team mailing list archive

[Bug 1774336] Re: FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false

 

This bug was fixed in the package linux - 4.15.0-29.31

---------------
linux (4.15.0-29.31) bionic; urgency=medium

  * linux: 4.15.0-29.31 -proposed tracker (LP: #1782173)

  * [SRU Bionic][Cosmic] kernel panic in ipmi_ssif at msg_done_handler
    (LP: #1777716)
    - ipmi_ssif: Fix kernel panic at msg_done_handler

  * Update to ocxl driver for 18.04.1 (LP: #1775786)
    - misc: ocxl: use put_device() instead of device_unregister()
    - powerpc: Add TIDR CPU feature for POWER9
    - powerpc: Use TIDR CPU feature to control TIDR allocation
    - powerpc: use task_pid_nr() for TID allocation
    - ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
    - ocxl: Expose the thread_id needed for wait on POWER9
    - ocxl: Add an IOCTL so userspace knows what OCXL features are available
    - ocxl: Document new OCXL IOCTLs
    - ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()

  * Critical upstream bugfix missing in Ubuntu 18.04 - frequent Xorg crash after
    suspend (LP: #1776887)
    - ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL

  * Hard LOCKUP observed on stressing Ubuntu 18 04 (LP: #1777194)
    - powerpc: use NMI IPI for smp_send_stop
    - powerpc: Fix smp_send_stop NMI IPI handling

  * IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
    CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n (LP: #1773964)
    - rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

  * [Regression] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383:
    comm stress-ng: bg 4705: bad block bitmap checksum (LP: #1781709)
    - SAUCE: Revert "UBUNTU: SAUCE: ext4: fix ext4_validate_inode_bitmap: comm
      stress-ng: Corrupt inode bitmap"
    - SAUCE: ext4: check for allocation block validity with block group locked

linux (4.15.0-28.30) bionic; urgency=medium

  * linux: 4.15.0-28.30 -proposed tracker (LP: #1781433)

  * Cannot set MTU higher than 1500 in Xen instance (LP: #1781413)
    - xen-netfront: Fix mismatched rtnl_unlock
    - xen-netfront: Update features after registering netdev

linux (4.15.0-27.29) bionic; urgency=medium

  * linux: 4.15.0-27.29 -proposed tracker (LP: #1781062)

  * [Regression] EXT4-fs error (device sda1): ext4_validate_inode_bitmap:99:
    comm stress-ng: Corrupt inode bitmap (LP: #1780137)
    - SAUCE: ext4: fix ext4_validate_inode_bitmap: comm stress-ng: Corrupt inode
      bitmap

linux (4.15.0-26.28) bionic; urgency=medium

  * linux: 4.15.0-26.28 -proposed tracker (LP: #1780112)

  * failure to boot with linux-image-4.15.0-24-generic (LP: #1779827) // Cloud-
    init causes potentially huge boot delays with 4.15 kernels (LP: #1780062)
    - random: Make getrandom() ready earlier

linux (4.15.0-25.27) bionic; urgency=medium

  * linux: 4.15.0-25.27 -proposed tracker (LP: #1779354)

  * hisi_sas_v3_hw: internal task abort: timeout and not done. (LP: #1777736)
    - scsi: hisi_sas: Update a couple of register settings for v3 hw

  * hisi_sas: Add missing PHY spinlock init (LP: #1777734)
    - scsi: hisi_sas: Add missing PHY spinlock init

  * hisi_sas: improve read performance by pre-allocating slot DMA buffers
    (LP: #1777727)
    - scsi: hisi_sas: use dma_zalloc_coherent()
    - scsi: hisi_sas: Use dmam_alloc_coherent()
    - scsi: hisi_sas: Pre-allocate slot DMA buffers

  * hisi_sas: Failures during host reset (LP: #1777696)
    - scsi: hisi_sas: Only process broadcast change in phy_bcast_v3_hw()
    - scsi: hisi_sas: Fix the conflict between dev gone and host reset
    - scsi: hisi_sas: Adjust task reject period during host reset
    - scsi: hisi_sas: Add a flag to filter PHY events during reset
    - scsi: hisi_sas: Release all remaining resources in clear nexus ha

  * Fake SAS addresses for SATA disks on HiSilicon D05 are non-unique
    (LP: #1776750)
    - scsi: hisi_sas: make SAS address of SATA disks unique

  * Vcs-Git header on bionic linux source package points to zesty git tree
    (LP: #1766055)
    - [Packaging]: Update Vcs-Git

  * large KVM instances run out of IRQ routes (LP: #1778261)
    - SAUCE: kvm -- increase KVM_MAX_IRQ_ROUTES to 2048 on x86

 -- Stefan Bader <stefan.bader@xxxxxxxxxxxxx>  Tue, 17 Jul 2018 10:57:50
+0200

** Changed in: linux (Ubuntu)
       Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1774336

Title:
  FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Artful:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  == SRU Justification ==

  [Impact]
  Oops during heavy NFS + FSCache use:

  [81738.886634] FS-Cache: 
  [81738.888281] FS-Cache: Assertion failed
  [81738.889461] FS-Cache: 6 == 5 is false
  [81738.890625] ------------[ cut here ]------------
  [81738.891706] kernel BUG at /build/linux-hVVhWi/linux-4.4.0/fs/fscache/operation.c:494!

  6 == 5 represents an operation being DEAD when it was not expected to
  be.

  [Cause]
  There is a race in fscache and cachefiles. 

  One thread is in cachefiles_read_waiter:
   1) object->work_lock is taken.
   2) the operation is added to the to_do list.
   3) the work lock is dropped.
   4) fscache_enqueue_retrieval is called, which takes a reference.

  Another thread is in cachefiles_read_copier:
   1) object->work_lock is taken
   2) an item is popped off the to_do list.
   3) object->work_lock is dropped.
   4) some processing is done on the item, and fscache_put_retrieval() is called, dropping a reference.

  Now if the this process in cachefiles_read_copier takes place
  *between* steps 3 and 4 in cachefiles_read_waiter, a reference will be
  dropped before it is taken, which leads to the objects reference count
  hitting zero, which leads to lifecycle events for the object happening
  too soon, leading to the assertion failure later on.

  (This is simplified and clarified from the original upstream analysis
  for this patch at https://www.redhat.com/archives/linux-
  cachefs/2018-February/msg00001.html and from a similar patch with a
  different approach to fixing the bug at
  https://www.redhat.com/archives/linux-cachefs/2017-June/msg00002.html)

  [Fix]
  Move fscache_enqueue_retrieval under the lock in cachefiles_read_waiter. This means that the object cannot be popped off the to_do list until it is in a fully consistent state with the reference taken.

  [Testcase]
  A user has run ~100 hours of NFS stress tests and not seen this bug recur.

  [Regression Potential]
   - Limited to fscache/cachefiles. 
   - The change makes things more conservative (doing more under lock) so that's reassuring. 
   - There may be performance impacts but none have been observed so far.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1774336/+subscriptions