kernel-packages team mailing list archive

[Bug 1587686] Re: ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"


This bug was fixed in the package linux - 4.4.0-30.49

linux (4.4.0-30.49) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597897

  * FCP devices are not detected correctly nor deterministically (LP: #1567602)
    - scsi_dh_alua: Disable ALUA handling for non-disk devices
    - scsi_dh_alua: Use vpd_pg83 information
    - scsi_dh_alua: improved logging
    - scsi_dh_alua: sanitze sense code handling
    - scsi_dh_alua: use standard logging functions
    - scsi_dh_alua: return standard SCSI return codes in submit_rtpg
    - scsi_dh_alua: fixup description of stpg_endio()
    - scsi_dh_alua: use flag for RTPG extended header
    - scsi_dh_alua: use unaligned access macros
    - scsi_dh_alua: rework alua_check_tpgs() to return the tpgs mode
    - scsi_dh_alua: simplify sense code handling
    - scsi: Add scsi_vpd_lun_id()
    - scsi: Add scsi_vpd_tpg_id()
    - scsi_dh_alua: use scsi_vpd_tpg_id()
    - scsi_dh_alua: Remove stale variables
    - scsi_dh_alua: Pass buffer as function argument
    - scsi_dh_alua: separate out alua_stpg()
    - scsi_dh_alua: Make stpg synchronous
    - scsi_dh_alua: call alua_rtpg() if stpg fails
    - scsi_dh_alua: switch to scsi_execute_req_flags()
    - scsi_dh_alua: allocate RTPG buffer separately
    - scsi_dh_alua: Use separate alua_port_group structure
    - scsi_dh_alua: use unique device id
    - scsi_dh_alua: simplify alua_initialize()
    - revert commit a8e5a2d593cb ("[SCSI] scsi_dh_alua: ALUA handler attach should
      succeed while TPG is transitioning")
    - scsi_dh_alua: move optimize_stpg evaluation
    - scsi_dh_alua: remove 'rel_port' from alua_dh_data structure
    - scsi_dh_alua: Use workqueue for RTPG
    - scsi_dh_alua: Allow workqueue to run synchronously
    - scsi_dh_alua: Add new blacklist flag 'BLIST_SYNC_ALUA'
    - scsi_dh_alua: Recheck state on unit attention
    - scsi_dh_alua: update all port states
    - scsi_dh_alua: Send TEST UNIT READY to poll for transitioning
    - scsi_dh_alua: do not fail for unknown VPD identification

linux (4.4.0-29.48) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597015

  * Wireless hotkey fails on Dell XPS 15 9550 (LP: #1589886)
    - intel-hid: new hid event driver for hotkeys
    - intel-hid: fix incorrect entries in intel_hid_keymap
    - intel-hid: allocate correct amount of memory for private struct
    - intel-hid: add a workaround to ignore an event after waking up from S4.

  * cgroupfs mounts can hang (LP: #1588056)
    - Revert "UBUNTU: SAUCE: (namespace) mqueue: Super blocks must be owned by the
      user ns which owns the ipc ns"
    - Revert "UBUNTU: SAUCE: kernfs: Do not match superblock in another user
      namespace when mounting"
    - Revert "UBUNTU: SAUCE: cgroup: Use a new super block when mounting in a
      cgroup namespace"
    - (namespace) bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
    - (namespace) bpf, inode: disallow userns mounts
    - (namespace) ipc: Initialize ipc_namespace->user_ns early.
    - (namespace) vfs: Pass data, ns, and ns->userns to mount_ns
    - SAUCE: (namespace) Sync with upstream s_user_ns patches
    - (namespace) kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC
    - (namespace) ipc/mqueue: The mqueue filesystem should never contain

  * KVM system crashes after starting guest (LP: #1596635)
    - xhci: Cleanup only when releasing primary hcd

  * Upstream patch "crypto: vmx - IV size failing on skcipher API" for Ubuntu
    16.04 (LP: #1596557)
    - crypto: vmx - IV size failing on skcipher API

  * [Bug]tpm initialization fails on x86 (LP: #1596469)
    - tpm_crb: drop struct resource res from struct crb_priv
    - tpm_crb: fix mapping of the buffers

  * Device shutdown notification for CAPI Flash cards (LP: #1592114)
    - cxlflash: Fix regression issue with re-ordering patch
    - cxlflash: Fix to drain operations from previous reset
    - cxlflash: Add device dependent flags
    - cxlflash: Shutdown notify support for CXL Flash cards

  * scsi-modules udeb should include pm80xx (LP: #1595628)
    - [Config] Add pm80xx scsi driver to d-i

  * Sync up latest relevant upstream bug fixes (LP: #1594871)
    - SAUCE: (noup) Update zfs to

  * Cannot compile module tda10071 (LP: #1592531)
    - [media] tda10071: Fix dependency to REGMAP_I2C

  * lsvpd doesn't show correct location code for devices attached to a CAPI card
    (LP: #1594847)
    - cxl: Make vPHB device node match adapter's

  * enable CRC32 and AES ARM64 by default or as module (LP: #1594455)
    - [Config] Enable arm64 AES and CRC32 crypto

  * VMX kernel crypto module exhibits poor performance in Ubuntu 16.04
    (LP: #1592481)
    - crypto: vmx - comply with ABIs that specify vrsave as reserved.
    - crypto: vmx - Fix ABI detection
    - crypto: vmx - Increase priority of aes-cbc cipher

  * build squashfs into xenial kernels by default (LP: #1593134)
    - [Config] CONFIG_SQUASHFS=y

  * Restore irqfd fast path for PPC (LP: #1592809)
    - KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts

  * Unable to start guests with memballoon default. (LP: #1592042)
    - virtio_balloon: fix PFN format for virtio-1

  * Key 5 automatically pressed on some Logitech wireless keyboards
    (LP: #1579190)
    - HID: core: prevent out-of-bound readings

  * ZFS: Running ztest repeatedly for long periods of time eventually results in
    "zdb: can't open 'ztest': No such file or directory" (LP: #1587686)
    - Fix ztest truncated cache file

  * STC840.20:Alpine:alp7fp1:Ubuntu 16.04, BlueFin (SAN) EEH 6 times during boot
    then disabled SRC BA188002:b0314a_1612.840 (LP: #1587316)
    - lpfc: Fix DMA faults observed upon plugging loopback connector

 -- Kamal Mostafa <kamal@xxxxxxxxxxxxx>  Thu, 30 Jun 2016 12:52:15 -0700

** Changed in: linux (Ubuntu)
       Status: Incomplete => Fix Released

You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.

  ZFS: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"

Status in Native ZFS for Linux:
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  In Progress
Status in zfs-linux source package in Xenial:
  Fix Committed

Bug description:
  [SRU Justification][XENIAL]

  Problem: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"


  Upstream commit

  Without the fix, ztest fails after hours of soak testing. With the fix, the issue cannot be reproduced.


  This fix is an upstream fix and has therefore passed the ZFS
  integration tests. I have also tested it thoroughly with the kernel
  team's ZFS regression tests and found no issues, so the regression
  potential is slim to zero.


  This bug affects the xenial kernel's built-in ZFS as well as the
  zfs-dkms package. I don't believe ZFS 0.6.3-stable or 0.6.4-release
  are affected; 0.6.5-release appears to have included the offending
  commit. Sorry for the excessive "Affects" tagging; I'm still new to
  this and unsure which packages to report this against and/or how to
  properly add the upstream issues/commits.

  Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
  "ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused by an empty cache file."

  How to reproduce: run ztest repeatedly, for example with the following command; it will eventually fail:
  ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*
  (I have /tmp mounted on tmpfs with a 10G limit, but I don't believe this is related in any way; I've confirmed it is not running out of space.)
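  The one-liner above chains ztest -T 3600 runs with &&, deleting
  /tmp/z* after each run and sleeping 3 seconds between runs, so it
  stops at the first failing command. A minimal sketch of the same
  procedure as a POSIX shell function, for varying the number of runs.
  The function name soak_ztest, its default of 12 runs, and its optional
  cleanup-directory argument are hypothetical; they are not part of
  ztest itself.

```shell
# Hypothetical helper: run ztest back to back, like the one-liner above.
# $1 = number of runs (12 is an arbitrary default)
# $2 = directory whose z* files are removed between runs (default /tmp, as in the original)
soak_ztest() {
  runs=${1:-12}
  dir=${2:-/tmp}
  i=1
  while [ "$i" -le "$runs" ]; do
    # One ztest run with -T 3600 (total run time in seconds); stop at the first failure.
    ztest -T 3600 || return 1
    # Remove the z* files left behind, as the original command does; if nothing
    # matches, rm fails and the loop stops, just like the && chain would.
    rm "$dir"/z* || return 1
    # Pause between runs, but not after the last one.
    if [ "$i" -lt "$runs" ]; then
      sleep 3
    fi
    i=$((i + 1))
  done
}
```

  For example, soak_ztest 24 keeps going for about a day of ztest time,
  returning non-zero as soon as a run fails.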

  Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
  Description: Fix ztest truncated cache file
  "Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
  truncate and overwrite rather than rename the cache file.  This is
  the correct fix but it should have only been applied for the kernel
  build.  In user space rename(2) is needed because ztest depends on
  the cache file."
  Associated pull request for above commit: https://github.com/zfsonlinux/zfs/pull/4130
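  The commit message turns on the difference between replacing a file
  with rename(2) and truncating it in place. A minimal shell sketch of
  the two patterns, as an illustration only: it is not the ZFS code
  (spa_config_write() is C), and the helper names write_file_atomic and
  write_file_in_place are hypothetical. With rename, a concurrent reader
  such as zdb sees either the complete old file or the complete new one.
  With truncate-and-overwrite there is a window in which the file is
  empty, which is consistent with the empty cache file described in the
  upstream report.

```shell
# Hypothetical helper: write stdin to DEST via a temporary file plus rename.
# Usage: producer | write_file_atomic DEST
write_file_atomic() {
  dest=$1
  # Keep the temporary file next to DEST so both live on the same filesystem.
  tmp="$dest.tmp.$$"
  if ! cat > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # mv tries rename(2) first; within one filesystem this replaces DEST atomically,
  # so readers never observe an empty or partially written file.
  mv -f "$tmp" "$dest"
}

# Hypothetical helper for contrast: truncate-and-overwrite.
# The '>' redirection truncates DEST to zero length before any data is written,
# so a reader that opens DEST at that moment sees an empty file.
write_file_in_place() {
  cat > "$1"
}
```

  For example, printf 'new contents\n' | write_file_atomic /tmp/example.cache.
  If the temporary file were on a different filesystem, mv would fall
  back to copying and the replacement would no longer be atomic.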

  I'm not sure why this wasn't backported to the release branch, but it
  is in zfs master. I've reproduced this bug on xenial kernels
  4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-lowlatency, and
  4.4.0-23-lowlatency, as well as various xenial master-next builds.
  After applying the above commit to the kernel and building/installing
  the kernel manually, ztest runs fine. I've also separately tested the
  commit on the zfs-dkms package, which also appears to fix the issue.
  Note, however, that there may still be other outstanding ztest-related
  issues upstream, especially when preemption and high-resolution timers
  are used. I'm currently testing more heavily against lowlatency builds
  and master-next.

  (I'm unsure how to associate this bug with multiple packages, but both
  the zfs-dkms and linux-image-* packages are affected.)

  P.S. Also of note is "Fix inverted logic on none elevator comparison",
  which was signed off by Canonical but, curiously, was not included in
  the xenial kernel or zfs-dkms packages. It was, however, backported to
  0.6.5-release upstream.
