kernel-packages team mailing list archive

[Bug 1587686] Re: ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

 

This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu10

---------------
zfs-linux (0.6.5.6-0ubuntu10) xenial; urgency=medium

  * Sync with relevant upstream fixes (LP: #1594871)
   - Fix user namespaces uid/gid mapping
     As described in torvalds/linux@5f3a4a2 the &init_user_ns, and
     not the current user_ns, should be passed to posix_acl_from_xattr()
     and posix_acl_to_xattr().  Conveniently the init_user_ns is
     available through the init credential (kcred).
     (upstream commit 874bd959f4f15b3d4b007160ee7ad3f4111dd341)
     ZFS #4177
   - Fix ZPL miswrite of default POSIX ACL
     Commit 4967a3e introduced a typo that caused the ZPL to store the
     intended default ACL as an access ACL. Due to caching this problem
     may not become visible until the filesystem is remounted or the inode
     is evicted from the cache. Fix the typo.
     (upstream commit 98f03691a4c08f38ca4538c468e9523f8e6b24be)
     ZFS #4520
   - Create unique partition labels
     When partitioning a device a name may be specified for each partition.
     Internally zfs doesn't use this partition name for anything so it
     has always just been set to "zfs".
     However this isn't optimal because udev will create symlinks using
     this name in /dev/disk/by-partlabel/.  If the name isn't unique
     then all the links cannot be created.
     Therefore a random 64-bit value has been added to the partition
     label, i.e. "zfs-1234567890abcdef".  Additional information could
     be encoded here but since partitions may be reused that might
     result in confusion and it was decided against.
     (upstream commit fbffa53a5cdb9b796de5afc9be8c1f79619253d4)
     ZFS #4517
   - Fix inverted logic on none elevator comparison
     Commit d1d7e2689db9e03f1 ("cstyle: Resolve C style issues") inverted
     the logic on the none elevator comparison.  Fix this and make it
     cstyle warning clean.
     (upstream commit 60a4ea3f948f1596b92b666fc7dd21202544edbb)
     ZFS #4507
   - Remove wrong ASSERT in annotate_ecksum
     When using large blocks like 1M, there will be more than UINT16_MAX
     qwords in one block, so this ASSERT would go off. Also, it is possible
     for the histogram to overflow. We cap them to UINT16_MAX to prevent this.
     (upstream commit 21ea9460fa880bb072a9ca9d845aef740f9d3af6)
     ZFS #4257
   - Fix 'zpool import' blkid device names
     When importing a pool using the blkid cache only the device
     node path was added to the list of known paths for a device.
     This results in 'zpool import' always using the sdX names
     in preference to the 'path' name stored in the label.
     To fix the issue the blkid import path has been updated to
     add the 'path', 'devid', and 'devname' names from the
     label to the known paths.  A sanity check is done to ensure
     these paths do refer to the same device identified by blkid.
     (upstream commit c9ca152fd1de1b0fd959e772b9a25d14a891952b)
     ZFS #4523, #3043
   - Use udev for partition detection
     When ZFS partitions a block device it must wait for udev to create
     both a device node and all the device symlinks.  This process takes
     a variable length of time and depends on factors such as how many links
     must be created, the complexity of the rules, etc.  Complicating
     the situation further it is not uncommon for udev to create and
     then remove a link multiple times while processing the udev rules.
     In order to address this the zpool_label_disk_wait() function
     has been updated to use libudev.  Until the registered system
     device acknowledges that it is fully initialized the function
     will wait.  Once fully initialized all device links are checked
     and allowed to settle for 50ms.  This makes it far more likely
     that all the device nodes will exist when the kernel modules
     need to open them.
     For systems without libudev an alternate zpool_label_disk_wait()
     was updated to include a settle time.  In addition, the kernel
     modules were updated to include retry logic for this ENOENT case.
     Due to the improved checks in the utilities it is unlikely this
     logic will be invoked.  However, in the rare event it is needed
     it will prevent a failure.
     (upstream commit 2cb77346cb698ae0c233c7baf8b4c787205b54e9)
     ZFS #4523, #3708, #4077, #4144, #4214, #4517
  * Fix ztest truncated cache file (LP: #1587686)
     Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
     truncate and overwrite rather than rename the cache file.  This is
     the correct fix but it should have only been applied for the kernel
     build.  In user space rename(2) is needed because ztest depends on
     the cache file.
     (upstream commit 151f84e2c32f690b92c424d8c55d2dfccaa76e51)
     ZFS #4129

 -- Colin Ian King <colin.king@xxxxxxxxxxxxx>  Tue, 21 Jun 2016 15:49:12 +0100
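
For illustration, the naming scheme from the "Create unique partition
labels" entry above ("zfs-" followed by a random 64-bit value in hex) can
be mimicked from a shell. This is only a sketch: the function name
make_zfs_partlabel and the use of /dev/urandom are assumptions made for
this example, not a description of how the ZFS utilities generate the value.

```shell
#!/bin/sh
# Hypothetical helper: print a label of the form "zfs-<16 hex digits>",
# i.e. "zfs-" plus a random 64-bit value, as described in the changelog.
make_zfs_partlabel() {
    # Read 8 random bytes (64 bits) and print them as contiguous hex.
    suffix=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
    printf 'zfs-%s\n' "$suffix"
}

make_zfs_partlabel
```

Because each label carries its own random suffix, udev can create a
distinct symlink per partition under /dev/disk/by-partlabel/.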

** Changed in: zfs-linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1587686

Title:
  ZFS: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"

Status in Native ZFS for Linux:
  New
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  In Progress
Status in zfs-linux source package in Xenial:
  Fix Released

Bug description:
  [SRU Justification][XENIAL]

  Problem: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"

  [FIX]

  Upstream commit
  https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51

  [TEST CASE]
  Without the fix, ztest fails after hours of soak testing. With the fix,
  the issue can't be reproduced.

  [REGRESSION POTENTIAL]

  This fix is an upstream fix and has therefore passed the ZFS integration
  tests.  I have also tested it thoroughly with the kernel team ZFS
  regression tests and found no issues, so the regression potential
  is slim to zero.

  ------------------------------------------------------------------


  Problem: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"

  This bug affects the xenial kernel built-in ZFS as well as the package
  zfs-dkms. I don't believe ZFS 0.6.3-stable or 0.6.4-release are
  affected; 0.6.5-release seems to have included the offending commit.
  Sorry for excessive "Affects" tagging, I'm still new to this and
  unsure of the proper packages to report this against and/or how to
  properly add the upstream issues/commits.

  Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
  "ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused by an empty cache file."

  How to reproduce: run ztest repeatedly, for example with a command like this, and it will eventually fail:
  ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*
  (I have /tmp mounted on tmpfs with a 10G limit but I don't believe this is related in any way, and I've confirmed it's not running out of space)
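
  The chained command above can also be written as a loop. A minimal
  sketch, assuming ztest is on PATH and leaves its files in /tmp as in the
  report. The function name soak, its arguments, and the ZTEST and
  ZTEST_DIR variables are hypothetical; the variables exist only so the
  command and directory can be swapped out for a dry run.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproducer from the report.
# Usage: soak <runs> <seconds-per-run> [pause-seconds]
# Stops at the first failing run and returns non-zero.
ZTEST=${ZTEST:-ztest}
ZTEST_DIR=${ZTEST_DIR:-/tmp}

soak() {
    runs=$1
    secs=$2
    pause=${3:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        # Same steps as the one-liner: run ztest, clean up, pause.
        "$ZTEST" -T "$secs" || return 1
        # -f so that an empty glob match is not treated as an error.
        rm -f "$ZTEST_DIR"/z*
        sleep "$pause"
        i=$((i + 1))
    done
}

# Example: a series of one-hour runs with the 3 second pause.
#   soak 12 3600
```

  Unlike the one-liner, the loop reports which run failed, which helps
  when the failure only shows up after several hours.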

  Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
  Description: Fix ztest truncated cache file
  "Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
  truncate and overwrite rather than rename the cache file.  This is
  the correct fix but it should have only been applied for the kernel
  build.  In user space rename(2) is needed because ztest depends on
  the cache file."
  Associated pull request for above commit: https://github.com/zfsonlinux/zfs/pull/4130
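
  To see why the rename matters to a concurrent reader such as zdb, here
  is a shell sketch of the two update strategies. It is an analogy only,
  not the spa_config_write() code; the function names and file paths are
  hypothetical. With truncate-and-overwrite there is a window in which the
  file exists but is empty. With write-to-temp-then-rename (mv within one
  filesystem uses rename(2)) a reader sees either the old or the new
  contents.

```shell
#!/bin/sh
# Strategy 1: truncate and overwrite in place.
# Between the truncation and the write, a reader can see an empty file.
update_truncate() {
    cache=$1
    data=$2
    : > "$cache"                      # file is now empty
    printf '%s\n' "$data" > "$cache"  # contents appear afterwards
}

# Strategy 2: write a temporary file, then rename it over the target.
# The temporary file must live on the same filesystem as the target.
update_rename() {
    cache=$1
    data=$2
    tmp="$cache.tmp.$$"
    printf '%s\n' "$data" > "$tmp" || return 1
    mv -f "$tmp" "$cache"             # replaces the target in one step
}
```

  This matches the symptom in the upstream report: a reader that opens
  the cache file inside the truncate window finds it empty.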

  I'm not sure why this wasn't backported to the release branch, but it's in
  zfs master. I've reproduced this bug on xenial kernels 4.4.0-22-generic,
  4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency as well
  as various xenial master-next builds. After applying the above commit
  to the kernel and building/installing the kernel manually, ztest runs
  fine. I've also separately tested the commit on the zfs-dkms package,
  which also appears to fix the issue. Note, however, that there may still be
  some other outstanding ztest related issues upstream - especially when
  preempt and hires timers are used. I'm currently testing more heavily
  against lowlatency builds and master-next.

  (I'm unsure how to associate this bug with multiple packages, but the
  zfs-dkms and linux-image-* packages are both affected.)

  P.S. Also of note is
  https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb
  "Fix inverted logic on none elevator comparison" - which interestingly
  was signed off by Canonical but curiously not included in the xenial
  kernel or zfs-dkms packages. It was, however, backported to
  0.6.5-release upstream.

To manage notifications about this bug go to:
https://bugs.launchpad.net/zfs/+bug/1587686/+subscriptions

