
kernel-packages team mailing list archive

[Bug 1587686] [NEW] ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

 

Public bug reported:

Problem: Running ztest repeatedly for long periods of time eventually
results in "zdb: can't open 'ztest': No such file or directory"

This bug affects the ZFS built into the xenial kernel as well as the
zfs-dkms package. I don't believe ZFS 0.6.3-stable is affected, because
the offending commit was introduced in 0.6.4-release and 0.6.5-release.

Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
"ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused be an empty cache file."

How to reproduce: run ztest repeatedly, for example with a command like the following, and it will eventually fail:
ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*
(I have /tmp mounted on tmpfs with a 10G limit, but I don't believe this is related, and I've confirmed it is not running out of space.)
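
The chained command above can also be written as a loop. The following is only a sketch of that idea, not something taken from the upstream report: the function names (cache_state, repeat_until_fail, ztest_cycle) are made up for illustration, and /tmp/zpool.cache is an assumption about where ztest keeps its cache file. On failure it reports the cache file state before any cleanup, since the upstream report points at an empty cache file.

```shell
# Report whether a cache file is missing, empty, or present (non-empty).
# ASSUMPTION: /tmp/zpool.cache is where ztest keeps its cache file by default.
cache_state() {
    cache=${1:-/tmp/zpool.cache}
    if [ ! -e "$cache" ]; then
        echo "missing"
    elif [ ! -s "$cache" ]; then
        echo "empty"
    else
        echo "present"
    fi
}

# Run COMMAND up to COUNT times, stopping at the first failure.
# Usage: repeat_until_fail COUNT COMMAND [ARGS...]
repeat_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "iteration $i of $count"
        if ! "$@"; then
            echo "failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# One cycle of the original command: an hour of ztest, then cleanup and a pause.
# If ztest fails, the cache file state is printed and nothing is removed.
ztest_cycle() {
    if ztest -T 3600; then
        rm /tmp/z* && sleep 3
    else
        echo "cache file state: $(cache_state)" >&2
        return 1
    fi
}
```

Running `repeat_until_fail 12 ztest_cycle` should then behave much like the twelve chained runs above, except that it also sleeps after the final run.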

Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
Description:
"Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
truncate and overwrite rather than rename the cache file.  This is
the correct fix but it should have only been applied for the kernel
build.  In user space rename(2) is needed because ztest depends on
the cache file."
Associated pull request for above commit: https://github.com/zfsonlinux/zfs/pull/4130
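
To illustrate the difference the commit message describes, here is a minimal shell sketch of the two update strategies. It is not the ZFS code (spa_config_write() is C), and the function names are hypothetical. The point is that truncating in place leaves a window in which a reader can see an empty file, while writing a temporary file and renaming it over the target swaps the contents in a single step, assuming both paths are on the same filesystem.

```shell
# Strategy 1: truncate and overwrite in place.
# Between the truncate and the write, a concurrent reader can see an empty file.
update_in_place() {
    : > "$1"
    printf '%s\n' "$2" >> "$1"
}

# Strategy 2: write a temporary file next to the target, then rename it over
# the target. The temporary file is in the same directory, so mv performs a
# rename(2) and readers see either the old or the new contents.
update_by_rename() {
    tmp="$1.tmp.$$"
    printf '%s\n' "$2" > "$tmp" || return 1
    mv -f "$tmp" "$1"
}
```

For example, `update_by_rename /tmp/example.cache "new contents"` replaces the file without ever exposing an empty one, which is the property ztest relies on according to the commit message.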

I'm not sure why this wasn't backported to the release branch, but it is
in zfs master. I've reproduced this bug on xenial kernels
4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-lowlatency, and
4.4.0-23-lowlatency, as well as on various xenial master-next builds.
After applying the above commit to the kernel and building and
installing the kernel manually, ztest runs fine. I've also separately
tested the commit on the zfs-dkms package, where it also appears to fix
the issue. Note, however, that there may still be other outstanding
ztest-related issues upstream, especially when preempt and hires timers
are used. I'm currently testing more heavily against lowlatency builds
and master-next.

(I'm unsure how to associate this bug with multiple packages, but both
the zfs-dkms and linux-image-* packages are affected.)

P.S. Also of note is
https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb
"Fix inverted logic on none elevator comparison", which interestingly
was signed off by Canonical but curiously is not included in the xenial
kernel or zfs-dkms packages. It was, however, backported to
0.6.5-release upstream.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: zfs-linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: linux linux-generic linux-lowlatency zfs zfs-dkms

** Also affects: zfs-linux (Ubuntu)
   Importance: Undecided
       Status: New

** Summary changed:

- Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"
+ ZFS: Running ztest repeatedly for long periods of time eventually results in "zdb: can't open 'ztest': No such file or directory"

** Description changed:

  Problem: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"
  
  This bug affects the xenial kernel built-in ZFS as well as the package
- zfs-dkms. (I'm unsure how to tag this to multiple packages)
+ zfs-dkms.
  
  Upstream bug report: https://github.com/zfsonlinux/zfs/issues/4129
  "ztest can occasionally fail because zdb cannot locate the pool after several hours of run time. This appears to be caused be an empty cache file."
  
- How to reproduce: run ztest repeatedly such as a command like this and it will eventually fail: 
+ How to reproduce: run ztest repeatedly such as a command like this and it will eventually fail:
  ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z* && sleep 3 && ztest -T 3600 && rm /tmp/z*
  (I have /tmp mounted on tmpfs with a 10G limit but I don't believe this is related in any way, and I've confirmed it's not running out of space)
  
  Upstream fix: https://github.com/zfsonlinux/zfs/commit/151f84e2c32f690b92c424d8c55d2dfccaa76e51
  Description:
  "Commit efc412b updated spa_config_write() for Linux 4.2 kernels to
  truncate and overwrite rather than rename the cache file.  This is
  the correct fix but it should have only been applied for the kernel
  build.  In user space rename(2) is needed because ztest depends on
  the cache file."
  Associated pull request for above commit: https://github.com/zfsonlinux/zfs/pull/4130
  
- 
- I'm not sure why this wasn't backported to release but it's in zfs master. I've Reproduced this bug on xenial kernels 4.4.0-22-generic, 4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency as well as various xenial master-next builds. After applying the above commit patch to kernel and building/installing kernel manually, ztest runs fine. I've also separately tested the commit patch on zfs-dkms package which also appears to fix the issue. Note however, there may still be some other outstanding ztest related issues upstream - especially when preempt and hires timers are used. I'm currently testing more heavily against lowlatency builds and master-next.
+ I'm not sure why this wasn't backported to release but it's in zfs
+ master. I've Reproduced this bug on xenial kernels 4.4.0-22-generic,
+ 4.4.0-23-generic, 4.4.0-22-lowlatency, and 4.4.0-23-lowlatency as well
+ as various xenial master-next builds. After applying the above commit
+ patch to kernel and building/installing kernel manually, ztest runs
+ fine. I've also separately tested the commit patch on zfs-dkms package
+ which also appears to fix the issue. Note however, there may still be
+ some other outstanding ztest related issues upstream - especially when
+ preempt and hires timers are used. I'm currently testing more heavily
+ against lowlatency builds and master-next.
  
  (I'm unsure how to associate this bug with multiple packages but zfs-
  dkms and linux-image-* packages both are affected).
  
  P.S. Also of note is
  https://github.com/zfsonlinux/zfs/commit/60a4ea3f948f1596b92b666fc7dd21202544edbb
  "Fix inverted logic on none elevator comparison" - which interestingly
  was signed-off-by canonical but curiously not included in the xenial
  kernel or zfs-dkms packages. It was however, backported to 0.6.5-release
  upstream.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1587686

Title:
  ZFS: Running ztest repeatedly for long periods of time eventually
  results in "zdb: can't open 'ztest': No such file or directory"

Status in linux package in Ubuntu:
  New
Status in zfs-linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1587686/+subscriptions

