canonical-ubuntu-qa team mailing list archive
Message #04525
[Bug 2056461]
> Can someone ping the mailing list again? I think this issue might have
> dropped off the radar.
I've now subscribed to this bz so no need to ping the list again (and
David and Eric have been there from the start and should be getting the
mails too... But at this point you won't get more people from the list
anyway)
As I said on the list I don't think the fix is appropriate -- and I
still haven't had the time to dig into this properly.
My understanding so far is that the guest is reading a file that is being modified from the host. For some reason the file size then isn't coherent internally, and the read falls short even though userspace knows the file is bigger.
Is that correct?
We don't have cache invalidation in 9p so for cache=loose/fscache we
just don't support files being modified from the host and I'd understand
this behaviour, but for cache=none (the default you're using here) we
should just not limit the read sizes to whatever the inode's i_size is.
So the fix really isn't to do a new stat after short reads, but to
ignore the size attribute in the first place.
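
To make the distinction above concrete, here is a toy model in Python. It is only a sketch: read_clamped, read_unclamped, cached_size and server_data are hypothetical names made up for illustration, and nothing here is actual 9p or netfs code. It assumes the short read comes from clamping to a stale cached size, which is the hypothesis above, not a confirmed diagnosis.

```python
# Toy model only: these functions are hypothetical and do not exist in the kernel.

def read_clamped(server_data: bytes, cached_size: int, offset: int, count: int) -> bytes:
    """Read that trusts a possibly stale cached size (the suspected behaviour)."""
    end = min(offset + count, cached_size)
    # If the offset is already past the cached size, nothing is returned.
    return server_data[offset:max(end, offset)]


def read_unclamped(server_data: bytes, offset: int, count: int) -> bytes:
    """Read that lets the server decide where the file ends."""
    return server_data[offset:offset + count]


# The guest cached the size when the file held 4 bytes...
cached_size = 4
# ...and the host has appended 4 more bytes since then.
server_data = b"abcdefgh"

short = read_clamped(server_data, cached_size, 0, 8)   # stops at the stale size
full = read_unclamped(server_data, 0, 8)               # sees everything the server has
```

In this model, a new stat after a short read would only refresh cached_size after the fact, whereas the unclamped read never consults it.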
Since I haven't had time to check the traces properly I also obviously
haven't had time to properly try to reproduce either, but given there is
no cache invalidation I'd expect this shouldn't be difficult... I'd also
expect the fix to not be too hard to do once we've been able to trace
where the size gets cut off, which is probably easy enough with
qemu/GDB or for someone familiar with the netfs read helpers.
So, I'm sorry to say this after a first fix has already been made (the analysis so far is already a great help!), but my free time has gotten very short. I'll try to get to it when possible, but if someone can help with a more appropriate fix in that direction, that would be great.
(Kernel work unfortunately isn't easy unless your work pays for it or you have a lot of free time, and both stopped being true for me a while ago...)
At a higher level, cache=loose/fscache could perhaps also use some invalidation logic like NFS has (iirc NFS v3 re-checks the ctime every 60s or so?). That's not needed here, though: cache=none (and possibly writeback) should not trust server attributes at all and should not have this problem.
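
As a rough illustration of the timeout-based revalidation idea, here is a minimal Python sketch. The AttrCache class and its fetch, ttl and clock parameters are hypothetical, the 60-second value is just the recollection quoted above, and the sketch caches a size rather than a ctime for simplicity. It is not how NFS or 9p actually implement this.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedAttr:
    size: int
    fetched_at: float


class AttrCache:
    """Hypothetical attribute cache that asks the server again after ttl seconds."""

    def __init__(self, fetch: Callable[[], int], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # stands in for a getattr round trip to the server
        self._ttl = ttl
        self._clock = clock      # injectable so the example does not need to sleep
        self._attr: Optional[CachedAttr] = None

    def size(self) -> int:
        now = self._clock()
        if self._attr is None or now - self._attr.fetched_at >= self._ttl:
            self._attr = CachedAttr(self._fetch(), now)
        return self._attr.size


# Usage with a fake clock and a fake server-side size.
now = [0.0]
server_size = [4]
cache = AttrCache(lambda: server_size[0], ttl=60.0, clock=lambda: now[0])

first = cache.size()        # fetched from the "server"
server_size[0] = 8          # file grows on the host
now[0] = 30.0
stale = cache.size()        # still within the ttl, so the old size is returned
now[0] = 60.0
fresh = cache.size()        # ttl expired, so the size is fetched again
```

The window between stale and fresh is the kind of incoherence such a scheme accepts for cached modes.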
--
You received this bug notification because you are a member of
Canonical's Ubuntu QA, which is subscribed to autopkgtest in Ubuntu.
https://bugs.launchpad.net/bugs/2056461
Title:
autopkgtest-virt-qemu on noble images sometimes hangs doing copydown
Status in Linux:
Confirmed
Status in autopkgtest package in Ubuntu:
Confirmed
Status in linux package in Ubuntu:
In Progress
Status in autopkgtest package in Debian:
New
Bug description:
[Impact]
It seems that kernel 6.8 introduced a regression in 9pfs, related to
caching and netfslib, that can cause some user-space apps to read
stale content from files (when the files are used in a
producer/consumer fashion).
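
For readers unfamiliar with the pattern, the producer/consumer usage described above looks roughly like the Python sketch below. This is only an illustration under assumptions: producer, consumer and run_once are hypothetical helpers, both sides run in one process on a local directory, and it is not claimed to reproduce the bug. In the reported scenario the producer would run on the host and the consumer in the guest over a 9p mount.

```python
import os
import tempfile
import threading
import time


def producer(path, chunks, delay=0.01):
    """Append chunks to the file one at a time, as the host side would."""
    with open(path, "ab") as f:
        for chunk in chunks:
            f.write(chunk)
            f.flush()
            time.sleep(delay)


def consumer(path, expected_len, timeout=5.0):
    """Keep reading until expected_len bytes have arrived or the timeout expires."""
    deadline = time.monotonic() + timeout
    data = b""
    # buffering=0 so every read() goes straight to the filesystem.
    with open(path, "rb", buffering=0) as f:
        while len(data) < expected_len and time.monotonic() < deadline:
            piece = f.read(expected_len - len(data))
            if piece:
                data += piece
            else:
                time.sleep(0.01)  # at EOF for now; wait for the producer
    return data


def run_once(directory):
    """Return True if the consumer saw everything the producer wrote."""
    path = os.path.join(directory, "shared.bin")
    open(path, "wb").close()
    chunks = [bytes([i]) * 1024 for i in range(16)]
    writer = threading.Thread(target=producer, args=(path, chunks))
    writer.start()
    data = consumer(path, sum(len(c) for c in chunks))
    writer.join()
    return data == b"".join(chunks)


with tempfile.TemporaryDirectory() as tmpdir:
    ok = run_once(tmpdir)
```

On a coherent filesystem ok should be True. A consumer that keeps getting empty reads while the file has actually grown would hit the timeout instead, which is the kind of hang described in this report.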
It seems that the offending commit is this one:
80105ed2fd27 ("9p: Use netfslib read/write_iter")
Reverting the commit seems to fix the problem. However the actual bug
might be in netfslib or how netfslib is used in the 9p context.
The regression has been reported upstream and we are still
investigating (https://lore.kernel.org/lkml/Zj0ErxVBE3DYT2Ea@gpd/).
In the meantime it probably makes sense to temporarily revert the
commit as a SAUCE patch. We will drop the SAUCE patch once we
have a proper fix upstream.
[Test case]
The following test should complete correctly without any timeout:
pull-lp-source -d hello
autopkgtest-buildvm-ubuntu-cloud -r noble
autopkgtest -U hello*.dsc -- qemu ./autopkgtest-noble-amd64.img
[Fix]
Revert the following commit (until we have a proper fix upstream):
80105ed2fd27 ("9p: Use netfslib read/write_iter")
[Regression potential]
We may experience other regressions related to 9pfs with this change;
however, that is quite unlikely since we are reverting a commit and
restoring the previous behavior.
[Original bug report]
autopkgtest-virt-qemu sometimes hangs when running tests on noble
images. Originally reported by schopin, who also provided a
reproducer:
pull-lp-source -d hello
autopkgtest-buildvm-ubuntu-cloud -r noble
autopkgtest -U hello*.dsc -- qemu ./autopkgtest-noble-amd64.img
I've been able to reproduce it with debugging enabled:
autopkgtest -ddd -U hello_2.10-3.dsc -- qemu --debug --show-boot /path/to/image
It can get stuck during different stages, but AFAICT always during
"copydown" operations, log excerpts follow. It may be a coincidence,
but this started happening around the time
linux-image-6.8.0-11-generic (6.8.0-11.11) migrated to noble. The testbeds I
used booted 6.6 but then rebooted into that 6.8 kernel after being
upgraded by autopkgtest.
-- logs --
Removing autopkgtest-satdep (0) ...
[...]
autopkgtest-virt-qemu: DBG: executing copydown /tmp/autopkgtest.output.g8v75e8g/tests-tree/ /t/
autopkgtest-virt-qemu: DBG: ['cmdls', "(['tar', '--directory', '/tmp/autopkgtest.output.g8v75e]
autopkgtest-virt-qemu: DBG: ['srcstdin', "<_io.BufferedReader name='/dev/null'>", 'deststdout']
autopkgtest-virt-qemu: DBG: +< tar --directory /tmp/autopkgtest.output.g8v75e8g/tests-tree/ --
autopkgtest-virt-qemu: DBG: +> /tmp/autopkgtest-qemu.ztmr6f5k/runcmd sh -ec if ! test -d /tmp-
autopkgtest-virt-qemu: DBG: +>?
-- or --
autopkgtest: DBG: sending command to testbed: copydown /tmp/autopkgtest.output.c9utq3bx/tests-tree/ /tmp/autopkgtest.H8NDfW/build.DLR/src/
autopkgtest-virt-qemu: DBG: executing copydown /tmp/autopkgtest.output.c9utq3bx/tests-tree/ /tmp/autopkgtest.H8NDfW/build.DLR/src/
autopkgtest-virt-qemu: DBG: ['cmdls', "(['tar', '--directory', '/tmp/autopkgtest.output.c9utq3bx/tests-tree/', '--warning=none', '-c', '.', '-f', '-'], ['/tmp/autopkgtest-qemu.qtkcgg5l/runcm]
autopkgtest-virt-qemu: DBG: ['srcstdin', "<_io.BufferedReader name='/dev/null'>", 'deststdout', "<_io.BufferedReader name='/dev/null'>", 'devnull_read', <_io.BufferedReader name='/dev/null'>]
autopkgtest-virt-qemu: DBG: +< tar --directory /tmp/autopkgtest.output.c9utq3bx/tests-tree/ --warning=none -c . -f -
autopkgtest-virt-qemu: DBG: +> /tmp/autopkgtest-qemu.qtkcgg5l/runcmd sh -ec if ! test -d /tmp/autopkgtest.H8NDfW/build.DLR/src/; then mkdir -- /tmp/autopkgtest.H8NDfW/build.DLR/src/; fi; cd-
autopkgtest-virt-qemu: DBG: +>?
To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/2056461/+subscriptions