group.of.nepali.translators team mailing list archive
Message #36196
[Bug 1869948] Re: Multiple Kexec in AWS Nitro instances fail
** Changed in: linux (Ubuntu)
Status: Fix Committed => Fix Released
** No longer affects: linux (Ubuntu Disco)
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1869948
Title:
Multiple Kexec in AWS Nitro instances fail
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Xenial:
Fix Released
Status in linux source package in Bionic:
Fix Released
Status in linux source package in Eoan:
Fix Released
Status in linux source package in Focal:
Fix Released
Bug description:
[Impact]
* Currently, users cannot perform multiple kernel kexec loads on AWS Nitro instances (KVM-based); after the 2nd or 3rd kexec, an initrd corruption is observed, with the following signature:
Initramfs unpacking failed: junk within compressed archive
[...]
Kernel panic - not syncing: No working init found.
Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc7-gpiccoli+ #26
Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
Call Trace:
dump_stack+0x6d/0x9a
? csum_partial_copy_generic+0x150/0x170
panic+0x101/0x2e3
? do_execve+0x25/0x30
? rest_init+0xb0/0xb0
kernel_init+0xfb/0x100
ret_from_fork+0x35/0x40
* After investigation (see comment 2), we found that the Amazon ena
network driver doesn't provide a shutdown() handler. The device could
therefore still be performing DMA to a previously valid address while
the new kernel boots, which would corrupt kernel memory. The following
patch was proposed and fixed the issue, allowing 1000 kexecs to be
executed successfully with no issues observed: 428c491332bc ("net: ena:
Add PCI shutdown handler to allow safe kexec") [
git.kernel.org/linus/428c491332bc ].
* Hence, we are requesting an SRU for this patch. It was tested
successfully in all supported series (4.4, 4.15 and 5.3) on Amazon
Nitro instances, and was reviewed/acked by the ena driver team and by
a kexec developer from another distro. It is worth mentioning that we
also started an upstream multi-vendor discussion about this issue:
marc.info/?l=kexec&m=158299605013194
[Test case]
* The basic test procedure is to perform multiple kexecs sequentially.
AWS does not provide a full console, so in case of failure one can
check the instance screenshot, or use pstore/ramoops to collect dmesg
after a crash from a preserved memory area.
The commands used to perform kexec are:
kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
systemctl kexec
Alternatively, one could use "--append=" instead of "--reuse-cmdline"
if a change to the kexec command line is desired; also, to execute the
kexec-loaded kernel, both "kexec -e" and "systemctl kexec" are equally
valid.
* In comment 3 we proposed a script/approach to auto-test kexecs, which
was used here to perform 1000 kexecs with the proposed patch.
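
A minimal sketch of how such an automated kexec loop could look, built
only from the commands listed above. This is NOT the script from
comment 3; the names COUNT_FILE, MAX_KEXECS, KERNEL, INITRD and DRY_RUN,
the default paths and the helper functions are all assumptions made for
illustration.

```shell
#!/bin/sh
# Hypothetical kexec loop: each call bumps a persistent counter and
# kexecs again until MAX_KEXECS is reached. All names are illustrative.

COUNT_FILE="${COUNT_FILE:-/var/tmp/kexec-count}"
MAX_KEXECS="${MAX_KEXECS:-1000}"
KERNEL="${KERNEL:-/boot/vmlinuz-$(uname -r)}"
INITRD="${INITRD:-/boot/initrd.img-$(uname -r)}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

kexec_once() {
    count=0
    # Read the number of kexecs performed so far, if recorded.
    if [ -s "$COUNT_FILE" ]; then
        count=$(cat "$COUNT_FILE")
    fi
    if [ "$count" -ge "$MAX_KEXECS" ]; then
        echo "done: $count kexecs completed"
        return 0
    fi
    # Record the attempt before leaving the running kernel.
    echo $((count + 1)) > "$COUNT_FILE"
    run kexec -l "$KERNEL" --initrd "$INITRD" --reuse-cmdline &&
        run systemctl kexec
}
```

To be useful, kexec_once would have to be called on every boot, for
example from a boot-time service (an assumption; the bug report does not
say how the original script was triggered). Setting DRY_RUN=1 prints the
commands instead of running them.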
[Regression Potential]
* Although the patch proposed here introduces a PCI shutdown handler,
it keeps the remove handler identical and bases shutdown closely on
ena_remove(), changing just the netdev handling, following other
upstream drivers. It was extensively tested and presented no issues.
Also, it's self-contained and affects only one driver, so other cloud
providers and non-cloud environments would not be affected by the
patch at all.
* A potential regression could manifest as a delay or issue on the
reboot/shutdown path, and only if the ena driver is in use.
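
Since only systems using the ena driver are exposed, one way to check a
given machine is to look at which driver each network interface is bound
to in sysfs. The following is a rough sketch, not part of the original
report; the function name ena_interfaces and the SYSFS_NET override are
hypothetical.

```shell
# Print the network interfaces bound to the "ena" driver.
# SYSFS_NET is a hypothetical override, mainly useful for testing.
ena_interfaces() {
    sysfs_net="${SYSFS_NET:-/sys/class/net}"
    for dev in "$sysfs_net"/*; do
        # Virtual interfaces (e.g. lo) have no device/driver link.
        [ -e "$dev/device/driver" ] || continue
        drv=$(basename "$(readlink -f "$dev/device/driver")")
        if [ "$drv" = "ena" ]; then
            basename "$dev"
        fi
    done
}
```

If ena_interfaces prints nothing, no interface on the system is bound to
the ena driver.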
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869948/+subscriptions