curtin-dev team mailing list archive
Message #00311
[Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
Ryan Harper has proposed merging ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master.
Commit message:
vmtest: trigger guest panic to fail fast
A number of vmtest scenarios trigger stuck or hung kernels and leave
the VM running in that state, where it continues to consume resources
on the host and prolongs the total time for a complete vmtest run.
This patch reconfigures the guest kernel to panic on soft lockups, NMI
watchdog misses, and hung tasks, and configures QEMU to exit when a
reboot occurs.
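As a rough illustration of the mechanism described above, the sketch below assembles the launcher arguments that the diff in this proposal adds. The flag values are copied from the diff; the helper name `fail_fast_launch_args` and the constant name are hypothetical and do not appear in curtin. The idea is that `panic=-1` makes the guest reboot immediately on panic, the other three parameters turn soft lockups, hung tasks, and NMI watchdog misses into panics, and QEMU's `-no-reboot` turns that reboot into an exit.

```python
# Kernel parameters copied from the diff in this proposal.
# The names below are hypothetical and used only for illustration.
FAIL_FAST_KERNEL_PARAMS = [
    "panic=-1",              # reboot immediately on panic
    "softlockup_panic=1",    # panic on soft lockup
    "hung_task_panic=1",     # panic on hung task
    "nmi_watchdog=panic,1",  # panic when the NMI watchdog fires
]


def fail_fast_launch_args(kernel_params=FAIL_FAST_KERNEL_PARAMS):
    """Return --no-reboot plus one --append=<param> per kernel parameter."""
    return ["--no-reboot"] + ["--append=%s" % p for p in kernel_params]


if __name__ == "__main__":
    for arg in fail_fast_launch_args():
        print(arg)
```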
The combination ensures that when a guest cannot make progress we fail
fast and exit. The default install timeout is 3000 seconds. In the
case of a failure we now exit immediately, record the failure, and
move on to the next test instead of burning resources for the
remainder of the timeout. This should dramatically reduce the total
time to complete a run, since we typically see an install failure
within the first 300 seconds or so.
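To put the two figures above side by side, here is a back-of-the-envelope sketch of the time recovered per run. The 3000 and 300 second values come from the commit message; the function name and the count of four hung tests are made-up inputs for illustration, not measurements.

```python
INSTALL_TIMEOUT_S = 3000  # default install timeout (from the commit message)
TYPICAL_FAILURE_S = 300   # rough time at which failures show up (from the commit message)


def seconds_saved(hung_tests, timeout_s=INSTALL_TIMEOUT_S,
                  fail_at_s=TYPICAL_FAILURE_S):
    """Time no longer spent waiting for the timeout, assuming each hung
    test now exits at roughly fail_at_s instead of running to timeout_s."""
    return hung_tests * (timeout_s - fail_at_s)


if __name__ == "__main__":
    # Hypothetical run with four hung tests: 4 * (3000 - 300)
    print(seconds_saved(4))  # 10800
```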
For test cases which have proven to be a challenge, we can optionally
enable the 'crashdump' flag in a VMTest class, which will modify the VM
to enable the Linux kernel crashdump feature; if such a panic
occurs, lkcd would then trigger a dump and capture more debugging
state. This is disabled by default; there are some bugs in
configuring/enabling crashdump "live" in the ephemeral environment, so
we'll only turn this on for hard-to-debug crashes.
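The opt-in behaviour can be sketched in isolation as follows. This is a simplified stand-in for the `if cls.crashdump:` branch in the diff, not the actual VMBaseClass code: the 2048 MB bump and the `crashkernel` value are copied from the diff, while the function name `apply_crashdump` and its signature are hypothetical.

```python
CRASHDUMP_EXTRA_MEM_MB = 2048  # memory bump from the diff
CRASHKERNEL_ARG = "--append=crashkernel=384M-5000M:192M"  # from the diff


def apply_crashdump(mem_mb, launch_args, crashdump=False):
    """Return (mem_mb, launch_args) adjusted for crashdump when enabled.

    With crashdump disabled (the default) the inputs are returned unchanged.
    """
    mem_mb = int(mem_mb)
    launch_args = list(launch_args)
    if crashdump:
        # Extra room for the kernel and modules in the ephemeral environment.
        mem_mb += CRASHDUMP_EXTRA_MEM_MB
        launch_args.append(CRASHKERNEL_ARG)
    return mem_mb, launch_args


if __name__ == "__main__":
    print(apply_crashdump("1024", ["--no-reboot"], crashdump=True))
```

In the proposal itself a test class would opt in by setting `crashdump = True` as a class attribute, since the diff adds `crashdump = False` as the default on VMBaseClass.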
Requested reviews:
curtin developers (curtin-dev)
For more details, see:
https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/383805
--
Your team curtin developers is requested to review the proposed merge of ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master.
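The diff also adds a hung-task pattern to the failure regexps in check_install_log. A minimal sketch of how that pattern behaves is given here, assuming the patterns are joined into one alternation the way the diff context shows. Only the four patterns visible in the diff are included, and the sample log lines are illustrative rather than taken from a real run.

```python
import re

# Subset of failure patterns visible in the diff; the real list is longer.
install_fail = "({})".format("|".join([
    'INFO:.* blocked for more than.*seconds.',
    'Installation failed',
    'ImportError: No module named.*',
    'Out of memory:',
]))


def find_failures(install_log):
    """Return every substring of install_log matching a failure pattern."""
    return [m.group(0) for m in re.finditer(install_fail, install_log)]


if __name__ == "__main__":
    # Illustrative line modelled on the kernel's hung-task warning.
    sample = ("[  363.000000] INFO: task udevd:412 blocked for more than "
              "120 seconds.\n")
    print(find_failures(sample))
```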
diff --git a/examples/tests/crashdump.cfg b/examples/tests/crashdump.cfg
new file mode 100644
index 0000000..e010961
--- /dev/null
+++ b/examples/tests/crashdump.cfg
@@ -0,0 +1,19 @@
+_install_crashdump:
+ - &install_crashdump |
+ command -v apt &>/dev/null && {
+ DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-image-generic
+ debconf-set-selections <<< "kexec-tools kexec-tools/load_kexec boolean true"
+ debconf-set-selections <<< "kdump-tools kdump-tools/use_kdname boolean true"
+ DEBIAN_FRONTEND=noninteractive apt-get -qy install linux-crashdump;
+ mkdir -p /var/lib/kdump
+ # fix up crashdump post-inst to just put all of the modules in
+ sed -i -e 's,MODULES=dep,MODULES=most,' /etc/kernel/postinst.d/kdump-tools
+ kdump-config load
+ kdump-config show
+ }
+ exit 0
+
+
+early_commands:
+ # run before other install commands
+ 0000_aaaa_install_crashdump: ['bash', '-c', *install_crashdump]
diff --git a/tests/vmtests/__init__.py b/tests/vmtests/__init__.py
index 222adcc..e102b6d 100644
--- a/tests/vmtests/__init__.py
+++ b/tests/vmtests/__init__.py
@@ -601,6 +601,7 @@ class VMBaseClass(TestCase):
arch_skip = []
boot_timeout = BOOT_TIMEOUT
collect_scripts = []
+ crashdump = False
extra_collect_scripts = []
conf_file = "examples/tests/basic.yaml"
nr_cpus = None
@@ -967,6 +968,25 @@ class VMBaseClass(TestCase):
for service in ["systemd.mask=snapd.seeded.service",
"systemd.mask=snapd.service"]])
+ # We set guest kernel panic=-1 to trigger immediate reboot, combined
+ # with the (xkvm) -no-reboot qemu parameter should prevent vmtests from
+ # wasting time in a soft-lockup loop. Add the params after the '---'
+ # separator to extend the parameters to the target system as well.
+ cmd.extend(["--no-reboot", "--append=panic=-1",
+ "--append=softlockup_panic=1",
+ "--append=hung_task_panic=1",
+ "--append=nmi_watchdog=panic,1"])
+
+ # configure guest with crashdump to capture kernel failures for debug
+ if cls.crashdump:
+ # we need to install a kernel and modules so bump the memory by 2g
+ # for the ephemeral environment to hold it all
+ cls.mem = int(cls.mem) + 2048
+ logger.info(
+ 'Enabling linux-crashdump during install, mem += 2048 = %s',
+ cls.mem)
+ cmd.extend(["--append=crashkernel=384M-5000M:192M"])
+
# getting resolvconf configured is only fixed in bionic
# the iscsi_auto handles resolvconf setup via call to
# configure_networking in initramfs
@@ -1353,7 +1373,7 @@ class VMBaseClass(TestCase):
target_disks.extend([output_disk])
# create xkvm cmd
- cmd = (["tools/xkvm", "-v", dowait] +
+ cmd = (["tools/xkvm", "-v", dowait, '--no-reboot'] +
uefi_flags + netdevs +
cls.mpath_diskargs(target_disks + extra_disks + nvme_disks) +
["--disk=file=%s,if=virtio,media=cdrom" % cls.td.seed_disk] +
@@ -2111,6 +2131,7 @@ def check_install_log(install_log, nrchars=200):
# regexps expected in curtin output
install_pass = INSTALL_PASS_MSG
install_fail = "({})".format("|".join([
+ 'INFO:.* blocked for more than.*seconds.',
'Installation failed',
'ImportError: No module named.*',
'Out of memory:',
diff --git a/tools/launch b/tools/launch
index db18c80..b49dd76 100755
--- a/tools/launch
+++ b/tools/launch
@@ -50,6 +50,7 @@ Usage: ${0##*/} [ options ] curtin install [args]
--serial-log F : log to F (default 'serial.log')
--root-arg X pass 'X' through as the root= param when booting a
kernel. default: $DEFAULT_ROOT_PARAM
+ --no-reboot Pass '-no-reboot' through to QEMU
-v | --verbose be more verbose
--no-install-deps do not install insert '--install-deps'
on curtin command invocations
@@ -408,7 +409,7 @@ get_img_fmt() {
main() {
local short_opts="a:A:d:h:i:k:n:p:s:v"
- long_opts="add:,append:,arch:,bios:,boot-image:,disk:,dowait,help,initrd:,kernel:,mem:,netdev:,no-dowait,no-proxy-config,power:,publish:,root-arg:,silent,serial-log:,smp:,uefi-nvram:,verbose,vnc:"
+ long_opts="add:,append:,arch:,bios:,boot-image:,disk:,dowait,help,initrd:,kernel:,mem:,netdev:,no-dowait,no-proxy-config,no-reboot,power:,publish:,root-arg:,silent,serial-log:,smp:,uefi-nvram:,verbose,vnc:"
local getopt_out=""
getopt_out=$(getopt --name "${0##*/}" \
--options "${short_opts}" --long "${long_opts}" -- "$@") &&
@@ -461,6 +462,7 @@ main() {
--no-dowait) pt[${#pt[@]}]="$cur"; dowait=false;;
--no-install-deps) install_deps="";;
--no-proxy-config) proxy_config=false;;
+ --no-reboot) pt[${#pt[@]}]="--no-reboot";;
--power)
case "$next" in
off) pstate="poweroff";;
diff --git a/tools/xkvm b/tools/xkvm
index 4bb4343..02b9f62 100755
--- a/tools/xkvm
+++ b/tools/xkvm
@@ -339,7 +339,7 @@ get_bios_opts() {
main() {
local short_opts="hd:n:v"
- local long_opts="bios:,help,dowait,disk:,dry-run,kvm:,no-dowait,netdev:,uefi,uefi-nvram:,verbose"
+ local long_opts="bios:,help,dowait,disk:,dry-run,kvm:,no-dowait,no-reboot,netdev:,uefi,uefi-nvram:,verbose"
local getopt_out=""
getopt_out=$(getopt --name "${0##*/}" \
--options "${short_opts}" --long "${long_opts}" -- "$@") &&
@@ -371,6 +371,7 @@ main() {
# We default to dowait=false if input and output are a terminal
local dowait=""
[ -t 0 -a -t 1 ] && dowait=false || dowait=true
+ local noreboot=false
while [ $# -ne 0 ]; do
cur=${1}; next=${2};
case "$cur" in
@@ -384,6 +385,7 @@ main() {
-v|--verbose) VERBOSITY=$((${VERBOSITY}+1));;
--dowait) dowait=true;;
--no-dowait) dowait=false;;
+ --no-reboot) noreboot=true;;
--bios) bios="$next"; shift;;
--uefi) uefi=true;;
--uefi-nvram) uefi=true; uefi_nvram="$next"; shift;;
@@ -683,6 +685,10 @@ main() {
local rng_devices
rng_devices=( -object "rng-random,filename=/dev/urandom,id=objrng0"
-device "$virtio_rng_device,rng=objrng0,id=rng0" )
+ local reboot_arg
+ if $noreboot; then
+ kvmcmd=( "${kvmcmd[@]}" -no-reboot )
+ fi
cmd=( "${kvmcmd[@]}" "${archopts[@]}"
"${bios_opts[@]}"
"${bus_devices[@]}"
Follow ups
- [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Server Team CI bot, 2020-05-21
- [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Paride Legovini, 2020-05-21
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Server Team CI bot, 2020-05-21
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Paride Legovini, 2020-05-20
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Ryan Harper, 2020-05-19
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Paride Legovini, 2020-05-19
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Server Team CI bot, 2020-05-18
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Paride Legovini, 2020-05-15
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Server Team CI bot, 2020-05-14
- Re: [Merge] ~raharper/curtin:vmtest/enable-kernel-crashdump into curtin:master
  From: Server Team CI bot, 2020-05-12