
canonical-ubuntu-qa team mailing list archive

[Bug 2042388] Re: context test in ubuntu_stress_smoke_test failed with M-6.5 riscv / starfive instances

 

The stress-ng maintainer, Colin, has investigated this. Allow me to copy
his comment:

Seems like a signal handler occurring in a swapped context is causing a
SIGSEGV when we're using an alternative stack (via sigaltstack).
Disabling the alternative stack allows the test to run successfully.
I've experimented with also using ss_flags=SS_AUTODISARM when setting up
the alternative stack and this also breaks with a SIGSEGV.

It seems there may be historic issues with Linux and alternative stacks
when executing in a swapped context, as the sigaltstack man page states:

       ss.ss_flags
              This field contains either 0, or the following flag:

              SS_AUTODISARM (since Linux 4.7)
                     Clear the alternate signal stack settings on entry to the signal handler.
                     When the signal handler returns, the previous alternate signal stack
                     settings are restored.

                     This flag was added in order to make it safe to switch away from the
                     signal handler with swapcontext(3). Without this flag, a subsequently
                     handled signal will corrupt the state of the switched-away signal
                     handler. On kernels where this flag is not supported, sigaltstack()
                     fails with the error EINVAL when this flag is supplied.

Since this is a regression in behaviour, I think somebody with riscv libc
know-how should investigate this further. Meanwhile, I'll push a change
that disables the use of the alternative stack for the context switch
stressor.

https://github.com/ColinIanKing/stress-ng/issues/331
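
For readers unfamiliar with the combination described above, here is a
minimal sketch of it: a signal handler registered with SA_ONSTACK is
invoked while the process is executing inside a context created with
makecontext() and entered with swapcontext(). This is not the stress-ng
code. The names altstack_context_demo, worker and on_sigusr1, the use of
SIGUSR1 and the 64 KiB stack sizes are all assumptions made for
illustration. The ss_flags argument is passed straight to sigaltstack(),
so 0 gives the plain alternative stack case.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024)  /* illustrative fixed size */
#define CTX_STACK_SIZE (64 * 1024)  /* illustrative fixed size */

static stack_t alt_stack;
static ucontext_t main_ctx, worker_ctx;
static volatile sig_atomic_t handler_on_alt_stack;

/* Record whether the handler's frame lies inside the alternate stack. */
static void on_sigusr1(int signo)
{
    int marker;
    uintptr_t sp = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack.ss_sp;

    (void)signo;
    handler_on_alt_stack = (sp >= lo && sp < lo + alt_stack.ss_size);
}

/* Runs on its own makecontext() stack and takes a signal there. */
static void worker(void)
{
    raise(SIGUSR1);
    /* Returning resumes main_ctx through uc_link. */
}

/* Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on error. */
int altstack_context_demo(int ss_flags)
{
    struct sigaction sa;
    stack_t off;
    void *ctx_stack;

    alt_stack.ss_sp = malloc(ALT_STACK_SIZE);
    ctx_stack = malloc(CTX_STACK_SIZE);
    if (!alt_stack.ss_sp || !ctx_stack)
        return -1;
    alt_stack.ss_size = ALT_STACK_SIZE;
    alt_stack.ss_flags = ss_flags;
    if (sigaltstack(&alt_stack, NULL) < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sa.sa_flags = SA_ONSTACK;       /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    if (getcontext(&worker_ctx) < 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = ctx_stack;
    worker_ctx.uc_stack.ss_size = CTX_STACK_SIZE;
    worker_ctx.uc_link = &main_ctx; /* where to go when worker() returns */
    makecontext(&worker_ctx, worker, 0);

    handler_on_alt_stack = 0;
    if (swapcontext(&main_ctx, &worker_ctx) < 0)
        return -1;

    /* Disable the alternate stack before freeing its memory. */
    memset(&off, 0, sizeof(off));
    off.ss_flags = SS_DISABLE;
    sigaltstack(&off, NULL);
    free(alt_stack.ss_sp);
    free(ctx_stack);
    return handler_on_alt_stack;
}
```

Calling altstack_context_demo(0) from a small main() and printing the
result is enough to try it. On a system where this combination works,
the call should return 1. A crash inside the call would be consistent
with the SIGSEGV described in the comment, although this sketch has not
been checked against the affected riscv kernels.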

** Also affects: glibc (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: glibc (Ubuntu Mantic)
   Importance: Undecided
       Status: New

** Bug watch added: github.com/ColinIanKing/stress-ng/issues #331
   https://github.com/ColinIanKing/stress-ng/issues/331

** Changed in: glibc (Ubuntu)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2042388

Title:
  context test in ubuntu_stress_smoke_test failed with M-6.5 riscv /
  starfive instances

Status in ubuntu-kernel-tests:
  New
Status in glibc package in Ubuntu:
  Invalid
Status in glibc source package in Mantic:
  New

Bug description:
  This issue has been present since the first versions of these two kernels:
  * mantic/linux-starfive/6.5.0-1001.2
  * mantic/linux-riscv/6.5.0-7.7.2

  Test failed with:
   context STARTING
   context RETURNED 2
   context FAILED
   stress-ng: debug: [12644] invoked with './stress-ng -v -t 5 --context 4 --context-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
   stress-ng: debug: [12644] stress-ng 0.16.05 gaea6f3306f46
   stress-ng: debug: [12644] system: Linux mantic-starfive-riscv64 6.5.0-1001-starfive #2-Ubuntu SMP Fri Oct  6 12:08:59 UTC 2023 riscv64, gcc 13.2.0, glibc 2.38
   stress-ng: debug: [12644] RAM total: 7.7G, RAM free: 6.6G, swap free: 1024.0M
   stress-ng: debug: [12644] temporary file path: '/home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng', filesystem type: ext2 (4617314 blocks available)
   stress-ng: debug: [12644] 8 processors online, 8 processors configured
   stress-ng: info:  [12644] setting to a 5 secs run per stressor 
   stress-ng: debug: [12644] cache allocate: using defaults, cannot determine cache level details
   stress-ng: debug: [12644] cache allocate: shared cache buffer size: 2048K
   stress-ng: info:  [12644] dispatching hogs: 4 context
   stress-ng: debug: [12644] starting stressors
   stress-ng: debug: [12644] 4 stressors started
   stress-ng: debug: [12645] context: [12645] started (instance 0 on CPU 2)
   stress-ng: debug: [12647] context: [12647] started (instance 2 on CPU 5)
   stress-ng: debug: [12648] context: [12648] started (instance 3 on CPU 7)
   stress-ng: debug: [12646] context: [12646] started (instance 1 on CPU 4)
   stress-ng: debug: [12644] context: [12645] terminated on signal: 11 (Segmentation fault)
   stress-ng: debug: [12644] context: [12645] terminated (success)
   stress-ng: debug: [12644] context: [12646] terminated on signal: 11 (Segmentation fault)
   stress-ng: debug: [12644] context: [12646] terminated (success)
   stress-ng: debug: [12644] context: [12647] terminated on signal: 11 (Segmentation fault)
   stress-ng: debug: [12644] context: [12647] terminated (success)
   stress-ng: debug: [12644] context: [12648] terminated on signal: 11 (Segmentation fault)
   stress-ng: debug: [12644] context: [12648] terminated (success)
   stress-ng: warn:  [12644] metrics-check: all bogo-op counters are zero, data may be incorrect
   stress-ng: debug: [12644] metrics-check: all stressor metrics validated and sane 
   stress-ng: info:  [12644] skipped: 0
   stress-ng: info:  [12644] passed: 4: context (4)
   stress-ng: info:  [12644] failed: 0
   stress-ng: info:  [12644] metrics untrustworthy: 0 
   stress-ng: info:  [12644] unsuccessful run completed in 9.98 secs

  The summary reports the tests as passed, but the run is marked as failed
  with a non-zero return code.

  Tested with stress-ng V0.16.05 and V0.17.00; both failed with the same
  issue.
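
The log above shows every context instance "terminated on signal: 11"
even though the summary counts them as passed. As a rough illustration
of how a parent process can tell a child killed by a signal apart from
one that exited normally, here is a minimal sketch built on fork() and
waitpid(). It is not taken from stress-ng or from the test harness, and
the function name child_termination_signal is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that raises `signo` with the default action and return
 * the terminating signal as seen by the parent, 0 if the child exited
 * normally, or -1 on error.
 */
int child_termination_signal(int signo)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };

        setrlimit(RLIMIT_CORE, &no_core); /* avoid leaving a core file */
        signal(signo, SIG_DFL);           /* make sure the default action applies */
        raise(signo);
        _exit(0);                         /* reached only if the signal did not kill us */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With SIGSEGV as the argument, the parent sees signal 11, the same number
that appears in the "terminated on signal" lines of the log.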

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2042388/+subscriptions


