← Back to team overview

canonical-ubuntu-qa team mailing list archive

[Bug 2076213] Re: eventinj ubuntu_kvm_unit_test bionic:oracle-5.4 flaky "vec 32/int $33"

 

** Description changed:

  Found with bionic:linux-oracle-5.4 (5.4.0-1129.138~18.04.1) on
  Oracle:VM.Standard.E3.Flex-4c.8m for 2024.07.08 cycle.
  
  eventinj test failed (see the following log) due to interrupt handler
  for a NMI (int $33) not running before interrupt handler for a maskable
  interrupt (vec 32).
  
  In test, NMI is send immediately after enabling maskable interrupts with
  the following sequence asm volatile ("sti; int $33"); in line 313. After
  "sti" instruction (which enables interrupts) there is a delay for next
  instruction to execute. However, it is also stated in Intel® 64 and
  IA-32 Architectures Software Developer’s Manual (
  https://software.intel.com/en-us/download/intel-64-and-
  ia-32-architectures-sdm-combined-
  volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4):
  
-     "The IF flag and the STI and CLI instructions do not prohibit the 
-      generation of exceptions and nonmaskable interrupts (NMIs). 
-      However, NMIs (and system-management interrupts) may be inhibited 
-      on the instruction boundary following an execution of STI that 
-      begins with IF = 0."
+     "The IF flag and the STI and CLI instructions do not prohibit the
+      generation of exceptions and nonmaskable interrupts (NMIs).
+      However, NMIs (and system-management interrupts) may be inhibited
+      on the instruction boundary following an execution of STI that
+      begins with IF = 0."
  
  This test passed after 4 runs, first 3 runs failed in the same way (see
  logs below). This could be that test is either flaky (more likely) or we
  have a race condition (less likely, can be caused by lack of
  memory/instruction barriers). This test is hinted for focal:linux-
  oracle-5.15 (5.15.0-1063) for the same reason which leans toward test
  being flaky. I just wanted to record the issue to keep an eye on this, I
  am not going to permanently hint this test for now.
+ 
+ x86 Instruction Desc:
+  - CLI clears the IF flag in the EFLAGS register and no other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instruction have no effect on the generation of exceptions and NMI interrupts.
+  - STI sets the interrupt flag (IF) in the EFLAGS register. This allows the processor to respond to maskable hardware interrupts.
+  - INT n instruction generates a call to the interrupt or exception handler specified with the destination operand
  
  
  13:51:20 INFO |     START    ubuntu_kvm_unit_tests.eventinj    ubuntu_kvm_unit_tests.eventinj    timestamp=1722693080    timeout=1800    localtime=Aug 03 13:51:20
  13:51:20 DEBUG| Persistent state client._record_indent now set to 2
  13:51:20 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_kvm_unit_tests.eventinj', 'ubuntu_kvm_unit_tests.eventinj')
  13:51:20 DEBUG| Waiting for pid 15374 for 1800 seconds
  13:51:20 DEBUG| Running 'kvm-ok'
  13:51:20 DEBUG| [stdout] INFO: /dev/kvm exists
  13:51:20 DEBUG| [stdout] KVM acceleration can be used
  13:51:20 DEBUG| Running '/home/ubuntu/autotest/client/tmp/ubuntu_kvm_unit_tests/src/kvm-unit-tests/tests/eventinj'
  13:51:20 DEBUG| [stdout] BUILD_HEAD=b04954c9
  13:51:21 DEBUG| [stdout] timeout -k 1s --foreground 90s /usr/bin/qemu-system-x86_64 --no-reboot -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel /tmp/tmp.dOXYNu6N1b -smp 1 # -initrd /tmp/tmp.U6zbIRL4sr
  13:51:21 DEBUG| [stderr] qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  13:51:22 DEBUG| [stdout] enabling apic
  13:51:22 DEBUG| [stdout] smp: waiting for 0 APs
  13:51:22 DEBUG| [stdout] paging enabled
  13:51:22 DEBUG| [stdout] cr0 = 80010011
  13:51:22 DEBUG| [stdout] cr3 = 1007000
  13:51:22 DEBUG| [stdout] cr4 = 20
  13:51:22 DEBUG| [stdout] Try to divide by 0
  13:51:22 DEBUG| [stdout] DE isr running divider is 0
  13:51:22 DEBUG| [stdout] Result is 150
  13:51:22 DEBUG| [stdout] PASS: DE exception
  13:51:22 DEBUG| [stdout] Try int 3
  13:51:22 DEBUG| [stdout] BP isr running
  13:51:22 DEBUG| [stdout] After int 3
  13:51:22 DEBUG| [stdout] PASS: BP exception
  13:51:22 DEBUG| [stdout] Sending vec 33 to self
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After vec 33 to self
  13:51:22 DEBUG| [stdout] PASS: vec 33
  13:51:22 DEBUG| [stdout] Try int $33
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int $33
  13:51:22 DEBUG| [stdout] PASS: int $33
  13:51:22 DEBUG| [stdout] Sending vec 32 and 33 to self
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] irq0 running
  13:51:22 DEBUG| [stdout] After vec 32 and 33 to self
  13:51:22 DEBUG| [stdout] PASS: vec 32/33
  13:51:22 DEBUG| [stdout] Sending vec 32 and int $33
  13:51:22 DEBUG| [stdout] irq0 running
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After vec 32 and int $33
  13:51:22 DEBUG| [stdout] FAIL: vec 32/int $33
  13:51:22 DEBUG| [stdout] Sending vec 33 and 62 and mask one with TPR
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After 33/62 TPR test
  13:51:22 DEBUG| [stdout] PASS: TPR
  13:51:22 DEBUG| [stdout] irq0 running
  13:51:22 DEBUG| [stdout] Before NP test
  13:51:22 DEBUG| [stdout] NP isr running 4008d9 err=18
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int33
  13:51:22 DEBUG| [stdout] PASS: NP exception
  13:51:22 DEBUG| [stdout] Sending NMI to self
  13:51:22 DEBUG| [stdout] NMI isr running 0x406765
  13:51:22 DEBUG| [stdout] Sending nested NMI to self
  13:51:22 DEBUG| [stdout] After nested NMI to self
  13:51:22 DEBUG| [stdout] Nested NMI isr running rip=405da3
  13:51:22 DEBUG| [stdout] After NMI to self
  13:51:22 DEBUG| [stdout] PASS: NMI
  13:51:22 DEBUG| [stdout] Before NMI IRET test
  13:51:22 DEBUG| [stdout] Sending NMI to self
  13:51:22 DEBUG| [stdout] NMI isr running stack 0x1013000
  13:51:22 DEBUG| [stdout] Sending nested NMI to self
  13:51:22 DEBUG| [stdout] After nested NMI to self
  13:51:22 DEBUG| [stdout] Nested NMI isr running rip=400213
  13:51:22 DEBUG| [stdout] After iret
  13:51:22 DEBUG| [stdout] After NMI to self
  13:51:22 DEBUG| [stdout] PASS: NMI
  13:51:22 DEBUG| [stdout] Try to divide by 0
  13:51:22 DEBUG| [stdout] PF running
  13:51:22 DEBUG| [stdout] DE isr running divider is 0
  13:51:22 DEBUG| [stdout] Result is 150
  13:51:22 DEBUG| [stdout] PASS: DE PF exceptions
  13:51:22 DEBUG| [stdout] Before NP test
  13:51:22 DEBUG| [stdout] PF running
  13:51:22 DEBUG| [stdout] NP isr running 400b85 err=18
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int33
  13:51:22 DEBUG| [stdout] PASS: NP PF exceptions
  13:51:22 DEBUG| [stdout] Try int 33 with shadowed stack
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int 33 with shadowed stack
  13:51:22 DEBUG| [stdout] PASS: int 33 with shadowed stack
  13:51:22 DEBUG| [stdout] SUMMARY: 13 tests, 1 unexpected failures
  13:51:22 DEBUG| [stdout] FAIL eventinj (13 tests, 1 unexpected failures)
  13:51:22 ERROR| Exception escaping from test:
  Traceback (most recent call last):
  File "/home/ubuntu/autotest/client/shared/test.py", line 411, in _exec
  _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 823, in _call_test_function
  return func(*args, **dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 291, in execute
  postprocess_profiled_run, args, dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 212, in _call_run_once
  self.run_once(*args, **dargs)
  File "/home/ubuntu/autotest/client/tests/ubuntu_kvm_unit_tests/ubuntu_kvm_unit_tests.py", line 101, in run_once
  raise error.TestError("Test failed for {}".format(test_name))
  TestError: Test failed for eventinj
  13:51:22 ERROR| child process failed
  13:51:22 DEBUG| Traceback (most recent call last):
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/parallel.py", line 25, in fork_start
  13:51:22 DEBUG|     l()
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/job.py", line 505, in
  13:51:22 DEBUG|     l = lambda: test.runtest(self, url, tag, args, dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/test.py", line 125, in runtest
  13:51:22 DEBUG|     job.sysinfo.log_after_each_iteration)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 913, in runtest
  13:51:22 DEBUG|     mytest._exec(args, dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 411, in _exec
  13:51:22 DEBUG|     _call_test_function(self.execute, *p_args, **p_dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 823, in _call_test_function
  13:51:22 DEBUG|     return func(*args, **dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 291, in execute
  13:51:22 DEBUG|     postprocess_profiled_run, args, dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 212, in _call_run_once
  13:51:22 DEBUG|     self.run_once(*args, **dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/tests/ubuntu_kvm_unit_tests/ubuntu_kvm_unit_tests.py", line 101, in run_once
  13:51:22 DEBUG|     raise error.TestError("Test failed for {}".format(test_name))
  13:51:22 DEBUG| TestError: Test failed for eventinj
  13:51:22 INFO |         ERROR    ubuntu_kvm_unit_tests.eventinj    ubuntu_kvm_unit_tests.eventinj    timestamp=1722693082    localtime=Aug 03 13:51:22    Test failed for eventinj
  13:51:22 INFO |     END ERROR    ubuntu_kvm_unit_tests

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2076213

Title:
  eventinj ubuntu_kvm_unit_test bionic:oracle-5.4 flaky "vec 32/int $33"

Status in ubuntu-kernel-tests:
  New

Bug description:
  Found with bionic:linux-oracle-5.4 (5.4.0-1129.138~18.04.1) on
  Oracle:VM.Standard.E3.Flex-4c.8m for 2024.07.08 cycle.

  eventinj test failed (see the following log) due to interrupt handler
  for a NMI (int $33) not running before interrupt handler for a
  maskable interrupt (vec 32).

  In test, NMI is send immediately after enabling maskable interrupts
  with the following sequence asm volatile ("sti; int $33"); in line
  313. After "sti" instruction (which enables interrupts) there is a
  delay for next instruction to execute. However, it is also stated in
  Intel® 64 and IA-32 Architectures Software Developer’s Manual (
  https://software.intel.com/en-us/download/intel-64-and-
  ia-32-architectures-sdm-combined-
  volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4):

      "The IF flag and the STI and CLI instructions do not prohibit the
       generation of exceptions and nonmaskable interrupts (NMIs).
       However, NMIs (and system-management interrupts) may be inhibited
       on the instruction boundary following an execution of STI that
       begins with IF = 0."

  This test passed after 4 runs, first 3 runs failed in the same way
  (see logs below). This could be that test is either flaky (more
  likely) or we have a race condition (less likely, can be caused by
  lack of memory/instruction barriers). This test is hinted for
  focal:linux-oracle-5.15 (5.15.0-1063) for the same reason which leans
  toward test being flaky. I just wanted to record the issue to keep an
  eye on this, I am not going to permanently hint this test for now.

  x86 Instruction Desc:
   - CLI clears the IF flag in the EFLAGS register and no other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instruction have no effect on the generation of exceptions and NMI interrupts.
   - STI sets the interrupt flag (IF) in the EFLAGS register. This allows the processor to respond to maskable hardware interrupts.
   - INT n instruction generates a call to the interrupt or exception handler specified with the destination operand


  13:51:20 INFO |     START    ubuntu_kvm_unit_tests.eventinj    ubuntu_kvm_unit_tests.eventinj    timestamp=1722693080    timeout=1800    localtime=Aug 03 13:51:20
  13:51:20 DEBUG| Persistent state client._record_indent now set to 2
  13:51:20 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_kvm_unit_tests.eventinj', 'ubuntu_kvm_unit_tests.eventinj')
  13:51:20 DEBUG| Waiting for pid 15374 for 1800 seconds
  13:51:20 DEBUG| Running 'kvm-ok'
  13:51:20 DEBUG| [stdout] INFO: /dev/kvm exists
  13:51:20 DEBUG| [stdout] KVM acceleration can be used
  13:51:20 DEBUG| Running '/home/ubuntu/autotest/client/tmp/ubuntu_kvm_unit_tests/src/kvm-unit-tests/tests/eventinj'
  13:51:20 DEBUG| [stdout] BUILD_HEAD=b04954c9
  13:51:21 DEBUG| [stdout] timeout -k 1s --foreground 90s /usr/bin/qemu-system-x86_64 --no-reboot -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel /tmp/tmp.dOXYNu6N1b -smp 1 # -initrd /tmp/tmp.U6zbIRL4sr
  13:51:21 DEBUG| [stderr] qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  13:51:22 DEBUG| [stdout] enabling apic
  13:51:22 DEBUG| [stdout] smp: waiting for 0 APs
  13:51:22 DEBUG| [stdout] paging enabled
  13:51:22 DEBUG| [stdout] cr0 = 80010011
  13:51:22 DEBUG| [stdout] cr3 = 1007000
  13:51:22 DEBUG| [stdout] cr4 = 20
  13:51:22 DEBUG| [stdout] Try to divide by 0
  13:51:22 DEBUG| [stdout] DE isr running divider is 0
  13:51:22 DEBUG| [stdout] Result is 150
  13:51:22 DEBUG| [stdout] PASS: DE exception
  13:51:22 DEBUG| [stdout] Try int 3
  13:51:22 DEBUG| [stdout] BP isr running
  13:51:22 DEBUG| [stdout] After int 3
  13:51:22 DEBUG| [stdout] PASS: BP exception
  13:51:22 DEBUG| [stdout] Sending vec 33 to self
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After vec 33 to self
  13:51:22 DEBUG| [stdout] PASS: vec 33
  13:51:22 DEBUG| [stdout] Try int $33
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int $33
  13:51:22 DEBUG| [stdout] PASS: int $33
  13:51:22 DEBUG| [stdout] Sending vec 32 and 33 to self
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] irq0 running
  13:51:22 DEBUG| [stdout] After vec 32 and 33 to self
  13:51:22 DEBUG| [stdout] PASS: vec 32/33
  13:51:22 DEBUG| [stdout] Sending vec 32 and int $33
  13:51:22 DEBUG| [stdout] irq0 running
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After vec 32 and int $33
  13:51:22 DEBUG| [stdout] FAIL: vec 32/int $33
  13:51:22 DEBUG| [stdout] Sending vec 33 and 62 and mask one with TPR
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After 33/62 TPR test
  13:51:22 DEBUG| [stdout] PASS: TPR
  13:51:22 DEBUG| [stdout] irq0 running
  13:51:22 DEBUG| [stdout] Before NP test
  13:51:22 DEBUG| [stdout] NP isr running 4008d9 err=18
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int33
  13:51:22 DEBUG| [stdout] PASS: NP exception
  13:51:22 DEBUG| [stdout] Sending NMI to self
  13:51:22 DEBUG| [stdout] NMI isr running 0x406765
  13:51:22 DEBUG| [stdout] Sending nested NMI to self
  13:51:22 DEBUG| [stdout] After nested NMI to self
  13:51:22 DEBUG| [stdout] Nested NMI isr running rip=405da3
  13:51:22 DEBUG| [stdout] After NMI to self
  13:51:22 DEBUG| [stdout] PASS: NMI
  13:51:22 DEBUG| [stdout] Before NMI IRET test
  13:51:22 DEBUG| [stdout] Sending NMI to self
  13:51:22 DEBUG| [stdout] NMI isr running stack 0x1013000
  13:51:22 DEBUG| [stdout] Sending nested NMI to self
  13:51:22 DEBUG| [stdout] After nested NMI to self
  13:51:22 DEBUG| [stdout] Nested NMI isr running rip=400213
  13:51:22 DEBUG| [stdout] After iret
  13:51:22 DEBUG| [stdout] After NMI to self
  13:51:22 DEBUG| [stdout] PASS: NMI
  13:51:22 DEBUG| [stdout] Try to divide by 0
  13:51:22 DEBUG| [stdout] PF running
  13:51:22 DEBUG| [stdout] DE isr running divider is 0
  13:51:22 DEBUG| [stdout] Result is 150
  13:51:22 DEBUG| [stdout] PASS: DE PF exceptions
  13:51:22 DEBUG| [stdout] Before NP test
  13:51:22 DEBUG| [stdout] PF running
  13:51:22 DEBUG| [stdout] NP isr running 400b85 err=18
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int33
  13:51:22 DEBUG| [stdout] PASS: NP PF exceptions
  13:51:22 DEBUG| [stdout] Try int 33 with shadowed stack
  13:51:22 DEBUG| [stdout] irq1 running
  13:51:22 DEBUG| [stdout] After int 33 with shadowed stack
  13:51:22 DEBUG| [stdout] PASS: int 33 with shadowed stack
  13:51:22 DEBUG| [stdout] SUMMARY: 13 tests, 1 unexpected failures
  13:51:22 DEBUG| [stdout] FAIL eventinj (13 tests, 1 unexpected failures)
  13:51:22 ERROR| Exception escaping from test:
  Traceback (most recent call last):
  File "/home/ubuntu/autotest/client/shared/test.py", line 411, in _exec
  _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 823, in _call_test_function
  return func(*args, **dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 291, in execute
  postprocess_profiled_run, args, dargs)
  File "/home/ubuntu/autotest/client/shared/test.py", line 212, in _call_run_once
  self.run_once(*args, **dargs)
  File "/home/ubuntu/autotest/client/tests/ubuntu_kvm_unit_tests/ubuntu_kvm_unit_tests.py", line 101, in run_once
  raise error.TestError("Test failed for {}".format(test_name))
  TestError: Test failed for eventinj
  13:51:22 ERROR| child process failed
  13:51:22 DEBUG| Traceback (most recent call last):
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/parallel.py", line 25, in fork_start
  13:51:22 DEBUG|     l()
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/job.py", line 505, in
  13:51:22 DEBUG|     l = lambda: test.runtest(self, url, tag, args, dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/test.py", line 125, in runtest
  13:51:22 DEBUG|     job.sysinfo.log_after_each_iteration)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 913, in runtest
  13:51:22 DEBUG|     mytest._exec(args, dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 411, in _exec
  13:51:22 DEBUG|     _call_test_function(self.execute, *p_args, **p_dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 823, in _call_test_function
  13:51:22 DEBUG|     return func(*args, **dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 291, in execute
  13:51:22 DEBUG|     postprocess_profiled_run, args, dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/shared/test.py", line 212, in _call_run_once
  13:51:22 DEBUG|     self.run_once(*args, **dargs)
  13:51:22 DEBUG|   File "/home/ubuntu/autotest/client/tests/ubuntu_kvm_unit_tests/ubuntu_kvm_unit_tests.py", line 101, in run_once
  13:51:22 DEBUG|     raise error.TestError("Test failed for {}".format(test_name))
  13:51:22 DEBUG| TestError: Test failed for eventinj
  13:51:22 INFO |         ERROR    ubuntu_kvm_unit_tests.eventinj    ubuntu_kvm_unit_tests.eventinj    timestamp=1722693082    localtime=Aug 03 13:51:22    Test failed for eventinj
  13:51:22 INFO |     END ERROR    ubuntu_kvm_unit_tests

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2076213/+subscriptions



References