← Back to team overview

canonical-ubuntu-qa team mailing list archive

[Bug 1979179] Re: cpuhotplug04 in cpuhotplug from ubuntu_ltp cause test hang on J-5.15 P9

 

Yes, I have this tested with Noble 6.8.0-41-generic on node baltar. With
latest LTP commit 2b319e3dc upstream. And this is the output from dmesg:

[  380.370280] LTP: starting cpuhotplug04 (cpuhotplug04.sh -l 1)
[  533.403524602,3] PHB#0003[0:3]: brdgCtl        = 00000002
[  533.403590087,3] PHB#0003[0:3]:             deviceStatus = 00060040
[  533.403632812,3] PHB#0003[0:3]:               slotStatus = 00402000
[  533.403688954,3] PHB#0003[0:3]:               linkStatus = a0830008
[  533.403737010,3] PHB#0003[0:3]:             devCmdStatus = 00100107
[  533.403787651,3] PHB#0003[0:3]:             devSecStatus = 00000800
[  533.403827581,3] PHB#0003[0:3]:          rootErrorStatus = 00000000
[  533.403872922,3] PHB#0003[0:3]:          corrErrorStatus = 00000000
[  533.403915403,3] PHB#0003[0:3]:        uncorrErrorStatus = 00000000
[  533.403966243,3] PHB#0003[0:3]:                   devctl = 00000040
[  533.404006378,3] PHB#0003[0:3]:                  devStat = 00000006
[  533.404051878,3] PHB#0003[0:3]:                  tlpHdr1 = 00000000
[  533.404094369,3] PHB#0003[0:3]:                  tlpHdr2 = 00000000
[  533.404142376,3] PHB#0003[0:3]:                  tlpHdr3 = 00000000
[  533.404184956,3] PHB#0003[0:3]:                  tlpHdr4 = 00000000
[  533.404232785,3] PHB#0003[0:3]:                 sourceId = 00000000
[  533.404275326,3] PHB#0003[0:3]:                     nFir = 0000000000000000
[  533.404322906,3] PHB#0003[0:3]:                 nFirMask = 0030001c00000000
[  533.404370518,3] PHB#0003[0:3]:                  nFirWOF = 0000000000000000
[  533.404418055,3] PHB#0003[0:3]:                 phbPlssr = 0000001c00000000
[  533.404465799,3] PHB#0003[0:3]:                   phbCsr = 0000001c00000000
[  533.404513412,3] PHB#0003[0:3]:                   lemFir = 0000000100000080
[  533.404563730,3] PHB#0003[0:3]:             lemErrorMask = 0000000000000000
[  533.404613908,3] PHB#0003[0:3]:                   lemWOF = 0000000000000080
[  533.404664164,3] PHB#0003[0:3]:           phbErrorStatus = 0000028000000000
[  533.404711700,3] PHB#0003[0:3]:      phbFirstErrorStatus = 0000020000000000
[  533.404759256,3] PHB#0003[0:3]:             phbErrorLog0 = 2148000098000240
[  533.404806934,3] PHB#0003[0:3]:             phbErrorLog1 = a008400000000000
[  533.404854445,3] PHB#0003[0:3]:        phbTxeErrorStatus = 0000000000000000
[  533.404902144,3] PHB#0003[0:3]:   phbTxeFirstErrorStatus = 0000000000000000
[  533.404952569,3] PHB#0003[0:3]:          phbTxeErrorLog0 = 0000000000000000
[  533.405236497,3] PHB#0003[0:3]:          phbTxeErrorLog1 = 0000000000000000
[  533.405284285,3] PHB#0003[0:3]:     phbRxeArbErrorStatus = 0000000000000000
[  533.405334522,3] PHB#0003[0:3]: phbRxeArbFrstErrorStatus = 0000000000000000
[  533.405707010,3] PHB#0003[0:3]:       phbRxeArbErrorLog0 = 0000000000000000
[  533.405757180,3] PHB#0003[0:3]:       phbRxeArbErrorLog1 = 0000000000000000
[  533.405804808,3] PHB#0003[0:3]:     phbRxeMrgErrorStatus = 0000000000000000
[  533.406169212,3] PHB#0003[0:3]: phbRxeMrgFrstErrorStatus = 0000000000000000
[  533.406219384,3] PHB#0003[0:3]:       phbRxeMrgErrorLog0 = 0000000000000000
[  533.406269541,3] PHB#0003[0:3]:       phbRxeMrgErrorLog1 = 0000000000000000
[  533.406628562,3] PHB#0003[0:3]:     phbRxeTceErrorStatus = 2000000000000000
[  533.406678961,3] PHB#0003[0:3]: phbRxeTceFrstErrorStatus = 2000000000000000
[  533.406726509,3] PHB#0003[0:3]:       phbRxeTceErrorLog0 = 40000000000001fd
[  533.407091100,3] PHB#0003[0:3]:       phbRxeTceErrorLog1 = 0000000000000000
[  533.407138810,3] PHB#0003[0:3]:        phbPblErrorStatus = 0000000000020000
[  533.407191803,3] PHB#0003[0:3]:   phbPblFirstErrorStatus = 0000000000020000
[  533.407550885,3] PHB#0003[0:3]:          phbPblErrorLog0 = 0000000000000000
[  533.407611773,3] PHB#0003[0:3]:          phbPblErrorLog1 = 0000000000000000
[  533.407664673,3] PHB#0003[0:3]:      phbPcieDlpErrorLog1 = 0000000000000000
[  533.407712170,3] PHB#0003[0:3]:      phbPcieDlpErrorLog2 = 0000000000000000
[  533.408071169,3] PHB#0003[0:3]:    phbPcieDlpErrorStatus = 0000000000000000
[  533.408121194,3] PHB#0003[0:3]:       phbRegbErrorStatus = 0000004000000000
[  533.408168914,3] PHB#0003[0:3]:  phbRegbFirstErrorStatus = 0000004000000000
[  533.411696407,3] PHB#0003[0:3]:         phbRegbErrorLog0 = 8800000c00000000
[  533.411747514,3] PHB#0003[0:3]:         phbRegbErrorLog1 = 0000000007011000
[  533.411797140,3] PHB#0003[0:3]:                PEST[509] = 8000b03800000000 8000000000000000
[  381.368130] EEH: Recovering PHB#3-PE#1fd
[  381.368175] EEH: PE location: Onboard SAS, PHB location: N/A
[  381.368207] EEH: Frozen PHB#3-PE#1fd detected
[  381.368238] EEH: Call Trace:
[  381.368261] EEH: [c0000000000581c8] __eeh_send_failure_event+0x9c/0x164
[  381.368308] EEH: [c00000000004f658] eeh_dev_check_failure+0x32c/0x6d4
[  381.368338] EEH: [c000000000d05518] __pci_write_msi_msg+0x2dc/0x384
[  381.368366] EEH: [c000000000d06ec4] pci_msi_domain_write_msg+0x44/0x6c
[  381.368402] EEH: [c000000000284568] msi_domain_set_affinity+0xfc/0x10c
[  381.368440] EEH: [c0000000002715e8] irq_do_set_affinity+0x20c/0x334
[  381.368479] EEH: [c000000000283194] migrate_one_irq+0x144/0x22c
[  381.368513] EEH: [c000000000283318] irq_migrate_all_off_this_cpu+0x9c/0x184
[  381.368549] EEH: [c0000000000e1b0c] xive_smp_disable_cpu+0x34/0xe0
[  381.368583] EEH: [c0000000000f6524] pnv_smp_cpu_disable+0x88/0x140
[  381.368619] EEH: [c0000000000685c4] __cpu_disable+0x54/0xe0
[  381.368654] EEH: [c000000000196dcc] take_cpu_down+0x4c/0x108
[  381.368691] EEH: [c000000000337694] multi_cpu_stop+0xd8/0x248
[  381.368726] EEH: [c000000000337214] cpu_stopper_thread+0x150/0x2a8
[  381.368763] EEH: [c0000000001e2610] smpboot_thread_fn+0x200/0x26c
[  381.368781] EEH: [c0000000001d90bc] kthread+0x138/0x140
[  381.368798] EEH: [c00000000000ded8] start_kernel_thread+0x14/0x18
[  381.368833] aacraid 0003:01:00.0: aacraid: PCI error detected 2
[  381.368130] EEH: Recovering PHB#3-PE#1fd
[  381.368175] EEH: PE location: Onboard SAS, PHB location: N/A
[  381.368207] EEH: Frozen PHB#3-PE#1fd detected
[  381.368238] EEH: Call Trace:
[  381.368261] EEH: [c0000000000581c8] __eeh_send_failure_event+0x9c/0x164
[  381.368308] EEH: [c00000000004f658] eeh_dev_check_failure+0x32c/0x6d4
[  381.368338] EEH: [c000000000d05518] __pci_write_msi_msg+0x2dc/0x384
[  381.368366] EEH: [c000000000d06ec4] pci_msi_domain_write_msg+0x44/0x6c
[  381.368402] EEH: [c000000000284568] msi_domain_set_affinity+0xfc/0x10c
[  381.368440] EEH: [c0000000002715e8] irq_do_set_affinity+0x20c/0x334
[  381.368479] EEH: [c000000000283194] migrate_one_irq+0x144/0x22c
[  381.368513] EEH: [c000000000283318] irq_migrate_all_off_this_cpu+0x9c/0x184
[  381.368549] EEH: [c0000000000e1b0c] xive_smp_disable_cpu+0x34/0xe0
[  381.368583] EEH: [c0000000000f6524] pnv_smp_cpu_disable+0x88/0x140
[  381.368619] EEH: [c0000000000685c4] __cpu_disable+0x54/0xe0
[  381.368654] EEH: [c000000000196dcc] take_cpu_down+0x4c/0x108
[  381.368691] EEH: [c000000000337694] multi_cpu_stop+0xd8/0x248
[  381.368726] EEH: [c000000000337214] cpu_stopper_thread+0x150/0x2a8
[  381.368763] EEH: [c0000000001e2610] smpboot_thread_fn+0x200/0x26c
[  381.368781] EEH: [c0000000001d90bc] kthread+0x138/0x140
[  381.368798] EEH: [c00000000000ded8] start_kernel_thread+0x14/0x18
[  381.368814] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[  381.368818] EEH: Notify device drivers to shutdown
[  381.368822] EEH: Beginning: 'error_detected(IO frozen)'
[  381.368826] PCI 0003:01:00.0#01fd: EEH: Invoking aacraid->error_detected(IO frozen)
[  381.368833] aacraid 0003:01:00.0: aacraid: PCI error detected 2
[  381.392800] I/O error, dev sda, sector 1652636416 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.392847] I/O error, dev sda, sector 1652636928 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.392774] sd 0:2:0:0: [sda] tag#739 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[  381.392793] sd 0:2:0:0: [sda] tag#739 CDB: Write(10) 2a 00 62 81 3b 00 00 02 00 00
[  381.392887] I/O error, dev sda, sector 1652637440 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.392800] I[  381.392926] I/O error, dev sd/O error, dev sda, sector 165263a, sector 1652636416 op 0x1:(WRI7952 op 0x1:(WRITE) flags 0x4800TE) flags 0x4800 phys_
seg 4 prio phys_seg 4 prio class 0
[  381 class 0
.392834] sd 0:2:0:0: [sda] tag#740 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[  381.392975] I/O error, dev sda, sector 1652638464 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.392842] sd 0:2:0:0: [sda] tag#740 CDB: Write(10) 2a 00 62 81 3d 00 00 02 00 00
[  381.392847] I/O error, dev sda, sector[  381.393013] I 1652636928 op 0/O error, dev sdx1:(WRITE) flagsa, sector 165263 0x4800 phys_seg8976 op 0x1:(WRI 4 prio class
TE) flags 0x4800
                [  381.392874]  phys_seg 4 priosd 0:2:0:0: [sda class 0
] tag#741 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[  381.393062] I/O error, dev sda, sector 165263[  381.392882] s9488 op 0x1:(WRId 0:2:0:0: [sda]TE) flags 0x800  tag#741 CDB: Wrphys_seg 4 prio ite(10) 2a 00 62class 
0
 81 3f 00 00 02 00 00
[  381.392887] I/O error, dev sda, sector 1652637440 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.392913] sd 0:2:0:0: [sda] tag#742 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[  381.392921] sd 0:2:0:0: [sda] tag#742 CDB: Write(10) 2a 00 62 81 41 00 00 02 00 00
[  381.392926] I/O error, dev sda, sector 1652637952 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.392962] sd 0:2:0:0: [sda] tag#743 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[  381.392970] sd 0:2:0:0: [sda] tag#743 CDB: Write(10) 2a 00 62 81 43 00 00 02 00 00
[  381.392975] I/O error, dev sda, sector 1652638464 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.393000] sd 0:2:0:0: [sda] tag#744 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[  381.393008] sd 0:2:0:0: [sda] tag#744 CDB: Write(10) 2a 00 62 81 45 00 00 02 00 00
[  381.393013] I/O error, dev sda, sector 1652638976 op 0x1:(WRITE) flags 0x4800 phys_seg 4 prio class 0
[  381.393050] sd 0:2:0:0: [sda] tag#745 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[  381.393057] sd 0:2:0:0: [sda] tag#745 CDB: Write(10) 2a 00 62 81 47 00 00 02 00 00
[  381.393062] I/O error, dev sda, sector 1652639488 op 0x1:(WRITE) flags 0x800 phys_seg 4 prio class 0
[  381.394249] PCI 0003:01:00.0#01fd: EEH: aacraid driver reports: 'need reset'
[  381.394256] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[  381.394264] EEH: Collect temporary log
[  381.394289] EEH: of node=0003:01:00.0
[  381.394295] EEH: PCI device/vendor: 028d9005
[  381.394299] EEH: PCI cmd/status register: 00100146
[  381.394303] EEH: PCI-E capabilities and status follow:
[  381.394315] EEH: PCI-E 00: 00020010 000081a2 00002950 00437083 
[  381.394325] EEH: PCI-E 10: 10830000 00000000 00000000 00000000 
[  381.394328] EEH: PCI-E 20: 00000000 
[  381.394331] EEH: PCI-E AER capability register set follows:
[  381.394343] EEH: PCI-E AER 00: 30020001 00000000 00400000 00462030 
[  381.394353] EEH: PCI-E AER 10: 00000000 0000e000 000001e0 00000000 
[  381.394363] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 
[  381.394368] EEH: PCI-E AER 30: 00000000 00000000 
[  381.394372] PHB4 PHB#3 Diag-data (Version: 1)
[  381.394376] brdgCtl:    00000002
[  381.394378] RootSts:    00060040 00402000 a0830008 00100107 00000800
[  381.394383] PhbSts:     0000001c00000000 0000001c00000000
[  381.394387] Lem:        0000000100000080 0000000000000000 0000000000000080
[  381.394391] PhbErr:     0000028000000000 0000020000000000 2148000098000240 a008400000000000
[  381.394395] RxeTceErr:  2000000000000000 2000000000000000 40000000000001fd 0000000000000000
[  381.394400] PblErr:     0000000000020000 0000000000020000 0000000000000000 0000000000000000
[  381.394404] RegbErr:    0000004000000000 0000004000000000 8800000c00000000 0000000007011000
[  381.394411] PE[1fd] A/B: 8000b03800000000 8000000000000000
[  381.394415] EEH: Reset without hotplug activity
[  540.463211545,3] PHB#0003:01:00.0 CRS timeout !
[  544.463277414,3] PHB#0003:01:00.0 CRS timeout !
[  396.661112] aacraid 0003:01:00.0: not ready 1023ms after bus reset; waiting
[  397.717135] aacraid 0003:01:00.0: not ready 2047ms after bus reset; waiting
[  399.773181] aacraid 0003:01:00.0: not ready 4095ms after bus reset; waiting
[  403.893795] aacraid 0003:01:00.0: not ready 8191ms after bus reset; waiting


** Tags added: 6.8 noble sru-20240805

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/1979179

Title:
   cpuhotplug04 in cpuhotplug from ubuntu_ltp cause test hang on J-5.15
  P9

Status in ubuntu-kernel-tests:
  In Progress
Status in The Ubuntu-power-systems project:
  In Progress

Bug description:
  Issue found on Jammy 5.15.0-40.43 Power 9 node "baltar"

  The cpuhotplug04 test will hang with:

  Test output:
  <<<test_start>>>
  tag=cpuhotplug04 stime=1655706629
  cmdline="cpuhotplug04.sh -l 1"
  contacts=""
  analysis=exit
  <<<test_output>>>
  incrementing stop
  Name:   cpuhotplug04
  Date:   Mon Jun 20 06:30:29 UTC 2022
  Desc:   Does it prevent us from offlining the last CPU?

  dmesg output:
  [  342.099325] LTP: starting cpuhotplug04 (cpuhotplug04.sh -l 1)
  [  343.109144] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  343.336137] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  343.571276] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  343.787207] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  344.019402] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  344.255955] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  344.495940] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  344.659877] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  344.927993] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  345.124037] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
  [  346.583794] EEH: Recovering PHB#3-PE#1fd
  [  346.583840] EEH: PE location: Onboard SAS, PHB location: N/A
  [  346.583885] EEH: Frozen PHB#3-PE#1fd detected
  [  346.583927] EEH: Call Trace:
  [  346.583950] EEH: [c0000000000554ec] __eeh_send_failure_event+0x6c/0x120
  [  346.583989] EEH: [c00000000004e0b4] eeh_dev_check_failure.part.0+0x254/0x630
  [  346.584015] EEH: [c000000000aba988] __pci_write_msi_msg+0x2a8/0x350
  [  346.584048] EEH: [c000000000245308] msi_domain_set_affinity+0xe8/0x100
  [  346.584081] EEH: [c0000000002354cc] irq_do_set_affinity+0x7c/0x230
  [  346.584104] EEH: [c0000000002446a4] irq_migrate_all_off_this_cpu+0x334/0x3a0
  [  346.584138] EEH: [c0000000000d79a4] xive_smp_disable_cpu+0x34/0xd0
  [  346.584160] EEH: [c0000000000eaaa8] pnv_smp_cpu_disable+0x88/0x130
  [  346.584192] EEH: [c0000000000644f4] __cpu_disable+0x54/0xb0
  [  346.584213] EEH: [c0000000001788cc] take_cpu_down+0x4c/0x110
  [  346.584236] EEH: [c0000000002db254] multi_cpu_stop+0xe4/0x240
  [  346.584273] EEH: [c0000000002daa84] cpu_stopper_thread+0x164/0x2c0
  [  346.584298] EEH: [c0000000001bc2e0] smpboot_thread_fn+0x1f0/0x260
  [  346.584340] EEH: [c0000000001b5774] kthread+0x184/0x190
  [  346.584378] EEH: [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
  [  346.584417] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
  [  346.584420] EEH: Notify device drivers to shutdown
  [  346.584423] EEH: Beginning: 'error_detected(IO frozen)'
  [  346.584427] PCI 0003:01:00.0#01fd: EEH: Invoking aacraid->error_detected(IO frozen)
  [  346.584434] aacraid 0003:01:00.0: aacraid: PCI error detected 2
  [  346.622048] sd 0:2:0:0: [sda] tag#764 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
  [  346.622067] sd 0:2:0:0: [sda] tag#764 CDB: Write(10) 2a 00 95 69 92 80 00 00 10 00
  [  346.622075] blk_update_request: I/O error, dev sda, sector 2506723968 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
  [  346.622123] EXT4-fs warning (device sda2): ext4_end_bio:342: I/O error 10 writing to inode 38276474 starting block 313340498)
  [  346.622134] Buffer I/O error on device sda2, logical block 313338192
  [  346.622167] Buffer I/O error on device sda2, logical block 313338193
  [  346.622213] sd 0:2:0:0: [sda] tag#765 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
  [  346.622220] sd 0:2:0:0: [sda] tag#765 CDB: Write(10) 2a 00 95 69 cc e8 00 00 08 00
  [  346.622226] blk_update_request: I/O error, dev sda, sector 2506738920 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
  [  346.622278] EXT4-fs warning (device sda2): ext4_end_bio:342: I/O error 10 writing to inode 38276475 starting block 313342366)
  [  346.622289] Buffer I/O error on device sda2, logical block 313340061
  [  346.622329] sd 0:2:0:0: [sda] tag#766 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
  [  346.622336] sd 0:2:0:0: [sda] tag#766 CDB: Write(10) 2a 00 6e 80 49 00 00 00 08 00
  [  346.622341] blk_update_request: I/O error, dev sda, sector 1853901056 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
  [  346.622415] Buffer I/O error on dev sda2, logical block 231735328, lost async page write
  [  346.622461] EXT4-fs error (device sda2): ext4_check_bdev_write_error:217: comm rs:main Q:Reg: Error while async write back metadata
  [  346.622591] sd 0:2:0:0: [sda] tag#513 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
  [  346.622602] sd 0:2:0:0: [sda] tag#513 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
  [  346.622613] blk_update_request: I/O error, dev sda, sector 1950670200 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
  [  346.622694] Aborting journal on device sda2-8.
  [  346.623901] PCI 0003:01:00.0#01fd: EEH: aacraid driver reports: 'need reset'
  [  346.623910] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
  [  346.623919] EEH: Collect temporary log
  [  346.623947] EEH: of node=0003:01:00.0
  [  346.623954] EEH: PCI device/vendor: 028d9005
  [  346.623960] EEH: PCI cmd/status register: 00100146
  [  346.623964] EEH: PCI-E capabilities and status follow:
  [  346.623979] EEH: PCI-E 00: 00020010 000081a2 00002950 00437083
  [  346.623991] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
  [  346.623995] EEH: PCI-E 20: 00000000
  [  346.623998] EEH: PCI-E AER capability register set follows:
  [  346.624013] EEH: PCI-E AER 00: 30020001 00000000 00400000 00462030
  [  346.624025] EEH: PCI-E AER 10: 00000000 0000e000 000001e0 00000000
  [  346.624037] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
  [  346.624045] EEH: PCI-E AER 30: 00000000 00000000
  [  346.624048] PHB4 PHB#3 Diag-data (Version: 1)
  [  346.624052] brdgCtl:    00000002
  [  346.624055] RootSts:    00060040 00402000 a0830008 00100107 00000800
  [  346.624061] PhbSts:     0000001c00000000 0000001c00000000
  [  346.624065] Lem:        0000000100000080 0000000000000000 0000000000000080
  [  346.624069] PhbErr:     0000028000000000 0000020000000000 2148000098000240 a008400000000000
  [  346.624075] RxeTceErr:  2000000000000000 2000000000000000 40000000000001fd 0000000000000000
  [  346.624081] PblErr:     0000000000020000 0000000000020000 0000000000000000 0000000000000000
  [  346.624085] RegbErr:    0000004000000000 0000004000000000 8800003c00000000 0000000007011000
  [  346.624094] PE[1fd] A/B: 8000b03800000000 8000000000000000
  [  346.624098] EEH: Reset without hotplug activity
  [  346.625998] EXT4-fs error (device sda2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
  [  346.626020] EXT4-fs error (device sda2): ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted journal
  [  349.134491] EXT4-fs error (device sda2): ext4_journal_check_start:83: comm ext4lazyinit: Detected aborted journal
  [  359.548470] EEH: Beginning: 'slot_reset'
  [  359.548480] PCI 0003:01:00.0#01fd: EEH: Invoking aacraid->slot_reset()
  [  359.548486] aacraid 0003:01:00.0: aacraid: PCI error - slot reset
  [  359.615158] PCI 0003:01:00.0#01fd: EEH: aacraid driver reports: 'recovered'
  [  359.615162] EEH: Finished:'slot_reset' with aggregate recovery state:'recovered'
  [  359.615163] EEH: Notify device driver to resume
  [  359.615164] EEH: Beginning: 'resume'
  [  359.615172] PCI 0003:01:00.0#01fd: EEH: Invoking aacraid->resume()
  [  474.477185] EXT4-fs error (device sda2): ext4_journal_check_start:83: comm snapd: Detected aborted journal
  [  484.561619] INFO: task kworker/17:1:1020 blocked for more than 120 seconds.
  [  484.561643]       Not tainted 5.15.0-40-generic #43-Ubuntu
  [  484.561664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  484.561679] task:kworker/17:1    state:D stack:    0 pid: 1020 ppid:     2 flags:0x00000800
  [  484.561686] Workqueue: events cpuset_hotplug_workfn
  [  484.561694] Call Trace:
  [  484.561695] [c0000000198c3350] [c0000000198c3420] 0xc0000000198c3420 (unreliable)
  [  484.561703] [c0000000198c3540] [c00000000001ddf4] __switch_to+0x134/0x230
  [  484.561711] [c0000000198c35a0] [c0000000012a8c38] __schedule+0x228/0x6f0
  [  484.561718] [c0000000198c3660] [c0000000012a916c] schedule+0x6c/0x130
  [  484.561724] [c0000000198c3690] [c000000000220250] rwsem_down_write_slowpath+0x340/0x6a0
  [  484.561731] [c0000000198c3770] [c000000000535084] mpol_rebind_mm+0x44/0x120
  [  484.561737] [c0000000198c37b0] [c0000000002cfb7c] update_tasks_nodemask+0xfc/0x210
  [  484.561743] [c0000000198c3870] [c0000000002d4df8] cpuset_hotplug_update_tasks+0x548/0x610
  [  484.561749] [c0000000198c3900] [c0000000002d5114] cpuset_hotplug_workfn+0x254/0x7d0
  [  484.561754] [c0000000198c3c70] [c0000000001a8634] process_one_work+0x2f4/0x5e0
  [  484.561760] [c0000000198c3d10] [c0000000001a8f08] worker_thread+0xa8/0x610
  [  484.561765] [c0000000198c3da0] [c0000000001b5774] kthread+0x184/0x190
  [  484.561772] [c0000000198c3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
  [  484.561792] INFO: task jbd2/sda2-8:1767 blocked for more than 120 seconds.
  [  484.561816]       Not tainted 5.15.0-40-generic #43-Ubuntu
  [  484.561837] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  484.561860] task:jbd2/sda2-8     state:D stack:    0 pid: 1767 ppid:     2 flags:0x00000800
  [  484.561866] Call Trace:
  [  484.561867] [c00000005640b540] [c00000005630ad00] 0xc00000005630ad00 (unreliable)
  [  484.561873] [c00000005640b730] [c00000000001ddf4] __switch_to+0x134/0x230
  [  484.561879] [c00000005640b790] [c0000000012a8c38] __schedule+0x228/0x6f0
  [  484.561886] [c00000005640b850] [c0000000012a916c] schedule+0x6c/0x130
  [  484.561891] [c00000005640b880] [c0000000012a92b4] io_schedule+0x54/0x80
  [  484.561897] [c00000005640b8b0] [c0000000012a9ec8] bit_wait_io+0x28/0xa0
  [  484.561903] [c00000005640b8e0] [c0000000012a9898] __wait_on_bit+0x98/0x160
  [  484.561909] [c00000005640b930] [c0000000012a9a00] out_of_line_wait_on_bit+0xa0/0xd0
  [  484.561916] [c00000005640b9a0] [c000000000619c90] __wait_on_buffer+0x50/0x70
  [  484.561923] [c00000005640b9d0] [c0000000007a2ac8] jbd2_write_superblock+0x228/0x330
  [  484.561930] [c00000005640ba80] [c0000000007a2894] jbd2_journal_abort+0x134/0x140
  [  484.561936] [c00000005640bb00] [c000000000795d14] jbd2_journal_commit_transaction+0x1294/0x1f20
  [  484.561943] [c00000005640bd00] [c00000000079f36c] kjournald2+0xfc/0x370
  [  484.561948] [c00000005640bda0] [c0000000001b5774] kthread+0x184/0x190
  [  484.561955] [c00000005640be10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64

  Past test result:
    * This test was not executed (or not successfully executed on this node with 5.15.0-28.29 ~ 5.15.0-36.37)

  I can reproduce it manually with 5.15.0-39

  Steps to reproduce:
    sudo apt install -y automake bison build-essential byacc flex git keyutils libacl1-dev libaio-dev libcap-dev libmm-dev libnuma-dev libsctp-dev libselinux1-dev libssl-dev libtirpc-dev pkg-config quota xfslibs-dev xfsprogs gcc
    git clone --depth=1 https://github.com/linux-test-project/ltp.git
    cd ltp
    make autotools
    ./configure
    sudo make install
    echo "cpuhotplug04 cpuhotplug04.sh -l 1" > /tmp/target
    sudo /opt/ltp/runltp -f /tmp/target

  Note that this issue does not exist on Impish 5.13.0 with this P9
  node.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1979179/+subscriptions