canonical-ubuntu-qa team mailing list archive

[Bug 2039515] Re: OOM by cpuacct_100_100 in ubuntu_ltp_controllers caused network connectivity lost on openstack P8 with B-hwe-5.4

This issue also affects the Focal OpenStack PowerPC VM.

** Tags added: focal

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2039515

Title:
  OOM by cpuacct_100_100 in ubuntu_ltp_controllers caused network
  connectivity lost on openstack P8 with B-hwe-5.4

Status in ubuntu-kernel-tests:
  New

Bug description:
  This is not a regression; it has been present since cycle 2023.07.10
  with B-hwe-5.4.0-156.173~18.04.1, when we first started running
  ubuntu_ltp_controllers on OpenStack instances.

  This only affects the P8 instance on OpenStack.

  The instance loses network connectivity while running the cpuacct_100_100 test because systemd-networkd gets killed. Test output:
  04:51:04 INFO | 	START	ubuntu_ltp_controllers.cpuacct_100_1	ubuntu_ltp_controllers.cpuacct_100_1	timestamp=1697431864	timeout=4500	localtime=Oct 16 04:51:04
  04:51:04 DEBUG| Persistent state client._record_indent now set to 2
  04:51:04 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ltp_controllers.cpuacct_100_1', 'ubuntu_ltp_controllers.cpuacct_100_1')
  04:51:04 DEBUG| Waiting for pid 10009 for 4500 seconds
  04:51:05 INFO | Checking for required user/group ids
  04:51:05 INFO |
  04:51:05 INFO | 'root' user id and group found.
  04:51:05 INFO | 'nobody' user id and group found.
  04:51:05 INFO | 'bin' user id and group found.
  04:51:05 INFO | 'daemon' user id and group found.
  04:51:05 INFO | Users group found.
  04:51:05 INFO | Sys group found.
  04:51:05 INFO | Required users/groups exist.
  04:51:05 INFO | no big block device was specified on commandline.
  04:51:05 INFO | Tests which require a big block device are disabled.
  04:51:05 INFO | You can specify it with option -z
  04:51:05 INFO | INFO: Test start time: Mon Oct 16 04:51:04 UTC 2023
  04:51:05 INFO | COMMAND:    /opt/ltp/bin/ltp-pan -q  -e -S   -a 10013     -n 10013  -f /tmp/ltp-p3OGf1KQRt/alltests -l /dev/null  -C /dev/null -T /dev/null
  04:51:05 INFO | LOG File: /dev/null
  04:51:05 INFO | FAILED COMMAND File: /dev/null
  04:51:05 INFO | TCONF COMMAND File: /dev/null
  04:51:05 INFO | Running tests.......
  04:51:05 INFO | cpuacct 1 TINFO: timeout per run is 0h 5m 0s
  04:51:05 INFO | tst_pid.c:84: TINFO: Cannot read session user limits from '/sys/fs/cgroup/user.slice/user-1000.slice/pids.max'
  04:51:05 INFO | tst_pid.c:94: TINFO: Found limit of processes 10331 (from /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max)
  04:51:05 INFO | cpuacct 1 TINFO: task limit fulfilled (approximate need 100, limit 10119)
  04:51:05 INFO | cpuacct 1 TINFO: cpuacct: /sys/fs/cgroup/cpu,cpuacct
  04:51:05 INFO | cpuacct 1 TINFO: Creating 100 subgroups each with 1 processes
  04:51:05 INFO | cpuacct 1 TPASS: cpuacct.usage is not equal to 0 for every subgroup
  04:51:05 INFO | cpuacct 1 TPASS: cpuacct.usage equal to subgroup*/cpuacct.usage
  04:51:05 INFO | cpuacct 2 TINFO: removing created directories
  04:51:05 INFO |
  04:51:05 INFO | Summary:
  04:51:05 INFO | passed   2
  04:51:05 INFO | failed   0
  04:51:05 INFO | broken   0
  04:51:05 INFO | skipped  0
  04:51:05 INFO | warnings 0
  04:51:05 INFO | INFO: ltp-pan reported all tests PASS
  04:51:05 INFO | LTP Version: 20230516
  04:51:05 INFO | INFO: Test end time: Mon Oct 16 04:51:05 UTC 2023
  04:51:06 INFO | 		GOOD	ubuntu_ltp_controllers.cpuacct_100_1	ubuntu_ltp_controllers.cpuacct_100_1	timestamp=1697431866	localtime=Oct 16 04:51:06	completed successfully
  04:51:06 INFO | 	END GOOD	ubuntu_ltp_controllers.cpuacct_100_1	ubuntu_ltp_controllers.cpuacct_100_1	timestamp=1697431866	localtime=Oct 16 04:51:06
  04:51:06 DEBUG| Persistent state client._record_indent now set to 1
  04:51:06 DEBUG| Persistent state client.unexpected_reboot deleted
  04:51:06 DEBUG| Test has timeout: 4500 sec.
  04:51:06 INFO | 	START	ubuntu_ltp_controllers.cpuacct_100_100	ubuntu_ltp_controllers.cpuacct_100_100	timestamp=1697431866	timeout=4500	localtime=Oct 16 04:51:06
  04:51:06 DEBUG| Persistent state client._record_indent now set to 2
  04:51:06 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ltp_controllers.cpuacct_100_100', 'ubuntu_ltp_controllers.cpuacct_100_100')
  04:51:06 DEBUG| Waiting for pid 10507 for 4500 seconds
  # system disconnects here, test interrupted
  -------------------------------------------------------------------------------------------------------
  R E S U L T S
  -------------------------------------------------------------------------------------------------------

  Running the test manually shows that it triggers the OOM killer, which kills systemd-networkd:
  Oct 17 02:48:47 10 systemd[1]: Started Session 11 of user ubuntu.
  Oct 17 02:50:25 10 kernel: [ 1435.609205] LTP: starting cpuacct_100_100 (cpuacct.sh 100 100)
  Oct 17 02:50:45 10 kernel: [ 1455.360651] cpuacct_task invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
  Oct 17 02:50:45 10 kernel: [ 1455.360659] CPU: 0 PID: 31263 Comm: cpuacct_task Not tainted 5.4.0-165-generic #182~18.04.1-Ubuntu
  Oct 17 02:50:45 10 kernel: [ 1455.360660] Call Trace:
  Oct 17 02:50:45 10 kernel: [ 1455.360667] [c000000013b6f7c0] [c000000000f2da68] dump_stack+0xbc/0x104 (unreliable)
  Oct 17 02:50:45 10 kernel: [ 1455.360671] [c000000013b6f800] [c00000000038e53c] dump_header+0x5c/0x2c0
  Oct 17 02:50:45 10 kernel: [ 1455.360673] [c000000013b6f890] [c00000000038ed9c] oom_kill_process+0x19c/0x2c0
  Oct 17 02:50:45 10 kernel: [ 1455.360675] [c000000013b6f8d0] [c000000000390088] out_of_memory+0x128/0x790
  Oct 17 02:50:45 10 kernel: [ 1455.360677] [c000000013b6f970] [c00000000040cb34] __alloc_pages_slowpath+0xb64/0xea0
  Oct 17 02:50:45 10 kernel: [ 1455.360679] [c000000013b6fb30] [c00000000040d188] __alloc_pages_nodemask+0x318/0x3d0
  Oct 17 02:50:45 10 kernel: [ 1455.360681] [c000000013b6fbb0] [c000000000436bf8] alloc_pages_vma+0xb8/0x300
  Oct 17 02:50:45 10 kernel: [ 1455.360683] [c000000013b6fc20] [c0000000003e2dc4] __handle_mm_fault+0x8d4/0x1ae0
  Oct 17 02:50:45 10 kernel: [ 1455.360685] [c000000013b6fd10] [c0000000003e40d0] handle_mm_fault+0x100/0x1d0
  Oct 17 02:50:45 10 kernel: [ 1455.360687] [c000000013b6fd50] [c00000000008b65c] __do_page_fault+0x30c/0xec0
  Oct 17 02:50:45 10 kernel: [ 1455.360689] [c000000013b6fe20] [c00000000000a908] handle_page_fault+0x10/0x30
  Oct 17 02:50:45 10 kernel: [ 1455.360693] --- interrupt: 301 at 0x789a6c9b6f60
  Oct 17 02:50:45 10 kernel: [ 1455.360693]     LR = 0x789a6c989a24
  Oct 17 02:50:45 10 kernel: [ 1455.360693] Mem-Info:
  Oct 17 02:50:45 10 kernel: [ 1455.360698] active_anon:40193 inactive_anon:44 isolated_anon:0
  Oct 17 02:50:45 10 kernel: [ 1455.360698]  active_file:7 inactive_file:7 isolated_file:25
  Oct 17 02:50:45 10 kernel: [ 1455.360698]  unevictable:0 dirty:0 writeback:0 unstable:0
  Oct 17 02:50:45 10 kernel: [ 1455.360698]  slab_reclaimable:782 slab_unreclaimable:6253
  Oct 17 02:50:45 10 kernel: [ 1455.360698]  mapped:22 shmem:129 pagetables:4689 bounce:0
  Oct 17 02:50:45 10 kernel: [ 1455.360698]  free:2804 free_pcp:31 free_cma:0
  Oct 17 02:50:45 10 kernel: [ 1455.360701] Node 0 active_anon:2572352kB inactive_anon:2816kB active_file:448kB inactive_file:448kB unevictable:0kB isolated(anon):0kB isolated(file):1600kB mapped:1408kB dirty:0kB writeback:0kB shmem:8256kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
  Oct 17 02:50:45 10 kernel: [ 1455.360702] Node 0 Normal free:179456kB min:180224kB low:225280kB high:270336kB active_anon:2572352kB inactive_anon:2816kB active_file:448kB inactive_file:448kB unevictable:0kB writepending:0kB present:4194304kB managed:4144512kB mlocked:0kB kernel_stack:76496kB pagetables:300096kB bounce:0kB free_pcp:1984kB local_pcp:960kB free_cma:0kB
  Oct 17 02:50:45 10 kernel: [ 1455.360706] lowmem_reserve[]: 0 0 0
  Oct 17 02:50:45 10 kernel: [ 1455.360707] Node 0 Normal: 238*64kB (UME) 53*128kB (UME) 279*256kB (UME) 164*512kB (UM) 2*1024kB (UM) 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 179456kB
  Oct 17 02:50:45 10 kernel: [ 1455.360713] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  Oct 17 02:50:45 10 kernel: [ 1455.360714] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
  Oct 17 02:50:45 10 kernel: [ 1455.360715] 168 total pagecache pages
  Oct 17 02:50:45 10 kernel: [ 1455.360716] 0 pages in swap cache
  Oct 17 02:50:45 10 kernel: [ 1455.360717] Swap cache stats: add 0, delete 0, find 0/0
  Oct 17 02:50:45 10 kernel: [ 1455.360718] Free swap  = 0kB
  Oct 17 02:50:45 10 kernel: [ 1455.360718] Total swap = 0kB
  Oct 17 02:50:45 10 kernel: [ 1455.360719] 65536 pages RAM
  Oct 17 02:50:45 10 kernel: [ 1455.360719] 0 pages HighMem/MovableOnly
  Oct 17 02:50:45 10 kernel: [ 1455.360720] 778 pages reserved
  Oct 17 02:50:45 10 kernel: [ 1455.360720] 0 pages cma reserved
  Oct 17 02:50:45 10 kernel: [ 1455.360721] 0 pages hwpoisoned
  Oct 17 02:50:45 10 kernel: [ 1455.360721] Tasks state (memory values in pages):
  Oct 17 02:50:45 10 kernel: [ 1455.360722] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
  Oct 17 02:50:45 10 kernel: [ 1455.360728] [    396]     0   396      771       69    33280        0             0 systemd-journal
  Oct 17 02:50:45 10 kernel: [ 1455.360730] [    411]     0   411     1266       22    26624        0             0 lvmetad
  Oct 17 02:50:45 10 kernel: [ 1455.360731] [    412]     0   412      319       51    31744        0         -1000 systemd-udevd
  Oct 17 02:50:45 10 kernel: [ 1455.360734] [    457] 62583   457     1411       67    28416        0             0 systemd-timesyn
  Oct 17 02:50:45 10 kernel: [ 1455.360736] [    869]   100   869      428       68    32256        0             0 systemd-network
  Oct 17 02:50:45 10 kernel: [ 1455.360738] [    887]   101   887      276       77    27904        0             0 systemd-resolve
  Oct 17 02:50:45 10 kernel: [ 1455.360739] [   1015]     0  1015      166       33    30976        0             0 cron
  Oct 17 02:50:45 10 kernel: [ 1455.360741] [   1016]     0  1016     1331       40    31232        0             0 irqbalance
  Oct 17 02:50:45 10 kernel: [ 1455.360743] [   1025]     0  1025     1739      208    30720        0             0 networkd-dispat
  Oct 17 02:50:45 10 kernel: [ 1455.360745] [   1026]     0  1026     1331        8    25856        0             0 iprdump
  Oct 17 02:50:45 10 kernel: [ 1455.360746] [   1027]     0  1027      101       30    26368        0             0 atd
  Oct 17 02:50:45 10 kernel: [ 1455.360755] [   1028]     0  1028     2381       15    31232        0             0 lxcfs
  Oct 17 02:50:45 10 kernel: [ 1455.360757] [   1029]     0  1029      268       73    27648        0             0 systemd-logind
  Oct 17 02:50:45 10 kernel: [ 1455.360759] [   1030]     0  1030     3787       75    33792        0             0 accounts-daemon
  Oct 17 02:50:45 10 kernel: [ 1455.360760] [   1035]   102  1035     3513       51    32256        0             0 rsyslogd
  Oct 17 02:50:45 10 kernel: [ 1455.360762] [   1037]   103  1037      186       57    26880        0          -900 dbus-daemon
  Oct 17 02:50:45 10 kernel: [ 1455.360764] [   1041]     0  1041      269       70    32000        0         -1000 sshd
  Oct 17 02:50:45 10 kernel: [ 1455.360766] [   1042]     0  1042      118       26    26368        0             0 rtas_errd
  Oct 17 02:50:45 10 kernel: [ 1455.360768] [   1164]     0  1164     3741       85    33536        0             0 polkitd
  Oct 17 02:50:45 10 kernel: [ 1455.360769] [   1169]     0  1169       54        9    26112        0             0 iprupdate
  Oct 17 02:50:45 10 kernel: [ 1455.360771] [   1173]     0  1173     1841      205    30976        0             0 unattended-upgr
  Oct 17 02:50:45 10 kernel: [ 1455.360772] [   1189]     0  1189       54        9    25856        0             0 iprinit
  Oct 17 02:50:45 10 kernel: [ 1455.360774] [   1334]     0  1334      130       16    30720        0             0 agetty
  Oct 17 02:50:45 10 kernel: [ 1455.360776] [   1346]     0  1346       96       17    30208        0             0 agetty
  Oct 17 02:50:45 10 kernel: [ 1455.360778] [   3408]     0  3408      173       46    26880        0             0 rpcbind
  Oct 17 02:50:45 10 kernel: [ 1455.360780] [   4148]     0  4148       88       26    30208        0             0 rpc.idmapd
  Oct 17 02:50:45 10 kernel: [ 1455.360782] [   4149]     0  4149      142       45    26624        0             0 rpc.mountd
  Oct 17 02:50:45 10 kernel: [ 1455.360784] [   4268]     0  4268      147       64    30720        0             0 haveged
  Oct 17 02:50:45 10 kernel: [ 1455.360786] [  25518]     0 25518      334      107    28160        0             0 sshd
  Oct 17 02:50:45 10 kernel: [ 1455.360788] [  25520]  1000 25520      315       78    28160        0             0 systemd
  Oct 17 02:50:45 10 kernel: [ 1455.360790] [  25521]  1000 25521     2719      157    34048        0             0 (sd-pam)
  Oct 17 02:50:45 10 kernel: [ 1455.360792] [  25599]  1000 25599      334      105    27904        0             0 sshd
  Oct 17 02:50:45 10 kernel: [ 1455.360794] [  25600]  1000 25600      188       47    27392        0             0 bash
  Oct 17 02:50:45 10 kernel: [ 1455.360796] [  25612]     0 25612      334      107    32512        0             0 sshd
  Oct 17 02:50:45 10 kernel: [ 1455.360797] [  25677]  1000 25677      334      105    32256        0             0 sshd
  Oct 17 02:50:45 10 kernel: [ 1455.360799] [  25678]  1000 25678      185       46    26624        0             0 bash
  Oct 17 02:50:45 10 kernel: [ 1455.360800] [  25690]  1000 25690      124        9    26624        0             0 dmesg
  Oct 17 02:50:45 10 kernel: [ 1455.360802] [  25730]     0 25730      230       61    27904        0             0 sudo
  Oct 17 02:50:45 10 kernel: [ 1455.360804] [  25731]     0 25731       55       11    30208        0             0 runltp
  Oct 17 02:50:45 10 kernel: [ 1455.360806] [  25867]     0 25867       52        8    25856        0             0 ltp-pan
  Oct 17 02:50:45 10 kernel: [ 1455.360808] [  25868]     0 25868       58       22    26112        0             0 cpuacct.sh
  Oct 17 02:50:45 10 kernel: [ 1455.360810] [  25887]     0 25887       49        7    25856        0             0 tst_timeout_kil
  Oct 17 02:50:45 10 kernel: [ 1455.360811] [  26507]     0 26507       52        7    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.360813] [  26508]     0 26508       52        7    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.360815] [  26509]     0 26509       52        7    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.360816] [  26510]     0 26510       52        7    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.360818] [  26511]     0 26511       52        7    26112        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504029] [  28610]     0 28610       52        7    26112        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504030] [  28611]     0 28611       52        8    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504032] [  28613]     0 28613       52        7    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504033] [  28614]     0 28614       52        7    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504034] [  28615]     0 28615       52        7    29952        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504036] [  28616]     0 28616       52        7    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504037] [  28617]     0 28617       52        7    26112        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504038] [  28618]     0 28618       52        8    25856        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504039] [  28619]     0 28619       52        7    30208        0             0 cpuacct_task
  Oct 17 02:50:45 10 kernel: [ 1455.504041] [  28620]     0 28620       52        7    30208        0             0 cpuacct_task
  ....
  Oct 17 02:50:45 10 kernel: [ 1455.507546] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/systemd-networkd.service,task=systemd-network,pid=869,uid=100
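The task table above is elided (`....`), so the full picture has to come from the original kernel log. A minimal sketch for that, parsing each row of the "Tasks state (memory values in pages)" table and totalling process count, RSS and page-table bytes per command name. The helper name `summarize` is hypothetical, and the 64 KiB page size is an assumption inferred from the smallest buddy-allocator block (64kB) and from `present:4194304kB` over `65536 pages RAM`; the log does not print it directly.

```python
import re
from collections import defaultdict

# One row of the OOM "Tasks state" table, after the dmesg timestamp, e.g.
# "[  26507]     0 26507       52        7    25856        0             0 cpuacct_task"
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S.*)$"
)


def summarize(log_text, page_kib=64):
    """Hypothetical helper: per-name process count, RSS (KiB) and page-table bytes.

    page_kib=64 is an assumption for this ppc64el guest, not read from the log.
    """
    totals = defaultdict(lambda: {"count": 0, "rss_kib": 0, "pgtables_bytes": 0})
    for line in log_text.splitlines():
        m = ROW.search(line)
        if m is None:  # dmesg timestamps, call trace and header lines do not match
            continue
        entry = totals[m["name"].strip()]
        entry["count"] += 1
        entry["rss_kib"] += int(m["rss"]) * page_kib  # rss is reported in pages
        entry["pgtables_bytes"] += int(m["pgtables_bytes"])
    return dict(totals)


# Usage on three rows copied from the report above.
sample = """\
Oct 17 02:50:45 10 kernel: [ 1455.360736] [    869]   100   869      428       68    32256        0             0 systemd-network
Oct 17 02:50:45 10 kernel: [ 1455.360811] [  26507]     0 26507       52        7    25856        0             0 cpuacct_task
Oct 17 02:50:45 10 kernel: [ 1455.360813] [  26508]     0 26508       52        7    25856        0             0 cpuacct_task
"""
for name, entry in sorted(summarize(sample).items()):
    print(name, entry)
```

Run against the complete log, the `cpuacct_task` entry counts the test tasks alive when the OOM killer fired. Each of those rows reports only 7 pages of RSS, but with 64 KiB pages that is 448 KiB per task.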

  Memory on this instance: 
  $ free -mh
                total        used        free      shared  buff/cache   available
  Mem:           4.0G        300M        3.5G        7.9M        140M        3.3G
  Swap:            0B          0B          0B
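The OOM report mixes page counts and kB. A small worked check reconciling them with the `free -mh` output above. Every constant is copied from the report, except that `PAGE_KIB` is derived (4194304 kB / 65536 pages) and the 10,000-task figure is an assumption: it supposes `cpuacct_100_100` means 100 subgroups with 100 processes each, by analogy with the "Creating 100 subgroups each with 1 processes" line of `cpuacct_100_1`.

```python
# Constants copied from the OOM report; PAGE_KIB is derived, not printed by the kernel.
PAGE_KIB = 4194304 // 65536            # present:4194304kB / "65536 pages RAM" -> 64 KiB pages

total_kib = 65536 * PAGE_KIB           # "65536 pages RAM"
active_anon_kib = 40193 * PAGE_KIB     # "active_anon:40193" (in pages)
free_kib, min_kib = 179456, 180224     # "Normal free:179456kB min:180224kB"

# Footprint of one cpuacct_task row: rss=7 pages, pgtables_bytes=25856.
per_task_kib = 7 * PAGE_KIB + 25856 / 1024

# Assumption: cpuacct_100_100 runs 100 subgroups x 100 tasks.
tasks = 100 * 100

print(f"page size: {PAGE_KIB} KiB, RAM: {total_kib / 1024**2:.1f} GiB")
print(f"active_anon: {active_anon_kib / 1024:.0f} MiB")
print(f"zone free below min watermark: {free_kib < min_kib}")
print(f"naive estimate for {tasks} tasks: {tasks * per_task_kib / 1024**2:.2f} GiB")
```

The last figure is only a rough upper bound: it counts each task's RSS as private, although some of those pages (for example shared library mappings) are shared between tasks. It indicates the order of magnitude relative to the 4 GiB of RAM, not a measured usage.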

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2039515/+subscriptions
