← Back to team overview

canonical-ubuntu-qa team mailing list archive

[Bug 2072501] Re: net:pmtu.sh of ubuntu_kselftests_net and subsequent subtests time out

 

I was able to reproduce the issue on N-nvidia and N-nvidia-64k, versions
6.8.0-1009.9 and 6.8.0-1008.8, so it does not appear to be a new
regression. I could also reproduce the issue on upstream 6.10.0-rc7.

I can cause the issue reliably on node hinyari by repeatedly running
pmtu.sh from kernel selftests individually with only the
pmtu_ipv6_ipv6_exception test case:

# while true; do ./pmtu.sh pmtu_ipv6_ipv6_exception; done

After a couple dozen iterations, I see the following in dmesg:

[   67.586600] ip6_tunnel: ip_a xmit: Local address not yet configured!
[   73.426596] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
[   73.435693] audit: type=1400 audit(1720645447.729:108): apparmor="DENIED" operation="open" class="file" profile="rsyslogd" name="/run/systemd/sessions/" pid=1668 comm=72733A6D61696E20513A526567 requested_mask="r" denied_mask="r" fsuid=102 ouid=0
[   76.802597] ip6_tunnel: ip_a xmit: Local address not yet configured!
[   83.546588] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
[   83.555666] audit: type=1400 audit(1720645457.849:109): apparmor="DENIED" operation="open" class="file" profile="rsyslogd" name="/run/systemd/sessions/" pid=1668 comm=72733A6D61696E20513A526567 requested_mask="r" denied_mask="r" fsuid=102 ouid=0
[   93.690586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
[   93.699676] audit: type=1400 audit(1720645467.993:110): apparmor="DENIED" operation="open" class="file" profile="rsyslogd" name="/run/systemd/sessions/" pid=1668 comm=72733A6D61696E20513A526567 requested_mask="r" denied_mask="r" fsuid=102 ouid=0
[   94.722599] ip6_tunnel: ip_a xmit: Local address not yet configured!
[   96.259457] neighbour: ndisc_cache: neighbor table overflow!
[   96.259484] neighbour: ndisc_cache: neighbor table overflow!
[   96.259512] neighbour: ndisc_cache: neighbor table overflow!
[   96.259539] neighbour: ndisc_cache: neighbor table overflow!


Running ./pmtu.sh pmtu_vti6_exception normally works fine for many iterations. However, if ip6_vti wasn't loaded before hand, and the above "waiting for veth_A-R1 to become free" condition occurs, running it results in it hanging, unable to be interrupted.

The following eventually appears in dmesg:
[  492.525328] INFO: task modprobe:21918 blocked for more than 122 seconds.
[  492.532186]       Tainted: G           OE      6.8.0-1008-nvidia #8-Ubuntu
[  492.539219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.547228] task:modprobe        state:D stack:0     pid:21918 tgid:21918 ppid:828    flags:0x0080000c
[  492.547232] Call trace:
[  492.547233]  __switch_to+0xc8/0x100
[  492.547241]  __schedule+0x2a8/0x7b0
[  492.547243]  schedule+0x40/0x168
[  492.547245]  schedule_preempt_disabled+0x30/0x68
[  492.547248]  rwsem_down_write_slowpath+0x27c/0x768
[  492.547252]  down_write+0xcc/0xd8
[  492.547254]  register_pernet_device+0x38/0x98
[  492.547263]  vti6_tunnel_init+0x30/0xff8 [ip6_vti]
[  492.547272]  do_one_initcall+0x64/0x3b8
[  492.547276]  do_init_module+0xa4/0x280
[  492.547279]  load_module+0x7b8/0x8f0
[  492.547281]  init_module_from_file+0x98/0x118
[  492.547283]  idempotent_init_module+0x1a4/0x2c8
[  492.547285]  __arm64_sys_finit_module+0x70/0xf8
[  492.547286]  invoke_syscall+0x7c/0x128
[  492.547288]  el0_svc_common.constprop.0+0x4c/0x140
[  492.547290]  do_el0_svc+0x28/0x58
[  492.547291]  el0_svc+0x44/0x1a0
[  492.547294]  el0t_64_sync_handler+0x148/0x158
[  492.547297]  el0t_64_sync+0x1b0/0x1b8

-- 
You received this bug notification because you are a member of Canonical
Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2072501

Title:
  net:pmtu.sh of ubuntu_kselftests_net and subsequent subtests time out

Status in ubuntu-kernel-tests:
  New

Bug description:
  Found on N-nvidia-64k with node hinyari. After net:pmtu.sh times out,
  these tests also timeout until "TEST SYSTEM FAILURE DETECTED" is
  eventually triggered: net:udpgso.sh, net:ip_defrag.sh,
  net:udpgso_bench.sh, net:fib_rule_tests.sh.

  14:44:14 INFO | 	START	ubuntu_kselftests_net.net:pmtu.sh	ubuntu_kselftests_net.net:pmtu.sh	timeout=2700	timestamp=1719413054	localtime=Jun 26 14:44:14	
  14:44:14 DEBUG| Persistent state client._record_indent now set to 2
  14:44:14 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_kselftests_net.net:pmtu.sh', 'ubuntu_kselftests_net.net:pmtu.sh')
  14:44:14 DEBUG| Waiting for pid 43936 for 2700 seconds
  14:44:14 WARNI| System python is too old, crash handling disabled
  14:44:14 DEBUG| Running 'make run_tests -C net TEST_PROGS=pmtu.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
  14:44:14 DEBUG| [stdout] make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kselftests_net/src/linux/tools/testing/selftests/net'
  14:44:14 DEBUG| [stdout] TAP version 13
  14:44:14 DEBUG| [stdout] 1..1
  14:44:14 DEBUG| [stdout] # timeout set to 0
  14:44:14 DEBUG| [stdout] # selftests: net: pmtu.sh
  14:44:14 DEBUG| [stdout] # TEST: ipv4: PMTU exceptions                                         [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ipv4: PMTU exceptions - nexthop objects                       [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ipv6: PMTU exceptions                                         [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ipv6: PMTU exceptions - nexthop objects                       [ OK ]
  14:44:14 DEBUG| [stdout] # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions                     [ OK ]
  14:44:15 DEBUG| [stdout] # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions - nexthop objects   [ OK ]
  14:44:15 DEBUG| [stdout] # 2024/06/26 14:44:15 socat[44926] E xioopen_ipdgram_listen(): unknown address family 0
  14:44:15 DEBUG| [stdout] # TEST: UDPv4 with DSCP and ECN: PMTU exceptions                      [ OK ]
  14:44:15 DEBUG| [stdout] # ./pmtu.sh: line 937: kill: (44926) - No such process
  14:44:15 DEBUG| [stdout] # 2024/06/26 14:44:15 socat[45066] E xioopen_ipdgram_listen(): unknown address family 0
  14:44:15 DEBUG| [stdout] # TEST: UDPv4 with DSCP and ECN: PMTU exceptions - nexthop objects    [ OK ]
  14:44:15 DEBUG| [stdout] # ./pmtu.sh: line 937: kill: (45066) - No such process
  14:44:16 DEBUG| [stdout] # TEST: IPv4 over vxlan4: PMTU exceptions                             [ OK ]
  14:44:17 DEBUG| [stdout] # TEST: IPv4 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
  14:44:18 DEBUG| [stdout] # TEST: IPv6 over vxlan4: PMTU exceptions                             [ OK ]
  14:44:19 DEBUG| [stdout] # TEST: IPv6 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
  14:44:20 DEBUG| [stdout] # TEST: IPv4 over vxlan6: PMTU exceptions                             [ OK ]
  14:44:22 DEBUG| [stdout] # TEST: IPv4 over vxlan6: PMTU exceptions - nexthop objects           [ OK ]
  14:44:23 DEBUG| [stdout] # TEST: IPv6 over vxlan6: PMTU exceptions                             [ OK ]
  14:44:24 DEBUG| [stdout] # TEST: IPv6 over vxlan6: PMTU exceptions - nexthop objects           [ OK ]
  14:44:25 DEBUG| [stdout] # TEST: IPv4 over geneve4: PMTU exceptions                            [ OK ]
  14:44:26 DEBUG| [stdout] # TEST: IPv4 over geneve4: PMTU exceptions - nexthop objects          [ OK ]
  14:44:27 DEBUG| [stdout] # TEST: IPv6 over geneve4: PMTU exceptions                            [ OK ]
  14:44:28 DEBUG| [stdout] # TEST: IPv6 over geneve4: PMTU exceptions - nexthop objects          [ OK ]
  14:44:30 DEBUG| [stdout] # TEST: IPv4 over geneve6: PMTU exceptions                            [ OK ]
  14:44:31 DEBUG| [stdout] # TEST: IPv4 over geneve6: PMTU exceptions - nexthop objects          [ OK ]
  14:44:32 DEBUG| [stdout] # TEST: IPv6 over geneve6: PMTU exceptions                            [ OK ]
  14:44:33 DEBUG| [stdout] # TEST: IPv6 over geneve6: PMTU exceptions - nexthop objects          [ OK ]
  14:44:35 DEBUG| [stdout] # TEST: IPv4, bridged vxlan4: PMTU exceptions                         [ OK ]
  14:44:37 DEBUG| [stdout] # TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects       [ OK ]
  14:44:40 DEBUG| [stdout] # TEST: IPv6, bridged vxlan4: PMTU exceptions                         [ OK ]
  14:44:42 DEBUG| [stdout] # TEST: IPv6, bridged vxlan4: PMTU exceptions - nexthop objects       [ OK ]
  14:44:44 DEBUG| [stdout] # TEST: IPv4, bridged vxlan6: PMTU exceptions                         [ OK ]
  14:44:46 DEBUG| [stdout] # TEST: IPv4, bridged vxlan6: PMTU exceptions - nexthop objects       [ OK ]
  14:44:49 DEBUG| [stdout] # TEST: IPv6, bridged vxlan6: PMTU exceptions                         [ OK ]
  14:44:51 DEBUG| [stdout] # TEST: IPv6, bridged vxlan6: PMTU exceptions - nexthop objects       [ OK ]
  14:44:53 DEBUG| [stdout] # TEST: IPv4, bridged geneve4: PMTU exceptions                        [ OK ]
  14:44:55 DEBUG| [stdout] # TEST: IPv4, bridged geneve4: PMTU exceptions - nexthop objects      [ OK ]
  14:44:58 DEBUG| [stdout] # TEST: IPv6, bridged geneve4: PMTU exceptions                        [ OK ]
  14:45:00 DEBUG| [stdout] # TEST: IPv6, bridged geneve4: PMTU exceptions - nexthop objects      [ OK ]
  14:45:02 DEBUG| [stdout] # TEST: IPv4, bridged geneve6: PMTU exceptions                        [ OK ]
  14:45:04 DEBUG| [stdout] # TEST: IPv4, bridged geneve6: PMTU exceptions - nexthop objects      [ OK ]
  14:45:07 DEBUG| [stdout] # TEST: IPv6, bridged geneve6: PMTU exceptions                        [ OK ]
  14:45:09 DEBUG| [stdout] # TEST: IPv6, bridged geneve6: PMTU exceptions - nexthop objects      [ OK ]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv4, OVS vxlan4: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv6, OVS vxlan4: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv4, OVS vxlan6: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv6, OVS vxlan6: PMTU exceptions                             [SKIP]
  14:45:09 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:09 DEBUG| [stdout] # TEST: IPv4, OVS geneve4: PMTU exceptions                            [SKIP]
  14:45:10 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:10 DEBUG| [stdout] # TEST: IPv6, OVS geneve4: PMTU exceptions                            [SKIP]
  14:45:10 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:10 DEBUG| [stdout] # TEST: IPv4, OVS geneve6: PMTU exceptions                            [SKIP]
  14:45:10 DEBUG| [stdout] #   ovs_bridge not supported
  14:45:10 DEBUG| [stdout] # TEST: IPv6, OVS geneve6: PMTU exceptions                            [SKIP]
  14:45:11 DEBUG| [stdout] # TEST: IPv4 over fou4: PMTU exceptions                               [ OK ]
  14:45:11 DEBUG| [stdout] # TEST: IPv4 over fou4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:11 DEBUG| [stdout] # TEST: IPv6 over fou4: PMTU exceptions                               [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv6 over fou4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv4 over fou6: PMTU exceptions                               [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv4 over fou6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:12 DEBUG| [stdout] # TEST: IPv6 over fou6: PMTU exceptions                               [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv6 over fou6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv4 over gue4: PMTU exceptions                               [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv4 over gue4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:13 DEBUG| [stdout] # TEST: IPv6 over gue4: PMTU exceptions                               [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv6 over gue4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv4 over gue6: PMTU exceptions                               [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv4 over gue6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv6 over gue6: PMTU exceptions                               [ OK ]
  14:45:14 DEBUG| [stdout] # TEST: IPv6 over gue6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv4 over IPv4: PMTU exceptions                               [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv4 over IPv4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv6 over IPv4: PMTU exceptions                               [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv6 over IPv4: PMTU exceptions - nexthop objects             [ OK ]
  14:45:15 DEBUG| [stdout] # TEST: IPv4 over IPv6: PMTU exceptions                               [ OK ]
  14:45:16 DEBUG| [stdout] # TEST: IPv4 over IPv6: PMTU exceptions - nexthop objects             [ OK ]
  14:45:16 DEBUG| [stdout] # TEST: IPv6 over IPv6: PMTU exceptions                               [ OK ]
  14:45:16 DEBUG| [stdout] # TEST: IPv6 over IPv6: PMTU exceptions - nexthop objects             [ OK ]
  15:29:14 INFO | Timer expired (2700 sec.), nuking pid 43936
  15:29:14 INFO | 		ERROR	ubuntu_kselftests_net.net:pmtu.sh	ubuntu_kselftests_net.net:pmtu.sh	timestamp=1719415754	localtime=Jun 26 15:29:14	Test timeout expired, rc=15
  15:29:14 INFO | 	END ERROR	ubuntu_kselftests_net.net:pmtu.sh	ubuntu_kselftests_net.net:pmtu.sh	timestamp=1719415754	localtime=Jun 26 15:29:14	
  15:29:14 DEBUG| Persistent state client._record_indent now set to 1
  15:29:14 DEBUG| Persistent state client.unexpected_reboot deleted
  15:29:14 DEBUG| Test has timeout: 2700 sec.
  15:29:14 INFO | 	START	ubuntu_kselftests_net.net:udpgso.sh	ubuntu_kselftests_net.net:udpgso.sh	timeout=2700	timestamp=1719415754	localtime=Jun 26 15:29:14	
  15:29:14 DEBUG| Persistent state client._record_indent now set to 2
  15:29:14 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_kselftests_net.net:udpgso.sh', 'ubuntu_kselftests_net.net:udpgso.sh')
  15:29:14 DEBUG| Waiting for pid 54956 for 2700 seconds
  15:29:14 WARNI| System python is too old, crash handling disabled
  15:29:14 DEBUG| Running 'make run_tests -C net TEST_PROGS=udpgso.sh TEST_GEN_PROGS='' TEST_CUSTOM_PROGS='''
  15:29:14 DEBUG| [stdout] make: Entering directory '/home/ubuntu/autotest/client/tmp/ubuntu_kselftests_net/src/linux/tools/testing/selftests/net'
  15:29:14 DEBUG| [stdout] TAP version 13
  15:29:14 DEBUG| [stdout] 1..1
  15:29:14 DEBUG| [stdout] # timeout set to 0
  15:29:14 DEBUG| [stdout] # selftests: net: udpgso.sh
  15:29:14 DEBUG| [stdout] # ipv4 cmsg
  16:14:14 INFO | Timer expired (2700 sec.), nuking pid 54956
  16:14:14 INFO | 		ERROR	ubuntu_kselftests_net.net:udpgso.sh	ubuntu_kselftests_net.net:udpgso.sh	timestamp=1719418454	localtime=Jun 26 16:14:14	Test timeout expired, rc=15
  16:14:14 INFO | 	END ERROR	ubuntu_kselftests_net.net:udpgso.sh	ubuntu_kselftests_net.net:udpgso.sh	timestamp=1719418454	localtime=Jun 26 
  ...

  
  Accompanied by the following dmesg entries, "waiting for veth_A-R1 to become free" appears to repeat until the system is shutdown:
  [  574.404587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  584.652587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  594.900587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  605.148587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  615.396587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  625.644587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  635.892587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  646.140587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  656.388587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  666.636587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  676.884586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  687.132586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  697.380587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  707.628586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  717.876586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  728.124587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  738.284631] INFO: task modprobe:54505 blocked for more than 122 seconds.
  [  738.291488]       Not tainted 6.8.0-1009-nvidia-64k #9-Ubuntu
  [  738.297364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  738.372587] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  [  748.620586] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
  ...

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072501/+subscriptions



References