
yahoo-eng-team team mailing list archive

[Bug 1709032] Re: functional job tests get stuck

 

The reproducer is as follows:

kernel: 4.4.0-89-generic
conntrack: 1:1.4.3-3
conntrackd: 1:1.4.3-3

Create a conntrack entry:

sudo conntrack -I --protonum tcp --src 1.2.3.4 --sport 65535 --dst 8.8.8.8 --dport 60000 --state ESTABLISHED --timeout 120
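
Since the command above can oops the kernel, it may be worth checking which kernel is running before trying it. The following is a minimal sketch, not part of the original report: the function name is_reported_kernel and the variable REPORTED_KERNEL are hypothetical, and the check only compares against the single release quoted here. Whether other releases are affected is not stated in this report.

```sh
# Kernel release quoted in this report. Other releases are not covered,
# so this check is deliberately narrow (assumption: exact string match).
REPORTED_KERNEL="4.4.0-89-generic"

# is_reported_kernel [RELEASE]
# Succeeds when RELEASE (default: output of `uname -r`) equals the
# release quoted in the report.
is_reported_kernel() {
    release="${1:-$(uname -r)}"
    [ "$release" = "$REPORTED_KERNEL" ]
}

if is_reported_kernel; then
    echo "This kernel matches the report; the reproducer may oops it."
else
    echo "This kernel differs from the one in the report."
fi
```

The sketch only prints a message and never runs conntrack itself.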


Trace from dmesg:
 [ 2964.587682] ------------[ cut here ]------------
 [ 2964.588883] kernel BUG at /build/linux-YaBj6t/linux-4.4.0/net/netfilter/nf_conntrack_extend.c:91!
 [ 2964.589954] invalid opcode: 0000 [#1] SMP
 [ 2964.590556] Modules linked in: br_netfilter bridge openvswitch libcrc32c nf_conntrack_netlink nfnetlink ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nls_utf8 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 isofs nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables hid_generic ppdev crct10dif_pclmul crc32_pclmul usbhid hid snd_pcsp ghash_clmulni_intel joydev aesni_intel snd_pcm input_leds parport_pc aes_x86_64 i2c_piix4 snd_timer lrw evbug parport snd gf128mul 8250_fintek mac_hid serio_raw glue_helper soundcore ablk_helper cryptd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc autofs4 ttm drm_kms_helper
 [ 2964.598769]  syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse pata_acpi floppy
 [ 2964.599587] CPU: 0 PID: 12029 Comm: conntrack Not tainted 4.4.0-89-generic #112-Ubuntu
 [ 2964.600347] Hardware name: Fedora Project OpenStack Nova, BIOS 1.9.1-5.el7_3.1 04/01/2014
 [ 2964.601178] task: ffff8802331b5940 ti: ffff8800ba5dc000 task.ti: ffff8800ba5dc000
 [ 2964.602169] RIP: 0010:[<ffffffffc0368211>]  [<ffffffffc0368211>] __nf_ct_ext_add_length+0x141/0x1b0 [nf_conntrack]
 [ 2964.603408] RSP: 0018:ffff8800ba5df9a0  EFLAGS: 00010246
 [ 2964.604043] RAX: 0000000000000009 RBX: ffff880234303180 RCX: 0000000002080020
 [ 2964.604802] RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000000000
 [ 2964.606483] RBP: ffff8800ba5df9e8 R08: ffff88023fc1a0c0 R09: ffff8800bb108560
 [ 2964.607298] R10: ffff8800bb108500 R11: 000000003a8d6867 R12: ffff8800bb108500
 [ 2964.608090] R13: ffff8800ba5dfb58 R14: ffffffff81ef5f00 R15: ffff8800ba5dfa94
 [ 2964.608883] FS:  00007f4784492700(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
 [ 2964.609895] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 2964.610542] CR2: 00007f4784071520 CR3: 00000000bab56000 CR4: 00000000000006f0
 [ 2964.611327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [ 2964.612120] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [ 2964.612873] Stack:
 [ 2964.613197]  0000006000000078 0000000000000009 ffff880234303180 fffffffffffffff4
 [ 2964.614137]  ffff880234303180 0000000000000002 ffff8800ba5dfb58 ffffffff81ef5f00
 [ 2964.615091]  ffff8800ba5dfa94 ffff8800ba5dfa70 ffffffffc03a4c34 0000000000000000
 [ 2964.616096] Call Trace:
 [ 2964.616429]  [<ffffffffc03a4c34>] ctnetlink_create_conntrack+0x244/0x4d0 [nf_conntrack_netlink]
 [ 2964.617433]  [<ffffffffc035fecd>] ? __nf_conntrack_find_get+0x34d/0x370 [nf_conntrack]
 [ 2964.618392]  [<ffffffffc03a728b>] ctnetlink_new_conntrack+0x44b/0x650 [nf_conntrack_netlink]
 [ 2964.619549]  [<ffffffffc0398250>] ? nfnetlink_net_exit_batch+0x70/0x70 [nfnetlink]
 [ 2964.620561]  [<ffffffffc0398464>] nfnetlink_rcv_msg+0x214/0x220 [nfnetlink]
 [ 2964.621305]  [<ffffffffc0398250>] ? nfnetlink_net_exit_batch+0x70/0x70 [nfnetlink]
 [ 2964.622222]  [<ffffffff8176a824>] netlink_rcv_skb+0xa4/0xc0
 [ 2964.622805]  [<ffffffffc0398865>] nfnetlink_rcv+0x295/0x543 [nfnetlink]
 [ 2964.623517]  [<ffffffff8176880c>] ? netlink_lookup+0xdc/0x140
 [ 2964.624179]  [<ffffffff8176a1fa>] netlink_unicast+0x18a/0x240
 [ 2964.624803]  [<ffffffff8176a5ab>] netlink_sendmsg+0x2fb/0x3a0
 [ 2964.625426]  [<ffffffff813a0401>] ? aa_sock_msg_perm+0x61/0x150
 [ 2964.626158]  [<ffffffff81719ad8>] sock_sendmsg+0x38/0x50
 [ 2964.627035]  [<ffffffff8171a0c1>] SYSC_sendto+0x101/0x190
 [ 2964.627651]  [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
 [ 2964.628285]  [<ffffffff8171abde>] SyS_sendto+0xe/0x10
 [ 2964.628831]  [<ffffffff81841f32>] entry_SYSCALL_64_fastpath+0x16/0x71
 [ 2964.629523] Code: 45 89 66 24 4c 01 f3 41 29 c4 49 63 d4 48 89 df e8 b5 fd 09 c1 48 83 c4 20 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 31 db eb ea <0f> 0b 41 89 f6 4a 8b 04 f5 e0 0d 37 c0 48 85 c0 74 56 0f b6 70
 [ 2964.632792] RIP  [<ffffffffc0368211>] __nf_ct_ext_add_length+0x141/0x1b0 [nf_conntrack]
 [ 2964.633823]  RSP <ffff8800ba5df9a0>
 [ 2964.634615] ---[ end trace 7116c308b790b3d4 ]---
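
When comparing traces from several runs, the two most useful fields are usually the BUG location and the faulting symbol. The following is a rough sketch for pulling them out of saved dmesg text with sed. The function names bug_site and rip_symbol are made up for illustration, and the patterns assume the line layout seen in this trace.

```sh
# bug_site: read kernel log text on stdin and print "<path>:<line>" for
# each "kernel BUG at <path>:<line>!" message.
bug_site() {
    sed -n 's/.*kernel BUG at \(.*\):\([0-9][0-9]*\)!.*/\1:\2/p'
}

# rip_symbol: print the symbol name from each RIP line, dropping the
# "+0x<offset>/0x<size>" suffix and the module name.
rip_symbol() {
    sed -n 's/.*RIP[: ].*\] \([A-Za-z_][A-Za-z0-9_.]*\)+0x.*/\1/p'
}

# Two lines copied from the trace above.
trace='[ 2964.588883] kernel BUG at /build/linux-YaBj6t/linux-4.4.0/net/netfilter/nf_conntrack_extend.c:91!
[ 2964.632792] RIP  [<ffffffffc0368211>] __nf_ct_ext_add_length+0x141/0x1b0 [nf_conntrack]'

printf '%s\n' "$trace" | bug_site
printf '%s\n' "$trace" | rip_symbol
```

For these two lines the sketch prints the nf_conntrack_extend.c path with line 91, followed by __nf_ct_ext_add_length.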

After this, all subsequent conntrack commands hang indefinitely and can't be killed.
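
One way to spot such processes is to look for conntrack commands in uninterruptible sleep. This is a sketch under an assumption: the report does not say which state the hung commands are in, and the D state is only a guess based on the fact that they cannot be killed. The function name stuck_conntrack and the sample input are made up.

```sh
# stuck_conntrack: read "PID STAT COMMAND" lines on stdin and print the
# PIDs of conntrack processes whose state begins with D (uninterruptible
# sleep). Assumption: the hung commands show up in that state.
stuck_conntrack() {
    awk '$2 ~ /^D/ && $3 == "conntrack" { print $1 }'
}

# Made-up sample input in the shape of `ps -eo pid=,stat=,comm=` output.
sample='12101 D+   conntrack
12102 S    bash
12103 D    conntrack'

printf '%s\n' "$sample" | stuck_conntrack
```

On a live Linux system with procps, the same filter could be fed from `ps -eo pid=,stat=,comm=` instead of the sample text.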


** Summary changed:

- functional job tests get stuck
+ Creating conntrack entry failure with kernel 4.4.0-89

** Project changed: neutron => linux

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1709032

Title:
  Creating conntrack entry failure with kernel 4.4.0-89

Status in Linux:
  Confirmed

Bug description:
  The functional job failure rate is at 100%. In every run, some test
  gets stuck and the job is killed after the timeout.

  logstash query:
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=build_name%3A%5C%22gate-neutron-dsvm-functional-ubuntu-xenial%5C%22%20AND%20tags%3Aconsole%20AND%20message%3A%5C%22Killed%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20timeout%20-s%209%5C%22

  2017-08-05 12:36:50.127672 | /home/jenkins/workspace/gate-neutron-dsvm-functional-ubuntu-xenial/devstack-gate/functions.sh: line 1129: 15261 Killed                  timeout -s 9 ${REMAINING_TIME}m bash -c "source $WORKSPACE/devstack-gate/functions.sh && $cmd"

  Several test executors are still running after the kill, which means
  more than one test is stuck:

  stack    15468 15445 15468  0.0  0.0   328   796 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpDTLPoX
  stack    15469 15468 15469  1.5  1.8 139332 150008 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpDTLPoX
  stack    15470 15445 15470  0.0  0.0   328   700 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpICNqRQ
  stack    15471 15470 15471  1.6  2.0 152056 164812 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpICNqRQ
  stack    15474 15445 15474  0.0  0.0   328   792 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpe646Tl
  stack    15475 15474 15475  1.6  1.9 149972 162516 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpe646Tl
  stack    15476 15445 15476  0.0  0.0   328   804 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpv2ovhz
  stack    15477 15476 15477  1.2  1.8 136760 149160 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpv2ovhz
  stack    15478 15445 15478  0.0  0.0   328   712 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpDqXE8S
  stack    15479 15478 15479  1.5  1.9 148784 161004 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpDqXE8S
  stack    15480 15445 15480  0.0  0.0   328   804 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpTmmShS
  stack    15482 15480 15482  1.6  1.9 148856 161516 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpTmmShS
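
To see which workers are still alive and which test lists they were given, the listing above can be reduced to one line per python runner. This is a minimal sketch with a made-up function name (stuck_workers). It assumes the command starts in the ninth whitespace-separated field and that the load-list path is the last field, as in the listing above. The column meanings are not labelled in the report.

```sh
# stuck_workers: read the process listing on stdin and print
# "PID LOAD_LIST" for each python test runner. The /bin/sh wrapper lines
# are skipped because their ninth field is not "python".
stuck_workers() {
    awk '$9 == "python" { print $2, $NF }'
}

# Three lines copied from the listing above: one wrapper, two runners.
listing='stack    15468 15445 15468  0.0  0.0   328   796 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit}  --load-list /tmp/tmpDTLPoX
stack    15469 15468 15469  1.5  1.8 139332 150008 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpDTLPoX
stack    15471 15470 15471  1.6  2.0 152056 164812 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpICNqRQ'

printf '%s\n' "$listing" | stuck_workers
```

Each output line pairs a runner PID with its load-list file, which lists the tests assigned to that worker and so narrows down where a stuck test might be.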

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1709032/+subscriptions

