Message #66543
[Bug 1709032] Re: functional job tests get stuck
The reproducer is as follows:
kernel: 4.4.0-89-generic
conntrack: 1:1.4.3-3
conntrackd: 1:1.4.3-3
Create a conntrack entry:
sudo conntrack -I --protonum tcp --src 1.2.3.4 --sport 65535 --dst 8.8.8.8 --dport 60000 --state ESTABLISHED --timeout 120
Trace from dmesg:
[ 2964.587682] ------------[ cut here ]------------
[ 2964.588883] kernel BUG at /build/linux-YaBj6t/linux-4.4.0/net/netfilter/nf_conntrack_extend.c:91!
[ 2964.589954] invalid opcode: 0000 [#1] SMP
[ 2964.590556] Modules linked in: br_netfilter bridge openvswitch libcrc32c nf_conntrack_netlink nfnetlink ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nls_utf8 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 isofs nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables hid_generic ppdev crct10dif_pclmul crc32_pclmul usbhid hid snd_pcsp ghash_clmulni_intel joydev aesni_intel snd_pcm input_leds parport_pc aes_x86_64 i2c_piix4 snd_timer lrw evbug parport snd gf128mul 8250_fintek mac_hid serio_raw glue_helper soundcore ablk_helper cryptd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc autofs4 ttm drm_kms_helper
[ 2964.598769] syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse pata_acpi floppy
[ 2964.599587] CPU: 0 PID: 12029 Comm: conntrack Not tainted 4.4.0-89-generic #112-Ubuntu
[ 2964.600347] Hardware name: Fedora Project OpenStack Nova, BIOS 1.9.1-5.el7_3.1 04/01/2014
[ 2964.601178] task: ffff8802331b5940 ti: ffff8800ba5dc000 task.ti: ffff8800ba5dc000
[ 2964.602169] RIP: 0010:[<ffffffffc0368211>] [<ffffffffc0368211>] __nf_ct_ext_add_length+0x141/0x1b0 [nf_conntrack]
[ 2964.603408] RSP: 0018:ffff8800ba5df9a0 EFLAGS: 00010246
[ 2964.604043] RAX: 0000000000000009 RBX: ffff880234303180 RCX: 0000000002080020
[ 2964.604802] RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000000000
[ 2964.606483] RBP: ffff8800ba5df9e8 R08: ffff88023fc1a0c0 R09: ffff8800bb108560
[ 2964.607298] R10: ffff8800bb108500 R11: 000000003a8d6867 R12: ffff8800bb108500
[ 2964.608090] R13: ffff8800ba5dfb58 R14: ffffffff81ef5f00 R15: ffff8800ba5dfa94
[ 2964.608883] FS: 00007f4784492700(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
[ 2964.609895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2964.610542] CR2: 00007f4784071520 CR3: 00000000bab56000 CR4: 00000000000006f0
[ 2964.611327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2964.612120] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2964.612873] Stack:
[ 2964.613197] 0000006000000078 0000000000000009 ffff880234303180 fffffffffffffff4
[ 2964.614137] ffff880234303180 0000000000000002 ffff8800ba5dfb58 ffffffff81ef5f00
[ 2964.615091] ffff8800ba5dfa94 ffff8800ba5dfa70 ffffffffc03a4c34 0000000000000000
[ 2964.616096] Call Trace:
[ 2964.616429] [<ffffffffc03a4c34>] ctnetlink_create_conntrack+0x244/0x4d0 [nf_conntrack_netlink]
[ 2964.617433] [<ffffffffc035fecd>] ? __nf_conntrack_find_get+0x34d/0x370 [nf_conntrack]
[ 2964.618392] [<ffffffffc03a728b>] ctnetlink_new_conntrack+0x44b/0x650 [nf_conntrack_netlink]
[ 2964.619549] [<ffffffffc0398250>] ? nfnetlink_net_exit_batch+0x70/0x70 [nfnetlink]
[ 2964.620561] [<ffffffffc0398464>] nfnetlink_rcv_msg+0x214/0x220 [nfnetlink]
[ 2964.621305] [<ffffffffc0398250>] ? nfnetlink_net_exit_batch+0x70/0x70 [nfnetlink]
[ 2964.622222] [<ffffffff8176a824>] netlink_rcv_skb+0xa4/0xc0
[ 2964.622805] [<ffffffffc0398865>] nfnetlink_rcv+0x295/0x543 [nfnetlink]
[ 2964.623517] [<ffffffff8176880c>] ? netlink_lookup+0xdc/0x140
[ 2964.624179] [<ffffffff8176a1fa>] netlink_unicast+0x18a/0x240
[ 2964.624803] [<ffffffff8176a5ab>] netlink_sendmsg+0x2fb/0x3a0
[ 2964.625426] [<ffffffff813a0401>] ? aa_sock_msg_perm+0x61/0x150
[ 2964.626158] [<ffffffff81719ad8>] sock_sendmsg+0x38/0x50
[ 2964.627035] [<ffffffff8171a0c1>] SYSC_sendto+0x101/0x190
[ 2964.627651] [<ffffffff8106b594>] ? __do_page_fault+0x1b4/0x400
[ 2964.628285] [<ffffffff8171abde>] SyS_sendto+0xe/0x10
[ 2964.628831] [<ffffffff81841f32>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 2964.629523] Code: 45 89 66 24 4c 01 f3 41 29 c4 49 63 d4 48 89 df e8 b5 fd 09 c1 48 83 c4 20 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 31 db eb ea <0f> 0b 41 89 f6 4a 8b 04 f5 e0 0d 37 c0 48 85 c0 74 56 0f b6 70
[ 2964.632792] RIP [<ffffffffc0368211>] __nf_ct_ext_add_length+0x141/0x1b0 [nf_conntrack]
[ 2964.633823] RSP <ffff8800ba5df9a0>
[ 2964.634615] ---[ end trace 7116c308b790b3d4 ]---
All following conntrack commands hang indefinitely and can't be killed.
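When checking other hosts for the same crash, it can help to pull the BUG location and the faulting function out of the dmesg output automatically. The following is a minimal sketch, assuming the oops lines have the same format as the trace above; the function name summarize_oops and the regular expressions are illustrative, not part of any existing tool.

```python
import re

# "kernel BUG at <source file>:<line>!" as printed in the trace above.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# "<symbol>+0x<offset>/0x<size> [<module>]"; the module part is optional.
SYM_RE = re.compile(
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(dmesg_text):
    """Return a dict with the BUG location and faulting symbol, if found.

    Hypothetical helper: keys are "file", "line", "func" and "module".
    An empty dict means no matching lines were seen.
    """
    summary = {}
    for line in dmesg_text.splitlines():
        bug = BUG_RE.search(line)
        if bug and "file" not in summary:
            summary["file"] = bug.group("file")
            summary["line"] = int(bug.group("line"))
        # Only the first RIP line is used; the later one repeats it.
        if "RIP" in line and "func" not in summary:
            sym = SYM_RE.search(line)
            if sym:
                summary["func"] = sym.group("func")
                if sym.group("module"):
                    summary["module"] = sym.group("module")
    return summary
```

Feeding it the text of the trace above should report nf_conntrack_extend.c line 91 and __nf_ct_ext_add_length in the nf_conntrack module. The input could come from a saved log file or from the output of dmesg.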
** Summary changed:
- functional job tests get stuck
+ Creating conntrack entry failure with kernel 4.4.0-89
** Project changed: neutron => linux
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1709032
Title:
Creating conntrack entry failure with kernel 4.4.0-89
Status in Linux:
Confirmed
Bug description:
The functional job failure rate is at 100%. Every time, some test gets
stuck and the job is killed after the timeout.
logstash query:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=build_name%3A%5C%22gate-neutron-dsvm-functional-ubuntu-xenial%5C%22%20AND%20tags%3Aconsole%20AND%20message%3A%5C%22Killed%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20timeout%20-s%209%5C%22
2017-08-05 12:36:50.127672 | /home/jenkins/workspace/gate-neutron-dsvm-functional-ubuntu-xenial/devstack-gate/functions.sh: line 1129: 15261 Killed timeout -s 9 ${REMAINING_TIME}m bash -c "source $WORKSPACE/devstack-gate/functions.sh && $cmd"
A few test executors are still running, which means more tests are
stuck:
stack 15468 15445 15468 0.0 0.0 328 796 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit} --load-list /tmp/tmpDTLPoX
stack 15469 15468 15469 1.5 1.8 139332 150008 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpDTLPoX
stack 15470 15445 15470 0.0 0.0 328 700 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit} --load-list /tmp/tmpICNqRQ
stack 15471 15470 15471 1.6 2.0 152056 164812 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpICNqRQ
stack 15474 15445 15474 0.0 0.0 328 792 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit} --load-list /tmp/tmpe646Tl
stack 15475 15474 15475 1.6 1.9 149972 162516 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpe646Tl
stack 15476 15445 15476 0.0 0.0 328 804 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit} --load-list /tmp/tmpv2ovhz
stack 15477 15476 15477 1.2 1.8 136760 149160 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpv2ovhz
stack 15478 15445 15478 0.0 0.0 328 712 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit} --load-list /tmp/tmpDqXE8S
stack 15479 15478 15479 1.5 1.9 148784 161004 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpDqXE8S
stack 15480 15445 15480 0.0 0.0 328 804 /bin/sh -c OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \ OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \ OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \ OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \ ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron/tests/unit} --load-list /tmp/tmpTmmShS
stack 15482 15480 15482 1.6 1.9 148856 161516 python -m subunit.run discover -t ./ ./neutron/tests/functional --load-list /tmp/tmpTmmShS
To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1709032/+subscriptions