
kernel-packages team mailing list archive

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

 

To better understand how easily this bug can be triggered, I created the
following test case:

I've been using a KVM guest emulating a NUMA environment with 32
different NUMA domains (one for each vCPU):

root@numa:~# numactl -H 
available: 32 nodes (0-31) 
node 0 cpus: 0 
node 0 size: 237 MB 
node 0 free: 82 MB 
node 1 cpus: 1 
node 1 size: 251 MB 
node 1 free: 15 MB 
node 2 cpus: 2 
node 2 size: 251 MB 
node 2 free: 52 MB 
node 3 cpus: 3 
node 3 size: 251 MB 
node 3 free: 240 MB 
node 4 cpus: 4 
node 4 size: 251 MB 
node 4 free: 15 MB 
node 5 cpus: 5 
node 5 size: 251 MB 
node 5 free: 15 MB 
node 6 cpus: 6 
node 6 size: 251 MB 
node 6 free: 17 MB 
node 7 cpus: 7 
node 7 size: 251 MB 
node 7 free: 15 MB 
node 8 cpus: 8 
node 8 size: 251 MB 
node 8 free: 16 MB 
node 9 cpus: 9 
node 9 size: 251 MB 
node 9 free: 16 MB 
node 10 cpus: 10 
node 10 size: 251 MB 
node 10 free: 15 MB 
node 11 cpus: 11 
node 11 size: 187 MB 
node 11 free: 13 MB 
node 12 cpus: 12 
node 12 size: 251 MB 
node 12 free: 15 MB 
node 13 cpus: 13 
node 13 size: 251 MB 
node 13 free: 17 MB 
node 14 cpus: 14 
node 14 size: 251 MB 
node 14 free: 15 MB 
node 15 cpus: 15 
node 15 size: 251 MB 
node 15 free: 16 MB 
node 16 cpus: 16 
node 16 size: 251 MB 
node 16 free: 17 MB 
node 17 cpus: 17 
node 17 size: 251 MB 
node 17 free: 17 MB 
node 18 cpus: 18 
node 18 size: 251 MB 
node 18 free: 16 MB 
node 19 cpus: 19 
node 19 size: 251 MB 
node 19 free: 15 MB 
node 20 cpus: 20 
node 20 size: 251 MB 
node 20 free: 16 MB 
node 21 cpus: 21 
node 21 size: 251 MB 
node 21 free: 17 MB 
node 22 cpus: 22 
node 22 size: 251 MB 
node 22 free: 51 MB 
node 23 cpus: 23 
node 23 size: 251 MB 
node 23 free: 37 MB 
node 24 cpus: 24 
node 24 size: 251 MB 
node 24 free: 120 MB 
node 25 cpus: 25 
node 25 size: 251 MB 
node 25 free: 115 MB 
node 26 cpus: 26 
node 26 size: 251 MB 
node 26 free: 41 MB 
node 27 cpus: 27 
node 27 size: 251 MB 
node 27 free: 15 MB 
node 28 cpus: 28 
node 28 size: 251 MB 
node 28 free: 15 MB 
node 29 cpus: 29 
node 29 size: 251 MB 
node 29 free: 17 MB 
node 30 cpus: 30 
node 30 size: 251 MB 
node 30 free: 164 MB 
node 31 cpus: 31 
node 31 size: 251 MB 
node 31 free: 228 MB 

I then stress the environment (as can be seen in the "free" memory
reported for every NUMA node above) with a specific tool that allocates
a certain amount of memory and "touches" every 32 bytes of it, dirtying
it at the end and then repeating the same behaviour. Together with that,
I create enough kernel tasks running concurrently with these memory
allocators for them to compete for CPU time -> forcing the memory
threads to migrate between CPUs (and therefore between NUMA domains,
since every CPU is inside a different NUMA domain). A rough sketch of
such a stress tool is shown below.
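
For illustration only, here is a minimal sketch (in C) of the kind of stress
tool described above; the allocation size, the command-line arguments and the
number of competing CPU-bound children are assumptions for this sketch, not
the exact tool used to reproduce the bug:

/*
 * numa_stress.c - sketch of a memory "toucher" plus CPU competitors.
 *
 * Build: gcc -O2 -std=gnu99 -o numa_stress numa_stress.c
 * Run:   ./numa_stress <MiB-to-allocate> <cpu-hog-processes>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define STRIDE 32                       /* touch every 32 bytes */

static void touch_loop(size_t bytes)
{
        char *buf = malloc(bytes);
        if (!buf) {
                perror("malloc");
                exit(1);
        }

        for (;;) {
                /* Read ("touch") every 32 bytes of the allocation. */
                volatile char sink = 0;
                for (size_t off = 0; off < bytes; off += STRIDE)
                        sink += buf[off];
                (void)sink;

                /* Dirty the whole allocation, then start over. */
                memset(buf, 0x5a, bytes);
        }
}

int main(int argc, char **argv)
{
        size_t mib  = argc > 1 ? strtoul(argv[1], NULL, 0) : 128;
        int    hogs = argc > 2 ? atoi(argv[2]) : 4;

        /*
         * Fork CPU-bound children so the memory thread has to compete
         * for CPU time and gets migrated between CPUs / NUMA nodes.
         */
        for (int i = 0; i < hogs; i++) {
                if (fork() == 0)
                        for (;;)
                                ;       /* pure CPU burner */
        }

        touch_loop(mib << 20);
        return 0;
}

With each node holding only about 250 MB (see the numactl output above), a
handful of such instances is enough to keep the load balancer and automatic
NUMA balancing busy migrating the memory threads between nodes.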

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1461620

Title:
  NUMA task migration race condition due to stop task not being checked
  when balancing happens

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Trusty:
  In Progress

Bug description:
  The following kernel panic was brought to my attention:

  """
  [3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 <39> d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f 
  [3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153] 
  [3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sb_edac wmi lpc_ich edac_core mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache lp parport hid_generic ixgbe fnic libfcoe dca ptp 
  [3367068.100409] libfc megaraid_sas pps_core mdio usbhid scsi_transport_fc hid enic scsi_tgt 
  [3367068.100415] CPU: 16 PID: 153 Comm: migration/16 Tainted: GF O 3.13.0-34-generic #60-Ubuntu 
  [3367068.100417] Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.1.5.4f.0.111320130449 11/13/2013 
  [3367068.100419] task: ffff881fd2f517f0 ti: ffff881fd2f1c000 task.ti: ffff881fd2f1c000 
  [3367068.100420] RIP: 0010:[<ffffffff810f5944>] [<ffffffff810f5944>] multi_cpu_stop+0x64/0xf0 
  [3367068.100426] RSP: 0000:ffff881fd2f1dd98 EFLAGS: 00000246 
  [3367068.100427] RAX: ffffffff8180af40 RBX: 0000000000000086 RCX: 000000000000a402 
  [3367068.100428] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff883e607edb48 
  [3367068.100430] RBP: ffff881fd2f1ddb8 R08: 0000000000000282 R09: 0000000000000001 
  [3367068.100431] R10: 000000000000b6d8 R11: ffff881fc374dc80 R12: 0000000000014440 
  [3367068.100432] R13: ffff881fd291ae00 R14: ffff881fd291ae08 R15: 0000000200000010 
  [3367068.100433] FS: 0000000000000000(0000) GS:ffff881fffd00000(0000) knlGS:0000000000000000 
  [3367068.100434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
  [3367068.100435] CR2: 00007f6202134b98 CR3: 0000000001c0e000 CR4: 00000000001407e0 
  [3367068.100437] Stack: 
  [3367068.100438] ffff883e607edb70 ffff881fffd0ede0 ffff881fffd0ede8 ffff883e607edb48 
  [3367068.100441] ffff881fd2f1de78 ffffffff810f5b5e ffffffff8109dfc4 ffff881fffd14440 
  [3367068.100443] ffff881fd2f1de08 ffffffff81097508 0000000000000000 ffff881fffd14440 
  [3367068.100446] Call Trace: 
  [3367068.100450] [<ffffffff810f5b5e>] cpu_stopper_thread+0x7e/0x150 
  [3367068.100454] [<ffffffff8109dfc4>] ? vtime_common_task_switch+0x24/0x40 
  [3367068.100458] [<ffffffff81097508>] ? finish_task_switch+0x128/0x170 
  [3367068.100462] [<ffffffff8171fd41>] ? __schedule+0x381/0x7d0 
  [3367068.100465] [<ffffffff810926af>] smpboot_thread_fn+0xff/0x1b0 
  [3367068.100467] [<ffffffff810925b0>] ? SyS_setgroups+0x1a0/0x1a0 
  [3367068.100470] [<ffffffff8108b3d2>] kthread+0xd2/0xf0 
  [3367068.100473] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0 
  [3367068.100477] [<ffffffff8172c6bc>] ret_from_fork+0x7c/0xb0 
  [3367068.100479] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0 
  [3367068.100480] Code: db 85 db 41 0f 95 c5 31 f6 31 d2 eb 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da <83> fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7
  """

  In the first comments I explain WHY this is happening and HOW to fix
  it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/+subscriptions

