kernel-packages team mailing list archive
Message #158905
[Bug 1537666] Re: ISST-LTE: Ubuntu 14.04.4 LPAR interrupts at check_and_cede_processor
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin14044
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1537666
Title:
ISST-LTE: Ubuntu 14.04.4 LPAR interrupts at check_and_cede_processor
Status in linux package in Ubuntu:
Triaged
Bug description:
== Comment: #0 - YUECHANG E. MEI <yemei@xxxxxxxxxx> - 2015-12-11 17:19:07 ==
---Problem Description---
We have an Ubuntu 14.04.4 LPAR, conelp2. It is running three stress tests: base, io, and tcp. When checking "dmesg", we see this interrupt:
[Fri Dec 11 13:58:50 2015] --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
[Fri Dec 11 13:58:50 2015] LR = check_and_cede_processor+0x34/0x50
In the previous test, conelp2 stopped all the stress tests by itself
because it ran out of memory. Is the out-of-memory issue related to
the interrupt?
Contact Information = Yuechang (Erin) Mei /yemei@xxxxxxxxxx, Raja Sunkari /rajasunk@xxxxxxxxxx
---uname output---
Linux conelp2 4.2.0-21-generic #25~14.04.1-Ubuntu SMP Thu Dec 3 13:55:42 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = EUH Alpine 8408-E8E
---Debugger---
A debugger is not configured
---Steps to Reproduce---
1. Install Ubuntu 14.04.4 in an LPAR, then update to the latest 14.04.4 kernel by using this workaround:
echo "deb http://software.linux.ibm.com/pub/ubuntu-ppc64el-repository/ trusty-proposed main restricted universe multiverse" >> /etc/apt/sources.list
apt-get update
apt-get install linux-image-generic-lts-wily
2. Set up the stress test, and start base, io, and tcp
3. After an hour, check dmesg; you will see the message about the interrupt
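For step 3, a saved dmesg capture can be filtered for the interrupt frame together with the LR line that follows it. This is only a sketch: the function name `show_interrupt_frames` and the log file name are made up for illustration, and `-A` assumes GNU grep as shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: print each "interrupt: 501" line from a saved
# dmesg capture, plus the line right after it (the LR = ... line).
# Usage: show_interrupt_frames FILE
show_interrupt_frames() {
    grep -A 1 'interrupt: 501' "$1"
}

# Example:
#   dmesg -T > dmesg.log
#   show_interrupt_frames dmesg.log
```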
Stack trace output:
no
Oops output:
no
System Dump Info:
The system is not configured to capture a system dump.
*Additional Instructions for Yuechang (Erin) Mei /yemei@xxxxxxxxxx, Raja Sunkari /rajasunk@xxxxxxxxxx:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.
== Comment: #1 - YUECHANG E. MEI <yemei@xxxxxxxxxx> - 2015-12-11 17:23:00 ==
== Comment: #3 - YUECHANG E. MEI <yemei@xxxxxxxxxx> - 2015-12-14 15:23:33 ==
== Comment: #4 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2015-12-15 03:56:14 ==
dmesg shows a page allocation failure
[Fri Dec 11 13:45:38 2015] swapper/127: page allocation failure: order:0, mode:0x120
[Fri Dec 11 13:45:38 2015] CPU: 127 PID: 0 Comm: swapper/127 Not tainted 4.2.0-21-generic #25~14.04.1-Ubuntu
[Fri Dec 11 13:45:38 2015] Call Trace:
[Fri Dec 11 13:45:38 2015] [c00000027fbc3890] [c000000000a805ec] dump_stack+0x90/0xbc (unreliable)
[Fri Dec 11 13:45:38 2015] [c00000027fbc38c0] [c00000000021c118] warn_alloc_failed+0x118/0x160
[Fri Dec 11 13:45:38 2015] [c00000027fbc3960] [c000000000221114] __alloc_pages_nodemask+0x834/0xa60
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b10] [c000000000221404] __alloc_page_frag+0xc4/0x190
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b50] [c0000000008f6d20] netdev_alloc_frag+0x50/0x80
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b80] [c000000000764e80] tg3_alloc_rx_data+0xa0/0x2c0
[Fri Dec 11 13:45:38 2015] [c00000027fbc3be0] [c000000000767344] tg3_poll_work+0x484/0x1070
[Fri Dec 11 13:45:38 2015] [c00000027fbc3ce0] [c000000000767f8c] tg3_poll_msix+0x5c/0x210
[Fri Dec 11 13:45:38 2015] [c00000027fbc3d30] [c00000000090ebb8] net_rx_action+0x2d8/0x430
[Fri Dec 11 13:45:38 2015] [c00000027fbc3e40] [c0000000000ba124] __do_softirq+0x174/0x390
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f40] [c0000000000ba6c8] irq_exit+0xc8/0x100
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f60] [c0000000000111ec] __do_irq+0x8c/0x190
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f90] [c000000000024278] call_do_irq+0x14/0x24
[Fri Dec 11 13:45:38 2015] [c0000002763a39b0] [c000000000011390] do_IRQ+0xa0/0x120
[Fri Dec 11 13:45:38 2015] [c0000002763a3a10] [c0000000000099b0] restore_check_irq_replay+0x2c/0x70
[Fri Dec 11 13:45:38 2015] --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
[Fri Dec 11 13:45:38 2015] LR = check_and_cede_processor+0x34/0x50
[Fri Dec 11 13:45:38 2015] [c0000002763a3d00] [c0000000008a8d90] check_and_cede_processor+0x20/0x50 (unreliable)
[Fri Dec 11 13:45:38 2015] [c0000002763a3d60] [c0000000008a8fb8] shared_cede_loop+0x68/0x170
[Fri Dec 11 13:45:38 2015] [c0000002763a3da0] [c0000000008a615c] cpuidle_enter_state+0xbc/0x350
[Fri Dec 11 13:45:38 2015] [c0000002763a3e00] [c000000000110f3c] call_cpuidle+0x7c/0xd0
[Fri Dec 11 13:45:38 2015] [c0000002763a3e40] [c0000000001112d0] cpu_startup_entry+0x340/0x450
[Fri Dec 11 13:45:38 2015] [c0000002763a3f10] [c000000000044ab4] start_secondary+0x364/0x3a0
[Fri Dec 11 13:45:38 2015] [c0000002763a3f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
[Fri Dec 11 13:45:38 2015] Mem-Info:
[Fri Dec 11 13:45:38 2015] active_anon:714 inactive_anon:2255 isolated_anon:0
== Comment: #5 - Luciano Chavez <chavez@xxxxxxxxxx> - 2016-01-04 14:28:59 ==
Hi Yuechang,
Atomic page allocation failure warnings originating from network stack
allocation requests are common under stress conditions. The order-0
page allocation failures are probably the easiest to tune for, assuming
there isn't a leak.
I suggest you start with a minimum free pool reservation of at least
64MB and see if that helps eliminate that particular warning.
First check that the current value is lower than that with
cat /proc/sys/vm/min_free_kbytes
and then set it with
echo 65536 > /proc/sys/vm/min_free_kbytes
If the existing value is already higher than 64MB, then pick a larger
value.
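The check-then-set steps above could be scripted roughly as follows. This is only a sketch: the function name `raise_min_free_kbytes` and its optional file argument are made up for illustration (the file argument exists so the logic can be tried on an ordinary file first), and writing to the real /proc file requires root.

```shell
#!/bin/sh
# Hypothetical helper, not part of any distribution tooling.
# Raises vm.min_free_kbytes to TARGET_KB only when the current value is lower.
# Usage: raise_min_free_kbytes TARGET_KB [FILE]
raise_min_free_kbytes() {
    target=$1
    file=${2:-/proc/sys/vm/min_free_kbytes}

    current=$(cat "$file") || return 1

    if [ "$current" -ge "$target" ]; then
        # Nothing is written in this case; the caller should choose a larger target.
        echo "current value $current kB is already >= $target kB; pick a larger target"
        return 0
    fi

    echo "$target" > "$file" || return 1
    echo "raised min_free_kbytes from $current kB to $target kB"
}

# Example (as root):
#   raise_min_free_kbytes 65536
```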
If this helps, update the /etc/sysctl.conf file to make the setting
persistent across boots with an entry of
vm.min_free_kbytes = 65536
or whichever value worked best.
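One possible way to write that entry without leaving duplicate lines behind is sketched below. The function name `persist_min_free_kbytes` and its optional file argument are hypothetical, chosen for illustration; editing the real /etc/sysctl.conf requires root, and it is worth keeping a backup of the file first.

```shell
#!/bin/sh
# Hypothetical helper: make a vm.min_free_kbytes setting persistent.
# Usage: persist_min_free_kbytes VALUE_KB [CONF_FILE]
persist_min_free_kbytes() {
    value=$1
    conf=${2:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1

    # Copy everything except an earlier vm.min_free_kbytes entry,
    # then append the new entry at the end.
    grep -v '^[[:space:]]*vm\.min_free_kbytes[[:space:]]*=' "$conf" > "$tmp" 2>/dev/null
    echo "vm.min_free_kbytes = $value" >> "$tmp"

    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example (as root):
#   persist_min_free_kbytes 65536
#   sysctl -p        # reload /etc/sysctl.conf without rebooting
```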
== Comment: #6 - Jonathan Dalton <jodalton@xxxxxxxxxx> - 2016-01-06 15:34:18 ==
root@conelp2:~#
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
180224
root@conelp2:~# echo 365536 > /proc/sys/vm/min_free_kbytes
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
root@conelp2:~#
== Comment: #7 - Jonathan Dalton <jodalton@xxxxxxxxxx> - 2016-01-07 11:41:51 ==
root@conelp2:~#
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
180224
root@conelp2:~# echo 365536 > /proc/sys/vm/min_free_kbytes
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
root@conelp2:~#
== Comment: #8 - Raja Shekhar Reddy Sunkari <rajasunk@xxxxxxxxxx> - 2016-01-11 02:30:19 ==
Hi Luciano,
I have run the stress tests on conelp2 after updating the value to:
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
The tests ran successfully for 72 hours without any interruption. However,
the dmesg output still shows page allocation failure messages, though they
appear less frequently than in the last run.
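To put a number on "less frequently", the failure messages could be counted in a dmesg capture saved from each run. A minimal sketch, assuming each run's output was saved to its own file; the function name `count_alloc_failures` and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count page allocation failure reports in a
# saved dmesg capture. Prints 0 when there are none.
# Usage: count_alloc_failures FILE
count_alloc_failures() {
    grep -c 'page allocation failure' "$1"
}

# Example:
#   dmesg -T > run2.log
#   count_alloc_failures run1.log
#   count_alloc_failures run2.log
```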
== Comment: #9 - Jonathan Dalton <jodalton@xxxxxxxxxx> - 2016-01-13 13:02:16 ==
I restarted the stress tests on Monday and verified today (Wednesday) that the value is still increased:
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
With the increased "min_free_kbytes", nothing in the current dmesg mentions:
interrupt 501
page allocation fault
So increasing "min_free_kbytes" during stress eliminated the
fault. However, is this still a bug? Should "min_free_kbytes"
have to be increased?
Attached is the dmesg associated with this comment.
== Comment: #12 - Luciano Chavez <chavez@xxxxxxxxxx> - 2016-01-22 20:22:08 ==
(In reply to comment #11)
> Hi Luciano,
>
> I see some info for --set-recommended-min_free_kbytes documented in the
> following link
>
> http://manpages.ubuntu.com/manpages/trusty/man8/hugeadm.8.html
>
> Can you please check and let me know.
Hi Mamatha,
Thanks. That documentation is specific to a utility for huge pages
though so we may have to mirror it and see if the Canonical folks can
point to Ubuntu documentation they have on when to change
min_free_kbytes.
Hi Canonical,
Please point us to Ubuntu documentation that explains when to change
min_free_kbytes.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1537666/+subscriptions