kernel-packages team mailing list archive

[Bug 1537666] [NEW] ISST-LTE: Ubuntu 14.04.4 LPAR interrupts at check_and_cede_processor

 

You have been subscribed to a public bug:

== Comment: #0 - YUECHANG E. MEI <yemei@xxxxxxxxxx> - 2015-12-11 17:19:07 ==
---Problem Description---
We have an Ubuntu 14.04.4 LPAR, conelp2, running the base, io, and tcp stress tests. When checking "dmesg", we see this interrupt message:

[Fri Dec 11 13:58:50 2015] --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
[Fri Dec 11 13:58:50 2015]     LR = check_and_cede_processor+0x34/0x50

In the previous test, conelp2 stopped all the stress tests by itself
because it ran out of memory. Is the out-of-memory issue related to the
interrupt?


 
Contact Information = Yuechang (Erin) Mei /yemei@xxxxxxxxxx,  Raja  Sunkari /rajasunk@xxxxxxxxxx 
 
---uname output---
Linux conelp2 4.2.0-21-generic #25~14.04.1-Ubuntu SMP Thu Dec 3 13:55:42 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
 
Machine Type = EUH Alpine 8408-E8E 
 
---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
 1. Install Ubuntu 14.04.4 in an LPAR, then update to the latest 14.04.4 kernel by using this workaround:
echo "deb http://software.linux.ibm.com/pub/ubuntu-ppc64el-repository/ trusty-proposed main restricted universe multiverse" >> /etc/apt/sources.list

apt-get update

apt-get install linux-image-generic-lts-wily

 2. Set up the stress test, and start base, io, and tcp.
 3. After an hour, check dmesg; you will see the interrupt message.
 
Stack trace output:
 no
 
Oops output:
 no
 
System Dump Info:
  The system is not configured to capture a system dump.
 
*Additional Instructions for Yuechang (Erin) Mei /yemei@xxxxxxxxxx,  Raja  Sunkari /rajasunk@xxxxxxxxxx: 
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.

== Comment: #1 - YUECHANG E. MEI <yemei@xxxxxxxxxx> - 2015-12-11
17:23:00 ==


== Comment: #3 - YUECHANG E. MEI <yemei@xxxxxxxxxx> - 2015-12-14 15:23:33 ==


== Comment: #4 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2015-12-15 03:56:14 ==
dmesg shows a page allocation failure:

[Fri Dec 11 13:45:38 2015] swapper/127: page allocation failure: order:0, mode:0x120
[Fri Dec 11 13:45:38 2015] CPU: 127 PID: 0 Comm: swapper/127 Not tainted 4.2.0-21-generic #25~14.04.1-Ubuntu
[Fri Dec 11 13:45:38 2015] Call Trace:
[Fri Dec 11 13:45:38 2015] [c00000027fbc3890] [c000000000a805ec] dump_stack+0x90/0xbc (unreliable)
[Fri Dec 11 13:45:38 2015] [c00000027fbc38c0] [c00000000021c118] warn_alloc_failed+0x118/0x160
[Fri Dec 11 13:45:38 2015] [c00000027fbc3960] [c000000000221114] __alloc_pages_nodemask+0x834/0xa60
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b10] [c000000000221404] __alloc_page_frag+0xc4/0x190
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b50] [c0000000008f6d20] netdev_alloc_frag+0x50/0x80
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b80] [c000000000764e80] tg3_alloc_rx_data+0xa0/0x2c0
[Fri Dec 11 13:45:38 2015] [c00000027fbc3be0] [c000000000767344] tg3_poll_work+0x484/0x1070
[Fri Dec 11 13:45:38 2015] [c00000027fbc3ce0] [c000000000767f8c] tg3_poll_msix+0x5c/0x210
[Fri Dec 11 13:45:38 2015] [c00000027fbc3d30] [c00000000090ebb8] net_rx_action+0x2d8/0x430
[Fri Dec 11 13:45:38 2015] [c00000027fbc3e40] [c0000000000ba124] __do_softirq+0x174/0x390
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f40] [c0000000000ba6c8] irq_exit+0xc8/0x100
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f60] [c0000000000111ec] __do_irq+0x8c/0x190
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f90] [c000000000024278] call_do_irq+0x14/0x24
[Fri Dec 11 13:45:38 2015] [c0000002763a39b0] [c000000000011390] do_IRQ+0xa0/0x120
[Fri Dec 11 13:45:38 2015] [c0000002763a3a10] [c0000000000099b0] restore_check_irq_replay+0x2c/0x70
[Fri Dec 11 13:45:38 2015] --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
[Fri Dec 11 13:45:38 2015]     LR = check_and_cede_processor+0x34/0x50
[Fri Dec 11 13:45:38 2015] [c0000002763a3d00] [c0000000008a8d90] check_and_cede_processor+0x20/0x50 (unreliable)
[Fri Dec 11 13:45:38 2015] [c0000002763a3d60] [c0000000008a8fb8] shared_cede_loop+0x68/0x170
[Fri Dec 11 13:45:38 2015] [c0000002763a3da0] [c0000000008a615c] cpuidle_enter_state+0xbc/0x350
[Fri Dec 11 13:45:38 2015] [c0000002763a3e00] [c000000000110f3c] call_cpuidle+0x7c/0xd0
[Fri Dec 11 13:45:38 2015] [c0000002763a3e40] [c0000000001112d0] cpu_startup_entry+0x340/0x450
[Fri Dec 11 13:45:38 2015] [c0000002763a3f10] [c000000000044ab4] start_secondary+0x364/0x3a0
[Fri Dec 11 13:45:38 2015] [c0000002763a3f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
[Fri Dec 11 13:45:38 2015] Mem-Info:
[Fri Dec 11 13:45:38 2015] active_anon:714 inactive_anon:2255 isolated_anon:0

== Comment: #5 - Luciano Chavez <chavez@xxxxxxxxxx> - 2016-01-04 14:28:59 ==
Hi Yuechang,

Atomic page allocation failure warnings originating from network stack
allocation requests are common under stress conditions. Order-0 page
allocation failures are probably the easiest to tune for, assuming
there isn't a leak.

I suggest you start with a minimum free pool reservation of at least
64MB and see if that helps eliminate that particular warning.

First, check whether the current value is lower than that:

cat /proc/sys/vm/min_free_kbytes

and then set it with

echo 65536 > /proc/sys/vm/min_free_kbytes

If the existing value is already higher than 64MB, pick a larger value.

If this helps, update the /etc/sysctl.conf file to make the setting
persistent across boots with an entry of

vm.min_free_kbytes = 65536

or whatever value helped the most.
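
The check-then-raise procedure described above can be sketched as a small shell function. This is an illustration only and was not part of the original report: the function name raise_min_free_kbytes and the MIN_FREE_FILE override variable are hypothetical, and writing to /proc/sys/vm/min_free_kbytes requires root.

```shell
#!/bin/sh
# Sketch: raise vm.min_free_kbytes only when the current value is below a target.
# MIN_FREE_FILE is a hypothetical override so the logic can be tried on a scratch
# file instead of the real /proc entry.
MIN_FREE_FILE="${MIN_FREE_FILE:-/proc/sys/vm/min_free_kbytes}"

raise_min_free_kbytes() {
    target="$1"
    # Read the current reservation (in kB); give up if the file is unreadable.
    current=$(cat "$MIN_FREE_FILE") || return 1
    if [ "$current" -lt "$target" ]; then
        # Only ever raise the value; never lower an existing larger setting.
        echo "$target" > "$MIN_FREE_FILE" || return 1
        echo "raised from $current to $target"
    else
        echo "unchanged at $current (already >= $target)"
    fi
}
```

As root, a call such as "raise_min_free_kbytes 65536" would apply the 64MB starting point suggested above. The change does not survive a reboot, so the /etc/sysctl.conf entry is still needed for persistence.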

== Comment: #6 - Jonathan Dalton <jodalton@xxxxxxxxxx> - 2016-01-06 15:34:18 ==
root@conelp2:~#
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
180224
root@conelp2:~# echo 365536 > /proc/sys/vm/min_free_kbytes
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
root@conelp2:~#

== Comment: #7 - Jonathan Dalton <jodalton@xxxxxxxxxx> - 2016-01-07 11:41:51 ==
root@conelp2:~#
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
180224
root@conelp2:~# echo 365536 > /proc/sys/vm/min_free_kbytes
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
root@conelp2:~#

== Comment: #8 - Raja Shekhar Reddy Sunkari <rajasunk@xxxxxxxxxx> - 2016-01-11 02:30:19 ==
Hi Luciano,

I have run the stress test on conelp2 after updating the value to:
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536

The tests ran successfully for 72 hours without any interruption. However,
dmesg output still shows the page allocation failure messages, although
they appear less frequently than in the last run.
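
To compare how often these messages occur between runs, the matching dmesg lines can be counted. The following is a minimal sketch, not something used in the original testing; the helper name count_matches is hypothetical, and the search strings are taken from the log excerpts earlier in this bug.

```shell
# Sketch: count the lines in a kernel log (read from stdin) that contain a
# fixed string. grep -c prints 0 but exits non-zero when nothing matches,
# so "|| true" keeps the helper's exit status at 0 in that case.
count_matches() {
    grep -cF -- "$1" || true
}
```

For example, "dmesg | count_matches 'page allocation failure'" and "dmesg | count_matches 'interrupt: 501'" give one number each per run. Because the kernel ring buffer has a fixed size, counts from long runs may undercount older messages.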

== Comment: #9 - Jonathan Dalton <jodalton@xxxxxxxxxx> - 2016-01-13 13:02:16 ==
I restarted the stress tests on Monday and verified today (Wednesday) that the value was still increased:

root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536

With the increased "min_free_kbytes", nothing in the current dmesg mentions:
interrupt 501
page allocation fault

So, increasing "min_free_kbytes" during stress eliminated the fault.
However, is this still a bug? Should "min_free_kbytes" have to be
increased?

Attached is the dmesg associated with this comment.

== Comment: #12 - Luciano Chavez <chavez@xxxxxxxxxx> - 2016-01-22 20:22:08 ==
(In reply to comment #11)

> Hi Luciano,
> 
> I see some info for --set-recommended-min_free_kbytes documented in the
> following link
> 
> http://manpages.ubuntu.com/manpages/trusty/man8/hugeadm.8.html
> 
> Can you please check and let me know?

Hi Mamatha,

Thanks. That documentation is specific to a utility for huge pages
though so we may have to mirror it and see if the Canonical folks can
point to Ubuntu documentation they have on when to change
min_free_kbytes.

Hi Canonical,

Please point to Ubuntu documentation that will explain when to change
min_free_kbytes.

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bot-comment bugnameltc-134023 severity-high targetmilestone-inin---
-- 
ISST-LTE: Ubuntu 14.04.4 LPAR interrupts at check_and_cede_processor
https://bugs.launchpad.net/bugs/1537666
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.