ac100 team mailing list archive

Re: Stability Under Load

 

On 08/21/2011 12:27 PM, Julian Andres Klode wrote:
On Sun, Aug 21, 2011 at 12:06:39PM +0100, Gordan Bobic wrote:
Really? It complains about a max of 1000mV? On my AC100 975mV and
above produces no error. What model did you say you have?
Yes, a 10V. It should be fairly obvious: the cpu_millivolts in
tegra2_dvfs.c includes 1100, and the default level is 1100, as can
be seen in the cpu_speedo_max_millivolts array and
tegra2_dvfs_rail_vdd_cpu.

Hmm, that's interesting. So how do the differences between what's in tegra2_dvfs.c and what's in board-paz00-power.c get reconciled? Aren't these settings redundant?

I'm also curious how come my powertop is showing 1000MHz with no errors in the log when I set SM1 to 975mV.


Harmony sets minimum voltage to 750 and maximum voltage to 1125,
maybe that gives more stability?

Interesting. It also occurs to me that, besides just tweaking voltages
(which, again, would be much easier if they were run-time adjustable
via /sys as I said in a previous post), it would be really handy to
get core temperature readings. Does the AC100 have temperature
sensors built in?
No, we don't know what happens if we start exposing the various
settings somewhere, when they are read, etc. Too unsafe, in my
opinion.

Provided there are limit checks set in place (e.g. hard-code a limit check so that you can't set the voltage > 1250mV), I don't see what harm could come of it, other than making stability stress testing easier.
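
To illustrate the kind of limit check I mean, here is a minimal sketch in plain C. It is not taken from tegra2_dvfs.c or board-paz00-power.c: the function name set_cpu_millivolts, the variable current_cpu_mv and both constants are hypothetical. The 750mV floor is borrowed from the Harmony minimum quoted above and the 1250mV ceiling is the hard limit suggested above; a real sysfs handler would also have to deal with the regulator and DVFS tables.

```c
#include <errno.h>

/* Hypothetical bounds, for illustration only. */
#define CPU_MV_MIN 750   /* Harmony minimum quoted earlier in the thread */
#define CPU_MV_MAX 1250  /* hard ceiling suggested above */

/* Stand-in for whatever state a real driver would keep. */
static int current_cpu_mv = 1000;

/*
 * Accept a requested voltage only if it lies within the hard-coded
 * bounds. Returns 0 on success, -EINVAL if the request is rejected;
 * a rejected request leaves the current value untouched.
 */
static int set_cpu_millivolts(int requested_mv)
{
    if (requested_mv < CPU_MV_MIN || requested_mv > CPU_MV_MAX)
        return -EINVAL;

    current_cpu_mv = requested_mv;
    return 0;
}
```

With something like this in the write path, a request for 975mV would go through, while a request for 1300mV would be refused and the previous setting kept.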

On an unrelated note, I noticed an interesting possible correlation
of an error in my message log with instability that I am currently
investigating. It is possible that I have been barking up a
completely wrong tree so far. I need to do some more investigating
(A _LOT_ of SLUB memory allocation failures, possibly to do with
zram swapping and/or the size of vmalloc set on the kernel command
line).
SLUB errors come from rt2800usb usually, without the module loaded,
the errors should vanish. You could also try using SLAB instead of
SLUB.

Yes, I did notice that the rt* modules were in the error dump. I don't remember seeing the option in the kernel config to choose SLAB over SLUB. Where is it?

This stability problem is particularly frustrating because I saw the errors occurring on 2.6.29 which didn't have zram, so in theory, it can't be directly zram related (and I've been running zram on my SheevaPlug on 2.6.36.2 kernel for ages with much heavier loads).

So I'm taking all my observations at the moment with a fist-sized grain of salt. What is weird, however, is that I seem to be running completely stable today at 975mV SM1 set in board-paz00-power.c, and it's warmer than it was yesterday.

The only other differences are:
1) Disabled zram swap (still have normal swap)
2) Changed vm.swappiness from 100 to 0
3) Unloaded rt* and related modules
4) Rebuilding the kernel (with -j4) instead of glibc

The obvious difference with 4) is that a glibc build takes a lot more memory than a kernel build, which causes swapping. When the kernel compile finishes, if there are no errors, I'll try building glibc again. If that shakes it loose, the only thing I can think of is the vmalloc kernel boot parameter which came from the original Android setup (vmalloc=320M). I'm pretty sure this shouldn't be needed, but it is vaguely plausible it is causing issues under high memory pressure, at least in combination with other things that I have running.
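
For anyone wanting to check what vmalloc size their own boot line requests, here is a rough user-space sketch in C. The helper name parse_vmalloc_bytes is made up for this example and is not a kernel function; it only mimics the K/M/G suffix handling the kernel applies to size parameters, and it assumes the command line is passed in as a string.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look for a "vmalloc=<size>" token in a kernel command line string
 * and return the size in bytes. Returns 0 if the token is absent or
 * has no leading number. An optional K, M or G suffix multiplies the
 * value by a power of 1024.
 */
static unsigned long long parse_vmalloc_bytes(const char *cmdline)
{
    static const char key[] = "vmalloc=";
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* Only accept a match at the start of a token. */
        if (p == cmdline || p[-1] == ' ') {
            char *end;
            unsigned long long v = strtoull(p + strlen(key), &end, 10);

            switch (*end) {
            case 'G': case 'g':
                v <<= 10; /* fall through */
            case 'M': case 'm':
                v <<= 10; /* fall through */
            case 'K': case 'k':
                v <<= 10;
                break;
            default:
                break;
            }
            return v;
        }
        p += strlen(key);
    }
    return 0;
}
```

On a running system the string to feed it can be read from /proc/cmdline. A line containing vmalloc=320M yields 335544320 bytes, and a line without the parameter yields 0.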

Gordan

