Re: Stability Under Load


On 08/21/2011 11:49 AM, Julian Andres Klode wrote:
On Sat, Aug 20, 2011 at 05:27:18PM +0100, Gordan Bobic wrote:
On 08/20/2011 05:25 PM, Gordan Bobic wrote:
On 08/20/2011 04:47 PM, Julian Andres Klode wrote:
On Sat, Aug 20, 2011 at 04:33:17PM +0100, Gordan Bobic wrote:
On 08/20/2011 04:07 PM, Julian Andres Klode wrote:
On Sat, Aug 20, 2011 at 03:31:56PM +0100, Gordan Bobic wrote:
Which kernel are you using?

A somewhat older build of Marc's kernel (about a month old), the one
I have in my Debian repository, plus the change to use 1000 instead
of 1200.

I presume you mean 1200mV rather than 1200MHz. If it's 1200MHz, I'd
like to know where to tweak that. ;)

I never said anything about Hz. It's all mV (but if it's mV, what is 1.2V
used for? The voltage from the battery/charger is clearly 10-12V).

Well, my testing has been going on for 6+ hours. Considering I
couldn't get an hour without errors before (and sometimes got
several per hour, and that's just the detected ones), I'd say it's a
very definite improvement. So much so that I'm vaguely tempted to
try reducing it to 950mV. ;)

Given that the minimum it currently scales to would be 725, 950
is certainly safe.

Seems there are limits in hardware. I built a kernel with the upper
bound set to 900, and now I get a lot of this in the logs, while the CPU
is stuck at 216MHz:

Failed to set dvfs regulator vdd_cpu
Failed to set regulator vdd_cpu for clock cpu to 875mV

This should have read 975mV

cpu-tegra: Failed to set cpu frequency to 1000000kHz

over and over and over.
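
A quick way to see how often these messages recur is to count them in the kernel log. The following is only a minimal Python sketch, not anything from the kernel tree: the function name count_failures and the category labels are made up for illustration, and the patterns are simply the three messages quoted above.

```python
import re
from collections import Counter

# Patterns copied from the three log messages quoted in this thread.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "dvfs_regulator": re.compile(r"Failed to set dvfs regulator \S+"),
    "regulator_mv": re.compile(r"Failed to set regulator \S+ for clock \S+ to \d+mV"),
    "cpu_freq": re.compile(r"cpu-tegra: Failed to set cpu frequency to \d+kHz"),
}


def count_failures(lines):
    """Count how many log lines match each of the known failure messages."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break  # each line is counted under at most one label
    return counts
```

It takes any iterable of log lines, so it can be fed an open file object or the captured output of dmesg.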

I get the same problem when setting SM1 to a maximum of 1000mV,
because dvfs tries to set it to 1100mV, which fails. I also get
extreme instability.

Really? It complains about a max of 1000mV? On my AC100, 975mV and above produces no errors. What model did you say you have?

Harmony sets minimum voltage to 750 and maximum voltage to 1125,
maybe that gives more stability?

Interesting. It also occurs to me that, besides just tweaking voltages (which, again, would be much easier if they were run-time adjustable via /sys, as I said in a previous post), it would be really handy to get core temperature readings. Does the AC100 have temperature sensors built in?

On an unrelated note, I noticed a possible correlation between an error in my message log and the instability, which I am currently investigating. It is possible that I have been barking up completely the wrong tree so far. I need to do some more investigating (a _LOT_ of SLUB memory allocation failures, possibly to do with zram swapping and/or the size of vmalloc set on the kernel command line).


