
ac100 team mailing list archive

Re: Stability Under Load

 

On Fri, 19 Aug 2011 16:26:37 +0200, Marc Dietrich <marvin24@xxxxxx> wrote:
On Friday 19 August 2011 16:10:43 Gordan Bobic wrote:
On Fri, 19 Aug 2011 15:55:57 +0200, Marc Dietrich <marvin24@xxxxxx> wrote:
> Am Freitag 19 August 2011, 11:18:31 schrieb Gordan Bobic:
>> As some of you may have already heard on the IRC channel, I had my
>> AC100 suddenly become very unstable under load. When doing big compile
>> [...]
>> My gut feeling at the moment is that the RAM could be over-timed so
>> I'm going to try modifying the kernel code to relax the RAM timings by
>> a notch.
>
> We are not touching RAM timings so far on kernel 2.6.38. It may be
> possible that the original kernel does so.

Well, my plan was to up the timings in arch/arm/mach-tegra/board-paz00-memory.c. Can you confirm whether the values there are in units of clock cycles? Or is it ns? Also, which line corresponds to CAS? I can see RAS, RC, RCD, RRD, RFC, but can't see a value for CAS, which is, at least in theory, the most important one.

AFAIK, they are in cycles, but as I said, these values are not used now (see the ifdef 0 at the end of the file). The reason is that I don't have the values for 166 MHz on Micron and the Hynix tables caused instabilities. So they are just there as a "reminder". But of course, you can remove the ifdef and try it yourself.

Ah, OK. I missed that. Are the registers for setting the timings write-only? If not, it should be possible to dump out the default timings, should it not? They may not be correct, but it would at least give a good starting value for adjusting.

As for timings at 166 MHz vs. 333 MHz, most RAM chips I have seen have the spec sheet listing the timings in ns, so as the clock speed goes down, the cycle timings proportionately decrease (fewer cycles to wait at lower clock speed). So 166MHz timings _should_ be half the 333MHz timings (rounding up).
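
A minimal sketch of that arithmetic, assuming the datasheet gives each timing as a duration and the register wants whole cycles. The helper name and the example timings used below are hypothetical and are not taken from board-paz00-memory.c:

```c
#include <assert.h>

/*
 * Convert a datasheet timing given in picoseconds into whole clock
 * cycles at the given clock, rounding up so the chip never gets less
 * time than its spec asks for. Hypothetical helper, not kernel code.
 */
static unsigned int timing_ps_to_cycles(unsigned long long t_ps,
                                        unsigned int clock_mhz)
{
    assert(clock_mhz > 0);
    /* cycles = t_ps / period_ps, where period_ps = 1000000 / clock_mhz */
    unsigned long long scaled = t_ps * clock_mhz;
    return (unsigned int)((scaled + 999999ULL) / 1000000ULL);
}
```

With an assumed 18 ns timing (18000 ps) this gives 6 cycles at 333 MHz and 3 cycles at 166 MHz. An assumed 15 ns timing gives 5 and 3, so the 166 MHz value is half the 333 MHz value only up to rounding.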

> [...]
> It could also be related to power supply. What we do is modify the
> voltage supplies for several power sources. I had the feeling that
> Toshiba undervolted some CPU supplies in order to save energy (compared
> to other boards). So I increased SM1 from 1V to 1.2V, which may have
> been wrong.

 How did you do this?

check board-paz00-power.c

Just looking now. Are you referring to the REGULATOR_INIT macro and specifically the line that defines a struct with:

REGULATOR_INIT(sm1, 725, 1200, true)?

Are you saying that the default value as provided by Toshiba was

REGULATOR_INIT(sm1, 725, 1000, true)?

Have you experienced significant instability with it set to 1000mV? Or was this change based on observing what other boards do? I'm wondering if the 20% increase in voltage (44% increase in thermal load!) might actually push things outside the thermal limits of the cooling and thus contribute to the instability.
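
The 44% figure assumes that dissipated power scales with the square of the supply voltage at a fixed clock, which is the usual first-order approximation for CMOS dynamic power. A minimal sketch of that arithmetic (the helper names are hypothetical and not from board-paz00-power.c):

```c
#include <assert.h>

/* Percentage increase in supply voltage going from old_mv to new_mv. */
static unsigned int voltage_increase_pct(unsigned int old_mv,
                                         unsigned int new_mv)
{
    assert(old_mv > 0 && new_mv >= old_mv);
    return (new_mv - old_mv) * 100u / old_mv;
}

/*
 * Percentage increase in dissipated power, ASSUMING power is
 * proportional to voltage squared (fixed clock, fixed load).
 * The result is truncated to a whole percent.
 */
static unsigned int power_increase_pct(unsigned int old_mv,
                                       unsigned int new_mv)
{
    assert(old_mv > 0 && new_mv >= old_mv);
    unsigned long long old_sq = (unsigned long long)old_mv * old_mv;
    unsigned long long new_sq = (unsigned long long)new_mv * new_mv;
    return (unsigned int)((new_sq - old_sq) * 100ULL / old_sq);
}
```

For the 1000 mV to 1200 mV change discussed here, the first helper returns 20 and the second returns 44.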

> It would be nice if you could test a .32 based kernel and see if it
> also happens there. Also you could try your new model.

I haven't tried 2.6.32 because I couldn't find one at the time, but I tried the old 2.6.29 and 2.6.38, and the instability on my old AC100 was the same. Haven't tried it on the new one yet. Do you think 2.6.32 could be behaving differently to both of those? If so, why? Where can I get the Tegra-patched 2.6.32 kernel?

gitorious.org/ac100? (on the front page)

I'll try it, but I'm curious to know why you think 2.6.32 might be better in this sense than 2.6.29 and 2.6.38.

Gordan

