
ac100 team mailing list archive

Re: Stability Under Load


On 08/20/2011 04:07 PM, Julian Andres Klode wrote:
> On Sat, Aug 20, 2011 at 03:31:56PM +0100, Gordan Bobic wrote:
>> Which kernel are you using?

> A somewhat older build of Marc's kernel (about a month old), the one
> I have in my Debian repository, plus the change to use 1000 instead
> of 1200.

I presume you mean 1200mV rather than 1200MHz. If it's 1200MHz, I'd like to know where to tweak that. ;)

>> I just made the change to board-paz00-power.c, reducing the SM1
>> voltage from 1200mV to 1000mV. Since then I have been running the
>> pbzip2 test in a loop at low priority while recompiling glibc at
>> normal priority, and I've not been able to shake either of my AC100s
>> loose. Since I couldn't get anywhere near this sort of load before
>> without something erroring out, I'd say this made quite a
>> substantial difference to stability.

> You probably did not run it at 1200 for 2 hours, then flash a new
> kernel, reboot, and start another build straight away. That may
> have heated some parts to the point where they need a bit of rest
> before running again. I expect it to work after a cold boot, though.

Well, my testing has now been running for over 6 hours. Considering I couldn't get through an hour without errors before (sometimes several per hour, and those are just the ones that were detected), I'd say it's a definite improvement. So much so that I'm vaguely tempted to try reducing it to 950mV. ;)

> While we're at it (questions for you, and my answers to them):
>    (a) what cpufreq governor are you using? [performance]

Ondemand, but I haven't seen idle time move from 0.00%, so it should be holding the CPU at full speed anyway. Each instance of pbzip2 should be running 2 threads, and it's all running in tmpfs, plus the compile job in the background for good measure. It's a fair point, though; I should really be testing with the performance governor.

>    (b) do you have the battery plugged in? [no]

Yes. Why should this matter?

>    (c) where are gcc and the source located? [btrfs on USB HDD]

All on the SD card: the OS on ext4 without a journal, and /usr/src (with ~/rpmbuild symlinked to /usr/src/rpmbuild) on nilfs2. iowait very rarely moves from 0% and never gets beyond the low single digits.

> The problem here could also be related to the USB hard disk, which
> is powered by the AC100, or to the btrfs filesystem going crazy from
> time to time.

Plausible, but if you were seeing fs corruption, surely you'd have had bigger problems by now. Why btrfs, BTW?

>> Marc, could you please change this to 1000mV in git? Running at
>> 1200mV really seems to upset stability.

> Marc, as it helps Gordan, I support this. I'll do testing on a cold
> boot later today or tomorrow (that is, either soon or in about 19
> hours).


What would it take to expose these voltages somewhere under /sys? For stability testing it would be really handy not to have to recompile and reflash the kernel for every voltage change. I haven't done much kernel-level programming before, but if somebody can point me at a good example of how to expose an interface as a file under /sys, I'd be willing to have a go.
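For the kernel side, the in-tree sample samples/kobject/kobject-example.c and Documentation/filesystems/sysfs.txt are the usual starting points for creating a file under /sys. Once such a file exists, each stability-test step becomes a write from user space rather than a rebuild and reflash. The sketch below assumes a hypothetical writable attribute that, like the regulator framework's own read-only "microvolts" files, takes a value in microvolts. The helper name write_sm1_mv and the 950 to 1200 mV guard band (the values discussed in this thread, not datasheet limits) are illustrative assumptions, not anything in Marc's tree.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical guard band, taken from the values discussed in this
 * thread (950 mV considered, 1000 mV tested, 1200 mV the old setting).
 * These are NOT datasheet limits for SM1. */
#define SM1_MIN_MV 950
#define SM1_MAX_MV 1200

/*
 * write_sm1_mv() is a hypothetical helper: it writes a voltage given in
 * millivolts to a sysfs-style attribute that is assumed to take
 * microvolts. Returns 0 on success, -1 if the value is out of range or
 * the file cannot be written.
 */
int write_sm1_mv(const char *attr_path, int mv)
{
    FILE *f;
    int ok;

    if (mv < SM1_MIN_MV || mv > SM1_MAX_MV) {
        fprintf(stderr, "refusing %d mV: outside %d..%d mV\n",
                mv, SM1_MIN_MV, SM1_MAX_MV);
        return -1;
    }

    f = fopen(attr_path, "w");
    if (f == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", attr_path, strerror(errno));
        return -1;
    }

    /* Scale mV to uV and write it as one short line. */
    ok = fprintf(f, "%d\n", mv * 1000) > 0;

    /* stdio buffers the line, so the actual write(2), and any error the
     * kernel's store() callback returns, only surfaces at fclose(). */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Call it as write_sm1_mv(path, 950) with whatever attribute path the kernel side ends up exposing. The range check is only a guard against typos; the kernel's store() callback would still need to validate the value itself, since anything running as root can write to the file.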

Gordan

