
ac100 team mailing list archive

Re: Stability Under Load

I've already noticed instability under load, but only when using the
GPU (e.g. Flash or WebGL).
I've also built kdebase (which took several hours) without any problem.

2011/8/19 Gordan Bobic <gordan@xxxxxxxxxx>:
> As some of you may have already heard on the IRC channel, my AC100
> suddenly became very unstable under load. When doing big compile jobs, the
> compiler would fairly regularly segfault or detect hardware errors, or
> report errors it didn't think were hardware-related and invite me to post a
> bug report with the pre-processed C file. None of these were reproducible (it
> would error out in a different place on different runs). So I figured I had
> duff hardware and got another one. This one is a lot better, but I still get
> spurious, unreproducible errors like this every few hours (the old one would
> error out up to a few times an hour if it was being hammered with compile
> jobs for a few hours). Both of mine are the 10U models with Micron RAM.
>
> Now, either I am incredibly unlucky or something else is going on. What I
> would like to know is:
> 1) Do you use your AC100 for big compile jobs (e.g. the 2-day gcc compile)?
> 2) If 1), are you seeing random errors like what I'm describing?
>
> On my old AC100, dropping the clock speed down to 700MHz using power
> management features didn't make a difference to stability. I haven't tested
> that on the new one.
>
> My gut feeling at the moment is that the RAM timings could be too
> aggressive, so I'm going to try modifying the kernel code to relax them by a
> notch.
>
> The only competing theory is that the Tegra2 ships clocked past the limit at
> which it stays stable under prolonged 100% load. This wouldn't surprise me
> either (Nvidia chips, both motherboard chipsets and GPUs, have proven
> unreliable in the past even at their default clock speeds), but I would like
> to think that Toshiba did some due diligence testing of their product.
> For comparison, my SheevaPlug is compiling 24/7 for weeks at a time and has
> never errored out.
>
> Any additional data points you guys can provide would be useful.
>
> Gordan
>
> _______________________________________________
> Mailing list: https://launchpad.net/~ac100
> Post to     : ac100@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~ac100
> More help   : https://help.launchpad.net/ListHelp
>

