ac100 team mailing list archive
Re: Stability Under Load
On Fri, 19 Aug 2011 18:01:02 +0200, Julian Andres Klode
>I ran some memory testing tools, but they did not find any
Ditto, I ran many, many times and it hasn't found any issues, but it
is generally not good for stress-testing.
But I believe it shows us that memory itself is correct, and the
problem must be somewhere else.
It doesn't eliminate the possibility that the memory may be too
overclocked. This is similar to OC testing on x86. I have run days of
memtest86 without finding any problems only to have OCCT detect an
OC-ing induced error in under 30 seconds. Memory testers aren't a harsh
enough test to show up marginal components in my experience. So it could
easily still be a memory timing issue.
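
To illustrate the difference between a plain memory tester and a load test, here is a minimal Python sketch of a combined CPU-and-memory check. It is not what OCCT or memtest86 actually do; the function name `stress_pass` and its parameters are made up for this example. The idea is only that the buffer is rewritten and re-hashed while the CPU is kept busy, so a marginal part has a better chance of showing a mismatch than under a slow pattern walk.

```python
import hashlib
import os


def stress_pass(buf_size=1 << 20, rounds=8):
    """Return how many rounds read back data that did not match the reference.

    Hypothetical helper for illustration only; not taken from any real tool.
    """
    pattern = os.urandom(buf_size)
    expected = hashlib.sha256(pattern).digest()
    mismatches = 0
    for _ in range(rounds):
        # Copying forces a fresh write of the whole buffer to memory.
        copy = bytearray(pattern)
        # Hashing reads the buffer back while keeping the CPU loaded.
        if hashlib.sha256(copy).digest() != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatching rounds:", stress_pass())
```

On healthy hardware this should report zero mismatching rounds. A non-zero count would only say that something corrupted the data, not whether memory timings, the CPU, or voltage is to blame.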
It would be very hard for the binary Xorg driver to cause other
programs to randomly crash.
Part of the system memory is used by the display driver, so if
the kernel has a bug that it uses one of those portions of the
RAM despite it being used by the graphics system, then this could
Hmm, that is plausible. But would that also exhibit when no driver
other than the console FB is loaded?
The obvious question I have now is that since there clearly are
several people who have seen stability issues, why hasn't this been
I raised the issue multiple times on IRC, but obviously only when
you were not there.
Ah, good to know. Perhaps this is worth a page on the Wiki, linked from
the front page? This is something that is likely to be affecting a lot
If it turns out that AC100 is systematically suffering from duff,
pre-over-overclocked hardware (as is fairly typical of nvidia -
their chips generally cannot handle running at full load at default
clocks for reasonable periods of time, and they have no margin for
error at all, both in terms of default voltages and clock-speeds),
it seems the effort going into it may well be wasted, at least until
other similar hardware becomes available. I'm eagerly awaiting
Jeremiah's report on whether his TrimSlice is exhibiting the same
issues. I sincerely hope it isn't and that it's down to memory
timings, since at least we can try to do something about those.
We could still underclock devices if needed.
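
For anyone wanting to try this, a rough sketch of capping the clock through the standard Linux cpufreq sysfs interface follows. It assumes the running kernel exposes `scaling_max_freq` for the CPU (not confirmed here for the AC100 kernels), that it is run as root, and that the value is given in kHz. The helper name `cap_max_freq` is hypothetical.

```python
from pathlib import Path

# Standard location of the per-CPU cpufreq directories on Linux.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def cap_max_freq(khz, root=CPUFREQ_ROOT):
    """Write khz to scaling_max_freq for every CPU that has one.

    Returns the list of files written. Hypothetical helper; assumes the
    kernel exposes cpufreq and the caller has permission to write.
    """
    written = []
    for f in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_max_freq")):
        f.write_text(f"{khz}\n")
        written.append(f)
    return written


if __name__ == "__main__":
    # 700000 kHz = 700 MHz
    for path in cap_max_freq(700000):
        print("capped", path)
```

The `root` parameter exists only so the function can be pointed at a scratch directory for a dry run instead of the real sysfs tree.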
I underclocked my old AC100 down to <= 700MHz using the power
management governor, and it was still erroring out just the same. So
this doesn't seem to be a clock-speed issue, unless something else is
going out of whack at the same time (e.g. undervolting at all clock