>>> I ran some memory testing tools, but they did not find any problem.
>> Ditto, I ran it many, many times and it hasn't found any issues, but it is generally not good for stress-testing.
> But I believe it shows us that memory itself is correct, and the problem must be somewhere else.
It doesn't eliminate the possibility that the memory may be too overclocked. This is similar to OC testing on x86. I have run days of memtest86 without finding any problems only to have OCCT detect an OC-ing induced error in under 30 seconds. Memory testers aren't a harsh enough test to show up marginal components in my experience. So it could easily still be a memory timing issue.
>> It would be very hard for the binary Xorg driver to cause other programs to randomly crash.
> Part of the system memory is used by the display driver, so if the kernel has a bug where it uses one of those portions of RAM despite it being used by the graphics system, then this could explain it.
Hmm, that is plausible. But would that also exhibit when no driver other than the console FB is loaded?
>> The obvious question I have now is that since there clearly are several people who have seen stability issues, why hasn't this been raised before?
> I raised the issue multiple times on IRC, but obviously only when you were not there.
Ah, good to know. Perhaps this is worth a page on the Wiki, linked from the front page? This is something that is likely to be affecting a lot of users.
>> If it turns out that AC100 is systematically suffering from duff, pre-over-overclocked hardware (as is fairly typical of nvidia - their chips generally cannot handle running at full load at default clocks for reasonable periods of time, and they have no margin for error at all, both in terms of default voltages and clock-speeds), it seems the effort going into it may well be wasted, at least until other similar hardware becomes available. I'm eagerly awaiting Jeremiah's report on whether his TrimSlice is exhibiting the same issues. I sincerely hope it isn't and that it's down to memory timings, since at least we can try to do something about those.
> We could still underclock devices if needed.
I underclocked my old AC100 down to <= 700MHz using the power management governor, and it was still erroring out just the same. So this doesn't seem to be a clock-speed issue, unless something else is going out of whack at the same time (e.g. undervolting at all clock speeds).
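
For anyone wanting to repeat the underclocking test, here is a rough sketch of one way to cap the clock, assuming the kernel exposes the standard cpufreq sysfs files (scaling_max_freq and scaling_cur_freq, both in kHz). This is not necessarily how the cap above was applied; the function names and the 700000 kHz value are just for illustration, and it needs root on a real system.

```python
from pathlib import Path

# Standard location of per-CPU cpufreq controls on Linux (assumed present).
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def cap_max_freq(khz, root=CPUFREQ_ROOT):
    """Write khz to scaling_max_freq for every CPU that has a cpufreq dir.

    Returns the list of files written. Values are in kHz, so 700 MHz is
    700000.
    """
    written = []
    for policy in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        target = policy / "scaling_max_freq"
        target.write_text(f"{khz}\n")
        written.append(target)
    return written


def read_cur_freqs(root=CPUFREQ_ROOT):
    """Return {cpu_name: current_frequency_in_kHz} for CPUs that report it."""
    freqs = {}
    for policy in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        cur = policy / "scaling_cur_freq"
        if cur.exists():
            freqs[policy.parent.name] = int(cur.read_text().strip())
    return freqs


if __name__ == "__main__":
    # Hypothetical usage: cap at 700 MHz, then show what each core reports.
    cap_max_freq(700000)
    print(read_cur_freqs())
```

If crashes persist with the cap in place and the reported frequencies confirm it took effect, that points away from clock speed alone, though it says nothing about voltages.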
Gordan