
ubuntu-x-swat team mailing list archive

[Bug 768184] Re: [i965gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x01800020) - Black screen (dpms?)

 

Launchpad has imported 34 comments from the remote bug at
https://bugs.freedesktop.org/show_bug.cgi?id=36515.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2011-04-22T23:27:58+00:00 Bryce Harrington wrote:

Forwarding this bug from Ubuntu reporter Stuart Langridge:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/768184

[Problem]
Infrequent gpu lockup on i965.

We've had a handful of reports in the last couple weeks of a gpu lockup
on i965 systems which had not had freeze troubles for a long while (>6
months).  Most reporters have experienced the freeze only once or twice;
they don't know how to reproduce it, nor really have a way to
definitively tell whether it is fixed or just occurs rarely.

I'm forwarding this report on the chance that upstream recognizes the
bug; I don't think users are going to be able to pin this down any
further.

Bugs I believe to be dupes, all on i965 systems:

768184  IPEHR: 0x01800020
767511  IPEHR: 0x60020100
767425  IPEHR: 0x08000000
757968  IPEHR: 0x14000000

These i965 reports started coming in shortly after we updated Ubuntu
from xserver 1.10.0 to 1.10.1 and mesa from 7.10.1 to 7.10.2, and added
patch 25521900d to -intel (bug #35808).  (Due to the intermittency of
the bug I haven't had people try downgrading those packages.)

[Original Description]
Crash which required reboot. The crash itself is described in https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/768176 and this is after I persuaded apport-gpu-error-intel.py to run.

My screen went entirely black (both laptop screen and second monitor).
Switching to a VC did not show anything on screen. At first I could
still hear sounds from running applications, but eventually (after ~10
seconds) they stopped. I had to powercycle the machine to get control
back. The "system problem detected" apport dialog offered to let me file
a bug, but then I got another crash dialog saying
"apport-gpu-error-intel.py closed unexpectedly".

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-4ubuntu7
ProcVersionSignature: Ubuntu 2.6.38-8.42-generic-pae 2.6.38.2
Uname: Linux 2.6.38-8-generic-pae i686
Architecture: i386
Chipset: i965gm
CompositorRunning: compiz
DRM.card0.HDMI.A.1:
status: disconnected
enabled: disabled
dpms: Off
modes:
edid-base64:
DRM.card0.LVDS.1:
status: connected
enabled: enabled
dpms: On
modes: 1280x800
edid-base64: AP///////wAwZAYjMjQ5NTISAQOAHRJ4Cof1lFdPjCcnUFQAAAABAQEBAQEBAQEBAQEBAQEBKhwAqFAgHjAQMCIAH7QQAAAYAAAAAAAAAAAAAAAAAAAAAAAAAAAA/gBSUDc3NKMxMzNFV0REAAAA/gAIDBAUKFB/2AEBCiAgAL4=
DRM.card0.VGA.1:
status: connected
enabled: enabled
dpms: On
modes: 1680x1050 1280x1024 1280x1024 1280x960 1152x864 1024x768 1024x768 1024x768 832x624 800x600 800x600 800x600 800x600 640x480 640x480 640x480 640x480 720x400
edid-base64: AP///////wBMLdIDMjJBSCMTAQMOMB54KtxVo1lIniQRUFS/74CzAIGAgUBxTwEBAQEBAQEBITmQMGIaJ0BosDYA2igRAAAcAAAA/QA4Sx5REAAKICAgICAgAAAA/ABTeW5jTWFzdGVyCiAgAAAA/wBIOUZTODM5NDg1CiAgAAI=
Date: Thu Apr 21 10:25:20 2011
DistUpgraded: Log time: 2011-01-18 17:25:59.814253
DistroCodename: natty
DistroVariant: ubuntu
DuplicateSignature: (ESR: 0x00000001 IPEHR: 0x01800020)
ExecutablePath: /home/aquarius/apport-gpu-error-intel.py
GraphicsCard:
Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) [8086:2a02] (rev 0c) (prog-if 00 [VGA controller])
Subsystem: Dell Device [1028:0209]
Subsystem: Dell Device [1028:0209]
InterpreterPath: /usr/bin/python2.7
MachineType: Dell Inc. XPS M1330
ProcCmdline: python apport-gpu-error-intel.py
ProcEnviron:
PATH=(custom, user)
LC_MESSAGES=en_GB.utf8
LANG=en_US.UTF-8
LANGUAGE=en_GB:en
ProcKernelCmdLine: root=UUID=b572742c-deea-43ec-92d3-b1d1e6b6802f ro quiet splash
ProcKernelCmdLine_: root=UUID=b572742c-deea-43ec-92d3-b1d1e6b6802f ro quiet splash
RelatedPackageVersions:
xserver-xorg             1:7.6+4ubuntu3
libdrm2                  2.4.23-1ubuntu6
xserver-xorg-video-intel 2:2.14.0-4ubuntu7
SourcePackage: xserver-xorg-video-intel
Title: [i965gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x01800020)
UpgradeStatus: Upgraded to natty on 2011-01-18 (92 days ago)
UserGroups: adm admin cdrom couchdb dialout dip floppy fuse lpadmin plugdev video
dmi.bios.date: 12/26/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 0N6705
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd12/26/2008:svnDellInc.:pnXPSM1330:pvr:rvnDellInc.:rn0N6705:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: XPS M1330
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.4+bzr20110415-0ubuntu2
version.libdrm2: libdrm2 2.4.23-1ubuntu6
version.libgl1-mesa-dri: libgl1-mesa-dri 7.10.2-0ubuntu2
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.10.2-0ubuntu2
version.xserver-xorg: xserver-xorg 1:7.6+4ubuntu3
version.xserver-xorg-video-ati: xserver-xorg-video-ati N/A
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-4ubuntu7
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110107+b795ca6e-0ubuntu7

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/5

------------------------------------------------------------------------
On 2011-04-22T23:31:26+00:00 Bryce Harrington wrote:

Created attachment 45980
BootDmesg.txt

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/6

------------------------------------------------------------------------
On 2011-04-22T23:31:47+00:00 Bryce Harrington wrote:

Created attachment 45981
CurrentDmesg.txt

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/7

------------------------------------------------------------------------
On 2011-04-22T23:32:03+00:00 Bryce Harrington wrote:

Created attachment 45982
CurrentDmesg.txt

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/8

------------------------------------------------------------------------
On 2011-04-22T23:41:37+00:00 Bryce Harrington wrote:

Here are links to some of the i915_error_state files for the various
(suspected dupe) bugs:

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/767511/+attachment/2075462/+files/i915_error_state.txt

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/767425/+attachment/2074874/+files/i915_error_state.txt

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/760054/+attachment/2031475/+files/i915_error_state.txt

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/757968/+attachment/2019472/+files/i915_error_state.txt

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/10

------------------------------------------------------------------------
On 2011-04-27T07:56:41+00:00 Chris Wilson wrote:

Bryce, one aspect that we are wary of with 965G[M] is that the early
chipsets had severe issues with memory above 4G. Is the memory
configuration captured in the LP reports? The attached dmesg has 4G +
PAE, is that common?

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/11

------------------------------------------------------------------------
On 2011-04-27T08:14:02+00:00 Timo Jyrinki wrote:

One affected 965gm user here (bug report with attachments
https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/771655) - 4GB of
memory but no PAE, i.e. 64-bit. On the other hand, my problem is simply
X.org crashing/segfaulting; I don't get apport triggered for a GPU
lockup bug report. So sorry for the (possible) noise, even though my
problem is clearly coming from the same bunch of changes and is
similarly random/rare.

To make up for that, I went through the mentioned lockup bug reports to
answer the question: only two have PAE and four don't, but all of those
i965gm GPU lockup reports so far seem to be i686, unlike mine.
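
The PAE tally above was done by hand. A minimal sketch of the same check,
assuming the apport "Uname:" field has the three-part shape seen in this
report ("Linux 2.6.38-8-generic-pae i686") and that Ubuntu PAE kernels
carry a "-pae" flavour suffix; the function name is hypothetical:

```python
def classify_kernel(uname: str) -> dict:
    """Split an apport-style Uname value into release, arch and PAE flag.

    Assumes the form "Linux <release> <arch>", as in the report above.
    """
    parts = uname.split()
    if len(parts) < 3:
        raise ValueError("unexpected Uname value: %r" % uname)
    release, arch = parts[1], parts[2]
    return {
        "release": release,
        "arch": arch,
        # Assumption: PAE kernels are identified by the flavour suffix.
        "pae": release.endswith("-pae"),
        "is_64bit": arch == "x86_64",
    }
```

This only reads the kernel flavour; it says nothing about how much memory
is installed, which is what the 4G question is really about.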

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/12

------------------------------------------------------------------------
On 2011-06-16T12:08:29+00:00 Chris Wilson wrote:

Created attachment 48039
Apply the big hammer to finish the fb before disabling it.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/17

------------------------------------------------------------------------
On 2011-06-16T12:55:04+00:00 Chris Wilson wrote:

Created attachment 48043
Apply the big hammer to finish the fb before disabling it.

When flushing before disabling, it helps to do it before and not after
the disable.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/18

------------------------------------------------------------------------
On 2011-06-16T22:41:41+00:00 Bryce Harrington wrote:

Created attachment 48066
dmesg

I think I may have reproduced this same bug on my own i965 finally.  Not
sure exactly how I did it, but it showed up after a lid open event
(resume from sleep I guess).  The machine has been plugged into its
docking station with external monitor continuously.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/19

------------------------------------------------------------------------
On 2011-06-16T22:42:15+00:00 Bryce Harrington wrote:

Created attachment 48067
i915_error_state

IPEHR=0x01820000

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/20

------------------------------------------------------------------------
On 2011-06-16T23:15:59+00:00 Chris Wilson wrote:

I was hoping to see the contents of the display registers in the error
state to confirm the theory about the WAIT_FOR_EVENT being on a disabled
pipe. Alas, that feature isn't part of that kernel.
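
For readers following the WAIT_FOR_EVENT theory, here is a rough sketch of
how the IPEHR values quoted in this thread can be split into fields. It
assumes the usual Intel command header layout (client in bits 31:29 and,
for MI commands, the opcode in bits 28:23, with MI_WAIT_FOR_EVENT being
opcode 0x03); the function and constant names are made up for
illustration:

```python
MI_CLIENT = 0x0                 # assumed: client 0 is the MI command group
MI_WAIT_FOR_EVENT_OPCODE = 0x03  # assumed: matches MI_INSTR(0x03, 0)

def decode_ipehr(value: int) -> dict:
    """Split an IPEHR header dword into client and MI opcode fields."""
    client = (value >> 29) & 0x7
    mi_opcode = None
    if client == MI_CLIENT:
        # Only MI commands carry their opcode in bits 28:23.
        mi_opcode = (value >> 23) & 0x3F
    return {
        "client": client,
        "mi_opcode": mi_opcode,
        "is_wait_for_event": mi_opcode == MI_WAIT_FOR_EVENT_OPCODE,
    }

# IPEHR values taken from the reports in this thread.
for ipehr in (0x01800020, 0x01820000, 0x60020100):
    print(hex(ipehr), decode_ipehr(ipehr))
```

Under these assumptions 0x01800020 and 0x01820000 both decode as
MI_WAIT_FOR_EVENT, while 0x60020100 belongs to a different command client.
The sketch does not decode the event flags in the low bits.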

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/21

------------------------------------------------------------------------
On 2011-06-16T23:18:52+00:00 Chris Wilson wrote:

May I also make a polite request that you enable pageflipping once more
;-)

I wonder if we should just be waiting for the VBLANK on a full screen
blit rather than a range that is impossible. Hmm.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/22

------------------------------------------------------------------------
On 2011-07-08T10:17:30+00:00 Chris Wilson wrote:

*** Bug 35576 has been marked as a duplicate of this bug. ***

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/23

------------------------------------------------------------------------
On 2011-07-08T10:22:19+00:00 Chris Wilson wrote:

*** Bug 37450 has been marked as a duplicate of this bug. ***

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/24

------------------------------------------------------------------------
On 2011-07-08T14:45:07+00:00 Kamil-42920 wrote:

A bug I reported (Bug 37450) has been marked as a duplicate of this bug,
and this bug is marked as NEEDINFO.

Since I can reproduce the bug I reported 100% of the time, please let me
know if you would like me to provide any additional info.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/25

------------------------------------------------------------------------
On 2011-07-18T15:00:59+00:00 Chris Wilson wrote:

Kamil, can you try applying the patch
https://bugs.freedesktop.org/attachment.cgi?id=48043 to your kernel and
see if that is sufficient?

I'm confident that's the fix, just waiting for testing.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/26

------------------------------------------------------------------------
On 2011-07-20T04:39:08+00:00 Kamil-42920 wrote:

I applied the patch to 2.6.39.3 kernel, but it did *not* help.  I'm
seeing the same problem as before (enabling an output after
suspend/resume hangs the server).  Do I need to be running a newer
kernel perhaps?

xf86-video-intel: 2.15.0
xorg-server: 1.10.2
mesa: 7.10.3
libdrm: 2.4.26
kernel: 2.6.39.3

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/27

------------------------------------------------------------------------
On 2011-07-20T09:45:38+00:00 Chris Wilson wrote:

Sigh. After applying the patch, can you post an i915_error_state?

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/28

------------------------------------------------------------------------
On 2011-07-20T16:15:38+00:00 Kamil-42920 wrote:

$ cat /sys/kernel/debug/dri/0/i915_error_state 
no error state collected

That's after a restart of the X server (Ctrl-Alt-Bcksp) so that I can
access the machine again; I assume that would not reset
i915_error_state?
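
A small sketch of checking whether an error state was captured before
attaching the file to a report. The debugfs path and the "no error state
collected" message are the ones shown above; the function names are
hypothetical, and reading debugfs normally requires root:

```python
DEFAULT_PATH = "/sys/kernel/debug/dri/0/i915_error_state"

def has_error_state(text: str) -> bool:
    """Return True if the dump looks like a real captured error state."""
    stripped = text.strip()
    return bool(stripped) and "no error state collected" not in stripped

def read_error_state(path: str = DEFAULT_PATH) -> str:
    """Read the error state file; the caller handles permission errors."""
    with open(path, "r", errors="replace") as handle:
        return handle.read()
```

Typical use would be `has_error_state(read_error_state())` right after a
hang, before anything that might discard the captured state.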

The only indication in the logs I can see is in /var/log/Xorg.0.log:

[   259.306] (WW) intel(0): flip queue failed: Invalid argument
[   259.306] (WW) intel(0): Page flip failed: Invalid argument
[   260.299] (WW) intel(0): flip queue failed: Device or resource busy
[   260.299] (WW) intel(0): Page flip failed: Device or resource busy
[last two lines repeating]
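
Since the last two lines repeat, a tally by failure reason is easier to
compare between runs than the raw log. A minimal sketch, assuming the
warning format shown above; the regular expression and function name are
illustrative, not part of any X.org tool:

```python
import re
from collections import Counter

# Matches lines such as:
# [   259.306] (WW) intel(0): Page flip failed: Invalid argument
FLIP_RE = re.compile(r"\(WW\) intel\(\d+\): Page flip failed: (.+)$")

def count_flip_failures(log_text: str) -> Counter:
    """Count 'Page flip failed' warnings, grouped by the reported reason."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FLIP_RE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts

SAMPLE = """\
[   259.306] (WW) intel(0): flip queue failed: Invalid argument
[   259.306] (WW) intel(0): Page flip failed: Invalid argument
[   260.299] (WW) intel(0): flip queue failed: Device or resource busy
[   260.299] (WW) intel(0): Page flip failed: Device or resource busy
"""

print(count_flip_failures(SAMPLE))
```

The "flip queue failed" lines are deliberately skipped so that each failed
flip is counted once.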

These start occurring after I enable an output using xrandr (after a
suspend/resume cycle); Xorg works for a while, but hangs immediately
after I switch to a text console and back to X (a required action to
actually see something via the new output, as per
https://bugzilla.kernel.org/show_bug.cgi?id=24982).

A workaround that works for me is to modify xf86-video-intel to force
intel->use_pageflipping to FALSE.  I believe there used to be a user-
accessible option to turn it off, but it's been removed?  That is rather
unfortunate, I must say.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/29

------------------------------------------------------------------------
On 2011-07-20T16:22:35+00:00 Chris Wilson wrote:

I was just about to add that you hit kernel bug # 24982...

So we can't tell if the GPU lockup itself has been fixed if the second
bug prevents you from testing.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/30

------------------------------------------------------------------------
On 2011-07-20T16:45:30+00:00 Kamil-42920 wrote:

Are you saying that *this* bug is probably fixed, but X still hangs
because of the (unrelated) DPMS bug in the kernel?  That could be, as I
no longer see the GPU hung messages.

Well, I guess all I can do at this point is sit and wait for that kernel
bug to be fixed, hopefully some time soon; it's been open since last
year...  I'd be happy to try any patches you guys might have.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/31

------------------------------------------------------------------------
On 2011-07-29T09:28:04+00:00 Chris Wilson wrote:

Ok, to be really complicated, can you please retest this patch on top of
keithp/drm-intel-fixes [
git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6.git].
Hopefully we have the modeswitching bug fixed and so we can then
successfully test the WAIT_FOR_EVENT fix...

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/32

------------------------------------------------------------------------
On 2011-07-30T05:22:22+00:00 Kamil-42920 wrote:

Chris, drm-intel-fixes (last commit
cda2bb78c24de7674eafa3210314dc75bed344a6) does *not* fix the modeswitching bug for me.  I guess no point in retesting your patch then?

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/33

------------------------------------------------------------------------
On 2011-07-30T08:47:00+00:00 Chris Wilson wrote:

The patch should prevent the GPU hang upon turning off a pipe, but it is
a nuisance: if the machine is dying for other reasons, we can't be sure
that the patch is sufficient.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/34

------------------------------------------------------------------------
On 2011-10-19T19:32:34+00:00 Eugeni Dodonov wrote:

Hi,

does this still happen with the latest versions of the drivers, or is it
no longer an issue?

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/36

------------------------------------------------------------------------
On 2011-10-19T20:16:22+00:00 Chris Wilson wrote:

Yes, the patch is still required, just no one has volunteered to test
it.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/37

------------------------------------------------------------------------
On 2011-10-20T05:07:31+00:00 Kamil-42920 wrote:

Well, I would've loved to test it, but I just tried kernel 3.1-rc10 and
with vanilla xf86-video-intel 2.16.0 the kernel still crashes for me on
enabling an output via xrandr.  I assume it's due to the infamous kernel
bug 24982, which has probably been open for a year now with no
resolution in sight, though with kernel bugzilla apparently still being
down (pathetic), it's hard to tell.

For what it's worth, with your patch applied, the kernel seems to crash
less easily for me than without it.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/38

------------------------------------------------------------------------
On 2011-10-20T09:53:36+00:00 Chris Wilson wrote:

(In reply to comment #27)
> Well, I would've loved to test it, but I just tried kernel 3.1-rc10 and with
> vanilla xf86-video-intel 2.16.0 the kernel still crashes for me on enabling an
> output via xrandr.  I assume it's due to the infamous kernel bug 24982, which
> has probably been open for a year now with no resolution in sight, though with
> kernel bugzilla apparently still being down (pathetic), it's hard to tell.

bugzilla.kernel.org and that I'm currently unaware of any crash inside
i915.ko, so you're going to have to remind me...

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/39

------------------------------------------------------------------------
On 2011-10-25T04:47:09+00:00 Kamil-42920 wrote:

(In reply to comment #28)
> I'm currently unaware of any crash inside i915.ko,
> so you're going to have to remind me...

Chris, please see comment #19 in this bugzilla entry, or, for a complete
description, see bug #37450.  In essence, it seems that stale DPMS
properties (kernel bug 24982), which normally just result in a blank
screen, can in some situations result in a crash/hang.  When I
originally reported it I could only trigger it after suspend/resume;
nowadays I can reproduce it just by repeatedly enabling and disabling an
output a few times.  The only workaround that works for me is modifying
the xf86-video-intel driver to force page flipping off.
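
The reproduction described above (repeatedly enabling and disabling an
output) could be scripted roughly as follows. This is only a sketch: the
output name "VGA1" is an assumption (check `xrandr -q` for the real name),
the cycle count and delay are arbitrary, and on an affected machine running
it may hang X or the kernel:

```python
import subprocess
import time

def toggle_commands(output: str, cycles: int) -> list:
    """Build the xrandr calls that switch an output off and on again."""
    commands = []
    for _ in range(cycles):
        commands.append(["xrandr", "--output", output, "--off"])
        commands.append(["xrandr", "--output", output, "--auto"])
    return commands

def run_toggle(output: str = "VGA1", cycles: int = 5, delay: float = 2.0) -> None:
    """Run the toggle sequence, pausing between mode changes."""
    for command in toggle_commands(output, cycles):
        subprocess.run(command, check=True)
        time.sleep(delay)
```

Nothing runs until `run_toggle()` is called explicitly, so the command list
can be inspected first with `toggle_commands("VGA1", 2)`.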

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/40

------------------------------------------------------------------------
On 2011-10-26T09:08:42+00:00 Chris Wilson wrote:

(In reply to comment #29)
> (In reply to comment #28)
> > I'm currently unaware of any crash inside i915.ko,
> > so you're going to have to remind me...
> 
> Chris, please see comment #19 in this bugzilla entry, or, for a complete
> description, see bug #37450.  In essence, it seems that stale DPMS properties
> (kernel bug 24982), which normally just result in a blank screen, can in some
> situations result in a crash/hang.  When I originally reported it I could only
> trigger it after suspend/resume; nowadays I can reproduce it just by repeatedly
> enabling and disabling an output a few times.  The only workaround that works
> for me is modifying the xf86-video-intel driver to force page flipping off.

Ok, I think we know that bug and had a fix for the races inside the
page-flipping code, but I think Keith dropped them on the floor...

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/43

------------------------------------------------------------------------
On 2011-11-09T15:45:27+00:00 Chris Wilson wrote:

*** Bug 40526 has been marked as a duplicate of this bug. ***

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/45

------------------------------------------------------------------------
On 2011-11-09T15:45:34+00:00 Chris Wilson wrote:

*** Bug 40527 has been marked as a duplicate of this bug. ***

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/46

------------------------------------------------------------------------
On 2011-11-09T16:47:22+00:00 Paulo Zanoni wrote:

All our 4 duplicates were high/major. Adjusting.

Reply at: https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/comments/47


** Bug watch added: Linux Kernel Bug Tracker #24982
   http://bugzilla.kernel.org/show_bug.cgi?id=24982

-- 
You received this bug notification because you are a member of Ubuntu-X,
which is subscribed to xserver-xorg-video-intel in Ubuntu.
https://bugs.launchpad.net/bugs/768184

Title:
  [i965gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x01800020) - Black screen
  (dpms?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/768184/+subscriptions

