kernel-packages team mailing list archive

[Bug 1339939] Re: [Lenovo ThinkPad T400] intel graphics fail after suspend with 3.15 kernel

 

Launchpad has imported 73 comments from the remote bug at
https://bugs.freedesktop.org/show_bug.cgi?id=76554.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2014-03-24T13:39:25+00:00 Jikos wrote:

This is not reliably reproducible, so a proper bisect is not possible.

There are suspend-resume cycles which end up with Xorg misbehaving
(graphics not redrawing properly, etc), and dmesg contains

[drm:init_ring_common] *ERROR* render ring initialization failed ctl
0001f001 head ffffff8804 tail 00000000 start 000e4000

This happens with current Linus' tree (HEAD 774868c70), and the issue is
still present even after merging drm-intel-testing (HEAD 14347c2) into
it.

There is currently ongoing mailinglist discussion here:

    https://lkml.org/lkml/2014/2/27/183

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/0

------------------------------------------------------------------------
On 2014-03-24T15:06:24+00:00 Chris Wilson wrote:

OpenGL should be dead after resume, but the DDX should still behave --
everything should be accessible following a failed resume. Can you
please attach your Xorg.0.log after such a failed resume?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/1

------------------------------------------------------------------------
On 2014-03-24T15:12:19+00:00 Chris Wilson wrote:

Created attachment 96294
Move all ring resets before setting the HWS patch

Out of curiosity, can you try this?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/2

------------------------------------------------------------------------
On 2014-03-24T16:55:20+00:00 Jikos wrote:

(In reply to comment #2)
> Created attachment 96294 [details] [review]
> Move all ring resets before setting the HWS patch
> 
> Out of curiosity, can you try?

This actually seems to make things substantially worse -- out of two
suspend-resume cycles with the kernel that had this patch applied (on
top of drm-intel-testing), in both cases the issue triggered.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/3

------------------------------------------------------------------------
On 2014-03-24T16:56:22+00:00 Jikos wrote:

Created attachment 96298
Xorg.0.log from the broken resume (with Chris' patch applied on top of drm-intel-testing).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/4

------------------------------------------------------------------------
On 2014-03-24T17:15:57+00:00 Chris Wilson wrote:

Hmm, a piece of UXA state became corrupt (likely an invalid fb object or
something). How does SNA fare? In particular, we can then run the DDX
with --enable-debug=full to see what goes wrong. Or we might be able to
spot it from a drm.debug=7 dmesg.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/5

------------------------------------------------------------------------
On 2014-03-24T17:16:38+00:00 Chris Wilson wrote:

As for the kernel patch, that's weird... Presumably it is then the order
in which the ring registers are written.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/6

------------------------------------------------------------------------
On 2014-03-24T17:38:11+00:00 Chris Wilson wrote:

Created attachment 96304
Explicitly stop the rings before resetting

One last idea to try on top of the previous patch is to wait for ring-
idle first.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/7

------------------------------------------------------------------------
On 2014-03-24T17:48:17+00:00 Jikos wrote:

Created attachment 96305
dmesg with drm.debug=7

Attached is a dmesg with drm.debug=7 from the resume that had the
problem (had to gzip it due to size).

I had to increase ringbuffer size due to the flood of

  WARNING: CPU: 1 PID: 111 at drivers/gpu/drm/drm_modes.c:119
drm_mode_probed_add+0x51/0x60 [drm]()

which are new since I merged drm-intel-testing -- those are not there
with Linus' tree, but the ringbuffer issue still happens. I will report
the WARNs separately later.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/8

------------------------------------------------------------------------
On 2014-03-24T17:49:33+00:00 Jikos wrote:

The dmesg from comment #8 is from a kernel that didn't yet have the patch
from comment #7 applied. I will be testing that ASAP, thanks.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/9

------------------------------------------------------------------------
On 2014-03-24T17:50:52+00:00 Chris Wilson wrote:

[   45.141243] [drm:drm_ioctl], pid=1519, dev=0xe200, auth=1, I915_GEM_EXECBUFFER2
[   45.141256] [drm:i915_gem_do_execbuffer], execbuf with invalid ring: 0
[   45.141260] [drm:drm_ioctl], ret = -22

Wow.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/10

------------------------------------------------------------------------
On 2014-03-24T17:58:32+00:00 Chris Wilson wrote:

Created attachment 96307
Mark device as wedged if we fail to resume

This should help UXA to render correctly following the resume failure.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/11

------------------------------------------------------------------------
On 2014-03-24T18:48:34+00:00 Jikos wrote:

Created attachment 96311
dmesg-2

Unfortunately the patch from comment #11 didn't help either. Attaching
dmesg of the failure with the patch applied.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/12

------------------------------------------------------------------------
On 2014-03-24T18:58:08+00:00 Chris Wilson wrote:

Hmm, UXA is being aggressively dumb. It even gets told the GPU is
wedged, but ignores it.

The patch did the right thing, but UXA is still not able to notice since
it doesn't check for errors when it should.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/13

------------------------------------------------------------------------
On 2014-03-24T21:17:54+00:00 Chris Wilson wrote:

Created attachment 96323
Report EIO after resume failure in execbuffer

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/14

------------------------------------------------------------------------
On 2014-03-26T11:47:29+00:00 Chris Wilson wrote:

Created attachment 96406
Preserve ring buffers across resume

Another patch to apply on top of the first 3.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/15

------------------------------------------------------------------------
On 2014-03-26T21:51:15+00:00 Jikos wrote:

(In reply to comment #15)
> Created attachment 96406 [details] [review]
> Preserve ring buffers across resume
> 
> Another patch to apply on top of the first 3.

What tree is this patch against please? I am getting rejects in
drivers/gpu/drm/i915/intel_ringbuffer.c both in Linus' tree and in drm-
intel-next branch of drm-intel tree.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/16

------------------------------------------------------------------------
On 2014-03-31T07:51:37+00:00 Chris Wilson wrote:

I've rebased the patches against drm-intel-nightly so they should apply
to most recent kernel trees:

http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug76554

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/17

------------------------------------------------------------------------
On 2014-04-01T15:43:49+00:00 Jikos wrote:

Built a kernel pulled from

  git://people.freedesktop.org/~ickle/linux-2.6 bug76554

with topmost commit being

   commit 1318add417cf6c9dba373393e5b7be62e3283c84
   Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
   Date:   Mon Mar 24 19:17:11 2014 +0000

       drm/i915: Allow the module to load even if we fail to setup rings

but unfortunately the symptoms on resume from hibernation are still
exactly the same.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/18

------------------------------------------------------------------------
On 2014-04-01T20:19:43+00:00 Chris Wilson wrote:

For reference, can you please attach the drm.debug=7 from the branch
across resume? At the least it should have prevented UXA from freezing.
:|

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/19

------------------------------------------------------------------------
On 2014-04-02T11:46:14+00:00 Jikos wrote:

Created attachment 96776
drm.debug=7 dmesg with patched kernel

Attached is a drm.debug=7 dmesg demonstrating the problem happening with
everything from

    git://people.freedesktop.org/~ickle/linux-2.6 bug76554

(HEAD == 1318add417c) applied.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/20

------------------------------------------------------------------------
On 2014-04-02T11:57:04+00:00 Chris Wilson wrote:

Ah, oops, I missed a patch from that branch to prevent the execbuffer from
quietly succeeding. That explains why UXA kept on failing, but not why
the rings still will not restart.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/21

------------------------------------------------------------------------
On 2014-04-02T12:09:45+00:00 Chris Wilson wrote:

One more random rearrangement that should apply on top of that branch:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 602432eaf346..bbcd6b5446f3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -456,9 +456,9 @@ static bool stop_ring(struct intel_ring_buffer *ring)
 		}
 	}
 
+	I915_WRITE_CTL(ring, 0);
 	I915_WRITE_HEAD(ring, 0);
 	ring->write_tail(ring, 0);
-	I915_WRITE_CTL(ring, 0);
 
 	if (!IS_GEN2(ring->dev)) {
 		(void)I915_READ_CTL(ring);
@@ -513,18 +513,19 @@ static int init_ring_common(struct intel_ring_buffer *ring)
 	I915_WRITE_CTL(ring,
 			((ring->size - PAGE_SIZE) & RING_NR_PAGES)
 			| RING_VALID);
+	I915_WRITE_HEAD(ring, 0);
+	ring->write_tail(ring, 0);
 
 	/* If the head is still not zero, the ring is dead */
 	if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 &&
 		     I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) &&
 		     (I915_READ_HEAD(ring) & HEAD_ADDR) == 0, 50)) {
 		DRM_ERROR("%s initialization failed "
-				"ctl %08x head %08x tail %08x start %08x\n",
-				ring->name,
-				I915_READ_CTL(ring),
-				I915_READ_HEAD(ring),
-				I915_READ_TAIL(ring),
-				I915_READ_START(ring));
+			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08x]\n",
+			  ring->name,
+			  I915_READ_CTL(ring), I915_READ_CTL(ring) & RING_VALID,
+			  I915_READ_HEAD(ring), I915_READ_TAIL(ring),
+			  I915_READ_START(ring), i915_gem_obj_ggtt_offset(obj));
 		ret = -EIO;
 		goto out;
 	}

You may also want to cherry-pick
ec9da60002b2390a3932db36d61d1d4e30c4ee21 from the bug76554 branch to
prevent uxa from freezing.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/22

------------------------------------------------------------------------
On 2014-04-02T12:52:34+00:00 Jikos wrote:

I refetched the git branch to manually apply the reordering patch on top
of it (bugzilla is damaging it, could you please attach it next time?
thanks), but the branch doesn't build any more:

drivers/gpu/drm/i915/intel_ringbuffer.c: In function ‘stop_ring’:
drivers/gpu/drm/i915/intel_ringbuffer.c:444: error: ‘drm_i915_private_t’ undeclared (first use in this function)
drivers/gpu/drm/i915/intel_ringbuffer.c:444: error: (Each undeclared identifier is reported only once
drivers/gpu/drm/i915/intel_ringbuffer.c:444: error: for each function it appears in.)
drivers/gpu/drm/i915/intel_ringbuffer.c:444: error: ‘dev_priv’ undeclared (first use in this function)
make[2]: *** [drivers/gpu/drm/i915/intel_ringbuffer.o] Error 1
make[2]: *** Waiting for unfinished jobs....
drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_stop_ringbuffers’:
drivers/gpu/drm/i915/i915_gem.c:4240: error: ‘drm_i915_private_t’ undeclared (first use in this function)
drivers/gpu/drm/i915/i915_gem.c:4240: error: (Each undeclared identifier is reported only once
drivers/gpu/drm/i915/i915_gem.c:4240: error: for each function it appears in.)
drivers/gpu/drm/i915/i915_gem.c:4240: error: ‘dev_priv’ undeclared (first use in this function)
drivers/gpu/drm/i915/i915_gem.c:4241: warning: ISO C90 forbids mixed declarations and code
drivers/gpu/drm/i915/i915_gem.c:4244: warning: left-hand operand of comma expression has no effect
make[2]: *** [drivers/gpu/drm/i915/i915_gem.o] Error 1
make[1]: *** [drivers/gpu/drm/i915] Error 2
make: *** [drivers/gpu/drm/] Error 2


Topmost commit of the branch is

   commit ec9da60002b2390a3932db36d61d1d4e30c4ee21
   Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
   Date:   Mon Mar 24 17:56:36 2014 +0000

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/23

------------------------------------------------------------------------
On 2014-04-02T13:16:54+00:00 Chris Wilson wrote:

Bleh, rebase error. All suggested patches are now up on #bug76554.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/24

------------------------------------------------------------------------
On 2014-04-02T13:19:21+00:00 Chris Wilson wrote:

#bug76554 head is currently

commit cfa8aaa35f180268c99e72964228c944930af680
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date:   Wed Apr 2 13:37:24 2014 +0100

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/25

------------------------------------------------------------------------
On 2014-04-02T13:43:29+00:00 Jikos wrote:

Created attachment 96783
drm.debug=7 dmesg with patched kernel (cfa8aaa3)

With the branch that has cfa8aaa3 as a topmost commit, the ring
initialization failures are still popping up on resume, but the Xorg
rendering turning into a complete mess is finally solved, and the Xorg
session is not corrupted and works! (Although it feels like the whole
thing is slower, but that might be due to excessive logging going on.)

dmesg with drm.debug=7 attached.

So if you are going to push anything of this upstream, please feel free
to add my

   Reported-and-tested-by: Jiri Kosina <jkosina@xxxxxxx>

to it, although I assume the ring initialization failure still needs to
be solved ... ?

Thanks!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/26

------------------------------------------------------------------------
On 2014-04-02T13:47:59+00:00 Chris Wilson wrote:

(In reply to comment #26)
> Created attachment 96783 [details]
> drm.debug=7 dmesg with patched kernel (cfa8aaa3)
> 
> With the branch that has cfa8aaa3 as a topmost commit, the ring
> initialization failures are still popping up on resume, but the Xorg rendering
> turning into a complete mess is finally solved, and the Xorg session is not
> corrupted and works! (Although it feels like the whole thing is slower,
> but that might be due to excessive logging going on.)

Indeed. What happens is that UXA now finally detects that the kernel is
reporting that it cannot execute GPU commands, and instead it falls back
to CPU rendering directly into the framebuffer.

> dmesg with drm.debug=7 attached.
> 
> So if you are going to push anything of this upstream, please feel free to
> add my
> 
>    Reported-and-tested-by: Jiri Kosina <jkosina@xxxxxxx>
> 
> to it, although I assume the ring initialization failure still needs to be
> solved ... ?

Yes. We never knew why g45 failed in the first place, if we can figure
out what changed now, we may be able to create a better band-aid.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/27
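
The fallback described in this comment can be pictured with a small
user-space sketch. It is not UXA's actual code: fake_gpu, renderer,
fake_gpu_submit() and render_once() are hypothetical names invented
here, and the only thing taken from the thread is the idea that a
submission failing with -EIO (the error the "Report EIO after resume
failure in execbuffer" patch is named for) should make the client stop
using the GPU and render on the CPU instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Which path ended up drawing the frame. */
enum render_path { RENDER_GPU, RENDER_CPU };

/* Hypothetical stand-in for the kernel side: a wedged device rejects work. */
struct fake_gpu {
    bool wedged;
    int submissions;   /* how many times a submission was attempted */
};

/* Hypothetical stand-in for the DDX side. */
struct renderer {
    struct fake_gpu *gpu;
    bool gpu_disabled; /* set once the device has reported -EIO */
};

/* Pretend execbuffer: returns 0 on success, -EIO if the device is wedged. */
static int fake_gpu_submit(struct fake_gpu *gpu)
{
    gpu->submissions++;
    return gpu->wedged ? -EIO : 0;
}

/* Try the GPU first; on -EIO remember the failure and use the CPU path. */
static enum render_path render_once(struct renderer *r)
{
    if (!r->gpu_disabled) {
        int ret = fake_gpu_submit(r->gpu);

        if (ret == 0)
            return RENDER_GPU;
        if (ret == -EIO)
            r->gpu_disabled = true; /* stop retrying a dead device */
    }
    return RENDER_CPU;
}
```

After the first -EIO the sketch never submits again, which would also
match the "works, but feels slower" observation: everything is drawn by
the CPU from then on.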

------------------------------------------------------------------------
On 2014-04-02T13:58:29+00:00 Jikos wrote:

(In reply to comment #27)
> Indeed. What happens is that UXA now finally detects that the kernel is
> reporting that it cannot execute GPU commands, and instead it falls back to
> CPU rendering directly into the framebuffer.

Understood, thanks. So the kernel should probably put a huge warning into
dmesg once such a condition is detected and the workaround applied.

> > to it, although I assume the ring initialization failure still needs to be
> > solved ... ?
> 
> Yes. We never knew why g45 failed in the first place, if we can figure out
> what changed now, we may be able to create a better band-aid.

Excellent, thanks. Happy to test any diag patches necessary.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/28

------------------------------------------------------------------------
On 2014-04-03T14:33:51+00:00 Jikos wrote:

BTW, may I kindly ask you what your plans with those patches are?

Although it's clear that root-causing the ring initialization failures
is still the priority, without this kind of band-aid present in Linus'
tree, the kernel is almost completely useless on my system.

Thanks.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/29

------------------------------------------------------------------------
On 2014-04-03T15:33:00+00:00 Chris Wilson wrote:

The temporary fix is on its way upstream (under review atm), as keeping
the system limping along is essential.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/30

------------------------------------------------------------------------
On 2014-04-05T10:56:47+00:00 Daniel-ffwll wrote:

Now that the proper fallback handling is on track, have we attempted to
bisect where the underlying root-cause (ring init failure on resume) was
made much worse? I guess on some older kernels this worked better.

No guarantee that it'll help since this gm45 ring init issue is really
elusive, but it might shed some light on what's going on.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/31

------------------------------------------------------------------------
On 2014-04-07T07:34:29+00:00 Jikos wrote:

(In reply to comment #31)
> Now that the proper fallback handling is on track, have we attempted to
> bisect where the underlying root-cause (ring init failure on resume) was
> made much worse? I guess on some older kernels this worked better.

I am afraid this is close to impossible.

The frequency of the problem happening fluctuates *a lot* between
different kernels.

- I am pretty sure that I've *never ever* seen it happening on the 3.7
kernel, and it has been exercised a lot on the system in question

- Around 3.13, this seems to happen only from time to time
(say once in 40 resumes, but with a rather large standard deviation)

- with current Linus' tree and with the drm tree as well, this happens
super-reliably on almost every resume from hibernation

I don't have enough data from the kernels in between to be able to claim
the ratio reliably.

I am afraid this pretty much implies that bisecting this reliably would
consume an incredible amount of time and might still produce an unreliable
result.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/32

------------------------------------------------------------------------
On 2014-04-07T09:25:28+00:00 Chris Wilson wrote:

Created attachment 97027
Print ring registers for debugging

I think this might help in working out what the values in the registers
mean. I think it is sticking to the old value, but I am not sure, hence
the patch.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/33

------------------------------------------------------------------------
On 2014-04-07T11:33:46+00:00 Jikos wrote:

Created attachment 97034
dmesg with ring contents dump before/after initialization

This is a dmesg from resume where ring initialization fails with all the
patches (including the before/after ring contents dump) posted here so
far applied.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/34

------------------------------------------------------------------------
On 2014-04-07T11:41:31+00:00 Chris Wilson wrote:

That's scary. The immediate read of RING_HEAD after it returned 0 during
the first initialisation returns a non-zero value... It only just barely
passed the self-checks during module load. Just as importantly, it did
not have the pattern I was expecting.

I think we should try emitting a dummy command and seeing if the CS ring
updates.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/35

------------------------------------------------------------------------
On 2014-04-07T11:45:19+00:00 Chris Wilson wrote:

Created attachment 97035
Poke the ring to see if it is awake

Maybe this is enough to see if the ring responds correctly. Please keep
the ring debug patch in place.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/36

------------------------------------------------------------------------
On 2014-04-07T11:57:01+00:00 Jikos wrote:

Created attachment 97036
dmesg with ring contents dump and MI_NOOP writes issued

Unfortunately the error is still there even with the MI_NOOP writes.
dmesg with that (and all the previous patches) applied is attached.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/37

------------------------------------------------------------------------
On 2014-04-14T13:53:23+00:00 Jikos wrote:

So, is there anything else I should try, please, given that bisecting is not really a viable option here?
It's a rather annoying bug and it's my intention to help as much as possible to have it sorted out.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/38

------------------------------------------------------------------------
On 2014-04-21T07:37:22+00:00 Chris Wilson wrote:

Hmm. I missed that the "after initialisation" printk is correct. So
perhaps all we need is to wait a little longer...


diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2eb85cc2062f..5a74986348c6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -537,7 +537,7 @@ static int init_ring_common(struct intel_ring_buffer *ring)
        /* If the head is still not zero, the ring is dead */
        if (wait_for((I915_READ_CTL(ring) & RING_VALID) != 0 &&
                     I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj) &&
-                    (I915_READ_HEAD(ring) & HEAD_ADDR) == 8, 50)) {
+                    (I915_READ_HEAD(ring) & HEAD_ADDR) == 8, 1000)) {
                DRM_ERROR("%s initialization failed "
                          "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
                          ring->name,

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/39
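
To make the reasoning behind the longer timeout concrete, here is a
minimal user-space sketch of the poll-until-ready pattern. It is not the
kernel's wait_for() macro, which is bounded in milliseconds; this
version counts polls instead so that it behaves deterministically.
fake_ring, fake_read_head() and poll_head() are names made up for the
illustration, and only the "nonzero means the condition never became
true" convention is taken from the diff above.

```c
#include <stdint.h>

/* Simulated ring: HEAD holds a stale value until the "hardware" settles. */
struct fake_ring {
    uint32_t head;            /* current (possibly stale) HEAD value */
    uint32_t settled_head;    /* value HEAD takes once the ring settles */
    int polls_until_settled;  /* reads left before settling; <0 = never */
};

/* Each read advances the simulation by one step. */
static uint32_t fake_read_head(struct fake_ring *ring)
{
    if (ring->polls_until_settled > 0) {
        ring->polls_until_settled--;
        if (ring->polls_until_settled == 0)
            ring->head = ring->settled_head;
    }
    return ring->head;
}

/*
 * Poll HEAD until it equals the expected value or the budget runs out.
 * Returns 0 on success and -1 on timeout.
 */
static int poll_head(struct fake_ring *ring, uint32_t expected, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (fake_read_head(ring) == expected)
            return 0;
    }
    return -1;
}
```

A ring that settles after 200 reads fails with a budget of 50 and
succeeds with 1000; a ring that never settles fails with both. Raising
the timeout therefore only helps in the first case.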

------------------------------------------------------------------------
On 2014-04-22T08:29:41+00:00 Jikos wrote:

(In reply to comment #39)
> Hmm. I missed that the "after initialisation" printk is correct. So perhaps
> all we need is to wait a little longer...

Unfortunately the symptoms are still the same even with timeout == 1000:

[   54.108192] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 (valid? 1) head 000e299c tail 00000008 start 000e4000 [expected 000e4000]
[   54.108201] Ring render ring after initialisation: 0001f001 000e299c 00000008 000e4000

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/40
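
For anyone decoding the error line above by hand, a rough user-space
sketch follows. The mask values (RING_VALID = 0x1, RING_NR_PAGES =
0x001FF000, HEAD_ADDR = 0x001FFFFC) are assumptions recalled from the
i915 register header of that era and are not quoted anywhere in this
thread; the 4 KiB page size is likewise assumed. The struct and helper
names are invented for the example. The size calculation simply inverts
the ((ring->size - PAGE_SIZE) & RING_NR_PAGES) | RING_VALID expression
seen in the earlier diffs.

```c
#include <stdint.h>

/* Assumed register masks (not taken from this thread). */
#define RING_VALID        0x00000001u
#define RING_NR_PAGES     0x001FF000u
#define HEAD_ADDR         0x001FFFFCu
#define SKETCH_PAGE_SIZE  4096u       /* assumed 4 KiB pages */

/* The four values printed in the error message. */
struct ring_regs {
    uint32_t ctl, head, tail, start;
};

/* Is the RING_VALID bit set in CTL? */
static int ring_is_valid(const struct ring_regs *r)
{
    return (r->ctl & RING_VALID) != 0;
}

/* CTL stores (size - PAGE_SIZE) in the RING_NR_PAGES field. */
static uint32_t ring_size_bytes(const struct ring_regs *r)
{
    return (r->ctl & RING_NR_PAGES) + SKETCH_PAGE_SIZE;
}

/* HEAD with the non-address bits masked off. */
static uint32_t ring_head_offset(const struct ring_regs *r)
{
    return r->head & HEAD_ADDR;
}

/* Does the masked HEAD fall inside the ring, read as a byte offset? */
static int head_inside_ring(const struct ring_regs *r)
{
    return ring_head_offset(r) < ring_size_bytes(r);
}
```

Under those assumptions, ctl 0001f001 describes a valid ring of 0x20000
bytes (128 KiB), while head 000e299c masks to 0xe299c, which is far
outside that range if HEAD is read as an offset into the ring.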

------------------------------------------------------------------------
On 2014-04-22T09:35:02+00:00 Chris Wilson wrote:

One last paste... (Apologies for any white space issues, this is just
trying to be quick and dirty.)


diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5a74986348c6..75365c1588fb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -526,9 +526,28 @@ static int init_ring_common(struct intel_ring_buffer *ring)
         * also enforces ordering), otherwise the hw might lose the new ring
         * register values. */
        I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
+       if (wait_for(I915_READ_START(ring) == i915_gem_obj_ggtt_offset(obj),
+                    1000)) {
+               DRM_ERROR("%s initialization failed "
+                         "start %08x [expected %08lx]\n",
+                         ring->name,
+                         I915_READ_START(ring),
+                         (unsigned long)i915_gem_obj_ggtt_offset(obj));
+               ret = -EIO;
+               goto out;
+       }
+
        I915_WRITE_CTL(ring,
                        ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
                        | RING_VALID);
+       if (wait_for(I915_READ_CTL(ring) & RING_VALID, 1000)) {
+               DRM_ERROR("%s initialization failed ctl %08x (valid? %d)\n",
+                         ring->name,
+                         I915_READ_CTL(ring),
+                         !!(I915_READ_CTL(ring) & RING_VALID));
+               ret = -EIO;
+               goto out;
+       }

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/41

------------------------------------------------------------------------
On 2014-04-22T11:23:05+00:00 Jikos wrote:

Created attachment 97739
dmesg with all the patches up to now applied

Attaching dmesg with all patches (up to and including the one in comment
#41) included with the error condition triggering.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/42

------------------------------------------------------------------------
On 2014-04-22T11:35:31+00:00 Chris Wilson wrote:

If it keeps resetting HEAD to a random value after switching the ring
on, how does it ever work? :|

Another hack:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5a74986348c6..e47324aa8963 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -530,6 +530,13 @@ static int init_ring_common(struct intel_ring_buffer *ring)
                        ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
                        | RING_VALID);
 
+       if (I915_READ_START(ring) != i915_gem_obj_ggtt_offset(obj)) {
+               printk(KERN_ERR "%s initialization failed [%08x != %08x], fudging\n",
+                      ring->name, I915_READ_START(ring), i915_gem_obj_ggtt_offset(obj));
+               I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
+               POSTING_READ(ring);
+       }
+
        iowrite32(MI_NOOP, ring->virtual_start + 0);
        iowrite32(MI_NOOP, ring->virtual_start + 4);
        ring->write_tail(ring, 8);

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/43

------------------------------------------------------------------------
On 2014-04-22T11:36:11+00:00 Chris Wilson wrote:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5a74986348c6..b46b3e928a7f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -530,6 +530,17 @@ static int init_ring_common(struct intel_ring_buffer *ring)
                        ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
                        | RING_VALID);
 
+       if (I915_READ_START(ring) != i915_gem_obj_ggtt_offset(obj)) {
+               printk(KERN_ERR
+                      "%s initialization failed"
+                      " [%08x != %08x], fudging\n",
+                      ring->name,
+                      I915_READ_START(ring),
+                      i915_gem_obj_ggtt_offset(obj));
+               I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
+               POSTING_READ(ring);
+       }
+
        iowrite32(MI_NOOP, ring->virtual_start + 0);
        iowrite32(MI_NOOP, ring->virtual_start + 4);
        ring->write_tail(ring, 8);

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/44
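
The "fudging" hack above amounts to a write, read back, rewrite
sequence. A minimal user-space simulation of that idea is sketched
below, assuming a register that silently drops some writes (the comment
in the surrounding kernel code mentions that "the hw might lose the new
ring register values"). fake_reg, reg_write(), reg_read() and
write_verified() are hypothetical names, and unlike the hack, which
rewrites START only once, this version retries up to a caller-chosen
limit.

```c
#include <stdint.h>

/* Simulated register that loses the first few writes it receives. */
struct fake_reg {
    uint32_t value;
    int writes_to_drop;  /* number of upcoming writes that are lost */
};

static void reg_write(struct fake_reg *reg, uint32_t v)
{
    if (reg->writes_to_drop > 0) {
        reg->writes_to_drop--;  /* write is silently lost */
        return;
    }
    reg->value = v;
}

static uint32_t reg_read(const struct fake_reg *reg)
{
    return reg->value;
}

/*
 * Write v and read it back, retrying up to max_attempts times.
 * Returns the number of attempts used, or -1 if the value never stuck.
 */
static int write_verified(struct fake_reg *reg, uint32_t v, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        reg_write(reg, v);
        if (reg_read(reg) == v)
            return attempt;
    }
    return -1;
}
```

Returning the attempt count makes it easy to log how often the first
write was lost, which is the information the printk in the hack is
after.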

------------------------------------------------------------------------
On 2014-04-22T11:53:34+00:00 Jikos wrote:

(In reply to comment #44)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 5a74986348c6..b46b3e928a7f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -530,6 +530,17 @@ static int init_ring_common(struct intel_ring_buffer
> *ring)
>                         ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
>                         | RING_VALID);
>  
> +       if (I915_READ_START(ring) != i915_gem_obj_ggtt_offset(obj)) {
> +               printk(KERN_ERR
> +                      "%s initialization failed"
> +                      " [%08x != %08x], fudging\n",
> +                      ring->name,
> +                      I915_READ_START(ring),
> +                      i915_gem_obj_ggtt_offset(obj));
> +               I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
> +               POSTING_READ(ring);
> +       }
> +
>         iowrite32(MI_NOOP, ring->virtual_start + 0);
>         iowrite32(MI_NOOP, ring->virtual_start + 4);
>         ring->write_tail(ring, 8);

What baseline should I apply this on top of, please? The
surrounding code in my tree (with all the patches provided so far
applied) is

[ ... ]
        I915_WRITE_CTL(ring,
                        ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
                        | RING_VALID);
        if (wait_for(I915_READ_CTL(ring) & RING_VALID, 1000)) {
                DRM_ERROR("%s initialization failed ctl %08x (valid? %d)\n",
                                ring->name,
                                I915_READ_CTL(ring),
                                !!(I915_READ_CTL(ring) & RING_VALID));
                ret = -EIO;
                goto out;
        }
        I915_WRITE_HEAD(ring, 0);
        ring->write_tail(ring, 0);

        iowrite32(MI_NOOP, ring->virtual_start + 0);
        iowrite32(MI_NOOP, ring->virtual_start + 4);
        ring->write_tail(ring, 8);
[ ... ]

(i.e. it has the extra I915_WRITE_HEAD(ring, 0); ring->write_tail(ring,
0);, etc).

I can of course easily apply the hunk just between the

   ring->write_tail(ring, 0);

and

   iowrite32(MI_NOOP, ring->virtual_start + 0);

if that's what you want me to do.

Thanks.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/45

------------------------------------------------------------------------
On 2014-04-22T12:12:17+00:00 Chris Wilson wrote:

(In reply to comment #45)
> (In reply to comment #44)
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 5a74986348c6..b46b3e928a7f 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -530,6 +530,17 @@ static int init_ring_common(struct intel_ring_buffer
> > *ring)
> >                         ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
> >                         | RING_VALID);
> >  
> > +       if (I915_READ_START(ring) != i915_gem_obj_ggtt_offset(obj)) {
> > +               printk(KERN_ERR
> > +                      "%s initialization failed"
> > +                      " [%08x != %08x], fudging\n",
> > +                      ring->name,
> > +                      I915_READ_START(ring),
> > +                      i915_gem_obj_ggtt_offset(obj));
> > +               I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
> > +               POSTING_READ(ring);
> > +       }
> > +
> >         iowrite32(MI_NOOP, ring->virtual_start + 0);
> >         iowrite32(MI_NOOP, ring->virtual_start + 4);
> >         ring->write_tail(ring, 8);
> 
> What is a baseline I should apply this on top of, please? The surrounding
> code in my tree (with all the patches provided so far applied) is
> 
> [ ... ]
>         I915_WRITE_CTL(ring,
>                         ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
>                         | RING_VALID);
>         if (wait_for(I915_READ_CTL(ring) & RING_VALID, 1000)) {
>                 DRM_ERROR("%s initialization failed ctl %08x (valid? %d)\n",
>                                 ring->name,
>                                 I915_READ_CTL(ring),
>                                 !!(I915_READ_CTL(ring) & RING_VALID));
>                 ret = -EIO;
>                 goto out;
>         }
>         I915_WRITE_HEAD(ring, 0);
>         ring->write_tail(ring, 0);
> 
>         iowrite32(MI_NOOP, ring->virtual_start + 0);
>         iowrite32(MI_NOOP, ring->virtual_start + 4);
>         ring->write_tail(ring, 8);
> [ ... ]
> 
> (i.e. it has the extra I915_WRITE_HEAD(ring, 0); ring->write_tail(ring, 0);,
> etc).
> 
> I can of course easily apply the hunk just between the 
> 
>    ring->write_tail(ring, 0);
> 
> and 
> 
>    iowrite32(MI_NOOP, ring->virtual_start + 0);
> 
> if that's what you want me to do.
> 
> Thanks.

Sorry, I threw away the preceding hack to try and keep the diff clean.
Just plonk the write to set HEAD again after setting CTRL (and the
wait_for(CTRL) if you have that).

Hmm, it appears we have drifted slightly in our assortment of patches,
let me push my current collection of hacks so we can rebase.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/46

------------------------------------------------------------------------
On 2014-04-22T12:16:21+00:00 Chris Wilson wrote:

Latest set of hacks and patches on top of drm-intel-nightly:
http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=bug76554

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/47

------------------------------------------------------------------------
On 2014-04-22T12:32:25+00:00 Jikos wrote:

Created attachment 97744
dmesg with HEAD==218bb0e7f

The problem is still there with the referenced branch (SHA1 HEAD
218bb0e7f). Dmesg attached.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/48

------------------------------------------------------------------------
On 2014-04-22T12:38:15+00:00 Chris Wilson wrote:

So it passes the immediate check that HEAD is valid after setting CTRL,
but then fails shortly afterwards. Humph. I am not sure what is going
on!

I wonder if it is as simple as the combination of reads failing?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/49

------------------------------------------------------------------------
On 2014-04-22T12:49:43+00:00 Jikos wrote:

The problematic condition causing the whole ring to be claimed dead is

     (I915_READ_HEAD(ring) & HEAD_ADDR) == 8

right?

I915_READ_HEAD(ring) returns 000e200c and HEAD_ADDR is 0x001FFFFC, so
the result is e200c, not the expected value of 8, causing the ring
initialization failure.

Or am I completely wrong here?
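
For reference, the arithmetic above can be checked outside the kernel
with a few lines of C. This is only a sketch: HEAD_ADDR is the mask
value quoted in this comment, while head_offset() and head_check_ok()
are hypothetical helper names used for illustration, not functions from
the i915 driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask value as quoted in this comment. */
#define HEAD_ADDR 0x001FFFFCu

/* Hypothetical helper: keep only the offset bits of a raw HEAD value. */
static uint32_t head_offset(uint32_t raw_head)
{
    return raw_head & HEAD_ADDR;
}

/* Hypothetical helper: the condition discussed above,
 * (I915_READ_HEAD(ring) & HEAD_ADDR) == 8. */
static bool head_check_ok(uint32_t raw_head)
{
    return head_offset(raw_head) == 8u;
}
```

With the reported value, head_offset(0x000e200c) gives 0xe200c, so
head_check_ok() returns false, which matches the failure seen in dmesg.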

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/50

------------------------------------------------------------------------
On 2014-04-22T13:02:07+00:00 Jikos wrote:

(In reply to comment #50)

I probably wasn't super clear what I was referring to by this comment:

> The problematic condition causing the whole ring to be claimed dead is
> 
>      (I915_READ_HEAD(ring) & HEAD_ADDR) == 8
> 
> right?
> 
> I915_READ_HEAD(ring) returns 000e200c and HEAD_ADDR is 0x001FFFFC, so the
> result is e200c, not the expected value of 8, causing the ring
> initialization failure.

I was referring to this:


(In reply to comment #49)
> So it passes the immediate check that HEAD is valid after setting CTRL, but
> then fails shortly afterwards. Humph. I am not sure what is going on!

because I don't see any check for HEAD validity after setting CTRL; I
only see

     I915_READ_START(ring) != i915_gem_obj_ggtt_offset(obj)

check, but no I915_READ_HEAD() check ... but obviously, I am absolutely
unfamiliar with this code, so sorry for likely creating unnecessary
noise.
> 
> I wonder if it is as simple as the combination of reads failing?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/51

------------------------------------------------------------------------
On 2014-04-22T13:28:01+00:00 Chris Wilson wrote:

No, it is just me getting confused between HEAD and START. Ok, I wonder
if this is the missing piece of magic (on top of the current bug
branch):

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b46b3e928a7f..12c59e945f8e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -530,15 +530,14 @@ static int init_ring_common(struct intel_ring_buffer *ring)
                        ((ring->size - PAGE_SIZE) & RING_NR_PAGES)
                        | RING_VALID);
 
-       if (I915_READ_START(ring) != i915_gem_obj_ggtt_offset(obj)) {
+       if (I915_READ_HEAD(ring)) {
                printk(KERN_ERR
                       "%s initialization failed"
-                      " [%08x != %08x], fudging\n",
+                      " [head now %08x], fudging\n",
                       ring->name,
-                      I915_READ_START(ring),
-                      i915_gem_obj_ggtt_offset(obj));
-               I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
-               POSTING_READ(ring);
+                      I915_READ_HEAD(ring));
+               I915_WRITE_HEAD(ring, 0);
+               (void)I915_READ_HEAD(ring);
        }
 
        iowrite32(MI_NOOP, ring->virtual_start + 0);

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/52

------------------------------------------------------------------------
On 2014-04-22T13:55:57+00:00 Jikos wrote:

Created attachment 97747
dmesg with fixed start/head

On the first resume, the issue didn't occur, but second suspend-resume
cycle revealed it again. dmesg attached.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/53

------------------------------------------------------------------------
On 2014-04-22T15:43:29+00:00 Chris Wilson wrote:

After the first resume, we applied the fixup. After the second resume,
it managed to get past the check and then failed. /o\

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/54

------------------------------------------------------------------------
On 2014-04-22T15:47:07+00:00 Chris Wilson wrote:

Created attachment 97756
Retry ring initialisation

And another hack!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/55

------------------------------------------------------------------------
On 2014-04-22T20:49:34+00:00 Jikos wrote:

Created attachment 97774
dmesg with retry-patch applied

dmesg with patch from comment#55 applied on top of the previous pile.

The only notable difference seems to be appearance of

  [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring

during resume.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/56

------------------------------------------------------------------------
On 2014-05-06T23:21:13+00:00 Jikos wrote:

Is there anything new on this front, please?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/57

------------------------------------------------------------------------
On 2014-05-08T10:22:35+00:00 Chris Wilson wrote:

I haven't had any other inspiration. Maybe,


diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 401f3e7..ccb0e5c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -513,6 +513,7 @@ reset:
         * registers with the above sequence (the readback of the HEAD registers
         * also enforces ordering), otherwise the hw might lose the new ring
         * register values. */
+       memset(ring->virtual_start, 0, ring->size);
        I915_WRITE_START(ring, i915_gem_obj_ggtt_offset(obj));
        I915_WRITE_CTL(ring,
                        ((ring->size - PAGE_SIZE) & RING_NR_PAGES)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/58

------------------------------------------------------------------------
On 2014-05-12T08:34:01+00:00 Jikos wrote:

With that patch in place (on top of all previous patches), this is still
in dmesg upon resume:

[   30.584016] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
[   30.584021] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f401 (valid? 1) head 000e202c tail 00000008 start 000e4000 [expected 000e4000]
[   30.584024] Ring render ring after initialisation: 0001f401 000e202c 00000008 000e4000
[   30.584034] [drm:__i915_drm_thaw] *ERROR* failed to re-initialize GPU, declaring wedged!
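
To make the register dump above easier to read, here is a small
standalone C sketch that decodes it. The assumptions are: RING_VALID
and RING_NR_PAGES carry the values I recall from i915_reg.h (treat them
as unverified), HEAD_ADDR is the mask quoted in comment #50, the page
size is 4096 bytes, and struct ring_regs and the helper functions are
hypothetical names, not driver code. The size decoding simply inverts
the ((ring->size - PAGE_SIZE) & RING_NR_PAGES) expression from the
patches in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register layout; values recalled from i915_reg.h, not verified here. */
#define RING_VALID      0x00000001u
#define RING_NR_PAGES   0x001FF000u
#define HEAD_ADDR       0x001FFFFCu  /* as quoted in comment #50 */
#define PAGE_SIZE_BYTES 4096u        /* assumed 4 KiB pages */

/* Hypothetical container for the four values printed in dmesg. */
struct ring_regs {
    uint32_t ctl, head, tail, start;
};

static bool ring_valid(const struct ring_regs *r)
{
    return (r->ctl & RING_VALID) != 0;
}

/* Inverse of ((size - PAGE_SIZE) & RING_NR_PAGES) used when writing CTL. */
static uint32_t ring_size_bytes(const struct ring_regs *r)
{
    return (r->ctl & RING_NR_PAGES) + PAGE_SIZE_BYTES;
}

static uint32_t ring_head_offset(const struct ring_regs *r)
{
    return r->head & HEAD_ADDR;
}

/* The ring counts as initialised only if it is valid, START matches the
 * expected offset, and HEAD sits at the expected offset. */
static bool ring_init_ok(const struct ring_regs *r,
                         uint32_t expected_start, uint32_t expected_head)
{
    return ring_valid(r) &&
           r->start == expected_start &&
           ring_head_offset(r) == expected_head;
}
```

Feeding in ctl 0001f401, head 000e202c, tail 00000008 and start
000e4000 gives a valid ring of 0x20000 bytes with a matching START, but
a head offset of 0xe202c rather than 8, so the check fails on HEAD
alone.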

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/59

------------------------------------------------------------------------
On 2014-05-12T15:14:42+00:00 Mika-kuoppala wrote:

No good ideas here either, but would be nice to see if this makes a difference on
ring init:

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4024e16..708a1da 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -573,6 +573,15 @@ static int i915_drm_thaw_early(struct drm_device *dev)
 static int __i915_drm_thaw(struct drm_device *dev, bool restore_gtt_mappings)
 {
        struct drm_i915_private *dev_priv = dev->dev_private;
+       int ret;
+
+       mutex_lock(&dev->struct_mutex);
+       ret = intel_gpu_reset(dev);
+       mutex_unlock(&dev->struct_mutex);
+
+       if (ret)
+               DRM_ERROR("failed to reset the GPU on resume (%d), ignoring\n",
+                         ret);

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/60

------------------------------------------------------------------------
On 2014-05-13T11:56:09+00:00 Jikos wrote:

(In reply to comment #60)
> No good ideas here either, but would be nice to see if this makes a
> difference on
> ring init:

Even with the memset() patch from comment#60 applied on top of the
previous bunch, I see this on resume:

[   54.300012] [drm:stop_ring] *ERROR* render ring :timed out trying to stop ring
[   54.300018] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f401 (valid? 1) head 000e252c tail 00000008 start 000e4000 [expected 000e4000]
[   54.300021] Ring render ring after initialisation: 0001f401 000e252c 00000008 000e4000
[   54.300031] [drm:__i915_drm_thaw] *ERROR* failed to re-initialize GPU, declaring wedged!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/61

------------------------------------------------------------------------
On 2014-05-13T11:57:57+00:00 Jikos wrote:

(In reply to comment #61)
> (In reply to comment #60)
> > No good ideas here either, but would be nice to see if this makes a
> > difference on
> > ring init:
> 
> Even with the memset() 

memset() here should actually read intel_gpu_reset(), sorry for the
confusion.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/62

------------------------------------------------------------------------
On 2014-05-13T12:23:15+00:00 Antti-koskipaa wrote:

Assigning to Chris since he seems to be all over it.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/63

------------------------------------------------------------------------
On 2014-05-16T20:01:09+00:00 Chris Wilson wrote:

Created attachment 99172
Prevent updating the HWS whilst it is active

Stumbled across this. Probably irrelevant, but it is in the right area.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/64

------------------------------------------------------------------------
On 2014-05-19T11:45:19+00:00 Chris Wilson wrote:

*** Bug 77977 has been marked as a duplicate of this bug. ***

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/65

------------------------------------------------------------------------
On 2014-05-20T14:17:54+00:00 Jikos wrote:

(In reply to comment #64)
> Created attachment 99172 [details] [review]
> Prevent updating the HWS whilst it is active
> 
> Stumbled across this. Probably irrelevant, but it is in the right area.

Unfortunately this patch doesn't improve the behavior.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/66

------------------------------------------------------------------------
On 2014-06-11T11:35:13+00:00 Maf-2 wrote:

A change somewhere in between 3.14 and 3.15 makes me hit this bug
*almost* reliably. Bisecting it took me half a day and ended up pointing
at commit [78f2975eec9faff353a6194e854d3d39907bab68 drm/i915]: Move all
ring resets before setting the HWS page. As the title is the same as a
patch posted here earlier, I suppose it is the exact same patch? It
seems like what was meant to be a solution to the problem, actually
makes it much worse (and maybe helps to find the root cause of it).

If there's anything else I can do, just let me know.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/67

------------------------------------------------------------------------
On 2014-06-17T13:19:19+00:00 Info-aloisnespor wrote:

It looks like I have the same problem. After upgrading the kernel to
3.15 / 3.15.1, the following error appears in dmesg after suspend:

[   31.496713] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 000009c0 tail 00000000 start 000fd000

[   31.591596] PM: Device 0000:00:02.0 failed to resume async: error -5


I have G45 - X4500MHD, mesa 10.2.1, xf86-video-intel 2.99.912, libdrm 2.4.54.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/68

------------------------------------------------------------------------
On 2014-06-24T11:52:35+00:00 Chris Wilson wrote:

Fwiw, http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug76554 has
everything we have tried so far.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/69

------------------------------------------------------------------------
On 2014-06-25T06:47:04+00:00 Chris Wilson wrote:

Here's something you can try on top of that branch:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9ee4ab306134..4f3397f87152 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -539,13 +539,13 @@ reset:
                        goto reset;
 
                DRM_ERROR("%s initialization failed "
-                         "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
+                         "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx], fudging\n",
                          ring->name,
                          I915_READ_CTL(ring), I915_READ_CTL(ring) & RING_VALID,
                          I915_READ_HEAD(ring), I915_READ_TAIL(ring),
                          I915_READ_START(ring), (unsigned long)i915_gem_obj_ggtt_offset(obj));
-               ret = -EIO;
-               goto out;
+
+               ring->write_tail(ring, I915_READ_HEAD(ring) & HEAD_ADDR);
        }
 
        if (!drm_core_check_feature(ring->dev, DRIVER_MODESET))


The idea is to ignore the failure and see if we can program the GPU anyway.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/70

------------------------------------------------------------------------
On 2014-06-27T13:58:51+00:00 Jikos wrote:

(In reply to comment #70)
> Here's something you can try on top of that branch:
[ ... snip ... ]
> The idea is to ignore the failure and see if we can program the GPU anyway.

This made things much worse.

X comes back after resume (i.e. the windows get drawn exactly the same
way they were laid out during suspend), but afterwards, the system is
completely dead. Even ctrl-alt-backspace doesn't kill the X session, and
it's not possible to switch to a text console.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/71

------------------------------------------------------------------------
On 2014-07-08T13:56:42+00:00 Simon Kalteis wrote:

Hi, I am currently on 3.16-rc4 and the GPU gets disabled right on load
of the i915 module, no need to suspend/wake :-(

Xorg seems to draw fine - unaccelerated though, and xv is not working
either (as one would expect).

The system is a Lenovo T500 with a GM45 chipset. If I can somehow help
debug this by providing logs let me know...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1339939/comments/72


** Changed in: linux
       Status: Unknown => Incomplete

** Changed in: linux
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1339939

Title:
  [Lenovo ThinkPad T400] intel graphics fail after suspend with 3.15
  kernel

Status in The Linux Kernel:
  Incomplete
Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  After resuming from suspend, I find gnome-shell has crashed. If I try
  to restart gnome-shell or gdm after this all I get is the following
  message. Everything does work fine on initial boot up until I suspend.

  intel_do_flush_locked failed: Invalid argument

  Booting on the 3.13 kernel from trusty, this problem does not occur.

  I also tried to test the utopic daily Ubuntu GNOME image; this fails
  to even boot to gdm unless I set nomodeset on the kernel line. I have
  never needed to do this in the past though.

  ProblemType: Bug
  DistroRelease: Ubuntu 14.10
  Package: linux-image-3.15.0-6-generic 3.15.0-6.11
  ProcVersionSignature: Ubuntu 3.15.0-6.11-generic 3.15.0
  Uname: Linux 3.15.0-6-generic x86_64
  ApportVersion: 2.14.4-0ubuntu1
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  darkness   2770 F.... pulseaudio
  CurrentDesktop: GNOME
  Date: Thu Jul 10 10:04:38 2014
  HibernationDevice: RESUME=UUID=80ccdbf4-2eec-47cd-aac0-5d90ab898481
  MachineType: LENOVO 2764CTO
  PccardctlIdent:
   Socket 0:
     no product info available
  PccardctlStatus:
   Socket 0:
     no card
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.15.0-6-generic root=UUID=70836e80-816b-424b-8aaf-6a0d2863fa50 ro quiet splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-3.15.0-6-generic N/A
   linux-backports-modules-3.15.0-6-generic  N/A
   linux-firmware                            1.132
  SourcePackage: linux
  UpgradeStatus: Upgraded to utopic on 2014-01-05 (185 days ago)
  dmi.bios.date: 02/13/2009
  dmi.bios.vendor: LENOVO
  dmi.bios.version: 7UET61WW (2.07 )
  dmi.board.name: 2764CTO
  dmi.board.vendor: LENOVO
  dmi.board.version: Not Available
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: Not Available
  dmi.modalias: dmi:bvnLENOVO:bvr7UET61WW(2.07):bd02/13/2009:svnLENOVO:pn2764CTO:pvrThinkPadT400:rvnLENOVO:rn2764CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
  dmi.product.name: 2764CTO
  dmi.product.version: ThinkPad T400
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1339939/+subscriptions


References