
kernel-packages team mailing list archive

[Bug 1386695] Re: [3.16.0-23] Resume from suspend/hibernation, GPU lock - possible regression

 

Launchpad has imported 14 comments from the remote bug at
https://bugs.freedesktop.org/show_bug.cgi?id=81136.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2014-07-10T00:35:07+00:00 Agustin-6 wrote:

Created attachment 102508
Logs for git kernel

After a suspend I get messages such as these:

<3>[   39.550435] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH TLB flush idle timeout fail
<3>[   39.550435] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_STATUS  : 0x01000001 BUSY ROP
<3>[   39.550435] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS0: 0x00000000
<3>[   39.550435] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS1: 0x00000000
<3>[   39.550435] nouveau E[  PGRAPH][0000:01:00.0] PGRAPH_VSTATUS2: 0x00200000 ROP
<3>[   41.685486] nouveau E[     DRM] GPU lockup - switching to software fbcon

And if I was running X, it crashes and the screen ends up looking like
this: http://imgur.com/a/D3VKw

This is always reproducible but only since Linux 3.15, so I ran a git
bisect. The first bad commit is
[ecf24de071f4f6cea79ecef5d990794df5875ee1] drm/nouveau: fix fbcon not
being accelerated after suspend. After reverting the commit the machine
resumes properly.

The issue persists in drm-nouveau-next (last commit 0b4e8e7... from Jul
8), even if I boot with noaccel=1 nofbaccel=1.

Relevant IRC logs:
http://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&highlight_names=Nitsuga&date=2014-07-09

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/0

------------------------------------------------------------------------
On 2014-07-10T00:35:42+00:00 Agustin-6 wrote:

Created attachment 102509
Logs for git kernel with noaccel=1 nofbaccel=1

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/1

------------------------------------------------------------------------
On 2014-07-10T00:36:16+00:00 Agustin-6 wrote:

Created attachment 102510
Logs for Linux 3.15.4 with commit ecf24de reverted

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/2

------------------------------------------------------------------------
On 2014-08-10T07:24:58+00:00 Dr4-bugzilla wrote:

I've experienced a similar issue when resuming from suspend-to-RAM.
The screen was blank, and dmesg showed several kernel messages from the
nouveau module. I'm running Linux 3.16.0 (gentoo-sources package from
Gentoo) with xorg-server 1.16.0, x11-drivers/xf86-video-nouveau-1.10.0-r1
and libdrm-2.4.54.

I will attach part of /var/log/messages with the nouveau errors.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/3

------------------------------------------------------------------------
On 2014-08-10T07:27:58+00:00 Dr4-bugzilla wrote:

Created attachment 104373
kernel messages during wake-up from resume

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/4

------------------------------------------------------------------------
On 2014-08-10T09:19:26+00:00 Dr4-bugzilla wrote:

Forgot to mention my card model:

# lspci -v|fgrep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation G84 [GeForce 8600 GT] (rev a1) (prog-if 00 [VGA controller])

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/5

------------------------------------------------------------------------
On 2014-08-11T04:15:14+00:00 Sven Joachim wrote:

Same problem here on NV86 [GeForce 8500 GT], reverting commit
ecf24de071f4f6cea79ecef5d990794df5875ee1 in 3.16.0 helps.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/6

------------------------------------------------------------------------
On 2014-08-16T05:02:00+00:00 Agustin-6 wrote:

Update: I got tired of reverting the ecf24de commit on every linux
update, so I tried booting with nouveau.nofbaccel=1 (instead of
nofbaccel=1). It works fine. The system still does not resume properly
without it on Linux v3.16.1, but that boot option is a better workaround
than reverting.
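A minimal sketch for confirming that the workaround is actually active on the running kernel. Module options given on the kernel command line take the module.option=value form, which is consistent with nouveau.nofbaccel=1 working where the bare nofbaccel=1 did not. The helper name has_kernel_param is hypothetical; it only performs a whole-word match against a command-line string such as the contents of /proc/cmdline.

```shell
# Hypothetical helper: succeed if PARAM appears as a whole, space-separated
# word in CMDLINE (for example the contents of /proc/cmdline).
# Usage: has_kernel_param CMDLINE PARAM
has_kernel_param() {
    # Pad with spaces so the first and last words can match, and so that
    # "nouveau.nofbaccel=1" does not match "nouveau.nofbaccel=10".
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example: has_kernel_param "$(cat /proc/cmdline)" nouveau.nofbaccel=1 && echo "nofbaccel workaround active". The match is exact on purpose, so the unprefixed nofbaccel=1 spelling is not reported as the workaround.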

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/7

------------------------------------------------------------------------
On 2014-10-25T21:30:59+00:00 Freedesktop-7 wrote:

Same issue. Dell M4800 with QHD+ display -- NVIDIA Corporation GK106GLM
[Quadro K2100M] (rev a1), 3.16.6-gentoo (I tried 3.17, but that didn't
even give me a usable display).

None of the workarounds were effective for me: nouveau.nofbaccel=1
causes suspend to fail, and so does reverting
ecf24de071f4f6cea79ecef5d990794df5875ee1:

   A dependency job for suspend.target failed. See 'journalctl -xn' for details.
   ...
   Oct 25 15:21:16 hostname kernel: WARNING: CPU: 0 PID: 2852 at lib/iomap.c:43 bad_io_access+0x36/0x38()
   Oct 25 15:21:16 hostname kernel: Bad IO access at port 0x24 (outl(val,port))

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/8

------------------------------------------------------------------------
On 2014-10-27T22:41:17+00:00 Agustin-6 wrote:

Another update:

I tried running with a secondary monitor. Unfortunately, in that setup
the nouveau.nofbaccel=1 workaround no longer cuts it: only one monitor
works after resume. Unplugging and replugging the monitor, or using
xrandr, after this has happened doesn't bring the other monitor back,
and once it even left me with no screen at all. I found some new kernel
messages, in particular:

<6>[    0.336621] nouveau  [     PFB][0000:01:00.0] RAM type: GDDR3
<6>[    0.336623] nouveau  [     PFB][0000:01:00.0] RAM size: 512 MiB
<3>[    0.336620] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000007 FAULT at 0x00e180
--snip--
<6>[    0.365519] nouveau  [     DRM] VRAM: 512 MiB
<6>[    0.365521] nouveau  [     DRM] GART: 1048576 MiB
--snip--
<3>[    0.366886] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
<3>[    0.368257] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070

right after nouveau loads, another:

<6>[   75.935933] nouveau  [     DRM] suspending console...
<6>[   75.935944] nouveau  [     DRM] suspending display...
<6>[   75.936012] nouveau  [     DRM] evicting buffers...
<6>[   76.206568] nouveau  [     DRM] waiting for kernel channels to go idle...
<6>[   76.206573] nouveau  [     DRM] suspending client object trees...
<6>[   76.207261] nouveau  [     DRM] suspending kernel object tree...
<3>[   76.267516] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070

immediately before suspend and:

<6>[   78.110864] nouveau  [     DRM] re-enabling device...
<6>[   78.110870] nouveau  [     DRM] resuming kernel object tree...
<6>[   78.110882] nouveau  [   VBIOS][0000:01:00.0] running init tables
<3>[   78.200040] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e074
<6>[   78.274292] nouveau  [    VOLT][0000:01:00.0] GPU voltage: 1000000uv
<6>[   78.274303] nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
<6>[   78.274378] nouveau  [     CLK][0000:01:00.0] --: core 399 MHz shader 810 MHz memory 499 MHz
<3>[   78.275977] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
<3>[   78.277301] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
<6>[   78.277474] nouveau  [     DRM] resuming client object trees...
<6>[   78.277902] nouveau  [     DRM] resuming display...

on resume. Maybe this is another bug?

So now I'm using linux-lts 3.14.22. No problems there: suspend and
multi-monitor setups work great.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/9

------------------------------------------------------------------------
On 2014-10-28T01:59:56+00:00 Emil-l-velikov wrote:

Ed,
Considering that the workarounds mentioned do not work in your case and that you have a different card (reporter has nv92, while yours is gk106), we can safely conclude that you're having a different issue.
Please open another bug report and let us know if it is a regression, and if so which commit broke it.


Agustín,
These two should be non-fatal, and the fix for them is in 3.18. It should end up in 3.16 and 3.17 as well.
> FAULT at 0x00e070
> FAULT at 0x00e074

Now this one, I have no idea. Do you get this error with 3.14 and dual monitors?
> FAULT at 0x00e180

Linux 3.17 includes quite a few fixes in the area of suspend/resume;
can you give it a try?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/10

------------------------------------------------------------------------
On 2014-10-28T02:41:06+00:00 Agustin-6 wrote:

On 3.14.22 I get no FAULTs and suspend works fine.

On Linux 3.17.1 I get FAULTs at 0x00e070 and 0x00e074 on boot, suspend, resume, and when plugging in the second monitor for the first time. However, I can't reproduce the FAULT at 0x00e180 at all. Checking the logs, it looks quite rare (roughly one in every twenty FAULTs) and unrelated to the second monitor.
If I use nouveau.nofbaccel=1, only one of the monitors comes back after resume. If I don't, I get the garbled display, 'GPU lockup' and PGRAPH errors as in the original post.

I'm downloading linux mainline now to test.
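The rarity estimate above comes from reading the logs by eye. As a sketch, assuming the kernel log lines have the format quoted earlier in this thread (the faulting address is the last field of each "MMIO write of ... FAULT at ..." line), awk can tally the FAULTs per address. The function name count_mmio_faults is hypothetical.

```shell
# Hypothetical helper: read kernel log text on stdin and print one
# "<address> <count>" line per distinct nouveau MMIO FAULT address.
# Assumes lines like:
#   ... nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00e070
# where the faulting address is the last whitespace-separated field.
count_mmio_faults() {
    awk '/MMIO .* FAULT at / { count[$NF]++ }
         END { for (addr in count) print addr, count[addr] }' |
        LC_ALL=C sort    # awk's "for (k in a)" order is unspecified
}
```

For example, dmesg | count_mmio_faults on the live system, or count_mmio_faults < /var/log/messages where the distribution logs kernel messages there.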

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/11

------------------------------------------------------------------------
On 2014-10-28T03:07:07+00:00 Emil-l-velikov wrote:

The upstream commit addressing the e07{0,4} messages (ignore the typo in the commit message) is 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/nouveau?id=b485a7005faba38286bc02ab1d80e2cbf61c1002

^^ is just in case 3.18 causes some other unwanted behaviour.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/12

------------------------------------------------------------------------
On 2014-10-28T03:22:10+00:00 Agustin-6 wrote:

Brilliant, Linux 3.18-rc2 resumes both monitors with
nouveau.nofbaccel=1. :D

So it indeed was a different issue. The original GPU lockup bug is still
there, though.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386695/comments/13


** Changed in: linux
       Status: Unknown => Confirmed

** Changed in: linux
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1386695

Title:
  [3.16.0-23] Resume from suspend/hibernation, GPU lock - possible
  regression

Status in The Linux Kernel:
  Confirmed
Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  I'm testing the new development branch (Vivid Vervet), which currently
  has the Utopic kernel 3.16.0-23-generic installed (proposed
  repositories are enabled).

  This problem might be a regression, and it has been reported
  upstream:

  https://bugs.freedesktop.org/show_bug.cgi?id=81136

  Problem:

  When I try to resume from hibernation the screen hangs and the GPU is
  locked. It cannot load the graphics properly.

  Testing:

  Unloading or blacklisting the nouveau driver and then performing the
  following test seems to indicate that the problem is in nouveau:

  # echo platform > /sys/power/disk
  # echo devices > /sys/power/pm_test
  # echo disk > /sys/power/state 
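The three commands above can be wrapped in a small function that also resets pm_test afterwards, so a later real hibernation is not left in test mode. This is a sketch: the function name pm_test_hibernate and its directory argument are hypothetical (the argument exists only so the sequence can be tried against a scratch directory); on a real system it must run as root against /sys/power.

```shell
# Hypothetical helper wrapping the pm_test sequence from the bug report.
# SYSFS defaults to /sys/power; pass another directory only for dry runs.
pm_test_hibernate() {
    local sysfs=${1:-/sys/power}
    local status

    echo platform > "$sysfs/disk"    || return 1  # hibernation mode: platform method
    echo devices  > "$sysfs/pm_test" || return 1  # test up to suspending devices, then resume
    echo disk     > "$sysfs/state"                # start the (test) hibernation cycle
    status=$?

    echo none > "$sysfs/pm_test"                  # leave test mode again
    return "$status"
}
```

Run as root with no argument to test the real system. With pm_test set to devices, the kernel freezes processes and suspends devices, then resumes them without powering down, which is what lets the test isolate a driver such as nouveau.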

  Mainline:

  The problem seems to be fixed in the latest stable mainline kernel,
  currently 3.17.1-031701-generic (Utopic).

  ProblemType: Bug
  DistroRelease: Ubuntu 15.04
  Package: linux-image-3.16.0-23-generic 3.16.0-23.31 [modified: boot/vmlinuz-3.16.0-23-generic]
  ProcVersionSignature: Ubuntu 3.16.0-23.31-generic 3.16.4
  Uname: Linux 3.16.0-23-generic x86_64
  ApportVersion: 2.14.7-0ubuntu8
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC3:  nikos      1690 F.... pulseaudio
   /dev/snd/controlC1:  nikos      1690 F.... pulseaudio
   /dev/snd/controlC2:  nikos      1690 F.... pulseaudio
   /dev/snd/controlC0:  nikos      1690 F.... pulseaudio
  CurrentDesktop: Unity
  Date: Tue Oct 28 15:24:09 2014
  HibernationDevice: RESUME=UUID=f0688f3d-9938-4cb3-b79e-7c67f7593350
  InstallationDate: Installed on 2014-10-24 (4 days ago)
  InstallationMedia: Ubuntu 14.10 "Utopic Unicorn" - Release amd64 (20141022.1)
  IwConfig:
   eth0      no wireless extensions.
   
   lo        no wireless extensions.
  MachineType: MSI MS-7623
  ProcFB: 0 nouveaufb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.16.0-23-generic root=UUID=170bb898-7aa9-4678-8e80-58a445647f93 ro resume=/dev/disk/by-uuid/f0688f3d-9938-4cb3-b79e-7c67f7593350
  RelatedPackageVersions:
   linux-restricted-modules-3.16.0-23-generic N/A
   linux-backports-modules-3.16.0-23-generic  N/A
   linux-firmware                             1.138
  RfKill:
   
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 12/06/2010
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: V17.9
  dmi.board.asset.tag: To Be Filled By O.E.M.
  dmi.board.name: 880GMA-E45 (MS-7623)
  dmi.board.vendor: MSI
  dmi.board.version: 3.0
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: MSI
  dmi.chassis.version: 3.0
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrV17.9:bd12/06/2010:svnMSI:pnMS-7623:pvr3.0:rvnMSI:rn880GMA-E45(MS-7623):rvr3.0:cvnMSI:ct3:cvr3.0:
  dmi.product.name: MS-7623
  dmi.product.version: 3.0
  dmi.sys.vendor: MSI

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1386695/+subscriptions

