
kernel-packages team mailing list archive

[Bug 1158689] Re: 10de:0422 bringing up dash causes screen corruption on nouveau

 

Launchpad has imported 89 comments from the remote bug at
https://bugs.freedesktop.org/show_bug.cgi?id=58378.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2012-12-16T23:36:06+00:00 Henrique-ribeiro-dias wrote:

Created attachment 71610
stack trace

I have an NVIDIA GeForce 8400M G graphics card. I've been using the nouveau
driver for a long time without any problems. After upgrading the
kernel to version 3.7.0 I have a lot of issues. After logging in and
using the system for some time, the graphics become corrupted and
show up with mixed colors.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/0

------------------------------------------------------------------------
On 2012-12-16T23:37:57+00:00 Henrique-ribeiro-dias wrote:

Created attachment 71612
Screenshot

Screenshot showing the problem.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/1

------------------------------------------------------------------------
On 2012-12-17T17:04:44+00:00 Henrique-ribeiro-dias wrote:

Today's messages from dmesg:

[ 4115.879007] nouveau E[  PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 07ff00 warp 0, opcode 00000000 00000000
[ 4115.879007] nouveau  [  PGRAPH][0000:01:00.0]  TRAP
[ 4115.879007] nouveau E[  PGRAPH][0000:01:00.0] ch 5 [0x00077db000] subc 3 class 0x8297 mthd 0x1694 data 0x00010031

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/2

------------------------------------------------------------------------
On 2012-12-17T17:12:40+00:00 Henrique-ribeiro-dias wrote:

Created attachment 71674
my graphics are a mess.

my graphics are a mess.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/3

------------------------------------------------------------------------
On 2012-12-17T19:21:03+00:00 Henrique-ribeiro-dias wrote:

more dmesg messages:

[ 1123.476832] nouveau E[  PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 07ff00 warp 0, opcode ffffffff ffffffff
[ 1123.476839] nouveau  [  PGRAPH][0000:01:00.0]  TRAP
[ 1123.476844] nouveau E[  PGRAPH][0000:01:00.0] ch 6 [0x000765e000] subc 3 class 0x8297 mthd 0x1694 data 0x00010031

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/4

------------------------------------------------------------------------
On 2012-12-17T21:42:34+00:00 Henrique-ribeiro-dias wrote:

Created attachment 71698
Another screenshot

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/5

------------------------------------------------------------------------
On 2012-12-17T21:49:41+00:00 Henrique-ribeiro-dias wrote:

# lspci -nnvv

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86 [GeForce 8400M G] [10de:0428] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. Device [1462:3fe9]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	Region 5: I/O ports at cc00 [size=128]
	Expansion ROM at fe0e0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Kernel driver in use: nouveau

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/6

------------------------------------------------------------------------
On 2012-12-19T10:23:48+00:00 Henrique-ribeiro-dias wrote:

The problem persists with the 3.7.1 kernel.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/7

------------------------------------------------------------------------
On 2013-02-14T01:20:55+00:00 Nemasu wrote:

I am having the same problems post kernel version 3.7.0 with a GeForce
8800 GTS. Even glxgears will lock up.

I get a ton of these messages:
[   83.399004] nouveau  [   PFIFO][0000:01:00.0] CACHE_ERROR - Ch 2/3 Mthd 0x108c Data 0x2036652f

with the occasional:
[   83.418650] nouveau E[  PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[   83.418659] nouveau  [  PGRAPH][0000:01:00.0]  TRAP
[   83.418663] nouveau E[  PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000
[   83.418672] nouveau E[     PFB][0000:01:00.0] trapped read at 0x0000000000 on channel 0x00027948 PFIFO/PFIFO_READ/SEMAPHORE reason: DMAOBJ_LIMIT
[   83.431368] nouveau E[  PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[   83.431376] nouveau  [  PGRAPH][0000:01:00.0]  TRAP
[   83.431379] nouveau E[  PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000
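
When a log contains "a ton" of such messages, it can help to count them per engine tag (the bracketed PFIFO, PGRAPH, PFB field) before reading individual lines. A minimal sketch, assuming the line layout quoted above; the function name count_nouveau_engines is made up for illustration and is not part of any nouveau tooling:

```shell
#!/bin/sh
# Count nouveau kernel log lines per engine tag (the first bracketed field).
# Reads log text on stdin; prints "<engine> <count>" per engine.
# Hypothetical helper; it assumes the "nouveau E[ ENGINE][pci-address]" layout.
count_nouveau_engines() {
    sed -n 's/.*nouveau [E ]\[ *\([A-Za-z0-9]*\)\]\[.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample input: five of the lines quoted in the comment above.
summary=$(count_nouveau_engines <<'EOF'
[   83.399004] nouveau  [   PFIFO][0000:01:00.0] CACHE_ERROR - Ch 2/3 Mthd 0x108c Data 0x2036652f
[   83.418650] nouveau E[  PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[   83.418659] nouveau  [  PGRAPH][0000:01:00.0]  TRAP
[   83.418663] nouveau E[  PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000
[   83.418672] nouveau E[     PFB][0000:01:00.0] trapped read at 0x0000000000 on channel 0x00027948 PFIFO/PFIFO_READ/SEMAPHORE reason: DMAOBJ_LIMIT
EOF
)
printf '%s\n' "$summary"

# On a live system the kernel log would be piped in instead:
#   dmesg | count_nouveau_engines
```

Lines without a PCI address after the engine tag (for example the "DRM" lockup message seen elsewhere in this thread) are deliberately not counted by this pattern.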

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/8

------------------------------------------------------------------------
On 2013-02-26T13:36:05+00:00 Blackberryqueen wrote:

Same here with an nVidia GeForce 8400M G video card in an Acer Aspire 7520 G
laptop running Ubuntu 12.10 64-bit (AMD64). My first impression was a heat
problem due to dust, so I cleaned the laptop fan and refitted the
heatsink and heatpipes with new thermal (silver) contact paste, but the
video error recurs. With only two web pages open there is no problem.
On starting a YouTube video the screen is a mess, like Henrique Dias reported.

Is there a relation to the reported failures of the nVidia GeForce 8 series?
http://news.cnet.com/8301-13924_3-10037632-64.html


Carolien.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/9

------------------------------------------------------------------------
On 2013-03-26T14:48:10+00:00 Pauloedgarcastro wrote:

Hi.

I have exactly the same issue.
I seem to be able to trigger it faster by opening firefox on a page with many images.

Current Kernel: 3.8.3-103.fc17.x86_64
Other kernels affected:

kernel-3.7.9-104.fc17.x86_64
kernel-3.7.9-101.fc17.x86_64

01:00.0 VGA compatible controller: nVidia Corporation G86 [GeForce 8300 GS] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: nVidia Corporation Device 0494
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        Region 5: I/O ports at df00 [size=128]
        [virtual] Expansion ROM at fb000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM unknown, Latency L0 <512ns, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/18

------------------------------------------------------------------------
On 2013-04-24T10:01:53+00:00 Pauloedgarcastro wrote:

After further investigation, this issue only seems to affect applications using the GTK libraries,
in my case at least.

After triggering the bug, any app that uses the GTK libraries is affected.
It does not seem to affect the rendering of other apps (those not using GTK).

Also, the same issue doesn't happen with the NVIDIA drivers,
which I cannot use because they make my system
unusably slow.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/22

------------------------------------------------------------------------
On 2013-05-28T20:03:52+00:00 Davebjorkl wrote:

Hello!

New to Ubuntu. I have an old Acer 5520G with the exact same problem you
are describing in the comments above. I also thought it was a heat
problem and found a lot of dust in the graphics card's fan. My computer
completely locks up and I am unable to even log in or open a terminal
at the login screen after the first glitch.

Dave

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/29

------------------------------------------------------------------------
On 2013-11-28T11:17:49+00:00 Torsten-stocklossa-g wrote:

Hi, same here after updating to Ubuntu 12.04.3

Kernel 3.8.0-33-generic


lspci -nnvv says:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86M [GeForce 8400M G] [10de:0428] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Fujitsu Limited. Device [10cf:1422]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
	Region 5: I/O ports at 2000 [size=128]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nouveau, nvidiafb

The graphics become distorted, and once that happens the system is frozen (with some
luck I may reach a terminal).

Before it happens, the font color in window frames changes to "white on white", i.e. the same as the background color.
I run an E8410 Lifebook.

BTW: using the Nvidia proprietary drivers is not an option; they made
the system completely unusable and forced me to reinstall several times.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/30

------------------------------------------------------------------------
On 2013-11-29T10:48:58+00:00 Torsten-stocklossa-g wrote:

Hi again, here are some additional error messages:

Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [   66.215782] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [   66.304180] nouveau E[  PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000004 warp 10, opcode ffb9c1d8 ffbac2d9
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [   66.304188] nouveau E[  PGRAPH][0000:01:00.0]  TRAP
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [   66.304193] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [   66.304477] nouveau E[  PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000004 warp 10, opcode ffb9c1d8 ffbac2d9
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [   66.304483] nouveau E[  PGRAPH][0000:01:00.0]  TRAP
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [   66.304487] nouveau E[  PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000


and 

Nov 29 11:26:04 torsten-LIFEBOOK-E8410 kernel: [  323.106306] nouveau E[     DRM] GPU lockup - switching to software fbcon
Nov 29 11:27:07 torsten-LIFEBOOK-E8410 kernel: [  386.736037] nouveau E[    3431] failed to idle channel 0xcccc0001
Nov 29 11:27:09 torsten-LIFEBOOK-E8410 kernel: [  388.735098] nouveau E[   PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:27:12 torsten-LIFEBOOK-E8410 kernel: [  391.732025] nouveau E[    3431] failed to idle channel 0xcccc0000
Nov 29 11:27:14 torsten-LIFEBOOK-E8410 kernel: [  393.731221] nouveau E[   PFIFO][0000:01:00.0] channel 2 unload timeout
Nov 29 11:28:09 torsten-LIFEBOOK-E8410 kernel: [  448.580025] nouveau E[    4056] failed to idle channel 0xcccc0001
Nov 29 11:28:11 torsten-LIFEBOOK-E8410 kernel: [  450.579162] nouveau E[   PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:28:14 torsten-LIFEBOOK-E8410 kernel: [  453.576022] nouveau E[    4056] failed to idle channel 0xcccc0000
Nov 29 11:28:16 torsten-LIFEBOOK-E8410 kernel: [  455.575198] nouveau E[   PFIFO][0000:01:00.0] channel 2 unload timeout
Nov 29 11:29:17 torsten-LIFEBOOK-E8410 kernel: [  516.552036] nouveau E[    4211] failed to idle channel 0xcccc0001
Nov 29 11:29:19 torsten-LIFEBOOK-E8410 kernel: [  518.553893] nouveau E[   PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:29:22 torsten-LIFEBOOK-E8410 kernel: [  521.556024] nouveau E[    4211] failed to idle channel 0xcccc0000
Nov 29 11:29:24 torsten-LIFEBOOK-E8410 kernel: [  523.555077] nouveau E[   PFIFO][0000:01:00.0] channel 2 unload timeout


For both, the session is Gnome. Now, when running Gnome (no effects), it is slightly more stable.

As mentioned I also tried NVIDIA drivers .... with the effect that the
system was unusable at all.

Since the issue seems to be quite old . . . there should be an
appropriate solution by now !

cheers
TS

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/31

------------------------------------------------------------------------
On 2013-12-12T14:00:02+00:00 Torsten-stocklossa-g wrote:

Hi,
I wonder if this is still alive? Any news on this?

cheers
T

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/33

------------------------------------------------------------------------
On 2013-12-12T14:11:23+00:00 Ilia Mirkin wrote:

Messing with priority just annoys the developers.

In the meantime, try new kernels. I only see up to 3.8 tested. Do a
bisect. There was a major driver rewrite in 3.7, but it might have been
something else that causes the issue. Make sure you're running an
updated DDX.

As you might imagine, none of the devs are seeing this, so you'll have
to do the debugging if you want it fixed.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/34

------------------------------------------------------------------------
On 2013-12-13T12:19:43+00:00 Awl1 wrote:

Created attachment 90715
Distorted graphics with RHEL6/OL6 showing uname -a kernel 3.12.4

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/35

------------------------------------------------------------------------
On 2013-12-13T12:20:52+00:00 Awl1 wrote:

Created attachment 90717
Distorted graphics: Icons (on kernel 3.12.4)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/36

------------------------------------------------------------------------
On 2013-12-13T12:45:32+00:00 Awl1 wrote:

Hello,

I would like to join discussions in this bug, as I have found myself
affected after the recent update from Red Hat Enterprise Linux/Oracle
Linux 6.4 (stock RHEL kernel 2.6.32-358.23.2) to RHEL/OL 6.5 (RHEL
kernel 2.6.32-431).

My graphics card is NVidia Quadro NVS 130M:
BOOT0  : 0x086a00a2
Chipset: G86 (NV86)
Family : NV50

It seems that RHEL 6.5 kernel 2.6.32-431 has updated its kernel modules
for nouveau DRM to a codebase level that matches official Linux kernels
3.7, and therefore introduced this severe graphics distortion issue into
mainline RHEL 6.

In order to verify that it is indeed the nouveau DRM kernel module that is
responsible for the distortion, I have upgraded my OL6 packages to the
following versions:

* mesa-9.2.0.5 (including support for nouveau, which is commented out by default in RHEL6)
* libdrm-2.4.50
* xorg-x11-drv-nouveau-1.0.9

but this does NOT affect the issue at all.

But reverting back to RHEL stock kernel 2.6.32-358.23.2 makes the issue
vanish, also when using the above updated library versions.

I then tried Oracle's UEK kernels, and while the current UEK2 kernel
(2.6.39-400.211.2) does NOT have the issue, the current UEK3 kernel
(3.8.13-16.2.2) also shows it.

I then tried to find out about the exact "versions" (git commit levels?)
of the nouveau libdrm modules, and found out the following:

(1) Oracle UEK2 kernel 2.6.39-400.211.2 - NO ISSUE:
[drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0 on minor 0

(2) RHEL stock kernel 2.6.32-358.23.2 - NO ISSUE:
[drm] Initialized nouveau 1.0.0 20120316 for 0000:01:00.0 on minor 0

(3) RHEL stock kernel 2.6.32-431 - DOES SHOW THE ISSUE:
[drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0

(4) more recent kernels, such as Oracle UEK3 (3.8.13-16.2.2) and the most recent Oracle "playground" kernel from public-yum.oracle.com (3.12.4-3.12.y.20131210) all DO SHOW THE ISSUE:
[drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0

So to me it now seems as if the issue has been introduced with the
massive changes to nouveau/DRM that went into 3.7:

http://www.phoronix.com/scan.php?page=news_item&px=MTE1NDg

and affects ALL subsequent versions since then... :-(
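
The "[drm] Initialized nouveau ..." lines compared above can be pulled out of a kernel log mechanically. A minimal sketch, assuming the exact line format quoted in this comment; the function name nouveau_drm_version is hypothetical:

```shell
#!/bin/sh
# Extract the nouveau DRM version and date stamp from kernel log lines.
# Reads log text on stdin; prints "<version> <date>" per matching line.
# Hypothetical helper, written against the line format quoted in this thread.
nouveau_drm_version() {
    sed -n 's/.*\[drm\] Initialized nouveau \([0-9.]*\) \([0-9]*\) for .*/\1 \2/p'
}

# Sample input: the four lines quoted in the comment above.
versions=$(nouveau_drm_version <<'EOF'
[drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0 on minor 0
[drm] Initialized nouveau 1.0.0 20120316 for 0000:01:00.0 on minor 0
[drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0
[drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0
EOF
)
printf '%s\n' "$versions"

# On a live system the kernel log would be piped in instead:
#   dmesg | nouveau_drm_version
```

Note that the date stamp alone does not separate the last two entries (both report 20120801), so the version number is the more useful field here.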

I would be very interested and willing to help in debugging/tracking
this down, but I don't have any git background, so you would have to
guide me through how to do the "bisect"...

Hope this helps & looking forward to your feedback! :-)

Best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/37

------------------------------------------------------------------------
On 2013-12-13T13:25:20+00:00 Awl1 wrote:

I had left out my "lspci -nnvv" information:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86M [Quadro NVS 130M] [10de:042a] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Toshiba America Info Systems Device [1179:0002]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	Region 5: I/O ports at cf00 [size=128]
	[virtual] Expansion ROM at fc000000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau, nvidiafb

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/38

------------------------------------------------------------------------
On 2013-12-13T13:53:47+00:00 Ilia Mirkin wrote:

(a) Can we see a full boot log (e.g. output of dmesg) with a recent
kernel? Ideally it would include the time that the visual issues happen.

(b) This looks like it could be a fencing issue, i.e. we try to draw to
a texture, but then instead of waiting, we don't wait. There were some
fixes that went into 3.13-rc1, so perhaps trying the latest and greatest
(e.g. 3.13-rc3, or the latest Linus HEAD) would be good to test out.

(c) There are many bisection guides on the internet. You will also need
to figure out how to make the compiled kernel play nice with your
distribution. The basics are simple though:

1. git bisect start v3.7 v3.6 -- drivers/gpu/drm/nouveau
2. build/install/boot/test
3. if it's good, "git bisect good", if it's bad, "git bisect bad"
4. goto 2

At some point running the step 3 command will tell you "first bad commit
is xyz". That's when you're done. I suspect it might be the giant mega
"rewrite nouveau" commit, in which case we're screwed and this will have
been a huge time-waster (apologies in advance if it turns out this way).
But it might be one of the many other commits that went into 3.7, which
would be nice and indicate an area to focus on.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/39

------------------------------------------------------------------------
On 2013-12-13T16:57:44+00:00 Awl1 wrote:

Hello Ilia,

regarding (a) and (b): I am just waiting for a rpmbuild of an OL6
version of 3.13-rc3 to finish and will report back on my findings and
include a dmesg output from that version.

Regarding (c):

Wouldn't it make more sense to first rule out the "mega commit", rather
than starting with the 3.6 and 3.7 release tags?

Can you give me the git commands (or point me to a doc that tells me how
to produce them) for getting "ordinary kernel tarballs" out of the DRM
nouveau git just like the ones published on

https://www.kernel.org/pub/linux/kernel/v3.0/testing/

for two points in time in between 3.6 and 3.7:

(1) for the version up to the immediate commit BEFORE the "mega commit"
(2) for the version exactly matching the "mega commit"?

Using these two kernel tarballs, I could then either confirm or rule out
the "mega commit" as the root cause for the issue, and in the (unlikely)
case the mega commit can indeed be ruled out, I could then concentrate
on further narrowing down the commits

* either between 3.6 and the mega commit if build (1) is already broken
* or between the mega commit and 3.7 if build (2) still works, but 3.7 fails?

Sorry, but rather than pulling the whole git tree onto my poor old laptop and
starting a huge number of bisection attempts "into the blue", I think
that this makes more sense and does not require me to become a git
expert in order to try and help track this down... ;-)

What do you think?

I will report back shortly with my 3.13-rc3 results...

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/40

------------------------------------------------------------------------
On 2013-12-13T19:03:38+00:00 Ilia Mirkin wrote:

The mega-commit is ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69. So you
could check out ebb945a94bba^ and see if it works, and then test
ebb945a94bba to see if it doesn't. In either case, you could use those
as your new "good" or "bad" starting points.

You can do a clone with like --depth 1 or something. Not sure how to do
that at a commit. Also I'd recommend against it, it'll just be more
downloading later on if things don't pan out. A full git clone of the
linux kernel is ~800MB (+ space to actually store the files, but that's
all part of the 800MB). In fact, I don't even know if that 818MB is
compressed or not -- I'd guess not, so the download is probably much
smaller.
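
The "^" suffix used above names the parent of a commit, so "ebb945a94bba^" is the tree as it was immediately before the rewrite. A minimal sketch on a throwaway repository, where the two commits and the file driver.c are invented stand-ins for the real kernel history:

```shell
#!/bin/sh
# Toy demonstration: "<commit>^" names the parent of <commit>.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"

echo "old driver" > driver.c
git add driver.c
git commit -q -m "last commit before the rewrite"

echo "rewritten driver" > driver.c
git commit -q -a -m "rewrite"

rewrite=$(git rev-parse HEAD)          # stands in for ebb945a94bba
before=$(git rev-parse "$rewrite^")    # its parent, i.e. ebb945a94bba^

# Check out each test point in turn (detached HEAD), as one would before building.
git checkout -q "$before"
before_content=$(cat driver.c)
git checkout -q "$rewrite"
after_content=$(cat driver.c)

printf 'before: %s\nafter:  %s\n' "$before_content" "$after_content"
```

In a kernel checkout the same two "git checkout" calls would be followed by a build and boot of each tree; whichever result comes back can then be fed to "git bisect good" or "git bisect bad" as a new starting point.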

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/41

------------------------------------------------------------------------
On 2013-12-14T11:21:50+00:00 Awl1 wrote:

Created attachment 90764
dmesg output on 3.13-rc3 while the issue was seen

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/42

------------------------------------------------------------------------
On 2013-12-14T11:22:31+00:00 Awl1 wrote:

Created attachment 90765
dmesg output in debug mode (nouveau.debug=debug) on 3.13-rc3 while the issue was seen

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/43

------------------------------------------------------------------------
On 2013-12-14T11:30:05+00:00 Awl1 wrote:

Hi again,

sorry, it took longer than needed for me to find my way through
compiling recent kernels with rpmbuild and an appropriate spec file.

The result of my testing is negative: The bug is still included in the
most recent 3.13-rc3 kernel... :-(

From the attached dmesg output (which, in both cases, includes the time
when the issue was seen and my screen was completely garbled), it looks
to me that there are no signs - not even in debug mode - of anything
going wrong, so if I am right with this assumption, I think this
supports your theory that the root cause of the severe screen corruption
indeed is a "fencing" issue...

In the meantime, I have created a git repository on my machine and
produced two 3.6-based tarballs for before and after the "mega patch".

I will now move forward to adapt a 3.6 kernel rpmbuild spec file and
then build two kernels for these two snapshots.

I should be able to update you on my progress some time tomorrow...

Thanks & best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/44

------------------------------------------------------------------------
On 2013-12-14T17:36:05+00:00 Awl1 wrote:

Hmm - bad news once again:

I have now compiled and tested a 3.6 kernel matching the commit
immediately before the "mega commit", i.e. the kernel tarball was
produced by the following command:

$ git archive --format=tar "ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69^" |
bzip2 > ~/Projekte/nouveau_drm/linux-before-mega.tar.bz2

Unfortunately, I am unable to test whether the screen distortion issue
occurs with this kernel, because I get a complete hang (system freezes,
CPU and GPU fans running full speed) somewhere between some seconds and
some minutes after starting GNOME...

Note that I have seen both: either no screen corruption at all or first
slight signs of screen corruption (white rectangles around window
frames) at the times of the hangs.

The error messages that I find in /var/log/messages, probably associated
with the hangs (sorry, I can't get any messages out of dmesg due to the
hang...), seem to be the following:

[drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown

repeating between 3 and 5 times directly before the hangs
(immediately followed by /var/log/messages starting over with my
power-off machine restart).

Will now move forward to test with the most recent stock kernel from the
3.6 series: 3.6.11-3.6.y.20121225.ol6 from the Oracle public yum
playground to see whether this already is affected... :-(

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/45

------------------------------------------------------------------------
On 2013-12-14T17:58:51+00:00 Awl1 wrote:

This gets really interesting now:

Oracle public yum "playground" 3.6.11-3.6.y.20121225.ol6 (should be
stock kernel 3.6.11) does NOT show any hangs, but DOES INDEED ALREADY
show the graphics corruption issue FOR ME (although it was thought by
the original posters here that it started with 3.7.0)...!?

So I will now try and move backwards in kernel versions until I might
find one that does not exhibit the corruption bug.

As Oracle's "playground" kernels are only available starting from 3.6, I
will probably move to ELRepo "ml" kernels for this job.

I'll report back once I have some idea of where exactly the issue indeed
started...

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/46

------------------------------------------------------------------------
On 2013-12-15T14:09:55+00:00 Awl1 wrote:

OK, finally I have some more encouraging news:

It now looks like the issue indeed started much earlier than initially
thought, namely already between the 3.4 and 3.5 kernel series!!!

Results from my testing with stock kernels obtained from kernel.org
(I've never ever before compiled so many kernels in such a short period
of time...):

* 3.4.5 -> NO ISSUE
* 3.4.74 -> NO ISSUE
* 3.5.1 -> ISSUE SEEN
* 3.5.5 -> ISSUE SEEN
* all later versions (3.6 onwards) -> ISSUE SEEN.
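
The results above bracket the regression between the newest good and the oldest bad release tested. A minimal sketch of picking those two out of a result list; the "good"/"bad" labels are my shorthand for "NO ISSUE"/"ISSUE SEEN", and "sort -V" is assumed to be the GNU coreutils version sort:

```shell
#!/bin/sh
set -eu
# Test results from the list above, one "<version> <verdict>" pair per line.
results='3.4.5 good
3.4.74 good
3.5.1 bad
3.5.5 bad'

# sort -V orders version strings numerically, so 3.4.74 sorts after 3.4.5.
newest_good=$(printf '%s\n' "$results" | awk '$2 == "good" { print $1 }' | sort -V | tail -n 1)
oldest_bad=$(printf '%s\n' "$results" | awk '$2 == "bad" { print $1 }' | sort -V | head -n 1)

echo "newest good: $newest_good"
echo "oldest bad:  $oldest_bad"
```

Stable point releases such as 3.4.74 and 3.5.1 live on separate stable branches, so for a bisect the mainline tags v3.4 and v3.5 are the more natural endpoints.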

So please advise now what next steps I should undertake to track it down
more closely:

What new commits have happened between the 3.4 and 3.5 series, and did
one of them possibly affect so-called "fencing" on NV86/NV50 chips?

(And - in order to learn some more git - how can I find out the
associated commits using git command-line, such that I can produce the
respective kernel tarballs for testing out of git?)

Many thanks in advance for your feedback! :-)

Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/47

------------------------------------------------------------------------
On 2013-12-15T14:13:11+00:00 Awl1 wrote:

In addition, one more request to all the other people who raised this
issue here and/or have also seen it before myself:

Can you confirm that for you, too, the issue already started right
after the 3.4 series as it does for me, i.e. that you just never tried
any 3.5.x or 3.6.x kernels?

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/48

------------------------------------------------------------------------
On 2013-12-15T14:42:27+00:00 Ilia Mirkin wrote:

You really need to figure out how to do things inside the git tree and
not do some sort of crazy export. That will speed things up by an order
of magnitude.

To get the list of nouveau changes between 3.4 and 3.5:

git log v3.4..v3.5 -- drivers/gpu/drm/nouveau

To do a bisect between 3.4 and 3.5, same instructions as before, but use
v3.5 as the bad tag and v3.4 as the good tag.

Looking through the list of changes,
c420b2dc8dc3cdd507214f4df5c5f96f08812cbe stands out as a big one, as
does 5e120f6e4b3f35b741c5445dfc755f50128c3c44 which actually introduces
the nv84+ fence mechanism.

This had actually previously occurred to me, but a quick thing to try
out is to switch to the nv17 fence and see what happens. You can do this
by editing the logic in
drivers/gpu/drm/nouveau/nouveau_drm.c:nouveau_accel_init, and just
replace nv84_fence_create with nv50_fence_create (which will make a
nv50+ appropriate nv17 fence impl).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/49

------------------------------------------------------------------------
On 2013-12-15T15:19:35+00:00 Awl1 wrote:

Many thanks for your quick reply - even on a Sunday! :-)


Regarding:

"You really need to figure out how to do things inside the git tree and not do
some sort of crazy export. That will speed things up by an order of magnitude."

the main issue is that I need to build a RHEL6/OL6 compliant kernel on
my machine, and I simply don't have a spec file which properly builds
such a kernel from git, so I need to export the git snapshot to a
tarball.

In case you have such an RHEL6/OL6 spec file (or know where to get one
from), please let me know...


I'm just in the process of trying whether moving from nv84_fence_create to nv50_fence_create will make a difference with 3.6.11 and will report back later.

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/50

------------------------------------------------------------------------
On 2013-12-15T23:09:23+00:00 Awl1 wrote:

Created attachment 90812
/var/log/messages from 3.12.4 start attempt with NV50 fence

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/51

------------------------------------------------------------------------
On 2013-12-15T23:16:08+00:00 Awl1 wrote:

Bad news once again...

I applied the following single-line patch to a stock 3.12.4 kernel in
order to switch to the NV50 fence:

diff -Nrpu linux-3.12.4.orig/drivers/gpu/drm/nouveau/nouveau_drm.c linux-3.12.4/drivers/gpu/drm/nouveau/nouveau_drm.c
--- linux-3.12.4.orig/drivers/gpu/drm/nouveau/nouveau_drm.c	2013-12-08 17:18:58.000000000 +0100
+++ linux-3.12.4/drivers/gpu/drm/nouveau/nouveau_drm.c	2013-12-15 16:37:25.000000000 +0100
@@ -180,7 +180,7 @@ nouveau_accel_init(struct nouveau_drm *d
 	else if (device->chipset   <  0x17) ret = nv10_fence_create(drm);
 	else if (device->card_type < NV_50) ret = nv17_fence_create(drm);
 	else if (device->chipset   <  0x84) ret = nv50_fence_create(drm);
-	else if (device->card_type < NV_C0) ret = nv84_fence_create(drm);
+	else if (device->card_type < NV_C0) ret = nv50_fence_create(drm);
 	else                                ret = nvc0_fence_create(drm);
 	if (ret) {
 		NV_ERROR(drm, "failed to initialise sync subsystem, %d\n", ret);

but the result is that after the GUI login screen (gdm) which works
fine, I get a complete hang when GNOME starts up using compiz (cannot
even switch to a text vt any more) and lots of the following output:

Dec 15 23:23:05 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 0 mthd 0x0018 data 0x00000002
Dec 15 23:23:05 aloew-lap kernel: nouveau E[     PFB][0000:01:00.0] trapped write at 0x0000000114 on channel 0x0000f949 [unknown] PFIFO/PFIFO_READ/SEMAPHORE reason: PT_NOT_PRESENT
Dec 15 23:23:06 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 2 mthd 0x0860 data 0x6f000000
Dec 15 23:23:06 aloew-lap kernel: nouveau E[     PFB][0000:01:00.0] trapped write at 0x0000000114 on channel 0x0000f949 [unknown] PFIFO/PFIFO_READ/SEMAPHORE reason: PT_NOT_PRESENT
Dec 15 23:23:06 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 2 mthd 0x0860 data 0x72000000
Dec 15 23:23:06 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 2 mthd 0x0860 data 0x76000000
Dec 15 23:23:06 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 2 mthd 0x0860 data 0x74000000
Dec 15 23:23:06 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 2 mthd 0x0860 data 0x6f000000
Dec 15 23:23:06 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 2 mthd 0x0860 data 0x60000000
Dec 15 23:23:06 aloew-lap kernel: nouveau E[   PFIFO][0000:01:00.0] CACHE_ERROR - ch 4 [compiz[4637]] subc 2 mthd 0x0860 data 0x41000000
(...)

(see the attached bz2 for the full log).

So does this mean that your proposal of switching to the nv50_fence
won't work for me?

In the meantime, I will continue and try kernel builds based on commits
"5e120f6e4b3f35b741c5445dfc755f50128c3c44^" and
"5e120f6e4b3f35b741c5445dfc755f50128c3c44" tomorrow...

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/52

------------------------------------------------------------------------
On 2013-12-15T23:44:21+00:00 Awl1 wrote:

Follow-up question now that I am digging deeper into git... ;-)

From the commits in the v3.4..v3.5 range, only two of them:

9bd0c15fcfb42f6245447c53347d65ad9e72080b (dated Jun 26, 2012) and
e9bf5f36b09f8ec6c168ef58ee7d4890545ede1c (dated Jun 27)

when looking at the global Makefile:

git show <commit-sha1>:Makefile

have been done on 3.5.0-rc4 version of the kernel.

All other commits in this range had been done on 3.4.0 and less:

35916acedd8dadb361ef6439d05d60fbe8f53032 (dated May 31)

and all earlier commits have been done on 3.4.0 and its rc builds.

As the issue is NOT present in the 3.4.x series anyway, I assume that
only the two commits on 3.5.0-rc4 above (if any) from this interval are
relevant, and we rather need to look at the subsequent v3.5..v3.6
range!?

Am I correct (and please bear with me in case I got it wrong - I had
never used git before looking into this...)?

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/53

------------------------------------------------------------------------
On 2013-12-15T23:51:11+00:00 Ilia Mirkin wrote:

(In reply to comment #35)
> Follow-up question now that I am digging deeper into git... ;-)
> 
> From the commits in the v3.4..v3.5 range, only two of them:
> 
> 9bd0c15fcfb42f6245447c53347d65ad9e72080b (dated Jun 26, 2012) and
> e9bf5f36b09f8ec6c168ef58ee7d4890545ede1c (dated Jun 27)
> 
> when looking at the global Makefile:
> 
> git show <commit-sha1>:Makefile
> 
> have been done on 3.5.0-rc4 version of the kernel.

Most assuredly not the right way to look at it.

> 
> All other commits in this range had been done on 3.4.0 and less:
> 
> 35916acedd8dadb361ef6439d05d60fbe8f53032 (dated May 31)

And of course date of the commit has nothing to do with anything either.

Think about the branched development model. Let's say I do some work,
basing my work on, say, 2.6.0. I spend a lot of time on it. Then I send
a pull request to Linus (or whoever). He merges it. When looking at my
commits, you might think that you're looking at a 2.6.0 kernel, based on
the Makefile. And in a large sense you are. But in reality the commits
were merged into some much later release. Same with dates.

You can either read about git and fully understand it, or you can kinda
trust that the tools aren't lying to you when you ask for a bisect in a
range, or a log of commits between two revisions.
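
As an illustration of the point above, here is a minimal Python sketch of
what a "v3.4..v3.5" range means on a branched history. The graph and all
commit labels in it are made up for the example; only the idea (a range is
"reachable from the bad tag but not from the good tag", regardless of
commit dates or of what the Makefile says) is taken from the discussion.

```python
# Toy commit graph: each commit maps to its list of parents.
# All names here are hypothetical labels, not real kernel commits.
PARENTS = {
    "base": [],
    "v3.4": ["base"],             # mainline release
    "topic1": ["base"],           # driver work started on the older base
    "topic2": ["topic1"],
    "merge": ["v3.4", "topic2"],  # mainline merges the driver branch
    "v3.5": ["merge"],            # next mainline release
}


def reachable(parents, tip):
    """Return every commit reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def commits_in_range(parents, good, bad):
    """Model of a 'good..bad' range: reachable from bad, but not from good."""
    return reachable(parents, bad) - reachable(parents, good)


if __name__ == "__main__":
    print(sorted(commits_in_range(PARENTS, "v3.4", "v3.5")))
```

In this toy graph the two "topic" commits were written on top of a base
older than v3.4, yet they belong to the v3.4..v3.5 range because they only
became part of mainline through the later merge.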

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/54

------------------------------------------------------------------------
On 2013-12-16T13:02:11+00:00 Awl1 wrote:

Hello again, Ilia,

ok, I see - did some further reading and I think I now fully understand
the way it works:

This also means that you are NOT regularly pulling the updates from
Linus' central git into the nouveau git, but typically only do this ONCE
after Linus released a new version (here: 3.4.0) and then NOT for any
minor subsequent release by Linus (3.4.1, 3.4.2 and so on), but ONLY
shortly before he opens the "rc" pull window for his next release series
(here: 3.5-rc1).

So it indeed looks as if all the local commits on the nouveau git have
been made on a 3.4.0 kernel, although they ended up in the official 3.5
version released by Linus.


Based on this, I did further testing:

Both "5e120f6e4b3f35b741c5445dfc755f50128c3c44^" and
"5e120f6e4b3f35b741c5445dfc755f50128c3c44" do still run fine, i.e. the
commit 5e120f6e4b3f35b741c5445dfc755f50128c3c44 - which actually
introduced the nv84_fence - does NOT seem to be causing the distortion
issue.

I will now move forward (slowly, as I need to do the tarball-based
rpmbuild process), and keep you updated on my findings.


Also, I repeat my question to the other folks who had reported this issue before:

Can you confirm that you also already see the issue when you use any
stock 3.5.x or 3.6.x kernels, i.e. the issue did start long before 3.7.0
and the 3.4.x is the most recent release that works fine?

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/55

------------------------------------------------------------------------
On 2013-12-16T13:19:47+00:00 Tomwij-1 wrote:

(In reply to comment #37)
> This also means that you are NOT regularly pulling the updates from Linus'
> central git into the nouveau git, but typically only do this ONCE after
> Linus released a new version (here: 3.4.0)

That usually is done as often as necessary; if one does it more often,
it could lead to situations where you have pulled a new broken commit
that could slow down Nouveau development. And thus, pulling major
releases is efficient.

> and then NOT for any minor
> subsequent release by Linus (3.4.1, 3.4.2 and so on)

Note that those releases happen by Greg KH and consist of backported
patches.

> but ONLY shortly
> before he opens the "rc" pull window for his next release series (here:
> 3.5-rc1).

That would be one possible moment where the conditions are ideal enough
to pull.

Though I am in doubt whether it matters when this was pulled from Linus.
If you don't like to bisect the Nouveau development branch, you can
bisect kernel git.

> Based on this, I did further testing:
> 
> Both "5e120f6e4b3f35b741c5445dfc755f50128c3c44^" and
> "5e120f6e4b3f35b741c5445dfc755f50128c3c44" do still run fine, i.e. the
> commit 5e120f6e4b3f35b741c5445dfc755f50128c3c44 - which actually introduced
> the nv84_fence - does NOT seem to be causing the distortion issue.
> 
> I will now move forward (slowly, as I need to do the tarball-based rpmbuild
> process), and keep you updated on my findings.

You really want to be doing a git bisect to do the least amount of work;
I don't see what you mean by "move forward" but I really hope that you
are testing the commits in a binary tree style.

You can put the tarball-based process in a script so you only need to
run a single command after moving further in the bisection.
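
To show why testing "in a binary tree style" keeps the number of kernel
builds small, here is a minimal Python sketch of the search itself. It
assumes a simple linear list of commits and a hypothetical is_bad()
callback standing in for "build, boot, and look for corruption"; real git
history is a graph and git bisect handles that, so this only illustrates
the principle.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered oldest -> newest.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1   # lo: newest known good, hi: oldest known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid               # the regression is at mid or earlier
        else:
            lo = mid               # the regression is after mid
    return commits[hi], tested


if __name__ == "__main__":
    # Hypothetical history of 64 commits where "c41" introduced the problem.
    history = ["c%d" % i for i in range(64)]
    culprit, builds = first_bad(history, lambda c: int(c[1:]) >= 41)
    print(culprit, builds)
```

Each test halves the remaining candidates, so the number of builds grows
with the logarithm of the range size rather than with the range size.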

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/56

------------------------------------------------------------------------
On 2013-12-16T14:18:21+00:00 Awl1 wrote:

> You really want to be doing a git bisect to do the least amount of work; I
> don't see what you mean by "move forward" but I really hope that you are
> testing the commits in a binary tree style.

Yup. Indeed, I am trying to use a "binary" approach to minimize work.
I am not using git bisect, though, but hope to augment cutting the
solution tree in half by reading the commit comments and letting my
intellect suggest which ones look like more or less likely candidates...

> You can put the tarball-based process in a script so you only need to run a
> single command after moving further in the bisection.

I have indeed already done so (not based on git bisect, but a commit
id).

Note that while I'm indeed a complete newbie to git, I am not at all a
newbie to Linux/Unix shell scripting. In my main job, though, I am a Java
architect/developer/support engineer, so I typically am only a "dummy
user" of Linux kernels - unless it very rarely happens that something
breaks which is really important for me, so I try and see whether I can
help... ;-)

I started trying to drive this forward when I suddenly became affected
by this issue because it has indeed been introduced into RHEL6 mainline
kernels with the most recent RHEL 6.5 kernel update - so I hope that
once we're done and you have been able to successfully fix the issue,
you can take care of the fix also being ported into subsequent RHEL6
kernels (as he works at Red Hat, Ben Skeggs should hopefully be
interested enough in doing so...).

Will report back here as soon as I have been able to track things down
to a particular commit... :-)

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/57

------------------------------------------------------------------------
On 2013-12-17T14:51:03+00:00 Awl1 wrote:

Hello again,

I need your help on how to proceed:

Using the bisection approach, I have now largely reduced the candidate
commits that might have introduced the issue:

4c193d254ee94da02857b9670e815b1765a9579b shows the issue, while
c420b2dc8dc3cdd507214f4df5c5f96f08812cbe does not, so the issue has been introduced between May 2nd and May 4th, 2012.

I now wanted to check 78df3a1c585c8c95fd9a472125f0cd406e8617ce, but this
commit does not even compile:

The error message for the above is:

drivers/gpu/drm/nouveau/nouveau_fbcon.c: In function 'nouveau_fbcon_sync':
drivers/gpu/drm/nouveau/nouveau_fbcon.c:166: error: void value not ignored as it ought to be
make[4]: *** [drivers/gpu/drm/nouveau/nouveau_fbcon.o] Error 1
make[3]: *** [drivers/gpu/drm/nouveau] Error 2
make[2]: *** [drivers/gpu/drm] Error 2
make[1]: *** [drivers/gpu] Error 2
make: *** [drivers] Error 2

So how should I proceed? Can you tell me how to fix the above compile
error, or should I proceed to check both

b355096992e2b4d30bb77173927f45e7f2c12570
(immediately before 78df3a1c585c8c95fd9a472125f0cd406e8617ce) and

d1b167e168bdac0b6af11e7a8c601773639fc419
(immediately after 78df3a1c585c8c95fd9a472125f0cd406e8617ce)?

Please advise how I should move forward!

Thanks & best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/58

------------------------------------------------------------------------
On 2013-12-17T15:09:59+00:00 Ilia Mirkin wrote:

The fix for that compilation issue is contained in
d1b167e168bdac0b6af11e7a8c601773639fc419

Basically you need to make nouveau_channel_idle return an int, and just
stick a 'return ret' at the end. And adjust the prototype in
nouveau_drv.h.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/59

------------------------------------------------------------------------
On 2013-12-17T15:19:32+00:00 Awl1 wrote:

Thanks a million for your super-fast reply! So I'll proceed with

d1b167e168bdac0b6af11e7a8c601773639fc419

rather than 78df3a1c585c8c95fd9a472125f0cd406e8617ce, and will report
back later...

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/60

------------------------------------------------------------------------
On 2013-12-17T21:17:53+00:00 Awl1 wrote:

Hello again,

so it looks like I have now tracked the issue down.

The "offending" commit seems to be:

4c193d254ee94da02857b9670e815b1765a9579b

(the first commit which showed the issue - I tried for more than half an
hour with its direct predecessor
d1b167e168bdac0b6af11e7a8c601773639fc419, but could not reproduce the
issue).


As the change by the offending commit seems really very 'simple':

  "use crypto engine for async buffer copies"

@@ -821,6 +839,7 @@ nouveau_bo_move_init(struct nouveau_channel *chan)
        } _methods[] = {
                {  "COPY", 0xa0b5, nve0_bo_move_copy, nvc0_bo_move_init },
                {  "M2MF", 0x9039, nvc0_bo_move_m2mf, nvc0_bo_move_init },
+               { "CRYPT", 0x74c1, nv84_bo_move_exec, nv50_bo_move_init },
                {  "M2MF", 0x5039, nv50_bo_move_m2mf, nv50_bo_move_init },
                {  "M2MF", 0x0039, nv04_bo_move_m2mf, nv04_bo_move_init },

at the heart of the issue, I think we now have the question whether it
indeed is correct that there might be NV84-compatible G86 variants (such
as my 8400M-based Quadro NVS130M), for which this "nv84_bo_move_exec"
causes issues...!?


One more question regarding verification with current kernels:

In a current kernel, method nouveau_bo_move_init looks similar, but
different:

        } _methods[] = {
                {  "COPY", 4, 0xa0b5, nve0_bo_move_copy, nve0_bo_move_init },
                {  "GRCE", 0, 0xa0b5, nve0_bo_move_copy, nvc0_bo_move_init },
                { "COPY1", 5, 0x90b8, nvc0_bo_move_copy, nvc0_bo_move_init },
                { "COPY0", 4, 0x90b5, nvc0_bo_move_copy, nvc0_bo_move_init },
                {  "COPY", 0, 0x85b5, nva3_bo_move_copy, nv50_bo_move_init },
                { "CRYPT", 0, 0x74c1, nv84_bo_move_exec, nv50_bo_move_init },
                {  "M2MF", 0, 0x9039, nvc0_bo_move_m2mf, nvc0_bo_move_init },
                {  "M2MF", 0, 0x5039, nv50_bo_move_m2mf, nv50_bo_move_init },
                {  "M2MF", 0, 0x0039, nv04_bo_move_m2mf, nv04_bo_move_init },
                {},
                { "CRYPT", 0, 0x88b4, nv98_bo_move_exec, nv50_bo_move_init },
        }, *mthd = _methods;

what would be an equivalent change to a current kernel to roll back the
effects of the above forward patch?

Looking forward to your feedback...

Thanks a million & best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/61

------------------------------------------------------------------------
On 2013-12-17T21:34:34+00:00 Ilia Mirkin wrote:

Try commenting the same line out and see what happens... (i.e. the one
with 0x74c1)

FWIW I do remember seeing some PCRYPT-related (and PVP/PBSP-related)
errors on start in the form of MMIO write failures in your log and
thinking it odd:

nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c

Which of course is an enable of FIFO_ACCESS... probably pretty
important. (See
https://github.com/envytools/envytools/blob/master/rnndb/vdec/vp2/pcrypt2.xml)
But why do you get that error... anyone's guess. If you have the blob
installed, would be interested to know if VDPAU hw decode acceleration
works for H.264 (i.e. things are actually accelerated), because you get
similar errors for the VP/BSP engines:

nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00fd94
nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x103d94

And they're all interconnected, I think. [Sadly none of the other bug
commenters were kind enough to leave a kernel log around, so can't
easily tell if that was their issue as well.]
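
For anyone wanting to check their own kernel log for the same pattern, a
small Python sketch along these lines could pull out the PBUS fault
addresses. The line format is taken from the messages quoted in this
thread; matching "read" faults as well as "write" faults is an assumption,
as only write faults appear here.

```python
import re

# Pattern modelled on the PBUS lines quoted in this thread.
# Accepting "read" in addition to "write" is an assumption.
FAULT_RE = re.compile(
    r"\[\s*PBUS\]\[(?P<dev>[0-9a-f:.]+)\] MMIO (?P<op>read|write) of "
    r"(?P<value>0x[0-9a-f]+) FAULT at (?P<addr>0x[0-9a-f]+)"
)


def mmio_faults(lines):
    """Return (op, value, address) for every PBUS MMIO fault line found."""
    faults = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults.append(
                (match.group("op"),
                 int(match.group("value"), 16),
                 int(match.group("addr"), 16))
            )
    return faults


if __name__ == "__main__":
    sample = [
        "nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00fd94",
        "nouveau  [     DRM] VRAM: 256 MiB",
        "nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c",
    ]
    for op, value, addr in mmio_faults(sample):
        print("%s of 0x%08x faulted at 0x%06x" % (op, value, addr))
```

Feeding it the output of dmesg (one line per list element) gives a quick
list of addresses to compare against the ones mentioned above.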

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/62

------------------------------------------------------------------------
On 2013-12-17T22:09:08+00:00 Awl1 wrote:

Hmm...

While my card definitely is not at all defective (it works the exact
same way it always has), I indeed also have some issues that make it
impossible for me to still use the "blob" even on an attic distro like
RHEL6:

The most recent version of the NVidia proprietary driver that worked
fine for me was their 285.09 (which can no longer be used even on RHEL6
because, AFAIK, it relies on an outdated X11 ABI).

Any more recent version of NVidia's driver (and most interestingly, on
Linux as well as on Windows 7 x64!) - even though NVidia states that
Quadro NVS 130M would still be supported with their latest drivers - has
an issue which causes sudden complete hangs every once in a while
(between a few seconds to few hours), but completely unpredictably...

Unfortunately, NVidia tech support is unable/unwilling to help with this
issue (I tried for several months without any progress...).

I just did some Google research: Do you know that an 8400M (as well as
my NVS 130M) does only support VDPAU "feature set A", but most VDPAU
software relies on feature sets C or D being implemented by the cards!?

Would it make sense to address the above questions about the feature set
implemented by these early G86 cards and how to properly activate these
features directly to NVidia (AFAIK, they recently offered some help by
answering questions from nouveau developers)?

In order to move forward, I will try to comment out the single line

 { "CRYPT", 0, 0x74c1, nv84_bo_move_exec, nv50_bo_move_init },

in a current 3.12.4 build, and report back on what happens...

Best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/63

------------------------------------------------------------------------
On 2013-12-18T00:58:35+00:00 Ilia Mirkin wrote:

Well, the fact that it doesn't work on the blob makes it sound like you
have some sort of funkiness in your hardware. One unsubstantiated theory
is that the vdec clock is *disabled*, and pcrypt is hooked up to that
clock. Or perhaps that clock is somehow broken. It'd be interesting to
see whether the old blob version can be made to work, but I wouldn't
spend _too_ much time on it. May be easy to do with an older livecd or
something.

So, in order to completely disable PCRYPT without patching your system
you can boot with "nouveau.config=PCRYPT=0" in your kernel cmdline
(since 3.7, I think). It will also disallow userspace from using PCRYPT,
which is probably for the best if it's really broken. (Whereas just
commenting it out there prevents a very specific use-case of it.)
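
To double-check which nouveau options a given boot actually received, a
minimal Python sketch like the following could be run against the kernel
command line (for example the "Command line:" entry in the kernel log).
The helper name module_options is made up for this example; the option
strings are the ones mentioned in this thread.

```python
def module_options(cmdline, module):
    """Collect 'module.key=value' tokens from a kernel command line string."""
    prefix = module + "."
    options = {}
    for token in cmdline.split():
        if token.startswith(prefix):
            # Split only at the first '=', so 'config=PCRYPT=0' keeps
            # 'PCRYPT=0' intact as the value.
            key, _, value = token[len(prefix):].partition("=")
            options[key] = value
    return options


if __name__ == "__main__":
    sample = "ro root=UUID=example rd_NO_LUKS nouveau.config=PCRYPT=0"
    print(module_options(sample, "nouveau"))
```

This only reports what was passed on the command line; whether the running
driver version understands a given option still has to be checked in the
kernel log.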

With a pre-3.13 kernel, you can also try adding nouveau.perflvl_wr=7777
nouveau.perflvl=1 which will force reclocking to happen on boot (to
level '1' which in your case is comparable to what you had been booting
to anyways), and just might get PCRYPT going (if the clock theory is
right). Or hang your machine. (Or both!) With 3.13 you'll need to apply
a patch to enable the reclocking functionality. [Obviously test this
theory without the PCRYPT disable stuff.]

BTW, to people who are not Andreas: Please post a full kernel log of a
boot with nouveau in a semi-recent kernel, that should reveal whether
you guys are all having the same issue or not.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/64

------------------------------------------------------------------------
On 2013-12-18T13:20:00+00:00 Awl1 wrote:

Hi - it's me again ;-)

> Well, given that it doesn't work on the blob makes it sound like you have some
> sort of funkiness in your hardware. One unsubstantiated theory is that the 
> vdec clock is *disabled*, and pcrypt is hooked up to that clock. Or perhaps 
> that clock is somehow broken. It'd be interesting to see whether the old blob
> version can be made to work, but I wouldn't spend _too_ much time on it. May 
> be easy to do with an older livecd or something.

I could try and do an install of RHEL6.2 (before the ABI change) onto an
USB HDD. On this version, 285.09 still ran fine. What exactly would you
want me to check with the "blob"?

Is there any diagnostic tool that could check about my "vdec clock" or
pcrypt status?

In the meantime, I have verified that with the single line

 { "CRYPT", 0, 0x74c1, nv84_bo_move_exec, nv50_bo_move_init },

commented out, a 3.12.4 kernel works fine - no corruption seen.
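
One way to picture why commenting out that entry changes the behaviour is
to treat the _methods table as a priority list in which the first usable
entry wins. The Python sketch below is a simplified model under that
assumption, not the driver's code; the set of object classes assumed to be
available on a G86 is also an assumption made for the example.

```python
# (name, object class) pairs in the order quoted from the current kernel,
# up to the empty terminator entry.
METHODS = [
    ("COPY", 0xA0B5),
    ("GRCE", 0xA0B5),
    ("COPY1", 0x90B8),
    ("COPY0", 0x90B5),
    ("COPY", 0x85B5),
    ("CRYPT", 0x74C1),
    ("M2MF", 0x9039),
    ("M2MF", 0x5039),
    ("M2MF", 0x0039),
]

# Assumption for illustration: classes a G86 would accept.
G86_ASSUMED_CLASSES = {0x74C1, 0x5039}


def pick_method(methods, supported_classes):
    """Return the name of the first table entry whose class is supported."""
    for name, oclass in methods:
        if oclass in supported_classes:
            return name
    return None


if __name__ == "__main__":
    print(pick_method(METHODS, G86_ASSUMED_CLASSES))
    without_crypt = [m for m in METHODS if m[1] != 0x74C1]
    print(pick_method(without_crypt, G86_ASSUMED_CLASSES))
```

Under these assumptions the unmodified table selects CRYPT, which matches
the "MM: using CRYPT for buffer copies" message in the kernel log, while
the table without the 0x74c1 entry falls through to M2MF.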

> So, in order to completely disable PCRYPT without patching your system you
> can boot with "nouveau.config=PCRYPT=0" in your kernel cmdline (since 3.7, I
> think). It will also disallow userspace from using PCRYPT, which is probably
> for the best if it's really broken. (Whereas just commenting it out there
> prevents a very specific use-case of it.)

Also, booting both a stock 3.12.4 kernel and even the current RHEL 6.5
2.32-431 kernel with kernel command line option
"nouveau.config=PCRYPT=0" seems to work fine, which means that my basic
issue is already resolved, as with this option, I already seem to be
able to make sure that future stock RHEL kernels won't break my screen
all the time... ;-)

> With a pre-3.13 kernel, you can also try adding nouveau.perflvl_wr=7777 
> nouveau.perflvl=1 which will force reclocking to happen on boot (to level '1' 
> which in your case is comparable to what you had been booting to anyways), 
> and just might get PCRYPT going (if the clock theory is right). Or hang your 
> machine. (Or both!) With 3.13 you'll need to apply a patch to enable the 
> reclocking functionality. [Obviously test this theory without the PCRYPT 
> disable stuff.]

Can you please provide more details about this?

I tried to pass the below options to stock kernel version 3.13.0-rc4 (as
of Dec 16) but got the following message:

Command line: ro root=UUID=034d34cd-a464-4ee3-8db9-d6061a318a16 rd_NO_LUKS LANG=en_US.UTF-8  KEYBOARDTYPE=pc KEYTABLE=de-latin1-nodeadkeys rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_LVM rd_NO_DM nouveau.perflvl_wr=7777 nouveau.perflvl=1
[...]
nouveau: unknown parameter 'perflvl_wr' ignored
nouveau: unknown parameter 'perflvl' ignored

and then of course got the distorted screen again.

So what exactly do I need to do in order to be able to pass these two
parameters and see whether reclocking my "vdec" clock helps to
successfully use the pcrypt feature?

Many thanks one more time,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/65

------------------------------------------------------------------------
On 2013-12-18T13:22:07+00:00 Awl1 wrote:

Sorry, typo:
Of course wanted to refer to the current RHEL 6.5 kernel "2.6.32-431" above...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/66

------------------------------------------------------------------------
On 2013-12-18T13:39:51+00:00 Awl1 wrote:

OK, tried with RHEL 6.5 kernel 2.6.32-431 and the two options:

Command line: ro root=UUID=034d34cd-a464-4ee3-8db9-d6061a318a16 rd_NO_LUKS LANG=en_US.UTF-8  KEYBOARDTYPE=pc KEYTABLE=de-latin1-nodeadkeys rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_LVM rd_NO_DM nouveau.perflvl_wr=7777 nouveau.perflvl=1 crashkernel=auto

and saw

nouveau 0000:01:00.0: setting latency timer to 64
nouveau 0000:01:00.0: power state changed by ACPI to D0
nouveau 0000:01:00.0: power state changed by ACPI to D0
nouveau 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x086a00a2
nouveau  [  DEVICE][0000:01:00.0] Chipset: G86 (NV86)
nouveau  [  DEVICE][0000:01:00.0] Family : NV50
nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
nouveau  [   VBIOS][0000:01:00.0] BIT signature found
nouveau  [   VBIOS][0000:01:00.0] version 60.86.49.00.27
nouveau  [     PFB][0000:01:00.0] RAM type: DDR2
nouveau  [     PFB][0000:01:00.0] RAM size: 256 MiB
nouveau  [     PFB][0000:01:00.0]    ZCOMP: 646 tags
nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
nouveau  [  PTHERM][0000:01:00.0] fan management: disabled
nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
nouveau  [  PTHERM][0000:01:00.0] Programmed thresholds [ 90(3), 95(3), 125(5), 125(5) ]
[TTM] Zone  kernel: Available graphics memory: 2963482 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
nouveau  [     DRM] VRAM: 256 MiB
nouveau  [     DRM] GART: 512 MiB
nouveau  [     DRM] TMDS table version 2.0
nouveau  [     DRM] DCB version 4.0
nouveau  [     DRM] DCB outp 00: 010003f3 00010035
nouveau  [     DRM] DCB outp 01: 02811300 00000028
nouveau  [     DRM] DCB outp 02: 02822312 00000030
nouveau  [     DRM] DCB outp 03: 01833320 00000028
nouveau  [     DRM] DCB conn 00: 0040
nouveau  [     DRM] DCB conn 01: 0100
nouveau  [     DRM] DCB conn 02: 1255
nouveau  [     DRM] DCB conn 03: 0351
nouveau  [     DRM] BIOS FP mode: 1680x1050 (119880kHz pixel clock)
nouveau E[  PTHERM][0000:01:00.0] unhandled intr 0x000001e1
Slow work thread pool: Starting up
Slow work thread pool: Ready
nouveau W[     DRM] unknown connector type 55
nouveau W[     DRM] unknown connector type 51
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] No driver support for vblank timestamp query.
nouveau  [     DRM] ACPI backlight interface available, not registering our own
nouveau  [     DRM] 3 available performance level(s)
nouveau  [     DRM] 0: core 169MHz shader 338MHz memory 100MHz voltage 1150mV fanspeed 100%
nouveau  [     DRM] 1: core 275MHz shader 550MHz memory 200MHz voltage 1150mV fanspeed 100%
nouveau  [     DRM] 2: core 400MHz shader 800MHz memory 400MHz voltage 1200mV fanspeed 100%
nouveau  [     DRM] c: core 275MHz shader 550MHz memory 99MHz voltage 1200mV
nouveau  [     DRM] setting performance level: 1
nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x88888888 FAULT at 0x100844
nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x11111111 FAULT at 0x100764
nouveau  [     DRM] > reclocking took 8299680ns

nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c
nouveau  [     DRM] MM: using CRYPT for buffer copies
nouveau  [     DRM] allocated 1680x1050 fb: 0x60000, bo ffff8801b3e6d400
fbcon: nouveaufb (fb0) is primary device

So it seems we had some "reclocking" taking place, but we still have
the "MMIO write" errors, and the screen is also distorted in the exact
same way as before, even after just two minutes...

Any comments?

Thanks,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/67

------------------------------------------------------------------------
On 2013-12-18T15:48:57+00:00 AnAkIn wrote:

For the reclocking to work on 3.13 you need to apply this patch:

http://cgit.freedesktop.org/~darktama/nouveau/commit/?h=devel-pm&id=74556533b2cc3dd787ba9fc8a346177116d1a68e

And you can change the performance level with /sys/class/drm/card0/device/pstate
(I think the command line options don't do anything anymore)

This new 3.13 reclocking might just hang your system though (it does for
mine on the two computers I tested on).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/68

------------------------------------------------------------------------
On 2013-12-18T16:28:14+00:00 Ilia Mirkin wrote:

So actually I'm told that pcrypt is on the main clock, so that theory is
out. Can you grab envytools (https://github.com/envytools/envytools) and
run

nvapeek 10200c
nvapoke 10200c 10
nvapeek 10200c

and see what's in dmesg? Do you see additional MMIO read/write failures,
or is it all good?  What does the peek return? (I'm wondering if it's an
initialization order issue or something.)

What issues are you seeing with the blob driver? I'd also still be
interested in knowing whether a previously-known-good version of the
blob still works.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/69

------------------------------------------------------------------------
On 2013-12-18T18:57:06+00:00 Awl1 wrote:

Hello again, Ilia,

> Can you grab envytools (https://github.com/envytools/envytools) and run

bad news (or maybe expected from what we have been seeing earlier):

[aloew@aloew-lap envytools-master]$ ./nva/nvapeek 10200c
WARN: Can't probe 0000:01:00.0
PCI init failure!

[aloew@aloew-lap envytools-master]$ ./nva/nvapoke 10200c 10
WARN: Can't probe 0000:01:00.0
PCI init failure!

> and see what's in dmesg?

No additional output in dmesg - probably because of the "PCI init
failure"...

> Do you see additional MMIO read/write failures, or is it 
> all good?  What does the peek return? (I'm wondering if it's an initialization 
> order issue or something.)

As above - and additionally, during the boot process, I also see the
following messages in dmesg:

nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00fd94
nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x103d94
(...)
nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c

> What issues are you seeing with the blob driver?

As stated earlier: Every more recent version of NVidia's driver after
their 295.09 causes unpredictable complete hangs at some point in time -
sooner or later, but consistently (especially on GUI actions that
initiate screen changes like closing windows or using the scrollbar).
Fan runs at 100% and the only thing I can still do is a hard power-
off...

> I'd also still be interested in knowing whether a previously-known-good 
> version of the blob still works.

I am 99.9% certain it does, as my Windows install with NVidia 285.09
driver also still runs fine, while any more recent Windows driver from
NVidia hangs with the same symptoms as their Linux "blob" - I had just
checked this last week with their latest Windows version 331.82, once
again without any luck.

Will try to do a new install of old RHEL 6.1 or 6.2 onto a USB HDD
either later today or tomorrow night and report back about this.

Is there anything else that we can try to find out why the above memory
addresses seemingly cannot be accessed on my card?

Could this be a motherboard layout issue by Toshiba or some defective
chips that NVidia has sold anyway to OEM manufacturers?

Maybe indeed you could ask your new friends/contacts at NVidia about
this?

And please let me know if I shall check some other commands using the
"envytools" (nice name!)...

Many thanks one more time & best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/70

------------------------------------------------------------------------
On 2013-12-18T18:59:21+00:00 Awl1 wrote:

Oops - typo: was referring to NVidia version 285.09 above (not 295.09).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/71

------------------------------------------------------------------------
On 2013-12-18T19:01:01+00:00 Ilia Mirkin wrote:

(In reply to comment #52)
> Hello again, Ilia,
> 
> > Can you grab envytools (https://github.com/envytools/envytools) and run
> 
> bad news (or maybe expected from what we have been seeing earlier): 
> 
> [aloew@aloew-lap envytools-master]$ ./nva/nvapeek 10200c
> WARN: Can't probe 0000:01:00.0
> PCI init failure!
> 
> [aloew@aloew-lap envytools-master]$ ./nva/nvapoke 10200c 10
> WARN: Can't probe 0000:01:00.0
> PCI init failure!

You need to run these as root.

> > I'd also still be interested in knowing whether a previously-known-good 
> > version of the blob still works.
> 
> I am 99.9% certain it does, as my Windows install with NVidia 285.09 driver
> also still runs fine, while any more recent Windows driver from NVidia hangs
> with the same symptoms as their Linux "blob" - I had just checked this last
> week with their latest Windows version 331.82, once again without any luck.

Ah OK, that's probably good enough of a test.

> Maybe indeed you could ask your new friends/contacts at NVidia about this?

I just bugged them about video decoding stuff a few weeks ago, don't
want to use up all of my brownie points :)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/72

------------------------------------------------------------------------
On 2013-12-18T19:16:45+00:00 Awl1 wrote:

> > [aloew@aloew-lap envytools-master]$ ./nva/nvapoke 10200c 10
> > WARN: Can't probe 0000:01:00.0
> > PCI init failure!

> You need to run these as root.

Ouch - sorry - could have indeed had this idea myself... :-(

Here are the results as root:

[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapoke 10200c 10
0010200c: ERR S
[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
[aloew@aloew-lap envytools-master]$ 

And no new messages in "dmesg" output at all. Still not enlightening...
:-(

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/73

------------------------------------------------------------------------
On 2013-12-18T19:27:16+00:00 Awl1 wrote:

> > I am 99.9% certain it does, as my Windows install with NVidia 285.09 driver
> > also still runs fine, while any more recent Windows driver from NVidia hangs
> > with the same symptoms as their Linux "blob" - I had just checked this last
> > week with their latest Windows version 331.82, once again without any luck.

> Ah OK, that's probably good enough of a test.

So I don't need to do this any more? That would be great, because I am
pretty certain that it won't give any new results other than the Linux
285.09 driver still works fine.

My card definitely has no new hardware defect. In case it might indeed
be defective in some sense, then it has been from the very beginning...

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/74

------------------------------------------------------------------------
On 2013-12-18T19:27:47+00:00 Ilia Mirkin wrote:

(In reply to comment #55)
> > > [aloew@aloew-lap envytools-master]$ ./nva/nvapoke 10200c 10
> > > WARN: Can't probe 0000:01:00.0
> > > PCI init failure!
> 
> > You need to run these as root.
> 
> Ouch - sorry - could have indeed had this idea myself... :-(
> 
> Here are the results as root:
> 
> [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
> 0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS

nvapeek 10200c without the 10. (Not sure what that does.... maybe reads
out 0x10 regs)

> [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapoke 10200c 10
> 0010200c: ERR S

Oh well. Some sort of error.

> [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
> 0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
> [aloew@aloew-lap envytools-master]$ 
> 
> And no new messages in "dmesg" output at all. Still not enlightening... :-(

Well, no one's heard of a "missing" PCRYPT before, but it's certainly
conceivable that certain blocks were omitted. I'd feel better with that
diagnosis if more people chimed in saying that they had the same issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/75

------------------------------------------------------------------------
On 2013-12-18T19:36:57+00:00 Awl1 wrote:

> > [aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c 10
> > 0010200c: SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS SS
> 
> nvapeek 10200c without the 10. (Not sure what that does.... maybe reads out
> 0x10 regs)

yes - seems to read 10 registers:

[aloew@aloew-lap envytools-master]$ sudo ./nva/nvapeek 10200c   
0010200c: SS

> Well, no one's heard of a "missing" PCRYPT before, but it's certainly
> conceivable that certain blocks were omitted. I'd feel better with that
> diagnosis if more people chimed in saying that they had the same issue.

From the screenshots of the corrupted graphics, I definitely think that
this is the exact same issue.

But I fully agree that it is a pity that none of the folks who had
raised this previously and/or commented are reacting now that it has
probably been tracked down to its root cause.

And something else seems interesting:

All other people who saw the corruption issue (except nemasu with
his/her 8800 GTS, who might have seen a different issue indeed from the
dmesg output) also were using early G86 chips, particularly 8400M-based,
and mostly the "mobile" variants...

Maybe NVidia omitted part of the 8400 functionality in the mobile
variants? This would again make up a nice (and easy) question to
them...!? ;-)

Thanks again & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/76

------------------------------------------------------------------------
On 2013-12-19T15:04:56+00:00 Ilia Mirkin wrote:

There was a bug in nvapeek/poke (it was using the wrong address space by
default), can you update your pull and try again? [That explains why you
saw 'S' in the output.]

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/77

------------------------------------------------------------------------
On 2013-12-19T15:20:02+00:00 Awl1 wrote:

> There was a bug in nvapeek/poke (it was using the wrong address space by
> default), can you update your pull and try again? [That explains why you saw
> 'S' in the output.]

Of course - here you are:

The version I used is from the "Download ZIP" button in GitHub:
https://github.com/envytools/envytools/archive/master.zip

[aloew@aloew-lap nva]$ sudo ./nvapeek 10200c
...
[aloew@aloew-lap nva]$ sudo ./nvapoke 10200c 10
[aloew@aloew-lap nva]$ sudo ./nvapeek 10200c
...

Maybe now another bug, as we don't seem to get any hex address and/or
value output?

Please advise if we need to pass any additional parameters to get hex
output...

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/78

------------------------------------------------------------------------
On 2013-12-19T15:26:34+00:00 Awl1 wrote:

Hmm... Looking at the code for nvapeek, I fear that nva_rd(...) still
did not return any meaningful data, as it looks like we get s == 0...!?


                int s = 0;
                for (i = j = 0; i < 16 && i < b; i+=rs.regsz, j++) {
                        e[j] = nva_rd(&rs, a+i, &z[j]);
                        if (e[j] || z[j])
                                s = 1;
                }
                if (s) {
                        ls = 1;
                        printf ("%08x:", a);
                        for (i = j = 0; i < 16 && i < b; i+=rs.regsz, j++) {
                                nva_rsprint(&rs, e[j], z[j]);
                        }
                        printf ("\n");
                } else {
                        if (ls) printf ("...\n"), ls = 0;
                }

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/79

------------------------------------------------------------------------
On 2013-12-19T15:32:45+00:00 Awl1 wrote:

But generally, nvapeek seems to work fine now:

[aloew@aloew-lap nva]$ sudo ./nvapeek 0
00000000: 086a00a2

Looking forward to your comments...

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/80

------------------------------------------------------------------------
On 2013-12-19T15:40:11+00:00 Ilia Mirkin wrote:

Yeah, it prints "..." instead of 0. This makes a lot of sense when
you're peeking a large range full of 0's. Anyways, were there any
additional messages in dmesg, e.g. MMIO read/write failures as a result?
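
To make the "..." behaviour concrete, here is a small Python model of the loop quoted from nvapeek earlier in the thread. It is a simplified sketch, not the tool's code: format_rows is a made-up name, read errors are ignored, and the flag is assumed to start out true, which is consistent with a single all-zero read printing only "...".

```python
def format_rows(rows, start=0):
    """Format 16-byte rows of register values, collapsing all-zero rows.

    rows: list of lists of ints, one inner list per 16-byte row.
    A run of all-zero rows is reduced to a single "..." line.
    """
    out = []
    last_shown = True  # assumption: initially true, so a leading zero row prints "..."
    addr = start
    for row in rows:
        if any(row):
            # At least one non-zero value: print the whole row with its address.
            last_shown = True
            out.append(f"{addr:08x}: " + " ".join(f"{v:08x}" for v in row))
        elif last_shown:
            # First all-zero row after a printed row: emit the placeholder once.
            out.append("...")
            last_shown = False
        addr += 16
    return out


if __name__ == "__main__":
    print("\n".join(format_rows([[0x086A00A2]])))  # a non-zero register
    print("\n".join(format_rows([[0]])))           # a register that reads 0
```

So a bare "..." from a single-register peek means the read came back as 0, not that the tool failed to print.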

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/81

------------------------------------------------------------------------
On 2013-12-19T15:49:42+00:00 Awl1 wrote:

Yes, we indeed see the same well-known:

nouveau E[    PBUS][0000:01:00.0] MMIO read of 0x00010000 FAULT at 0x10200c

at any nvapeek read attempt, and

nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c

at any nvapoke attempt... :-(

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/82

------------------------------------------------------------------------
On 2013-12-19T15:55:36+00:00 Awl1 wrote:

Oops - I just remembered that I am booting my kernel with
"nouveau.config=PCRYPT=0" in the meantime...

Does this make any difference, i.e. do I need to retry the
nvapeek/nvapoke sequence without this kernel option?

Sorry & thanks,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/83

------------------------------------------------------------------------
On 2014-01-08T11:43:13+00:00 Awl1 wrote:

A Happy New Year to everybody! :-)

Just wondering whether you intend to simply close this issue down with
the workaround "solution" for me to set kernel option

nouveau.config=PCRYPT=0

or whether you are still interested in finding out *why* my Quadro NVS
130M and other 8400M-based cards do not seem to support this
functionality (or what might need to be done differently in the driver
to ensure they do).


Additional interesting information:

I have been informed that folks at NVIDIA have recently succeeded in
tracking down a Solaris hang issue in their proprietary Unix drivers
("blob") that affected exactly Quadro NVS 130M cards (AFAIK, NVIDIA IR #
1172500).

I can indeed reproduce these hangs on Solaris 11.1, so this issue
probably matches the unpredictable hangs that I have been also seeing
with the Linux blob versions > 285.05.09 that made their drivers
unusable for me.

AFAIK, their fix is scheduled to be released in the third update to their
R331 series in February.


So how would you like to proceed regarding this issue?

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/84

------------------------------------------------------------------------
On 2014-01-08T20:51:40+00:00 Ilia Mirkin wrote:

Could you provide the output of

nvapeek 154c
nvapeek 1540

Those registers specify which engines are there. I think we're ignoring
them in nouveau...

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/85

------------------------------------------------------------------------
On 2014-01-08T22:44:18+00:00 Ilia Mirkin wrote:

Created attachment 91714
patch to honor disabled engines

Give this a shot (without forcing PCRYPT=0). You should hopefully see a
message saying that it and a few other engines are disabled. This needs
some more testing on a wider variety of cards before I'll send it
upstream, but it may be what you need.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/86

------------------------------------------------------------------------
On 2014-01-09T11:23:09+00:00 Awl1 wrote:

> Could you provide the output of
> 
> nvapeek 154c
> nvapeek 1540
> 
> Those registers specify which engines are there. I think we're ignoring them
> in nouveau...

OK - that might indeed explain the issues seen...
Here you are:

[aloew@aloew-lap nva]$ sudo ./nvapeek 154c
0000154c: 0000009c

[aloew@aloew-lap nva]$ sudo ./nvapeek 1540
00001540: b1010001

Looking at the patch you provided, if the rusty binary arithmetics chip
in my brain is still valid, this means for my case:

vdec = nv_rd32(device, 0x1540) & 0x40000000;

0xb(...) = 1011(...)
0x4(...) = 0100(...)

=> for me, vdec indeed is 0x00000000, i.e. false

and as my chipset is 0x86, furthermore:

MPEG -> disabled
VP -> disabled

and for the dynamic features,

0x9c = 10011100 binary

0x20 = 00100000 binary
0x40 = 01000000 binary

as 0x9c & 0x20 == 0x00, BSP -> disabled
as 0x9c & 0x40 == 0x00, PCRYPT -> disabled

which would probably confirm that your patch is correct for me.
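
The same arithmetic as a short Python sketch, in case others want to check their own register values. It only encodes the masks discussed in this comment for a 0x86 chipset; the function name disabled_engines_nv86 is made up, and the actual patch (which also handles other chipsets) is not reproduced here.

```python
def disabled_engines_nv86(r1540, r154c):
    """Return the engines treated as disabled for the given register values.

    r1540, r154c: the values read from registers 0x1540 and 0x154c.
    Only the checks walked through in this comment are modelled.
    """
    disabled = []
    vdec = r1540 & 0x40000000
    if not vdec:
        # With the vdec bit clear, MPEG and VP count as disabled on 0x86.
        disabled += ["PMPEG", "PVP"]
    if not (r154c & 0x20):
        disabled.append("PBSP")
    if not (r154c & 0x40):
        disabled.append("PCRYPT")
    return disabled


if __name__ == "__main__":
    # Values reported by nvapeek in this comment.
    print(disabled_engines_nv86(0xB1010001, 0x0000009C))
```

With the values 0xb1010001 and 0x0000009c this reports all four engines (PMPEG, PVP, PBSP, PCRYPT) as disabled.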

That said, I will apply this patch to my current stock RHEL6 kernel and
report back later today on whether this works fine for me (which it
indeed should, based on the above considerations!).

Thanks a million - great work! :-)

Best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/87

------------------------------------------------------------------------
On 2014-01-09T16:15:35+00:00 Ilia Mirkin wrote:

Created attachment 91765
patch to honor hw disables after vbios

Unfortunately the first patch runs before VBIOS, so if the manufacturer
explicitly disables an engine for some reason (by writing a 0 to those
bits) we should probably honor that. This patch does that (actually 2
patches munged into 1). I've tested it on my NV98 and it correctly
doesn't disable anything, but would be nice to test it on a card that
_does_ disable stuff.

[note, this patch replaces the first patch, not in addition to it]

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/88

------------------------------------------------------------------------
On 2014-01-09T16:35:26+00:00 Awl1 wrote:

Hello Ilia,

hmm - you just caught me with the update five minutes after I had
started the rpmbuild with the previous version... ;-)

Unfortunately, while I could make the first patch apply to a current
RHEL kernel source with only one change (core/engine/device.c ->
core/subdev/device.c), the new patch will need much more rework to make
it compile against a RHEL kernel.

I am therefore looking into getting a 3.12 kernel from the Oracle Linux
"playground":

http://public-yum.oracle.com/repo/OracleLinux/OL6/playground/latest/x86_64/

Would 3.12.6 be an appropriate version to apply your updated patch to
successfully?

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/89

------------------------------------------------------------------------
On 2014-01-09T16:43:33+00:00 Ilia Mirkin wrote:

(In reply to comment #71)
> Would 3.12.6 be an appropriate version to apply your updated patch to
> successfully?

I'm working against, effectively, 3.13-rc8. I'd think it would apply to
3.12, and just about any other semi-recent kernel, but I guess RHEL does
something special? Not sure. That subdev -> engine move happened in
dded35dee3 which went into 3.10, so I guess you're using something old.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/90

------------------------------------------------------------------------
On 2014-01-09T16:50:01+00:00 Awl1 wrote:

Yes, definitely, a RHEL6 stock kernel is *very* old (2.6.32.*) - but due
to a kernel drm/nouveau module update from 3.x source that they recently
did for RHEL 6.5, it also suddenly became new enough to make me see this
issue... ;-)

Have just successfully applied the updated patch to 3.12.6, so my
rpmbuild is running! :-)

You can expect my results in about two hours or so (will have dinner
in between).

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/91

------------------------------------------------------------------------
On 2014-01-09T17:07:40+00:00 Awl1 wrote:

Just received

drivers/gpu/drm/nouveau/core/subdev/devinit/nv50.c:164: error: 'NVDEV_ENGINE_VIC' undeclared (first use in this function)

but "fixed" it for me by commenting out the lines for a 0xaf card (I
have a 0x86 type anyway, so this code does not apply to me):

+	case 0xaf:
+		/* if (!(r154c & 0x40)) */
+		/*	device->disable_mask |= 1ULL << NVDEV_ENGINE_VIC; */
+		/* fallthrough */

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/92

------------------------------------------------------------------------
On 2014-01-09T21:56:49+00:00 Awl1 wrote:

Created attachment 91786
Complete dmesg output booting 3.12.6 with "hwunits.patch" applied (nouveau.debug=debug)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/93

------------------------------------------------------------------------
On 2014-01-09T21:57:29+00:00 Awl1 wrote:

Created attachment 91787
nouveau-related dmesg output booting 3.12.6 with "hwunits.patch" applied (nouveau.debug=debug)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/94

------------------------------------------------------------------------
On 2014-01-09T22:03:57+00:00 Awl1 wrote:

Sorry that it took me longer to get back here - I needed an additional
rpmbuild run due to running out of disk space for my first attempt...

But I can give an all clear signal - at least for my machine, AFAIK,
everything seems to be fine:

Kernel command line: ro root=UUID=034d34cd-a464-4ee3-8db9-d6061a318a16 rd_NO_LUKS LANG=en_US.UTF-8  KEYBOARDTYPE=pc KEYTABLE=de-latin1-nodeadkeys rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_LVM rd_NO_DM nouveau.debug=debug rhgb quiet

nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x086a00a2
nouveau  [  DEVICE][0000:01:00.0] Chipset: G86 (NV86)
nouveau  [  DEVICE][0000:01:00.0] Family : NV50
nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
nouveau  [   VBIOS][0000:01:00.0] BIT signature found
nouveau  [   VBIOS][0000:01:00.0] version 60.86.49.00.27
(...)
nouveau  [   PMPEG][0000:01:00.0] hardware is marked as disabled
nouveau  [     PVP][0000:01:00.0] hardware is marked as disabled
nouveau  [  PCRYPT][0000:01:00.0] hardware is marked as disabled
nouveau  [    PBSP][0000:01:00.0] hardware is marked as disabled

and also, everything is fine afterwards (as PCRYPT seems to indeed have
been properly disabled). :-)
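
A minimal sketch for checking a boot log for these messages automatically. The function name find_disabled_units and the SAMPLE string are illustrative; the sample lines are the ones shown above, and the pattern assumes the message format stays exactly as printed here.

```python
import re

# Matches lines such as:
# nouveau  [  PCRYPT][0000:01:00.0] hardware is marked as disabled
DISABLED_RE = re.compile(
    r"nouveau\s+\[\s*(?P<unit>\w+)\]\[[0-9a-fA-F:.]+\] hardware is marked as disabled"
)


def find_disabled_units(dmesg_text):
    """Return the unit names reported as disabled, in log order."""
    units = []
    for line in dmesg_text.splitlines():
        match = DISABLED_RE.search(line)
        if match:
            units.append(match.group("unit"))
    return units


# Sample lines copied from the dmesg excerpt in this comment.
SAMPLE = """\
nouveau  [   VBIOS][0000:01:00.0] version 60.86.49.00.27
nouveau  [   PMPEG][0000:01:00.0] hardware is marked as disabled
nouveau  [     PVP][0000:01:00.0] hardware is marked as disabled
nouveau  [  PCRYPT][0000:01:00.0] hardware is marked as disabled
nouveau  [    PBSP][0000:01:00.0] hardware is marked as disabled
"""

if __name__ == "__main__":
    print(find_disabled_units(SAMPLE))
```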

What do you say? Do you agree that everything turned out as expected
from my nvapeek results?

Thanks & BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/95

------------------------------------------------------------------------
On 2014-01-09T22:42:16+00:00 Ilia Mirkin wrote:

Great news! I'll update the bug when this makes it upstream (or if we
have further questions about your hardware). FWIW I've been going around
asking people to report registers 1540/154c to me, and so far everyone
except you and one other person having trouble with nouveau has had them
listed as everything enabled.

Thanks for tracking down the commit that caused the issue, that was
instrumental!

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/96

------------------------------------------------------------------------
On 2014-01-09T23:06:05+00:00 Awl1 wrote:

You're welcome! :-)

I did do this in my very own interest, because the OL6/RHEL6 install on
my main work laptop all of a sudden had this distortion issue when RHEL
updated the drm/nouveau module to an affected codebase in RHEL 6.5, so I
definitely needed a solution for this (other than get a new laptop)...

One final request from my side, as I don't have commercial RHEL6 support
(I am using the free OL6 clone):

Hoping that you have pretty good contact/access to Ben Skeggs (who I
think officially owns the nouveau modules at Red Hat), can you please
approach him and ask him to make sure that Red Hat also applies a
(backported) version of this patch to their mainline stock RHEL 6.5
kernels?

That would be great, as this is definitely needed to ensure that all
those people with the affected older/low-end NVIDIA notebook chips -
such as myself (and all the other now unfortunately silent people who
initially created this issue) - will no longer be affected by this issue
in the current RHEL 6 kernels (or don't need the explicit workaround
using the kernel parameter PCRYPT=0)?

Thanks a million for your kind help & best regards from Germany,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/97

------------------------------------------------------------------------
On 2014-01-15T23:21:26+00:00 Tomshere wrote:

Hello,

I have the same NVIDIA GeForce NVS 130M with the disabled functions.
I checked with nvapeek:
0000154c: 0000009c
00001540: b1010001
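
For collecting such reports, here is a minimal sketch that turns nvapeek output lines of the "address: value" form shown above into integers. The name parse_nvapeek is made up for illustration, and lines that do not parse as hex (such as "..." or the "SS" markers seen earlier in the thread) are simply skipped.

```python
def parse_nvapeek(output):
    """Map register address to the first value on each 'addr: value' line."""
    regs = {}
    for line in output.splitlines():
        addr_text, sep, rest = line.partition(":")
        if not sep:
            continue  # e.g. the "..." placeholder line
        try:
            addr = int(addr_text, 16)
            values = [int(token, 16) for token in rest.split()]
        except ValueError:
            continue  # non-hex content such as "SS" or "ERR"
        if values:
            regs[addr] = values[0]
    return regs


if __name__ == "__main__":
    sample = "0000154c: 0000009c\n00001540: b1010001"
    for addr, value in parse_nvapeek(sample).items():
        print(f"{addr:#06x} -> {value:#010x}")
```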

uname -a delivers
Linux mobuntu 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I do not have any issues with distorted graphics during normal usage but
my problem is that resume from suspend mode makes X hang.

I also have these errors in dmesg

[   18.985158] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x00fd94
[   18.986213] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000 FAULT at 0x103d94
[   19.026027] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010 FAULT at 0x10200c

but also

[   18.984164] nouveau E[  PTHERM][0000:01:00.0] unhandled intr 0x00000161

When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate the errors and X still hangs when resuming.
I was not sure if I have to set the parameter in quotes.
As you can see I'm not a linux specialist ;)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/98

------------------------------------------------------------------------
On 2014-01-15T23:31:10+00:00 Ilia Mirkin wrote:

(In reply to comment #80)
> Hello,
> 
> I have the same NVIDIA GeForce NVS 130M with the disabled functions.
> I checked with nvapeek:
> 0000154c: 0000009c
> 00001540: b1010001
> 
> uname -a delivers
> Linux mobuntu 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013
> x86_64 x86_64 x86_64 GNU/Linux
> 
> I do not have any issues with distorted graphics during normal usage but my
> problem is that resume from suspend mode makes X hang.
> 
> I also have these errors in dmesg 
> 
> [   18.985158] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x00fd94
> [   18.986213] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x103d94
> [   19.026027] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010
> FAULT at 0x10200c

These errors should go away with the patch.

> 
> but also
> 
> [   18.984164] nouveau E[  PTHERM][0000:01:00.0] unhandled intr 0x00000161

I believe this is unrelated.

> 
> When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate
> the errors and X still hangs when resuming.

It should eliminate the 10200c error. The others are from PVP and PBSP,
you could do like nouveau.config=PCRYPT=0,PVP=0,PBSP=0,PMPEG=0 -- that
should have the same effect as my patch for your hardware. (I think.)
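
A minimal sketch of assembling that option from a list of engine names, so the string does not have to be typed by hand. The helper name nouveau_config_param is hypothetical; the option syntax is taken from the example above.

```python
def nouveau_config_param(engines):
    """Build a nouveau.config= kernel parameter that disables the given engines."""
    return "nouveau.config=" + ",".join(f"{name}=0" for name in engines)


if __name__ == "__main__":
    print(nouveau_config_param(["PCRYPT", "PVP", "PBSP", "PMPEG"]))
```

The resulting string goes on the kernel command line as a single word, with no spaces inside it.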

> I was not sure if I have to set the parameter in quotes.

Not necessary, but I *think* it'll work with quotes as well. Not sure.

> As you can see I'm not a linux specialist ;)

OK, then you have some different issue. I would recommend filing a fresh
issue with all the relevant info.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/99

------------------------------------------------------------------------
On 2014-01-15T23:57:27+00:00 Awl1 wrote:

Hi Thomas,

> I have the same NVIDIA GeForce NVS 130M with the disabled functions.
> I checked with nvapeek:
> 0000154c: 0000009c
> 00001540: b1010001

great - finally somebody who confirms this issue.

> uname -a delivers
> Linux mobuntu 3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013
> x86_64 x86_64 x86_64 GNU/Linux

> I do not have any issues with distorted graphics during normal usage but my
> problem is that resume from suspend mode makes X hang.

> I also have these errors in dmesg 
> 
> [   18.985158] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x00fd94
> [   18.986213] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000
> FAULT at 0x103d94
> [   19.026027] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010
> FAULT at 0x10200c

Hmm - your kernel and your nvapeek results clearly suggest you should be
affected...

Have you enabled compiz (i.e. OpenGL-based 3D acceleration features)? I
assume that so far, you haven't (it does not seem to be active in Ubuntu
by default), which most likely is the only reason why you are not seeing
the distortion issue (so far).

See e.g.

http://www.howtoforge.com/install-compiz-on-the-unity-desktop-on-ubuntu-12.04-precise-pangolin

(depending on your particular Ubuntu version) on how to enable compiz. I
am almost certain that once you have done so, you will also see the
distorted graphics, but you now already know the fix... ;-)

> [   18.984164] nouveau E[  PTHERM][0000:01:00.0] unhandled intr 0x00000161

This last "PTHERM" error seems to be a different, unrelated issue.

> When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate
> the errors and X still hangs when resuming.

Hmm - interesting, as I clearly don't have any issues with
suspend/resume. Which laptop do you have? Did you already update your
BIOS to the latest available version?

> I was not sure if I have to set the parameter in quotes.

No, you don't (and AFAIK, you even must not). Ilia has already proposed
the correct workaround for the distortion issue (until your distro of
choice has integrated the new fix) - add this:

nouveau.config=PCRYPT=0,PVP=0,PBSP=0,PMPEG=0

to your grub kernel parameters. Having done so, all "MMIO write" errors
in dmesg must be gone (they are for me!), otherwise something else is
still wrong for you in addition.

Hope this helps & best regards,
Andreas


BTW @ Ilia:
Did you already have a chance to contact Ben Skeggs about applying the fix to mainline RHEL 6.5 (and above) kernels?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/100

------------------------------------------------------------------------
On 2014-01-16T00:01:59+00:00 Ilia Mirkin wrote:

(In reply to comment #82)
> BTW @ Ilia:
> Did you already have a chance to contact Ben Skeggs about applying the fix
> to mainline RHEL 6.5 (and above) kernels?

That seems a little premature given that it's not even in the mainline
kernel. However I would recommend that once it is, you file a redhat
issue to make sure it gets backported to the whatever. I have no
knowledge of, and do not care about RHEL or any non-mainline kernel. If
you do, work with whatever processes they have. I bug Ben about enough
stuff already :)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/101

------------------------------------------------------------------------
On 2014-01-16T00:18:18+00:00 Awl1 wrote:

Hi Ilia,

> That seems a little premature given that it's not even in the mainline
> kernel. However I would recommend that once it is, you file a redhat issue
> to make sure it gets backported to the whatever. I have no knowledge of, and
> do not care about RHEL or any non-mainline kernel. If you do, work with
> whatever processes they have. I bug Ben about enough stuff already :)

ouch - that's a pity... :-(

As stated earlier, as I am using the free (only as in beer...) Oracle
Linux version rather than a commercial paid RHEL license, I cannot file
any issues with them, so I was hoping you would be able to raise this
with him within the nouveau team. It clearly deserves a fix, but I won't
be able to drive anything myself here due to the lack of a paid
license... :-(

Oh, and one more thing by the way:

Interestingly, I can also confirm that for me, the proprietary NVidia
"blob" Unix driver version 331.38 (which has just been released this
week):

https://devtalk.nvidia.com/default/topic/672875

indeed has also fixed the long-standing hang issue with their drivers
for my Quadro NVS 130M on both Linux and Solaris. Even better news is
that the fix will also be integrated into the next R331 release for
Windows (it is not yet in Windows versions 331.93 or 332.21)!

But while it took NVidia a little less than two years to get from
introducing their regression bug (all releases since 285.x are
affected) to fixing it, you/the nouveau team have tracked down and
fixed this issue in just a couple of days... :-)

So thanks again for your great work on nouveau! :-)

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/102

------------------------------------------------------------------------
On 2014-01-16T18:34:01+00:00 Tomshere wrote:

(In reply to comment #82)
> > I also have these errors in dmesg 
> > 
> > [   18.985158] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000
> > FAULT at 0x00fd94
> > [   18.986213] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000000
> > FAULT at 0x103d94
> > [   19.026027] nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x00000010
> > FAULT at 0x10200c
> 
> Hmm - your kernel and your nvapeek results clearly suggest you should be
> affected...
> 
> Have you enabled compiz(...)

No I don't think so.

> > [   18.984164] nouveau E[  PTHERM][0000:01:00.0] unhandled intr 0x00000161
> 
> This last "PTHERM" error seems to be a different, unrelated issue.
> 
> > When I use the kernel option nouveau.config=PCRYPT=0 it doesn't eliminate
> > the errors and X still hangs when resuming.
> 
> Hmm - interesting, as I clearly don't have any issues with suspend/resume.
> Which laptop do you have? Did you already update your BIOS to the latest
> available version?
> 
> > I was not sure if I have to set the parameter in quotes.
> 
> No, you don't (and AFAIK, you even must not). Ilia has already proposed the
> correct workaround for the distortion issue (until your distro of choice has
> integrated the new fix) - add this:
> 
> nouveau.config=PCRYPT=0,PVP=0,PBSP=0,PMPEG=0
> 
> to your grub kernel parameters. Having done so, all "MMIO write" errors in
> dmesg must be gone (they are for me!), otherwise something else is still
> wrong for you in addition.

Ok, now after adding the whole bunch to the kernel opts the three PBUS
errors are gone. For the resume failure I will open a new issue.

How can I push the integration of such a fix into another distro?

Many thanks!
-Thomas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/103

------------------------------------------------------------------------
On 2014-02-09T16:06:33+00:00 Awl1 wrote:

Hello Ilia,

has there been any progress so far in getting this into the mainstream
Linux kernel (or mainstream git) for the next official kernel release?

I'd like to make an attempt to get this patch (or rather, a backport of
it) into official RHEL 6.x kernels, but I'd like to point to an official
kernel patch in order to do so.

Many thanks & best regards,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/104

------------------------------------------------------------------------
On 2014-02-09T21:03:38+00:00 Ilia Mirkin wrote:

(In reply to comment #86)
> Hello Ilia,
> 
> has there been any progress so far in getting this into the mainstream Linux
> kernel (or mainstream git) for the next official kernel release?

This should be upstream as of commit
4019aaa2b314a5be9886ae1db64ff8c6d3c060ed, available in 3.14-rc1.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/105

------------------------------------------------------------------------
On 2014-02-10T18:39:55+00:00 Awl1 wrote:

(In reply to comment #87)

> > has there been any progress so far in getting this into the mainstream Linux
> > kernel (or mainstream git) for the next official kernel release?

> This should be upstream as of commit
> 4019aaa2b314a5be9886ae1db64ff8c6d3c060ed, available in 3.14-rc1.

Many thanks, Ilia! :-)

BR,
Andreas

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1158689/comments/106


** Changed in: linux
       Status: Unknown => Fix Released

** Changed in: linux
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1158689

Title:
  10de:0422 bringing up dash causes screen corruption on nouveau

Status in The Linux Kernel:
  Fix Released
Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  Hitting the "Windows" key to bring up the dash corrupts the screen
  from where everything goes downhill. See attached screenshot for
  example after a few window launches, dash enable/disable, etc.

  WORKAROUND: Logging in with gnome-shell doesn't show these artifacts.

  ProblemType: Bug
  DistroRelease: Ubuntu 13.04
  Package: linux-image-3.8.0-13-generic 3.8.0-13.23
  ProcVersionSignature: Ubuntu 3.8.0-13.23-generic 3.8.3
  Uname: Linux 3.8.0-13-generic x86_64
  ApportVersion: 2.9.2-0ubuntu2
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  amit       2192 F.... pulseaudio
  CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
  CurrentDmesg:
   [   28.341465] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
   [   32.135937] init: plymouth-stop pre-start process (1666) terminated with status 1
  Date: Fri Mar 22 15:06:56 2013
  HibernationDevice: RESUME=UUID=fb5ae586-aed6-4e8b-8884-21fef7bf242d
  InstallationDate: Installed on 2013-02-01 (48 days ago)
  InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Alpha amd64+mac (20130130)
  IwConfig:
   eth0      no wireless extensions.

   lo        no wireless extensions.
  MachineType: Intel To be filled by O.E.M.
  MarkForUpload: True
  ProcFB: 0 nouveaufb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-13-generic root=UUID=0084a9a5-74cd-47ba-afe6-a47d02d0b262 ro quiet splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-3.8.0-13-generic N/A
   linux-backports-modules-3.8.0-13-generic  N/A
   linux-firmware                            1.104
  RfKill:

  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 10/29/2009
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 4.6.3
  dmi.board.asset.tag: To be filled by O.E.M.
  dmi.board.name: To be filled by O.E.M.
  dmi.board.vendor: Intel
  dmi.board.version: To be filled by O.E.M.
  dmi.chassis.asset.tag: To Be Filled By O.E.M.
  dmi.chassis.type: 3
  dmi.chassis.vendor: To Be Filled By O.E.M.
  dmi.chassis.version: To Be Filled By O.E.M.
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4.6.3:bd10/29/2009:svnIntel:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnIntel:rnTobefilledbyO.E.M.:rvrTobefilledbyO.E.M.:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
  dmi.product.name: To be filled by O.E.M.
  dmi.product.version: To be filled by O.E.M.
  dmi.sys.vendor: Intel

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1158689/+subscriptions