kernel-packages team mailing list archive
Message #69826
[Bug 1009312] Re: 10de:0426 GPU loads unreliably, possible kernel timeout
It's been a while, but I've found the time to dig much deeper into this
and familiarize myself with the kernel code. I now feel comfortable with
the idea of contacting the appropriate mailing list directly, so this is
more to keep the record up to date than a request for more triage.
Anyways, after walking through the kernel code, I realized that the
first sign of the bug (the 30ms gap) was occurring somewhere within the
function pci_scan_child_bus (in drivers/pci/probe.c), between its calls
to pci_scan_slot (also in drivers/pci/probe.c) and pcibios_fixup_bus (in
my case, under arch/x86/pci/common.c).
From there, I began adding dev_info statements around the function calls
executed in between, then checked which two messages the gap fell
between to narrow down the problem further. After a few rounds of this,
I found the delay consistently appearing within the function
pcie_aspm_configure_common_clock (in drivers/pci/pcie/aspm.c). A little
research into what the PCIe common clock is about explains several
aspects of this bug. Booting the computer from battery power would
influence the power state of the device, which is what ASPM is all
about. And it turns out the discrepancy of 24ms between a good boot and
a bad boot is precisely the length of time the PCIe standard defines as
a timeout for link training.
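The "find which two messages the gap falls between" step can be
automated. Below is a minimal Python sketch, assuming dmesg output with
the usual "[seconds.microseconds]" timestamp prefix. The helper names
(parse_dmesg, largest_gaps) and the sample lines are hypothetical and
made up for illustration; they are not taken from the attached logs.

```python
import re

# Matches the "[   12.345678] message" prefix that dmesg prints when
# kernel timestamps are enabled.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_dmesg(lines):
    """Return (seconds, message) pairs for lines that carry a timestamp."""
    entries = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def largest_gaps(entries, top=3):
    """Return the `top` largest gaps as (gap_ms, before_msg, after_msg)."""
    gaps = [
        ((later[0] - earlier[0]) * 1000.0, earlier[1], later[1])
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


# Made-up sample lines for illustration only (hypothetical message text).
SAMPLE = [
    "[    0.500000] probe: before pci_scan_slot",
    "[    0.501000] probe: after pci_scan_slot",
    "[    0.531000] probe: before pcibios_fixup_bus",
    "[    0.532000] probe: after pcibios_fixup_bus",
]

if __name__ == "__main__":
    for gap_ms, before, after in largest_gaps(parse_dmesg(SAMPLE), top=1):
        print(f"{gap_ms:.1f} ms between {before!r} and {after!r}")
```

Running this over a real log after each round of added dev_info calls
would point at the pair of messages to instrument next.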
Unfortunately, I don't know how, or even if, the two commits I found
earlier tie directly into this. It seems there's a really weird race
condition or resource fight going on. I'm not exactly sure how to fix
the problem cleanly either, because just adding the overhead of dev_info
statements to the function makes the bug go away (so I can technically
"fix" the bug, but that's just a total hack). The one other little clue
I found was that the delay went away completely when I put dev_info
statements in every possible branch of the function's logic. When I
added dev_info only to the ifs corresponding to a problem, though, a
slight delay appeared (bumping the total time in the function to around
10ms), but still not enough for link training to time out (so my GPU
always loaded).
I plan on mailing the list for the PCI subsystem of the kernel soon, but
I'm stumped about how exactly to proceed, so if you have any debugging
suggestions, I'd be happy to hear them. Thanks again.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1009312
Title:
10de:0426 GPU loads unreliably, possible kernel timeout
Status in “linux” package in Ubuntu:
Triaged
Bug description:
Reverse upstream kernel commit bisecting revealed a fix via commit
d34883d4e35c0a994e91dd847a82b4c9e0c31d83 by Xiao Guangrong.
WORKAROUND: If I boot my computer from battery power alone without AC,
my GPU & the Ubuntu splash screen load on startup.
I've been running Ubuntu 12.04 for a few weeks now and I really like it,
but from the beginning I've had the issue where the proprietary nvidia
driver installs but fails to load (confirmed from the command line,
jockey, and the nvidia-dashboard). Over time, I've noticed that
sometimes when I power on, the driver does load and I can enter a full
unity session without problems, but other times I fall back onto the
VESA driver and a unity 2d session. On a whim, I finally copied logs
from both successful and unsuccessful boots, cut out the times, ran a
diff on them, and noticed a pattern in the kernel messages.
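The "cut out the times, ran a diff" comparison can be reproduced with a
short script. This is a minimal Python sketch, assuming dmesg-style
lines with a "[seconds.microseconds]" prefix; the function names and the
sample lines are hypothetical and made up for illustration, not taken
from the attached logs.

```python
import difflib
import re

# Leading "[   12.345678] " kernel timestamp, as printed by dmesg.
TIMESTAMP_PREFIX = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_times(lines):
    """Drop the timestamp prefix so identical messages compare equal."""
    return [TIMESTAMP_PREFIX.sub("", line.rstrip("\n")) for line in lines]


def diff_boots(good_lines, bad_lines):
    """Return a unified diff of two boots with timestamps removed."""
    return list(
        difflib.unified_diff(
            strip_times(good_lines),
            strip_times(bad_lines),
            fromfile="good-boot",
            tofile="bad-boot",
            lineterm="",
        )
    )


# Made-up sample lines for illustration only (hypothetical message text).
GOOD = [
    "[    0.100000] message A",
    "[    0.106000] message B",
    "[    0.200000] message C",
]
BAD = [
    "[    0.100000] message A",
    "[    0.130000] message B",
]

if __name__ == "__main__":
    print("\n".join(diff_boots(GOOD, BAD)))
```

With the timestamps stripped, only messages that are present in one boot
and missing in the other show up in the diff.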
I'm filing this bug after a successful boot, so I've also attached
copies of dmesg, Xorg, & jockey logs from an unsuccessful boot. The
first thing I saw in the logs was a timing discrepancy between the two
boots, most of which is due to GPE storms. I've checked other logs and
there's no clear relation: I've had successful boots with them and
unsuccessful ones without them. I do still wonder if they may be
involved, because it seems I'm a little luckier if I turn off and
unplug any peripherals before booting.
But around line 325 in my dmesg logs, at the last step that mentions
my GPU (pci device 0000:01:00.0), there is consistently a delay of at
most 6 ms for successful boots, but one of 30 ms for unsuccessful ones.
Also, in all dmesg logs from successful boots, around line 610, the
message "Boot video device" is recorded for the PCI number of my GPU,
but for every fallback, the message never appears. That's why I think
it's a kernel issue: the earliest mention of a specific driver module
doesn't occur until later in the log.
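Checking a saved log for that marker is easy to script. A minimal
Python sketch follows; the function name is hypothetical, and the exact
layout of the sample lines is an assumption made up for illustration,
not copied from the attached logs.

```python
def has_boot_video_message(lines, pci_address="0000:01:00.0"):
    """Return True if any line mentions both the PCI address and the marker."""
    return any(
        pci_address in line and "Boot video device" in line
        for line in lines
    )


# Made-up sample lines for illustration only (assumed line layout).
GOOD_BOOT = [
    "[    0.600000] some earlier message",
    "[    0.610000] pci 0000:01:00.0: Boot video device",
]
FALLBACK_BOOT = [
    "[    0.600000] some earlier message",
]

if __name__ == "__main__":
    print("good boot:", has_boot_video_message(GOOD_BOOT))
    print("fallback boot:", has_boot_video_message(FALLBACK_BOOT))
```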
I'm currently using fully updated versions of nvidia driver 295.49.
ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-24-generic-pae 3.2.0-24.39
ProcVersionSignature: Ubuntu 3.2.0-24.39-generic-pae 3.2.16
Uname: Linux 3.2.0-24-generic-pae i686
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu8
Architecture: i386
ArecordDevices:
**** List of CAPTURE Hardware Devices ****
card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: kyle 1790 F.... pulseaudio
Card0.Amixer.info:
Card hw:0 'Intel'/'HDA Intel at 0xfc400000 irq 48'
Mixer name : 'SigmaTel STAC9872AK'
Components : 'HDA:83847662,104d1c00,00100201 HDA:14f12c06,104d1700,00100000'
Controls : 18
Simple ctrls : 9
Date: Tue Jun 5 22:44:22 2012
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=1b676222-44c7-453c-a522-06b6fd5d66f4
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release i386 (20120423)
MachineType: Sony Corporation VGN-FZ260E
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcEnviron:
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-24-generic-pae root=UUID=e330e46a-b426-439f-8037-c1069cc693ce ro quiet splash vt.handoff=7
RelatedPackageVersions:
linux-restricted-modules-3.2.0-24-generic-pae N/A
linux-backports-modules-3.2.0-24-generic-pae N/A
linux-firmware 1.79
RfKill:
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/04/2007
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: R1120J7
dmi.board.asset.tag: N/A
dmi.board.name: VAIO
dmi.board.vendor: Sony Corporation
dmi.board.version: N/A
dmi.chassis.asset.tag: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Sony Corporation
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrR1120J7:bd07/04/2007:svnSonyCorporation:pnVGN-FZ260E:pvrFC000001:rvnSonyCorporation:rnVAIO:rvrN/A:cvnSonyCorporation:ct10:cvrN/A:
dmi.product.name: VGN-FZ260E
dmi.product.version: FC000001
dmi.sys.vendor: Sony Corporation
---
AcpiTables: Error: command ['pkexec', '/usr/share/apport/dump_acpi_tables.py'] failed with exit code 127: Error executing /usr/share/apport/dump_acpi_tables.py: Permission denied
ApportVersion: 2.5.1-0ubuntu4
Architecture: i386
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: ubuntu 3344 F.... pulseaudio
CasperVersion: 1.321
DistroRelease: Ubuntu 12.10
LiveMediaBuild: Ubuntu 12.10 "Quantal Quetzal" - Alpha i386 (20120831)
MachineType: Sony Corporation VGN-FZ260E
Package: linux (not installed)
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: noprompt cdrom-detect/try-usb=true file=/cdrom/preseed/username.seed boot=casper initrd=/casper/initrd.lz quiet splash -- maybe-ubiquity
ProcVersionSignature: Ubuntu 3.5.0-13.14-generic 3.5.3
RelatedPackageVersions:
linux-restricted-modules-3.5.0-13-generic N/A
linux-backports-modules-3.5.0-13-generic N/A
linux-firmware 1.91
RfKill:
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: yes
Tags: quantal running-unity
Uname: Linux 3.5.0-13-generic i686
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
dmi.bios.date: 07/04/2007
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: R1120J7
dmi.board.asset.tag: N/A
dmi.board.name: VAIO
dmi.board.vendor: Sony Corporation
dmi.board.version: N/A
dmi.chassis.asset.tag: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Sony Corporation
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrR1120J7:bd07/04/2007:svnSonyCorporation:pnVGN-FZ260E:pvrFC000001:rvnSonyCorporation:rnVAIO:rvrN/A:cvnSonyCorporation:ct10:cvrN/A:
dmi.product.name: VGN-FZ260E
dmi.product.version: FC000001
dmi.sys.vendor: Sony Corporation
---
ApportVersion: 2.10.2-0ubuntu1
Architecture: i386
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: ubuntu 4176 F.... pulseaudio
ubuntu 6045 F.... pulseaudio
CasperVersion: 1.333
DistroRelease: Ubuntu 13.10
LiveMediaBuild: Ubuntu 13.10 "Saucy Salamander" - Alpha i386 (20130529)
MachineType: Sony Corporation VGN-FZ260E
MarkForUpload: True
Package: linux (not installed)
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcEnviron:
LANGUAGE=en_US
TERM=xterm
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: noprompt cdrom-detect/try-usb=true persistent file=/cdrom/preseed/hostname.seed boot=casper initrd=/casper/initrd.lz quiet splash -- maybe-ubiquity
ProcVersionSignature: Ubuntu 3.9.0-3.8-generic 3.9.4
PulseList:
Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
linux-restricted-modules-3.9.0-3-generic N/A
linux-backports-modules-3.9.0-3-generic N/A
linux-firmware 1.109
RfKill:
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
Tags: saucy
Uname: Linux 3.9.0-3-generic i686
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
dmi.bios.date: 07/04/2007
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: R1120J7
dmi.board.asset.tag: N/A
dmi.board.name: VAIO
dmi.board.vendor: Sony Corporation
dmi.board.version: N/A
dmi.chassis.asset.tag: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Sony Corporation
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrR1120J7:bd07/04/2007:svnSonyCorporation:pnVGN-FZ260E:pvrFC000001:rvnSonyCorporation:rnVAIO:rvrN/A:cvnSonyCorporation:ct10:cvrN/A:
dmi.product.name: VGN-FZ260E
dmi.product.version: FC000001
dmi.sys.vendor: Sony Corporation
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1009312/+subscriptions