
kernel-packages team mailing list archive

[Bug 1483189] [NEW] Machine crashes when we unload the Nvidia driver module with Ubuntu 15.10

 

You have been subscribed to a public bug:

Problem Description
==============================
Machine crashes when we unload the Nvidia driver module
 
---Additional Hardware Info---
root@fr111p1:~# lspci | grep -i NVIDIA
0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0000:04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0002:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
0002:04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
  
Machine Type = P8 
 
Steps to Reproduce
===============================
Installed Ubuntu 15.10 from the Netboot images on P8 Power NV 8335-GTA hardware,
then followed the steps below to install the latest kernel.

1) cat <<EOF >> /etc/apt/sources.list
deb http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu wily main
deb-src http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu wily main
EOF

2) apt-get update
3) apt-cache search linux-image-4 
linux-image-4.0.0-3-generic - Linux kernel image for version 4.0.0 on PowerPC 64el SMP
linux-image-4.0.0-4-generic - Linux kernel image for version 4.0.0 on PowerPC 64el SMP

4) Choose the latest kernel version for installation
apt-get install linux-image-4.0.0-4-generic

Then rebooted the machine into the latest kernel and installed the CUDA
packages.

root@fr111p1:~# dpkg -i cuda-repo-ubuntu1410_7.0-28_ppc64el.deb
root@fr111p1:~# apt-get update
root@fr111p1:~# apt-get install cuda

Then tried to unload the kernel modules manually.

root@fr111p1:~# lsmod | grep nvidia
nvidia_uvm             88636  0
nvidia              11342553  1 nvidia_uvm
drm                   431025  5 ast,ttm,drm_kms_helper,nvidia
root@fr111p1:~# rmmod nvidia_uvm
root@fr111p1:~# lsmod | grep nvidia
nvidia              11342553  0
drm                   431025  5 ast,ttm,drm_kms_helper,nvidia
root@fr111p1:~# rmmod nvidia
root@fr111p1:~# nvidia-smi
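
The sequence above removes nvidia_uvm first because lsmod lists it as the only user of nvidia (the third column is the use count). As a rough sketch only, that count could be checked before each rmmod with a small helper; the name module_usecount is hypothetical and is not part of any NVIDIA or Ubuntu tooling. It reads lsmod-style text on stdin, so it can be tried against the output captured in this report:

```shell
#!/bin/sh
# Hypothetical helper: print the "Used by" count (third lsmod column)
# for the module named in $1, reading lsmod-style text on stdin.
# Prints nothing if the module is not listed.
module_usecount() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# lsmod output copied from the report, before any rmmod was run.
sample='nvidia_uvm             88636  0
nvidia              11342553  1 nvidia_uvm
drm                   431025  5 ast,ttm,drm_kms_helper,nvidia'

printf '%s\n' "$sample" | module_usecount nvidia_uvm
printf '%s\n' "$sample" | module_usecount nvidia
```

On a live system the same helper could be fed with "lsmod | module_usecount nvidia". A count of 0 only means no other module or open file holds a reference at that instant.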
 
---uname output---
Linux fr111p1 4.0.0-4-generic #6-Ubuntu SMP Tue Jun 30 20:50:37 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

 
Oops output:
 [14134.137140] Unable to handle kernel paging request for data at address 0x00000000
[14134.137162] NVRM: loading NVIDIA UNIX ppc64le Kernel Module  346.46  Tue Feb 17 17:18:33 PST 2015
[14134.137361] Faulting instruction address: 0xc000000000a42154
[14134.137411] Oops: Kernel access of bad area, sig: 11 [#1]
[14134.137640] SMP NR_CPUS=2048 NUMA PowerNV
[14134.137684] Modules linked in: nvidia(POE) dm_round_robin dm_multipath scsi_dh cxgb3 cxgb4 ib_ipoib ib_ucm ib_uverbs ib_cm ib_umad mlx4_ib ib_sa ib_mad ib_core ib_addr joydev mac_hid hid_generic ipmi_powernv ipmi_msghandler ast powernv_rng ttm at24 drm_kms_helper usbhid uio_pdrv_genirq syscopyarea uio sysfillrect hid sysimgblt i2c_algo_bit drm autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel uas usb_storage bnx2x ahci mlx4_core libahci mdio libcrc32c [last unloaded: nvidia]
[14134.138214] CPU: 62 PID: 54865 Comm: nvidia-persiste Tainted: P           OE   4.0.0-4-generic #6-Ubuntu
[14134.138279] task: c000001e3e122200 ti: c000001e3e1c4000 task.ti: c000001e3e1c4000
[14134.138335] NIP: c000000000a42154 LR: c00000000011206c CTR: c000000000111ff0
[14134.138389] REGS: c000001e3e1c7610 TRAP: 0300   Tainted: P           OE    (4.0.0-4-generic)
[14134.138453] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002482  XER: 20000000
[14134.138594] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0
               GPR00: c00000000011206c c000001e3e1c7890 c000000001489300 d000000012d55ac8
               GPR04: 0000000000000001 d000000012d55b10 c000001e0c442a70 00000000000000ff
               GPR08: d000000012d55ad0 0000000000000000 c000001e3e1c78b0 d000000012aa6c48
               GPR12: 0000000000002200 c000000007b82e00 0000000000000000 0000000000000000
               GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
               GPR20: 0000000000000000 c000001e3e1c7dd0 0000000000000026 c000001e3d11a000
               GPR24: c000001e0c44c600 c000001ffb040000 c0000000002bc2e0 c000001e0c44c600
               GPR28: d000000012d55658 c000001e3e122200 d000000012d55ac8 d000000012d55ac8
[14134.139349] NIP [c000000000a42154] __down+0x54/0x120
[14134.139388] LR [c00000000011206c] down+0x7c/0xa0
[14134.139423] Call Trace:
[14134.139448] [c000001e3e1c7890] [c000000000287ec0] kmem_cache_alloc_trace+0x300/0x330 (unreliable)
[14134.139526] [c000001e3e1c7900] [c00000000011206c] down+0x7c/0xa0
[14134.139687] [c000001e3e1c7940] [d000000012a94970] nvidia_open+0x370/0x730 [nvidia]
[14134.139837] [c000001e3e1c79e0] [d000000012aa33ac] nvidia_frontend_open+0x8c/0x100 [nvidia]
[14134.139901] [c000001e3e1c7a70] [c0000000002bc3f4] chrdev_open+0x114/0x260
[14134.139954] [c000001e3e1c7ad0] [c0000000002b18b0] do_dentry_open+0x2d0/0x480
[14134.140007] [c000001e3e1c7b30] [c0000000002c76a0] do_last+0x190/0x1010
[14134.140073] [c000001e3e1c7c00] [c0000000002caffc] path_openat+0xdc/0x810
[14134.140126] [c000001e3e1c7cd0] [c0000000002ccb98] do_filp_open+0x58/0xf0
[14134.140183] [c000001e3e1c7db0] [c0000000002b3698] do_sys_open+0x1c8/0x390
[14134.140271] [c000001e3e1c7e30] [c000000000009258] system_call+0x38/0xd0
[14134.140387] Instruction dump:
[14134.140430] f8010010 f821ff91 7c7e1b78 60000000 60000000 ebad0290 e93e0010 39410020
[14134.140582] 391e0008 f95e0010 f9010020 f9210028 <f9490000> 39200000 fba10030 99210038
[14134.171233] ---[ end trace fd85f29c7dc22bee ]---
[14134.171293]
[14134.171331] Sending IPI to other CPUs
[14134.172464] IPI complete
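
For triage, the faulting symbol can be pulled out of a saved copy of the log above. This is a minimal sketch, assuming the "NIP [address] symbol" line format shown in this oops; the function name oops_nip_symbol is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the symbol+offset from the
# "NIP [address] symbol" line of an oops log read on stdin.
# The "NIP: ..." register line does not match and is ignored.
oops_nip_symbol() {
    sed -n 's/^.*NIP \[[0-9a-f]*] \(.*\)$/\1/p'
}

# Two lines copied from the oops output in this report.
oops_lines='[14134.138335] NIP: c000000000a42154 LR: c00000000011206c CTR: c000000000111ff0
[14134.139349] NIP [c000000000a42154] __down+0x54/0x120'

printf '%s\n' "$oops_lines" | oops_nip_symbol
```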


== Comment: #5 - Vaishnavi Bhat <vaish123@xxxxxxxxxx> - 2015-07-07 04:18:03 ==
# dpkg -l|grep nvidia
ii  nvidia-346                              346.46-0ubuntu1                            ppc64el      NVIDIA binary driver - version 346.46
ii  nvidia-346-dev                          346.46-0ubuntu1                            ppc64el      NVIDIA binary Xorg driver development files
ii  nvidia-346-uvm                          346.46-0ubuntu1                            ppc64el      NVIDIA Unified Memory kernel module

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: architecture-ppc64le bot-comment bugnameltc-127272 severity-critical targetmilestone-inin1510 wily
-- 
Machine crashes when we unload the Nvidia driver module with Ubuntu 15.10
https://bugs.launchpad.net/bugs/1483189
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.