
kernel-packages team mailing list archive

[Bug 1483189] Re: Machine crashes when we unload the Nvidia driver module with Ubuntu 15.10

 

------- Comment From apopple@xxxxxxxxxxx 2016-01-26 18:30 EDT-------
This looks related to https://bugzilla.linux.ibm.com/show_bug.cgi?id=135507

** Bug watch added: bugzilla.linux.ibm.com/ #135507
   https://bugzilla.linux.ibm.com/show_bug.cgi?id=135507

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1483189

Title:
  Machine crashes when we unload the Nvidia driver module with Ubuntu
  15.10

Status in linux package in Ubuntu:
  New

Bug description:
  Problem Description
  ==============================
  The machine crashes when we unload the NVIDIA driver module.
   
  ---Additional Hardware Info---
  root@fr111p1:~# lspci | grep -i NVIDIA
  0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
  0000:04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
  0002:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
  0002:04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
    
  Machine Type = P8 
   
  Steps to Reproduce
  ===============================
  Install Ubuntu 15.10 from the netboot images on a POWER8 PowerNV 8335-GTA system.
  Then follow the steps below to install the latest kernel.

  1) cat <<EOF >> /etc/apt/sources.list
  deb http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu wily main
  deb-src http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu wily main
  EOF

  2) apt-get update
  3) apt-cache search linux-image-4 
  linux-image-4.0.0-3-generic - Linux kernel image for version 4.0.0 on PowerPC 64el SMP
  linux-image-4.0.0-4-generic - Linux kernel image for version 4.0.0 on PowerPC 64el SMP

  4) Choose the latest kernel version for installation
  apt-get install linux-image-4.0.0-4-generic
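  Step 1 appends with `>>`, so re-running it adds duplicate entries to
  /etc/apt/sources.list. A minimal idempotent sketch, assuming bash and grep;
  the function name `add_kernel_ppa` is hypothetical and not part of any
  Ubuntu tooling:

```bash
#!/usr/bin/env bash
# Hypothetical helper: append each APT source line from step 1 to FILE
# only if that exact line is not already present, so re-runs are harmless.
add_kernel_ppa() {
    local file="$1"
    local line
    touch "$file"
    for line in \
        "deb http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu wily main" \
        "deb-src http://ppa.launchpad.net/canonical-kernel-team/ppa/ubuntu wily main"
    do
        # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
        grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}

# Usage (as root), then continue with steps 2-4:
#   add_kernel_ppa /etc/apt/sources.list
#   apt-get update
```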

  Then reboot into the new kernel and install the CUDA packages:

  root@fr111p1:~# dpkg -i cuda-repo-ubuntu1410_7.0-28_ppc64el.deb
  root@fr111p1:~# apt-get update
  root@fr111p1:~# apt-get install cuda

  Then unload the NVIDIA kernel modules manually:

  root@fr111p1:~# lsmod | grep nvidia
  nvidia_uvm             88636  0
  nvidia              11342553  1 nvidia_uvm
  drm                   431025  5 ast,ttm,drm_kms_helper,nvidia
  root@fr111p1:~# rmmod nvidia_uvm
  root@fr111p1:~# lsmod | grep nvidia
  nvidia              11342553  0
  drm                   431025  5 ast,ttm,drm_kms_helper,nvidia
  root@fr111p1:~# rmmod nvidia
  root@fr111p1:~# nvidia-smi
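  The session above removes nvidia_uvm first because lsmod lists it as the
  only user of nvidia. A small sketch of that check, assuming bash and awk;
  `module_use_count` and `unload_if_unused` are hypothetical names, and the
  rmmod path must run as root:

```bash
#!/usr/bin/env bash
# Print the "Used by" count (third column) that lsmod-formatted text on
# stdin reports for module $1; print nothing if the module is not listed.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Hypothetical wrapper: remove a module only when lsmod reports no users.
unload_if_unused() {
    local mod="$1" count
    count=$(lsmod | module_use_count "$mod")
    if [ -z "$count" ]; then
        echo "$mod is not loaded" >&2
        return 1
    fi
    if [ "$count" -ne 0 ]; then
        echo "$mod still has $count user(s); not removing" >&2
        return 1
    fi
    rmmod "$mod"
}

# Same order as the session above (as root):
#   unload_if_unused nvidia_uvm && unload_if_unused nvidia
```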
   
  ---uname output---
  Linux fr111p1 4.0.0-4-generic #6-Ubuntu SMP Tue Jun 30 20:50:37 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  Stack trace output:
  (Identical to the Call Trace in the Oops output below.)

  Oops output:
  [14134.137140] Unable to handle kernel paging request for data at address 0x00000000
  [14134.137162] NVRM: loading NVIDIA UNIX ppc64le Kernel Module  346.46  Tue Feb 17 17:18:33 PST 2015
  [14134.137361] Faulting instruction address: 0xc000000000a42154
  [14134.137411] Oops: Kernel access of bad area, sig: 11 [#1]
  [14134.137640] SMP NR_CPUS=2048 NUMA PowerNV
  [14134.137684] Modules linked in: nvidia(POE) dm_round_robin dm_multipath scsi_dh cxgb3 cxgb4 ib_ipoib ib_ucm ib_uverbs ib_cm ib_umad mlx4_ib ib_sa ib_mad ib_core ib_addr joydev mac_hid hid_generic ipmi_powernv ipmi_msghandler ast powernv_rng ttm at24 drm_kms_helper usbhid uio_pdrv_genirq syscopyarea uio sysfillrect hid sysimgblt i2c_algo_bit drm autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel uas usb_storage bnx2x ahci mlx4_core libahci mdio libcrc32c [last unloaded: nvidia]
  [14134.138214] CPU: 62 PID: 54865 Comm: nvidia-persiste Tainted: P           OE   4.0.0-4-generic #6-Ubuntu
  [14134.138279] task: c000001e3e122200 ti: c000001e3e1c4000 task.ti: c000001e3e1c4000
  [14134.138335] NIP: c000000000a42154 LR: c00000000011206c CTR: c000000000111ff0
  [14134.138389] REGS: c000001e3e1c7610 TRAP: 0300   Tainted: P           OE    (4.0.0-4-generic)
  [14134.138453] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002482  XER: 20000000
  [14134.138594] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0
                 GPR00: c00000000011206c c000001e3e1c7890 c000000001489300 d000000012d55ac8
                 GPR04: 0000000000000001 d000000012d55b10 c000001e0c442a70 00000000000000ff
                 GPR08: d000000012d55ad0 0000000000000000 c000001e3e1c78b0 d000000012aa6c48
                 GPR12: 0000000000002200 c000000007b82e00 0000000000000000 0000000000000000
                 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
                 GPR20: 0000000000000000 c000001e3e1c7dd0 0000000000000026 c000001e3d11a000
                 GPR24: c000001e0c44c600 c000001ffb040000 c0000000002bc2e0 c000001e0c44c600
                 GPR28: d000000012d55658 c000001e3e122200 d000000012d55ac8 d000000012d55ac8
  [14134.139349] NIP [c000000000a42154] __down+0x54/0x120
  [14134.139388] LR [c00000000011206c] down+0x7c/0xa0
  [14134.139423] Call Trace:
  [14134.139448] [c000001e3e1c7890] [c000000000287ec0] kmem_cache_alloc_trace+0x300/0x330 (unreliable)
  [14134.139526] [c000001e3e1c7900] [c00000000011206c] down+0x7c/0xa0
  [14134.139687] [c000001e3e1c7940] [d000000012a94970] nvidia_open+0x370/0x730 [nvidia]
  [14134.139837] [c000001e3e1c79e0] [d000000012aa33ac] nvidia_frontend_open+0x8c/0x100 [nvidia]
  [14134.139901] [c000001e3e1c7a70] [c0000000002bc3f4] chrdev_open+0x114/0x260
  [14134.139954] [c000001e3e1c7ad0] [c0000000002b18b0] do_dentry_open+0x2d0/0x480
  [14134.140007] [c000001e3e1c7b30] [c0000000002c76a0] do_last+0x190/0x1010
  [14134.140073] [c000001e3e1c7c00] [c0000000002caffc] path_openat+0xdc/0x810
  [14134.140126] [c000001e3e1c7cd0] [c0000000002ccb98] do_filp_open+0x58/0xf0
  [14134.140183] [c000001e3e1c7db0] [c0000000002b3698] do_sys_open+0x1c8/0x390
  [14134.140271] [c000001e3e1c7e30] [c000000000009258] system_call+0x38/0xd0
  [14134.140387] Instruction dump:
  [14134.140430] f8010010 f821ff91 7c7e1b78 60000000 60000000 ebad0290 e93e0010 39410020
  [14134.140582] 391e0008 f95e0010 f9010020 f9210028 <f9490000> 39200000 fba10030 99210038
  [14134.171233] ---[ end trace fd85f29c7dc22bee ]---
  [14134.171293]
  [14134.171331] Sending IPI to other CPUs
  [14134.172464] IPI complete
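  The instruction dump marks the faulting word as <f9490000>. A sketch that
  decodes 64-bit PowerPC DS-form doubleword stores (primary opcode 62, XO 0 =
  std, XO 1 = stdu) with bash arithmetic, assuming the dump prints each word
  as its instruction value; `decode_ds_store` is a hypothetical helper, not a
  kernel tool. It yields "std r10,0(r9)", and the register dump above shows
  GPR09 = 0000000000000000, which is consistent with DAR 0000000000000000,
  i.e. a store through a NULL pointer inside __down.

```bash
#!/usr/bin/env bash
# Decode a hex instruction word from the Oops "Instruction dump" if it is
# a DS-form std/stdu (primary opcode 62); otherwise report and return 1.
decode_ds_store() {
    local word=$(( 16#$1 ))
    local op=$(( (word >> 26) & 0x3f ))   # primary opcode (bits 0-5)
    local rs=$(( (word >> 21) & 0x1f ))   # source register
    local ra=$(( (word >> 16) & 0x1f ))   # base register
    local ds=$(( word & 0xfffc ))         # displacement; low 2 bits are XO
    local xo=$(( word & 0x3 ))
    local mnem
    if [ "$op" -ne 62 ]; then
        echo "not a DS-form store (opcode $op)"
        return 1
    fi
    case "$xo" in
        0) mnem=std ;;
        1) mnem=stdu ;;
        *) echo "unhandled XO $xo"; return 1 ;;
    esac
    # The displacement field is a signed 16-bit value.
    if (( ds >= 0x8000 )); then
        ds=$(( ds - 0x10000 ))
    fi
    echo "$mnem r$rs,$ds(r$ra)"
}

decode_ds_store f9490000   # faulting word from the dump
```

  The same helper decodes the neighbouring words, e.g. f821ff91 as
  "stdu r1,-112(r1)" (the function's stack-frame setup).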

  
  == Comment: #5 - Vaishnavi Bhat <vaish123@xxxxxxxxxx> - 2015-07-07 04:18:03 ==
  # dpkg -l|grep nvidia
  ii  nvidia-346                              346.46-0ubuntu1                            ppc64el      NVIDIA binary driver - version 346.46
  ii  nvidia-346-dev                          346.46-0ubuntu1                            ppc64el      NVIDIA binary Xorg driver development files
  ii  nvidia-346-uvm                          346.46-0ubuntu1                            ppc64el      NVIDIA Unified Memory kernel module

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1483189/+subscriptions