kernel-packages team mailing list archive

[Bug 1483170] Comment bridged from LTC Bugzilla

 

------- Comment From cdeadmin@xxxxxxxxxx 2016-03-07 12:15 EDT-------
==== State: Verify by: panico on 07 March 2016 11:10:48 ====

Tested on the original gp6 system (PowerNV Ubuntu and NVIDIA K80) and
verified.  I ran hardbootme on the system, shutting down and booting
the system every four hours over a two-day period.

The system versions:
root@gp6p01:~# uname -a
Linux gp6p01 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 10:11:54 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
root@gp6p01:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.3 LTS
Release:	14.04
Codename:	trusty

The system has this version of firmware:
Current Side Driver:.....fips811/b1105a_1540.811

Here is the hardbootme.log file from the script running on the lcb:

Bootme started at :
Fri Mar  4 16:38:39 CST 2016
==========================================
System gp6 powered off at :
Fri Mar  4 17:12:41 CST 2016
System gp6 powering on at :
Fri Mar  4 17:22:41 CST 2016
==========================================
==========================================
System gp6 powered off at :
Fri Mar  4 20:01:48 CST 2016
System gp6 powering on at :
Fri Mar  4 20:11:48 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sat Mar  5 00:00:59 CST 2016
System gp6 powering on at :
Sat Mar  5 00:10:59 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sat Mar  5 04:01:09 CST 2016
System gp6 powering on at :
Sat Mar  5 04:11:09 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sat Mar  5 08:01:20 CST 2016
System gp6 powering on at :
Sat Mar  5 08:11:20 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sat Mar  5 12:01:31 CST 2016
System gp6 powering on at :
Sat Mar  5 12:11:31 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sat Mar  5 16:01:41 CST 2016
System gp6 powering on at :
Sat Mar  5 16:11:41 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sat Mar  5 20:00:52 CST 2016
System gp6 powering on at :
Sat Mar  5 20:10:52 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sun Mar  6 00:01:03 CST 2016
System gp6 powering on at :
Sun Mar  6 00:11:03 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sun Mar  6 04:01:13 CST 2016
System gp6 powering on at :
Sun Mar  6 04:11:13 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sun Mar  6 08:01:24 CST 2016
System gp6 powering on at :
Sun Mar  6 08:11:24 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sun Mar  6 12:01:35 CST 2016
System gp6 powering on at :
Sun Mar  6 12:11:35 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sun Mar  6 16:00:46 CST 2016
System gp6 powering on at :
Sun Mar  6 16:10:46 CST 2016
==========================================
==========================================
System gp6 powered off at :
Sun Mar  6 20:00:57 CST 2016
System gp6 powering on at :
Sun Mar  6 20:10:57 CST 2016
==========================================
==========================================
System gp6 powered off at :
Mon Mar  7 00:01:08 CST 2016
System gp6 powering on at :
Mon Mar  7 00:11:08 CST 2016
==========================================
==========================================
System gp6 powered off at :
Mon Mar  7 04:01:19 CST 2016
System gp6 powering on at :
Mon Mar  7 04:11:19 CST 2016
==========================================
==========================================
System gp6 powered off at :
Mon Mar  7 08:01:30 CST 2016
System gp6 powering on at :
Mon Mar  7 08:11:30 CST 2016
==========================================
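
For anyone who wants to check a log like the one above mechanically, here is a minimal sketch in Python. It assumes the exact line layout shown in this hardbootme.log (a "powered off at :" or "powering on at :" label followed by a date line). The helper names parse_stamp and parse_cycles are hypothetical and are not part of the hardbootme script. The time zone token is skipped rather than interpreted, so all stamps are treated as being in the same zone.

```python
from datetime import datetime, timedelta

# Month abbreviations as printed by `date` in an English locale (assumption).
MONTHS = {name: idx for idx, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_stamp(line):
    """Parse a line such as 'Fri Mar  4 17:12:41 CST 2016' (zone ignored)."""
    _weekday, month, day, clock, _zone, year = line.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def parse_cycles(log_text):
    """Return a list of (powered_off, powering_on) datetime pairs."""
    cycles = []
    pending = None   # which label the next date line belongs to
    off_time = None  # power-off time waiting for its matching power-on
    for raw in log_text.splitlines():
        line = raw.strip()
        if "powered off at" in line:
            pending = "off"
        elif "powering on at" in line:
            pending = "on"
        elif pending and line and not line.startswith("="):
            stamp = parse_stamp(line)
            if pending == "off":
                off_time = stamp
            elif off_time is not None:
                cycles.append((off_time, stamp))
                off_time = None
            pending = None
    return cycles


# First two cycles copied from the log above.
SAMPLE = """\
Bootme started at :
Fri Mar  4 16:38:39 CST 2016
==========================================
System gp6 powered off at :
Fri Mar  4 17:12:41 CST 2016
System gp6 powering on at :
Fri Mar  4 17:22:41 CST 2016
==========================================
==========================================
System gp6 powered off at :
Fri Mar  4 20:01:48 CST 2016
System gp6 powering on at :
Fri Mar  4 20:11:48 CST 2016
==========================================
"""

cycles = parse_cycles(SAMPLE)
for off, on in cycles:
    print(off, "->", on, "down for", on - off)
```

Feeding the whole log file to parse_cycles in place of SAMPLE would give one pair per power cycle, which makes it easy to confirm the cycle count and the gap between power-off and power-on.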

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1483170

Title:
  NVidia: Ubuntu: OS crashed into xmon Prompt;  scsi_report_bus_reset

Status in linux package in Ubuntu:
  New

Bug description:
  Problem Description:
  ====================
  This system is running non-virtualized Ubuntu with one NVIDIA K80 GPU. During a hardbootme run the OS crashed. Here are the details from xmon:

  0:mon> e
  cpu 0x0: Vector: 300 (Data Access) at [c000003ffff8f3b0]
      pc: c00000000069ba80: scsi_report_bus_reset+0x60/0xb0
      lr: d00000001cae524c: ipr_erp_start+0x3bc/0x644 [ipr]
      sp: c000003ffff8f630
     msr: 9000000000009033
     dar: 100178
   dsisr: 40000000
    current = 0xc000000001359b10
    paca    = 0xc00000000fb80000	 softe: 0	 irq_happened: 0x01
      pid   = 0, comm = swapper/0
  0:mon> r
  R00 = d00000001cae524c   R16 = 0000000000200000
  R01 = c000003ffff8f630   R17 = 0000000000000000
  R02 = c0000000013d8028   R18 = 00000000fffefa58
  R03 = c000000fdcb00000   R19 = c000000000e4a000
  R04 = 0000000000000000   R20 = c000000001412180
  R05 = 0000000000000002   R21 = 0000000000000001
  R06 = 0000000000000067   R22 = 0000000000000002
  R07 = 0000000006290000   R23 = 00000000000001f0
  R08 = 0000000000000001   R24 = c00000001010ea00
  R09 = 00000000001000f0   R25 = c000000fdcb00730
  R10 = 00000000000000ff   R26 = 0000000000000001
  R11 = d00000001cae6518   R27 = 0000000006290000
  R12 = c00000000069ba20   R28 = c000000fdce40cf0
  R13 = c00000000fb80000   R29 = c000000fa4c50300
  R14 = c00000000135a120   R30 = 0000000000000000
  R15 = 0000000000000000   R31 = c000000fdcb00000
  pc  = c00000000069ba80 scsi_report_bus_reset+0x60/0xb0
  cfar= c000000000009368 slb_miss_realmode+0x50/0x78
  lr  = d00000001cae524c ipr_erp_start+0x3bc/0x644 [ipr]
  msr = 9000000000009033   cr  = 28044444
  ctr = c00000000069ba20   xer = 0000000000000000   trap =  300
  dar = 0000000000100178   dsisr = 40000000
  0:mon> t
  [c000003ffff8f660] d00000001cae524c ipr_erp_start+0x3bc/0x644 [ipr]
  [c000003ffff8f6c0] d00000001caddb20 ipr_scsi_done+0x100/0x120 [ipr]
  [c000003ffff8f700] d00000001cadc5bc ipr_isr_mhrrq+0x10c/0x250 [ipr]
  [c000003ffff8f760] c00000000012ff90 handle_irq_event_percpu+0x90/0x2b0
  [c000003ffff8f820] c000000000130218 handle_irq_event+0x68/0xd0
  [c000003ffff8f850] c000000000135380 handle_fasteoi_irq+0xe0/0x250
  [c000003ffff8f880] c00000000012f188 generic_handle_irq+0x58/0x90
  [c000003ffff8f8b0] c0000000000119d0 __do_irq+0x80/0x190
  [c000003ffff8f8e0] c000000000011bec do_IRQ+0x10c/0x120
  [c000003ffff8f940] c000000000002794 hardware_interrupt_common+0x114/0x180
  --- Exception: 501 (Hardware Interrupt) at c0000000006a45b4 scsi_io_completion+0x1e4/0x800
  [c000003ffff8fd00] c00000000069662c scsi_finish_command+0x15c/0x1b0
  [c000003ffff8fd80] c0000000006a41d8 scsi_softirq_done+0x198/0x200
  [c000003ffff8fe00] c0000000004cbbd4 blk_done_softirq+0xb4/0xe0
  [c000003ffff8fe40] c0000000000b5244 __do_softirq+0x174/0x3e0
  [c000003ffff8ff30] c0000000000b5888 irq_exit+0xf8/0x140
  [c000003ffff8ff60] c0000000000119dc __do_irq+0x8c/0x190
  [c000003ffff8ff90] c000000000025320 call_do_irq+0x14/0x24
  [c0000000013d7840] c000000000011b80 do_IRQ+0xa0/0x120
  [c0000000013d78a0] c000000000002794 hardware_interrupt_common+0x114/0x180
  --- Exception: 501 (Hardware Interrupt) at c0000000000110d4 arch_local_irq_restore+0x74/0x90
  [c0000000013d7b90] c0000000000162f8 __switch_to+0x208/0x350 (unreliable)
  [c0000000013d7bb0] c0000000000ef70c finish_task_switch+0x7c/0x1e0
  [c0000000013d7bf0] c0000000009d6c40 __schedule+0x370/0x910
  [c0000000013d7e10] c0000000009d7880 schedule_preempt_disabled+0x20/0x30
  [c0000000013d7e30] c0000000001121e4 cpu_startup_entry+0x1c4/0x500
  [c0000000013d7ee0] c00000000000ccd4 rest_init+0xa4/0xc0
  [c0000000013d7f00] c000000000d53e4c start_kernel+0x520/0x53c
  [c0000000013d7f90] c000000000009b6c start_here_common+0x20/0xa8
  0:mon>

  == Comment: #1 - Brian J. King <bjking1@xxxxxxxxxx> - 2015-05-28 17:08:13 ==
  Make sure we have the host lock held when calling scsi_report_bus_reset. Fixes a crash seen when the __devices list in the SCSI host changed while we were iterating through it.
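
The locking rule described in Comment #1 can be illustrated with a small user-space analogy in Python. This is only a sketch of the general pattern (take the lock that guards a list before calling code that walks it); it is not the ipr driver change, and the Host class, its methods, and handle_bus_reset are hypothetical names invented for the illustration.

```python
import threading


class Host:
    """Toy stand-in for a host that owns a list of devices."""

    def __init__(self):
        self.lock = threading.Lock()  # guards self.devices
        self.devices = []

    def add_device(self, name):
        # Every mutation of the list happens under the lock.
        with self.lock:
            self.devices.append({"name": name, "was_reset": False})

    def report_bus_reset(self):
        # Contract: the caller must already hold self.lock.
        # locked() only shows that some thread holds the lock, so this
        # is a sanity check, not proof of ownership.
        if not self.lock.locked():
            raise RuntimeError("report_bus_reset called without the host lock")
        for dev in self.devices:
            dev["was_reset"] = True


def handle_bus_reset(host):
    # Take the lock around the call so the list cannot change
    # while report_bus_reset is iterating over it.
    with host.lock:
        host.report_bus_reset()


host = Host()
host.add_device("sda")
host.add_device("sdb")
handle_bus_reset(host)
print([dev["was_reset"] for dev in host.devices])
```

Calling report_bus_reset without holding the lock raises RuntimeError in this sketch. In the crash reported here, the analogous unlocked call instead walked a list that was being modified underneath it.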

  == Comment: #8 - Wen Xiong <wenxiong@xxxxxxxxxx> - 2015-08-06 11:09:25 ==
  Release of bug changed to Ubuntu14.04.

  He has tested the patch and reported "yes, the patch worked". We
  upstreamed the patch last month. Here is the commit link:

  https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/drivers/scsi/ipr.c?h=misc&id=36b8e180e1e929e00b351c3b72aab3147fc14116

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1483170/+subscriptions