kernel-packages team mailing list archive

[Bug 1597974] Re: ISST-LTE:pVM:monklp5:Ubuntu16.04.1:system crashed at lpfc_sli4_scmd_to_wqidx_distr

 

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
       Status: New => Triaged

** Changed in: linux (Ubuntu)
     Assignee: Taco Screen team (taco-screen-team) => Canonical Kernel Team (canonical-kernel-team)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1597974

Title:
  ISST-LTE:pVM:monklp5:Ubuntu16.04.1:system crashed at
  lpfc_sli4_scmd_to_wqidx_distr

Status in linux package in Ubuntu:
  Triaged

Bug description:
  ---Problem Description---
  We have Ubuntu 16.04.1 installed on our system and ran the DLPAR test for the ZR1 adapter. After some time the system crashes at lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100.
    
  Machine Type = 9119-MME*1085AE7 
   
  ---Debugger Data---
  e:mon> e
  cpu 0xe: Vector: 300 (Data Access) at [c0000003d45335a0]
      pc: d000000003a374e0: lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]
      lr: d0000000039d749c: lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
      sp: c0000003d4533820
     msr: 8000000100009033
     dar: 0
   dsisr: 40000000
    current = 0xc0000003e06c2a20
    paca    = 0xc000000007af8500   softe: 0        irq_happened: 0x01
      pid   = 17983, comm = scsi_eh_23
  e:mon> r
  R00 = d0000000039d749c   R16 = 0000000000000000
  R01 = c0000003d4533820   R17 = c0000003d4533cd0
  R02 = d000000003a84160   R18 = c0000003d4533cb8
  R03 = c0000003ee76a000   R19 = c0000003d87e5088
  R04 = c0000003dad6a800   R20 = c0000003d4533cb0
  R05 = c0000003dad6a870   R21 = 000000000000001e
  R06 = 0000000000000001   R22 = c0000000018aab78
  R07 = d000000003a84160   R23 = c0000003dad6a870
  R08 = d000000003a2f830   R24 = c0000003dad6a800
  R09 = 0000000000000004   R25 = c0000003d4533978
  R10 = 0000000000000000   R26 = 0000000000000001
  R11 = d000000003a59a50   R27 = 0000000000000000
  R12 = 0000000028533824   R28 = c0000003e841e000
  R13 = c000000007af8500   R29 = c0000003ee76a000
  R14 = c0000003d87e5000   R30 = c0000003dad6a800
  R15 = c0000003d4533cb8   R31 = c0000003ee76a000
  pc  = d000000003a374e0 lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]
  cfar= c000000000008468 slb_miss_realmode+0x50/0x78
  lr  = d0000000039d749c lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
  msr = 8000000100009033   cr  = 28538828
  ctr = c000000000ae3cf0   xer = 0000000020000010   trap =  300
  dar = 0000000000000000   dsisr = 40000000
  e:mon> 
   
  Stack trace output:
   e:mon> t
  [c0000003d4533850] d0000000039d749c lpfc_sli_calc_ring.part.20+0xdc/0x100 [lpfc]
  [c0000003d4533890] d0000000039df680 lpfc_sli_issue_iocb+0xf0/0x330 [lpfc]
  [c0000003d45338f0] d0000000039e3824 lpfc_sli_issue_iocb_wait+0x264/0x680 [lpfc]
  [c0000003d45339d0] d000000003a32944 lpfc_send_taskmgmt+0x2d4/0x7d0 [lpfc]
  [c0000003d4533aa0] d000000003a33564 lpfc_device_reset_handler+0x114/0x210 [lpfc]
  [c0000003d4533b60] c00000000075843c scsi_eh_ready_devs+0x68c/0xee0
  [c0000003d4533c50] c00000000075a91c scsi_error_handler+0x6bc/0x9e0
  [c0000003d4533d80] c0000000000e61e0 kthread+0x110/0x130
  [c0000003d4533e30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
  --- Exception: 0  at 0000000000000000
  e:mon>
   
  e:mon> dl
  [10194.079284] sd 13:0:3:0: [sdaf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [10194.079293] sd 13:0:3:0: [sdaf] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [10194.079297] blk_update_request: I/O error, dev sdaf, sector 41942912
  [10194.079313] device-mapper: multipath: Failing path 65:240.
  [10194.079351] sd 13:0:2:0: [sdab] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [10194.079360] sd 13:0:2:0: [sdab] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [10194.079364] blk_update_request: I/O error, dev sdab, sector 41942912
  [10194.079375] device-mapper: multipath: Failing path 65:176.
  [10194.102832] scsi 13:0:1:0: alua: Detached
  [10194.110320] sd 13:0:1:1: [sdh] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [10194.110324] sd 13:0:1:1: [sdh] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [10194.110326] blk_update_request: I/O error, dev sdh, sector 41942912
  [10194.110334] device-mapper: multipath: Failing path 8:112.
  [10194.110394] sd 13:0:2:1: [sdac] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [10194.110398] sd 13:0:2:1: [sdac] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [10194.110401] blk_update_request: I/O error, dev sdac, sector 41942912
  [10194.110407] device-mapper: multipath: Failing path 65:192.
  [10194.110439] sd 13:0:3:1: [sdag] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [10194.110448] sd 13:0:3:1: [sdag] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [10194.110452] blk_update_request: I/O error, dev sdag, sector 41942912
  [10194.110464] device-mapper: multipath: Failing path 66:0.
  [10194.118851] scsi 13:0:0:1: alua: Detached
  [10194.122868] sd 13:0:3:0: [sdaf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [10194.122879] sd 13:0:3:0: [sdaf] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [10194.122887] blk_update_request: I/O error, dev sdaf, sector 41942912
  [10194.122911] device-mapper: multipath: Failing path 65:240.
  [10194.138865] scsi 13:0:2:0: alua: Detached
  [10194.158852] scsi 13:0:3:0: alua: Detached
  [10194.162199] sd 13:0:3:1: [sdag] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [10194.162204] sd 13:0:3:1: [sdag] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [10194.162207] blk_update_request: I/O error, dev sdag, sector 41942912
  [10194.162216] device-mapper: multipath: Failing path 66:0.
  [10194.162241] device-mapper: multipath: Failing path 65:192.
  [10194.194835] scsi 13:0:1:1: alua: Detached
  [10194.202301] device-mapper: multipath: Failing path 65:208.
  [10194.202323] device-mapper: multipath: Failing path 8:128.
  [10194.202359] device-mapper: multipath: Failing path 66:16.
  [10194.218852] scsi 13:0:0:2: alua: Detached
  [10194.222391] device-mapper: multipath: Failing path 66:0.
  [10194.250830] scsi 13:0:2:1: alua: Detached
  [10194.274829] scsi 13:0:3:1: alua: Detached
  [10194.278436] device-mapper: multipath: Failing path 66:16.
  [10194.278467] device-mapper: multipath: Failing path 65:208.
  [10194.298817] scsi 13:0:1:2: alua: Detached
  [10194.306356] device-mapper: multipath: Failing path 65:224.
  [10194.306383] device-mapper: multipath: Failing path 65:160.
  [10194.306424] device-mapper: multipath: Failing path 66:32.
  [10194.334838] scsi 13:0:0:3: alua: Detached
  [10194.338579] device-mapper: multipath: Failing path 66:16.
  [10194.354934] scsi 13:0:2:2: alua: Detached
  [10194.378850] scsi 13:0:3:2: alua: Detached
  [10194.382605] device-mapper: multipath: Failing path 66:32.
  [10194.382643] device-mapper: multipath: Failing path 65:224.
  [10194.406826] scsi 13:0:1:3: alua: Detached
  [10194.410973] device-mapper: multipath: Failing path 66:32.
  [10194.434908] scsi 13:0:2:3: alua: Detached
  [10194.462920] scsi 13:0:3:3: alua: Detached
  [10194.587776] iommu: Removing device 0007:01:00.0 from group 0
  [10204.593263] pci_bus 0007:01: busn_res: [bus 01-ff] is released
  [10204.593333] rpadlpar_io: slot PHB 21 removed
  [10849.383986] PCI host bridge /pci@800000020000015  ranges:
  [10849.383991]  MEM 0x00003fc600000000..0x00003fc67effffff -> 0x0000000080000000
  [10849.383993]  MEM 0x000030c000000000..0x000030cfffffffff -> 0x0003d0c000000000
  [10849.389303] PCI: I/O resource not set for host bridge /pci@800000020000015 (domain 8)
  [10849.389372] PCI host bridge to bus 0008:01
  [10849.389378] pci_bus 0008:01: root bus resource [mem 0x3fc600000000-0x3fc67effffff] (bus address [0x80000000-0xfeffffff])
  [10849.389384] pci_bus 0008:01: root bus resource [bus 01-ff]
  [10849.394162] pci 0008:01:00.1: reg 0x160: [mem 0x00000000-0x0000ffff 64bit pref]
  [10849.394165] pci 0008:01:00.1: VF(n) BAR0 space: [mem 0x00000000-0x0013ffff 64bit pref] (contains BAR0 for 20 VFs)
  [10849.405662] pci 0008:01:00.0: reg 0x160: [mem 0x00000000-0x0000ffff 64bit pref]
  [10849.405664] pci 0008:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x0013ffff 64bit pref] (contains BAR0 for 20 VFs)
  [10849.491175] iommu: Adding device 0008:01:00.1 to group 0
  [10849.491704] iommu: Adding device 0008:01:00.0 to group 0
  [10849.492196] PIAR: overlapping address range
  [10849.492198] PIAR: overlapping address range
  [10849.492199] PIAR: overlapping address range
  [10849.492199] PIAR: overlapping address range
  [10849.492200] PIAR: overlapping address range
  [10849.492441] lpfc 0008:01:00.1: enabling device (0140 -> 0142)
  [10849.495406] lpfc 0008:01:00.1: ibm,query-pe-dma-windows(53) 10000 8000000 20000015 returned 0
  [10849.542283] lpfc 0008:01:00.1: Using 64-bit direct DMA at offset 800000000000000
  [10849.675139] scsi host14: Emulex LPe16000 16Gb PCIe Fibre Channel Adapter on PCI bus 01 device 01 irq 505
  [10850.235317] lpfc 0008:01:00.0: enabling device (0140 -> 0142)
  [10850.239152] lpfc 0008:01:00.0: Using 64-bit direct DMA at offset 800000000000000
  [10850.399263] scsi host15: Emulex LPe16000 16Gb PCIe Fibre Channel Adapter on PCI bus 01 device 00 irq 504
  [10850.959301] rpaphp: Slot [U78CA.001.CSS003P-P1-C6-C1] registered
  [10850.959309] rpadlpar_io: slot PHB 21 added
  [10851.847229] lpfc 0008:01:00.0: 1:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
  [10852.026827] scsi 15:0:0:0: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.027527] sd 15:0:0:0: alua: supports implicit TPGS
  [10852.027843] sd 15:0:0:0: alua: port group 00 rel port 230
  [10852.027890] sd 15:0:0:0: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.028276] sd 15:0:0:0: alua: port group 00 state A preferred supports tolusnA
  [10852.028425] sd 15:0:0:0: [sda] Write Protect is off
  [10852.028431] sd 15:0:0:0: [sda] Mode Sense: f5 00 00 08
  [10852.028455] sd 15:0:0:0: Attached scsi generic sg0 type 0
  [10852.028711] sd 15:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.029804] scsi 15:0:0:1: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.030789] sd 15:0:0:1: alua: supports implicit TPGS
  [10852.031484] sd 15:0:0:1: alua: port group 00 rel port 230
  [10852.031522] sd 15:0:0:1: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.031994] sd 15:0:0:1: alua: port group 00 state A preferred supports tolusnA
  [10852.032153] sd 15:0:0:1: Attached scsi generic sg1 type 0
  [10852.032239] sd 15:0:0:1: [sdb] Write Protect is off
  [10852.032246] sd 15:0:0:1: [sdb] Mode Sense: ed 00 00 08
  [10852.032596] sd 15:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.033460] scsi 15:0:0:2: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.034530] sd 15:0:0:2: alua: supports implicit TPGS
  [10852.034917] sd 15:0:0:2: alua: port group 00 rel port 230
  [10852.035000] sd 15:0:0:2: [sdd] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.035294] sd 15:0:0:2: alua: port group 00 state A preferred supports tolusnA
  [10852.035568] sd 15:0:0:2: Attached scsi generic sg2 type 0
  [10852.036739] scsi 15:0:0:3: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.037441] sd 15:0:0:3: alua: supports implicit TPGS
  [10852.037740] sd 15:0:0:3: alua: port group 00 rel port 230
  [10852.037798] sd 15:0:0:3: [sde] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.038070] sd 15:0:0:3: alua: port group 00 state A preferred supports tolusnA
  [10852.038234] sd 15:0:0:3: Attached scsi generic sg3 type 0
  [10852.038349] sd 15:0:0:3: [sde] Write Protect is off
  [10852.038355] sd 15:0:0:3: [sde] Mode Sense: ed 00 00 08
  [10852.038683] sd 15:0:0:3: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.039314] scsi 15:0:1:0: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.039748]  sdb:
  [10852.040238] sd 15:0:1:0: alua: supports implicit TPGS
  [10852.040632] sd 15:0:1:0: alua: port group 00 rel port 30
  [10852.040708] sd 15:0:1:0: [sdg] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.041053] sd 15:0:1:0: alua: port group 00 state A preferred supports tolusnA
  [10852.041481] sd 15:0:1:0: [sdg] Write Protect is off
  [10852.041496] sd 15:0:1:0: [sdg] Mode Sense: f5 00 00 08
  [10852.041550] sd 15:0:0:1: [sdb] Attached SCSI disk
  [10852.041786] sd 15:0:1:0: [sdg] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.042390] sd 15:0:1:0: Attached scsi generic sg4 type 0
  [10852.044049] scsi 15:0:1:1: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.044795] sd 15:0:1:1: alua: supports implicit TPGS
  [10852.045180] sd 15:0:1:1: alua: port group 00 rel port 30
  [10852.045226] sd 15:0:0:0: [sda] Attached SCSI disk
  [10852.045313] sd 15:0:1:1: [sdh] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.045631] sd 15:0:1:1: alua: port group 00 state A preferred supports tolusnA
  [10852.045730] sd 15:0:1:1: Attached scsi generic sg5 type 0
  [10852.045942] sd 15:0:1:1: [sdh] Write Protect is off
  [10852.045949] sd 15:0:1:1: [sdh] Mode Sense: ed 00 00 08
  [10852.046318] sd 15:0:1:1: [sdh] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.046813] scsi 15:0:1:2: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.047808] sd 15:0:1:2: alua: supports implicit TPGS
  [10852.048133] sd 15:0:1:2: alua: port group 00 rel port 30
  [10852.048358] sd 15:0:1:2: [sdi] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.048520] sd 15:0:1:2: alua: port group 00 state A preferred supports tolusnA
  [10852.048643] sd 15:0:1:2: Attached scsi generic sg6 type 0
  [10852.049296]  sdh:
  [10852.049299] sd 15:0:1:2: [sdi] Write Protect is off
  [10852.049308] sd 15:0:1:2: [sdi] Mode Sense: ed 00 00 08
  [10852.049634] sd 15:0:1:2: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.049667] scsi 15:0:1:3: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.050354] sd 15:0:1:3: alua: supports implicit TPGS
  [10852.050853] sd 15:0:1:3: alua: port group 00 rel port 30
  [10852.050943] sd 15:0:1:3: [sdaa] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.051214] sd 15:0:1:1: [sdh] Attached SCSI disk
  [10852.051287] sd 15:0:1:3: alua: port group 00 state A preferred supports tolusnA
  [10852.051426] sd 15:0:1:3: Attached scsi generic sg7 type 0
  [10852.051646] sd 15:0:1:3: [sdaa] Write Protect is off
  [10852.051656] sd 15:0:1:3: [sdaa] Mode Sense: ed 00 00 08
  [10852.051967] sd 15:0:1:3: [sdaa] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.052323] scsi 15:0:2:0: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.052973] sd 15:0:2:0: alua: supports implicit TPGS
  [10852.053314] sd 15:0:2:0: alua: port group 00 rel port 100
  [10852.053406] sd 15:0:2:0: [sdab] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.053686] sd 15:0:2:0: alua: port group 00 state A preferred supports tolusnA
  [10852.053892] sd 15:0:2:0: Attached scsi generic sg8 type 0
  [10852.054069] sd 15:0:2:0: [sdab] Write Protect is off
  [10852.054078] sd 15:0:2:0: [sdab] Mode Sense: f5 00 00 08
  [10852.054391] sd 15:0:2:0: [sdab] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.055081] scsi 15:0:2:1: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.056159] sd 15:0:2:1: alua: supports implicit TPGS
  [10852.056493] sd 15:0:2:1: alua: port group 00 rel port 100
  [10852.056574] sd 15:0:2:1: [sdac] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.056997] sd 15:0:2:1: alua: port group 00 state A preferred supports tolusnA
  [10852.057274] sd 15:0:2:1: [sdac] Write Protect is off
  [10852.057280] sd 15:0:2:1: Attached scsi generic sg29 type 0
  [10852.057290] sd 15:0:2:1: [sdac] Mode Sense: ed 00 00 08
  [10852.057578] sd 15:0:2:1: [sdac] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.058491] scsi 15:0:2:2: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.059132]  sde:
  [10852.059173]  sdaa:
  [10852.060148] sd 15:0:2:2: alua: supports implicit TPGS
  [10852.060723] sd 15:0:2:2: alua: port group 00 rel port 100
  [10852.060814] sd 15:0:0:3: [sde] Attached SCSI disk
  [10852.060858] sd 15:0:1:3: [sdaa] Attached SCSI disk
  [10852.060942] sd 15:0:2:2: alua: rtpg failed with 8000002
  [10852.061167]  sdac:
  [10852.061313] sd 15:0:2:2: [sdad] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.061363] sd 15:0:2:2: alua: port group 00 state A preferred supports tolusnA
  [10852.061615] sd 15:0:2:2: Attached scsi generic sg30 type 0
  [10852.062278] sd 15:0:2:2: [sdad] Write Protect is off
  [10852.062291] sd 15:0:2:2: [sdad] Mode Sense: ed 00 00 08
  [10852.062841] scsi 15:0:2:3: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.062902] sd 15:0:2:2: [sdad] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.063740] sd 15:0:2:1: [sdac] Attached SCSI disk
  [10852.063965] sd 15:0:2:3: alua: supports implicit TPGS
  [10852.064330] sd 15:0:2:0: [sdab] Attached SCSI disk
  [10852.064437] sd 15:0:2:3: alua: port group 00 rel port 100
  [10852.064507] sd 15:0:2:3: [sdae] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.064927] sd 15:0:2:3: alua: port group 00 state A preferred supports tolusnA
  [10852.065231] sd 15:0:2:3: Attached scsi generic sg31 type 0
  [10852.065348] sd 15:0:2:3: [sdae] Write Protect is off
  [10852.065358] sd 15:0:2:3: [sdae] Mode Sense: ed 00 00 08
  [10852.065859] sd 15:0:2:3: [sdae] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.065872]  sdad:
  [10852.065959]  sdi:
  [10852.066310] scsi 15:0:3:0: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.067721] sd 15:0:2:2: [sdad] Attached SCSI disk
  [10852.067796] sd 15:0:1:2: [sdi] Attached SCSI disk
  [10852.067876] sd 15:0:3:0: alua: supports implicit TPGS
  [10852.068387] sd 15:0:3:0: [sdaf] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.068426] sd 15:0:3:0: alua: port group 00 rel port 300
  [10852.068904] sd 15:0:3:0: alua: port group 00 state A preferred supports tolusnA
  [10852.069151]  sdae:
  [10852.069204] sd 15:0:3:0: Attached scsi generic sg32 type 0
  [10852.069657] sd 15:0:3:0: [sdaf] Write Protect is off
  [10852.069664] sd 15:0:3:0: [sdaf] Mode Sense: f5 00 00 08
  [10852.070344] sd 15:0:3:0: [sdaf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.070954] sd 15:0:2:3: [sdae] Attached SCSI disk
  [10852.074942] sd 15:0:3:0: [sdaf] Attached SCSI disk
  [10852.074954] scsi 15:0:3:1: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.076026] sd 15:0:3:1: alua: supports implicit TPGS
  [10852.076714] sd 15:0:3:1: alua: port group 00 rel port 300
  [10852.076745] sd 15:0:3:1: [sdag] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.077546] sd 15:0:3:1: alua: port group 00 state A preferred supports tolusnA
  [10852.077885] sd 15:0:3:1: Attached scsi generic sg33 type 0
  [10852.078100] sd 15:0:3:1: [sdag] Write Protect is off
  [10852.078109] sd 15:0:3:1: [sdag] Mode Sense: ed 00 00 08
  [10852.078403] sd 15:0:3:1: [sdag] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.080875]  sdag:
  [10852.083603] sd 15:0:3:1: [sdag] Attached SCSI disk
  [10852.086470] scsi 15:0:3:2: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.087290] sd 15:0:3:2: alua: supports implicit TPGS
  [10852.087630] sd 15:0:3:2: alua: port group 00 rel port 300
  [10852.087790] sd 15:0:3:2: [sdah] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.087984] sd 15:0:3:2: alua: port group 00 state A preferred supports tolusnA
  [10852.088145] sd 15:0:3:2: Attached scsi generic sg34 type 0
  [10852.088392] sd 15:0:3:2: [sdah] Write Protect is off
  [10852.088411] sd 15:0:3:2: [sdah] Mode Sense: ed 00 00 08
  [10852.088687] sd 15:0:3:2: [sdah] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [10852.089078] scsi 15:0:3:3: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [10852.089911] sd 15:0:3:3: alua: supports implicit TPGS
  [10852.090323] sd 15:0:3:3: alua: port group 00 rel port 300
  [10852.090360] sd 15:0:3:3: [sdai] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [10852.090737] sd 15:0:3:3: alua: port group 00 state A preferred supports tolusnA
  [10852.091050] sd 15:0:3:3: Attached scsi gen0
  [12126.399029] scsi host17: Emulex LPe16000 16Gb PCIe Fibre Channel Adapter on PCI bus 01 device 00 irq 504
  [12126.959169] rpaphp: Slot [U78CA.001.CSS003P-P1-C6-C1] registered
  [12126.959176] rpadlpar_io: slot PHB 21 added
  [12127.844158] lpfc 0009:01:00.0: 1:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
  [12128.043085] scsi 17:0:0:0: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [12128.043688] sd 17:0:0:0: alua: supports implicit TPGS
  [12128.043969] sd 17:0:0:0: alua: port group 00 rel port 300
  [12128.044190] sd 17:0:0:0: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [12128.044311] sd 17:0:0:0: alua: port group 00 state A preferred supports tolusnA
  [12128.044430] sd 17:0:0:0: Attached scsi generic sg0 type 0
  [12128.044896] sd 17:0:0:0: [sda] Write Protect is off
  [12128.044903] sd 17:0:0:0: [sda] Mode Sense: f5 00 00 08
  [12128.045179] sd 17:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or [sdaf] Mode Sense: f5 00 00 08
  [12128.088966] sd 17:0:3:0: Attached scsi generic sg32 type 0
  [12128.089258] sd 17:0:3:0: [sdaf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [12128.089877] sd 17:0:2:3: [sdae] Attached SCSI disk
  [12128.090349] scsi 17:0:3:1: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [12128.092029] sd 17:0:3:1: alua: supports implicit TPGS
  [12128.092495] sd 17:0:3:1: alua: port group 00 rel port 30
  [12128.092548] sd 17:0:3:1: [sdag] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [12128.092885] sd 17:0:3:1: alua: port group 00 state A preferred supports tolusnA
  [12128.093103] sd 17:0:3:1: Attached scsi generic sg33 type 0
  [12128.093172] sd 17:0:3:1: [sdag] Write Protect is off
  [12128.093183] sd 17:0:3:1: [sdag] Mode Sense: ed 00 00 08
  [12128.093503] sd 17:0:3:1: [sdag] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [12128.094183] scsi 17:0:3:2: Direct-Access     IBM    th 65:224.
  [12749.073922] device-mapper: multipath: Failing path 66:32.
  [12749.073953] device-mapper: multipath: Failing path 65:160.
  [12749.090309] scsi 17:0:0:3: alua: Detached
  [12749.094400] device-mapper: multipath: Failing path 66:16.
  [12749.118398] scsi 17:0:2:2: alua: Detached
  [12749.138317] scsi 17:0:3:2: alua: Detached
  [12749.141437] device-mapper: multipath: Failing path 65:224.
  [12749.141467] device-mapper: multipath: Failing path 66:32.
  [12749.162316] scsi 17:0:1:3: alua: Detached
  [12749.165448] device-mapper: multipath: Failing path 66:32.
  [12749.202489] scsi 17:0:2:3: alua: Detached
  [12749.238436] scsi 17:0:3:3: alua: Detached
  [12749.371720] iommu: Removing device 0009:01:00.0 from group 0
  [12759.378488] pci_bus 0009:01: busn_res: [bus 01-ff] is released
  [12759.378559] rpadlpar_io: slot PHB 21 removed
  [13405.725246] PCI host bridge /pci@800000020000015  ranges:
  [13405.725253]  MEM 0x00003fc600000000..0x00003fc67effffff -> 0x0000000080000000 l port 100
  [13408.460796] sd 19:0:2:1: [sdac] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [13408.461979] sd 19:0:2:1: [sdac] Write Protect is off
  [13408.461987] sd 19:0:2:1: [sdac] Mode Sense: ed 00 00 08
  [13408.462292] sd 19:0:2:1: [sdac] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [13408.463890] sd 19:0:2:0: [sdab] Attached SCSI disk
  [13408.465140]  sdac:
  [13408.467041] sd 19:0:2:1: [sdac] Attached SCSI disk
  [13408.569203]  sdd:
  [13408.569287]  sdi:
  [13408.570556] sd 19:0:0:2: [sdd] Attached SCSI disk
  [13408.570631] sd 19:0:1:2: [sdi] Attached SCSI disk
  [13438.697588] sd 19:0:0:1: [sdb] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [13439.326030]  rport-19:0-9: blocked FC remote port time out: removing rport
  [13453.426174] sd 19:0:0:1: [sdb] Write Protect is off
  [13453.426184] sd 19:0:0:1: [sdb] Mode Sense: ed 00 00 08
  [13453.426711] sd 19:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [14683.037860] scsi 21:0:0:2: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [14683.039069] sd 21:0:0:2: alua: supports implicit TPGS
  [14683.039472] sd 21:0:0:2: alua: port group 00 rel port 300
  [14683.039557] sd 21:0:0:2: [sdd] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [14683.039882] sd 21:0:0:2: alua: port group 00 state A preferred supports tolusnA
  [14683.040143] sd 21:0:0:2: Attached scsi generic sg2 type 0
  [14683.040221] sd 21:0:0:2: [sdd] Write Protect is off
  [14683.040231] sd 21:0:0:2: [sdd] Mode Sense: ed 00 00 08
  [14683.040517] sd 21:0:0:2: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [14683.041295] scsi 21:0:0:3: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [14683.042497] sd 21:0:0:3: alua: supports implicit TPGS
  [14683.042870] sd 21:0:0:3: [sde] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [1468er: multipath: Failing path 8:16.
  [15299.302590] iommu: Removing device 000b:01:00.1 from group 0
  [15299.317740] scsi 21:0:0:0: alua: Detached
  [15299.325260] sd 21:0:2:0: [sdab] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [15299.325265] sd 21:0:2:0: [sdab] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [15299.325267] blk_update_request: I/O error, dev sdab, sector 41942912
  [15299.325275] device-mapper: multipath: Failing path 65:176.
  [15299.325296] sd 21:0:3:0: [sdae] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [15299.325298] sd 21:0:3:0: [sdae] tag#0 CDB: Read(10) 28 00 02 7f ff 80 00 00 80 00
  [15299.325301] blk_update_request: I/O error, dev sdae, sector 41942912
  [15299.325309] device-mapper: multipath: Failing path 65:224.
  [15299.353733] scsi 21:0:1:0: alua: Detached
  [15299.361265] sd 21:0:3:1: [sdaf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
  [15299.361269] sd 21:0:3:1: [sdaf] tag#0 CDB958.559922] scsi 23:0:1:1: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [15958.560721] sd 23:0:1:1: alua: supports implicit TPGS
  [15958.561066] sd 23:0:1:1: alua: port group 00 rel port 100
  [15958.561092] sd 23:0:1:1: [sdh] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
  [15958.561697] sd 23:0:1:1: alua: port group 00 state A preferred supports tolusnA
  [15958.561880] sd 23:0:1:1: Attached scsi generic sg5 type 0
  [15958.561951] sd 23:0:1:1: [sdh] Write Protect is off
  [15958.561958] sd 23:0:1:1: [sdh] Mode Sense: ed 00 00 08
  [15958.562239] sd 23:0:1:1: [sdh] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
  [15958.562942] scsi 23:0:1:2: Direct-Access     IBM      2107900          .149 PQ: 0 ANSI: 5
  [15958.563577] sd 23:0:1:0: [sdg] Attached SCSI disk
  [15958.563921] sd 23:0:1:2: alua: supports implicit TPGS
  [15958.564270] sd 23:0:1:2: alua: port group 00 rel port 100
  [15958.564443] sd 23:0:1:2: [sdi] 41943040 512-byte
  d000000003a37530  7c6307b4      extsw   r3,r3
  d000000003a37534  38210030      addi    r1,r1,48
  d000000003a37538  e8010010      ld      r0,16(r1)
  d000000003a3753c  ebc1fff0      ld      r30,-16(r1)
  d000000003a37540  ebe1fff8      ld      r31,-8(r1)
  d000000003a37544  7c0803a6      mtlr    r0
  d000000003a37548  4e800020      blr
  d000000003a3754c  60420000      ori     r2,r2,0
  d000000003a37550  813f0b14      lwz     r9,2836(r31)
  d000000003a37554  2b890001      cmplwi  cr7,r9,1
  d000000003a37558  409dffa8      ble     cr7,d000000003a37500    # lpfc_sli4_scmd_to_wqidx_distr+0x50/0x100 [lpfc]
  d000000003a3755c  a14d0008      lhz     r10,8(r13)
  d000000003a37560  a13f0572      lhz     r9,1394(r31)
  d000000003a37564  7f895000      cmpw    cr7,r9,r10
  d000000003a37568  409dff98      ble     cr7,d000000003a37500    # lpfc_sli4_scmd_to_wqidx_distr+0x50/0x100 [lpfc]
  d000000003a3756c  e93f0568      ld      r9,1384(r31)
  e:mon>

  ---Steps to Reproduce---
   cd /kte/tools
  ./setup dlar
  cd /kte/tools/dlpar 
  ./start.dlpar -d 0

  Doing some analysis without a crashdump, as it's taking too long.

  This looks like a NULL pointer dereference, 
  if my assembly reading/matching to C is correct.

  Would need to understand why/how this
  '(struct scsi_cmnd *cmnd)->device' field is NULL.

  Analysis
  --------

  From xmon:
  	pc: <...>: lpfc_sli4_scmd_to_wqidx_distr+0x30/0x100 [lpfc]

          R10 = 0000000000000000   R26 = 0000000000000001

  From 'objdump -d /usr/lib/debug/<...>/lpfc.ko',

  	lpfc_sli4_scmd_to_wqidx_distr      = 674b0
  	lpfc_sli4_scmd_to_wqidx_distr+0x30 = 674e0 (crash)

          and

  	00000000000674b0 <lpfc_sli4_scmd_to_wqidx_distr>:
  	<...>
  	   674cc:       78 23 9e 7c     mr      r30,r4
  	   674d0:       78 1b 7f 7c     mr      r31,r3
  	<...>
  	   674dc:       10 00 5e e9     ld      r10,16(r30)
  	   674e0:       00 00 2a e9     ld      r9,0(r10)  <<-- (crash)
  	   674e4:       00 00 29 e9     ld      r9,0(r9)
  	<...>
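
  The match between the objdump bytes and the mnemonics above can be
  double-checked by hand. The sketch below is only an illustration (the
  helper names are made up for this note, not taken from the kernel or
  from binutils): it rebuilds the 32-bit word from the little-endian
  byte order that objdump printed, splits it into the fields of a Power
  ISA DS-form load (primary opcode 58 with extended opcode 0 is "ld"),
  and recomputes the +0x30 offset of the crash site.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Power ISA DS-form instruction such as "ld RT,DS(RA)". */
struct ds_form {
	unsigned opcode;	/* primary opcode, top 6 bits (58 for ld) */
	unsigned rt;		/* target register */
	unsigned ra;		/* base register */
	int disp;		/* signed byte displacement */
	unsigned xo;		/* extended opcode, low 2 bits (0 for ld) */
};

/* objdump prints the bytes in memory order; on a little-endian build
 * "10 00 5e e9" is therefore the instruction word 0xe95e0010. */
uint32_t word_from_le_bytes(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

struct ds_form decode_ds(uint32_t insn)
{
	struct ds_form f;
	int d = (int)(insn & 0xfffc);	/* displacement, low 2 bits cleared */

	if (d & 0x8000)			/* sign-extend from 16 bits */
		d -= 0x10000;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.disp = d;
	f.xo = insn & 0x3;
	return f;
}

/* Offset of an address inside a function, as in "symbol+0x30". */
uint64_t offset_in_function(uint64_t addr, uint64_t func_start)
{
	return addr - func_start;
}

/* Re-check the values quoted in this report. */
void check_report_values(void)
{
	const uint8_t crash[4] = { 0x00, 0x00, 0x2a, 0xe9 };	/* at 674e0 */
	struct ds_form f = decode_ds(word_from_le_bytes(crash));

	assert(f.opcode == 58 && f.xo == 0);		/* ld */
	assert(f.rt == 9 && f.ra == 10 && f.disp == 0);	/* ld r9,0(r10) */
	assert(offset_in_function(0x674e0, 0x674b0) == 0x30);
}
```

  Calling check_report_values() passes silently when the decoding agrees
  with the listing; the base register of the crashing load is r10, the
  register xmon shows as zero.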

  This is the relevant snippet of code:

  From Ubuntu 16.04 kernel 4.4.0-22.40 [1]

  	int lpfc_sli4_scmd_to_wqidx_distr(struct lpfc_hba *phba,
  					  struct lpfc_scsi_buf *lpfc_cmd)
  	{
  		struct scsi_cmnd *cmnd = lpfc_cmd->pCmd;
  	<...>
  		if (shost_use_blk_mq(cmnd->device->host)) {
  	<...>

  
  So, back to the assembly, these seem to be the 2 function parameters,
  passed in registers (r4, r3) and copied into other registers (r30, r31).

  	   674cc:       78 23 9e 7c     mr      r30,r4
  	   674d0:       78 1b 7f 7c     mr      r31,r3

  Per the load below, r10 is *cmnd, and r30 is *lpfc_cmd;  it loads
  lpfc_cmd->pCmd, which is at offset 16 bytes into struct lpfc_scsi_buf [2]
  (after 2 pointers * 8 bytes each, from list_head list [3])

             674dc:       10 00 5e e9     ld      r10,16(r30)

  	struct lpfc_scsi_buf {
  		struct list_head list;
  		struct scsi_cmnd *pCmd;
  	<...>

  	struct list_head {
  		struct list_head *next, *prev;
  	};
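
  The 16-byte offset can also be confirmed with offsetof(). This is a
  minimal sketch with stand-in types, assuming only the two leading
  fields shown above (the real struct lpfc_scsi_buf has many more
  members after pCmd, which do not affect this offset). The names
  buf_sketch, pcmd_offset and check_layout are invented for this note.

```c
#include <assert.h>
#include <stddef.h>

struct scsi_cmnd;			/* opaque here; only a pointer is stored */

struct list_head {
	struct list_head *next, *prev;
};

/* Stand-in for the first two fields of struct lpfc_scsi_buf. */
struct buf_sketch {
	struct list_head list;
	struct scsi_cmnd *pCmd;
};

/* Byte offset of pCmd inside the stand-in struct. */
size_t pcmd_offset(void)
{
	return offsetof(struct buf_sketch, pCmd);
}

/* pCmd sits right after the two list pointers: 2 * 8 = 16 bytes on a
 * 64-bit target such as ppc64el, matching "ld r10,16(r30)". */
void check_layout(void)
{
	assert(pcmd_offset() == 2 * sizeof(void *));
}
```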

  And the load below hits the crash, because it dereferences r10 (*cmnd)
  which is zero:

             674e0:       00 00 2a e9     ld      r9,0(r10)  <<-- (crash)

          From xmon:

          R10 = 0000000000000000   R26 = 0000000000000001

  That dereference was for cmnd->device; you can see that the load instruction immediately
  afterward would further dereference the cmnd->device pointer, for the device->host field,
  which has offset 0 into struct scsi_device [4]:
   --- this is a confirmation that the assembly/C matching looks correct.

                  if (shost_use_blk_mq(cmnd->device->host)) {

  	struct scsi_device {
  		struct Scsi_Host *host;
  	<...>

  
  [1] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/drivers/scsi/lpfc/lpfc_scsi.c?h=Ubuntu-4.4.0-22.40
  [2] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/drivers/scsi/lpfc/lpfc_scsi.h?h=Ubuntu-4.4.0-22.40#n130
  [3] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/include/linux/types.h?h=Ubuntu-4.4.0-22.40#n185
  [4] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/include/scsi/scsi_device.h?h=Ubuntu-4.4.0-22.40#n77

  One detail missing..

  (In reply to comment #16)
  > And the load below hits the crash, because it dereferences r10 (*cmnd) which
  > is zero:
  > 
  > 	   674e0:       00 00 2a e9     ld      r9,0(r10)  <<-- (crash)
  > 
  > 	From xmon:
  > 
  > 	R10 = 0000000000000000   R26 = 0000000000000001
  > 
  > That deference was for cmnd->device;  [snip]

  which has offset zero into struct scsi_cmnd [5]

          if (shost_use_blk_mq(cmnd->device->host)) {

          struct scsi_cmnd {
  	        struct scsi_device *device;
          <...>

  [5] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/tree/include/scsi/scsi_cmnd.h?h=Ubuntu-4.4.0-22.40#n59

  > Would need to understand why/how this
  > '(struct scsi_cmnd *cmnd)->device' field is NULL.

  Checking this again today, it occurred to me that the problem is
  actually cmnd == NULL (the pointer itself, not its ->device field),
  and dereferencing cmnd (for cmnd->device) hits the crash.
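
  To make the failure mode concrete, here is a hypothetical sketch of
  the dereference chain with a NULL guard in front of it. It is not the
  patch referenced in this report, and whether a guard at this spot is
  the right fix is an assumption made only for illustration. All types
  and the helper cmd_uses_blk_mq() are stand-ins invented for this note;
  the use_blk_mq flag stands in for whatever shost_use_blk_mq() tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins that keep only the fields the crashing expression touches. */
struct host_sketch { bool use_blk_mq; };
struct device_sketch { struct host_sketch *host; };	/* offset 0 */
struct cmnd_sketch { struct device_sketch *device; };	/* offset 0 */
struct lpfc_buf_sketch {
	void *list[2];			/* stands in for struct list_head */
	struct cmnd_sketch *pCmd;	/* may be NULL */
};

/* Follows lpfc_cmd->pCmd->device->host only when pCmd is set.
 * Without the check, a NULL pCmd faults on the first load of
 * cmnd->device, which is what the xmon trace shows (dar = 0). */
bool cmd_uses_blk_mq(const struct lpfc_buf_sketch *lpfc_cmd)
{
	const struct cmnd_sketch *cmnd = lpfc_cmd->pCmd;

	if (!cmnd)
		return false;
	return cmnd->device->host->use_blk_mq;
}

/* Exercise both the NULL case and the populated case. */
void check_guard(void)
{
	struct host_sketch host = { true };
	struct device_sketch dev = { &host };
	struct cmnd_sketch cmd = { &dev };
	struct lpfc_buf_sketch with_cmd = { { NULL, NULL }, &cmd };
	struct lpfc_buf_sketch without_cmd = { { NULL, NULL }, NULL };

	assert(cmd_uses_blk_mq(&with_cmd));
	assert(!cmd_uses_blk_mq(&without_cmd));
}
```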

  Fix submitted upstream:
      http://marc.info/?l=linux-scsi&m=146534119707379&w=2

  I didn't provide a test kernel because the system was running
  regression tests over weekend, and it takes long to reproduce the
  problem w/ DLPAR operations -- but the same problem could be
  reproduced w/ simpler test-cases (see commit), so I worked it in the
  background.

  If you keep hitting this, let me know and I'll provide a test kernel.

  Hi Mauricio,
  I installed the fix and verified it. I am no longer seeing the issue.

  Hi Canonical,

  Can you consider picking up this fix that has not yet made the
  upstream kernel?

  It's fairly obvious and trivial, well documented in the commit message
  (w/ test-cases), and has been tested successfully here in IBM (also
  see commit msg).

  http://marc.info/?l=linux-scsi&m=146534119707379&w=2

  The adapter vendor's team has not yet reviewed it on the mailing list
  (and no other patches for lpfc), so I guess it'll take some time until
  this makes it in.

  Is that possible?

  Thanks

  Mauricio

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1597974/+subscriptions

