kernel-packages team mailing list archive

[Bug 1473883] Missing required logs.

 

This bug is missing log files that will aid in diagnosing the problem.
From a terminal window please run:

apport-collect 1473883

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1473883

Title:
  Kernel panics on mlx4_core (Mellanox Core driver) with SR-IOV mode

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Loading and unloading mlx4_core twice with SR-IOV mode enabled, on a host
  with multiple Mellanox devices (some of which support SR-IOV and others
  that don't), leads to a kernel panic.

  The following two upstream commits fix this issue:

  commit 32b4ca5af1cf1c558dfca0e3417e9b35402401a6
  Author: Carol L Soto <clsoto@xxxxxxxxxxxxxxxxxx>
  Date:   Tue Jun 2 16:07:23 2015 -0500

      net/mlx4_core: double free of dev_vfs
      
      If user loads mlx4_core with num_vfs greater than
      supported then variable dev->dev_vfs is freed 2 times after unloading the
      driver.
      
      Acked-by: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>
      Signed-off-by: Carol L Soto <clsoto@xxxxxxxxxxxxxxxxxx>
      Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

  
  commit 7095b39f3189d2107045d769fdc32dfc0b704028
  Author: Carol Soto <clsoto@xxxxxxxxxxxxxxxxxx>
  Date:   Tue Jun 2 16:07:24 2015 -0500

      net/mlx4_core: need to call close fw if alloc icm is called twice
      
      If mlx4_enable_sriov is called by adapter without this
      feature MLX4_DEV_CAP_FLAG2_SYS_EQS then during this path the function alloc
      icm is called twice without freeing the structures from the first time.
      
      Acked-by: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>
      Signed-off-by: Carol L Soto <clsoto@xxxxxxxxxxxxxxxxxx>
      Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>


  Steps to reproduce:
  1- add "options mlx4_core num_vfs=60 port_type_array=2,2" to the /etc/modprobe.d/mlx4_core.conf file.
  2- unload the mlx4_* kernel modules: modprobe -rv mlx4_en; modprobe -rv mlx4_ib; modprobe -rv mlx4_core;
  3- load the mlx4_en kernel module: modprobe -v mlx4_en
  4- edit /etc/modprobe.d/mlx4_core.conf and comment out the "options mlx4_core num_vfs=60 port_type_array=2,2" line.
  5- repeat steps 2 and 3.
  6- the following call trace appears.
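
  The steps above could be scripted roughly as follows. This is only a
  sketch, not something taken from the report: the names CONF, OPTS,
  DRY_RUN, run, enable_opts, disable_opts, reload_mlx4 and reproduce are
  made up for illustration, and the sed expression assumes the options
  line starts at the beginning of a line. By default it only prints what
  it would do; running it for real (DRY_RUN=0, as root) is expected to
  panic an affected kernel, so use a disposable test host.

```shell
#!/bin/sh
# Sketch of the reproduction steps. All names here are illustrative.
CONF="${CONF:-/etc/modprobe.d/mlx4_core.conf}"
OPTS="options mlx4_core num_vfs=60 port_type_array=2,2"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the actions, 0 = execute them

# Print the action in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: append the SR-IOV options line to the config file.
enable_opts() { printf '%s\n' "$OPTS" >> "$CONF"; }

# Step 4: comment out the options line (GNU sed in-place edit).
disable_opts() { sed -i 's/^options mlx4_core /#&/' "$CONF"; }

# Steps 2 and 3: unload the mlx4 modules, then load mlx4_en again.
reload_mlx4() {
    run modprobe -rv mlx4_en
    run modprobe -rv mlx4_ib
    run modprobe -rv mlx4_core
    run modprobe -v mlx4_en
}

reproduce() {
    run enable_opts
    reload_mlx4
    run disable_opts
    reload_mlx4       # step 5: repeat steps 2 and 3
}

reproduce
```

  In dry-run mode the config edits show up as "+ enable_opts" and
  "+ disable_opts" rather than as the underlying commands.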

  
  Call Trace:
  [ 1175.699487] mlx4_core 0000:24:00.0: Received reset from slave:7 
  [ 1175.767388] mlx4_core 0000:24:00.0: Received reset from slave:6 
  [ 1175.830898] mlx4_core 0000:24:00.0: Received reset from slave:5 
  [ 1175.898229] mlx4_core 0000:24:00.0: Received reset from slave:4 
  [ 1175.963514] mlx4_core 0000:24:00.0: Received reset from slave:3 
  [ 1176.035312] mlx4_core 0000:24:00.0: Received reset from slave:2 
  [ 1176.105085] mlx4_core 0000:24:00.0: Received reset from slave:1 
  [ 1177.253200] mlx4_core 0000:24:00.0: Disabling SR-IOV            
  [ 1179.724864] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
  [ 1179.724885] mlx4_core: Initializing 0000:21:00.0                       
  [ 1185.760555] mlx4_core 0000:21:00.0: Enabling SR-IOV with 60 VFs        
  [ 1185.760575] mlx4_core 0000:21:00.0: Failed to enable SR-IOV, continuing without SR-IOV (err = -22)
  [ 1185.770550] mlx4_core 0000:21:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s                                                                                                           
  [ 1185.770552] mlx4_core 0000:21:00.0: PCIe link width is x8, device supports x8                                                                                                                     
  [ 1185.771870] ------------[ cut here ]------------                                                                                                                                                  
  [ 1185.771878] WARNING: CPU: 6 PID: 5947 at /build/buildd/linux-3.19.0/fs/sysfs/dir.c:31 sysfs_warn_dup+0x68/0x80()                                                                                  
  [ 1185.771880] sysfs: cannot create duplicate filename '/devices/pci0000:20/0000:20:03.0/0000:21:00.0/msi_irqs/57'                                                                                   
  [ 1185.771881] Modules linked in: mlx4_core(+) vxlan ip6_udp_tunnel udp_tunnel mst_pciconf(OE) mst_pci(OE) nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul dm_multipath glue_helper scsi_dh ablk_helper cryptd joydev lpc_ich serio_raw ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter ioatdma dca hpilo mac_hid wmi sb_edac edac_core shpchp nfsd auth_rpcgss                                                                                                                                                                         
  [ 1185.771920]  nfs_acl lockd grace sunrpc autofs4 hid_generic usbhid tg3 pata_acpi ptp hid psmouse hpsa pps_core [last unloaded: ib_addr]                                                           
  [ 1185.771931] CPU: 6 PID: 5947 Comm: modprobe Tainted: G           OE  3.19.0-16-generic #16-Ubuntu                                                                                                 
  [ 1185.771932] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 03/01/2013                                                                                                                           
  [ 1185.771934]  ffffffff81abb6d8 ffff88086cdb37c8 ffffffff817c2235 0000000000000007                                                                                                                  
  [ 1185.771936]  ffff88086cdb3818 ffff88086cdb3808 ffffffff8107595a 0000000000000292                                                                                                                  
  [ 1185.771938]  ffff88084d1ea000 ffff88086d1c1648 ffff8807b3df62d0 ffff880867ab85a0                                                                                                                  
  [ 1185.771941] Call Trace:                                                                                                                                                                           
  [ 1185.771949]  [<ffffffff817c2235>] dump_stack+0x45/0x57                                                                                                                                            
  [ 1185.771953]  [<ffffffff8107595a>] warn_slowpath_common+0x8a/0xc0                                                                                                                                  
  [ 1185.771955]  [<ffffffff810759d6>] warn_slowpath_fmt+0x46/0x50                                                                                                                                     
  [ 1185.771958]  [<ffffffff8126ab58>] ? kernfs_path+0x48/0x60                                                                                                                                         
  [ 1185.771961]  [<ffffffff8126e508>] sysfs_warn_dup+0x68/0x80                                                                                                                                        
  [ 1185.771963]  [<ffffffff8126e1ff>] sysfs_add_file_mode_ns+0x14f/0x1c0                                                                                                                              
  [ 1185.771966]  [<ffffffff8126c050>] ? kernfs_create_dir_ns+0x50/0x80                                                                                                                                
  [ 1185.771969]  [<ffffffff8126edf9>] internal_create_group+0xd9/0x280                                                                                                                                
  [ 1185.771971]  [<ffffffff8126f0d9>] sysfs_create_groups+0x49/0xa0                                                                                                                                   
  [ 1185.771976]  [<ffffffff8141bfad>] populate_msi_sysfs+0x1bd/0x200                                                                                                                                  
  [ 1185.771978]  [<ffffffff8141c4c8>] pci_enable_msix+0x158/0x3c0                                                                                                                                     
  [ 1185.771980]  [<ffffffff8141c75d>] pci_enable_msix_range+0x2d/0x70                                                                                                                                 
  [ 1185.771991]  [<ffffffffc0900245>] mlx4_load_one+0xea5/0x1410 [mlx4_core]                                                                                                                          
  [ 1185.771999]  [<ffffffffc0900c9b>] mlx4_init_one+0x4eb/0x600 [mlx4_core]                                                                                                                           
  [ 1185.772003]  [<ffffffff81401155>] local_pci_probe+0x45/0xa0                                                                                                                                       
  [ 1185.772005]  [<ffffffff81402345>] ? pci_match_device+0xe5/0x110                                                                                                                                   
  [ 1185.772007]  [<ffffffff81402489>] pci_device_probe+0xd9/0x130                                                                                                                                     
  [ 1185.772012]  [<ffffffff81506523>] driver_probe_device+0xa3/0x410                                                                                                                                  
  [ 1185.772014]  [<ffffffff8150696b>] __driver_attach+0x9b/0xa0                                                                                                                                       
  [ 1185.772016]  [<ffffffff815068d0>] ? __device_attach+0x40/0x40                                                                                                                                     
  [ 1185.772020]  [<ffffffff815042eb>] bus_for_each_dev+0x6b/0xb0                                                                                                                                      
  [ 1185.772022]  [<ffffffff81505f8e>] driver_attach+0x1e/0x20                                                                                                                                         
  [ 1185.772024]  [<ffffffff81505b60>] bus_add_driver+0x180/0x250                                                                                                                                      
  [ 1185.772027]  [<ffffffffc0344000>] ? 0xffffffffc0344000                                                                                                                                            
  [ 1185.772030]  [<ffffffff81507164>] driver_register+0x64/0xf0                                                                                                                                       
  [ 1185.772034]  [<ffffffff8140098c>] __pci_register_driver+0x4c/0x50                                                                                                                                 
  [ 1185.772042]  [<ffffffffc0344126>] mlx4_init+0x126/0x1000 [mlx4_core]                                                                                                                              
  [ 1185.772047]  [<ffffffff81002148>] do_one_initcall+0xd8/0x210                                                                                                                                      
  [ 1185.772053]  [<ffffffff811d5b49>] ? kmem_cache_alloc_trace+0x189/0x200                                                                                                                            
  [ 1185.772058]  [<ffffffff810f99c4>] ? load_module+0x15a4/0x1ce0                                                                                                                                     
  [ 1185.772061]  [<ffffffff810f99fe>] load_module+0x15de/0x1ce0                                                                                                                                       
  [ 1185.772063]  [<ffffffff810f51d0>] ? store_uevent+0x40/0x40                                                                                                                                        
  [ 1185.772067]  [<ffffffff810fa276>] SyS_finit_module+0x86/0xb0                                                                                                                                      
  [ 1185.772072]  [<ffffffff817c934d>] system_call_fastpath+0x16/0x1b                                                                                                                                  
  [ 1185.772074] ---[ end trace 9d9c0896e72e5312 ]---                                                                                                                                                  
  [ 1185.873139] mlx4_core 0000:21:00.0: command 0x31 timed out (go bit not cleared)                                                                                                                   
  [ 1185.873147] mlx4_core 0000:21:00.0: device is going to be reset                                                                                                                                   
  [ 1186.881239] mlx4_core 0000:21:00.0: device was reset successfully                                                                                                                                 
  [ 1186.888006] mlx4_core 0000:21:00.0: NOP command failed to generate interrupt (IRQ 53), aborting                                                                                                   
  [ 1186.897831] mlx4_core 0000:21:00.0: BIOS or ACPI interrupt routing problem?                                                                                                                       
  [ 1186.907762] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c                                                                                                             
  [ 1186.916462] IP: [<ffffffff81181185>] __free_pages+0x5/0x30                                                                                                                                        
  [ 1186.922560] PGD 0                                                                                                                                                                                 
  [ 1186.924814] Oops: 0002 [#1] SMP                                                                                                                                                                   
  [ 1186.928423] Modules linked in: mlx4_core(+) vxlan ip6_udp_tunnel udp_tunnel mst_pciconf(OE) mst_pci(OE) nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul dm_multipath glue_helper scsi_dh ablk_helper cryptd joydev lpc_ich serio_raw ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter ioatdma dca hpilo mac_hid wmi sb_edac edac_core shpchp nfsd auth_rpcgss                                                                                                                                                                         
  [ 1187.008078]  nfs_acl lockd grace sunrpc autofs4 hid_generic usbhid tg3 pata_acpi ptp hid psmouse hpsa pps_core [last unloaded: ib_addr]                                                           
  [ 1187.020643] CPU: 8 PID: 5947 Comm: modprobe Tainted: G        W  OE  3.19.0-16-generic #16-Ubuntu                                                                                                 
  [ 1187.030455] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 03/01/2013                                                                                                                           
  [ 1187.037778] task: ffff88079d6cb110 ti: ffff88086cdb0000 task.ti: ffff88086cdb0000                                                                                                                 
  [ 1187.046064] RIP: 0010:[<ffffffff81181185>]  [<ffffffff81181185>] __free_pages+0x5/0x30                                                                                                            
  [ 1187.054859] RSP: 0018:ffff88086cdb39a0  EFLAGS: 00010206                                                                                                                                          
  [ 1187.060730] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000                                                                                                                     
  [ 1187.068610] RDX: 00000000000ffff8 RSI: 0000000000000014 RDI: 0000000000000000                                                                                                                     
  [ 1187.076492] RBP: ffff88086cdb39e8 R08: 0000000000000040 R09: 0000000000000000                                                                                                                     
  [ 1187.084374] R10: 0000000000000040 R11: ffff88079bbf6000 R12: ffff8807b3e20000                                                                                                                     
  [ 1187.092253] R13: ffff88086921a420 R14: ffff88086921a400 R15: 0000000000000001                                                                                                                     
  [ 1187.100139] FS:  00007fadaa1b9700(0000) GS:ffff88087f840000(0000) knlGS:0000000000000000                                                                                                          
  [ 1187.109092] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                                                     
  [ 1187.115445] CR2: 000000000000001c CR3: 0000000823f6f000 CR4: 00000000000407e0                                                                                                                     
  [ 1187.123336] Stack:                                                                                                                                                                                
  [ 1187.125570]  ffffffffc08f9d9f 0000000000000099 ffff88086921a3e0 ffff88086cdb39e8
  [ 1187.133802]  0000000000000099 ffff8807b3e20000 ffff8807b3e23268 0000000000000099
  [ 1187.142030]  ffff8807b3e20000 ffff88086cdb3a18 ffffffffc08fab7c ffff8807b3e20000
  [ 1187.150264] Call Trace:
  [ 1187.153003]  [<ffffffffc08f9d9f>] ? mlx4_free_icm+0x17f/0x1d0 [mlx4_core]
  [ 1187.160526]  [<ffffffffc08fab7c>] mlx4_cleanup_icm_table+0x5c/0x80 [mlx4_core]
  [ 1187.168537]  [<ffffffffc08fb5bd>] mlx4_free_icms+0x1d/0x100 [mlx4_core]
  [ 1187.175849]  [<ffffffffc08fba8b>] mlx4_close_hca+0x4b/0x70 [mlx4_core]
  [ 1187.183072]  [<ffffffffc08ff943>] mlx4_load_one+0x5a3/0x1410 [mlx4_core]
  [ 1187.190480]  [<ffffffffc0900c9b>] mlx4_init_one+0x4eb/0x600 [mlx4_core]
  [ 1187.197786]  [<ffffffff81401155>] local_pci_probe+0x45/0xa0
  [ 1187.203944]  [<ffffffff81402345>] ? pci_match_device+0xe5/0x110
  [ 1187.210485]  [<ffffffff81402489>] pci_device_probe+0xd9/0x130
  [ 1187.216842]  [<ffffffff81506523>] driver_probe_device+0xa3/0x410
  [ 1187.223478]  [<ffffffff8150696b>] __driver_attach+0x9b/0xa0
  [ 1187.229643]  [<ffffffff815068d0>] ? __device_attach+0x40/0x40
  [ 1187.236002]  [<ffffffff815042eb>] bus_for_each_dev+0x6b/0xb0
  [ 1187.242256]  [<ffffffff81505f8e>] driver_attach+0x1e/0x20
  [ 1187.248222]  [<ffffffff81505b60>] bus_add_driver+0x180/0x250
  [ 1187.254479]  [<ffffffffc0344000>] ? 0xffffffffc0344000
  [ 1187.260158]  [<ffffffff81507164>] driver_register+0x64/0xf0
  [ 1187.266334]  [<ffffffff8140098c>] __pci_register_driver+0x4c/0x50
  [ 1187.273077]  [<ffffffffc0344126>] mlx4_init+0x126/0x1000 [mlx4_core]
  [ 1187.280112]  [<ffffffff81002148>] do_one_initcall+0xd8/0x210
  [ 1187.286383]  [<ffffffff811d5b49>] ? kmem_cache_alloc_trace+0x189/0x200
  [ 1187.293753]  [<ffffffff810f99c4>] ? load_module+0x15a4/0x1ce0
  [ 1187.300109]  [<ffffffff810f99fe>] load_module+0x15de/0x1ce0
  [ 1187.306271]  [<ffffffff810f51d0>] ? store_uevent+0x40/0x40
  [ 1187.312333]  [<ffffffff810fa276>] SyS_finit_module+0x86/0xb0
  [ 1187.318595]  [<ffffffff817c934d>] system_call_fastpath+0x16/0x1b
  [ 1187.325233] Code: 74 1c 48 8b 03 90 48 8b 7b 08 48 83 c3 10 44 89 ea 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 e8 eb a6 66 0f 1f 44 00 00 66 66 66 66 90 <f0> ff 4f 1c 74 05 c3 0f 1f 40 00 55 85 f6 48 89 e5 74 08 e8 d3
  [ 1187.346856] RIP  [<ffffffff81181185>] __free_pages+0x5/0x30
  [ 1187.353034]  RSP <ffff88086cdb39a0>
  [ 1187.356900] CR2: 000000000000001c
  [ 1187.361080] ---[ end trace 9d9c0896e72e5313 ]---

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1473883/+subscriptions
