
group.of.nepali.translators team mailing list archive

[Bug 1672550] Re: i40e Intel X710 error during device probe prevents link set up and ip association

 

** Changed in: linux (Ubuntu)
       Status: Triaged => Fix Released

** Changed in: linux (Ubuntu Xenial)
       Status: Triaged => In Progress

** Changed in: linux (Ubuntu Xenial)
     Assignee: Canonical Kernel Team (canonical-kernel-team) => Seth Forshee (sforshee)

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1672550

Title:
  i40e Intel X710 error during device probe prevents link set up and ip
  association

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  In Progress

Bug description:
  == Comment: #0 - Mauro Sergio Martins Rodrigues - 2017-02-22 06:48:42 ==
  While investigating bug #145959 I got blocked in the reproduction process due to the following issue during interface link bring-up:

  [    1.590591] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
  [    1.590661] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
  [    1.590669] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL

  which prevented me from bringing the interface up and associating an
  IP address with it.
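
  For anyone checking whether a machine hits the same failure, a minimal
  sketch of scanning dmesg output for the signature above. The regular
  expression is built from the single log line quoted in this report, so
  the exact field layout is an assumption for other driver versions, and
  find_tc_failures is a hypothetical helper name.

```python
import re

# Pattern derived from the failure line quoted in this bug report.
# Assumption: other i40e versions print the same field layout.
TC_FAIL_RE = re.compile(
    r"i40e (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"failed to configure TCs for main VSI tc_map (?P<tc_map>0x[0-9a-f]+), "
    r"err (?P<err>\S+) aq_err (?P<aq_err>\S+)"
)


def find_tc_failures(dmesg_text):
    """Return (pci_address, tc_map, err, aq_err) for each matching line."""
    failures = []
    for line in dmesg_text.splitlines():
        m = TC_FAIL_RE.search(line)
        if m:
            failures.append(
                (m["pci"], int(m["tc_map"], 16), m["err"], m["aq_err"])
            )
    return failures


sample = (
    "[    1.590669] i40e 0045:01:00.0: failed to configure TCs for main VSI "
    "tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL"
)
print(find_tc_failures(sample))
```

  In practice the text would come from the output of dmesg rather than
  from a literal string.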

  == Comment: #2 - Mauro Sergio Martins Rodrigues - 2017-02-22 07:26:36 ==
  Some missing information: the kernel is Ubuntu's 4.4.0-62-generic.

  When testing with 4.8.0-36-generic (from xenial's proposed) device
  probe works fine, no similar message is seen.

  To obtain more data on this I added some debug statements to see which
  TC MAP was applied in a healthy probe (note that the other functions,
  such as function 1, work fine, but those functions have no cable
  attached).

  root@yangtze-lp1:~/_maurosr/linux-4.4.0/drivers/net/ethernet/intel/i40e# dmesg 
  [52448.914605] i40e 0045:01:00.3: i40e_ptp_stop: removed PHC on enP69p1s0f3
  [52448.981801] i40e 0045:01:00.2: i40e_ptp_stop: removed PHC on enP69p1s0f2
  [52449.069793] i40e 0045:01:00.1: i40e_ptp_stop: removed PHC on enP69p1s0f1
  [52449.173834] i40e 0045:01:00.0: i40e_ptp_stop: removed PHC on enP69p1s0f0
  [52449.264462] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
  [52449.264468] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
  [52449.264625] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
  [52449.286138] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
  [52449.505657] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
  [52449.508977] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
  [52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
  [52449.531210] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
  [52449.531213] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
  [52449.531217] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
  [52449.544642] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
  [52449.697424] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
  [52449.727043] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 0 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
  [52449.727098] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
  [52449.748667] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
  [52449.976665] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
  [52449.980685] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
  [52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
  [52450.015610] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
  [52450.074479] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
  [52450.080516] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA

  Comparing function 0:
  [52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
  and function 1:
  [52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
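
  The enabled_tc values are easier to compare when read as bitmasks. A
  small sketch, assuming the map is 8 bits wide with bit n standing for
  traffic class n (consistent with the tc_map 0x000000ff printed in the
  failure message); enabled_tcs is a hypothetical helper, not a driver
  function.

```python
def enabled_tcs(tc_map):
    """List the traffic class indices whose bit is set in an 8-bit TC map."""
    return [tc for tc in range(8) if tc_map & (1 << tc)]


# The two values seen in the probe logs: function 0 and function 1.
for value in (255, 1):
    print(f"{value:3d} = 0x{value:02x} -> TCs {enabled_tcs(value)}")
```

  Read this way, function 0 asks for all eight traffic classes while
  function 1 asks for TC0 only.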

  
  Then looking at 4.8:
  [  123.425399] i40e: loading out-of-tree module taints kernel.
  [  123.428958] i40e: module verification failed: signature and/or required key missing - tainting kernel
  [  123.430690] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.6.11-k
  [  123.430691] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
  [  123.430918] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
  [  123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
  [  123.664088] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
  [  123.667878] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
  [  123.681915] Non-contiguous TC - Disabling DCB
  [  123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1
  [  123.713262] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
  [  123.864601] i40e 0045:01:00.0: Added LAN device PF0 bus=0x00 func=0x00
  [  123.864611] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
  [  123.893254] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
  [  123.893321] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
  [  123.914829] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
  [  124.152980] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
  [  124.156999] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
  [  124.171266] i40e 0045:01:00.1: DEBUG DATA vsi > 398, enabled_tc 1
  [  124.196080] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
  [  124.253353] i40e 0045:01:00.1: Added LAN device PF1 bus=0x00 func=0x01
  [  124.253387] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
  [  124.263908] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA

  
  These 2 lines are important here:
  [  123.681915] Non-contiguous TC - Disabling DCB
  [  123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1

  First it decided to disable the DCB feature due to the lack of
  contiguous traffic classes, and then it used a TC MAP (enabled_tc in
  the device driver code) of 1, the same value we already knew works.
  With that information in hand I forced enabled_tc (TC MAP) to 1 in
  4.4's code and it worked, so I suspect a bad TC mask due to DCB being
  enabled.

  == Comment: #3 - Mauro Sergio Martins Rodrigues - 2017-02-23 11:24:41 ==
  I tried 4.4's version of i40e but with DCBX disabled on the switch port, and traffic class setup and function bring-up worked fine! It used a TC MAP (or traffic class mask) of 1. I do understand that this is just a workaround though; the device driver should deal with the case where the switch has this feature enabled instead of leaving the device 'broken':

  [  199.762738] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
  [  199.786589] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
  [  200.045270] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
  [  200.048955] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
  [  200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
  [  200.069232] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 1
  [  200.088056] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
  [  200.240641] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
  [  200.270717] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA

  The line
  [  200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
  corresponds to the piece of code where the traffic class is defined (see: http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/i40e/i40e_main.c?v=4.4#L4563)

  Another interesting discovery is that the device behaves well when we
  turn dcbx on in the switch after it's already probed:

  [  609.566786] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
  [  609.566794] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
  [  611.574987] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
  [  611.574990] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
  [  611.574994] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 31

  That transition sets the traffic class mask to 31 instead of 255. If
  we then unload and reload the module, it goes back to the original bad
  state we experienced in this bug:

  [  746.151068] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
  [  746.174695] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
  [  746.433649] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
  [  746.437552] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
  [  746.457815] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
  [  746.457819] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
  [  746.459537] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
  [  746.459541] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
  [  746.459550] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL

  == Comment: #4 - Mauro Sergio Martins Rodrigues - 2017-02-23 14:25:30 ==
  Things are going smoothly in kernel 4.8 even if DCBX is enabled on the port, due to this commit https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c which disables DCB when the TCs are not contiguous (that is not supported by the device).

  We should ask for a backport into 4.4.0 but I'm still investigating to
  see if something else should be included since in comment #3 we can
  see it transitioning into a valid state when dcbx is enabled in the
  switch.
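
  To illustrate what "not contiguous" means here, a minimal sketch of
  the idea only. It is not the code from commit fbfe12c: the function
  names and the priority-to-TC tables are made up for illustration, and
  the fallback to a TC map of 1 mirrors what the 4.8 log shows after
  "Non-contiguous TC - Disabling DCB".

```python
def used_tc_bitmap(priority_to_tc):
    """Build a bitmap with bit n set for every traffic class n in use."""
    bitmap = 0
    for tc in priority_to_tc:
        bitmap |= 1 << tc
    return bitmap


def is_contiguous_from_tc0(bitmap):
    """True when the set bits form an unbroken run starting at bit 0."""
    # A run of n ones starting at bit 0 equals 2**n - 1, so adding 1
    # clears every set bit and the AND below is zero.
    return bitmap != 0 and (bitmap & (bitmap + 1)) == 0


def choose_tc_map(priority_to_tc):
    """Use the requested TCs if contiguous, else fall back to TC0 only."""
    bitmap = used_tc_bitmap(priority_to_tc)
    if not is_contiguous_from_tc0(bitmap):
        return 1  # hypothetical fallback: TC0 only, i.e. DCB disabled
    return bitmap


# Hypothetical tables: index is the 802.1p priority, value is the TC.
print(choose_tc_map([0, 0, 0, 1, 1, 2, 2, 2]))  # TCs 0-2 in use
print(choose_tc_map([0, 0, 0, 7, 0, 0, 0, 0]))  # TCs 0 and 7 only
```

  The first table uses TCs 0 through 2 with no gap and is kept as is;
  the second skips TCs 1 through 6 and falls back to a map of 1.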

  == Comment: #5 - Mauro Sergio Martins Rodrigues - 2017-03-13 13:41:19 ==
  Even though it was already clear that this is related to kernel code (it works on 4.8 and not on 4.4), I decided to perform an NVM update, and it did not change the scenario.

  Comment #2 shows the NVM version as:
  > [  123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0

  Current version is:
  firmware-version: 5.05 0x8000289d 1.1568.0

  and the issue is still reproducible.

  As stated in comment #4, now I can confirm we need to backport
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c
  to 4.4 to avoid getting into the broken state when probing Intel x710
  (driver i40e).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672550/+subscriptions