group.of.nepali.translators team mailing list archive
Message #11708
[Bug 1672550] Re: i40e Intel X710 error during device probe prevents link set up and ip association
** Changed in: linux (Ubuntu)
Importance: Undecided => High
** Changed in: linux (Ubuntu)
Status: New => Triaged
** Changed in: linux (Ubuntu)
Assignee: Taco Screen team (taco-screen-team) => Canonical Kernel Team (canonical-kernel-team)
** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Xenial)
Importance: Undecided => High
** Changed in: linux (Ubuntu Xenial)
Status: New => Triaged
** Changed in: linux (Ubuntu Xenial)
Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1672550
Title:
i40e Intel X710 error during device probe prevents link set up and ip
association
Status in linux package in Ubuntu:
Triaged
Status in linux source package in Xenial:
Triaged
Bug description:
== Comment: #0 - Mauro Sergio Martins Rodrigues - 2017-02-22 06:48:42 ==
While investigating bug #145959 I got blocked in the reproduction process by the following issue during interface link bring-up:
[ 1.590591] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
[ 1.590661] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
[ 1.590669] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
which prevented me from bringing the interface up and assigning an IP
address to it.
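For triage, the failing function and the rejected TC map can be pulled out of dmesg output with a small script. This is a minimal sketch that assumes the message format shown above; the name find_tc_failures is illustrative and not part of any existing tool.

```python
import re

# Matches the i40e "failed to configure TCs" line as it appears in the log above.
TC_FAIL_RE = re.compile(
    r"i40e (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"failed to configure TCs for main VSI tc_map (?P<tc_map>0x[0-9a-fA-F]+), "
    r"err (?P<err>\S+) aq_err (?P<aq_err>\S+)"
)


def find_tc_failures(dmesg_text):
    """Return one dict per TC configuration failure found in dmesg_text."""
    failures = []
    for line in dmesg_text.splitlines():
        match = TC_FAIL_RE.search(line)
        if match:
            failures.append({
                "device": match.group("dev"),
                "tc_map": int(match.group("tc_map"), 16),
                "err": match.group("err"),
                "aq_err": match.group("aq_err"),
            })
    return failures


if __name__ == "__main__":
    sample = (
        "[ 1.590669] i40e 0045:01:00.0: failed to configure TCs for main VSI "
        "tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL"
    )
    for failure in find_tc_failures(sample):
        print(failure)
```

In practice the text would come from the output of dmesg rather than a hard-coded sample.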
== Comment: #2 - Mauro Sergio Martins Rodrigues - 2017-02-22 07:26:36 ==
Some missing information: the kernel is Ubuntu's 4.4.0-62-generic.
When testing with 4.8.0-36-generic (from xenial-proposed), the device
probe works fine and no similar message is seen.
To obtain more data on this, I added some debug statements to see which
TC map is applied in a healthy probe (note that the other functions,
such as function 1, work fine, but those functions have no cable attached).
root@yangtze-lp1:~/_maurosr/linux-4.4.0/drivers/net/ethernet/intel/i40e# dmesg
[52448.914605] i40e 0045:01:00.3: i40e_ptp_stop: removed PHC on enP69p1s0f3
[52448.981801] i40e 0045:01:00.2: i40e_ptp_stop: removed PHC on enP69p1s0f2
[52449.069793] i40e 0045:01:00.1: i40e_ptp_stop: removed PHC on enP69p1s0f1
[52449.173834] i40e 0045:01:00.0: i40e_ptp_stop: removed PHC on enP69p1s0f0
[52449.264462] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
[52449.264468] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[52449.264625] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[52449.286138] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[52449.505657] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[52449.508977] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
[52449.531210] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
[52449.531213] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
[52449.531217] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
[52449.544642] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
[52449.697424] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
[52449.727043] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 0 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
[52449.727098] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
[52449.748667] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[52449.976665] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
[52449.980685] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
[52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
[52450.015610] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
[52450.074479] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
[52450.080516] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
Comparing function 0:
[52449.529200] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
and function 1:
[52449.994982] i40e 0045:01:00.1: DEBUG DATA vsi > 398;enabled_tc > 1
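The enabled_tc value is a bitmap with one bit per traffic class (bit 0 is TC0), so the two values can be decoded as shown here. This is a minimal sketch that assumes a limit of 8 traffic classes; the helper name decode_tc_map is illustrative.

```python
MAX_TRAFFIC_CLASS = 8  # assumption: the device exposes 8 traffic classes


def decode_tc_map(tc_map):
    """Return the list of traffic class indices whose bit is set in tc_map."""
    return [tc for tc in range(MAX_TRAFFIC_CLASS) if tc_map & (1 << tc)]


# The two values seen in the DEBUG DATA lines for function 0 and function 1.
for value in (255, 1):
    print(f"enabled_tc {value} = {value:#04x} -> TCs {decode_tc_map(value)}")
```

Read this way, function 0 requested all eight traffic classes, while function 1 requested TC0 only.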
Then looking at 4.8:
[ 123.425399] i40e: loading out-of-tree module taints kernel.
[ 123.428958] i40e: module verification failed: signature and/or required key missing - tainting kernel
[ 123.430690] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.6.11-k
[ 123.430691] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[ 123.430918] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[ 123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[ 123.664088] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[ 123.667878] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[ 123.681915] Non-contiguous TC - Disabling DCB
[ 123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1
[ 123.713262] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
[ 123.864601] i40e 0045:01:00.0: Added LAN device PF0 bus=0x00 func=0x00
[ 123.864611] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
[ 123.893254] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
[ 123.893321] i40e 0045:01:00.1: Using 64-bit DMA iommu bypass
[ 123.914829] i40e 0045:01:00.1: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[ 124.152980] i40e 0045:01:00.1: MAC address: 68:05:ca:2d:e9:09
[ 124.156999] i40e 0045:01:00.1: SAN MAC: 68:05:ca:2d:e9:0d
[ 124.171266] i40e 0045:01:00.1: DEBUG DATA vsi > 398, enabled_tc 1
[ 124.196080] i40e 0045:01:00.1 enP69p1s0f1: renamed from eth0
[ 124.253353] i40e 0045:01:00.1: Added LAN device PF1 bus=0x00 func=0x01
[ 124.253387] i40e 0045:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
[ 124.263908] i40e 0045:01:00.1: Features: PF-id[1] VFs: 32 VSIs: 34 QP: 128 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
These 2 lines are important here:
[ 123.681915] Non-contiguous TC - Disabling DCB
[ 123.690177] i40e 0045:01:00.0: DEBUG DATA vsi > 399, enabled_tc 1
First, the driver decided to disable the DCB feature because the traffic
classes were not contiguous, and then it used a TC map (enabled_tc in the
device driver code) of 1, the same value we already knew works. With that
information in hand, I forced enabled_tc (the TC map) to 1 in the 4.4 code
and it worked, so I suspect a bad TC mask caused by DCB being enabled.
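The "Non-contiguous TC" message suggests that the newer driver checks whether the traffic classes in use form an unbroken run starting at TC0, and falls back to TC0 alone otherwise. The sketch below only illustrates that idea; it is not the driver's code, the function names are made up for this example, and the priority-to-TC tables used with it are hypothetical.

```python
MAX_TRAFFIC_CLASS = 8  # assumption: 8 traffic classes


def tc_bitmap(priority_table):
    """Build a bitmap of the TCs referenced by a priority-to-TC table."""
    bitmap = 0
    for tc in priority_table:
        bitmap |= 1 << tc
    return bitmap


def contiguous_tc_count(bitmap):
    """Return the number of TCs if they run contiguously from TC0, else None."""
    count = 0
    gap_seen = False
    for tc in range(MAX_TRAFFIC_CLASS):
        if bitmap & (1 << tc):
            if gap_seen:
                return None  # a TC is enabled after an unused one
            count += 1
        else:
            gap_seen = True
    return count


def choose_enabled_tc(priority_table):
    """Pick a TC map: all contiguous TCs, or TC0 only when there is a gap."""
    count = contiguous_tc_count(tc_bitmap(priority_table))
    if not count:  # None (gap found) or 0 (nothing mapped)
        return 1   # fall back to TC0 only
    return (1 << count) - 1
```

With a hypothetical table such as [0, 0, 0, 0, 1, 1, 2, 2], choose_enabled_tc returns 7 (TC0 to TC2). With [0, 0, 0, 0, 0, 0, 0, 7] it returns 1, because TC7 is used while TC1 to TC6 are not.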
== Comment: #3 - Mauro Sergio Martins Rodrigues - 2017-02-23 11:24:41 ==
I tried the 4.4 version of i40e, but with dcbx disabled on the switch port. Traffic class setup and function bring-up worked fine! It used a TC map (or traffic class mask) of 1. I do understand that this is just a workaround, though; the device driver should handle the case where the switch has this feature enabled instead of leaving the device 'broken':
[ 199.762738] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[ 199.786589] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[ 200.045270] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[ 200.048955] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[ 200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
[ 200.069232] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 1
[ 200.088056] i40e 0045:01:00.0 enP69p1s0f0: renamed from eth0
[ 200.240641] i40e 0045:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
[ 200.270717] i40e 0045:01:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF RSS FD_ATR DCB VxLAN Geneve PTP VEPA
The line
[ 200.069228] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
corresponds to the piece of code where the traffic class is defined (see: http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/i40e/i40e_main.c?v=4.4#L4563)
Another interesting discovery is that the device behaves well when we
turn dcbx on in the switch after it's already probed:
[ 609.566786] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
[ 609.566794] i40e 0045:01:00.0: DEBUG DATA >> dcb not enabled - first if
[ 611.574987] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
[ 611.574990] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
[ 611.574994] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 31
This transition sets the traffic class mask to 31 instead of 255. If we
then unload and reload the module, it returns to the original bad state
we experienced in this bug:
[ 746.151068] i40e 0045:01:00.0: Using 64-bit DMA iommu bypass
[ 746.174695] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
[ 746.433649] i40e 0045:01:00.0: MAC address: 68:05:ca:2d:e9:08
[ 746.437552] i40e 0045:01:00.0: SAN MAC: 68:05:ca:2d:e9:0c
[ 746.457815] i40e 0045:01:00.0: DEBUG DATA >> SFP - second if
[ 746.457819] i40e 0045:01:00.0: DEBUG DATA vsi > 399;enabled_tc > 255
[ 746.459537] i40e 0045:01:00.0: AQ command Config VSI BW allocation per TC failed = 14
[ 746.459541] i40e 0045:01:00.0: Failed configuring TC map 255 for VSI 399
[ 746.459550] i40e 0045:01:00.0: failed to configure TCs for main VSI tc_map 0x000000ff, err I40E_ERR_INVALID_QP_ID aq_err I40E_AQ_RC_EINVAL
== Comment: #4 - Mauro Sergio Martins Rodrigues - 2017-02-23 14:25:30 ==
Things go smoothly in kernel 4.8 even if dcbx is enabled on the port, thanks to this commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c which disables dcbx when the TCs are not contiguous (a configuration the device does not support).
We should ask for a backport into 4.4.0 but I'm still investigating to
see if something else should be included since in comment #3 we can
see it transitioning into a valid state when dcbx is enabled in the
switch.
== Comment: #5 - Mauro Sergio Martins Rodrigues - 2017-03-13 13:41:19 ==
Even though it was already clear that this was related to kernel code, since it works on 4.8 and not on 4.4, I decided to perform an NVM update. It did not change the scenario.
Comment #2 shows the NVM version as:
> [ 123.450445] i40e 0045:01:00.0: fw 5.0.40043 api 1.5 nvm 5.02 0x80002284 0.0.0
Current version is:
firmware-version: 5.05 0x8000289d 1.1568.0
and the issue is still reproducible.
As stated in comment #4, now I can confirm we need to backport
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fbfe12c
to 4.4 to avoid getting into the broken state when probing the Intel X710
(driver i40e).