kernel-packages team mailing list archive
Message #187040
[Bug 1597867] [NEW] thunderx nics fail to establish link
Public bug reported:
[Impact]
When connected to certain switches, Cavium ThunderX nodes will occasionally fail to establish a link. On one setup (using a Cisco 10G switch), we're seeing a failure rate of about 20%.
Recovering requires manually reloading the driver modules or rebooting.
A fix is now available in linux-next that greatly reduces, though does
not completely eliminate, the failure rate (from about 20% to about 2%
in my setup). Investigation continues to find a resolution for the
remaining cases.
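One rough way to read the 20% and 2% figures is to ask how many boots a reboot loop needs before a failure is likely to show up at all. The Python sketch below assumes each boot fails independently with a fixed probability p. This report does not establish that assumption. Under it, the chance of at least one failure in n boots is 1 - (1 - p)^n.

```python
import math


def prob_at_least_one_failure(p: float, n: int) -> float:
    """Chance that at least one of n boots fails to get link.

    Assumes (hypothetically) that every boot fails independently
    with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n


def boots_to_observe_failure(p: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one failure in n boots) >= confidence."""
    # Solve 1 - (1 - p)^n >= c for n:  n >= ln(1 - c) / ln(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    for label, p in (("before fix", 0.20), ("after fix", 0.02)):
        print(f"{label}: p={p:.0%}, boots for a 95% chance of "
              f"seeing a failure: {boots_to_observe_failure(p)}")
```

Under that model, about 14 boots give a 95% chance of catching a failure at a 20% rate, but about 149 are needed at 2%. A clean run of a few dozen reboots therefore says little about whether the remaining cases are gone.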
[Test Case]
Connect Cavium ThunderX systems to a known bad switch and put them in a reboot loop. In my test, I use the MAAS CLI to release/acquire/deploy systems, and wait for a node to enter the deployment failure state.
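The reboot loop in the test case can be sketched in Python as follows. The `deploy_once` callable is a hypothetical stand-in for one release/acquire/deploy cycle through the MAAS CLI. It returns False when the node ends up in the deployment failure state. No specific MAAS commands are assumed here. `link_is_up` is an optional on-node check that reads the standard Linux sysfs `carrier` attribute. The interface name passed to it is an assumption, since this report does not name one.

```python
from pathlib import Path
from typing import Callable, Optional


def link_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports carrier ("1") on iface.

    Reading 'carrier' on an interface that is administratively down
    raises OSError (EINVAL), and a missing interface raises
    FileNotFoundError; both are treated as "no link".
    """
    try:
        return (Path(sysfs_root) / iface / "carrier").read_text().strip() == "1"
    except OSError:
        return False


def cycles_until_failure(deploy_once: Callable[[], bool],
                         max_cycles: int) -> Optional[int]:
    """Run deploy_once() repeatedly, stopping at the first failure.

    deploy_once is a hypothetical hook: it should perform one full
    release/acquire/deploy cycle and return False if the node fails.
    Returns the 1-based cycle number of the first failure, or None if
    all max_cycles cycles succeed.
    """
    for cycle in range(1, max_cycles + 1):
        if not deploy_once():
            return cycle
    return None
```

For example, with `results = iter([True, True, False])`, `cycles_until_failure(lambda: next(results), 100)` returns 3.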
[Regression Risk]
The fix is upstream, and internal to a specific driver only used on Cavium ThunderX systems.
** Affects: linux (Ubuntu)
Importance: High
Assignee: dann frazier (dannf)
Status: In Progress
** Affects: linux (Ubuntu Xenial)
Importance: High
Assignee: dann frazier (dannf)
Status: In Progress
** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Xenial)
Status: New => In Progress
** Changed in: linux (Ubuntu Xenial)
Assignee: (unassigned) => dann frazier (dannf)
** Changed in: linux (Ubuntu Xenial)
Importance: Undecided => High
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1597867
Title:
thunderx nics fail to establish link
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Xenial:
In Progress