
kernel-packages team mailing list archive

[Bug 1597867] [NEW] thunderx nics fail to establish link


Public bug reported:

[Impact]
When connected to certain switches, Cavium ThunderX nodes will occasionally fail to establish a link. On one setup (using a Cisco 10G switch), we're seeing a failure rate of about 20%.

Manually reloading the driver modules, or rebooting, is required to
recover.

A fix is now available in linux-next that greatly reduces, though does
not completely eliminate, the frequency of occurrences (2% vs. 20%, in
my case). Investigation continues to identify a resolution for the
remaining cases.
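The 20% and 2% failure rates quoted above also determine how long a reboot loop has to run before a clean result means anything. A minimal sketch, assuming each boot fails independently with a fixed probability p (an assumption for illustration, not something established in this report); the function names are hypothetical:

```python
import math


def expected_boots_to_failure(p: float) -> float:
    """Mean number of boots until the first link failure (geometric distribution)."""
    return 1.0 / p


def boots_for_confidence(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one failure in n boots) >= confidence.

    P(no failure in n boots) = (1 - p) ** n, so we need
    (1 - p) ** n <= 1 - confidence, i.e. n >= log(1 - confidence) / log(1 - p).
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Assumed per-boot failure rates: before the fix (~20%) and with it (~2%).
for rate in (0.20, 0.02):
    print(f"p={rate:.2f}: ~{expected_boots_to_failure(rate):.0f} boots on average, "
          f"{boots_for_confidence(rate)} boots for 95% confidence")
```

Under that assumption this reports about 5 and 50 boots on average to the first failure, and 14 and 149 boots respectively to see at least one failure with 95% confidence. So a loop of a few dozen clean boots shows the fix helps, but says little about whether the residual 2% of cases has been resolved.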

[Test Case]
Connect Cavium ThunderX systems to a switch known to trigger the failure, and put them in a reboot loop. In my test, I use the MAAS CLI to repeatedly release, acquire, and deploy systems, and wait for a node to enter the deployment failure state.
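The reboot-loop test above can be scripted around whatever actually deploys a node. The sketch below assumes a hypothetical `deploy_once` hook that releases, acquires, and deploys one node (for example by shelling out to the MAAS CLI) and returns `False` when the node ends up in the deployment failure state. No real MAAS commands are shown, since the exact invocations depend on the MAAS version in use.

```python
from typing import Callable, Optional


def run_until_failure(deploy_once: Callable[[int], bool],
                      max_iterations: int) -> Optional[int]:
    """Deploy repeatedly; return the 1-based iteration of the first failure, or None.

    deploy_once is a hypothetical hook: it should release, acquire and deploy
    the node, then return True if the node came up with a working link and
    False if it landed in the deployment failure state.
    """
    for iteration in range(1, max_iterations + 1):
        if not deploy_once(iteration):
            # Stop at the first failure so the node can be inspected.
            return iteration
    # No failure observed within the iteration budget.
    return None


# Usage with a stand-in hook that simulates a link failure on the 7th deploy.
first_failure = run_until_failure(lambda i: i != 7, max_iterations=100)
print(first_failure)
```

Stopping at the first failure matches the test described above. To compare failure rates with and without the fix, the same hook could instead be run a fixed number of times while counting failures.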

[Regression Risk]
The fix is upstream and confined to a driver used only on Cavium ThunderX systems.

** Affects: linux (Ubuntu)
     Importance: High
     Assignee: dann frazier (dannf)
         Status: In Progress

** Affects: linux (Ubuntu Xenial)
     Importance: High
     Assignee: dann frazier (dannf)
         Status: In Progress

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Xenial)
       Status: New => In Progress

** Changed in: linux (Ubuntu Xenial)
     Assignee: (unassigned) => dann frazier (dannf)

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1597867

Title:
  thunderx nics fail to establish link

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress
