← Back to team overview

group.of.nepali.translators team mailing list archive

[Bug 1832082] Re: bnx2x driver causes 100% CPU load

 

** Changed in: linux (Ubuntu Cosmic)
       Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed

Bug description:
  [Impact]

  * The PTP feature in bnx2x driver is implemented in a way that if the
  NIC firmware takes some time to perform the timestamping - which is
  observed as a bad register read in bnx2x_ptp_task() - then the ptp
  worker function will reschedule itself indefinitely until the value
  read from the register is meaningful. With that behavior, if an
  userspace tool request a bad configured RX filter to bnx2x (or if NIC
  firmware has any other issue in timestamping), the function
  bnx2x_ptp_task() will be rescheduled forever and cause a unbound
  resource consumption. This manifests as a kworker thread consuming
  100% of CPU.

  
  * The dmesg log will show the following message regarding other packets being skipped on timestamp routine due to a packet getting stuck in the timestamping "pipeline":

  "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
  outstanding packet to timestamp, this packet will not be timestamped"

  Also, by using ftrace user can notice that function bnx2x_ptp_task()
  is being called a lot, and by enabling bnx2x PTP debugging log
  (ethtool -s <iface> msglvl 16777216) it's possible to observe the
  following message flooding the kernel log:

  "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp
  yet"

  
  * The  patch proposed in this SRU request is accepted upstream and is available currently (2019-07-03) in David Miller's linux-net tree:
  git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
  Besides fixing the issue, it also adds an ethtool statistics for accounting the ptp errors and reduces message flooding in case of errors.


  [Test case]

  Reproducing the problem is not difficult; we've used chrony in Bionic
  to trigger the problem. The steps are:

  a) Install chrony on Bionic in a system with working NIC managed by
  bnx2x;

  b) Edit chrony configuration and add: "hwtimestamp *" to the top of
  its conf file;

  c) Restart chrony service

  Check dmesg for the "[...]single outstanding packet" message and the
  overall CPU workload using a tool like "top" to observe a kthread
  consuming 100% of CPU.

  
  [Regression potential]

  The patch scope is restricted to bnx2x ptp handler, and was validated
  by the driver maintainer. If there's any possibility of regressions,
  we believe the worst would be an issue affecting the packet
  timestamping, not messing with the regular xmit path for the driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions