group.of.nepali.translators team mailing list archive

[Bug 1673303] [NEW] [Xenial] net: better skb->sender_cpu and skb->napi_id cohabitation

 

Public bug reported:

== Xenial SRU ==
We have now tried twice to roll out new firewalls, and both times we
had to revert when the new firewalls hung almost immediately after
cutover.

At first we thought it was a hardware issue, but after we reproduced it
on four different firewalls, we realised it was more likely to be a
problem with the Xenial kernel.

We think we're running into something similar to:

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1579943

And Joel thinks the following patch might fix it:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb

Unfortunately, even when we mimic live production traffic on the new
firewalls with port mirroring, we can reproduce the kernel hang only
about 20% of the time, and I'm keen to avoid any more failed migration
attempts (and the corresponding downtime for a large number of
services).

== Fix ==
See http://kernel.ubuntu.com/~ogasawara/lp1579943/

== Testing ==
We've just successfully migrated four firewalls that are running the
patched kernel. Previously, two of them would have survived for less
than 2 minutes; both have now been running in production for over an
hour.

I'll provide another update tomorrow. At this stage, however, I'd
suggest that it makes sense to get this into an SRU.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Affects: linux (Ubuntu Xenial)
     Importance: Medium
     Assignee: Leann Ogasawara (leannogasawara)
         Status: In Progress

** Affects: linux (Ubuntu Yakkety)
     Importance: Undecided
         Status: Fix Released

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Yakkety)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1673303

Title:
  [Xenial] net: better skb->sender_cpu and skb->napi_id cohabitation

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Yakkety:
  Fix Released

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1673303/+subscriptions

