group.of.nepali.translators team mailing list archive

[Bug 1877858] Re: Improve TSC refinement (and calibration) reliability

 

** Changed in: linux (Ubuntu)
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1877858

Title:
  Improve TSC refinement (and calibration) reliability

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  [Impact]
  * We recently received a report of TSC refinement going missing across multiple reboots of a server with an Intel Skylake-based processor. This was only reproducible in Bionic pre-5.0.

  * After checking kernel commits, we came up with 2 commits that
  largely improve the situation: a786ef152cdc ("x86/tsc: Make
  calibration refinement more robust")
  [git.kernel.org/linus/a786ef152cdc] and 604dc9170f24 ("x86/tsc: Use
  CPUID.0x16 to calculate missing crystal frequency")
  [git.kernel.org/linus/604dc9170f24]. We hereby request SRU for both of
  them.

  * The first commit improves comments and adjusts an offset to suit more recent (faster) machines, but the important part is a retry mechanism in the TSC refinement, for cases where it fails due to a disturbance during the TSC read, such as an NMI or SMI.
   
  * The second commit improves TSC calibration for Skylake (and some other models) by deriving the missing crystal frequency from CPUID leaf 0x16 instead of relying on hardcoded, table-based values.

  * A note for Xenial (kernel 4.4): the second patch would require the
  inclusion of more commits, so given the "maturity" of this release
  (and the fact kernel 4.15 is an HWE for Xenial), I've kept it out of
  Xenial, backporting only the first and more important patch for 4.4.
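
  To illustrate the retry idea from the first commit, here is a minimal
  Python sketch. It is not the kernel code: the function name read_refs,
  the MAX_RETRIES budget and the threshold parameter are all assumptions
  made for illustration. The TSC is read before and after a reference
  timer read, and a sample is retried if the gap between the two TSC
  reads suggests an interruption (NMI/SMI).

```python
from typing import Callable, Optional, Tuple

MAX_RETRIES = 5  # hypothetical retry budget, not taken from the commit


def read_refs(
    read_tsc: Callable[[], int],
    read_ref: Callable[[], int],
    threshold: int,
    max_retries: int = MAX_RETRIES,
) -> Optional[Tuple[int, int]]:
    """Return (tsc, ref) from an undisturbed sample, or None if all attempts fail."""
    for _ in range(max_retries):
        t1 = read_tsc()
        ref = read_ref()
        t2 = read_tsc()
        # A large gap between the two TSC reads suggests the sample was
        # disturbed (for example by an NMI or SMI), so it is discarded.
        if t2 - t1 < threshold:
            return t2, ref
    # Every attempt was disturbed; the caller decides whether to try again later.
    return None
```

  A caller that gets None back can reschedule the whole refinement
  rather than silently skipping it, which is the behavior the missing
  refinements in the report point at.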

  [Test case]
  * Unfortunately there's no easy way to test the effectiveness of the commits, especially the refinement improvement.

  * The user who reported the missing refinements to us was able to test
  300 reboots with a regular Bionic kernel and reproduced the issue at
  least once, whereas with the Bionic kernel plus both commits proposed
  here, the problem didn't happen.

  * Regarding the calibration commit, it was well tested by the
  community using multiple machines, checking the TSC calibration
  reading against the tables available at instlatx64.atw.hu.
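
  For reference, the arithmetic behind the calibration commit can be
  sketched as follows. This is an illustrative Python sketch, not the
  kernel code: the function names are hypothetical, and the example
  values used below (3000 MHz base frequency, ratio 250/2) are made up
  for demonstration. It assumes CPUID.0x15 reports the TSC/crystal
  ratio (denominator in EAX, numerator in EBX) and CPUID.0x16 reports
  the processor base frequency in MHz in EAX.

```python
def crystal_khz_from_cpuid(base_mhz: int, denominator: int, numerator: int) -> int:
    """Estimate the crystal frequency (kHz) when CPUID.0x15 does not report it.

    base_mhz    -- processor base frequency from CPUID.0x16 EAX
    denominator -- CPUID.0x15 EAX (TSC/crystal ratio denominator)
    numerator   -- CPUID.0x15 EBX (TSC/crystal ratio numerator)
    """
    if denominator == 0 or numerator == 0:
        raise ValueError("CPUID.0x15 ratio not enumerated")
    return base_mhz * 1000 * denominator // numerator


def tsc_khz_from_crystal(crystal_khz: int, denominator: int, numerator: int) -> int:
    """Derive the TSC frequency (kHz) from the crystal frequency and the ratio."""
    if denominator == 0:
        raise ValueError("CPUID.0x15 ratio not enumerated")
    return crystal_khz * numerator // denominator


# Hypothetical example values: 3000 MHz base frequency, ratio 250/2.
crystal = crystal_khz_from_cpuid(3000, 2, 250)
tsc = tsc_khz_from_crystal(crystal, 2, 250)
```

  A result computed this way can then be compared against the
  per-model values published in such tables.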

  [Regression potential]
  * We consider the regression potential low, especially given the nature of the patches: the first is basically a retry mechanism (plus an adjustment to an offset to reflect more recent machines), and the 2nd is an improvement to TSC calibration on some platforms (whose values are currently hardcoded in a table in the kernel). Also, the patches have been upstream for a while and I couldn't find any fixes for them.

  * A hypothetical regression from the 2nd patch could be in the TSC
  precision calculation, which the refinement itself might well
  compensate for. For the first patch, a bug in the code is the only
  hypothetical regression I could think of.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1877858/+subscriptions