
kernel-packages team mailing list archive

[Bug 1564949] Re: Severe latency/skew on AMD Opteron processor


OK, so the local APIC timer is not working as I expected. Perhaps it's
not wired up to produce the timer interrupt as we expect.

I've disabled the Local APIC timer and lo and behold it works fine:

Testing for clock jitter on 8 cpus
PASSED: largest jitter seen was 0.001365

Testing clock direction for 5 minutes...
PASSED: Iteration 0 delta: 0.000447
PASSED: Iteration 1 delta: 0.001461
PASSED: Iteration 2 delta: 0.001509
PASSED: Iteration 3 delta: 0.001465
PASSED: Iteration 4 delta: 0.001416
clock direction test: sleeptime 60 sec per iteration, failed iterations: 0

So my hunch is that the firmware is not setting up the APIC IRQ routing
correctly, and hence we're not getting wakeups from the APICs, which
causes the poor scheduling wakeup responses when the CPUs go into an
idle state.  As it stands, I believe there is something wrong with the
way the timer interrupts are being configured, which is certainly BIOS
related.  I think we need an AMD APIC expert and/or somebody with the
IRQ firmware routing know-how to verify this.
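
One way to cross-check which timer hardware each CPU is actually relying
on is to read the clock event device names the kernel exposes in sysfs.
The sketch below is only an illustration and is not part of the testing
above: the helper names are made up, and it assumes the kernel provides
/sys/devices/system/clockevents/clockeventN/current_device, which may not
be the case on every configuration.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of the file at path into buf (newline stripped).
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *fp;

    if (len == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the clock event device name for CPUs 0..ncpus-1.
 * The sysfs path is an assumption about the running kernel; CPUs whose
 * entry is missing are skipped. Returns the number of CPUs reported. */
int print_clockevent_devices(int ncpus)
{
    char path[128];
    char name[64];
    int found = 0;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/clockevents/clockevent%d/current_device",
                 cpu);
        if (read_first_line(path, name, sizeof(name)) == 0) {
            printf("cpu%d: %s\n", cpu, name);
            found++;
        }
    }
    return found;
}
```

Calling print_clockevent_devices(8) from main() on the 8-CPU system would
list one device name per CPU (for example lapic or hpet), which gives a
quick way to see whether disabling the Local APIC timer took effect.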

BTW,  the kernel is reporting that 4 of the ACPI _PRS objects are not
correctly configured:

<4>[    3.114961] ACPI: Invalid _PRS IRQ 0
<6>[    3.115192] ACPI: PCI Interrupt Link [U1PI] (IRQs) *0
<4>[    3.115613] ACPI: Invalid _PRS IRQ 0
<6>[    3.115846] ACPI: PCI Interrupt Link [U2PI] (IRQs) *0
<4>[    3.116244] ACPI: Invalid _PRS IRQ 0
<6>[    3.116484] ACPI: PCI Interrupt Link [U3PI] (IRQs) *0
<4>[    3.116891] ACPI: Invalid _PRS IRQ 0
<6>[    3.117147] ACPI: PCI Interrupt Link [U4PI] (IRQs) *0
<6>[    3.117439] ACPI: PCI Interrupt Link [SATA] (IRQs *16)
<4>[    3.117745] ACPI: Invalid _PRS IRQ 0

so this looks iffy too.

You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.

  Severe latency/skew on AMD Opteron processor

Status in linux package in Ubuntu:
  In Progress

Bug description:
  Discovered this while doing pre-release certification testing for
  16.04 on an HP ProLiant DL385p Gen8 with an AMD Opteron 6320 8-core

  I have some code that essentially does the following.  (Note that I am
  NOT a C programmer: I know enough C to read it and do some minor
  things, and I once knew C++ fairly well, about 10 years ago.)

  gettimeofday(&tval_start, NULL);
  sleep(sleeptime);
  gettimeofday(&tval_stop, NULL);

  where tval_start and tval_stop are timeval structs and sleeptime is
  60 seconds.

  Once it gets the start and stop it finds the delta minus the sleep
  time.

  In a perfect world, for example, start time would be 123456.123 and
  end time would be 123516.123 and the delta between them minus the 60
  seconds of sleep would be 0.
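
The measurement described above can be sketched in C roughly as follows.
This is a minimal illustration, not the actual certification test: the
function names and the pass/fail threshold parameter are assumptions made
for the example, and only the gettimeofday()/sleep() pattern comes from
the description.

```c
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* Elapsed wall-clock time from start to stop, in seconds. */
double timeval_delta(const struct timeval *start, const struct timeval *stop)
{
    return (double)(stop->tv_sec - start->tv_sec) +
           (double)(stop->tv_usec - start->tv_usec) / 1e6;
}

/* Sleep for sleeptime seconds and return the measured interval minus the
 * requested sleep. On an ideal system this would be 0. */
double sleep_overshoot(unsigned int sleeptime)
{
    struct timeval tval_start, tval_stop;

    gettimeofday(&tval_start, NULL);
    sleep(sleeptime);
    gettimeofday(&tval_stop, NULL);

    return timeval_delta(&tval_start, &tval_stop) - (double)sleeptime;
}

/* Run the measurement several times and print one line per iteration.
 * threshold (in seconds) is a hypothetical pass/fail limit; the limit
 * used by the real test is not stated here. Returns the number of
 * failed iterations. */
int clock_direction_test(int iterations, unsigned int sleeptime,
                         double threshold)
{
    int failures = 0;

    for (int i = 0; i < iterations; i++) {
        double delta = sleep_overshoot(sleeptime);
        int ok = (delta >= -threshold && delta <= threshold);

        printf("%s: Iteration %d delta: %f\n",
               ok ? "PASSED" : "FAILED", i, delta);
        if (!ok)
            failures++;
    }
    return failures;
}
```

With these helpers, a run comparable to the ones shown here would call
clock_direction_test(5, 60, threshold) from main(), that is, five
iterations of 60 seconds each with whatever threshold is appropriate.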

  Of course, that isn't how it works in reality so the delta may be a
  few microseconds here and there depending on what else the kernel is
  doing at any given moment.  The following, however, are on essentially
  idle Xenial systems (only processes running are whatever Ubuntu Server
  runs by default, nothing really taxing going on).

  On my Skylake i7 with Xenial, the time differences are never more than
  a few 10,000ths of a second: (kernel 4.4.0-15.31)
  Testing clock direction for 5 minutes...
  PASSED: Iteration 0 delta: 0.000109
  PASSED: Iteration 1 delta: 0.000068
  PASSED: Iteration 2 delta: 0.000107
  PASSED: Iteration 3 delta: 0.000216
  PASSED: Iteration 4 delta: 0.000089

  On a zVM instance (kernel 4.4.0-16.32) it's even better:
  PASSED: Iteration 0 delta: 0.000058
  PASSED: Iteration 1 delta: 0.000058
  PASSED: Iteration 2 delta: 0.000074
  PASSED: Iteration 3 delta: 0.000052
  PASSED: Iteration 4 delta: 0.000062

  But on an AMD cpu with Xenial (the only AMD CPU I have access to), the
  difference is always in the 10ths of a second, sometimes even several
  seconds... in other words, I've seen up to a 7.9 second delta with
  this code.  Here's one run that shows 3 seconds in one iteration:
  (kernel 4.4.0-15.31)
  FAILED: Iteration 0 delta: 3.057980
  FAILED: Iteration 1 delta: 0.225712
  FAILED: Iteration 2 delta: 0.241468
  FAILED: Iteration 3 delta: 0.229084
  FAILED: Iteration 4 delta: 0.223933

  I ran a second run on the AMD cpu and the latency was all over the place:
  FAILED: Iteration 0 delta: 9.302149
  FAILED: Iteration 1 delta: 0.624466
  FAILED: Iteration 2 delta: 1.644834
  FAILED: Iteration 3 delta: 1.011474
  FAILED: Iteration 4 delta: 0.923033

  After a discussion with cking and apw, the conclusion is that
  deviations like those seen on the Intel and s390 CPUs are about what
  we should expect to see, depending on what the system is doing at the
  moment gettimeofday() is executed.
  However, on the AMD CPU, differences of up to 9 seconds or more are
  NOT expected and highly irregular.

  Colin said he tested this on an AMD C60 CPU and got numbers in line
  with the Skylake and s390 chips and could not reproduce the times I am
  seeing on the Opteron.

  $ cat /proc/version_signature 
  Ubuntu 4.4.0-15.31-generic 4.4.6
   total 0
   crw-rw---- 1 root audio 116,  1 Mar 31 17:21 seq
   crw-rw---- 1 root audio 116, 33 Mar 31 17:21 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: [Errno 2] No such file or directory
  DistroRelease: Ubuntu 16.04
  HibernationDevice: RESUME=UUID=b6a44b05-ebe0-4d1c-a525-69d4748960f8
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: HP ProLiant DL385p Gen8
  Package: linux (not installed)
   PATH=(custom, no user)
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-15-generic root=UUID=70808172-98ba-4129-a185-20a112bdc4fe ro rootdelay=60
  ProcVersionSignature: Ubuntu 4.4.0-15.31-generic 4.4.6
   linux-restricted-modules-4.4.0-15-generic N/A
   linux-backports-modules-4.4.0-15-generic  N/A
   linux-firmware                            1.157
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial uec-images
  Uname: Linux 4.4.0-15-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
  _MarkForUpload: True
  dmi.bios.date: 02/06/2014
  dmi.bios.vendor: HP
  dmi.bios.version: A28
  dmi.chassis.type: 23
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrA28:bd02/06/2014:svnHP:pnProLiantDL385pGen8:pvr:cvnHP:ct23:cvr:
  dmi.product.name: ProLiant DL385p Gen8
  dmi.sys.vendor: HP
