
kernel-packages team mailing list archive

[Bug 1470404] Re: Some workloads experience more measurement variation with scaling_governor=performance than ondemand

 

This bug was fixed in the package linux - 3.19.0-23.24

---------------
linux (3.19.0-23.24) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1472346

  [ Chris J Arges ]

  * SAUCE: Don't use atomic read in evlist.c
    - LP: #1410673

linux (3.19.0-23.23) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1472048

  [ Chris J Arges ]

  * [Config] Add CRYPTO_DEV_NX_*, 842_* as modules
    - LP: #1454687

  [ Lu, Han ]

  * SAUCE: i915_bpo: drm/i915/audio: add codec wakeup override
    enabled/disable callback
    - LP: #1460674

  [ Timo Aaltonen ]

  * SAUCE: Backport I915_OVERLAY_DISABLE_DEST_COLORKEY
    - LP: #1460674
  * SAUCE: i915_bpo: Rebase to drm-intel-next-fixes-2015-05-29
    - LP: #1460674
  * SAUCE: i915_bpo: Revert "drm/i915: Implement the intel_dp_autotest_edid
    function for DP EDID complaince tests"
    - LP: #1460674
  * SAUCE: i915_bpo: Revert "drm/i915: Add debugfs test control files for
    Displayport compliance testing"
    - LP: #1460674
  * SAUCE: Load i915_bpo from the hda driver on SKL/CHV
    - LP: #1460674
  * SAUCE: i915_bpo: Don't try to support BXT
    - LP: #1460674
  * SAUCE: i915_bpo: drm/i915/skl: Fix DMC API version.

  [ Upstream Kernel Changes ]

  * Revert "usb: dwc2: add bus suspend/resume for dwc2"
    - LP: #1471252
  * Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820"
    - LP: #1471252
  * Revert "KVM: x86: drop fpu_activate hook"
    - LP: #1471252
  * Revert "libceph: clear r_req_lru_item in __unregister_linger_request()"
    - LP: #1471252
  * drm/i915: add component support
    - LP: #1460661
  * ALSA: hda: export struct hda_intel
    - LP: #1460661
  * ALSA: hda: pass intel_hda to all i915 interface functions
    - LP: #1460661
  * ALSA: hda: add component support
    - LP: #1460661
  * drm/atomic-helpers: Fix documentation typos and wrong copy&paste
    - LP: #1460674
  * drm/atomic: Rename drm_atomic_helper_commit_pre_planes() state argument
    - LP: #1460674
  * drm/atomic-helper: Rename commmit_post/pre_planes
    - LP: #1460674
  * drm/atomic-helpers: make mode_set hooks optional
    - LP: #1460674
  * drm/atomic-helper: Fix kerneldoc for prepare_planes
    - LP: #1460674
  * drm: Complete moving rotation property to core
    - LP: #1460674
  * drm: Share plane pixel format check code between legacy and atomic
    - LP: #1460674
  * drm/atomic: Constify a bunch of functions pointer structs
    - LP: #1460674
  * drm: Fix some typo mistake of the annotations
    - LP: #1460674
  * drm: change connector to tmp_connector
    - LP: #1460674
  * drm: atomic: Expose CRTC active property
    - LP: #1460674
  * drm: atomic: Allow setting CRTC active property
    - LP: #1460674
  * drm/atomic-helpers: Properly avoid full modeset dance
    - LP: #1460674
  * drm/atomic: Add helpers for state-subclassing drivers
    - LP: #1460674
  * drm: Fix some typos
    - LP: #1460674
  * drm/atomic: Add for_each_{connector,crtc,plane}_in_state helper macros
    - LP: #1460674
  * drm/atomic-helper: Don't call atomic_update_plane when it stays off
    - LP: #1460674
  * drm/atomic-helper: Really recover pre-atomic plane/cursor behavior
    - LP: #1460674
  * drm/atomic: Make mode_fixup() optional for check_modeset()
    - LP: #1460674
  * drm/atomic-helpers: Update vblank timestamping constants
    - LP: #1460674
  * drm/atomic-helpers: Export
    drm_atomic_helper_update_legacy_modeset_state
    - LP: #1460674
  * drm/atomic: add drm_atomic_get_existing_*_state helpers
    - LP: #1460674
  * drm/atomic: remove duplicated assignment of old_plane_state
    - LP: #1460674
  * drm/atomic: Allow drivers to subclass drm_atomic_state, v3
    - LP: #1460674
  * drm/dp: indentation and ordering cleanups
    - LP: #1460674
  * drm/dp: add DPCD definitions from eDP 1.2
    - LP: #1460674
  * drm/dp: add DPCD definitions from DP 1.1 and 1.2a
    - LP: #1460674
  * drm/dp: add DPCD definitions from eDP 1.4
    - LP: #1460674
  * drm: Adding drm helper function drm_plane_from_index().
    - LP: #1460674
  * ALSA: hda - reset display codec when power on
    - LP: #1460674
  * drm/i915/audio: add codec wakeup override enabled/disable callback
    - LP: #1460674
  * scsi: storvsc: Increase the ring buffer size
    - LP: #1445195
  * scsi: storvsc: Size the queue depth based on the ringbuffer size
    - LP: #1445195
  * scsi: storvsc: Always send on the selected outgoing channel
    - LP: #1445195
  * scsi: storvsc: Retrieve information about the capability of the target
    - LP: #1445195
  * scsi: storvsc: Don't assume that the scatterlist is not chained
    - LP: #1445195
  * scsi: storvsc: Set the tablesize based on the information given by the
    host
    - LP: #1445195
  * Drivers: hv: vmbus: Add support for VMBus panic notifier handler
    - LP: #1463584
  * Drivers: hv: vmbus: Correcting truncation error for constant
    HV_CRASH_CTL_CRASH_NOTIFY
    - LP: #1463584
  * net: eth: xgene: change APM X-Gene SoC platform ethernet to support
    ACPI
    - LP: #1458042
  * net: eth: xgene: devm_ioremap() returns NULL on error
    - LP: #1458042
  * drivers: net: xgene: Make xgene_enet_of_match depend on CONFIG_OF
    - LP: #1458042
  * Documentation: dts: Update compatible field description for APM X-Gene
    - LP: #1458042
  * dtb: change binding name to match with newer firmware DT
    - LP: #1458042
  * drivers: net: xgene: fix new firmware backward compatibility with older
    driver
    - LP: #1458042
  * net: eth: xgene: fix booting with devicetree
    - LP: #1458042
  * drivers: net: xgene: constify of_device_id array
    - LP: #1458042
  * Documentation: dtb: Add port-id field for APM X-Gene ethernet
    - LP: #1458042
  * dtb: xgene: Add second SGMII based 1G interface node
    - LP: #1458042
  * drivers: net: xgene: Add second SGMII based 1G interface
    - LP: #1458042
  * x86/fpu: Disable XSAVES* support for now
    - LP: #1468797
  * powerpc: export of_get_ibm_chip_id function
    - LP: #1454687
  * powerpc: Add ICSWX instruction
    - LP: #1454687
  * lib: add software 842 compression/decompression
    - LP: #1454687
  * crypto: 842 - change 842 alg to use software
    - LP: #1454687
  * crypto: nx - rename nx-842.c to nx-842-pseries.c
    - LP: #1454687
  * crypto: nx - add NX-842 platform frontend driver
    - LP: #1454687
  * crypto: nx - add nx842 constraints
    - LP: #1454687
  * crypto: nx - add PowerNV platform NX-842 driver
    - LP: #1454687
  * crypto: nx - simplify pSeries nx842 driver
    - LP: #1454687
  * crypto: nx - add hardware 842 crypto comp alg
    - LP: #1454687
  * lib: make lib/842 decompress functions static
    - LP: #1454687
  * lib: correct 842 decompress for 32 bit
    - LP: #1454687
  * crypto: nx - remove 842-nx null checks
    - LP: #1454687
  * crypto: nx - prevent nx 842 load if no hw driver
    - LP: #1454687
  * crypto: nx - fix nx-842 pSeries driver minimum buffer size
    - LP: #1454687
  * crypto: nx - move include/linux/nx842.h into drivers/crypto/nx/nx-842.h
    - LP: #1454687
  * crypto: nx - replace NX842_MEM_COMPRESS with function
    - LP: #1454687
  * crypto: nx - add LE support to pSeries platform driver
    - LP: #1454687
  * net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
    - LP: #1432848
  * mm/slab_common: support the slub_debug boot option on specific object
    size
    - LP: #1456952
  * perf trace: Fix race condition at the end of started workloads
    - LP: #1410673
  * kvm: x86: fix kvm_apic_has_events to check for NULL pointer
  * cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
    - LP: #1470404
  * IB/ipoib: change init sequence ordering
    - LP: #1467912
  * IB/ipoib: factor out ah flushing
    - LP: #1467912
  * HID: add ALWAYS_POLL quirk for a Logitech 0xc007
    - LP: #1471252
  * HID: add HP OEM mouse to quirk ALWAYS_POLL
    - LP: #1471252
  * HID: add quirk for PIXART OEM mouse used by HP
    - LP: #1471252
  * usb: dwc2: hcd: use new USB_RESUME_TIMEOUT
    - LP: #1471252
  * usb: isp1760: hcd: use new USB_RESUME_TIMEOUT
    - LP: #1471252
  * nfsd: fix nsfd startup race triggering BUG_ON
    - LP: #1471252
  * jhash: Update jhash_[321]words functions to use correct initval
    - LP: #1471252
  * firmware/ihex2fw.c: restore missing default in switch statement
    - LP: #1471252
  * bridge/mdb: remove wrong use of NLM_F_MULTI
    - LP: #1471252
  * iio/axp288_adc: add missing channel info mask
    - LP: #1471252
  * iio: light: hid-sensor-prox: Fix modifier
    - LP: #1471252
  * iio: pressure: hid-sensor-press: Fix modifier
    - LP: #1471252
  * iio: adc: xilinx: Fix register addresses
    - LP: #1471252
  * iio: adc: xilinx: Fix "vccaux" channel .address
    - LP: #1471252
  * iio: adc: xilinx: Fix VREFP scale
    - LP: #1471252
  * iio: adc: xilinx: Fix VREFN sign
    - LP: #1471252
  * libata: Add helper to determine when PHY events should be ignored
    - LP: #1471252
  * libata: Ignore spurious PHY event on LPM policy change
    - LP: #1471252
  * iio:st_sensors: Fix oops when probing SPI devices
    - LP: #1471252
  * usb: gadget: configfs: Fix interfaces array NULL-termination
    - LP: #1471252
  * rtlwifi: rtl8192cu: Fix kernel deadlock
    - LP: #1471252
  * USB: cp210x: add ID for KCF Technologies PRN device
    - LP: #1471252
  * USB: pl2303: Remove support for Samsung I330
    - LP: #1471252
  * USB: visor: Match I330 phone more precisely
    - LP: #1471252
  * net: can: xilinx_can: fix extended frame handling
    - LP: #1471252
  * nfsd: fix the check for confirmed openowner in
    nfs4_preprocess_stateid_op
    - LP: #1471252
  * svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures
    - LP: #1471252
  * ACPI / init: Fix the ordering of acpi_reserve_resources()
    - LP: #1471252
  * md/raid5: don't record new size if resize_stripes fails.
    - LP: #1471252
  * sched: Handle priority boosted tasks proper in setscheduler()
    - LP: #1471252
  * staging: vt6656: use ieee80211_tx_info to select packet type.
    - LP: #1471252
  * staging: vt6655: device_free_tx_buf use only
    ieee80211_tx_status_irqsafe
    - LP: #1471252
  * staging: vt6655: Fix 80211 control and management status reporting.
    - LP: #1471252
  * staging: vt6655: lock MACvWriteBSSIDAddress.
    - LP: #1471252
  * arm64: bpf: fix signedness bug in loading 64-bit immediate
    - LP: #1471252
  * xhci: fix isoc endpoint dequeue from advancing too far on transaction
    error
    - LP: #1471252
  * xhci: Solve full event ring by increasing TRBS_PER_SEGMENT to 256
    - LP: #1471252
  * xhci: gracefully handle xhci_irq dead device
    - LP: #1471252
  * ARC: unbork !LLSC build
    - LP: #1471252
  * staging: gdm724x: Correction of variable usage after applying ALIGN()
    - LP: #1471252
  * usb-storage: Add NO_WP_DETECT quirk for Lacie 059f:0651 devices
    - LP: #1471252
  * tty/n_gsm.c: fix a memory leak when gsmtty is removed
    - LP: #1471252
  * pty: Fix input race when closing
    - LP: #1429756, #1471252
  * ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction.
    - LP: #1471252
  * x86/vdso: Fix the x86 vdso2c tool includes
    - LP: #1471252
  * x86/vdso: Fix 'make bzImage' on older distros
    - LP: #1471252
  * perf/x86/rapl: Enable Broadwell-U RAPL support
    - LP: #1471252
  * net: qca_spi: Fix possible race during probe
    - LP: #1471252
  * drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling
    - LP: #1471252
  * RDMA/core: Fix for parsing netlink string attribute
    - LP: #1471252
  * drm/radeon: add new bonaire pci id
    - LP: #1471252
  * parisc,metag: Fix crashes due to stack randomization on
    stack-grows-upwards architectures
    - LP: #1471252
  * net: phy: micrel: Fix regression in kszphy_probe
    - LP: #1471252
  * firmware: dmi_scan: Fix ordering of product_uuid
    - LP: #1471252
  * ext4: fix NULL pointer dereference when journal restart fails
    - LP: #1471252
  * ext4: check for zero length extent explicitly
    - LP: #1471252
  * jbd2: fix r_count overflows leading to buffer overflow in journal
    recovery
    - LP: #1471252
  * tools/vm: fix page-flags build
    - LP: #1471252
  * mm, numa: really disable NUMA balancing by default on single node
    machines
    - LP: #1471252
  * igb: Fix oops on changing number of rings
    - LP: #1471252
  * power/reset: at91: fix return value check in
    at91_reset_platform_probe()
    - LP: #1471252
  * spi: bitbang: Make setup_transfer() callback optional
    - LP: #1471252
  * iwlwifi: pcie: prevent using unmapped memory in fw monitor
    - LP: #1471252
  * x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions
    - LP: #1471252
  * igb: Fix NULL assignment to incorrect variable in igb_reset_q_vector
    - LP: #1471252
  * thermal: rockchip: fix an error code
    - LP: #1471252
  * ARM: net: delegate filter to kernel interpreter when imm_offset()
    return value can't fit into 12bits.
    - LP: #1471252
  * ALSA: hda - Add headphone quirk for Lifebook E752
    - LP: #1471252
  * ASoC: mc13783: Fix wrong mask value used in mc13xxx_reg_rmw() calls
    - LP: #1471252
  * ASoC: uda1380: Avoid accessing i2c bus when codec is disabled
    - LP: #1471252
  * clk: exynos5420: Restore GATE_BUS_TOP on suspend
    - LP: #1471252
  * thermal: armada: Update Armada 380 thermal sensor coefficients
    - LP: #1471252
  * ALSA: hda/realtek - Support Dell headset mode for ALC256
    - LP: #1471252
  * ALSA: hda - fix headset mic detection problem for one more machine
    - LP: #1447909, #1471252
  * ALSA: hda - Add headset mic quirk for Dell Inspiron 5548
    - LP: #1452175, #1471252
  * mac80211: move WEP tailroom size check
    - LP: #1471252
  * KVM: MMU: fix smap permission check
    - LP: #1471252
  * KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages
    - LP: #1471252
  * KVM: MMU: fix SMAP virtualization
    - LP: #1471252
  * powerpc/mce: fix off by one errors in mce event handling
    - LP: #1471252
  * ASoC: dapm: Modify widget stream name according to prefix
    - LP: #1471252
  * ASoC: wm8960: fix "RINPUT3" audio route error
    - LP: #1471252
  * ASoC: wm8994: correct BCLK DIV 348 to 384
    - LP: #1471252
  * ktime: Optimize ktime_divns for constant divisors
    - LP: #1471252
  * ktime: Fix ktime_divns to do signed division
    - LP: #1471252
  * Input: elantech - fix semi-mt protocol for v3 HW
    - LP: #1471252
  * powerpc: Align TOC to 256 bytes
    - LP: #1471252
  * ALSA: hda - Add Conexant codecs CX20721, CX20722, CX20723 and CX20724
    - LP: #1454656, #1471252
  * ALSA: hda/realtek - ALC292 dock fix for Thinkpad L450
    - LP: #1471252
  * mmc: atmel-mci: fix bad variable type for clkdiv
    - LP: #1471252
  * sd: Disable support for 256 byte/sector disks
    - LP: #1471252
  * s390/mm: correct return value of pmd_pfn
    - LP: #1471252
  * xen/events: don't bind non-percpu VIRQs with percpu chip
    - LP: #1471252
  * kvm: fix crash in kvm_vcpu_reload_apic_access_page
    - LP: #1471252
  * kvm/fpu: Enable eager restore kvm FPU for MPX
    - LP: #1471252
  * libceph: request a new osdmap if lingering request maps to no osd
    - LP: #1471252
  * drm/radeon: retry dcpd fetch
    - LP: #1471252
  * crypto: s390/ghash - Fix incorrect ghash icv buffer handling.
    - LP: #1471252
  * ipvs: fix memory leak in ip_vs_ctl.c
    - LP: #1471252
  * rtnl/bond: don't send rtnl msg for unregistered iface
    - LP: #1471252
  * net: sched: fix call_rcu() race on classifier module unloads
    - LP: #1471252
  * conntrack: RFC5961 challenge ACK confuse conntrack LAST-ACK transition
    - LP: #1471252
  * net: phy: Allow EEE for all RGMII variants
    - LP: #1471252
  * bridge: fix parsing of MLDv2 reports
    - LP: #1471252
  * cdc_ncm: Fix tx_bytes statistics
    - LP: #1471252
  * ipv4: Avoid crashing in ip_error
    - LP: #1471252
  * ipv6: do not delete previously existing ECMP routes if add fails
    - LP: #1471252
  * net/ipv6/udp: Fix ipv6 multicast socket filter regression
    - LP: #1471252
  * ipv6: fix ECMP route replacement
    - LP: #1471252
  * tcp/ipv6: fix flow label setting in TIME_WAIT state
    - LP: #1471252
  * staging: vt6655: move setting of PSTxDesc->buff_addr to vnt_tx_packet
    - LP: #1471252
  * staging: vt6655: Fix TD_FLAGS_NETIF_SKB only on TYPE_AC0DMA
    - LP: #1471252
  * staging: vt6655: vnt_tx_packet fix dma_idx selection.
    - LP: #1471252
  * staging: vt6655: vnt_tx_packet Correct TX order of OWNED_BY_NIC
    - LP: #1471252
  * staging: vt6655: [BUG] Protect MACvSelectPage1 with lock.
    - LP: #1471252
  * net: core: Correct an over-stringent device loop detection.
    - LP: #1471252
  * x86: bpf_jit: fix compilation of large bpf programs
    - LP: #1471252
  * net: dp83640: fix broken calibration routine.
    - LP: #1471252
  * net: dp83640: reinforce locking rules.
    - LP: #1471252
  * net: dp83640: fix improper double spin locking.
    - LP: #1471252
  * unix/caif: sk_socket can disappear when state is unlocked
    - LP: #1471252
  * xen/netback: Properly initialize credit_bytes
    - LP: #1471252
  * net_sched: invoke ->attach() after setting dev->qdisc
    - LP: #1471252
  * sctp: Fix mangled IPv4 addresses on a IPv6 listening socket
    - LP: #1471252
  * bridge: fix br_multicast_query_expired() bug
    - LP: #1471252
  * udp: fix behavior of wrong checksums
    - LP: #1471252
  * xen: netback: read hotplug script once at start of day.
    - LP: #1471252
  * ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()
    - LP: #1471252
  * bridge: disable softirqs around br_fdb_update to avoid lockup
    - LP: #1471252
  * tcp: fix child sockets to use system default congestion control if not
    set
    - LP: #1471252
  * be2net: Replace dma/pci_alloc_coherent() calls with
    dma_zalloc_coherent()
    - LP: #1471252
  * drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR
    handling"
    - LP: #1471252
  * Linux 3.19.8-ckt2
    - LP: #1471252

 -- Luis Henriques <luis.henriques@xxxxxxxxxxxxx>  Tue, 07 Jul 2015 18:15:10 +0100

** Changed in: linux (Ubuntu Vivid)
       Status: Fix Committed => Fix Released

** Changed in: linux (Ubuntu Utopic)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1470404

Title:
  Some workloads experience more measurement variation with
  scaling_governor=performance than ondemand

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Utopic:
  Fix Released
Status in linux source package in Vivid:
  Fix Released

Bug description:
  SRU Justification:
  [Impact]
  Certain workloads can exhibit a large variance in behavior due to how CPUs are idled on POWER8 systems.

  [Fix]

  For 3.16:
  74aa51b5ccd3975392e30d11820dc073c5f2cd32
  92c83ff5b42b109c94fdeee53cb31f674f776d75
  70734a786acfd1998e47d40df19cba5c29469bdf

  For 3.16, 3.19:
  78eaa10f027cf69f9bd409e64eaff902172b2327

  $ git describe 78eaa10f027cf69f9bd409e64eaff902172b2327
  v4.1-rc2-9-g78eaa10
  Once Wily is rebased to v4.1 or later, it will include this fix.

  [Test Case]
  Set the system with the SMT8 mode and scaling_governor=performance or ondemand.
  Run the workload 100 times.

  --

  == Comment: #0 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-15 21:30:31 ==
  ---Problem Description---
  Many workloads experience wide measurement variation, more with scaling_governor=performance than ondemand.

  Contact Information = wpeter@xxxxxxxxxx, farid@xxxxxxxxxx

  ---uname output---
  Linux c656f7n04 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = 20-core and 24-core Tuleta systems

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  Set the system with the SMT8 mode and scaling_governor=performance or ondemand.
  Run the workload 100 times.
  Get 100 data points and sort them.
  Compare the spread of results with two governor modes.
  The source and scripts to run a simple test case will be provided.
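  As a sketch of the "sort the data points and compare the spread" step above (this is not the test scripts promised in the report; the struct and function names are hypothetical), a small C helper could summarize one batch of elapsed times:

```c
#include <stddef.h>
#include <stdlib.h>

/* Summary of one batch of elapsed times, in seconds. */
struct spread {
    double min, median, max, range;
};

/* qsort comparator for doubles, ascending order. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts samples in place (n must be > 0) and returns the min, median,
 * max and max-min range, so runs under two governors can be compared. */
struct spread summarize_spread(double *samples, size_t n)
{
    struct spread s;

    qsort(samples, n, sizeof samples[0], cmp_double);
    s.min = samples[0];
    s.max = samples[n - 1];
    s.median = (n % 2) ? samples[n / 2]
                       : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
    s.range = s.max - s.min;
    return s;
}
```

  Calling it once on the performance-governor samples and once on the ondemand samples gives two min/median/max/range summaries to compare side by side.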

  Stack trace output:
   no

  Oops output:
   no

  Userspace tool common name: not sure what it is.

  Userspace rpm: ??

  The userspace tool has the following bit modes: These are 64-bit
  programs.

  System Dump Info:
    The system is not configured to capture a system dump.

  Userspace tool obtained from project website:  na

  *Additional Instructions for wpeter@xxxxxxxxxx, farid@xxxxxxxxxx:
  -Attach sysctl -a output to the bug.
  -Attach ltrace and strace of userspace application.

  == Comment: #2 - Paul A. Clarke <pacman@xxxxxxxxxx> - 2015-04-16 08:47:41 ==
  This problem has a number of variables we were trying to reduce:
  - endianness
  - operating system
  - kernel level
  - compiler

  Bob Walkup says he's seen the variability in a bunch of CPU-intensive
  test cases, in various languages, using various compilers, which would
  seem to eliminate the "compiler" variable.

  We had not looked at the performance governor setting to this point.
  Interesting results, and yet another variable to add to the above mix.
  Perhaps two more runs?  (LE-ondemand, LE-performance, BE-ondemand, BE-
  performance)

  == Comment: #3 - Paul A. Clarke <pacman@xxxxxxxxxx> - 2015-04-16 08:50:09 ==
  Also, Bob says he can reproduce this with and without vectorization (the stalls move from the VSU to the FPU), and with and without floating point (the stalls move from the FPU to the FXU).  Very odd.

  == Comment: #4 - Andrea M. Davis <amdavis@xxxxxxxxxx> - 2015-04-16 10:10:01 ==
  Peter, what version of Ubuntu are you running?

  == Comment: #5 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-16 10:32:58 ==
  Andrea,

  Ubuntu 14.04.2 LTS.

  #uname -a
  Linux c656f7n04 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  #lsb_release -a
  No LSB modules are available.
  Distributor ID:	Ubuntu
  Description:	Ubuntu 14.04.2 LTS
  Release:	14.04
  Codename:	trusty

  == Comment: #6 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-16 10:50:11 ==
  There are a few more things we have tried.

  (1) STREAM was originally compiled with gfortran and its
  corresponding OpenMP. I compiled it with xlf and its corresponding
  OpenMP. There is no difference in performance.

  (2) There was a concern about NUMA: could the CPU binding done by
  OpenMP be incorrect, causing hidden remote memory accesses? With one
  DCM disabled and only 10 or 12 cores of the other DCM in use, we can
  still see occasional drops in performance, although not often. We
  conclude that the variation is not due to NUMA.

  (3) Farid and I also tried different scheduler parameters
  (sched_min_granularity_ns, sched_wakeup_granularity_ns,
  sched_latency_ns, and others), matching the corresponding values of
  the other distro, but did not see any performance change.

  (4) For the workload AMG2006, using scaling_governor=ondemand also
  reduces the spread of data significantly.

  (5) For all the above investigations, I used a 20-core Tuleta and a
  24-core Tuleta, both configured identically with Ubuntu 14.04.2. The
  two systems paint a consistent picture.

  So far, we have looked at the compiler, NUMA, the scheduler, memory
  tests, CPU tests, ST vs SMT, etc. There is a significant difference in
  variation between scaling_governor=performance and
  scaling_governor=ondemand with the same application and system
  configuration.

  Hopefully, the data point us in the right direction, i.e., there could
  be some unexpected behaviour in the implementation of
  scaling_governor=performance.

  == Comment: #7 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-16 14:30:21 ==
  Note that Bob Walkup does not see the improvement from scaling_governor=ondemand on a borrowed POK lab system. However, he still suggested that I open a bug based on my findings. I guess he is not entirely sure about the system he got.

  It would be good to have data independently collected by others to
  verify my observations.

  Bob's serial_loop.c program can be compiled and run very easily. The
  examination of data is straightforward too.

  == Comment: #10 - JENIFER HOPPER <jhopper@xxxxxxxxxx> - 2015-04-17 16:33:38 ==
  I was able to reproduce the problem with the serial_loop test described in comment 1 (my system is Ubuntu 15.04); however, disabling the nap cpuidle state seemed to resolve the variance:

  cpupower idle-set -d 0

  Can others reproduce?   I am not sure why nap behavior would be any
  different w/ the performance governor though..   Note, to re-enable:
  cpupower idle-set -E

  == Comment: #11 - JENIFER HOPPER <jhopper@xxxxxxxxxx> - 2015-04-20 13:09:34 ==
  (In reply to comment #10)
  > disabling the nap cpuidle
  > state seemed to resolve the variance:
  >
  > cpupower idle-set -d 0

  just want to clarify state0 is actually snooze, not nap:
  # cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
  snooze
  # cat /sys/devices/system/cpu/cpu0/cpuidle/state1/name
  Nap
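  The state names above and the per-state "disable" switch that "cpupower idle-set" toggles live in the standard Linux cpuidle sysfs tree. A minimal C sketch for reading and writing them, assuming the /sys/devices/system/cpu/cpuN/cpuidle/stateM layout shown in the cat commands; the helper names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Builds /sys/devices/system/cpu/cpu<cpu>/cpuidle/state<state>/<attr>,
 * e.g. attr = "name" or "disable". Returns 0 on success, -1 if truncated. */
int cpuidle_path(char *buf, size_t len, int cpu, int state, const char *attr)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/%s",
                     cpu, state, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Reads the first line of an attribute into out, stripping the newline. */
int read_attr(const char *path, char *out, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Writes a value such as "1" (disable) or "0" (enable).
 * Writing under /sys requires root. */
int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

  Writing "1" to state0's "disable" file on every CPU should roughly correspond to what "cpupower idle-set -d 0" reports doing in this thread, and writing "0" re-enables the state; reading "disable" returns "0" while the state is enabled.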

  == Comment: #12 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-20 16:26:32 ==
  Jenifer, thanks for the suggestion.

  "cpupower idle-set -d 0" works for Bob's serial_loop.c program.

  There are 24 identical processes running serial_loop in parallel, each
  bound to one core. With 100 iterations, there are 2400 elapsed times
  collected for each run. Each elapsed time over 5 seconds is counted as
  an outlier.
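  The outlier rule described here can be sketched in C as below; the function name is hypothetical, and the 5-second threshold is the one stated in this comment:

```c
#include <stddef.h>

/* Counts elapsed times strictly above threshold_s seconds. With 24
 * processes x 100 iterations, one run yields 2400 samples, and any
 * sample over 5 seconds is counted as an outlier. */
size_t count_outliers(const double *elapsed, size_t n, double threshold_s)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (elapsed[i] > threshold_s)
            count++;
    return count;
}
```

  For scale, 34-35 outliers in 2400 samples is roughly 1.4-1.5% of all measurements.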

  The following data were collected on a 24-core Tuleta system.

  scaling_governor = P(erformance) or O(ndemand)
  snooze (state0)  = default (enabled) or disabled

  P and default         = 34-35 outliers
  P and snooze disabled = 0 outliers

  O and default         = 2-4 outliers
  O and snooze disabled = 0 outliers

  As you asked, why do we need to disable snooze in order to reduce
  measurement variation when scaling_governor=performance?

  == Comment: #13 - JENIFER HOPPER <jhopper@xxxxxxxxxx> - 2015-04-20 16:40:46 ==
  Vaidy, could your team comment on this? In SMT8 mode, more measurement variation is seen with the performance governor than with the ondemand governor when snooze is enabled, but disabling snooze seems to resolve the problem. Does it make sense that snooze would have a larger impact in performance mode?

  Stewart mentioned some latency improvements in the new 830 OPAL
  firmware, is that related to this type of sleep state wakeup?

  == Comment: #14 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-21 12:23:01 ==
  "cpupower idle-set -d 0" also fixes the measurement variation of STREAM on a 24-core Tuleta system.

  scaling_governor=performance and default snooze = 65 outliers out of
  400 runs.

  scaling_governor=performance and snooze disabled = 0 outliers out of
  400 runs.

  == Comment: #15 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-21 23:21:22 ==
  "cpupower idle-set -d 0" also fixes the measurement variation of AMG2006 on a 24-core Tuleta system.

  It means when scaling_governor=performance, disabling snooze (state0,
  shallow sleep) while still enabling Nap (state1, deep sleep) can
  stabilize measurements.

  Vaidy, please help us understand this behaviour.

  == Comment: #17 - VAIDYANATHAN SRINIVASAN <svaidyan@xxxxxxxxxx> - 2015-04-22 14:22:11 ==
  Hi Team,

  Interesting observation.  Let me give possible contributing factors:

  (a) When running ondemand, the CPU frequency changes from min to max, including turbo frequencies.
  (b) When running the performance governor, the frequency is set to run at turbo constantly.

  Depending on temperature, the CPU may not be able to sustain turbo,
  since we are constantly running at that frequency and burning more
  power. The variation could actually come from the platform (OCC)
  periodically dropping the frequency due to over-temperature.

  While running ondemand, reducing power consumption could help sustain the turbo frequency longer.
  Disabling snooze will further increase power consumption and push for more variation at turbo frequency.

  Our systems are designed to run consistently at nominal frequency and
  hence I would suggest that you run your experiment by setting nominal
  frequency to all cores using performance governor+max limit or
  userspace governor.

  You could use the throughput-performance profile of tuned-adm for
  this purpose.
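  For reference, "performance governor + max limit" can be sketched as two writes to the standard cpufreq sysfs files scaling_governor and scaling_max_freq (the latter in kHz). This is a hedged C illustration, not something tried in this bug: the helper names are hypothetical, root access is assumed, and the nominal frequency must be supplied by the caller because the thread does not state it.

```c
#include <stdio.h>
#include <string.h>

/* Builds /sys/devices/system/cpu/cpu<cpu>/cpufreq/<attr>.
 * Returns 0 on success, -1 if the buffer is too small. */
int cpufreq_path(char *buf, size_t len, int cpu, const char *attr)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/cpufreq/%s", cpu, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Writes a string to a file; returns 0 on success. */
static int write_str(const char *path, const char *s)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    size_t len = strlen(s);
    int ok = fwrite(s, 1, len, f) == len;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Selects the performance governor on one CPU and caps it at
 * nominal_khz so that turbo frequencies are not used. Needs root. */
int pin_nominal(int cpu, long nominal_khz)
{
    char path[128], val[32];

    if (cpufreq_path(path, sizeof path, cpu, "scaling_governor") != 0 ||
        write_str(path, "performance") != 0)
        return -1;
    snprintf(val, sizeof val, "%ld", nominal_khz);
    if (cpufreq_path(path, sizeof path, cpu, "scaling_max_freq") != 0 ||
        write_str(path, val) != 0)
        return -1;
    return 0;
}
```

  Calling pin_nominal() for every online CPU before a run, and restoring the previous values afterwards, would approximate the nominal-frequency experiment suggested here.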

  If running at nominal frequency gives you consistent performance,
  then the above theory of turbo-mode variation holds. We can confirm
  it with additional traces in the cpufreq back-end driver code.
  We are currently improving our instrumentation to detect frequency
  variation and throttling.  This is a good scenario to validate our
  trace design as well.

  Let me know what you find.

  --Vaidy

  == Comment: #18 - JENIFER HOPPER <jhopper@xxxxxxxxxx> - 2015-04-22 14:28:15 ==
  (In reply to comment #17)

  > Disabling snooze will further increase the power consumption and push for
  > more variation at turbo frequency.

  We actually see the opposite effect, disabling snooze makes the
  variability at turbo freq go away :)

  == Comment: #19 - Basu Vaidyanathan <basu@xxxxxxxxxx> - 2015-04-22 14:44:43 ==
  Additionally, this is not a problem when running a BE kernel on the same P8 configuration box. I suspect
  it has more to do with configuration settings on LE, and we should rule that out before we start pointing
  fingers at the FW code path when using Ubuntu LE.

  == Comment: #20 - Paul A. Clarke <pacman@xxxxxxxxxx> - 2015-04-22 15:23:43 ==
  Bob is finding that another LE distro does _not_ exhibit variation.

  This would seem to eliminate LE as the culprit.

  Looking at the settings of
  /sys/devices/system/cpu/cpu*/cpuidle/state0/disable, they all report
  "0", which I believe is the same as having "snooze" enabled, correct?
  That would seem to eliminate "snooze" in and of itself as a culprit,
  *at least with this kernel level (3.10.0-210.ael7a)*.

  I'm starting to suspect it's an issue with the kernel in Ubuntu
  (3.16...)

  == Comment: #21 - VAIDYANATHAN SRINIVASAN <svaidyan@xxxxxxxxxx> - 2015-04-22 15:31:41 ==

  Running at constant nominal frequency will help you eliminate turbo
  mode variation and focus on the Linux issues and root-cause faster.

  The behavior I described above is not a bug or problem in firmware.
  It is the expected and correct behavior where throttling can happen.
  I am only trying to help you reduce the number of variables
  affecting this experiment.

  --Vaidy

  == Comment: #22 - VAIDYANATHAN SRINIVASAN <svaidyan@xxxxxxxxxx> - 2015-04-22 15:35:45 ==
  (In reply to comment #20)

  This is good input. The other distro does not have fast-sleep
  support; it has only snooze and nap. On the Ubuntu system, do you
  see /sys/devices/system/cpu/cpu*/cpuidle/state2/name ?

  Disabling the fast-sleep state, if present in your Ubuntu setup,
  could help us move to the next step.

  --Vaidy

  == Comment: #23 - Robert E. Walkup <walkup@xxxxxxxxxx> - 2015-04-22 16:30:28 ==
  On the different distro LE system provided by Paul Clarke, the observed behavior is different from what I have seen on Ubuntu LE systems, but one of the tests ... the MPI-enabled simple loop ... shows huge timing variations core-to-core for nearly every job. That system has 24 cores in SMT8 mode:

  ppc64_cpu --frequency
  Power Savings Mode: Dynamic, Favor Performance
  min:    3.961 GHz (cpu 175)
  max:    3.963 GHz (cpu 1)
  avg:    3.962 GHz

  and nearly every job provides output that looks like this :
  out.10:tmin = 3.757, tmax = 6.519 on rank 17, tavg = 5.126

  meaning that it takes anywhere from 3.757 to 6.519 seconds to get
  through the timed loop :

     MPI_Barrier(MPI_COMM_WORLD);
     t1 = MPI_Wtime();
     sum = 0.0;
     for (i=0; i<2000000000; i++) sum += ((double) (i%10));
     t2 = MPI_Wtime();
     elapsed = t2 - t1;

  There are no loads or stores in that loop ... there is a separate
  process bound to each core, and they work independently.  Additional
  instrumentation shows that the slow processes are in the run queue the
  whole time.
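  Bob's serial_loop.c is mentioned later in this thread but never shown. The code below is a hypothetical single-process C reconstruction of the timed section above, not his actual program. MPI_Barrier is dropped and MPI_Wtime is replaced with clock_gettime(CLOCK_MONOTONIC). The name timed_loop is made up for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock seconds, standing in for MPI_Wtime(). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Same arithmetic as the MPI loop: only the counter and accumulator are
 * touched. The sum is returned through *sum_out so the compiler cannot
 * discard the loop. Returns the elapsed time in seconds. */
double timed_loop(long iterations, double *sum_out)
{
    assert(iterations >= 0 && sum_out != NULL);
    double sum = 0.0;
    double t1 = now_seconds();
    for (long i = 0; i < iterations; i++)
        sum += (double)(i % 10);
    double t2 = now_seconds();
    *sum_out = sum;
    return t2 - t1;
}
```

  Calling timed_loop(2000000000L, &sum) runs the same 2e9 iterations as the MPI version. Launching one pinned copy per core approximates the serial-loop setup used later in the thread.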

  So far, the other workloads that I have tried on the different distro
  LE system showed significantly lower timing variations than what I had
  recorded on Ubuntu LE ... but not this one.

  == Comment: #24 - Robert E. Walkup <walkup@xxxxxxxxxx> - 2015-04-22 16:54:07 ==
  Just adding that on the same different distro LE system, after turning off SMT with the command ppc64_cpu --smt=1, all instances of the simple loop test produce output like this:

  tmin = 3.756, tmax = 3.757 on rank 5, tavg = 3.757

  in other words, it takes the same time to complete the work in the loop
  on every core ... every time, within the limits of what I have had
  the patience to check.

  == Comment: #25 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-22 17:03:16 ==
  Bob, the use of ST mode reduces variation on Ubuntu 14.04.2 as well.

  With SMT8 on another distro LE, I wonder whether "cpupower idle-set -d
  0" helps reduce variation for the MPI-enabled simple loop?

  Is it correct to say that both Ubuntu LE 14.04.2 (kernel 3.16.0) and
  another distro LE (kernel) exhibit variation?

  Vaidy, Ubuntu 14.04.2 does not have cpuidle/state2 (fastsleep state).

  == Comment: #26 - Robert E. Walkup <walkup@xxxxxxxxxx> - 2015-04-22 17:11:42 ==
  I ran the command :

  [root@tuleta ~]# cpupower idle-set -d 0
  Idlestate 0 disabled on CPU 0
  Idlestate 0 disabled on CPU 1
  ...

  on the different distro LE system after setting the state back to
  SMT8, and the timing variability is still there:

  out.2:tmin = 3.757, tmax = 9.010 on rank 4, tavg = 4.619
  out.3:tmin = 3.757, tmax = 11.518 on rank 2, tavg = 4.684
  out.4:tmin = 3.757, tmax = 9.398 on rank 3, tavg = 4.773

  Essentially every job is showing truly huge timing variations.
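  The code that prints the tmin/tmax/tavg lines is not shown in this thread. Below is a rough C sketch of how such a summary could be computed. It assumes the per-rank elapsed times have already been collected, for example with MPI_Gather. The function reduces them to the same three numbers plus the rank of the slowest process. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct loop_stats {
    double tmin, tmax, tavg;
    int max_rank;              /* rank that reported tmax */
};

/* Summarise per-rank elapsed times t[0..nranks-1]. */
struct loop_stats summarize(const double *t, int nranks)
{
    assert(t != NULL && nranks > 0);
    struct loop_stats s = { t[0], t[0], 0.0, 0 };
    double total = 0.0;
    for (int r = 0; r < nranks; r++) {
        if (t[r] < s.tmin)
            s.tmin = t[r];
        if (t[r] > s.tmax) {
            s.tmax = t[r];
            s.max_rank = r;
        }
        total += t[r];
    }
    s.tavg = total / nranks;
    return s;
}

/* Print in the same shape as the job output quoted above. */
void print_stats(struct loop_stats s)
{
    printf("tmin = %.3f, tmax = %.3f on rank %d, tavg = %.3f\n",
           s.tmin, s.tmax, s.max_rank, s.tavg);
}
```

  For outputs such as tmin = 3.757 and tmax = 11.518, the slowest rank took roughly three times as long as the fastest for identical work.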

  == Comment: #27 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-22 17:24:46 ==
  Does it make any difference to also disable Nap with "cpupower idle-set -d 1"?

  I think we only have snooze and Nap on LE.

  == Comment: #28 - Basu Vaidyanathan <basu@xxxxxxxxxx> - 2015-04-22 17:46:14 ==
  (In reply to comment #27)

  I have a p8 box running ubuntu 14.10 and I do see
  cat /sys/devices/system/cpu/cpu0/cpuidle/state2/name
  FastSleep

  == Comment: #29 - Preeti U. Murthy <preeti.murthy@xxxxxxxxxx> - 2015-04-23 06:01:57 ==
  I see that hotplug operations are being carried out while the benchmark is running. If so, the performance degradation could be due to the tasks not being allowed to run on the freshly onlined CPUs.

  I would suggest booting the system with all hardware threads and not
  doing hotplug operations, to keep the above issue out of the picture
  while verifying the performance of the benchmarks, if the intention is
  to profile the cpufreq governors.

  Regards
  Preeti U Murthy

  == Comment: #31 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-28 00:27:52 ==
  On Ubuntu 14.04.2, there are two states in cpuidle: snooze and Nap.

  Are the enabling and disabling of these two states independent?

  == Comment: #32 - Robert E. Walkup <walkup@xxxxxxxxxx> - 2015-04-28 16:16:23 ==
  Adding an observation on Ubuntu LE systems, using the simple-loop example above and the userspace governor (chosen so that one can set the frequency to a desired value).  When using one thread per core with the system in SMT8 state, the time for the loop varies from ~3.7 sec to over 8 sec.  However, if many iterations (10-20) of the same loop are done before starting the timed section of the code (adding a warmup loop), the variations in the timed section are dramatically reduced.  There are still some outliers, but far fewer of them, and the timing spread is a fraction of one second instead of several seconds.  So there is a clear dependence on history, with the largest timing variations occurring immediately after job startup.  I should mention that this remains a problem for many performance benchmarks in the HPC area, which often run in a total time of less than one minute.  I would hope that with the userspace governor, or the performance governor, the power and frequency settings would remain constant.  Can someone confirm that?
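  Here is a sketch of the warmup idea, assuming the same arithmetic as the simple loop. It runs a number of untimed passes first, then times one pass. The warmups parameter corresponds to the 10-20 iterations mentioned above. The function names are hypothetical. The volatile sink exists only to stop the compiler from deleting the untimed passes.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* One pass of the simple loop. */
static double spin(long iterations)
{
    double sum = 0.0;
    for (long i = 0; i < iterations; i++)
        sum += (double)(i % 10);
    return sum;
}

/* Run `warmups` untimed passes, then time a single pass.
 * Returns elapsed seconds of the timed pass; its sum goes to *sum_out. */
double timed_after_warmup(int warmups, long iterations, double *sum_out)
{
    assert(warmups >= 0 && iterations >= 0 && sum_out != NULL);
    volatile double sink = 0.0;   /* keeps warmup passes from being elided */
    for (int w = 0; w < warmups; w++)
        sink += spin(iterations);
    double t1 = now_seconds();
    double sum = spin(iterations);
    double t2 = now_seconds();
    *sum_out = sum;
    return t2 - t1;
}
```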

  == Comment: #33 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-04-29 17:16:58 ==
  Vaidy, would you help answer my question on Comment 31?

  == Comment: #34 - George A. Chochia <chochia@xxxxxxxxxx> - 2015-05-13 11:52:53 ==
  Vaidy, I am currently seeing a 2.5x performance degradation in the Message Rate benchmark on p8, Ubuntu 14.04.02 LE.

  Performance was normal back in February, when we had 14.04.01 and
  older FW.

  The degradation goes away once snooze state is disabled. There have
  been two FW updates: 1/13 and 2/17.

  == Comment: #35 - VAIDYANATHAN SRINIVASAN <svaidyan@xxxxxxxxxx> - 2015-05-13 14:35:37 ==
  (In reply to comment #31)
  > On Ubuntu 14.04.2, there are two states in cpuidle: snooze and Nap.
  >
  > Are the enabling and disabling of these two states independent?

  Hi Peter,

  Yes, the enable/disable settings for idle states are independent.  At
  least one idle state is expected to be enabled; otherwise the CPU may
  busy-loop when idle without even lowering the thread priority as snooze does.

  You can disable snooze and have nap enabled or the other way, but
  having both disabled will lead to idle threads burning more cycles.

  --Vaidy
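  As far as we understand, cpupower idle-set -d K disables state K by writing to the per-CPU sysfs attribute /sys/devices/system/cpu/cpuN/cpuidle/stateK/disable. The C sketch below does the same for one CPU. It adds a hypothetical guard, can_disable(), that follows the advice above to keep at least one idle state enabled. The helper names are assumptions, and writing the attribute needs root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* True if disabling `target` still leaves at least one idle state
 * enabled. disabled[k] mirrors stateK/disable (1 = disabled). */
bool can_disable(const int *disabled, int nstates, int target)
{
    assert(disabled != NULL && target >= 0 && target < nstates);
    for (int k = 0; k < nstates; k++)
        if (k != target && !disabled[k])
            return true;
    return false;
}

/* Write "1" (disable) or "0" (enable) to stateK/disable for one CPU.
 * Returns 0 on success, -1 on error. */
int set_idle_state_disabled(int cpu, int state, bool disable)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/disable",
             cpu, state);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(disable ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

  With two states (snooze and nap), disabling either one alone passes the guard. Disabling the second while the first is already disabled is refused.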

  == Comment: #36 - VAIDYANATHAN SRINIVASAN <svaidyan@xxxxxxxxxx> - 2015-05-13 14:58:07 ==
  (In reply to comment #34)

  Hi George,

  The idle state management code is the same for both kernels.  You have
  only snooze and nap as idle states, right?

  As I explained over email, when snooze and nap are enabled, the
  cpuidle logic should choose nap for idle sibling threads after a short
  period in snooze.

  Can you guys analyse and confirm the following points:

  * The workload always runs on the primary thread of each core
  * The remaining 7 sibling threads should be in nap (state1)
  * The time spent in 'nap' by each sibling thread can be obtained from sysfs:
  /sys/devices/system/cpu/cpuN/cpuidle/state1/time (unit is microseconds)
  * The workload variation is related to the nap residency of the sibling threads on that core

  If the nap residency (time spent in nap) is not uniform then workload
  performance would be proportionally non uniform.

  The above statement (if proven) is one possible root cause that can
  help us move forward and design a fix.

  --Vaidy
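  Here is a minimal C sketch of the measurement Vaidy proposes. It snapshots the cumulative state1/time counter (microseconds) of each sibling thread, waits, snapshots again, and divides the delta by the interval. It assumes the sibling threads of a core have contiguous CPU numbers: primary N, siblings N+1 to N+7 in SMT8. That matches the 0, 8, 16, ... primary-thread list used later in this thread, but should be checked on the target system. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Cumulative residency (us) of idle state `state` on `cpu`,
 * or -1 if the sysfs file cannot be read. */
long long idle_time_us(int cpu, int state)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/time", cpu, state);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long long us = -1;
    if (fscanf(f, "%lld", &us) != 1)
        us = -1;
    fclose(f);
    return us;
}

/* Fill out[0..smt-2] with the counters of the siblings of `primary`,
 * assuming contiguous numbering (primary+1 .. primary+smt-1). */
void snapshot_siblings(int primary, int smt, int state, long long *out)
{
    assert(out != NULL && smt > 1);
    for (int k = 1; k < smt; k++)
        out[k - 1] = idle_time_us(primary + k, state);
}

/* Fraction of a sampling interval spent in the state,
 * from two snapshots of the cumulative counter. */
double residency_fraction(long long before_us, long long after_us,
                          long long interval_us)
{
    assert(interval_us > 0 && after_us >= before_us);
    return (double)(after_us - before_us) / (double)interval_us;
}
```

  If a sibling's fraction stays well below 1.0 while the primary thread is busy, that sibling spends the rest of its time outside nap. Vaidy suggests correlating exactly that pattern with the workload variation.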

  == Comment: #37 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-05-13 17:45:33 ==
  Hi Vaidy,

  Let's use Bob's serial_loop.c as an example. There are 24 copies of
  his program running on 24 cores in parallel. Only the primary threads
  of the cores are used.

  Did Shilpa use Bob's program to re-create the problem and find out
  that some unused sibling threads do not sleep fast enough and take
  away cycles from the primary thread to cause variability?

  It is great to know that we can study the sleep time by examining the
  /sys/devices/system/cpu/cpuN/cpuidle/state1/time. Did Shilpa use this
  method to come up with the above understanding?

  Based on George's finding, do you know whether there are thermal code
  changes between the old firmware and the current version that affect
  the thermal behavior?

  Thanks,
  Peter

  == Comment: #38 - Preeti U. Murthy <preeti.murthy@xxxxxxxxxx> - 2015-05-13 23:24:18 ==
  Is this really related to snooze? Jenifer mentioned in Comment 10 that disabling nap, not snooze, also reduced the variance. Can you please confirm whether this is the case? This will help us narrow down the issue.

  Regards
  Preeti U Murthy

  == Comment: #39 - JENIFER HOPPER <jhopper@xxxxxxxxxx> - 2015-05-14 10:19:09 ==
  (In reply to comment #38)
  Hi Preeti, sorry, I corrected myself in comment 11: I was disabling state0, which is snooze, not nap:
  # cpupower idle-set -d 0
  # cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
  snooze

  Still might be interesting to try some tests w/ nap disabled.

  == Comment: #40 - Shilpasri G. Bhat <shigbhat@xxxxxxxxxx> - 2015-05-14 11:15:45 ==
  (In reply to comment #37)
  Yes. I also used perf-trace events to get the same info.

  Regards,
  Shilpa

  == Comment: #42 - Anton Blanchard <antonb@xxxxxxxxxxx> - 2015-05-19 19:40:45 ==
  If I am reading that trace right, we spent over 200ms in snooze on a secondary thread of a badly performing core. That is an enormous amount of time to be chewing up the core.

  == Comment: #43 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-05-19 21:45:20 ==
  Vaidy,

  Could you provide more information on your proposed solution which is
  in the kernel, not in OPAL?

  Does it mean that you need to upstream different patches to different
  sets of kernels for Ubuntu and the other distro?

  Peter

  == Comment: #44 - VAIDYANATHAN SRINIVASAN <svaidyan@xxxxxxxxxx> - 2015-05-20 10:56:48 ==
  (In reply to comment #42)
  Hi Anton,

  That is right, exit from snooze state is the problem.  In the proposed
  fix Shilpa has added a forced exit from snooze loop after the target
  residency so that the cpuidle governor can select nap.

  We have to rewrite the snooze loop to exit after the first interrupt
  or timer, or after the target residency (100us), so that idle state
  promotion can happen.

  --Vaidy
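  The real change is the upstream commit named at the end of this thread, and its code is not reproduced here. As an illustration of the policy described above only, the user-space C model below polls a flag that stands in for pending work. It gives up once the target residency has elapsed, so a caller playing the role of the cpuidle governor could then pick nap. The names, the flag, and the use of clock_gettime are all assumptions of this sketch, not kernel interfaces.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

enum snooze_exit { SNOOZE_WORK_PENDING, SNOOZE_TIMEOUT };

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Poll until work arrives or target_residency_ns elapses. A timeout
 * means "promote": the caller should now choose a deeper state (nap). */
enum snooze_exit snooze_model(const volatile bool *work_pending,
                              uint64_t target_residency_ns)
{
    assert(work_pending != NULL);
    uint64_t deadline = now_ns() + target_residency_ns;
    while (!*work_pending) {
        if (now_ns() >= deadline)
            return SNOOZE_TIMEOUT;
    }
    return SNOOZE_WORK_PENDING;
}
```

  With a 100 us target residency, an idle sibling that never receives work leaves the loop after roughly 100 us. Without the timeout it could stay in snooze for as long as the 200 ms Anton observed.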

  == Comment: #45 - Shilpasri G. Bhat <shigbhat@xxxxxxxxxx> - 2015-05-20 11:02:06 ==
  Hi,

  I am sharing the link for ubuntu kernel packages with the fix:

  1) http://kernel.stglabs.ibm.com/~shilpa/ubuntu-14-04.tar
      This file contains the following packages:
      a)linux-headers-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
      b)linux-image-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
      c)linux-image-extra-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
      d)linux-tools-3.16.0-38-generic_3.16.0-38.52~14.04.1_ppc64el.deb
      The fix is based on top of ubuntu-14.04.02 3.16.0-38-generic plus upstream commit 92c83ff5b42b (cpuidle: powernv: Read target_residency value of idle states from DT if available)

  2) http://kernel.stglabs.ibm.com/~shilpa/ubuntu-15.04.tar
      This file contains the following packages:
      linux-headers-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
      linux-image-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
      linux-image-extra-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
      linux-tools-3.19.0-17-generic_3.19.0-17.17+snooze_ppc64el.deb
      The fix is based on top of ubuntu-15.04 3.19.0-17-generic

  == Comment: #46 - VAIDYANATHAN SRINIVASAN <svaidyan@xxxxxxxxxx> - 2015-05-20 11:21:07 ==
  (In reply to comment #43)

  Hi Peter,

  Sure.  As per our discussion yesterday, we agreed on the following:

  * The issue is not machine-specific: the problem was also recreated by
  Jenifer on S822L, even though other teams believe the issue is
  S824L-specific.

  * The key issue observed is the variation in the sibling threads'
  snooze time, which chews up cycles from the primary thread.

  * The fix is to force an exit from the snooze loop after the target
  residency (100us) and allow the cpuidle governor to enter nap.

  * This fix is entirely in the Linux kernel cpuidle driver code and does
  not require any change in OPAL.

  Yes, once we verify the solution, we will design the correct idle
  state auto-promotion logic in the cpuidle driver, get it upstream, and
  then push it to the other distro and to the Ubuntu distros that run bare-metal.

  --Vaidy

  == Comment: #47 - JENIFER HOPPER <jhopper@xxxxxxxxxx> - 2015-05-20 12:44:17 ==
  I tested Shilpa's kernel packages w/ the fix and can confirm I no longer see the variation issue w/ the serial loop program running on primary threads in SMT8 mode when the performance governor is set.   I will get with Peter to test with another benchmark that previously hit the variation issue.

  ----

  System:
  8247-42L
  20 cores, SMT8
  FW830_041
  Ubuntu 15.04

  Run script:
  #!/bin/bash

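  for iter in $(seq 1 100)
  do
    # one copy per core, pinned to the primary thread (every 8th CPU in SMT8)
    for cpu in 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128 136 144 152
    do
      taskset -c "${cpu}" ./serial_loop > "out.${cpu}.${iter}" &
    done
    echo "$iter"
    wait    # let all 20 copies finish before the next iteration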
  done
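  The CPU list 0, 8, ..., 152 in the script places one copy on the primary thread of each core of this 20-core SMT8 system. A C equivalent of the taskset -c step might look like the sketch below. It assumes contiguous thread numbering per core, and primary_cpu() and pin_to_cpu() are hypothetical names. sched_setaffinity() and the CPU_* macros are the Linux/glibc affinity interface that taskset is built on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Primary-thread CPU of `core`, assuming threads are numbered
 * contiguously per core (core 0 -> 0..7, core 1 -> 8..15 in SMT8). */
int primary_cpu(int core, int threads_per_core)
{
    assert(core >= 0 && threads_per_core > 0);
    return core * threads_per_core;
}

/* Pin the calling process to a single CPU, like `taskset -c CPU`.
 * Returns 0 on success, -1 on error (errno is set). */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```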

  Results:

  -- 3.19.0-17 fix --

  Performance
  -----------
  Loop elapsed:		User time:
  Min	Max		Min	Max
  3.885	3.92		3.877	3.914
  3.885	3.892		3.877	3.886
  3.885	3.908		3.877	3.901

  Ondemand
  --------
  Loop elapsed:		User time:
  Min	Max		Min	Max
  3.933	3.949		3.901	3.912

  -- orig 3.19.0-16 kernel --

  Performance
  -----------
  Loop elapsed:		User time:
  Min	Max		Min	Max
  3.886	4.507		3.88	4.498
  3.884	10.404		3.877	10.39

  Ondemand
  --------
  Loop elapsed:		User time:
  Min	Max		Min	Max
  3.932	3.994		3.901	3.959

  == Comment: #49 - JENIFER HOPPER <jhopper@xxxxxxxxxx> - 2015-05-21 18:59:33 ==
  The fix from comment #45 also resolves large variance issues w/ STREAM and DGEMM workloads. Results listed below.

  =========================================
  STREAM:

  MB/sec
  SMT8, 1 thread per core, 100 loop

  -------- orig 3.19.0-16 kernel --------

  Performance:
  ____________
   Min		Max		%diff
  run1:	304384.6341	308199.3341	1.25%
  run2: 	150096.0562	308516.5557	69.09%

  Performance
  + disable snooze:
  _________________
   Min		Max		%diff
  run1:	305700.3257	308403.9185	0.88%
  run2: 	305547.2215	308771.2772	1.05%

  Ondemand:
  _________
   Min		Max		%diff
  run1:	298386.1295	302209.7456	1.27%

  ----------- 3.19.0-17 fix -----------

  Performance:
  ____________
   Min		Max		%diff
  run1:	303486.8368	308433.0545	1.62%
  run2: 	304768.6159	308410.2177	1.19%
  run3:	304723.2556	308847.065	1.34%

  Ondemand:
  _________
   Min		Max		%diff
  run1:	297364.385	302473.0888	1.70%

  =========================================

  =========================================
  DGEMM:

  GFlops
  SMT8, 1 thread per core, 20 loop

  -------- orig 3.19.0-16 kernel --------

  Performance:
  ____________
   Min		Max		%diff
  run1:	479.53		520.2		8.14%

  Performance
  + disable snooze:
  _________________
   Min		Max		%diff
  run1:	511.18		520.49		1.80%

  Ondemand:
  _________
   Min		Max		%diff
  run1:	505.64		509.88		0.84%

  ----------- 3.19.0-17 fix -----------

  Performance:
  ____________
   Min		Max		%diff
  run1:	512.77		520.84		1.56%
  run2: 	517.19		520.34		0.61%
  run3:	517.93		520.35		0.47%

  Ondemand:
  _________
   Min		Max		%diff
  run1:	505.72		508.53		0.55%
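  The %diff column is not defined in the comment. The figures are consistent with (max - min) divided by the mean of max and min. For example, (308516.5557 - 150096.0562) / 229306.31 gives 69.09%. The C helper below assumes that inferred definition, and pct_diff is a hypothetical name.

```c
#include <assert.h>

/* Spread between min and max as a percentage of their mean. */
double pct_diff(double min, double max)
{
    assert(min > 0.0 && max >= min);
    return (max - min) / ((max + min) / 2.0) * 100.0;
}
```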

  == Comment: #51 - Peter W. Wong <wpeter@xxxxxxxxxx> - 2015-06-14 22:53:05 ==
  Vaidy, is this fix being reviewed by the Linux kernel community? Can you give some estimates as to when this kernel fix will get into mainline and also when it will get into Ubuntu distro?

  == Comment: #52 - Shilpasri G. Bhat <shigbhat@xxxxxxxxxx> - 2015-06-24 07:18:28 ==
  The patch can be found in the upstream kernel 4.2
  78eaa10f027c cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
