kernel-packages team mailing list archive

[Bug 1413540] Re: Trusty soft lockup issues with nested KVM

 

This bug was fixed in the package linux - 3.13.0-51.84

---------------
linux (3.13.0-51.84) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1444141
  * Merged back Ubuntu-3.13.0-49.83 security release

linux (3.13.0-50.82) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1442285

  [ Andy Whitcroft ]

  * [Config] CONFIG_DEFAULT_MMAP_MIN_ADDR needs to match on armhf and arm64
    - LP: #1418140

  [ Chris J Arges ]

  * [Config] CONFIG_PCIEASPM_DEBUG=y
    - LP: #1398544

  [ Upstream Kernel Changes ]

  * KEYS: request_key() should reget expired keys rather than give
    EKEYEXPIRED
    - LP: #1124250
  * audit: correctly record file names with different path name types
    - LP: #1439441
  * KVM: x86: Check for nested events if there is an injectable interrupt
    - LP: #1413540
  * be2iscsi: fix memory leak in error path
    - LP: #1440156
  * block: remove old blk_iopoll_enabled variable
    - LP: #1440156
  * be2iscsi: Fix handling timed out MBX completion from FW
    - LP: #1440156
  * be2iscsi: Fix doorbell format for EQ/CQ/RQ s per SLI spec.
    - LP: #1440156
  * be2iscsi: Fix the session cleanup when reboot/shutdown happens
    - LP: #1440156
  * be2iscsi: Fix scsi_cmnd leakage in driver.
    - LP: #1440156
  * be2iscsi : Fix DMA Out of SW-IOMMU space error
    - LP: #1440156
  * be2iscsi: Fix retrieving MCCQ_WRB in non-embedded Mbox path
    - LP: #1440156
  * be2iscsi: Fix exposing Host in sysfs after adapter initialization is
    complete
    - LP: #1440156
  * be2iscsi: Fix interrupt Coalescing mechanism.
    - LP: #1440156
  * be2iscsi: Fix TCP parameters while connection offloading.
    - LP: #1440156
  * be2iscsi: Fix memory corruption in MBX path
    - LP: #1440156
  * be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed
    - LP: #1440156
  * be2iscsi: add an missing goto in error path
    - LP: #1440156
  * be2iscsi: remove potential junk pointer free
    - LP: #1440156
  * be2iscsi: Fix memory leak in mgmt_set_ip()
    - LP: #1440156
  * be2iscsi: Fix the sparse warning introduced in previous submission
    - LP: #1440156
  * be2iscsi: Fix updating the boot enteries in sysfs
    - LP: #1440156
  * be2iscsi: Fix processing CQE before connection resources are freed
    - LP: #1440156
  * be2iscsi : Fix kernel panic during reboot/shutdown
    - LP: #1440156
  * fixed invalid assignment of 64bit mask to host dma_boundary for scatter
    gather segment boundary limit.
    - LP: #1440156
  * quota: Store maximum space limit in bytes
    - LP: #1441284
  * ip: zero sockaddr returned on error queue
    - LP: #1441284
  * net: rps: fix cpu unplug
    - LP: #1441284
  * ipv6: stop sending PTB packets for MTU < 1280
    - LP: #1441284
  * netxen: fix netxen_nic_poll() logic
    - LP: #1441284
  * udp_diag: Fix socket skipping within chain
    - LP: #1441284
  * ping: Fix race in free in receive path
    - LP: #1441284
  * bnx2x: fix napi poll return value for repoll
    - LP: #1441284
  * net: don't OOPS on socket aio
    - LP: #1441284
  * bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify
    - LP: #1441284
  * ipv4: tcp: get rid of ugly unicast_sock
    - LP: #1441284
  * ppp: deflate: never return len larger than output buffer
    - LP: #1441284
  * net: sctp: fix passing wrong parameter header to param_type2af in
    sctp_process_param
    - LP: #1441284
  * ARM: pxa: add regulator_has_full_constraints to corgi board file
    - LP: #1441284
  * ARM: pxa: add regulator_has_full_constraints to poodle board file
    - LP: #1441284
  * ARM: pxa: add regulator_has_full_constraints to spitz board file
    - LP: #1441284
  * hx4700: regulator: declare full constraints
    - LP: #1441284
  * HID: input: fix confusion on conflicting mappings
    - LP: #1441284
  * HID: fixup the conflicting keyboard mappings quirk
    - LP: #1441284
  * megaraid_sas: disable interrupt_mask before enabling hardware
    interrupts
    - LP: #1441284
  * PCI: Generate uppercase hex for modalias var in uevent
    - LP: #1441284
  * usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGN
    - LP: #1441284
  * tty/serial: at91: enable peripheral clock before accessing I/O
    registers
    - LP: #1441284
  * tty/serial: at91: fix error handling in atmel_serial_probe()
    - LP: #1441284
  * axonram: Fix bug in direct_access
    - LP: #1441284
  * ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
    - LP: #1441284
  * TPM: Add new TPMs to the tail of the list to prevent inadvertent change
    of dev
    - LP: #1441284
  * char: tpm: Add missing error check for devm_kzalloc
    - LP: #1441284
  * tpm_tis: verify interrupt during init
    - LP: #1441284
  * tpm: Fix NULL return in tpm_ibmvtpm_get_desired_dma
    - LP: #1441284
  * tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send
    - LP: #1441284
  * tpm/tpm_i2c_stm_st33: Add status check when reading data on the FIFO
    - LP: #1441284
  * mmc: sdhci-pxav3: fix unbalanced clock issues during probe
    - LP: #1441284
  * iwlwifi: mvm: validate tid and sta_id in ba_notif
    - LP: #1441284
  * power: bq24190: Fix ignored supplicants
    - LP: #1441284
  * ARM: DRA7: hwmod: Fix boot crash with DEBUG_LL enabled on UART3
    - LP: #1441284
  * Bluetooth: ath3k: Add support of AR3012 bluetooth 13d3:3423 device
    - LP: #1411193, #1441284
  * cfq-iosched: fix incorrect filing of rt async cfqq
    - LP: #1441284
  * smack: fix possible use after frees in task_security() callers
    - LP: #1441284
  * xfs: ensure buffer types are set correctly
    - LP: #1441284
  * xfs: inode unlink does not set AGI buffer type
    - LP: #1441284
  * xfs: set buf types when converting extent formats
    - LP: #1441284
  * xfs: set superblock buffer type correctly
    - LP: #1441284
  * btrfs: set proper message level for skinny metadata
    - LP: #1441284
  * KVM: s390: base hrtimer on a monotonic clock
    - LP: #1441284
  * PCI: Fix infinite loop with ROM image of size 0
    - LP: #1441284
  * USB: cp210x: add ID for RUGGEDCOM USB Serial Console
    - LP: #1441284
  * clk: zynq: Force CPU_2X clock to be ungated
    - LP: #1441284
  * mmc: sdhci-pxav3: Remove checks for mandatory host clock
    - LP: #1441284
  * mmc: sdhci-pxav3: fix race between runtime pm and irq
    - LP: #1441284
  * power_supply: 88pm860x: Fix leaked power supply on probe fail
    - LP: #1441284
  * staging: comedi: comedi_compat32.c: fix COMEDI_CMD copy back
    - LP: #1441284
  * mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
    - LP: #1441284
  * ARM: 8284/1: sa1100: clear RCSR_SMR on resume
    - LP: #1441284
  * usb: musb: omap2plus bus glue needs USB host support
    - LP: #1441284
  * USB: add flag for HCDs that can't receive wakeup requests (isp1760-hcd)
    - LP: #1441284
  * USB: fix use-after-free bug in usb_hcd_unlink_urb()
    - LP: #1441284
  * iwlwifi: mvm: always use mac color zero
    - LP: #1441284
  * iwlwifi: pcie: disable the SCD_BASE_ADDR when we resume from WoWLAN
    - LP: #1441284
  * vt: provide notifications on selection changes
    - LP: #1441284
  * tty: Prevent untrappable signals from malicious program
    - LP: #1441284
  * cpufreq: Set cpufreq_cpu_data to NULL before putting kobject
    - LP: #1441284
  * lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in
    interrupt urb
    - LP: #1441284
  * mei: mask interrupt set bit on clean reset bit
    - LP: #1441284
  * mei: me: release hw from reset only during the reset flow
    - LP: #1441284
  * MIPS: KVM: Deliver guest interrupts after local_irq_disable()
    - LP: #1441284
  * KVM: MIPS: Don't leak FPU/DSP to guest
    - LP: #1441284
  * ALSA: hda - Add the pin fixup for HP Envy TS bass speaker
    - LP: #1441284
  * ALSA: hda - Set up GPIO for Toshiba Satellite S50D
    - LP: #1441284
  * xen/manage: Fix USB interaction issues when resuming
    - LP: #1441284
  * drm/i915: Correct the IOSF Dev_FN field for IOSF transfers
    - LP: #1441284
  * cfq-iosched: handle failure of cfq group allocation
    - LP: #1441284
  * tracing: Fix unmapping loop in tracing_mark_write
    - LP: #1441284
  * fsnotify: fix handling of renames in audit
    - LP: #1441284
  * drm/radeon: workaround for CP HW bug on CIK
    - LP: #1441284
  * drm/radeon: only enable kv/kb dpm interrupts once v3
    - LP: #1441284
  * NFSv4.1: Fix a kfree() of uninitialised pointers in
    decode_cb_sequence_args
    - LP: #1441284
  * cpufreq: speedstep-smi: enable interrupts when waiting
    - LP: #1441284
  * mm/hugetlb: pmd_huge() returns true for non-present hugepage
    - LP: #1441284
  * mm: cleanup follow_page_mask()
    - LP: #1441284
  * mm/hugetlb: take page table lock in follow_huge_pmd()
    - LP: #1441284
  * mm/hugetlb: fix getting refcount 0 page in hugetlb_fault()
    - LP: #1441284
  * mm/hugetlb: add migration/hwpoisoned entry check in
    hugetlb_change_protection
    - LP: #1441284
  * mm/hugetlb: add migration entry check in __unmap_hugepage_range
    - LP: #1441284
  * mm: softdirty: unmapped addresses between VMAs are clean
    - LP: #1441284
  * proc/pagemap: walk page tables under pte lock
    - LP: #1441284
  * mm: when stealing freepages, also take pages created by splitting buddy
    page
    - LP: #1441284
  * mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
    - LP: #1441284
  * mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
    - LP: #1441284
  * iscsi-target: Drop problematic active_ts_list usage
    - LP: #1441284
  * target: Fix PR_APTPL_BUF_LEN buffer size limitation
    - LP: #1441284
  * mm/compaction: fix wrong order check in compact_finished()
    - LP: #1441284
  * mm/memory.c: actually remap enough memory
    - LP: #1441284
  * mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()
    - LP: #1441284
  * ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASE
    - LP: #1441284
  * drm/radeon/dp: Set EDP_CONFIGURATION_SET for bridge chips if necessary
    - LP: #1441284
  * drm/radeon: fix voltage setup on hawaii
    - LP: #1441284
  * ALSA: hdspm - Constrain periods to 2 on older cards
    - LP: #1441284
  * jffs2: fix handling of corrupted summary length
    - LP: #1441284
  * dm mirror: do not degrade the mirror on discard error
    - LP: #1441284
  * dm io: reject unsupported DISCARD requests with EOPNOTSUPP
    - LP: #1441284
  * target: Add missing WRITE_SAME end-of-device sanity check
    - LP: #1441284
  * target: Check for LBA + sectors wrap-around in sbc_parse_cdb
    - LP: #1441284
  * Btrfs: fix fsync data loss after adding hard link to inode
    - LP: #1441284
  * Added Little Endian support to vtpm module
    - LP: #1441284
  * sg: fix read() error reporting
    - LP: #1441284
  * IB/qib: Do not write EEPROM
    - LP: #1441284
  * md/raid5: Fix livelock when array is both resyncing and degraded.
    - LP: #1441284
  * dm: fix a race condition in dm_get_md
    - LP: #1441284
  * dm snapshot: fix a possible invalid memory access on unload
    - LP: #1441284
  * cpufreq: s3c: remove incorrect __init annotations
    - LP: #1441284
  * libceph: assert both regular and lingering lists in __remove_osd()
    - LP: #1441284
  * libceph: change from BUG to WARN for __remove_osd() asserts
    - LP: #1441284
  * libceph: fix double __remove_osd() problem
    - LP: #1441284
  * MIPS: Export FP functions used by lose_fpu(1) for KVM
    - LP: #1441284
  * kdb: fix incorrect counts in KDB summary command output
    - LP: #1441284
  * blk-throttle: check stats_cpu before reading it from sysfs
    - LP: #1441284
  * procfs: fix race between symlink removals and traversals
    - LP: #1441284
  * autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for
    allocation
    - LP: #1441284
  * pktgen: fix UDP checksum computation
    - LP: #1441284
  * ipv6: fix ipv6_cow_metrics for non DST_HOST case
    - LP: #1441284
  * clk-gate: fix bit # check in clk_register_gate()
    - LP: #1441284
  * ALSA: off by one bug in snd_riptide_joystick_probe()
    - LP: #1441284
  * ath5k: fix spontaneus AR5312 freezes
    - LP: #1441284
  * pinctrl: pinctrl-imx: don't use invalid value of conf_reg
    - LP: #1441284
  * ALSA: hda - Add one more node in the EAPD supporting candidate list
    - LP: #1436745, #1441284
  * ALSA: hda - Add pin configs for ASUS mobo with IDT 92HD73XX codec
    - LP: #1441284
  * drm/i915/bdw: PCI IDs ending in 0xb are ULT.
    - LP: #1441284
  * xfs: Fix quota type in quota structures when reusing quota file
    - LP: #1441284
  * gpiolib: of: allow of_gpiochip_find_and_xlate to find more than one
    chip per node
    - LP: #1441284
  * gpio: tps65912: fix wrong container_of arguments
    - LP: #1441284
  * ALSA: pcm: Don't leave PREPARED state after draining
    - LP: #1441284
  * metag: Fix KSTK_EIP() and KSTK_ESP() macros
    - LP: #1441284
  * md/raid1: fix read balance when a drive is write-mostly.
    - LP: #1441284
  * drm/radeon: use drm_mode_vrefresh() rather than mode->vrefresh
    - LP: #1441284
  * drm/radeon: fix 1 RB harvest config setup for TN/RL
    - LP: #1441284
  * arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big
    endian
    - LP: #1441284
  * nilfs2: fix potential memory overrun on inode
    - LP: #1441284
  * HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events
    - LP: #1441284
  * Linux 3.13.11-ckt18
    - LP: #1441284
  * ipv6: Don't reduce hop limit for an interface
    - LP: #1441103
    - CVE-2015-2922
  * x86/microcode/intel: Guard against stack overflow in the loader
    - LP: #1438504
    - CVE-2015-2666
 -- Luis Henriques <luis.henriques@xxxxxxxxxxxxx>   Tue, 14 Apr 2015 21:38:57 +0100

** Changed in: linux (Ubuntu Trusty)
       Status: Fix Committed => Fix Released
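
A quick way to tell whether a running Trusty kernel is at or past the
fixed package (3.13.0-51.84) is to compare the ABI number in the output
of `uname -r`. The sketch below is illustrative only: the function name
has_fix is hypothetical, it only handles the 3.13.0 series, and it
assumes that later ABI numbers in that series carry the same patch.

```python
import re

FIXED_BASE = (3, 13, 0)   # upstream base version of the Trusty kernel
FIXED_ABI = 51            # ABI number of the fixed package 3.13.0-51.84


def has_fix(release):
    """Return True if a `uname -r` string is at or past ABI 51 of 3.13.0.

    Assumption: `release` looks like "3.13.0-40-generic". The upload
    number (".84") is not part of `uname -r`, so it is not compared.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch, abi = map(int, m.groups())
    if (major, minor, patch) != FIXED_BASE:
        raise ValueError("this sketch only covers the 3.13.0 series")
    return abi >= FIXED_ABI


if __name__ == "__main__":
    # The release string reported in the original bug description.
    print(has_fix("3.13.0-40-generic"))
```

On a live system the argument could come from platform.release().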

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2666

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2922

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  Trusty soft lockup issues with nested KVM

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Released

Bug description:
  [Impact]
  Upstream discussion: https://lkml.org/lkml/2015/2/11/247

  Certain workloads that need to execute functions on a non-local CPU
  using smp_call_function_* can result in soft lockups with the
  following backtrace:

  PID: 22262  TASK: ffff8804274bb000  CPU: 1   COMMAND: "qemu-system-x86"
   #0 [ffff88043fd03d18] machine_kexec at ffffffff8104ac02
   #1 [ffff88043fd03d68] crash_kexec at ffffffff810e7203
   #2 [ffff88043fd03e30] panic at ffffffff81719ff4
   #3 [ffff88043fd03ea8] watchdog_timer_fn at ffffffff8110d7c5
   #4 [ffff88043fd03ed8] __run_hrtimer at ffffffff8108e787
   #5 [ffff88043fd03f18] hrtimer_interrupt at ffffffff8108ef4f
   #6 [ffff88043fd03f80] local_apic_timer_interrupt at ffffffff81043537
   #7 [ffff88043fd03f98] smp_apic_timer_interrupt at ffffffff81733d4f
   #8 [ffff88043fd03fb0] apic_timer_interrupt at ffffffff817326dd
  --- <IRQ stack> ---
   #9 [ffff880426f0d958] apic_timer_interrupt at ffffffff817326dd
      [exception RIP: generic_exec_single+130]
      RIP: ffffffff810dbe62  RSP: ffff880426f0da00  RFLAGS: 00000202
      RAX: 0000000000000002  RBX: ffff880426f0d9d0  RCX: 0000000000000001
      RDX: ffffffff8180ad60  RSI: 0000000000000000  RDI: 0000000000000286
      RBP: ffff880426f0da30   R8: ffffffff8180ad48   R9: ffff88042713bc68
      R10: 00007fe7d1f2dbd0  R11: 0000000000000206  R12: ffff8804274bb000
      R13: 0000000000000000  R14: ffff880407670280  R15: 0000000000000000
      ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
  #10 [ffff880426f0da38] smp_call_function_single at ffffffff810dbf75
  #11 [ffff880426f0dab0] smp_call_function_many at ffffffff810dc3a6
  #12 [ffff880426f0db10] native_flush_tlb_others at ffffffff8105c8f7
  #13 [ffff880426f0db38] flush_tlb_mm_range at ffffffff8105c9cb
  #14 [ffff880426f0db68] pmdp_splitting_flush at ffffffff8105b80d
  #15 [ffff880426f0db88] __split_huge_page at ffffffff811ac90b
  #16 [ffff880426f0dc20] split_huge_page_to_list at ffffffff811acfb8
  #17 [ffff880426f0dc48] __split_huge_page_pmd at ffffffff811ad956
  #18 [ffff880426f0dcc8] unmap_page_range at ffffffff8117728d
  #19 [ffff880426f0dda0] unmap_single_vma at ffffffff81177341
  #20 [ffff880426f0ddd8] zap_page_range at ffffffff811784cd
  #21 [ffff880426f0de90] sys_madvise at ffffffff81174fbf
  #22 [ffff880426f0df80] system_call_fastpath at ffffffff8173196d
      RIP: 00007fe7ca2cc647  RSP: 00007fe7be9febf0  RFLAGS: 00000293
      RAX: 000000000000001c  RBX: ffffffff8173196d  RCX: ffffffffffffffff
      RDX: 0000000000000004  RSI: 00000000007fb000  RDI: 00007fe7be1ff000
      RBP: 0000000000000000   R8: 0000000000000000   R9: 00007fe7d1cd2738
      R10: 00007fe7d1f2dbd0  R11: 0000000000000206  R12: 00007fe7be9ff700
      R13: 00007fe7be9ff9c0  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: 000000000000001c  CS: 0033  SS: 002b

  [Fix]

  Commit 9242b5b60df8b13b469bc6b7be08ff6ebb551ad3 mitigates this issue
  if b6b8a1451fc40412c57d1 is applied (as is the case for the affected
  3.13 distro kernel). However, the issue can still occur in some cases.

  
  [Workaround]

  To avoid this issue, the workload needs to be pinned to CPUs such that
  the function always executes locally. For the nested VM case, this
  means that each vCPU of the L1 VM needs to be pinned to its own unique
  CPU. This can be accomplished with the following (for 2 vCPUs):

  virsh vcpupin <domain> 0 0
  virsh vcpupin <domain> 1 1
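
  For an L1 VM with more vCPUs, the same one-to-one pinning can be
  generated rather than typed by hand. This is a minimal sketch: the
  helper name vcpupin_commands and the domain name "l1-guest" are
  placeholders, and it assumes the host has at least as many CPUs as
  the guest has vCPUs. It only builds and prints the commands; it does
  not run virsh.

```python
import shlex


def vcpupin_commands(domain, vcpu_count):
    """Return one `virsh vcpupin` argv list per vCPU.

    vCPU n is pinned to host CPU n, matching the 2-vCPU example above.
    """
    if vcpu_count < 1:
        raise ValueError("vcpu_count must be at least 1")
    return [
        ["virsh", "vcpupin", domain, str(n), str(n)]
        for n in range(vcpu_count)
    ]


if __name__ == "__main__":
    # "l1-guest" is a placeholder domain name; nothing is executed here.
    for argv in vcpupin_commands("l1-guest", 4):
        print(shlex.join(argv))
```

  Each argv list could be passed to subprocess.run() on the host once
  the output has been reviewed.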

  [Test Case]
  - Deploy openstack on openstack
  - Run tempest on L1 cloud
  - Check kernel log of L1 nova-compute nodes

  (Although this may not necessarily be related to nested KVM)
  Potentially related: https://lkml.org/lkml/2014/11/14/656

  Another test case is to do the following (on affected hardware):

  1) Create an L1 KVM VM with 2 vCPUs (single vCPU case doesn't reproduce)
  2) Create an L2 KVM VM inside the L1 VM with 1 vCPU
  3) Run something like 'stress -c 1 -m 1 -d 1 -t 1200' inside the L2 VM

  Sometimes this is sufficient to reproduce the issue. I've observed that
  running KSM in the L1 VM can aggravate this issue (it calls
  native_flush_tlb_others). If this doesn't reproduce, you can do the
  following:
  4) Migrate the L2 vCPU randomly (via virsh vcpupin --live OR taskset)
  between L1 vCPUs until the hang occurs.
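
  Step 4 can be scripted. The following is a rough sketch only: the
  helper name migration_plan, the domain name "l2-guest" and the step
  count are made up for illustration, and the delay between moves is
  left to the tester since none is specified here. It builds the
  `virsh vcpupin --live` commands to run inside the L1 VM so that vCPU 0
  of the L2 VM lands on a different L1 vCPU at every step; it does not
  execute them.

```python
import random
import shlex


def migration_plan(domain, l1_cpus, steps, seed=None):
    """Build argv lists that re-pin vCPU 0 of `domain` at every step.

    Each step picks a random L1 CPU different from the previous one,
    so every command causes an actual migration.
    """
    if l1_cpus < 2:
        raise ValueError("need at least 2 L1 vCPUs to migrate between")
    rng = random.Random(seed)
    plan = []
    current = None
    for _ in range(steps):
        choices = [c for c in range(l1_cpus) if c != current]
        current = rng.choice(choices)
        plan.append(
            ["virsh", "vcpupin", domain, "0", str(current), "--live"]
        )
    return plan


if __name__ == "__main__":
    # "l2-guest" is a placeholder; commands are printed, not executed.
    for argv in migration_plan("l2-guest", l1_cpus=2, steps=5, seed=0):
        print(shlex.join(argv))
```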

  --

  Original Description:

  When installing qemu-kvm on a VM, KSM is enabled.

  I have encountered this problem in trusty:
  $ lsb_release -a
  Distributor ID: Ubuntu
  Description:    Ubuntu 14.04.1 LTS
  Release:        14.04
  Codename:       trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1
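
  The same check can be done programmatically, for example before and
  after the package install. This is a small sketch; the helper name
  ksm_state is hypothetical. The meaning of the values (0 = ksmd
  stopped, 1 = ksmd running, 2 = ksmd stopped and all merged pages
  unmerged) follows the kernel's KSM documentation.

```python
from pathlib import Path

KSM_RUN = Path("/sys/kernel/mm/ksm/run")

# Values accepted by the kernel for the KSM "run" control file.
KSM_STATES = {
    0: "stopped",
    1: "running",
    2: "stopped, pages unmerged",
}


def ksm_state(path=KSM_RUN):
    """Read the KSM run file and return (value, description)."""
    value = int(Path(path).read_text().strip())
    return value, KSM_STATES.get(value, "unknown")


if __name__ == "__main__":
    # Only attempt the read on systems that expose the KSM sysfs file.
    if KSM_RUN.exists():
        print(ksm_state())
```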

  To see the soft lockups, deploy a cloud on a virtualised env like
  ctsstack and run tempest on it at least 2 times. The compute nodes of
  the virtualised deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
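
  When checking the kernel log of many compute nodes, messages like the
  ones above can be picked out automatically. This is an illustrative
  sketch: the function name parse_soft_lockups is hypothetical, and the
  pattern is written only against the message format shown in this
  report.

```python
import re

# Matches lines such as:
# [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft lockup message found in log_text."""
    events = []
    for m in LOCKUP_RE.finditer(log_text):
        events.append({
            "timestamp": float(m.group("ts")),
            "cpu": int(m.group("cpu")),
            "stuck_seconds": int(m.group("secs")),
            "command": m.group("comm"),
            "pid": int(m.group("pid")),
        })
    return events


if __name__ == "__main__":
    sample = (
        "[24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! "
        "[qemu-system-x86:24791]\n"
    )
    for event in parse_soft_lockups(sample):
        print(event)
```

  The input could be the output of dmesg or the contents of
  /var/log/kern.log from each L1 nova-compute node.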

  I am not sure whether the problem is that we are enabling KSM on a VM
  or that nested KSM is not behaving properly. Either way, I can easily
  reproduce it; please contact me if you need further details.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+subscriptions