
kernel-packages team mailing list archive

[Bug 1450584] Re: mono occasionally crashes since kernel 3.13.0-48 on multi-cpu vm

 

I had a couple of users test the proposed 3.13.0-54 and 3.16.0-39 kernels.
After more than 24 hours they haven't experienced a crash, whereas before it would typically crash within minutes to an hour.

Please note that we have now also confirmed our earlier suspicion that
the issue happens not only on VMs but also on physical systems.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1450584

Title:
  mono occasionally crashes since kernel 3.13.0-48 on multi-cpu vm

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Trusty:
  Fix Committed
Status in linux source package in Utopic:
  Fix Committed
Status in linux source package in Vivid:
  Fix Committed

Bug description:
  [Impact]
  The addition of the commit:
  http://kernel.ubuntu.com/git/ubuntu/ubuntu-trusty.git/commit/?id=11f4e0339c8dc8d760483258efd9f15b4c6dcda2

  causes SIGSEGVs when running certain workloads on multi-cpu VMs.

  [Test Case]

  A Mono test case that triggers the SIGSEGV is available here:
  https://bugzilla.xamarin.com/show_bug.cgi?id=29212

  [Fix]

  These two commits are required for fixing this issue:
  https://github.com/torvalds/linux/commit/80f7fdb1c7f0f9266421f823964fd1962681f6ce
  https://github.com/torvalds/linux/commit/0a4e6be9ca17c54817cf814b4b5aa60478c6df27

  --

  Since late March, a gradually increasing number of users have
  complained about frequent SIGSEGV crashes in our .NET/mono
  application. In early April I started to investigate actively.

  After ruling out native libraries as a cause and testing various mono
  versions, I discovered that the crashes occur more frequently on a
  vbox VM with multiple CPUs configured, and that the mono
  bug-18026.cs testcase crashes fairly consistently. At that point
  it was reported to the mono bug tracker.

  I finally got a break when we found a correlation with the kernel version: 3.13.0-46 didn't crash, while 3.13.0-48 and -49 did.
  As more and more users upgrade to these newer kernel versions, they start running into the issue, which explains the gradual increase in reports.

  Early this week I performed a full git bisect on the kernel between 3.13.0-46 and -48 and isolated the commit that seems to trigger the crashes:
  http://kernel.ubuntu.com/git/ubuntu/ubuntu-trusty.git/commit/?id=11f4e0339c8dc8d760483258efd9f15b4c6dcda2
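
For readers unfamiliar with bisecting: git bisect is a binary search over the commit history between a known-good and a known-bad revision. The Python sketch below only illustrates that search; it is not the procedure actually used for this bug. The commit list and the `is_bad` callback are hypothetical stand-ins for "build this kernel, boot it, and run the testcase", and the sketch assumes that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an ordered history.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # is_bad() is the expensive step: in a kernel bisect it stands
        # for a full build, a reboot, and a run of the crashing testcase.
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each iteration halves the remaining range, so the number of builds grows with the logarithm of the number of commits between the two kernel versions rather than with the count itself.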

  At this point I don't know whether the commit broke something or whether mono simply handles the change incorrectly. However, a few commits from linux 4.x seem to fix it:
  https://github.com/torvalds/linux/commit/80f7fdb1c7f0f9266421f823964fd1962681f6ce
  https://github.com/torvalds/linux/commit/0a4e6be9ca17c54817cf814b4b5aa60478c6df27
  I applied these commits myself on top of commit 11f4e033, compiled, and ran the testcase: it didn't crash in the 200 test runs I did.
  I don't know, however, whether those two patches have unknown side effects.
  I'm not a kernel expert, not even remotely, but I thought it would be nice to be able to point at a possible solution.
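
Because the crash is intermittent, a single clean run proves little, which is why the testcase was repeated many times. A minimal Python sketch of such a repeat-and-count harness follows; the function name and the command are assumptions for illustration, not the script used for the 200 runs. It counts any run that ends by a signal, since a runtime may report a fault through a signal other than SIGSEGV (for example by aborting).

```python
import subprocess
from typing import Sequence


def count_crashes(cmd: Sequence[str], runs: int) -> int:
    """Run cmd `runs` times and count runs terminated by a signal."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, subprocess reports a child killed by signal N
        # as returncode == -N, so any negative value is a crash.
        if result.returncode < 0:
            crashes += 1
    return crashes
```

A call such as `count_crashes(["mono", "bug-18026.exe"], runs=200)` would then give the crash count for one kernel, where `bug-18026.exe` is a hypothetical name for the compiled testcase.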

  My current test VM is a 64-bit VirtualBox VM installed from the 14.04.2 server ISO, running on an older quad-core i7 host with 64-bit Windows 7.
  In the VM I've tested numerous mono and kernel combinations. The last tests were with kernels 3.16.0-36 and 3.13.0-51 and mono 4.0.1, where the problem still occurs.

  By now I've debugged the app using gdb several dozen times on various
  user setups, compiled mono half a dozen times, and then did the kernel
  bisect with 8 compiles of 3 hours each :) Talk about going down the
  rabbit hole...

  So I'm pretty desperate for some expert to help me out here. :D

  Reference to mono bug report:
  https://bugzilla.xamarin.com/show_bug.cgi?id=29212

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-3.13.0-51-generic 3.13.0-51.84
  ProcVersionSignature: Ubuntu 3.13.0-51.84-generic 3.13.11-ckt18
  Uname: Linux 3.13.0-51-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Apr 30 18:53 seq
   crw-rw---- 1 root audio 116, 33 Apr 30 18:53 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3.10
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  CurrentDmesg: [    9.379188] init: plymouth-upstart-bridge main process ended, respawning
  Date: Thu Apr 30 19:45:43 2015
  HibernationDevice: RESUME=UUID=b35ef328-166d-4476-a418-e7e80d22cb30
  InstallationDate: Installed on 2015-04-22 (7 days ago)
  InstallationMedia: Ubuntu-Server 14.04.2 LTS "Trusty Tahr" - Release amd64 (20150218.1)
  IwConfig:
   eth0      no wireless extensions.

   lo        no wireless extensions.
  Lsusb:
   Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
   Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
  MachineType: innotek GmbH VirtualBox
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-51-generic root=UUID=68da7e09-1a91-4107-859d-bf452f9ed992 ro
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-51-generic N/A
   linux-backports-modules-3.13.0-51-generic  N/A
   linux-firmware                             1.127.11
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 12/01/2006
  dmi.bios.vendor: innotek GmbH
  dmi.bios.version: VirtualBox
  dmi.board.name: VirtualBox
  dmi.board.vendor: Oracle Corporation
  dmi.board.version: 1.2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Oracle Corporation
  dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:
  dmi.product.name: VirtualBox
  dmi.product.version: 1.2
  dmi.sys.vendor: innotek GmbH

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1450584/+subscriptions

