kernel-packages team mailing list archive

[Bug 1450584] [NEW] mono occasionally crashes since kernel 3.13.0-46 on multi-cpu vm

 

Public bug reported:

Since late March, more and more users have complained about frequent
SIGSEGV crashes in our .NET/mono application. In early April I started
to investigate it actively.

After ruling out the native libraries as a cause and testing various
mono versions, I discovered that the crashes occur more frequently on a
VirtualBox vm configured with multiple cpus. I also found that the mono
bug-18026.cs testcase crashes fairly consistently there. At that point
it was reported to the mono bug tracker.

I finally got a break when we found a correlation with the kernel version: 3.13.0-46 didn't crash, while 3.13.0-48 and 3.13.0-49 did.
As more users upgrade to these newer kernel versions, they start running into the issue, which explains the gradual increase in reports.

Early this week I performed a full git bisect on the kernel between 3.13.0-46 and 3.13.0-48 and isolated the commit that seems to trigger the crashes:
http://kernel.ubuntu.com/git/ubuntu/ubuntu-trusty.git/commit/?id=11f4e0339c8dc8d760483258efd9f15b4c6dcda2
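
For readers unfamiliar with bisecting, here is a minimal sketch of the search idea in Python. It is not what git does internally. The linear commit list, the function name first_bad and the is_bad predicate are all hypothetical, and it assumes the last commit in the list is known to be bad while the commit just before the list is known to be good.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit in a linear history.

    Assumes commits[-1] is known bad and that every commit after the
    first bad one is also bad. Returns (commit, number_of_tests).
    """
    lo, hi = 0, len(commits) - 1
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1  # each step stands for one build-and-test cycle
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo], steps
```

Each test halves the remaining range, so the number of builds grows only with the logarithm of the number of commits, which is what makes a bisect with multi-hour kernel builds feasible at all.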

At this point I don't know whether the commit broke something or whether mono simply handles the change incorrectly. However, two commits from linux 4.x seem to fix it:
https://github.com/torvalds/linux/commit/80f7fdb1c7f0f9266421f823964fd1962681f6ce
https://github.com/torvalds/linux/commit/0a4e6be9ca17c54817cf814b4b5aa60478c6df27
I applied these commits myself on top of commit 11f4e033, compiled, and ran the testcase: it didn't crash in the 200 test runs I did.
I don't know whether those two patches have side effects, though.
I'm not a kernel expert, not even remotely, but I thought it would be useful to point at a possible solution.
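
A repeated-run check like the 200 runs above could be scripted roughly as follows. This is a sketch under assumptions, not the harness actually used for this report: the function name count_crashes and the default timeout are hypothetical. It relies on Python's subprocess reporting a negative return code on POSIX when the child is killed by a signal. Any signal death is counted, not only SIGSEGV, in case the runtime's own crash handling ends the process with a different signal.

```python
import subprocess


def count_crashes(cmd, runs=200, timeout=300):
    """Run cmd `runs` times and return (signal_deaths, other_failures).

    signal_deaths counts runs killed by a signal (negative return code
    on POSIX); other_failures counts nonzero exits and timeouts.
    """
    signal_deaths = 0
    other_failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            other_failures += 1  # a hang is treated as a failure too
            continue
        if result.returncode < 0:
            signal_deaths += 1
        elif result.returncode != 0:
            other_failures += 1
    return signal_deaths, other_failures
```

It would be called with the command line of the compiled testcase as a list, for example ["mono", "bug-18026.exe"], where the executable name is an assumption about how the testcase was built.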

My current test vm is a 64-bit VirtualBox vm installed from the 14.04.2 server iso, running on an older quad-core i7 host with 64-bit Windows 7.
In the vm I've tested numerous mono and kernel combinations. The latest tests used kernels 3.16.0-36 and 3.13.0-51 with mono 4.0.1, and the problem still occurs.

By now I've debugged the app with gdb several dozen times on various
user setups, compiled mono half a dozen times, and then done the kernel
bisect with its 8x3h compiles :) Talk about going down the rabbit hole...

So I'm pretty desperate for an expert to help me out here. :D


Reference to mono bug report: https://bugzilla.xamarin.com/show_bug.cgi?id=29212

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-51-generic 3.13.0-51.84
ProcVersionSignature: Ubuntu 3.13.0-51.84-generic 3.13.11-ckt18
Uname: Linux 3.13.0-51-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 Apr 30 18:53 seq
 crw-rw---- 1 root audio 116, 33 Apr 30 18:53 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.10
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
CurrentDmesg: [    9.379188] init: plymouth-upstart-bridge main process ended, respawning
Date: Thu Apr 30 19:45:43 2015
HibernationDevice: RESUME=UUID=b35ef328-166d-4476-a418-e7e80d22cb30
InstallationDate: Installed on 2015-04-22 (7 days ago)
InstallationMedia: Ubuntu-Server 14.04.2 LTS "Trusty Tahr" - Release amd64 (20150218.1)
IwConfig:
 eth0      no wireless extensions.
 
 lo        no wireless extensions.
Lsusb:
 Bus 001 Device 002: ID 80ee:0021 VirtualBox USB Tablet
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: innotek GmbH VirtualBox
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-51-generic root=UUID=68da7e09-1a91-4107-859d-bf452f9ed992 ro
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-51-generic N/A
 linux-backports-modules-3.13.0-51-generic  N/A
 linux-firmware                             1.127.11
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.board.name: VirtualBox
dmi.board.vendor: Oracle Corporation
dmi.board.version: 1.2
dmi.chassis.type: 1
dmi.chassis.vendor: Oracle Corporation
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:rvnOracleCorporation:rnVirtualBox:rvr1.2:cvnOracleCorporation:ct1:cvr:
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug trusty

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1450584

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1450584/+subscriptions

