kernel-packages team mailing list archive

[Bug 1508767] Re: IBM POWER8 unhandled signal 11 / SEGV

 

1) Can you test the lts-wily kernel from our CKT PPA?
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+packages

2) How many times a day does this actually occur? Does it only occur
on nova nodes? Are the machines heavily loaded in terms of memory when
it occurs?

3) Another potential test would be to disable KSM (kernel samepage merging) to see whether it is the culprit. As root:
	echo 0 > /sys/kernel/mm/ksm/run

4) Can you get the machine to generate userspace core dumps when programs segfault? In the shell that launches the program:
	ulimit -c unlimited

I can also build a kernel that calls BUG() in _exception when the code
is 30001, which may give us more insight, but the information above
might already help.
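
For items 3 and 4 it may be useful to record the current state before
changing anything. The following is a minimal, read-only sketch rather
than a tested procedure: the function names ksm_status and
core_dump_target are made up for illustration, and the default paths
assume the standard sysfs and procfs layout. Two things to keep in mind
when reading the output: writing 0 to ksm/run stops ksmd but leaves
already merged pages merged, whereas writing 2 also unmerges them; and
"ulimit -c" only affects the current shell and its children, so
services started by init need the limit raised in their own
configuration. A core_pattern that starts with "|" means cores are
piped to a helper program (typically apport on Ubuntu) instead of being
written as files.

```shell
#!/bin/sh
# Read-only check of KSM state and of where core dumps would go.
# Both functions accept an optional path so they can be pointed at fixtures.

# Print the KSM values that show whether ksmd runs and how much is merged.
ksm_status() {
    dir="${1:-/sys/kernel/mm/ksm}"
    if [ ! -d "$dir" ]; then
        echo "ksm: not available"
        return 0
    fi
    for name in run pages_shared pages_sharing; do
        if [ -r "$dir/$name" ]; then
            echo "ksm.$name=$(cat "$dir/$name")"
        fi
    done
}

# Report whether the kernel writes core files itself or pipes them to a helper.
core_dump_target() {
    file="${1:-/proc/sys/kernel/core_pattern}"
    if [ ! -r "$file" ]; then
        echo "core: pattern unreadable"
        return 0
    fi
    pattern=$(cat "$file")
    case "$pattern" in
        "|"*) echo "core: piped to ${pattern#?}" ;;   # strip the leading "|"
        *)    echo "core: written as $pattern" ;;
    esac
}

ksm_status
core_dump_target
echo "core: soft size limit in this shell is $(ulimit -c)"
```

Running it before and after the tests gives a record of whether KSM was
active and whether a core file could have been produced at all.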

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Chris J Arges (arges)

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508767

Title:
  IBM POWER8 unhandled signal 11 / SEGV

Status in Ubuntu Cloud Archive:
  New
Status in apparmor package in Ubuntu:
  Invalid
Status in linux package in Ubuntu:
  Confirmed
Status in linux-meta-lts-vivid package in Ubuntu:
  New

Bug description:
  Hi,

  We have a few IBM POWER8 servers which we're currently using as
  OpenStack nova compute nodes. It seems we're regularly running into
  issues where processes are segfaulting:

  | hloeung@gligar:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/syslog.5.gz
  | Oct 16 23:31:38 gligar kernel: [88351.465559] neutron-openvsw[29733]: unhandled signal 11 at 88f9010000000000 nip 00000000100ba0d8 lr 00000000101ad860 code 30001
  | Oct 16 23:31:38 gligar kernel: [88351.566909] init: neutron-plugin-openvswitch-agent main process (29733) killed by SEGV signal
  | Oct 16 23:31:38 gligar kernel: [88351.746611] apport[29500]: unhandled signal 11 at 8850e467250040a8 nip 0000000010201f80 lr 0000000010202984 code 30001
  | Oct 16 23:31:39 gligar kernel: [88352.245829] neutron-rootwra[29749]: unhandled signal 11 at 0809c4b610000000 nip 000000001014ae4c lr 000000001014b544 code 30001
  | Oct 16 23:31:50 gligar kernel: [88364.040340] neutron-rootwra[30060]: unhandled signal 11 at 08a305c12b000000 nip 00000000100b74d0 lr 00000000100b73e4 code 30001
  | Oct 16 23:31:51 gligar kernel: [88364.174218] neutron-rootwra[30065]: unhandled signal 11 at 088eb28e2f004078 nip 00000000100b5974 lr 00000000100aa794 code 30001
  | Oct 16 23:31:52 gligar kernel: [88365.195380] neutron-rootwra[30098]: unhandled signal 11 at 88c939e322000008 nip 00000000100c8b28 lr 0000000010060384 code 30001
  | Oct 16 23:31:52 gligar kernel: [88365.362374] neutron-rootwra[30106]: unhandled signal 11 at 882c58ad2800f04f nip 00003fffaef81220 lr 00003fffaef811a0 code 30001
  | Oct 16 23:32:27 gligar kernel: [88400.966976] neutron-rootwra[30341]: unhandled signal 11 at 88d1fbe922001008 nip 00000000100c8b28 lr 0000000010060384 code 30001
  | Oct 16 23:32:47 gligar kernel: [88420.953053] neutron-rootwra[30412]: unhandled signal 11 at 11b6629054008000 nip 00003fff9a864ac4 lr 00003fff9a84c42c code 30001
  | Oct 16 23:34:49 gligar kernel: [88542.778503] neutron-rootwra[30977]: unhandled signal 11 at 88540f00000010a8 nip 00000000100aa768 lr 00000000100b74e8 code 30001
  | Oct 16 23:35:23 gligar kernel: [88576.700721] neutron-openvsw[29739]: unhandled signal 11 at 08bfcbf7210000a8 nip 00000000100ab390 lr 00000000100b7c38 code 30001
  | Oct 16 23:35:23 gligar kernel: [88576.804961] init: neutron-plugin-openvswitch-agent main process (29739) killed by SEGV signal
  | Oct 16 23:36:01 gligar kernel: [88614.995497] nova-compute[31662]: unhandled signal 11 at 8846c1c81f004008 nip 000000001014c2f0 lr 0000000010151080 code 30001
  | Oct 16 23:36:02 gligar kernel: [88615.110735] nova-compute[4331]: unhandled signal 11 at 88befae9220010a8 nip 00000000100b5c8c lr 000000001014c734 code 30001
  | Oct 16 23:36:02 gligar kernel: [88615.219436] init: nova-compute main process (4331) killed by SEGV signal
  | Oct 17 03:59:56 gligar kernel: [104449.890256] landscape-packa[63283]: unhandled signal 11 at 02f0000000000008 nip 00000000101abeac lr 00000000100a8738 code 30001
  | Oct 17 04:05:00 gligar kernel: [104753.718195] sudo[63915]: unhandled signal 11 at 08e06105d1dcfff8 nip 00003fffb15cf7e4 lr 00003fffb15cfa00 code 30001

  
  | hloeung@floette:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/syslog.7.gz
  | Oct 14 16:55:30 floette kernel: [149326.697938] rsync[9915]: unhandled signal 11 at 00003ffff7cb0000 nip 00003fffa242d054 lr 00003fffa2426560 code 30001
  | Oct 14 21:05:57 floette kernel: [164353.333697] apparmor_parser[102284]: unhandled signal 11 at 08680f0000000000 nip 000000001004bbf8 lr 0000000010028de4 code 30001
  | Oct 14 22:21:24 floette kernel: [168880.481778] neutron-rootwra[153488]: unhandled signal 11 at 8860fbe21f0000a8 nip 00000000100aa768 lr 00000000100b74e8 code 30001
  | Oct 14 22:21:26 floette kernel: [168882.078608] neutron-openvsw[4546]: unhandled signal 11 at 8822cbf03d000008 nip 00000000100aa764 lr 00000000100e6900 code 30001
  | Oct 14 22:21:37 floette kernel: [168893.597834] init: neutron-plugin-openvswitch-agent main process (4546) killed by SEGV signal
  | Oct 14 22:21:39 floette kernel: [168894.949777] nova-rootwrap[153708]: unhandled signal 11 at 88d495c93c0000a8 nip 00000000100a57d4 lr 00000000100ab42c code 30001
  | Oct 14 22:21:43 floette kernel: [168898.973700] neutron-rootwra[153847]: unhandled signal 11 at 08c90df318000020 nip 00000000101ac260 lr 00000000101ad92c code 30001
  | Oct 14 22:21:44 floette kernel: [168900.785421] neutron-rootwra[153850]: unhandled signal 11 at 88d87b783f0000a8 nip 00000000101abf40 lr 00000000100d9cac code 30001
  | Oct 14 22:21:46 floette kernel: [168902.724121] neutron-openvsw[153852]: unhandled signal 11 at 882b78783f0000a8 nip 00000000100b5c8c lr 000000001014c734 code 30001

  
  | hloeung@patrat:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/syslog.7.gz
  | Oct 15 00:48:13 patrat kernel: [553143.677075] rsync[89656]: unhandled signal 11 at 00003fffe6a50000 nip 00003fff77e0d054 lr 00003fff77e06560 code 30001

  
  | Oct 16 02:42:03 wailmer kernel: [862104.157449] nova-compute[11431]: unhandled signal 11 at 081169bc370000a8 nip 00000000100ac164 lr 00000000100b7d6c code 30001
  | Oct 16 02:42:03 wailmer kernel: [862104.264242] init: nova-compute main process (11431) killed by SEGV signal
  | Oct 16 06:38:22 wailmer kernel: [876282.603855] qemu-img[78662]: unhandled signal 11 at 11b625104e000000 nip 00003fffb6224bb4 lr 00003fffb620c42c code 30001
  | Oct 16 06:38:23 wailmer kernel: [876283.336045] qemu-system-ppc[78609]: unhandled signal 11 at ffffffc10000009a nip 00003fffae1a7124 lr 0000000010314874 code 30001
  | Oct 16 06:39:40 wailmer kernel: [876360.399550] neutron-rootwra[79380]: unhandled signal 11 at 0800c20428000000 nip 00000000100a6c14 lr 00000000100a6d4c code 30001
  | Oct 16 06:39:47 wailmer kernel: [876367.577184] neutron-rootwra[79676]: unhandled signal 11 at 0878a100000040a8 nip 00000000100aa768 lr 000000001004ed6c code 30001
  | Oct 16 06:39:49 wailmer kernel: [876369.478066] neutron-openvsw[12655]: unhandled signal 11 at 088e47f11f000008 nip 00000000100db46c lr 00000000100db424 code 30001
  | Oct 16 06:39:58 wailmer kernel: [876378.286827] init: neutron-plugin-openvswitch-agent main process (12655) killed by SEGV signal
  | Oct 16 06:39:59 wailmer kernel: [876379.211801] sudo[79703]: unhandled signal 11 at 886baddd38005000 nip 886baddd38005000 lr 00003fff7da870a8 code 30001
  | Oct 16 06:40:00 wailmer kernel: [876380.344562] libvirtd[109725]: unhandled signal 11 at 88806be02f000000 nip 00003fff78a70684 lr 00003fff78ab7a5c code 30001
  | Oct 16 06:40:06 wailmer kernel: [876386.781123] init: libvirt-bin main process (109725) killed by SEGV signal
  | Oct 16 06:40:06 wailmer kernel: [876386.818672] sudo[79919]: unhandled signal 11 at 11bda1eb70000000 nip 00003fff82094ac4 lr 00003fff8207c42c code 30001
  | Oct 16 06:40:06 wailmer kernel: [876386.921414] neutron-openvsw[79689]: unhandled signal 11 at 88f8010000005000 nip 00000000100ba0d8 lr 00000000100c97c8 code 30001
  | Oct 16 06:40:06 wailmer kernel: [876387.024431] init: neutron-plugin-openvswitch-agent main process (79689) killed by SEGV signal

  
  These servers are all running Trusty with the hwe-v kernel (3.19.0-31-generic #36~14.04.1-Ubuntu).
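
To put a number on how often this happens, excerpts like the ones above
can be tallied per day and per process. This is a minimal sketch under
the assumption that the lines follow the syslog layout shown in this
report; count_segv is a hypothetical helper name, not an existing tool.
Process names in these kernel messages are the truncated task names
(for example "neutron-rootwra"), so the tally groups by that truncated
form.

```shell
#!/bin/sh
# Tally "unhandled signal 11" kernel messages per day and per process name.
# Reads syslog-format lines on stdin, for example:
#   zcat /var/log/syslog.5.gz | count_segv

count_segv() {
    awk '
        /unhandled signal 11/ {
            day = $1 " " $2
            proc = ""
            # The "name[pid]:" token is the field just before "unhandled";
            # searching for it copes with padded "[   12.345678]" timestamps.
            for (i = 2; i <= NF; i++) {
                if ($i == "unhandled") { proc = $(i - 1); break }
            }
            sub(/\[[0-9]+\]:$/, "", proc)   # drop the "[pid]:" suffix
            count[day " " proc]++
        }
        END {
            for (key in count)
                print count[key], key
        }
    ' | LC_ALL=C sort -k2,2 -k3,3n -k4,4
}
```

Each output line has the form "count month day process". The month
column is sorted alphabetically, so logs that span several months are
not listed in calendar order.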

  ProblemType: Crash
  DistroRelease: Ubuntu 14.04
  Package: nova-compute 1:2015.1.1-0ubuntu1~cloud2 [origin: Canonical]
  ProcVersionSignature: Ubuntu 3.19.0-30.34~14.04.1-generic 3.19.8-ckt6
  Uname: Linux 3.19.0-30-generic ppc64le
  ApportVersion: 2.14.1-0ubuntu3.16
  Architecture: ppc64el
  CrashDB:
   {
                  "impl": "launchpad",
                  "project": "cloud-archive",
                  "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
               }
  Date: Fri Oct 16 23:30:00 2015
  ExecutablePath: /usr/bin/nova-compute
  InterpreterPath: /usr/bin/python2.7
  PackageArchitecture: all
  ProcCmdline: /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf
  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
  ProcLoadAvg: 1.98 1.32 1.28 3/1516 7754
  ProcSwaps:
   Filename				Type		Size	Used	Priority
   /swap.img                               file		8388544	0	-1
  ProcVersion: Linux version 3.19.0-30-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #34~14.04.1-Ubuntu SMP Fri Oct 2 22:21:52 UTC 2015
  Signal: 6
  SourcePackage: nova
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: libvirtd
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_smt: SMT is off
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Oct 22 03:34 seq
   crw-rw---- 1 root audio 116, 33 Oct 22 03:34 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.18
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  Package: linux-meta-lts-vivid
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_GB
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: root=UUID=fcd256a9-8aa6-4805-95ae-f8c635967753 ro console=ttyS1
  ProcLoadAvg: 3.77 2.83 2.55 3/1574 89091
  ProcSwaps:
   Filename				Type		Size	Used	Priority
   /swap.img                               file		8388544	0	-1
  ProcVersion: Linux version 3.19.0-31-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #36~14.04.1-Ubuntu SMP Thu Oct 8 10:25:49 UTC 2015
  ProcVersionSignature: Ubuntu 3.19.0-31.36~14.04.1-generic 3.19.8-ckt7
  RelatedPackageVersions:
   linux-restricted-modules-3.19.0-31-generic N/A
   linux-backports-modules-3.19.0-31-generic  N/A
   linux-firmware                             1.127.16
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty uec-images
  Uname: Linux 3.19.0-31-generic ppc64le
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm
  _MarkForUpload: True
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_dscr: DSCR is 0
  cpu_freq:
   min:	2.016 GHz (cpu 80)
   max:	3.691 GHz (cpu 32)
   avg:	3.527 GHz
  cpu_runmode:
   Could not retrieve current diagnostics mode,
   No firmware implementation of function
  cpu_smt: SMT is off

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1508767/+subscriptions