kernel-packages team mailing list archive

[Bug 1508767] Re: IBM POWER8 unhandled signal 11 / SEGV

 

Disabling KSM doesn't seem to have helped. Ryan
(http://launchpad.net/~fo0bar) has been working on getting hwe-w
installed on these compute nodes to see if a more recent kernel will
help.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1508767

Title:
  IBM POWER8 unhandled signal 11 / SEGV

Status in Ubuntu Cloud Archive:
  New
Status in apparmor package in Ubuntu:
  Invalid
Status in linux package in Ubuntu:
  Confirmed
Status in linux-meta-lts-vivid package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  We have a few IBM POWER8 servers which we're currently using as
  OpenStack nova compute nodes. It seems we're regularly running into
  issues where processes are segfaulting:

  | hloeung@gligar:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/syslog.5.gz
  | Oct 16 23:31:38 gligar kernel: [88351.465559] neutron-openvsw[29733]: unhandled signal 11 at 88f9010000000000 nip 00000000100ba0d8 lr 00000000101ad860 code 30001
  | Oct 16 23:31:38 gligar kernel: [88351.566909] init: neutron-plugin-openvswitch-agent main process (29733) killed by SEGV signal
  | Oct 16 23:31:38 gligar kernel: [88351.746611] apport[29500]: unhandled signal 11 at 8850e467250040a8 nip 0000000010201f80 lr 0000000010202984 code 30001
  | Oct 16 23:31:39 gligar kernel: [88352.245829] neutron-rootwra[29749]: unhandled signal 11 at 0809c4b610000000 nip 000000001014ae4c lr 000000001014b544 code 30001
  | Oct 16 23:31:50 gligar kernel: [88364.040340] neutron-rootwra[30060]: unhandled signal 11 at 08a305c12b000000 nip 00000000100b74d0 lr 00000000100b73e4 code 30001
  | Oct 16 23:31:51 gligar kernel: [88364.174218] neutron-rootwra[30065]: unhandled signal 11 at 088eb28e2f004078 nip 00000000100b5974 lr 00000000100aa794 code 30001
  | Oct 16 23:31:52 gligar kernel: [88365.195380] neutron-rootwra[30098]: unhandled signal 11 at 88c939e322000008 nip 00000000100c8b28 lr 0000000010060384 code 30001
  | Oct 16 23:31:52 gligar kernel: [88365.362374] neutron-rootwra[30106]: unhandled signal 11 at 882c58ad2800f04f nip 00003fffaef81220 lr 00003fffaef811a0 code 30001
  | Oct 16 23:32:27 gligar kernel: [88400.966976] neutron-rootwra[30341]: unhandled signal 11 at 88d1fbe922001008 nip 00000000100c8b28 lr 0000000010060384 code 30001
  | Oct 16 23:32:47 gligar kernel: [88420.953053] neutron-rootwra[30412]: unhandled signal 11 at 11b6629054008000 nip 00003fff9a864ac4 lr 00003fff9a84c42c code 30001
  | Oct 16 23:34:49 gligar kernel: [88542.778503] neutron-rootwra[30977]: unhandled signal 11 at 88540f00000010a8 nip 00000000100aa768 lr 00000000100b74e8 code 30001
  | Oct 16 23:35:23 gligar kernel: [88576.700721] neutron-openvsw[29739]: unhandled signal 11 at 08bfcbf7210000a8 nip 00000000100ab390 lr 00000000100b7c38 code 30001
  | Oct 16 23:35:23 gligar kernel: [88576.804961] init: neutron-plugin-openvswitch-agent main process (29739) killed by SEGV signal
  | Oct 16 23:36:01 gligar kernel: [88614.995497] nova-compute[31662]: unhandled signal 11 at 8846c1c81f004008 nip 000000001014c2f0 lr 0000000010151080 code 30001
  | Oct 16 23:36:02 gligar kernel: [88615.110735] nova-compute[4331]: unhandled signal 11 at 88befae9220010a8 nip 00000000100b5c8c lr 000000001014c734 code 30001
  | Oct 16 23:36:02 gligar kernel: [88615.219436] init: nova-compute main process (4331) killed by SEGV signal
  | Oct 17 03:59:56 gligar kernel: [104449.890256] landscape-packa[63283]: unhandled signal 11 at 02f0000000000008 nip 00000000101abeac lr 00000000100a8738 code 30001
  | Oct 17 04:05:00 gligar kernel: [104753.718195] sudo[63915]: unhandled signal 11 at 08e06105d1dcfff8 nip 00003fffb15cf7e4 lr 00003fffb15cfa00 code 30001

  
  | hloeung@floette:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/syslog.7.gz
  | Oct 14 16:55:30 floette kernel: [149326.697938] rsync[9915]: unhandled signal 11 at 00003ffff7cb0000 nip 00003fffa242d054 lr 00003fffa2426560 code 30001
  | Oct 14 21:05:57 floette kernel: [164353.333697] apparmor_parser[102284]: unhandled signal 11 at 08680f0000000000 nip 000000001004bbf8 lr 0000000010028de4 code 30001
  | Oct 14 22:21:24 floette kernel: [168880.481778] neutron-rootwra[153488]: unhandled signal 11 at 8860fbe21f0000a8 nip 00000000100aa768 lr 00000000100b74e8 code 30001
  | Oct 14 22:21:26 floette kernel: [168882.078608] neutron-openvsw[4546]: unhandled signal 11 at 8822cbf03d000008 nip 00000000100aa764 lr 00000000100e6900 code 30001
  | Oct 14 22:21:37 floette kernel: [168893.597834] init: neutron-plugin-openvswitch-agent main process (4546) killed by SEGV signal
  | Oct 14 22:21:39 floette kernel: [168894.949777] nova-rootwrap[153708]: unhandled signal 11 at 88d495c93c0000a8 nip 00000000100a57d4 lr 00000000100ab42c code 30001
  | Oct 14 22:21:43 floette kernel: [168898.973700] neutron-rootwra[153847]: unhandled signal 11 at 08c90df318000020 nip 00000000101ac260 lr 00000000101ad92c code 30001
  | Oct 14 22:21:44 floette kernel: [168900.785421] neutron-rootwra[153850]: unhandled signal 11 at 88d87b783f0000a8 nip 00000000101abf40 lr 00000000100d9cac code 30001
  | Oct 14 22:21:46 floette kernel: [168902.724121] neutron-openvsw[153852]: unhandled signal 11 at 882b78783f0000a8 nip 00000000100b5c8c lr 000000001014c734 code 30001

  
  | hloeung@patrat:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/syslog.7.gz
  | Oct 15 00:48:13 patrat kernel: [553143.677075] rsync[89656]: unhandled signal 11 at 00003fffe6a50000 nip 00003fff77e0d054 lr 00003fff77e06560 code 30001

  
  | Oct 16 02:42:03 wailmer kernel: [862104.157449] nova-compute[11431]: unhandled signal 11 at 081169bc370000a8 nip 00000000100ac164 lr 00000000100b7d6c code 30001
  | Oct 16 02:42:03 wailmer kernel: [862104.264242] init: nova-compute main process (11431) killed by SEGV signal
  | Oct 16 06:38:22 wailmer kernel: [876282.603855] qemu-img[78662]: unhandled signal 11 at 11b625104e000000 nip 00003fffb6224bb4 lr 00003fffb620c42c code 30001
  | Oct 16 06:38:23 wailmer kernel: [876283.336045] qemu-system-ppc[78609]: unhandled signal 11 at ffffffc10000009a nip 00003fffae1a7124 lr 0000000010314874 code 30001
  | Oct 16 06:39:40 wailmer kernel: [876360.399550] neutron-rootwra[79380]: unhandled signal 11 at 0800c20428000000 nip 00000000100a6c14 lr 00000000100a6d4c code 30001
  | Oct 16 06:39:47 wailmer kernel: [876367.577184] neutron-rootwra[79676]: unhandled signal 11 at 0878a100000040a8 nip 00000000100aa768 lr 000000001004ed6c code 30001
  | Oct 16 06:39:49 wailmer kernel: [876369.478066] neutron-openvsw[12655]: unhandled signal 11 at 088e47f11f000008 nip 00000000100db46c lr 00000000100db424 code 30001
  | Oct 16 06:39:58 wailmer kernel: [876378.286827] init: neutron-plugin-openvswitch-agent main process (12655) killed by SEGV signal
  | Oct 16 06:39:59 wailmer kernel: [876379.211801] sudo[79703]: unhandled signal 11 at 886baddd38005000 nip 886baddd38005000 lr 00003fff7da870a8 code 30001
  | Oct 16 06:40:00 wailmer kernel: [876380.344562] libvirtd[109725]: unhandled signal 11 at 88806be02f000000 nip 00003fff78a70684 lr 00003fff78ab7a5c code 30001
  | Oct 16 06:40:06 wailmer kernel: [876386.781123] init: libvirt-bin main process (109725) killed by SEGV signal
  | Oct 16 06:40:06 wailmer kernel: [876386.818672] sudo[79919]: unhandled signal 11 at 11bda1eb70000000 nip 00003fff82094ac4 lr 00003fff8207c42c code 30001
  | Oct 16 06:40:06 wailmer kernel: [876386.921414] neutron-openvsw[79689]: unhandled signal 11 at 88f8010000005000 nip 00000000100ba0d8 lr 00000000100c97c8 code 30001
  | Oct 16 06:40:06 wailmer kernel: [876387.024431] init: neutron-plugin-openvswitch-agent main process (79689) killed by SEGV signal

  
  These servers are all running Trusty with the hwe-v kernel (3.19.0-31-generic #36~14.04.1-Ubuntu).
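
To compare hosts more easily, the "unhandled signal 11" lines in the excerpts above could be tallied with something like the rough Python sketch below. It counts faults per process name and flags fault addresses outside the user address range. The names (SEGV_RE, parse_segv, summarize) are made up for illustration, and USER_LIMIT = 2**46 is only an assumption about the user address space limit on these 3.19 ppc64le kernels; adjust it if that does not hold.

```python
import re
from collections import Counter

# Matches kernel "unhandled signal 11" lines in the format shown in the
# syslog excerpts (process name, pid, fault address, nip, lr, code).
SEGV_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\] (?P<comm>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"unhandled signal 11 at (?P<addr>[0-9a-f]+) "
    r"nip (?P<nip>[0-9a-f]+) lr (?P<lr>[0-9a-f]+) code (?P<code>[0-9a-f]+)"
)

# Assumption (not from the bug report): user addresses lie below 2**46.
USER_LIMIT = 1 << 46


def parse_segv(line):
    """Return a dict for an 'unhandled signal 11' line, or None otherwise."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "nip": int(m.group("nip"), 16),
        "lr": int(m.group("lr"), 16),
    }


def summarize(lines, user_limit=USER_LIMIT):
    """Count faults per process name and faults at or above user_limit."""
    per_comm = Counter()
    out_of_range = 0
    for line in lines:
        rec = parse_segv(line)
        if rec is None:
            continue  # e.g. the "killed by SEGV signal" lines from init
        per_comm[rec["comm"]] += 1
        if rec["addr"] >= user_limit:
            out_of_range += 1
    return per_comm, out_of_range
```

The rotated logs could be fed in with gzip.open("/var/log/syslog.5.gz", "rt"), passing the file object as lines. A high out_of_range count would point at corrupted pointers rather than ordinary NULL dereferences, though that reading depends on the USER_LIMIT assumption above.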

  ProblemType: Crash
  DistroRelease: Ubuntu 14.04
  Package: nova-compute 1:2015.1.1-0ubuntu1~cloud2 [origin: Canonical]
  ProcVersionSignature: Ubuntu 3.19.0-30.34~14.04.1-generic 3.19.8-ckt6
  Uname: Linux 3.19.0-30-generic ppc64le
  ApportVersion: 2.14.1-0ubuntu3.16
  Architecture: ppc64el
  CrashDB:
   {
                  "impl": "launchpad",
                  "project": "cloud-archive",
                  "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
               }
  Date: Fri Oct 16 23:30:00 2015
  ExecutablePath: /usr/bin/nova-compute
  InterpreterPath: /usr/bin/python2.7
  PackageArchitecture: all
  ProcCmdline: /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova.conf --config-file=/etc/nova/nova-compute.conf
  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
  ProcLoadAvg: 1.98 1.32 1.28 3/1516 7754
  ProcSwaps:
   Filename				Type		Size	Used	Priority
   /swap.img                               file		8388544	0	-1
  ProcVersion: Linux version 3.19.0-30-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #34~14.04.1-Ubuntu SMP Fri Oct 2 22:21:52 UTC 2015
  Signal: 6
  SourcePackage: nova
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: libvirtd
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_smt: SMT is off
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Oct 22 03:34 seq
   crw-rw---- 1 root audio 116, 33 Oct 22 03:34 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.18
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 14.04
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  Package: linux-meta-lts-vivid
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_GB
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: root=UUID=fcd256a9-8aa6-4805-95ae-f8c635967753 ro console=ttyS1
  ProcLoadAvg: 3.77 2.83 2.55 3/1574 89091
  ProcSwaps:
   Filename				Type		Size	Used	Priority
   /swap.img                               file		8388544	0	-1
  ProcVersion: Linux version 3.19.0-31-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #36~14.04.1-Ubuntu SMP Thu Oct 8 10:25:49 UTC 2015
  ProcVersionSignature: Ubuntu 3.19.0-31.36~14.04.1-generic 3.19.8-ckt7
  RelatedPackageVersions:
   linux-restricted-modules-3.19.0-31-generic N/A
   linux-backports-modules-3.19.0-31-generic  N/A
   linux-firmware                             1.127.16
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty uec-images
  Uname: Linux 3.19.0-31-generic ppc64le
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm
  _MarkForUpload: True
  cpu_cores: Number of cores present = 20
  cpu_coreson: Number of cores online = 20
  cpu_dscr: DSCR is 0
  cpu_freq:
   min:	2.016 GHz (cpu 80)
   max:	3.691 GHz (cpu 32)
   avg:	3.527 GHz
  cpu_runmode:
   Could not retrieve current diagnostics mode,
   No firmware implementation of function
  cpu_smt: SMT is off

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1508767/+subscriptions