
kernel-packages team mailing list archive

[Bug 1301496] Re: kernel crash: Unable to handle kernel paging request for data

 

FWIW, I investigated the dpkg segfaults and saw the following:

$ gdb dpkg
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
[...]
Reading symbols from dpkg...Reading symbols from /usr/lib/debug//usr/bin/dpkg...done.
done.
(gdb) run -l
Starting program: /usr/bin/dpkg -l

Program received signal SIGSEGV, Segmentation fault.
filesdbinit () at ../../src/filesdb.c:571
571 ../../src/filesdb.c: No such file or directory.
(gdb) print bins
$1 = {0x0 <repeats 9441 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 6942 times>}
(gdb)

On a healthy system, the same array is all zeros on entry to filesdbinit:

(gdb) break filesdbinit
Breakpoint 2 at 0x10003338: file ../../src/filesdb.c, line 565.
(gdb) print bins
$12 = {0x0 <repeats 131072 times>}
(gdb)

Note that bins is an array of pointers.

(gdb) print sizeof(bins[0])
$6 = 8
(gdb)

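So exactly one entry in every 8192 is corrupted, and each bad entry has
the same single bit set (0x10000).  With 8-byte pointers, 8192*8 is 64k
of memory, so the corruption recurs once per 64k.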
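
The stride can be checked mechanically from the gdb output above.  The
following is a minimal Python sketch and was not part of the original
investigation: the helper names (parse_runs, nonzero_indices) are made up
for illustration, and the 8-byte pointer size is taken from the
sizeof(bins[0]) result.

```python
import re

# Run-length output of "print bins" on the broken guest, copied from the
# gdb session in this message.
GDB_BINS = """
{0x0 <repeats 9441 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 8191 times>, 0x10000,
  0x0 <repeats 8191 times>, 0x10000, 0x0 <repeats 6942 times>}
"""

POINTER_SIZE = 8  # from "print sizeof(bins[0])"

# One array item: a hex value, optionally followed by "<repeats N times>".
ITEM_RE = re.compile(r"(0x[0-9a-fA-F]+)(?:\s+<repeats (\d+) times>)?")


def parse_runs(text):
    """Turn gdb's run-length array output into a list of (value, count)."""
    body = text.strip().strip("{}")
    runs = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        match = ITEM_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognised gdb array item: {item!r}")
        value = int(match.group(1), 16)
        count = int(match.group(2)) if match.group(2) else 1
        runs.append((value, count))
    return runs


def nonzero_indices(runs):
    """Return (indices of nonzero elements, total element count)."""
    indices = []
    position = 0
    for value, count in runs:
        if value != 0:
            indices.extend(range(position, position + count))
        position += count
    return indices, position


runs = parse_runs(GDB_BINS)
bad, total = nonzero_indices(runs)
# Distinct distances between consecutive corrupted entries.
strides = sorted({b - a for a, b in zip(bad, bad[1:])})

print("total elements:", total)
print("corrupted entries:", len(bad))
print("first bad index:", bad[0], "byte offset:", hex(bad[0] * POINTER_SIZE))
print("index strides:", strides)
print("byte strides:", [hex(s * POINTER_SIZE) for s in strides])
```

For the output above this reports 131072 elements in total (the same
length as on the healthy system), 15 corrupted entries, the first at
index 9441 (byte offset 0x12708 into the array), and a single stride of
8192 elements, i.e. 0x10000 bytes.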

This could be a bug in any of the kernel, qemu, or the underlying host.
Note that since wolfe was rebooted, the VMs are reported to have been
stable for the past 72 hours (!), so this may well point to a bug in the
host OS/kernel.

There is a second P7 system, postal, which has been exhibiting the same
kinds of problems as wolfe.  Adam can speak to this in more detail, and
facilitate any necessary diagnostics on postal.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1301496

Title:
  kernel crash: Unable to handle kernel paging request for data

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  We've seen this happen twice now on ppc64el guests that are probably
  under load.  I don't have a lot of the details on what was going on
  when they failed, but I have the stack traces.

  [101168.836780] Unable to handle kernel paging request for data at address 0x00010001
  [101168.836886] Faulting instruction address: 0xc000000000954b60
  [101168.836934] Oops: Kernel access of bad area, sig: 11 [#1]
  [101168.836971] SMP NR_CPUS=2048 NUMA pSeries
  [101168.837020] Modules linked in: veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables dm_crypt
  [101168.837234] CPU: 1 PID: 19760 Comm: kworker/u4:0 Not tainted 3.13.0-8-generic #28-Ubuntu
  [101168.837294] Workqueue: netns .cleanup_net
  [101168.837332] task: c0000003f99d43e0 ti: c0000001cce44000 task.ti: c0000001cce44000
  [101168.837386] NIP: c000000000954b60 LR: c000000000954b68 CTR: c000000000954b00
  [101168.837439] REGS: c0000001cce47760 TRAP: 0300   Not tainted  (3.13.0-8-generic)
  [101168.837493] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002024  XER: 00000000
  [101168.837620] CFAR: 000000001063ea4c DAR: 0000000000010001 DSISR: 40000000 SOFTE: 1
  GPR00: c000000000954b68 c0000001cce479e0 c0000000010b0dd0 0000000000010001
  GPR04: f0000000099918f0 c0000002be072380 c000000000954b68 c0000003fe023508
  GPR08: 0000000000010000 c000000209fc0000 000000000000000e 0000000000000001
  GPR12: 0000000044002028 c00000000fe80300 c0000000000c3f00 c0000002be1e8bc0
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000001 c000000000f630fc
  GPR24: 0000000000000001 fffffffffffffef7 0000000000000000 c000000000f58638
  GPR28: 0000000000000001 c0000003fbdc0000 0000000000002000 0000000000000000
  [101168.838355] NIP [c000000000954b60] .tcp_net_metrics_exit+0x60/0x110
  [101168.838402] LR [c000000000954b68] .tcp_net_metrics_exit+0x68/0x110
  [101168.838448] Call Trace:
  [101168.838469] [c0000001cce479e0] [c000000000954b68] .tcp_net_metrics_exit+0x68/0x110 (unreliable)
  [101168.838542] [c0000001cce47a70] [c0000000008cc49c] .ops_exit_list.isra.2+0x6c/0xd0
  [101168.838605] [c0000001cce47b00] [c0000000008ccef0] .cleanup_net+0x150/0x250
  [101168.838662] [c0000001cce47bc0] [c0000000000b9e28] .process_one_work+0x1a8/0x4d0
  [101168.838726] [c0000001cce47c60] [c0000000000baaf0] .worker_thread+0x180/0x4a0
  [101168.838783] [c0000001cce47d30] [c0000000000c4010] .kthread+0x110/0x130
  [101168.838841] [c0000001cce47e30] [c00000000000a160] .ret_from_kernel_thread+0x5c/0x7c
  [101168.838903] Instruction dump:
  [101168.838940] 7d295030 2f890000 e93d0288 419e0058 3bc00000 3b800001 60000000 60420000
  [101168.839031] 7bc81f24 7c69402a 2fa30000 419e0024 <ebe30000> 4b8b809d 60000000 2fbf0000
  [101168.839127] ---[ end trace fb028b2b5c006a6a ]---
  --- 
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14-0ubuntu1
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory
  CRDA: Error: [Errno 2] No such file or directory
  DistroRelease: Ubuntu 14.04
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-3.13.0-19-generic root=UUID=19eaa2f9-0f24-49b9-ba48-24879242481c ro console=hvc0 earlyprintk
  ProcVersionSignature: Ubuntu 3.13.0-19.40-generic 3.13.6
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-19-generic N/A
   linux-backports-modules-3.13.0-19-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty uec-images
  Uname: Linux 3.13.0-19-generic ppc64le
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
  WifiSyslog:
   
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1301496/+subscriptions

