
kernel-packages team mailing list archive

[Bug 1301496] Re: kernel crash: Unable to handle kernel paging request for data

 

I got hold of one of these machines in this "everything is exploding"
state and used the test program below to scan a large zero-initialised
static buffer for non-zero bytes and find the alignment of the
corruption.  (The program never writes to this data, which rules out a
bug in dpkg as the cause.)  Note that the corruption begins at the start
of a page and repeats on each page thereafter (most of it is elided
here):

===
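#include <stdio.h>

/* 1 MiB of zero-initialised static storage (.bss): every byte should read 0. */
static char b[65536 * 16];

int main(void)
{
        size_t p;

        /* Base address of the buffer. */
        printf("%08lx\n", (unsigned long)b);

        /* Report every non-zero byte with its value, offset and address. */
        for (p = 0; p < sizeof(b); p++) {
                if (b[p]) {
                        printf("%d != 0 @ %zu [%08lx]\n",
                               b[p], p, (unsigned long)&b[p]);
                }
        }
        return 0;
}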
===
10011068
68 != 0 @ 61336 [10020000]
20 != 0 @ 61340 [10020004]
2 != 0 @ 61342 [10020006]
1 != 0 @ 61344 [10020008]
75 != 0 @ 61348 [1002000c]
3 != 0 @ 61349 [1002000d]
2 != 0 @ 61352 [10020010]
8 != 0 @ 61353 [10020011]
[...]
68 != 0 @ 126872 [10030000]
20 != 0 @ 126876 [10030004]
2 != 0 @ 126878 [10030006]
1 != 0 @ 126880 [10030008]
75 != 0 @ 126884 [1003000c]
3 != 0 @ 126885 [1003000d]
2 != 0 @ 126888 [10030010]
8 != 0 @ 126889 [10030011]
[...]
===

I also dumped the corrupted region in full in a more readable form.
Note that it seems to contain the strings 'lo' and 'eth0', as if it
were networking related:

===
000000    0044    0000    0014    0002    0001    0000    034b    0000
         D  \0  \0  \0 024  \0 002  \0 001  \0  \0  \0   K 003  \0  \0
000010    0802    fe80    0001    0000    0008    0001    007f    0100
       002  \b 200 376 001  \0  \0  \0  \b  \0 001  \0 177  \0  \0 001
000020    0008    0002    007f    0100    0007    0003    6f6c    0000
        \b  \0 002  \0 177  \0  \0 001  \a  \0 003  \0   l   o  \0  \0
000030    0014    0006    ffff    ffff    ffff    ffff    bb72    0054
       024  \0 006  \0 377 377 377 377 377 377 377 377   r 273   T  \0
000040    bb72    0054    0050    0000    0014    0002    0001    0000
         r 273   T  \0   P  \0  \0  \0 024  \0 002  \0 001  \0  \0  \0
000050    034b    0000    1802    0080    000c    0000    0008    0001
         K 003  \0  \0 002 030 200  \0  \f  \0  \0  \0  \b  \0 001  \0
000060    000a    8203    0008    0002    000a    8203    0008    0004
        \n  \0 003 202  \b  \0 002  \0  \n  \0 003 202  \b  \0 004  \0
000070    000a    ff03    0009    0003    7465    3068    0000    0000
        \n  \0 003 377  \t  \0 003  \0   e   t   h   0  \0  \0  \0  \0
000080    0014    0006    ffff    ffff    ffff    ffff    bcc8    0054
       024  \0 006  \0 377 377 377 377 377 377 377 377 310 274   T  \0
000090    bcc8    0054    0000    0000    0000    0000    0000    0000
       310 274   T  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000a0    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
010000    0044    0000    0014    0002    0001    0000    034b    0000
===

Note that this differs from the corruption seen by @vorlon, which
showed a single-bit change in each page.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1301496

Title:
  kernel crash: Unable to handle kernel paging request for data

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  We've seen this happen twice now on ppc64el guests that are probably
  under load.  I don't have a lot of the details on what was going on
  when they failed, but I have the stack traces.

  [101168.836780] Unable to handle kernel paging request for data at address 0x00010001
  [101168.836886] Faulting instruction address: 0xc000000000954b60
  [101168.836934] Oops: Kernel access of bad area, sig: 11 [#1]
  [101168.836971] SMP NR_CPUS=2048 NUMA pSeries
  [101168.837020] Modules linked in: veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables dm_crypt
  [101168.837234] CPU: 1 PID: 19760 Comm: kworker/u4:0 Not tainted 3.13.0-8-generic #28-Ubuntu
  [101168.837294] Workqueue: netns .cleanup_net
  [101168.837332] task: c0000003f99d43e0 ti: c0000001cce44000 task.ti: c0000001cce44000
  [101168.837386] NIP: c000000000954b60 LR: c000000000954b68 CTR: c000000000954b00
  [101168.837439] REGS: c0000001cce47760 TRAP: 0300   Not tainted  (3.13.0-8-generic)
  [101168.837493] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002024  XER: 00000000
  [101168.837620] CFAR: 000000001063ea4c DAR: 0000000000010001 DSISR: 40000000 SOFTE: 1
  GPR00: c000000000954b68 c0000001cce479e0 c0000000010b0dd0 0000000000010001
  GPR04: f0000000099918f0 c0000002be072380 c000000000954b68 c0000003fe023508
  GPR08: 0000000000010000 c000000209fc0000 000000000000000e 0000000000000001
  GPR12: 0000000044002028 c00000000fe80300 c0000000000c3f00 c0000002be1e8bc0
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000001 c000000000f630fc
  GPR24: 0000000000000001 fffffffffffffef7 0000000000000000 c000000000f58638
  GPR28: 0000000000000001 c0000003fbdc0000 0000000000002000 0000000000000000
  [101168.838355] NIP [c000000000954b60] .tcp_net_metrics_exit+0x60/0x110
  [101168.838402] LR [c000000000954b68] .tcp_net_metrics_exit+0x68/0x110
  [101168.838448] Call Trace:
  [101168.838469] [c0000001cce479e0] [c000000000954b68] .tcp_net_metrics_exit+0x68/0x110 (unreliable)
  [101168.838542] [c0000001cce47a70] [c0000000008cc49c] .ops_exit_list.isra.2+0x6c/0xd0
  [101168.838605] [c0000001cce47b00] [c0000000008ccef0] .cleanup_net+0x150/0x250
  [101168.838662] [c0000001cce47bc0] [c0000000000b9e28] .process_one_work+0x1a8/0x4d0
  [101168.838726] [c0000001cce47c60] [c0000000000baaf0] .worker_thread+0x180/0x4a0
  [101168.838783] [c0000001cce47d30] [c0000000000c4010] .kthread+0x110/0x130
  [101168.838841] [c0000001cce47e30] [c00000000000a160] .ret_from_kernel_thread+0x5c/0x7c
  [101168.838903] Instruction dump:
  [101168.838940] 7d295030 2f890000 e93d0288 419e0058 3bc00000 3b800001 60000000 60420000
  [101168.839031] 7bc81f24 7c69402a 2fa30000 419e0024 <ebe30000> 4b8b809d 60000000 2fbf0000
  [101168.839127] ---[ end trace fb028b2b5c006a6a ]---
  --- 
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14-0ubuntu1
  Architecture: ppc64el
  ArecordDevices: Error: [Errno 2] No such file or directory
  CRDA: Error: [Errno 2] No such file or directory
  DistroRelease: Ubuntu 14.04
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinux-3.13.0-19-generic root=UUID=19eaa2f9-0f24-49b9-ba48-24879242481c ro console=hvc0 earlyprintk
  ProcVersionSignature: User Name 3.13.0-19.40-generic 3.13.6
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-19-generic N/A
   linux-backports-modules-3.13.0-19-generic  N/A
   linux-firmware                             N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty uec-images
  Uname: Linux 3.13.0-19-generic ppc64le
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
  WifiSyslog:
   
  _MarkForUpload: True


