
kernel-packages team mailing list archive

[Bug 1315736] Re: [Dell PowerEdge R720] Machine Check Exception

 

We are also seeing this bug. The machine becomes unresponsive: we cannot
ssh in, the load average is high, and the running java process cannot
be accessed. I will file a bug as described in Comment #30.

Our hardware is an HP ProLiant DL380p Gen8, and we see the following in
the syslog:

May 26 06:19:38 server06 kernel: [75831.929529] ------------[ cut here ]------------
May 26 06:19:38 server06 kernel: [75831.930191] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
May 26 06:19:38 server06 kernel: [75831.931129] invalid opcode: 0000 [#1] SMP
May 26 06:19:38 server06 kernel: [75831.931729] Modules linked in: xt_multiport ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables gpio_ich nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core lpc_ich hpwdt hpilo ioatdma lp dca ipmi_si parport acpi_power_meter mac_hid tg3 ptp psmouse hpsa pps_core
May 26 06:19:38 server06 kernel: [75831.941585] CPU: 4 PID: 2930 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
May 26 06:19:38 server06 kernel: [75831.942633] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
May 26 06:19:38 server06 kernel: [75831.943583] task: ffff881fe8372fe0 ti: ffff881fe632a000 task.ti: ffff881fe632a000
May 26 06:19:38 server06 kernel: [75831.944654] RIP: 0010:[<ffffffff81179051>]  [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
May 26 06:19:38 server06 kernel: [75831.946137] RSP: 0000:ffff881fe632bd98  EFLAGS: 00010246
May 26 06:19:38 server06 kernel: [75831.946885] RAX: 0000000000000100 RBX: 00007fc37320a370 RCX: ffff881fe632bb18
May 26 06:19:38 server06 kernel: [75831.947902] RDX: ffff881fe8372fe0 RSI: 0000000000000000 RDI: 8000000100c009e6
May 26 06:19:38 server06 kernel: [75831.948932] RBP: ffff881fe632be20 R08: 0000000000000000 R09: 00000000000000a9
May 26 06:19:38 server06 kernel: [75831.949952] R10: 0000000000000001 R11: 0000000000000000 R12: ffff881fd83a7cc8
May 26 06:19:38 server06 kernel: [75831.950961] R13: ffff880fe6787d40 R14: ffff880fe5d95780 R15: 0000000000000080
May 26 06:19:38 server06 kernel: [75831.951985] FS:  00007fc938145700(0000) GS:ffff880fffa80000(0000) knlGS:0000000000000000
May 26 06:19:38 server06 kernel: [75831.976736] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 26 06:19:38 server06 kernel: [75832.005183] CR2: 00007fc373620930 CR3: 0000000fe63fe000 CR4: 00000000000407e0
May 26 06:19:38 server06 kernel: [75832.033473] Stack:
May 26 06:19:38 server06 kernel: [75832.060551]  0000000000000001 ffff881fe632bdb0 ffffffff8109a780 ffff881fe632bdd0
May 26 06:19:38 server06 kernel: [75832.117385]  ffffffff810d7ad6 0000000000000001 ffffffff81f1ea20 ffff881fe632be78
May 26 06:19:38 server06 kernel: [75832.173599]  ffffffff810d983d ffff881fe632be48 ffff8800000000a9 00000001ffffffff
May 26 06:19:38 server06 kernel: [75832.231813] Call Trace:
May 26 06:19:38 server06 kernel: [75832.258781]  [<ffffffff8109a780>] ? wake_up_state+0x10/0x20
May 26 06:19:38 server06 kernel: [75832.286702]  [<ffffffff810d7ad6>] ? wake_futex+0x66/0x90
May 26 06:19:38 server06 kernel: [75832.311849]  [<ffffffff810d983d>] ? futex_wake_op+0x4ed/0x620
May 26 06:19:38 server06 kernel: [75832.337329]  [<ffffffff81721a24>] __do_page_fault+0x184/0x560
May 26 06:19:38 server06 kernel: [75832.363061]  [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
May 26 06:19:38 server06 kernel: [75832.387739]  [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
May 26 06:19:38 server06 kernel: [75832.411608]  [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
May 26 06:19:38 server06 kernel: [75832.436126]  [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
May 26 06:19:38 server06 kernel: [75832.458239]  [<ffffffff8171e288>] page_fault+0x28/0x30
May 26 06:19:38 server06 kernel: [75832.481780] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
May 26 06:19:38 server06 kernel: [75832.551672] RIP  [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
May 26 06:19:38 server06 kernel: [75832.574254]  RSP <ffff881fe632bd98>
May 26 06:19:38 server06 kernel: [75832.630392] ---[ end trace e41b58adf8e0d72b ]---
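
For anyone comparing several of these traces, the key facts (the BUG location, the faulting symbol, and the affected process) can be pulled out of the syslog mechanically. The following is only a rough sketch: the regular expressions are my own guesses based on the lines above, not an official oops format, and `summarize_oops` is a hypothetical helper name.

```python
import re

# Three lines copied from the trace above; in practice, read the syslog file.
SAMPLE = """\
May 26 06:19:38 server06 kernel: [75831.930191] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
May 26 06:19:38 server06 kernel: [75831.941585] CPU: 4 PID: 2930 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
May 26 06:19:38 server06 kernel: [75831.944654] RIP: 0010:[<ffffffff81179051>]  [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
"""

# Patterns inferred from the sample lines only (an assumption, not a spec).
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
CPU_RE = re.compile(r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
RIP_RE = re.compile(
    r"RIP: \S+\s+\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>\w+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def summarize_oops(text):
    """Return a dict with whatever BUG/CPU/RIP details are found in text."""
    summary = {}
    bug = BUG_RE.search(text)
    if bug:
        summary["file"] = bug.group("file")
        summary["line"] = int(bug.group("line"))
    cpu = CPU_RE.search(text)
    if cpu:
        summary["cpu"] = int(cpu.group("cpu"))
        summary["pid"] = int(cpu.group("pid"))
        summary["comm"] = cpu.group("comm")
    rip = RIP_RE.search(text)
    if rip:
        summary["symbol"] = rip.group("sym")
        summary["offset"] = int(rip.group("off"), 16)
        summary["size"] = int(rip.group("size"), 16)
    return summary


if __name__ == "__main__":
    for key, value in summarize_oops(SAMPLE).items():
        print(f"{key}: {value}")
```

Two reports with the same file, line, and symbol are likely hitting the same BUG_ON, which helps decide whether a report belongs on this bug or needs a new one.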

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1315736

Title:
  [Dell PowerEdge R720] Machine Check Exception

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  A Dell PowerEdge R720 running Ubuntu 14.04 shows MCE errors in dmesg.
  Dell support instructed us to run DSET and the BIOS hardware
  diagnostics. Neither tool showed any errors. Dell support said that a
  hardware error would have been shown in the Dell logs, and that the
  probable reason for the dmesg output is a bug in the Ubuntu kernel's
  MCE reporting.

  So, is the following dmesg output caused by a kernel bug in the Ubuntu
  14.04 server kernel?

  [11562.171040] Please check user daemon is running.
  [94953.306404] sbridge: HANDLING MCE MEMORY ERROR
  [94953.306415] CPU 1: Machine Check Exception: 0 Bank 9: 8c00004b000800c0
  [94953.306416] TSC 0 ADDR 2dfa0e1000 MISC 90000800080168c PROCESSOR 0:306e4 TIME 1399142359 SOCKET 1 APIC 20
  [94953.306422] sbridge: HANDLING MCE MEMORY ERROR
  [94953.306423] CPU 1: Machine Check Exception: 0 Bank 10: 8c000050000800c1
  [94953.306424] TSC 0 ADDR 2dfa0e1000 MISC 90000000000208c PROCESSOR 0:306e4 TIME 1399142359 SOCKET 1 APIC 20
  [94953.532217] EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2dfa0e1 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c0 socket:1 channel_mask:3 rank:0)
  [94953.532226] EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#1_Channel#1_DIMM#0 (channel:1 slot:0 page:0x2dfa0e1 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c1 socket:1 channel_mask:3 rank:0)
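
The status words in the two "Machine Check Exception" lines can be decoded by hand to see what the hardware actually reported. The sketch below assumes the architectural IA32_MCi_STATUS bit layout and the memory-controller error code format as I understand them from the Intel SDM; `decode_mci_status` is a hypothetical helper, and the model-specific bits are left undecoded.

```python
# Memory-controller transaction types (MMM field), per my reading of the SDM.
MEMORY_TRANSACTIONS = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    mca_code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "enabled": bool((status >> 60) & 1),      # EN
        "miscv": bool((status >> 59) & 1),        # MISC register valid
        "addrv": bool((status >> 58) & 1),        # ADDR register valid
        "pcc": bool((status >> 57) & 1),          # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": mca_code,
    }
    # Memory-controller errors use the pattern 0000 0000 1MMM CCCC.
    if (mca_code & 0xFF80) == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        fields["memory_transaction"] = MEMORY_TRANSACTIONS.get(mmm, "reserved")
        fields["channel"] = mca_code & 0xF  # 0xF would mean "not specified"
    return fields


if __name__ == "__main__":
    # Values copied from the Bank 9 and Bank 10 lines above.
    for raw in (0x8C00004B000800C0, 0x8C000050000800C1):
        print(hex(raw), decode_mci_status(raw))
    # ADDR 2dfa0e1000 shifted by the 4 KiB page size gives the EDAC page.
    print(hex(0x2DFA0E1000 >> 12))
```

Under these assumptions both words decode to a valid, corrected (UC clear) memory scrubbing error with a corrected count of 1, on channels 0 and 1, which lines up with the two "1 CE memory scrubbing error" lines from EDAC and the `err_code:0008:00c0` / `0008:00c1` pairs.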

  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 touko  2 19:15 seq
   crw-rw---- 1 root audio 116, 33 touko  2 19:15 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  CurrentDmesg:
   Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
   dmesg: write failed: Broken pipe
  DistroRelease: Ubuntu 14.04
  InstallationDate: Installed on 2014-02-26 (66 days ago)
  InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Alpha amd64 (20140219)
  MachineType: Dell Inc. PowerEdge R720
  Package: linux (not installed)
  PciMultimedia:

  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=c03eb237-955a-4dee-bba1-deded53df372 ro
  ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  WifiSyslog:

  _MarkForUpload: True
  dmi.bios.date: 01/16/2014
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.2.2
  dmi.board.name: 0DCWD1
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A01
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr2.2.2:bd01/16/2014:svnDellInc.:pnPowerEdgeR720:pvr:rvnDellInc.:rn0DCWD1:rvrA01:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R720
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1315736/+subscriptions

