
kernel-packages team mailing list archive

[Bug 1315736] Re: [Dell PowerEdge R720] Machine Check Exception

 

Sami,

Good observation: I do not have a machine check exception. The
similarities are: a bug reported on the same line; similar behaviour;
and Java being involved. For reference, I copy my kernel bug below (I get
several instances of it, but the later ones are marked as tainted). As
soon as I have a problem with the new upstream kernel, I will report back.

May  9 09:55:29 wintermute kernel: [604868.582044] ------------[ cut here ]------------
May  9 09:55:29 wintermute kernel: [604868.582059] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
May  9 09:55:29 wintermute kernel: [604868.582064] invalid opcode: 0000 [#1] SMP 
May  9 09:55:29 wintermute kernel: [604868.582069] Modules linked in: veth xt_addrtype xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables bridge stp llc bnep rfcomm bluetooth aufs binfmt_misc kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd parport_pc ppdev psmouse amd64_edac_mod sp5100_tco serio_raw edac_core lp fam15h_power k10temp i2c_piix4 edac_mce_amd mac_hid parport hid_generic usbhid hid usb_storage ixgbe igb mdio i2c_algo_bit dca ahci ptp libahci pps_core
May  9 09:55:29 wintermute kernel: [604868.582148] CPU: 21 PID: 25260 Comm: java Not tainted 3.13.0-24-generic #46-Ubuntu
May  9 09:55:29 wintermute kernel: [604868.582152] Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.5        12/16/2013
May  9 09:55:29 wintermute kernel: [604868.582156] task: ffff8876d3985fc0 ti: ffff8871f58c8000 task.ti: ffff8871f58c8000
May  9 09:55:29 wintermute kernel: [604868.582159] RIP: 0010:[<ffffffff81179051>]  [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
May  9 09:55:29 wintermute kernel: [604868.582171] RSP: 0000:ffff8871f58c9d98  EFLAGS: 00010246
May  9 09:55:29 wintermute kernel: [604868.582174] RAX: 0000000000000100 RBX: 00007fa583801ea0 RCX: ffff8871f58c9b18
May  9 09:55:29 wintermute kernel: [604868.582177] RDX: ffff8876d3985fc0 RSI: 0000000000000000 RDI: 80000020286009e6
May  9 09:55:29 wintermute kernel: [604868.582180] RBP: ffff8871f58c9e20 R08: 0000000000000000 R09: 00000000000000a9
May  9 09:55:29 wintermute kernel: [604868.582182] R10: 0000000000000001 R11: 0000000000000000 R12: ffff883fb68b30e0
May  9 09:55:29 wintermute kernel: [604868.582185] R13: ffff882e351b2600 R14: ffff88702aceec80 R15: 0000000000000080
May  9 09:55:29 wintermute kernel: [604868.582188] FS:  00007fa5603f2700(0000) GS:ffff882fe7d40000(0000) knlGS:0000000000000000
May  9 09:55:29 wintermute kernel: [604868.582192] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May  9 09:55:29 wintermute kernel: [604868.582194] CR2: 00007fa583a05620 CR3: 0000007861d59000 CR4: 00000000000407e0
May  9 09:55:29 wintermute kernel: [604868.582198] Stack:
May  9 09:55:29 wintermute kernel: [604868.582200]  ffff8871f58c9e20 ffff88702aceec80 00007fad7d38fd70 00007fa583804020
May  9 09:55:29 wintermute kernel: [604868.582241]  0000000000002190 00007fad7401bb68 0000000000000000 0000000000000002
May  9 09:55:29 wintermute kernel: [604868.582266]  ffff887101ef5e20 00007fad781a900f ffff8800000000a9 ffffffffffffff03
May  9 09:55:29 wintermute kernel: [604868.582283] Call Trace:
May  9 09:55:29 wintermute kernel: [604868.582297]  [<ffffffff817219a4>] __do_page_fault+0x184/0x560
May  9 09:55:29 wintermute kernel: [604868.582311]  [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
May  9 09:55:29 wintermute kernel: [604868.582321]  [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
May  9 09:55:29 wintermute kernel: [604868.582329]  [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
May  9 09:55:29 wintermute kernel: [604868.582338]  [<ffffffff81721d9a>] do_page_fault+0x1a/0x70
May  9 09:55:29 wintermute kernel: [604868.582349]  [<ffffffff8171e208>] page_fault+0x28/0x30
May  9 09:55:29 wintermute kernel: [604868.582353] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7 
May  9 09:55:29 wintermute kernel: [604868.582415] RIP  [<ffffffff81179051>] handle_mm_fault+0xe61/0xf10
May  9 09:55:29 wintermute kernel: [604868.582421]  RSP <ffff8871f58c9d98>
May  9 09:55:29 wintermute kernel: [604868.582426] ---[ end trace 77f5d1b963750a41 ]---
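
To compare several instances of such a report (for example, to see which ones are tainted), something like the following could pull the key fields out of a syslog. This is only a rough sketch: the function name `summarize_oopses` and the two regular expressions are my own, written against the line format shown in the trace above, and they may need adjusting for other kernel versions.

```python
import re

# Patterns written against the two oops lines shown in the trace above.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
CPU_RE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+) "
    r"(?P<taint>Not tainted|Tainted: \S+)"
)


def summarize_oopses(lines):
    """Return one dict per 'kernel BUG at' report found in the log lines."""
    reports = []
    for text in lines:
        bug = BUG_RE.search(text)
        if bug:
            # A new report starts at every 'kernel BUG at' line.
            reports.append({"file": bug["file"], "line": int(bug["line"])})
            continue
        cpu = CPU_RE.search(text)
        # Attach the first CPU/PID line that follows a BUG line to that report.
        if cpu and reports and "comm" not in reports[-1]:
            reports[-1].update(
                comm=cpu["comm"],
                pid=int(cpu["pid"]),
                tainted=cpu["taint"] != "Not tainted",
            )
    return reports


# Two lines copied from the trace above.
sample = [
    "May  9 09:55:29 wintermute kernel: [604868.582059] kernel BUG at "
    "/build/buildd/linux-3.13.0/mm/memory.c:3756!",
    "May  9 09:55:29 wintermute kernel: [604868.582148] CPU: 21 PID: 25260 "
    "Comm: java Not tainted 3.13.0-24-generic #46-Ubuntu",
]

for report in summarize_oopses(sample):
    print(report)
```

Running it over a whole log file (e.g. `summarize_oopses(open(path))`, with `path` pointing at wherever the kernel messages are stored) would give one entry per report, which makes it easy to check whether they all point at the same source line.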

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1315736

Title:
  [Dell PowerEdge R720] Machine Check Exception

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  A Dell PowerEdge R720 running Ubuntu 14.04 shows MCE errors in dmesg.
  Dell support advised running DSET and the BIOS hardware diagnostics.
  Neither tool showed any errors. Dell support said that a hardware
  error would have appeared in the Dell logs, and that the probable
  cause of the dmesg output is a bug in the Ubuntu kernel's MCE reporting.

  So, is the following dmesg output caused by a kernel bug in Ubuntu
  14.04 server?

  [11562.171040] Please check user daemon is running.
  [94953.306404] sbridge: HANDLING MCE MEMORY ERROR
  [94953.306415] CPU 1: Machine Check Exception: 0 Bank 9: 8c00004b000800c0
  [94953.306416] TSC 0 ADDR 2dfa0e1000 MISC 90000800080168c PROCESSOR 0:306e4 TIME 1399142359 SOCKET 1 APIC 20
  [94953.306422] sbridge: HANDLING MCE MEMORY ERROR
  [94953.306423] CPU 1: Machine Check Exception: 0 Bank 10: 8c000050000800c1
  [94953.306424] TSC 0 ADDR 2dfa0e1000 MISC 90000000000208c PROCESSOR 0:306e4 TIME 1399142359 SOCKET 1 APIC 20
  [94953.532217] EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2dfa0e1 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c0 socket:1 channel_mask:3 rank:0)
  [94953.532226] EDAC MC1: 1 CE memory scrubbing error on CPU_SrcID#1_Channel#1_DIMM#0 (channel:1 slot:0 page:0x2dfa0e1 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0008:00c1 socket:1 channel_mask:3 rank:0)
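
  The status words in the MCE lines above can be decoded by hand, which may help when judging whether the EDAC summary is consistent with them. Below is a minimal sketch, assuming the architectural IA32_MCi_STATUS layout from the Intel SDM (including the corrected error count in bits 52:38, which is only defined on processors that report it). The helper names `decode_mci_status` and `decode_memory_code` are made up for this illustration and are not part of any kernel or mcelog interface.

```python
# Request types for memory controller errors (the MMM field of the MCA code).
MEMORY_REQUEST_TYPES = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status):
    """Split an IA32_MCi_STATUS value into its architectural fields."""
    def bit(n):
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),
        "overflow": bit(62),
        "uncorrected": bit(61),
        "enabled": bit(60),
        "miscv": bit(59),
        "addrv": bit(58),
        "pcc": bit(57),
        # Assumption: the CPU implements the corrected error count field.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,
    }


def decode_memory_code(mca_code):
    """Decode an MCA code of the form 0000 0000 1MMM CCCC, else return None."""
    if (mca_code & 0xFF80) != 0x0080:
        return None
    request = MEMORY_REQUEST_TYPES.get((mca_code >> 4) & 0x7, "reserved")
    channel = mca_code & 0xF  # 0xF means the channel is not specified
    return request, channel


# Status and address values copied from the dmesg lines above.
for status in (0x8C00004B000800C0, 0x8C000050000800C1):
    fields = decode_mci_status(status)
    print(hex(status), fields, decode_memory_code(fields["mca_code"]))

# With 4 KiB pages, the reported ADDR maps to the page number EDAC prints.
print(hex(0x2DFA0E1000 >> 12))
```

  For both status words the uncorrected bit is clear and the count field is 1, and the low 16 bits decode to a memory scrubbing error on channels 0 and 1, which agrees with the "1 CE memory scrubbing error" lines and the `err_code:0008:00c0` / `0008:00c1` values that EDAC prints.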

  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 touko  2 19:15 seq
   crw-rw---- 1 root audio 116, 33 touko  2 19:15 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  CurrentDmesg:
   Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
   dmesg: write failed: Broken pipe
  DistroRelease: Ubuntu 14.04
  InstallationDate: Installed on 2014-02-26 (66 days ago)
  InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Alpha amd64 (20140219)
  MachineType: Dell Inc. PowerEdge R720
  Package: linux (not installed)
  PciMultimedia:

  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=c03eb237-955a-4dee-bba1-deded53df372 ro
  ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  WifiSyslog:

  _MarkForUpload: True
  dmi.bios.date: 01/16/2014
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 2.2.2
  dmi.board.name: 0DCWD1
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A01
  dmi.chassis.type: 23
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr2.2.2:bd01/16/2014:svnDellInc.:pnPowerEdgeR720:pvr:rvnDellInc.:rn0DCWD1:rvrA01:cvnDellInc.:ct23:cvr:
  dmi.product.name: PowerEdge R720
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1315736/+subscriptions

