kernel-packages team mailing list archive

[Bug 1423483] Re: Kdump over network(nfs) does not work

 

Hello,

First of all, the problem at hand is not that the mechanism doesn't
work; it is that the NFS file transfer takes too long.  From what I can
see, the NFS mechanism has worked at least partly.

The NFS share was correctly mounted and the coredump transfer was
initiated.  For some reason, NFS requests then started to time out, but
kdump-tools doesn't have much to do with that.

One thing did get my attention.  The mount command that you issued
returns the following (edited for clarity):

# mount
/dev/sda2 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,nodev,noexec,nosuid)
...
9.3.189.84:/nfsshare on /nfsmount type nfs (rw,vers=4,addr=9.3.189.84,clientaddr=9.114.13.128)

The NFS mount on /var/crash does not appear, which is definitely a
problem, as this mount is done at a very early stage of the process.  It
was mounted at the beginning, though, since there is a dump-incomplete
file on the remote NFS server.
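
As a rough illustration of the check described above, here is a minimal
shell sketch that looks for an NFS entry on a given mount point in a
/proc/mounts-style table.  The helper name is_nfs_mounted is
hypothetical and not part of kdump-tools; /var/crash is simply the
KDUMP_COREDIR value from the report.

```shell
# Hypothetical helper (not part of kdump-tools).
# $1: mount point to look for, e.g. /var/crash
# $2: mount table to read, in /proc/mounts format (default: /proc/mounts)
# Exit status 0 if an nfs/nfs4 filesystem is mounted on $1, 1 otherwise.
is_nfs_mounted() {
    mount_point=$1
    table=${2:-/proc/mounts}
    # /proc/mounts fields: device, mount point, fs type, options, ...
    awk -v mp="$mount_point" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }
    ' "$table"
}
```

Running "is_nfs_mounted /var/crash" in the kexec-booted kernel would
then succeed only while the NFS share is actually mounted there.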

I don't have any context on the size of the file to be transferred.  It
may have brought the kexec-booted kernel to memory exhaustion, but
there is no sign of OOM, which would be expected in such a situation.
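
For what it is worth, one way to look for signs of OOM in a saved
console log is sketched below.  The helper name has_oom_messages is
hypothetical, and the two patterns are an assumption based on commonly
seen kernel OOM messages; the exact wording varies between kernel
versions.

```shell
# Hypothetical helper: exit status 0 if the log file named by $1
# contains a line that looks like a kernel OOM-killer message.
# The patterns are assumptions and may need adjusting per kernel version.
has_oom_messages() {
    grep -Eq 'Out of memory|invoked oom-killer' "$1"
}
```

Applied to the messages quoted in the bug description, which contain
only NFS timeouts and topology warnings, it would report nothing.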

Right now, with the data at hand, I cannot put forward anything other
than a lack of availability of the NFS server as the cause of the
failure.
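
To put a number on the timeouts reported in the bug description, a
small sketch along these lines could be run against a saved copy of the
console log.  The helper name count_nfs_timeouts is hypothetical; the
pattern matches the "not responding, timed out" lines as they appear in
the report.

```shell
# Hypothetical helper: print how many lines of the log file named by $1
# report an NFS server timeout.
count_nfs_timeouts() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so the status is ignored here.
    grep -c 'nfs: server .* not responding, timed out' "$1" || :
}
```

A steadily growing count over the run would be consistent with the
server being unreachable rather than with a problem in kdump-tools.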

** Tags added: cts

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1423483

Title:
  Kdump over network(nfs) does not work

Status in makedumpfile package in Ubuntu:
  Triaged

Bug description:
  Problem Description
  ==========================
  Kdump over network(nfs) does not work
   
  ---uname output---
  3.18.0-13-generic
   
  Machine Type = POWER8 
   
  System Hang
  =====================
   The dump process seems to take a lot of time and it takes forever to save the dump. I waited for almost 3 hours, but the dump did not complete.
   
  Steps to Reproduce
  ===========================
  1) Configure kdump over nfs
      Add the following line to /etc/default/kdump-tools

      NFS="9.3.189.84:/nfsshare"

  2) Load kdump

  root@lop824:~# kdump-config load
  Modified cmdline:BOOT_IMAGE=/boot/vmlinux-3.18.0-13-generic root=UUID=234c5426-796e-4f54-bd77-7b0fe10e0407 ro splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service elfcorehdr=155072K 
  segment[0].mem:0x8000000 memsz:24510464
  segment[1].mem:0x9760000 memsz:65536
  segment[2].mem:0x9770000 memsz:65536
  segment[3].mem:0x9780000 memsz:65536
  segment[4].mem:0x9790000 memsz:21954560
  segment[5].mem:0xec70000 memsz:196608
   * loaded kdump kernel

  3) Trigger a dump. Kdump boots and starts copying the dump but hangs
  midway.

  root@lop824:~# ls -lh /nfsmount/9.114.13.128-201502170326/
  total 1.3M
  -rw------- 1 nobody nogroup 27M Feb 17 03:27 dump-incomplete
  root@lop824:~# 


  root@lop824:~# kdump-config show
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr: 
  NFS:              9.3.189.84:/nfsshare
  HOSTTAG:          ip
  current state:    ready to kdump

  kexec command:
    /sbin/kexec -p --args-linux --command-line="BOOT_IMAGE=/boot/vmlinux-3.18.0-13-generic root=UUID=234c5426-796e-4f54-bd77-7b0fe10e0407 ro splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/boot/initrd.img-3.18.0-13-generic /boot/vmlinux-3.18.0-13-generic
  root@lop824:~#

  == Comment: #3 - SACHIN P. SANT <ssant@xxxxxxxxxx> - 2015-02-17 07:17:14 ==
  Following messages are seen while saving a dump

  [   31.059522] NFS: Registering the id_resolver key type
  [   31.059542] Key type id_resolver registered
  [   31.059544] Key type id_legacy registered
  [   36.021996] nfs: server 9.3.189.84 not responding, timed out
  [   36.022026] nfs: server 9.3.189.84 not responding, timed out
  [   36.022049] nfs: server 9.3.189.84 not responding, timed out
  [   40.530000] nfs: server 9.3.189.84 not responding, timed out
  [   40.530033] nfs: server 9.3.189.84 not responding, timed out
  [   45.037994] nfs: server 9.3.189.84 not responding, timed out
  [   45.038020] nfs: server 9.3.189.84 not responding, timed out
  [   48.550133] nfs: server 9.3.189.84 not responding, timed out
  [   48.550161] nfs: server 9.3.189.84 not responding, timed out
  [   51.557995] nfs: server 9.3.189.84 not responding, timed out
  [   51.558021] nfs: server 9.3.189.84 not responding, timed out
  [   55.617018] nfs: server 9.3.189.84 not responding, timed out
  [   55.617050] nfs: server 9.3.189.84 not responding, timed out
  [   58.621419] nfs: server 9.3.189.84 not responding, timed out
  [   58.621447] nfs: server 9.3.189.84 not responding, timed out
  [   58.621470] nfs: server 9.3.189.84 not responding, timed out
  [   61.413753] BUG: arch topology borken
  [   61.413757]      the DIE domain not a subset of the NUMA domain
  [   61.413760] BUG: arch topology borken
  [   61.413762]      the DIE domain not a subset of the NUMA domain
  [   61.413765] BUG: arch topology borken
  [   61.413766]      the DIE domain not a subset of the NUMA domain
  [   61.413769] BUG: arch topology borken
  [   61.413770]      the DIE domain not a subset of the NUMA domain
  [   61.413773] BUG: arch topology borken
  [   61.413774]      the DIE domain not a subset of the NUMA domain
  [   61.413777] BUG: arch topology borken
  [   61.413778]      the DIE domain not a subset of the NUMA domain
  [   61.413781] BUG: arch topology borken
  [   61.413782]      the DIE domain not a subset of the NUMA domain
  [   61.413785] BUG: arch topology borken
  [   61.413786]      the DIE domain not a subset of the NUMA domain
  [   61.625436] nfs: server 9.3.189.84 not responding, timed out
  [   66.133424] nfs: server 9.3.189.84 not responding, timed out
  [   66.133453] nfs: server 9.3.189.84 not responding, timed out
  [   70.641436] nfs: server 9.3.189.84 not responding, timed out
  [   70.641465] nfs: server 9.3.189.84 not responding, timed out
  [   74.149421] nfs: server 9.3.189.84 not responding, timed out
  [   74.149452] nfs: server 9.3.189.84 not responding, timed out
  [   78.209471] nfs: server 9.3.189.84 not responding, timed out
  [   78.209498] nfs: server 9.3.189.84 not responding, timed out
  [   81.629433] nfs: server 9.3.189.84 not responding, timed out
  [   81.629442] nfs: server 9.3.189.84 not responding, timed out
  [   84.633433] nfs: server 9.3.189.84 not responding, timed out
  [   87.637419] nfs: server 9.3.189.84 not responding, timed out
  [   90.649450] nfs: server 9.3.189.84 not responding, timed out
  [   93.653426] nfs: server 9.3.189.84 not responding, timed out
  [   95.005433] nfs: server 9.3.189.84 not responding, timed out
  [   96.653426] nfs: server 9.3.189.84 not responding, timed out
  [   98.009437] nfs: server 9.3.189.84 not responding, timed out

  I can mount the nfs share manually (while the dump is in
  progress)

  root@lop824:~# mount -t nfs 9.3.189.84:/nfsshare /nfsmount/
  root@lop824:~# mount
  /dev/sda2 on / type ext4 (rw,errors=remount-ro)
  proc on /proc type proc (rw,nodev,noexec,nosuid)
  sysfs on /sys type sysfs (rw,nodev,noexec,nosuid)
  none on /sys/fs/cgroup type tmpfs (rw,uid=0,gid=0,mode=0755,size=1024)
  none on /sys/fs/fuse/connections type fusectl (rw)
  none on /sys/kernel/debug type debugfs (rw)
  none on /sys/kernel/security type securityfs (rw)
  udev on /dev type devtmpfs (rw,mode=0755)
  devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
  tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
  none on /run/lock type tmpfs (rw,nodev,noexec,nosuid,size=5242880)
  none on /run/shm type tmpfs (rw,nosuid,nodev)
  none on /run/user type tmpfs (rw,nodev,noexec,nosuid,size=104857600,mode=0755)
  none on /sys/fs/pstore type pstore (rw)
  cgmfs on /run/cgmanager/fs type tmpfs (rw,relatime,size=128k,mode=755)
  rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
  9.3.189.84:/nfsshare on /nfsmount type nfs (rw,vers=4,addr=9.3.189.84,clientaddr=9.114.13.128)
  root@lop824:~# ls
  root@lop824:~# ls /nfsmount/
  9.114.13.128-201502170326  test
  root@lop824:~# ls /nfsmount/9.114.13.128-201502170326/
  dump-incomplete
  root@lop824:~#

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1423483/+subscriptions