group.of.nepali.translators team mailing list archive

[Bug 1641235] Re: Ubuntu 16.10: kdump over nfs did not generate complete vmcore

To: group.of.nepali.translators@xxxxxxxxxxxxxxxxxxx
From: Louis Bouchard <louis.bouchard@xxxxxxxxxxxxx>
Date: Wed, 11 Jan 2017 10:26:56 -0000
Reply-to: Bug 1641235 <1641235@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Also affects: makedumpfile (Ubuntu Yakkety)
   Importance: Undecided
       Status: New

** Also affects: makedumpfile (Ubuntu Trusty)
   Importance: Undecided
       Status: New

** Also affects: makedumpfile (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: makedumpfile (Ubuntu Trusty)
       Status: New => Confirmed

** Changed in: makedumpfile (Ubuntu Xenial)
       Status: New => Confirmed

** Changed in: makedumpfile (Ubuntu Yakkety)
       Status: New => Confirmed

** Changed in: makedumpfile (Ubuntu Trusty)
     Assignee: (unassigned) => Louis Bouchard (louis-bouchard)

** Changed in: makedumpfile (Ubuntu Xenial)
     Assignee: (unassigned) => Louis Bouchard (louis-bouchard)

** Changed in: makedumpfile (Ubuntu Yakkety)
     Assignee: (unassigned) => Louis Bouchard (louis-bouchard)

** Changed in: makedumpfile (Ubuntu Trusty)
   Importance: Undecided => Medium

** Changed in: makedumpfile (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: makedumpfile (Ubuntu Yakkety)
   Importance: Undecided => Medium

** Changed in: makedumpfile (Ubuntu)
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1641235

Title:
  Ubuntu 16.10: kdump over nfs did not generate complete vmcore

Status in makedumpfile package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Trusty:
  Confirmed
Status in makedumpfile source package in Xenial:
  Confirmed
Status in makedumpfile source package in Yakkety:
  Confirmed

Bug description:
  == Comment: #0 - HARSHA THYAGARAJA - 2016-11-03 08:05:59 ==
  ---Problem Description---
  kdump over nfs did not generate complete vmcore
   
  ---uname output---
  Linux ltciofvtr-firestone1 4.8.0-26-generic #28-Ubuntu SMP Tue Oct 18 14:41:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = PowerNV (Baremetal) - Firestone 
   
  ---Steps to Reproduce---
  1. Set up NFS
  2. Trigger crash: echo c > /proc/sysrq-trigger

  
  == Comment: #6 - Kevin W. Rudd - 2016-11-04 16:30:49 ==

  Hi Harsha.

  It looks like the base kdump NFS functionality works just fine.  The
  known issue with makedumpfile is causing kdump-tools to fall back to
  using "cp" to transfer the entire, uncompressed /proc/vmcore image.
  That's a rather large amount of data to send to the remote server,
  and the copy appears to fail with an I/O error after the first 122G.

  Further debugging is needed to determine whether this is a
  client-side or server-side issue.  I recommend first bringing your
  remote NFS server up to the current release, as it is currently a bit
  down-rev.

  == Comment: #8 - HARSHA THYAGARAJA  - 2016-11-10 02:02:31 ==

  Hi Kevin,
  I updated my peer to Ubuntu 16.10 and still saw the same behavior.
  A snippet of the relevant console output is pasted below.

  [   20.610748] kdump-tools[4559]: Starting kdump-tools:  * Mounting NFS mountpoint 150.1.1.20:/home/tools ...
  [   53.400516] kdump-tools[4559]:  * Dumping to NFS mountpoint 150.1.1.20:/home/tools/201611100158
  [   53.409242] kdump-tools[4559]:  * running makedumpfile -c -d 31 /proc/vmcore /mnt/var/crash/9.47.84.18-201611100158/dump-incomplete
  [   53.526593] kdump-tools[4559]: get_mem_map: Can't distinguish the memory type.
  [   53.527154] kdump-tools[4559]: The kernel version is not supported.
  [   53.527488] kdump-tools[4559]: The makedumpfile operation may be incomplete.
  [   53.527813] kdump-tools[4559]: makedumpfile Failed.
  [   53.528117] kdump-tools[4559]:  * kdump-tools: makedumpfile failed, falling back to 'cp'
  [   90.754092] kdump-tools[4559]: cp: error writing '/mnt/var/crash/9.47.84.18-201611100158/vmcore-incomplete': Input/output error
  [   90.754857] kdump-tools[4559]:  * kdump-tools: failed to save vmcore in /mnt/var/crash/9.47.84.18-201611100158
  [   90.756155] kdump-tools[4559]:  * running makedumpfile --dump-dmesg /proc/vmcore /mnt/var/crash/9.47.84.18-201611100158/dmesg.201611100158
  [   90.758731] kdump-tools[4559]: get_mem_map: Can't distinguish the memory type.
  [   90.759089] kdump-tools[4559]: The kernel version is not supported.
  [   90.759436] kdump-tools[4559]: The makedumpfile operation may be incomplete.
  [   90.759780] kdump-tools[4559]: makedumpfile Failed.
  [   90.760094] kdump-tools[4559]:  * kdump-tools: makedumpfile --dump-dmesg failed. dmesg content will be unavailable
  [   90.760668] kdump-tools[4559]:  * kdump-tools: failed to save dmesg content in /mnt/var/crash/9.47.84.18-201611100158
  [   90.846117] kdump-tools[4559]: Thu, 10 Nov 2016 01:59:56 -0500
  [   90.886629] kdump-tools[4559]: Failed to read reboot parameter file: No such file or directory
  [   90.887070] kdump-tools[4559]: Rebooting.

  == Comment: #13 - Kevin W. Rudd  - 2016-11-11 17:12:33 ==

  I was able to replicate this with debugging at both the kdump client
  and remote NFS server.  The server was perfectly happy with the data
  coming at it, and appeared to be processing a COMMIT request from the
  client when the client shut down the connection.

  The client-side logs after a failure showed that the client was
  logging "server ... not responding" messages and gave up on the
  connection within just a few seconds.

  This appears to be due to an overly aggressive timeout being
  specified in /usr/sbin/kdump-config:

  mount -t nfs -o nolock -o tcp -o soft -o timeo=5 -o retrans=5 $NFS $KDUMP_COREDIR

  The timeo value is in deciseconds (tenths of a second), and "5" is
  far too aggressive for this type of connection.  From my
  observations, the COMMIT was not issued until about 60G had been
  transferred, and most remote servers will take a lot longer than
  five tenths of a second to flush that amount of data and respond to
  the COMMIT.
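
  To put the timeo and retrans values in context, the sketch below
  estimates how long a soft mount waits on a single request before it
  returns an error.  It assumes the behavior described in nfs(5) for
  NFS over TCP: the first wait is timeo, each retransmission increases
  the wait by timeo (linear backoff) up to a 600-second cap, and a soft
  mount fails the request once retrans retransmissions have also timed
  out.  The function name soft_fail_window_ds is hypothetical, and the
  exact retry count applied by a given kernel may differ, so the result
  is an order-of-magnitude estimate only.

```shell
#!/bin/sh
# Estimate, in deciseconds, how long a soft NFS-over-TCP mount waits on one
# request before returning an I/O error to the caller.
# Assumptions (taken from the nfs(5) description, not measured here):
#   - wait N is timeo * N (linear backoff), capped at 6000 ds (600 seconds);
#   - the initial transmission plus `retrans` retransmissions are each
#     waited on, so retrans + 1 waits are summed.
soft_fail_window_ds() {
    timeo=$1
    retrans=$2
    cap=6000
    total=0
    attempt=1
    while [ "$attempt" -le $((retrans + 1)) ]; do
        wait_ds=$((timeo * attempt))
        if [ "$wait_ds" -gt "$cap" ]; then
            wait_ds=$cap
        fi
        total=$((total + wait_ds))
        attempt=$((attempt + 1))
    done
    echo "$total"
}

soft_fail_window_ds 5 5      # values from kdump-config; prints 105
soft_fail_window_ds 600 5    # default TCP timeo, same retrans; prints 12600
```

  Under these assumptions, timeo=5 with retrans=5 allows roughly 10
  seconds in total for the server to answer, while timeo=600 allows
  about 21 minutes.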

  I'm not sure what problem specifying this timeo value was supposed to
  address, but it would be better to leave the timeo value at its
  default for a tcp connection (let the TCP protocol handle any
  communication timeouts on its own).  When I modified kdump-config to
  use the default timeo of 600, the kdump process transferred the entire
  vmcore without error.
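
  A minimal sketch of the kind of change described above, assuming the
  only edit is to drop "-o timeo=5" from the mount line so that the
  client uses the TCP default.  The change actually made in testing, or
  shipped in any package update, may differ.  The NFS and KDUMP_COREDIR
  values here are placeholders; kdump-config sets them from its own
  configuration.  The command is printed rather than executed.

```shell
#!/bin/sh
# Placeholder values for illustration only.
NFS="${NFS:-nfs-server.example:/export/crash}"
KDUMP_COREDIR="${KDUMP_COREDIR:-/var/crash}"

# Same options as the original mount line, minus "-o timeo=5", so the NFS
# client falls back to its TCP default of 600 deciseconds (60 seconds).
MOUNT_CMD="mount -t nfs -o nolock -o tcp -o soft -o retrans=5 $NFS $KDUMP_COREDIR"

# Dry run: print the command instead of mounting (mounting needs root).
echo "$MOUNT_CMD"
```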

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1641235/+subscriptions