kernel-packages team mailing list archive

[Bug 1564475] Comment bridged from LTC Bugzilla

 

------- Comment From thorsten.diehl@xxxxxxxxxx 2016-04-15 05:15 EDT-------
(In reply to comment #25)
> @thorsten
>
> If I understand correctly cio_ignore command needs to be on the running
> instance that is about to crash, rather than on the re-exec kernel. Thus it

@Dimitri,
that's not correct. LPAR kdump also works fine if zipl.conf has NO cio_ignore statement; the KDUMP_CMDLINE_APPEND= statement alone is enough.

> needs to be computed and added to e.g. /etc/zipl.conf. At the moment we do
> not generate/update /etc/zipl.conf in an automated way, but the more I think
> about it the better it sounds to e.g. be able to specify all crashdump
> parameters, computed cio_ignore, correct root= argument, and generate menu
> items for every installed ubuntu kernel (rather than just the last one), and
> recovery stanzas too. I'll open a wishlist bug about that.

That sounds reasonable.

> BTW, does it make sense to compute and use the `cio_ignore -k -u` generated
> command line by default? On one hand the kernel will use less memory, on the
> other hand `lszdev` will be quite empty and one will have a harder time
> discovering additional devices one can bring online.

Exactly, that is the conflict. From a performance point of view, it is desirable to have cio_ignore enabled, since an LPAR boot with thousands of devices takes a little longer. For our test LPARs with many shared resources we prefer and recommend having cio_ignore enabled. In a reasonably set up customer environment, the disadvantage of less convenient device handling outweighs the benefits of cio_ignore. Feel free to get other opinions on that topic, e.g. from your z Systems experts Frank and Christian ;-)
But in the KDUMP_CMDLINE_APPEND= statement it does NO harm, and I recommend adding cio_ignore there automatically upon every restart of kdump-tools.service.
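
A minimal sketch of what such an automatic update could look like, assuming the configuration is a text file with one KDUMP_CMDLINE_APPEND="..." line (as in /etc/default/kdump-tools). The function name set_cio_ignore is hypothetical, and how it would be hooked into a restart of kdump-tools.service is not shown here:

```python
import re

# Matches an active (uncommented) KDUMP_CMDLINE_APPEND="..." line.
APPEND_LINE = re.compile(r'^(\s*KDUMP_CMDLINE_APPEND=")([^"]*)(".*)$')


def set_cio_ignore(config_text, cio_ignore):
    """Return config_text with cio_ignore set inside KDUMP_CMDLINE_APPEND.

    Hypothetical helper: cio_ignore is the complete parameter,
    e.g. "cio_ignore=all,!condev,!0.0.e934".
    """
    out = []
    found = False
    for line in config_text.splitlines():
        m = APPEND_LINE.match(line)
        if m:
            found = True
            # Drop any stale cio_ignore= token, then append the new one.
            args = [a for a in m.group(2).split()
                    if not a.startswith("cio_ignore=")]
            args.append(cio_ignore)
            line = m.group(1) + " ".join(args) + m.group(3)
        out.append(line)
    if not found:
        # No active statement yet: add one carrying only cio_ignore.
        out.append(f'KDUMP_CMDLINE_APPEND="{cio_ignore}"')
    return "\n".join(out) + "\n"
```

Because a stale cio_ignore= token is replaced rather than duplicated, running this on every restart would leave an already up-to-date file unchanged.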

> I have, for now, added things to https://wiki.ubuntu.com/S390X -> feel free
> to edit and/or improve that. And I will open a new bug report to get those
> updates into the Ubuntu Server Guide.

That reads well and is sufficient to close this bug. Thank you!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1564475

Title:
  128M is not enough for kdump on s390 LPARs

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in makedumpfile package in Ubuntu:
  Invalid
Status in s390-tools package in Ubuntu:
  Fix Released
Status in zipl-installer package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Xenial:
  Invalid
Status in s390-tools source package in Xenial:
  Fix Released
Status in zipl-installer source package in Xenial:
  Fix Released

Bug description:
  == Comment: #0 - Michael Holzheu <michael.holzheu@xxxxxxxxxx> - 2016-03-31 10:59:26 ==
  With the current Ubuntu default setting "crashkernel=128M" kdump on LPARs crashes with out-of-memory (see attachment "dmesg_lpar_out_of_mem_128M.txt").

  On z/VM guests 128M seems to be sufficient.

  One reason on our test LPAR is that a lot of devices are attached (see
  attachment "lscss_lpar.txt") which are not required for kdump but
  consume a lot of memory because the s390 CIO layer allocates data
  structures in the kernel for those devices.

  We can disable the devices by using the "cio_ignore=" kernel parameter
  in "/etc/default/kdump-tools". For example, on our LPAR that uses DASD
  0.0.e934 for /var/crash, we added the following line to disable the
  devices:

  KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 cio_ignore=all,!condev,!0.0.e934"

  For more information on the "cio_ignore=" kernel parameter see:
  https://github.com/torvalds/linux/blob/master/Documentation/s390/CommonIO
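
As an illustration of how the statement above is put together, here is a small sketch that builds the same value from a list of device bus IDs that kdump still needs. The helper names and the bus-ID pattern are assumptions made for this example, not part of kdump-tools or s390-tools:

```python
import re

# Assumed bus-ID shape "<cssid>.<ssid>.<devno>", e.g. "0.0.e934".
BUS_ID = re.compile(r"^[0-9a-f]+\.[0-9a-f]\.[0-9a-f]{4}$")


def build_cio_ignore(required_devices, keep_console=True):
    """Build a cio_ignore= parameter ignoring all but the given devices."""
    parts = ["all"]
    if keep_console:
        parts.append("!condev")  # keep the console device usable
    for dev in required_devices:
        dev = dev.lower()
        if not BUS_ID.match(dev):
            raise ValueError(f"not a device bus ID: {dev!r}")
        parts.append("!" + dev)  # "!" excludes the device from the ignore list
    return "cio_ignore=" + ",".join(parts)


def build_kdump_append(required_devices, base="irqpoll maxcpus=1"):
    """Return the full KDUMP_CMDLINE_APPEND line (hypothetical helper)."""
    return f'KDUMP_CMDLINE_APPEND="{base} {build_cio_ignore(required_devices)}"'
```

Called as build_kdump_append(["0.0.e934"]), this yields the line shown above for the LPAR that keeps /var/crash on DASD 0.0.e934.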

  Even with "cio_ignore=" we still get out-of-memory with
  "crashkernel=128M".

  With "crashkernel=196M" and "cio_ignore=" we are able to create a dump
  on our LPAR. We currently do not know why kdump with "cio_ignore=" on
  LPAR consumes more memory than on z/VM guests.

  == Comment: #1 - Michael Holzheu <michael.holzheu@xxxxxxxxxx> - 2016-03-31 11:03:15 ==
  Kernel messages of kdump out-of-memory crash on LPAR with many devices without cio_ignore parameter and 128M crashkernel memory.

  == Comment: #2 - Michael Holzheu <michael.holzheu@xxxxxxxxxx> - 2016-03-31 11:04:10 ==
  Output of lscss showing all attached (not online) devices on the LPAR.

  == Comment: #3 - Michael Holzheu <michael.holzheu@xxxxxxxxxx> - 2016-03-31 11:07:35 ==
  To solve this issue our recommendation is:

  1) Increase "crashkernel=" default to 196M on Ubuntu for s390.

  2) Document that KDUMP_CMDLINE_APPEND with "cio_ignore=" can be used
  to decrease memory consumption for kdump on systems with many devices
  that are not required for kdump.

  The most user friendly solution would be to automatically determine
  the required kdump devices and set the correct "cio_ignore=" kernel
  parameter. But this is not trivial, because it can be difficult to
  find out the required devices for stacked setups like LVM or for
  network dump.
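
To illustrate why stacked setups complicate this, the sketch below resolves a stacked block device down to the devices at the bottom of the stack. It assumes the stacking relation is already available as a mapping from each device to the devices directly beneath it (on Linux this could be collected from /sys/block/<dev>/slaves). The function name and device names are made up for the example, and mapping the resulting devices to bus IDs, as well as the network dump case, is not covered:

```python
def leaf_devices(slaves, top):
    """Return the bottom-level devices that `top` is stacked on.

    slaves: dict mapping a device name to the list of devices directly
            beneath it; devices without an entry count as bottom-level.
    """
    leaves = set()
    seen = set()
    stack = [top]
    while stack:
        dev = stack.pop()
        if dev in seen:
            continue  # shared lower device already visited
        seen.add(dev)
        below = slaves.get(dev, [])
        if below:
            stack.extend(below)  # keep descending through the stack
        else:
            leaves.add(dev)
    return leaves
```

For a hypothetical mapping {"dm-0": ["dm-1", "dasdb1"], "dm-1": ["dasda1", "dasdb1"]}, resolving "dm-0" gives the two partitions dasda1 and dasdb1, so both underlying DASDs would have to be excluded from "cio_ignore=".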

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1564475/+subscriptions