group.of.nepali.translators team mailing list archive

[Bug 1850540] Re: multi-zone raid0 corruption

To: group.of.nepali.translators@xxxxxxxxxxxxxxxxxxx
From: Launchpad Bug Tracker <1850540@xxxxxxxxxxxxxxxxxx>
Date: Wed, 04 Dec 2019 18:56:15 -0000
Reply-to: Bug 1850540 <1850540@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
This bug was fixed in the package mdadm - 4.1-4ubuntu1

---------------
mdadm (4.1-4ubuntu1) focal; urgency=medium

  [ dann frazier ]
  * Merge from Debian unstable.  Remaining changes:
    - Ship finalrd hook.
    - Do not install mdadm-shutdown.service on Ubuntu.
    - Drop broken and unused init scripts in favor of native systemd units,
      which can cause failure to reconfigure mdadm package under certain
      confinement types.
    - Drop /etc/cron.d/mdadm and migrate to systemd mdcheck_start|continue
      timer units.
    - Drop /etc/cron.daily/mdadm and migrate to systemd mdmonitor-oneshot
      timer unit.
    - mdcheck_start.timer configures the mdcheck on the first Sunday of the
      month, with a randomized start delay of up to 24h, and runs for at
      most 6h. mdcheck_continue.timer kicks off daily, with a randomized
      start delay of up to 12h, and continues mdcheck for at most 6h.
    - mdmonitor-oneshot.timer runs daily, with a randomized start delay of
      up to 24h.
    - One can use systemd drop-ins to change .timer unit timings, set
      environment variables to decrease/increase the length of checking,
      or start the checks by hand. Previously used checkarray is still
      available, albeit not used by timer units.
    - Above ensures that previous daily / monthly checks are performed, but
      are randomized, such that performance is not as impacted across a
      cluster of machines.
  * Honor the debconf daily autoscan setting in the systemd timer.

  [ Guilherme G. Piccoli ]
  * Introduce "broken" state for RAID0/Linear in mdadm (LP: #1847924)

 -- dann frazier <dannf@xxxxxxxxxx>  Wed, 04 Dec 2019 07:05:07 -0700

** Changed in: mdadm (Ubuntu Focal)
       Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1850540

Title:
  multi-zone raid0 corruption

Status in Release Notes for Ubuntu:
  New
Status in linux package in Ubuntu:
  Confirmed
Status in mdadm package in Ubuntu:
  Fix Released
Status in linux source package in Precise:
  New
Status in mdadm source package in Precise:
  New
Status in linux source package in Trusty:
  Confirmed
Status in mdadm source package in Trusty:
  Confirmed
Status in linux source package in Xenial:
  Confirmed
Status in mdadm source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in mdadm source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in mdadm source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Confirmed
Status in mdadm source package in Eoan:
  Confirmed
Status in linux source package in Focal:
  Confirmed
Status in mdadm source package in Focal:
  Fix Released
Status in mdadm package in Debian:
  Fix Released

Bug description:
  Bug 1849682 tracks the temporary revert of the fix for this issue,
  while this bug tracks the re-application of that fix once we have a
  full solution.

  Fix checklist:
  [ ] Restore c84a1372df929 md/raid0: avoid RAID0 data corruption due to layout confusion.
  [ ] Also apply these fixes:
      33f2c35a54dfd md: add feature flag MD_FEATURE_RAID0_LAYOUT
      3874d73e06c9b md/raid0: fix warning message for parameter default_layout
  [ ] If upstream, include https://marc.info/?l=linux-raid&m=157239231220119&w=2
  [ ] mdadm update (see Comment #2)
  [ ] Packaging work to detect/aid admin before reboot

  Users of RAID0 arrays are susceptible to a corruption issue if:
   - The members of the RAID array are not all the same size[*]
   - Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
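
  The two conditions above can be written down as a small check. This is
  only a sketch: the function names, and the idea of passing in member
  sizes and the kernel versions that have written to the array, are
  illustrative assumptions and not part of mdadm or the kernel.

```python
import re


def kernel_tuple(version):
    """Reduce a version string such as '3.13.0-24-generic' to (3, 13)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_multi_zone(member_sizes):
    """An array whose members differ in size has more than one zone."""
    return len(set(member_sizes)) > 1


def may_be_corrupted(member_sizes, kernels_that_wrote):
    """True when both conditions from the bug description hold."""
    versions = [kernel_tuple(v) for v in kernels_that_wrote]
    wrote_pre_314 = any(v < (3, 14) for v in versions)
    wrote_post_314 = any(v >= (3, 14) for v in versions)
    return is_multi_zone(member_sizes) and wrote_pre_314 and wrote_post_314
```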

  This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
  https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

  That change has been applied to stable, but we reverted it to fix
  1849682 until we have a full solution ready.

  To summarize, upstream is dealing with this by adding a versioned
  layout in v5.4, and that is being backported to stable kernels - which
  is why we're now seeing it. Layout version 1 is the pre-3.14 layout,
  version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause
  corruption. However, until an mdadm exists that is able to set a
  layout in the array, there's no way for the kernel to know which
  version(s) was used to write the existing data. This undefined mode is
  considered "Version 0", and the kernel will now refuse to start these
  arrays w/o user intervention.
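
  As a rough model of the versioning described above, the sketch below
  encodes the three layout values and the refusal to start a multi-zone
  array whose layout is still undefined. The enum member names and the
  function are hypothetical; they only mirror this description and are
  not the kernel's own identifiers.

```python
from enum import IntEnum


class Raid0Layout(IntEnum):
    UNDEFINED = 0   # "Version 0": not known which layout wrote the data
    PRE_3_14 = 1    # layout used by kernels before v3.14
    POST_3_14 = 2   # layout used by kernels from v3.14 onward


def kernel_will_start(multi_zone, layout):
    """Model the refusal described above for undefined multi-zone arrays."""
    layout = Raid0Layout(layout)
    if multi_zone and layout is Raid0Layout.UNDEFINED:
        return False  # user must pick layout 1 or 2 first
    return True
```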

  The user experience is pretty awful here. A user upgrades to the next
  SRU and all of a sudden their system stops at an (initramfs) prompt. A
  clueful user can spot the following message in dmesg, although, as you
  can see from the log in Comment #1, it is hidden in a ton of other
  messages:

  [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
  [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
  [ 72.733979] md: pers->run() failed ...
  mdadm: failed to start array /dev/md0: Unknown error 524

  What that is trying to say is that you should determine if your data -
  specifically the data toward the end of your array - was most likely
  written with a pre-3.14 or post-3.14 kernel. Based on that, reboot
  with raid0.default_layout=1 (pre-3.14) or raid0.default_layout=2
  (post-3.14) on the kernel command line. Note that the parameter is
  *raid0.default_layout*, not *raid.default_layout* as the message
  says - a fix for that message is now queued for stable:

  https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
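
  Since the message is easy to miss, here is a minimal sketch of how one
  might pick it out of saved dmesg output and build the correctly
  spelled parameter. The regular expression is taken from the log lines
  quoted above; the function names are made up for illustration.

```python
import re

# Matches the failure line quoted in this bug and captures the array name.
FAILURE = re.compile(
    r"md/raid0:(\w+): cannot assemble multi-zone RAID0 "
    r"with default_layout setting"
)


def affected_arrays(dmesg_text):
    """Return the names of arrays the kernel refused to assemble."""
    return sorted(set(FAILURE.findall(dmesg_text)))


def layout_parameter(layout):
    """Build the kernel command line parameter for layout 1 or 2."""
    if layout not in (1, 2):
        raise ValueError("layout must be 1 (pre-3.14) or 2 (post-3.14)")
    return f"raid0.default_layout={layout}"
```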

  IMHO, we should work with upstream to create a web page that clearly
  walks the user through this process, and update the error message to
  point to that page. I'd also like to see if we can detect this problem
  *before* the user reboots (debconf?) and help the user fix things.
  e.g. "We detected that you have RAID0 arrays that may be susceptible to
  a corruption problem", guide the user to choosing a layout, and update
  the mdadm initramfs hook to poke the answer in via sysfs before
  starting the array on reboot.
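
  A minimal sketch of the "poke the answer in via sysfs" step follows.
  It assumes the raid0 module exposes its default_layout parameter at
  /sys/module/raid0/parameters/default_layout and that it is writable by
  root; that path and the function name are assumptions made for
  illustration, not something the proposed hook is known to use.

```python
from pathlib import Path

# Assumed location of the raid0 module parameter in sysfs.
DEFAULT_LAYOUT_PARAM = Path("/sys/module/raid0/parameters/default_layout")


def set_default_layout(layout, param_path=DEFAULT_LAYOUT_PARAM):
    """Write the admin's chosen layout (1 or 2) before arrays are started."""
    if layout not in (1, 2):
        raise ValueError("layout must be 1 (pre-3.14) or 2 (post-3.14)")
    Path(param_path).write_text(f"{layout}\n")
```

  Passing param_path explicitly allows the function to be tried against
  an ordinary file without touching a live system.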

  Note that it also seems like we should investigate backporting this to
  < 3.14 kernels. Imagine a user switching between the trusty HWE kernel
  and the GA kernel.

  References from users of other distros:
  https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
  https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/

  [*] Which surprisingly is not the case reported in this bug - the user
  here had a raid0 of 8 identically-sized devices. I suspect there's a
  bug in the detection code somewhere.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-release-notes/+bug/1850540/+subscriptions
References

[Bug 1850540] [NEW] multi-zone raid0 corruption
From: dann frazier, 2019-10-29