kernel-packages team mailing list archive - Message #103016
[Bug 320638] Re: hot-add/remove in mixed (IDE/SATA/USB/SD-card/...) RAIDs with device mapper on top => data corruption (bio too big device md0 (248 > 240))
RAID is *not* a backup solution. If you delete or overwrite a file,
it happens on both disks, so you can't recover it. If you want a rapid
and coherent backup, use LVM, take a snapshot, and back that up.
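A minimal sketch of such a snapshot backup (the volume group "vg0", logical volume "root", and backup paths are placeholders, not taken from this report):
$ sudo lvcreate --size 1G --snapshot --name rootsnap /dev/vg0/root    # freeze a point-in-time copy
$ sudo mkdir -p /mnt/rootsnap && sudo mount -o ro /dev/vg0/rootsnap /mnt/rootsnap
$ sudo rsync -a /mnt/rootsnap/ /backup/root/                          # copy the coherent snapshot elsewhere
$ sudo umount /mnt/rootsnap && sudo lvremove -y /dev/vg0/rootsnap     # drop the snapshot when done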
Also note that this commentary really isn't helping to fix the bug.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/320638
Title:
hot-add/remove in mixed (IDE/SATA/USB/SD-card/...) RAIDs with device
mapper on top => data corruption (bio too big device md0 (248 > 240))
Status in The Linux Kernel:
Confirmed
Status in mdadm - Tool for managing Linux software RAID arrays:
Confirmed
Status in debian-installer package in Ubuntu:
Invalid
Status in linux package in Ubuntu:
Won't Fix
Status in mdadm package in Ubuntu:
Confirmed
Status in ubiquity package in Ubuntu:
Invalid
Bug description:
Problem: md changes the max_sectors setting of an already running and
busy md device when a (hotpluggable) member is added or removed.
However, the device-mapper and filesystem layers on top of the RAID
cannot (always?) cope with that.
Observations:
* "bio too big device mdX (248 > 240)" messages in the syslog
* read/write errors (some dropped silently, no noticable errors reported during operation, until things like dhcpclient looses its IP etc.)
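The numbers in the message appear to be 512-byte sectors, so here a 248-sector (124 KiB) bio is being submitted against a 240-sector (120 KiB) limit. The per-device limits can be compared through sysfs; the device names below are only examples for a SATA member, a USB/card-reader member, and the array itself:
$ cat /sys/block/sda/queue/max_sectors_kb    # SATA member
$ cat /sys/block/sdb/queue/max_sectors_kb    # USB card-reader member, typically smaller
$ cat /sys/block/md0/queue/max_sectors_kb    # what md currently advertises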
Expected:
Adding and removing members of a running RAID (hotplugging) should not change the RAID device's characteristics. If a new member supports only smaller max_sectors values, buffer and split the data stream until the RAID device can be set up from a clean state with a more appropriate max_sectors value. To avoid buffering and splitting in the future, md could save the smallest max_sectors value of the known members in the superblock and use that when setting up the RAID, even if that member is not present.
Note: This is reproducible in much more common scenarios than the
original reporter's (e.g. --add a USB drive (USB 3.0 these days) to an
already running SATA RAID1 and grow the number of devices).
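For illustration, that reproduction amounts to something like the following on a mounted, busy array (device names are examples only):
$ sudo mdadm /dev/md0 --add /dev/sdc1             # e.g. a USB disk with a smaller max_sectors limit
$ sudo mdadm --grow /dev/md0 --raid-devices=3     # grow the mirror onto the new member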
Fix:
Upstream has no formal bug tracking, but it does have a mailing list. The response was that ultimately this needs to be "fixed [outside of mdadm] by cleaning up the bio path so that big bios are split by the device that needs the split, not by the fs sending the bio."
However, in the meantime mdadm needs to safeguard against the data
corruption (a manual pre-check sketch follows the quoted proposal below):
> > [The mdadm] fix is to reject the added device [if] its limits are
> > too low.
>
> Good idea to avoid the data corruption. MD could save the
> max_sectors default limit for arrays. If the array is modified and the new
> limit gets smaller, postpone the sync until the next assemble/restart.
>
> And of course print a message if postponing that explains when --force would be safe.
> Whatever that would be: no block device abstraction layer (device mapper, lvm, luks, ...)
> between an unmounted? ext, fat?, ...? filesystem and md?
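Until something like that lands in mdadm itself, a manual pre-check along these lines is possible. This is only a sketch under the assumptions above; the script and device names are hypothetical and this is not an mdadm feature:
#!/bin/sh
# safe-add.sh: refuse to --add a member whose max_sectors_kb is below the array's current limit
NEW=${1:?usage: safe-add.sh <partition, e.g. sdc1> [md-device]}
MD=${2:-md0}
disk=${NEW%%[0-9]*}                                    # strip the partition number, e.g. sdc1 -> sdc
new_limit=$(cat /sys/block/$disk/queue/max_sectors_kb)
md_limit=$(cat /sys/block/$MD/queue/max_sectors_kb)
if [ "$new_limit" -lt "$md_limit" ]; then
    echo "refusing: /dev/$NEW limit ${new_limit}k < /dev/$MD limit ${md_limit}k" >&2
    exit 1
fi
exec mdadm /dev/$MD --add /dev/$NEW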
As upstream does not do public bug tracking, the status of this
request, and whether it will be remembered, remains uncertain.
---
This is on an MSI Wind U100 and I've got the following stack running:
HDD & SD card (USB card reader) -> RAID1 -> LUKS -> LVM -> Reiser
Whenever I remove the HDD from the RAID1
> mdadm /dev/md0 --fail /dev/sda2
> mdadm /dev/md0 --remove /dev/sda2
for power-saving reasons, I cannot run any apt-related tools.
> sudo apt-get update
[...]
Hit http://de.archive.ubuntu.com intrepid-updates/multiverse Sources
Reading package lists... Error!
E: Read error - read (5 Input/output error)
E: The package lists or status file could not be parsed or opened.
A look at the kernel log shows (with many more of these messages above):
> dmesg|tail
[ 9479.330550] bio too big device md0 (248 > 240)
[ 9479.331375] bio too big device md0 (248 > 240)
[ 9479.332182] bio too big device md0 (248 > 240)
[ 9611.980294] bio too big device md0 (248 > 240)
[ 9742.929761] bio too big device md0 (248 > 240)
[ 9852.932001] bio too big device md0 (248 > 240)
[ 9852.935395] bio too big device md0 (248 > 240)
[ 9852.938064] bio too big device md0 (248 > 240)
[ 9853.081046] bio too big device md0 (248 > 240)
[ 9853.081688] bio too big device md0 (248 > 240)
$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 00.90
Creation Time : Tue Jan 13 11:25:57 2009
Raid Level : raid1
Array Size : 3871552 (3.69 GiB 3.96 GB)
Used Dev Size : 3871552 (3.69 GiB 3.96 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Fri Jan 23 21:47:35 2009
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 89863068:bc52a0c0:44a5346e:9d69deca (local to host m-twain)
Events : 0.8767
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync writemostly   /dev/sdb1
$ sudo ubuntu-bug -p linux-meta
dpkg-query: failed in buffer_read(fd): copy info file `/var/lib/dpkg/status': Input/output error
dpkg-query: failed in buffer_read(fd): copy info file `/var/lib/dpkg/status': Input/output error
[...]
Will provide separate attachments.
To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/320638/+subscriptions