kernel-packages team mailing list archive

[Bug 1190295] Re: 2.6.32-47 kernel update on 10.04 breaks software RAID (+ LVM)


:: could you confirm that these would be various different kernel
versions over time, and not ONLY on the 46 to 47 update?

That's probable. IIRC, my RAID1 MD devices have been blowing up
inexplicably over the past year, and I tend to stay quite current on
patches, applying them each week regardless.

:: Have you seen this when doing any other reboots (not after system

No, it really seems related to kernel updates. The system I reported on
had been pulled out of production for hardware updates, so it was
powered off. In the time it took to replace the hardware and test the
new system, new kernel patches had come out, so I suspected things
would break as soon as I applied them.

:: do you also see the MD "degraded raid" prompt and if so how do you

No, I only ever see the "Continue to wait; or Press S to skip mounting
or M for manual recovery". Nothing about MD degraded.

:: are the device links in /dev/<vgname>/<lvname> present?
:: are the LVs listed in the output of 'lvs' and what are their state?
:: are the PVs which are backed by md0 present in 'pvs' and what are their state?

Don't remember about /dev/<vgname>. lvs, vgs and pvs all were sporadic
in output, sometimes complaining of leaked memory, sometimes displaying
my LVs and PVs. It was unstable.

:: are the volumes present in the 'dmsetup ls' output?

Never used 'dmsetup ls'.

:: what is the actual state of md0 as shown in 'cat /proc/mdstat'?

In the unstable state, after rebooting with the patched kernel and
RAID+LVM borked, mdstat would sometimes say something about the device
not existing, and then issuing it again would show "/dev/md_d0", which
is not the correct MD device. Sometimes I would have to issue
"mdadm -S /dev/md_d0" and then "mdadm --examine --scan" to restart it.
I have never seen Linux software RAID fail this badly. Hopefully it's
fixed soon, as I know from experience that MD has historically been
bulletproof. I have never had such problems before.
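
For the next occurrence, the state asked about above could be captured
in one pass from the maintenance shell. This is only a minimal sketch:
the function name collect_raid_lvm_state is made up for illustration,
it assumes a root shell, and it only calls read-only commands (missing
tools are reported rather than treated as errors).

```shell
#!/bin/sh
# Hypothetical helper, not part of mdadm or LVM: print the MD, LVM and
# device-mapper state in one go so it can be attached to the bug.
collect_raid_lvm_state() {
    echo "== /proc/mdstat =="
    cat /proc/mdstat 2>&1 || true

    for cmd in "mdadm --detail --scan" "pvs" "vgs" "lvs" "dmsetup ls"; do
        echo "== $cmd =="
        # ${cmd%% *} is the program name without its arguments.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # Left unquoted on purpose so the arguments are split.
            $cmd 2>&1 || true
        else
            echo "(not installed)"
        fi
    done
}
```

Something like "collect_raid_lvm_state > /tmp/raid-state.txt", run a
few times in a row, would also record whether the output changes
between invocations.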

You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.

  2.6.32-47 kernel update on 10.04 breaks software RAID (+ LVM)

Status in “linux” package in Ubuntu:

Bug description:
  Been running 10.04 LTS on 8 similar AMD Opteron x86_64 servers for
  several years.  The servers have been kept up-to-date with patches as
  they come out.  These servers have been running 2.6.x kernels. Each
  server has some form of Linux software RAID running on it as well as
  a 3Ware hardware RAID card using SATA disks.  Software RAID is
  configured as RAID1 on all but one server, which runs software RAID10.
  All servers had software RAID configured to use a single partition on
  each disk, of type 0xFD (Linux Software Raid Autodetect).  All
  servers were configured with LVM on top of /dev/md0.

  In the past year, mysterious problems have been happening with
  software RAID after applying system patches.  Upon reboot, the server
  is unable to mount LVM partitions on Linux software RAID and boot is
  interrupted with "Continue to wait; or Press S to skip mounting or M
  for manual recovery", requiring intervention from an operator.

  Upon pressing 'M' and logging in as root, the LVM slices on the
  software RAID partition are not mounted and sometimes appear to be
  missing from LVM.  Oftentimes pvs, vgs and lvs will complain about
  "leaking memory". Germane to the issue, LVM will sometimes show the
  problem partitions as "Active", while at other times during the same
  login they will simply be gone.  With LVM and /dev/md0 unstable, there
  is no way to discern the true state of the partitions in question.
  Starting the system from alternate boot media such as a CDROM or USB
  drive sometimes shows the software RAID and LVM in the proper state,
  which leads to suspicion of a kernel update on the afflicted system.
  Historically and subjectively, best practice in this instance seems to
  be booting from live media, starting the array in degraded mode, and
  backing up the array.
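
  The live-media procedure described above might look roughly like the
  following. This is a sketch under stated assumptions, not a tested
  recovery recipe: /dev/sda1, vg0, data and /mnt/rescue are placeholder
  names that do not come from this report, and the run and
  rescue_degraded_array helpers are made up for illustration. By default
  it only prints the commands it would run.

```shell
#!/bin/sh
# Placeholders: only /dev/md0 is taken from the report.
MD_DEV=/dev/md0          # array name used in the report
MEMBER=/dev/sda1         # hypothetical surviving member partition
VG=vg0                   # hypothetical volume group name
LV=data                  # hypothetical logical volume name
MNT=/mnt/rescue          # hypothetical mount point

# Print commands by default; set DRY_RUN=0 to really run them as root.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rescue_degraded_array() {
    # --run starts the array even if fewer members are given than it
    # had when it was last active, i.e. in degraded mode.
    run mdadm --assemble --run "$MD_DEV" "$MEMBER"
    # Activate the volume group that sits on top of the array.
    run vgchange -ay "$VG"
    # Mount read-only so the backup does not modify the filesystem.
    run mkdir -p "$MNT"
    run mount -o ro "/dev/$VG/$LV" "$MNT"
}
```

  Calling rescue_degraded_array with DRY_RUN unset lists the steps for
  review; the backup itself would then be taken from the read-only
  mount with whatever tool is normally used.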
