touch-packages team mailing list archive

[Bug 360237] Re: cannot boot root on lvm2 with (largish) snapshot

 

*** This bug is a duplicate of bug 1396213 ***
    https://bugs.launchpad.net/bugs/1396213

** This bug is no longer a duplicate of bug 995645
   udevd: timeout: killing 'watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y''
** This bug has been marked a duplicate of bug 1396213
   LVM VG is not activated during system boot

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to lvm2 in Ubuntu.
https://bugs.launchpad.net/bugs/360237

Title:
  cannot boot root on lvm2 with (largish) snapshot

Status in lvm2 - Logical Volume Manager:
  Invalid
Status in lvm2 package in Ubuntu:
  Confirmed

Bug description:
  When running root-on-lvm2 with the root being part of a vg that
  contains snapshotted volume(s), booting may fail if the snapshot size
  (or fill rate) grows over a certain size.

  The kernel will only wait a certain amount of time before dropping to an initramfs shell, reporting 'Gave up waiting for root device' as the reason. The hard disk activity indicator will still show lots of disk access for some more time (minutes in our case).
  'ps | grep lvm' reveals that 'lvchange -a y' is taking a long time to complete.
  Waiting for the disk activity to die down and then exiting the shell allows the boot to resume normally.
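
  (For illustration, the checks from the (initramfs) prompt boil down to something like this; a sketch only, the exact busybox/lvm invocations may differ per initramfs.)

  ps | grep lvm       # shows the 'lvm lvchange -a y' invocation still running
  /sbin/lvm lvs       # the Snap%/Data% column shows how full each snapshot is
  exit                # once the disk activity has died down, this resumes the boot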

  Can anybody explain *why* it could take that long - each and every time?
  ==================== DETAILS
  More specifically, my volume group contained an Intrepid root partition of 20Gb (15Gb filled). I created a snapshot of this (18Gb) in order to 'sandbox upgrade' to Jaunty beta, which I did. I was kind of surprised by the volume of changes on the snapshot: it was 53% full after the upgrade (so about 9Gb of changed blocks vs. Intrepid). On reboot, I found out that the system would not come up by itself.
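
  (Roughly, with made-up volume and group names, the snapshot and the fill check amount to the following; older lvm2 labels the fill column Snap%, newer releases Data%.)

  lvcreate -s -L 18G -n jaunty_sandbox /dev/vg0/intrepid_root  # 18Gb snapshot of the 20Gb root lv
  lvs vg0                                                      # fill column read 53% after the upgrade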

  I spent a long time digging through various seemingly related bugs
  (#290153, #332270) looking for a cause until I found the culprit
  myself. I have not been able to find any (lvm) documentation warning
  that lvm operations might take several minutes (?!) to complete on
  snapshotted volumes.

  At the very least this warrants a CAPITAL RED warning flag in the
  docs, IMHO: using large snapshots can render a system with root on
  lvm effectively unbootable from remote. Manual intervention at the
  console/serial is required!
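
  (A possible stop-gap, untested and with a placeholder device name: let the initramfs wait longer before it gives up on the root device, via the rootdelay= kernel parameter, e.g. in grub legacy's menu.lst followed by update-grub.)

  # rootdelay is in seconds, so this waits up to 5 minutes:
  # kopt=root=/dev/mapper/vg0-root ro rootdelay=300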

  1. [untested] it doesn't matter whether the root volume is actually a snapshot or an origin, as long as the volume group contains said snapshot (in my case, Intrepid was on the origin and Jaunty on the snapshot; both systems failed to boot with the same symptoms).
  2. [untested] if the root volume is in a separate vg, things might work ok (assuming several volume groups can be activated in parallel)
  3. [partially tested: a freshly snapshotted system booted ok] I suspect nothing is wrong with an 18Gb snapshot as such; the problem only appears once it fills up ('exception blocks' get used). However, there is not much use in an 18Gb snapshot if you are only allowed to use small parts of it.
  4. [tested] as a side note, once booted, both systems were reasonably performant (at least in responsiveness)
  5. [tested] other volume groups in the same server did not suffer noticeable performance penalties even when the problematic one performed badly
  6. [tested] performance was back to normal (acceptable), even though my 40Gb Home lv still featured a (largely unaltered) snapshot of 5Gb/6% in the _same_ volume group (see the lvs sketch right after this list). This indicates that nothing is actually wrong/corrupted in the vg metadata.
  7. [tested] rebooting/shutting down seemed flawed when initiated from Jaunty (but it seems unrelated to me: Jaunty appears to use kexec for reboot, causing problems with my terminal display and harddisk spin-down as well)
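
  (For reference, the fill rates quoted above can be read off with a plain lvs; a sketch, since column naming varies between lvm2 versions.)

  lvs    # the Snap% column (Data% in newer lvm2) shows how full each snapshot is, e.g. 6% for the Home one above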

  -------------- executed from the initramfs shell:
  /sbin/lvm vgchange -a n
  /sbin/lvm vgchange -a y # takes several minutes to complete (disk light on continuously)
  /sbin/lvm vgchange -a n
  /sbin/lvm vgchange -a y # again, same agony (continuous period of solid activity)

  /sbin/lvm lvremove vg/largish_snapshot
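  # (removing the largish snapshot is what brings activation time back down, see below)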

  /sbin/lvm vgchange -a n
  /sbin/lvm vgchange -a y # only takes seconds

  Of course I made a backup of the data in the snapshot that I actually
  wanted to keep :)

To manage notifications about this bug go to:
https://bugs.launchpad.net/lvm2/+bug/360237/+subscriptions