group.of.nepali.translators team mailing list archive

[Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs

To: group.of.nepali.translators@xxxxxxxxxxxxxxxxxxx
From: Matthew Ruffell <2036467@xxxxxxxxxxxxxxxxxx>
Date: Thu, 12 Oct 2023 03:42:46 -0000
Reply-to: Bug 2036467 <2036467@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx
** Description changed:

  [Impact]
  
  This is a long-running bug plaguing cloud-images, where on rare
  occasions resize2fs would fail and the image would not resize to fit the
  entire disk.
  
  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.
  
+ $ resize2fs /dev/nvme1n1p1
+ resize2fs 1.47.0 (5-Feb-2023)
+ resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
+ Couldn't find valid filesystem superblock.
+ 
  Changing the read of the superblock to Direct I/O solves the issue.
  
  [Testcase]
  
  Start a c5.large instance on AWS and attach a 60 GB gp3 volume for use
  as a scratch disk.
  
  Run the following script, courtesy of Krister Johansen and his team:
  
-    #!/usr/bin/bash
-    set -euxo pipefail
+    #!/usr/bin/bash
+    set -euxo pipefail
  
-    while true
-    do
-            parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
-            sleep .5
-            mkfs.ext4 /dev/nvme1n1p1
-            mount -t ext4 /dev/nvme1n1p1 /mnt
-            stress-ng --temp-path /mnt -D 4 &
-            STRESS_PID=$!
-            sleep 1
-            growpart /dev/nvme1n1 1
-            resize2fs /dev/nvme1n1p1
-            kill $STRESS_PID
-            wait $STRESS_PID
-            umount /mnt
-            wipefs -a /dev/nvme1n1p1
-            wipefs -a /dev/nvme1n1
-    done
+    while true
+    do
+            parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
+            sleep .5
+            mkfs.ext4 /dev/nvme1n1p1
+            mount -t ext4 /dev/nvme1n1p1 /mnt
+            stress-ng --temp-path /mnt -D 4 &
+            STRESS_PID=$!
+            sleep 1
+            growpart /dev/nvme1n1 1
+            resize2fs /dev/nvme1n1p1
+            kill $STRESS_PID
+            wait $STRESS_PID
+            umount /mnt
+            wipefs -a /dev/nvme1n1p1
+            wipefs -a /dev/nvme1n1
+    done
  
  Test packages are available in the following PPA:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test
  
  If you install the test packages, the race no longer occurs.
  
  [Where problems could occur]
  
  We are changing how resize2fs reads the superblock from underlying
  disks.
  
  If a regression were to occur, resize2fs could fail to resize offline or
  online volumes. As all cloud-images are resized online during their
  initial boot, a regression could have a large impact on public and
  private clouds.
  
  [Other info]
  
- Upstream mailing list discussion: 
+ Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.GA5737@xxxxxxxxxxxxxxxxxx/
  https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@xxxxxxx/
  
  This was fixed upstream in the following commit:
  
  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o <tytso@xxxxxxx>
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
-  online resizes
+  online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
  
  The commit has not yet been included in a tagged release. All supported
  Ubuntu releases require this fix, and it needs to be published in the
  standard non-ESM archives to be picked up in cloud images.

** Changed in: e2fsprogs (Ubuntu Bionic)
       Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/2036467

Title:
  Resizing cloud-images occasionally fails due to superblock checksum
  mismatch in resize2fs

Status in cloud-images:
  New
Status in e2fsprogs package in Ubuntu:
  In Progress
Status in e2fsprogs source package in Trusty:
  Won't Fix
Status in e2fsprogs source package in Xenial:
  Won't Fix
Status in e2fsprogs source package in Bionic:
  Won't Fix
Status in e2fsprogs source package in Focal:
  In Progress
Status in e2fsprogs source package in Jammy:
  In Progress
Status in e2fsprogs source package in Lunar:
  In Progress
Status in e2fsprogs source package in Mantic:
  In Progress

Bug description:
  [Impact]

  This is a long-running bug plaguing cloud-images, where on rare
  occasions resize2fs would fail and the image would not resize to fit
  the entire disk.

  Online resizes would fail due to a superblock checksum mismatch, where
  the superblock in memory differs from what is currently on disk due to
  changes made to the image.

  $ resize2fs /dev/nvme1n1p1
  resize2fs 1.47.0 (5-Feb-2023)
  resize2fs: Superblock checksum does not match superblock while trying to open /dev/nvme1n1p1
  Couldn't find valid filesystem superblock.

  Changing the read of the superblock to Direct I/O solves the issue.
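
  As a rough illustration only (this is not the upstream patch, which
  changes resize2fs itself), a Direct I/O read of the primary superblock
  can be sketched from the shell. The sketch assumes GNU dd and the
  standard ext4 layout: the primary superblock starts at byte 1024 and
  its magic number 0xEF53 is stored little-endian at offset 56 within
  it. The function name read_sb_magic is hypothetical.

```shell
# read_sb_magic DEVICE [DD_IFLAG]
# Print the two ext2/3/4 magic bytes as hex ("53ef" for a valid filesystem).
# DD_IFLAG defaults to "direct", so the read bypasses the page cache.
# A full 4096-byte block is read so the request stays aligned on devices
# with either 512-byte or 4096-byte logical sectors.
read_sb_magic() {
    sb_dev=$1
    sb_flag=${2:-direct}
    dd if="$sb_dev" bs=4096 count=1 iflag="$sb_flag" 2>/dev/null |
        od -An -tx1 -j1080 -N2 |   # 1080 = 1024 (superblock) + 56 (magic)
        tr -d ' \n'
}
```

  For example, "read_sb_magic /dev/nvme1n1p1" should print 53ef on a
  valid ext4 partition. Passing a different dd input flag as the second
  argument (such as fullblock) allows the function to be tried against
  an ordinary image file on filesystems that do not support O_DIRECT.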

  [Testcase]

  Start a c5.large instance on AWS and attach a 60 GB gp3 volume for
  use as a scratch disk.

  Run the following script, courtesy of Krister Johansen and his team:

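     #!/usr/bin/bash
     # With set -e, the loop stops as soon as any command, including
     # resize2fs, fails.
     set -euxo pipefail

     while true
     do
             # Create a small partition at the start of the scratch disk.
             parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
             sleep .5
             mkfs.ext4 /dev/nvme1n1p1
             mount -t ext4 /dev/nvme1n1p1 /mnt
             # Generate filesystem activity in the background.
             stress-ng --temp-path /mnt -D 4 &
             STRESS_PID=$!
             sleep 1
             # Grow the partition, then resize the mounted filesystem online.
             growpart /dev/nvme1n1 1
             resize2fs /dev/nvme1n1p1
             kill "$STRESS_PID"
             wait "$STRESS_PID"
             # Tear down so the next iteration starts from a blank disk.
             umount /mnt
             wipefs -a /dev/nvme1n1p1
             wipefs -a /dev/nvme1n1
     done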

  Test packages are available in the following PPA:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test

  If you install the test packages, the race no longer occurs.
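
  Because the failure is a race, a single clean pass proves little. One
  way to compare behaviour before and after installing the test packages
  is to run a fixed number of iterations and count the failures. The
  helper below is a minimal sketch; run_trials is a hypothetical name
  and is not part of the original test case.

```shell
# run_trials N CMD [ARGS...]
# Run CMD N times, discarding its output, and print how many runs failed.
run_trials() {
    rt_n=$1
    shift
    rt_failures=0
    rt_i=0
    while [ "$rt_i" -lt "$rt_n" ]; do
        if ! "$@" >/dev/null 2>&1; then
            rt_failures=$((rt_failures + 1))
        fi
        rt_i=$((rt_i + 1))
    done
    echo "$rt_failures"
}
```

  If the body of the loop in the script above were saved as its own
  script, say a hypothetical ./one_iteration.sh, then
  "run_trials 100 ./one_iteration.sh" would print the number of failed
  iterations out of 100.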

  [Where problems could occur]

  We are changing how resize2fs reads the superblock from underlying
  disks.

  If a regression were to occur, resize2fs could fail to resize offline
  or online volumes. As all cloud-images are resized online during their
  initial boot, a regression could have a large impact on public and
  private clouds.
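
  One way to check for such a regression is to confirm, after a resize,
  that the filesystem size roughly matches the partition size. The
  sketch below computes the filesystem size from the "Block count" and
  "Block size" fields printed by "dumpe2fs -h". The function name
  fs_size_bytes is hypothetical, and the check is a suggestion rather
  than part of the original test case.

```shell
# fs_size_bytes
# Read "dumpe2fs -h" output on stdin and print the filesystem size in bytes
# (Block count multiplied by Block size).
fs_size_bytes() {
    fs_info=$(cat)
    fs_count=$(printf '%s\n' "$fs_info" | sed -n 's/^Block count:[[:space:]]*//p')
    fs_bsize=$(printf '%s\n' "$fs_info" | sed -n 's/^Block size:[[:space:]]*//p')
    echo $((fs_count * fs_bsize))
}
```

  The result of "dumpe2fs -h /dev/nvme1n1p1 2>/dev/null | fs_size_bytes"
  can then be compared with "blockdev --getsize64 /dev/nvme1n1p1". The
  two values need not be identical, since the filesystem size is rounded
  to whole blocks, but a large gap after a resize would indicate that
  the filesystem did not grow.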

  [Other info]

  Upstream mailing list discussion:
  https://lore.kernel.org/linux-ext4/20230605225221.GA5737@xxxxxxxxxxxxxxxxxx/
  https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@xxxxxxx/

  This was fixed upstream in the following commit:

  commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
  Author: Theodore Ts'o <tytso@xxxxxxx>
  Date: Thu, 15 Jun 2023 00:17:01 -0400
  Subject: resize2fs: use Direct I/O when reading the superblock for
   online resizes
  Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

  The commit has not yet been included in a tagged release. All supported
  Ubuntu releases require this fix, and it needs to be published in the
  standard non-ESM archives to be picked up in cloud images.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions