group.of.nepali.translators team mailing list archive
-
group.of.nepali.translators team
-
Mailing list archive
-
Message #46022
[Bug 2036467] Re: Resizing cloud-images occasionally fails due to superblock checksum mismatch in resize2fs
trusty and xenial receive bug updates via Pro and not via the main
archive anymore, you'll have to get SEG to add bug tasks for Pro and
prepare +esm updates with them.
** Changed in: e2fsprogs (Ubuntu Trusty)
Status: In Progress => Won't Fix
** Changed in: e2fsprogs (Ubuntu Xenial)
Status: In Progress => Won't Fix
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/2036467
Title:
Resizing cloud-images occasionally fails due to superblock checksum
mismatch in resize2fs
Status in cloud-images:
New
Status in e2fsprogs package in Ubuntu:
In Progress
Status in e2fsprogs source package in Trusty:
Won't Fix
Status in e2fsprogs source package in Xenial:
Won't Fix
Status in e2fsprogs source package in Bionic:
In Progress
Status in e2fsprogs source package in Focal:
In Progress
Status in e2fsprogs source package in Jammy:
In Progress
Status in e2fsprogs source package in Lunar:
In Progress
Status in e2fsprogs source package in Mantic:
In Progress
Bug description:
[Impact]
This is a long running bug plaguing cloud-images, where on a rare
occasion resize2fs would fail and the image would not resize to fit
the entire disk.
Online resizes would fail due to a superblock checksum mismatch, where
the superblock in memory differs from what is currently on disk due to
changes made to the image.
Changing the read of the superblock to Direct I/O solves the issue.
[Testcase]
Start an c5.large instance on AWS, and attach a 60gb gp3 volume for
use as a scratch disk.
Run the following script, courtesy of Krister Johansen and his team:
#!/usr/bin/bash
set -euxo pipefail
while true
do
parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
sleep .5
mkfs.ext4 /dev/nvme1n1p1
mount -t ext4 /dev/nvme1n1p1 /mnt
stress-ng --temp-path /mnt -D 4 &
STRESS_PID=$!
sleep 1
growpart /dev/nvme1n1 1
resize2fs /dev/nvme1n1p1
kill $STRESS_PID
wait $STRESS_PID
umount /mnt
wipefs -a /dev/nvme1n1p1
wipefs -a /dev/nvme1n1
done
Test packages are available in the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/lp2036467-test
If you install the test packages, the race no longer occurs.
[Where problems could occur]
We are changing how resize2fs reads the superblock from underlying
disks.
If a regression were to occur, resize2fs could fail to resize offline
or online volumes. As all cloud-images are online resized during their
initial boot, this could have a large impact to public and private
clouds should a regression occur.
[Other info]
Upstream mailing list discussion:
https://lore.kernel.org/linux-ext4/20230605225221.GA5737@xxxxxxxxxxxxxxxxxx/
https://lore.kernel.org/linux-ext4/20230609042239.GA1436857@xxxxxxx/
This was fixed in the below commit upstream:
commit 43a498e938887956f393b5e45ea6ac79cc5f4b84
Author: Theodore Ts'o <tytso@xxxxxxx>
Date: Thu, 15 Jun 2023 00:17:01 -0400
Subject: resize2fs: use Direct I/O when reading the superblock for
online resizes
Link: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84
The commit has not been tagged to any release. All supported Ubuntu
releases require this fix, and need to be published in standard non-
ESM archives to be picked up in cloud images.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/2036467/+subscriptions