yahoo-eng-team team mailing list archive
Message #60039
[Bug 1626243] Re: Cloud-init fails to write ext4 filesystem to Azure Ephemeral Drive
This is fixed in cloud-init 0.7.9.
** Changed in: cloud-init
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1626243
Title:
Cloud-init fails to write ext4 filesystem to Azure Ephemeral Drive
Status in cloud-init:
Fix Released
Status in cloud-init package in Ubuntu:
Fix Released
Status in cloud-init source package in Xenial:
Fix Released
Status in cloud-init source package in Yakkety:
Fix Released
Status in cloud-init source package in Zesty:
Fix Released
Bug description:
=== Begin SRU Template ===
[Impact]
There is a race condition that occurs when cloud-init tries to partition a
block device (/dev/sdb) and then put a filesystem on a partition on it. It is
possible that cloud-init tries to run mkfs on /dev/sdb1 after partitioning the
device /dev/sdb but before the partition device node '/dev/sdb1' exists.
When this race condition occurs, cloud-init will fail to make the "ephemeral"
device available to the user on Azure.
[Test Case]
A reliable reproducing test case is hard to come by here. The failure case
is believed to be well understood.
[Regression Potential]
There should be very little chance of regression, as essentially all the fix
does is change:
1. sgdisk -n 1:0:0 /dev/sdb
2. mkfs.ext4 /dev/sdb1
to
1. sgdisk -n 1:0:0 /dev/sdb
1a udevadm settle
1b blockdev --rereadpt
1c udevadm settle
2. mkfs.ext4 /dev/sdb1
Steps '1b' and '1c' above are not strictly necessary, but were already present
in the method. They serve here as an additional wait.
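The revised sequence above can be sketched in Python as follows. This is only
an illustration of the ordering, not the cloud-init code itself: the function
name partition_then_mkfs and the injectable run parameter are hypothetical,
and the device argument passed to blockdev is an assumption (the step list
above omits it).

```python
import subprocess


def partition_then_mkfs(disk, partition, run=subprocess.check_call):
    """Hypothetical helper: partition `disk`, wait for udev, then format.

    `run` is injectable so the ordering can be checked without touching
    a real disk; by default each command is executed via subprocess.
    """
    steps = [
        ["sgdisk", "-n", "1:0:0", disk],    # 1.  create partition 1
        ["udevadm", "settle"],              # 1a. wait for udev to process events
        ["blockdev", "--rereadpt", disk],   # 1b. re-read the partition table
        ["udevadm", "settle"],              # 1c. wait again
        ["mkfs.ext4", partition],           # 2.  create the filesystem
    ]
    for cmd in steps:
        run(cmd)
    return steps
```

Running it for real would look like partition_then_mkfs("/dev/sdb", "/dev/sdb1")
as root; mkfs is only reached after udev has had a chance to create the
partition device node.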
[Other Info]
The change that fixes this is viewable at [1]. For context, view all of
cc_disk_setup.py [2]. The fix adds a call to read_parttbl [3] in
exec_mkpart_gpt after invoking the sgdisk command that partitions a disk.
Among other things, read_parttbl runs udevadm settle, which fixes the race
condition that was seen.
[1] https://git.launchpad.net/cloud-init/commit/?id=29348af1c889931e8973f8fc8cb090c063316f7a
[2] https://git.launchpad.net/cloud-init/tree/cloudinit/config/cc_disk_setup.py?id=29348af1c889931e8973f8fc8cb090c063316f7a
[3] https://git.launchpad.net/cloud-init/tree/cloudinit/config/cc_disk_setup.py?id=29348af1c889931e8973f8fc8cb090c063316f7a#n674
=== End SRU Template ===
The symptom is similar to bug 1611074 but the cause is different. In
this case it seems there is an error accessing /dev/sdb1 when lsblk is
run, possibly because sgdisk isn't done creating the partition. The
specific error message is "/dev/sdb1: not a block device." A simple
wait and retry here may resolve the issue.
util.py[DEBUG]: Running command ['/sbin/sgdisk', '-p', '/dev/sdb'] with allowed return codes [0] (shell=False, capture=True)
cc_disk_setup.py[DEBUG]: Device partitioning layout matches
util.py[DEBUG]: Creating partition on /dev/disk/cloud/azure_resource took 0.056 seconds
cc_disk_setup.py[DEBUG]: setting up filesystems: [{'filesystem': 'ext4', 'device': 'ephemeral0.1', 'replace_fs': 'ntfs'}]
cc_disk_setup.py[DEBUG]: ephemeral0.1 is mapped to disk=/dev/disk/cloud/azure_resource part=1
cc_disk_setup.py[DEBUG]: Creating new filesystem.
cc_disk_setup.py[DEBUG]: Checking /dev/sdb against default devices
cc_disk_setup.py[DEBUG]: Manual request of partition 1 for /dev/sdb1
cc_disk_setup.py[DEBUG]: Checking device /dev/sdb1
util.py[DEBUG]: Running command ['/sbin/blkid', '-c', '/dev/null', '/dev/sdb1'] with allowed return codes [0, 2] (shell=False, capture=True)
cc_disk_setup.py[DEBUG]: Device /dev/sdb1 has None None
cc_disk_setup.py[DEBUG]: Device /dev/sdb1 is cleared for formating
cc_disk_setup.py[DEBUG]: File system None will be created on /dev/sdb1
util.py[DEBUG]: Running command ['/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/sdb1', '--nodeps'] with allowed return codes [0] (shell=False, capture=True)
util.py[DEBUG]: Creating fs for /dev/disk/cloud/azure_resource took 0.008 seconds
util.py[WARNING]: Failed during filesystem operation#012Failed during disk check for /dev/sdb1#012Unexpected error while running command.#012Command: ['/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/sdb1', '--nodeps']#012Exit code: 32#012Reason: -#012Stdout: ''#012Stderr: 'lsblk: /dev/sdb1: not a block device\n'
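The "wait and retry" idea mentioned in the description could look roughly
like the sketch below. It is not the fix that was released (that fix relies on
udevadm settle); the function name wait_for_block_device and its timeout and
interval defaults are hypothetical.

```python
import os
import stat
import time


def wait_for_block_device(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists and is a block device, or give up.

    Returns True if the node appeared within `timeout` seconds,
    False otherwise. Hypothetical helper, not part of cloud-init.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISBLK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # node not created yet; keep waiting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would check wait_for_block_device("/dev/sdb1") before running lsblk
or mkfs, and treat a False result as the "not a block device" failure.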
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1626243/+subscriptions