
yahoo-eng-team team mailing list archive

[Bug 1626243] Re: Cloud-init fails to write ext4 filesystem to Azure Ephemeral Drive

 

This bug was fixed in the package cloud-init - 0.7.8-49-g9e904bb-0ubuntu1~16.10.1

---------------
cloud-init (0.7.8-49-g9e904bb-0ubuntu1~16.10.1) yakkety; urgency=medium

  * debian/cloud-init.templates: enable DigitalOcean by default [Ben Howard]
  * debian/cloud-init.postinst: update /etc/fstab on Azure to fix
    future resize operations. (LP: #1611074)
  * New upstream snapshot.
    - systemd/cloud-init-local.service:
      + replace 'Wants' and 'After' on local-fs.target with more granular
        After=systemd-remount-fs.service and RequiresMountsFor=/var/lib
        and Before=sysinit.target.
        This is done to run early enough to update /etc/fstab.
        (LP: #1611074)
    - systemd/cloud-init.service:
      + add Before=sysinit.target and DefaultDependencies=no (LP: #1611074)
      + drop Requires=networking.service to work where networking.service is
        not needed.
      + add Conflicts=shutdown.target
      + drop unnecessary Wants=local-fs.target
    - net: support reading ipv6 dhcp config from initramfs [LaMont Jones]
      (LP: #1621615)
    - dmidecode: Allow dmidecode to be used on aarch64, and only attempt
      usage on x86, x86_64, and aarch64. [Robert Schweikert]
    - disk-config: udev settle after partitioning in gpt format.
      (LP: #1626243)
    - Add support for snap create-user on Ubuntu Core images. [Ryan Harper]
      (LP: #1619393)
    - Fix sshd restarts for rhel distros. [Jim Gorz]
    - Move user/group functions to new ug_util file [Joshua Harlow]
    - update Gentoo initscripts to run in the correct order [Matthew Thode]
    - MAAS: improve the debugging tool in datasource to consider
      config provided on kernel cmdline.
    - DataSources:
      + Ec2: protect against non-dictionary in block-device-mapping.
      + AliYun: Add new datasource for Ali-Cloud ECS, that is
        available but not enabled by default [kaihuan.pkh]
      + OpenNebula: replace parsing of 'ip' command with similar function
        available in cloudinit.net.  This fixed unit tests when running
        in an environment with no networking.
    - doc changes:
      + Add documentation on stages of boot.
      + make the RST files consistently formatted, and other improvements.
      + fixed example to not overwrite /etc/hosts [Chris Glass]
      + fix spelling / typos in ca_certs and scripts_vendor.
      + improve HACKING.rst file
      + Add documentation for logging features. [Wesley Wiedenmeier]
    - code style and unit test changes:
      + pep8: fix style errors reported by pycodestyle 2.1.0
      + pyflakes: fix issue with pyflakes 1.3 found in ubuntu zesty-proposed.
      + Add coverage dependency to bddeb to fix package build.
      + Add coverage collection to tox unit tests. [Joshua Powers]
      + do not read system /etc/cloud/cloud.cfg.d (LP: #1635350)
      + tests: silence the Cheetah UserWarning about NameMapper C version.
      + Fix python2.6 things found running in centos 6.

 -- Scott Moser <smoser@xxxxxxxxxx>  Tue, 22 Nov 2016 17:04:36 -0500

** Changed in: cloud-init (Ubuntu Yakkety)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1626243

Title:
  Cloud-init fails to write ext4 filesystem to Azure Ephemeral Drive

Status in cloud-init:
  Fix Committed
Status in cloud-init package in Ubuntu:
  Fix Released
Status in cloud-init source package in Xenial:
  Fix Released
Status in cloud-init source package in Yakkety:
  Fix Released
Status in cloud-init source package in Zesty:
  Fix Released

Bug description:
  === Begin SRU Template ===
  [Impact]
  There is a race condition that occurs when cloud-init tries to partition a
  block device (/dev/sdb) and then put a filesystem on a partition on it.  It is
  possible that cloud-init tries to run mkfs on /dev/sdb1 after partitioning the
  device /dev/sdb but before the partition device node '/dev/sdb1' exists.

  When this race condition occurs, cloud-init will fail to make the "ephemeral"
  device available to the user on Azure.
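
The race can be illustrated with a small polling check. This is a minimal sketch, not cloud-init's code: the helper names `is_block_device` and `wait_for_block_device` and their timeout/interval defaults are hypothetical. The check tests the same condition lsblk reported as failing ("not a block device"): whether the path exists and is a block device node.

```python
import os
import stat
import time


def is_block_device(path):
    """Return True if path exists and is a block device node."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing or inaccessible, e.g. udev has not yet created the partition node.
        return False
    return stat.S_ISBLK(mode)


def wait_for_block_device(path, timeout=10.0, interval=0.1):
    """Hypothetical helper: poll until path is a block device or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_block_device(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_block_device("/dev/sdb1")` would give up and return False after roughly 10 seconds if the partition node never appears. The fix that was actually shipped takes a different approach and waits on udev with `udevadm settle`.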

  [Test Case]
  A reliable test case that reproduces the failure is hard to come by here.
  The failure mode is believed to be well understood.

  [Regression Potential]
  There should be very little chance of regression, as essentially all the change
  does is turn:

  1.   sgdisk -n 1:0:0 /dev/sdb
  2.   mkfs.ext4 /dev/sdb1

  to

  1.   sgdisk -n 1:0:0 /dev/sdb
  1a   udevadm settle
  1b   blockdev --rereadpt
  1c   udevadm settle
  2.   mkfs.ext4 /dev/sdb1

  Steps '1b' and '1c' above are not strictly necessary, but were already present
  in the method.  They serve here as an additional wait.
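
The before/after command sequence above can be expressed as a short sketch. It is not the cloud-init implementation: `settle_and_reread`, `mkpart_then_mkfs` and the injectable `run` callable are hypothetical, and the disk argument on `blockdev --rereadpt` is added here because blockdev needs a device to act on. Passing a recording `run` function lets the ordering be checked without touching a real disk.

```python
import subprocess


def settle_and_reread(disk, run):
    """Mirror steps 1a-1c: wait for udev, re-read the partition table, wait again."""
    run(["udevadm", "settle"])
    run(["blockdev", "--rereadpt", disk])
    run(["udevadm", "settle"])


def mkpart_then_mkfs(disk, run=subprocess.check_call):
    """Hypothetical helper: create one GPT partition spanning disk, then format it."""
    # Step 1: partition 1 with default start and end, i.e. the whole disk.
    run(["sgdisk", "-n", "1:0:0", disk])
    # Steps 1a-1c: without these, mkfs can race udev creating the partition node.
    settle_and_reread(disk, run)
    # Step 2: format the new partition (assumes "<disk>1" naming, as for /dev/sdb).
    run(["mkfs.ext4", disk + "1"])
```

Calling `mkpart_then_mkfs("/dev/sdb", run=calls.append)` with an empty list `calls` records the five commands in order instead of executing them.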

  [Other Info]
  The change that fixes this is viewable at [1].  For context, view all of
  cc_disk_setup.py [2].  Basically, we just add a call to read_parttbl [3] in
  exec_mkpart_gpt after invoking an sgdisk command that partitions a disk.
  read_parttbl essentially does a udevadm settle, which fixes the race condition
  that was seen.

  [1] https://git.launchpad.net/cloud-init/commit/?id=29348af1c889931e8973f8fc8cb090c063316f7a
  [2] https://git.launchpad.net/cloud-init/tree/cloudinit/config/cc_disk_setup.py?id=29348af1c889931e8973f8fc8cb090c063316f7a
  [3] https://git.launchpad.net/cloud-init/tree/cloudinit/config/cc_disk_setup.py?id=29348af1c889931e8973f8fc8cb090c063316f7a#n674

  === End SRU Template ===

  The symptom is similar to bug 1611074 but the cause is different. In
  this case it seems there is an error accessing /dev/sdb1 when lsblk is
  run, possibly because sgdisk isn't done creating the partition. The
  specific error message is "/dev/sdb1: not a block device." A simple
  wait and retry here may resolve the issue.
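
The wait-and-retry workaround suggested above could look like the following. This is a sketch under assumptions: `retry` and `lsblk_info` are hypothetical helpers, the attempt count and delay are arbitrary, and the fix that was actually shipped used `udevadm settle` rather than retrying. `lsblk_info` runs the same lsblk query that appears in the log below.

```python
import subprocess
import time


def retry(fn, attempts=5, delay=0.5, exceptions=(subprocess.CalledProcessError,)):
    """Call fn() until it succeeds or attempts run out, sleeping delay between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)


def lsblk_info(device):
    """Run the lsblk query seen in the log and return its stdout."""
    return subprocess.run(
        ["lsblk", "--pairs", "--output", "NAME,TYPE,FSTYPE,LABEL", device, "--nodeps"],
        check=True, capture_output=True, text=True,
    ).stdout
```

Usage would be `retry(lambda: lsblk_info("/dev/sdb1"))`, which re-runs lsblk while it exits non-zero (exit code 32 in the log) and re-raises the last error once the attempts are exhausted.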

  util.py[DEBUG]: Running command ['/sbin/sgdisk', '-p', '/dev/sdb'] with allowed return codes [0] (shell=False, capture=True)
  cc_disk_setup.py[DEBUG]: Device partitioning layout matches
  util.py[DEBUG]: Creating partition on /dev/disk/cloud/azure_resource took 0.056 seconds
  cc_disk_setup.py[DEBUG]: setting up filesystems: [{'filesystem': 'ext4', 'device': 'ephemeral0.1', 'replace_fs': 'ntfs'}]
  cc_disk_setup.py[DEBUG]: ephemeral0.1 is mapped to disk=/dev/disk/cloud/azure_resource part=1
  cc_disk_setup.py[DEBUG]: Creating new filesystem.
  cc_disk_setup.py[DEBUG]: Checking /dev/sdb against default devices
  cc_disk_setup.py[DEBUG]: Manual request of partition 1 for /dev/sdb1
  cc_disk_setup.py[DEBUG]: Checking device /dev/sdb1
  util.py[DEBUG]: Running command ['/sbin/blkid', '-c', '/dev/null', '/dev/sdb1'] with allowed return codes [0, 2] (shell=False, capture=True)
  cc_disk_setup.py[DEBUG]: Device /dev/sdb1 has None None
  cc_disk_setup.py[DEBUG]: Device /dev/sdb1 is cleared for formating
  cc_disk_setup.py[DEBUG]: File system None will be created on /dev/sdb1
  util.py[DEBUG]: Running command ['/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/sdb1', '--nodeps'] with allowed return codes [0] (shell=False, capture=True)
  util.py[DEBUG]: Creating fs for /dev/disk/cloud/azure_resource took 0.008 seconds
  util.py[WARNING]: Failed during filesystem operation
  Failed during disk check for /dev/sdb1
  Unexpected error while running command.
  Command: ['/bin/lsblk', '--pairs', '--output', 'NAME,TYPE,FSTYPE,LABEL', '/dev/sdb1', '--nodeps']
  Exit code: 32
  Reason: -
  Stdout: ''
  Stderr: 'lsblk: /dev/sdb1: not a block device\n'

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1626243/+subscriptions

