cloud-init team mailing list archive
cloud-init summit 2017 - notes
Hello Cloud-init --
Thanks to all who attended the cloud-init summit this past week. We
appreciate your participation. I wanted to follow up on-list with the
notes from the event, included inline in the body of this email. Please
feel free to reply with any questions.
(I've included everyone who attended on bcc, in case you are wondering
how you got this.)
One request I have for people is to join the cloud-init mailing list.
It's very low volume. You can join with a Launchpad account here:
Here are the notes:
cloud-init Summit 2017 Notes
* **Hangouts link**: [https://g.co/hangout/google.com/cloud-init]
* [Shared Notes] [https://goo.gl/tngepy] (**This doc**)
* [Welcome Slides & Agenda] [https://goo.gl/zgw4Ug]
* [Pre Summit Bug List] [https://goo.gl/QQfXE6]
* [Cloud-init Trello Roadmap]
* [Cloud-init Trello Daily
* Copr repos:
# Main Sessions, Day 1 -- Thursday, Aug 24, 2017
## Attendees Info
* Steve Zarkos (Azure) [stevez]
* Daniel Sol (Azure PM Azure Linux - azurelinuxagent) /
* Stephen Mathews (Softlayer)
* Scott Moser (Canonical) [smoser] US/Eastern
* Sankar Tanguturi (VMware @stanguturi) - Wants to replace a home-grown
  configuration engine with cloud-init
* Robert Schweikert (SUSE) - Technical team lead for public cloud team,
  carries lots of patches to cloud-init that SUSE wants to upstream. [robjo /
* Andrew Jorgensen (AWS and Amazon Linux) [ajorg / ajorgens]
* Zach Marano (GCE) - GCE Guest OS Images Lead
* Max Illfelder (GCE) - GCE Guest env author
* Ryan Harper (Canonical) [rharper] [TZ=US/Chicago]
* Scott Easum (Softlayer)
* [Robert Jennings] (Canonical) cloud image delivery
[rcj@irc/[launchpad], rcj4747@elsewhere TZ=US/Chicago]
* Matt Yeazel (AWS - Amazon Linux team) [yeazelm]
* David Britton (Canonical server team mgr) [dpb1] TZ=US/Mountain
* Josh Powers (Canonical server team eng) [powersj] TZ=US/Pacific
* Chad Smith (Canonical cloud-init eng) - blackboxsw TZ=US/Mountain
* Paul Meyer (Azure Linux) [paulmey@irc,github,microsoft.com]
* Lars Kellogg-Stedman (Red Hat), larsks@(irc,github,twitter, etc.),
* Ryan McCabe (RH) - rmccabe
## Recent Features / Roadmap / Q&A
* Lars: version numbering (e.g. .1 releases hard to sell)
* Lars: external scripts versus inside cloud-init… moving ds-identify
knowledge back into cloud-init.
* Lars: slight concern about netplan as primary format w/ special
* Lars: There are webhooks in COPR that could trigger per-commit builds
for CI. Canonical’s CI does per-commit testing. Details in Josh’s
* Lars: OpenStack surfaces 3rd-party CI mechanisms, can we do this with
cloud-init upstream CI?
* Smoser: Lots of interest from the community in querying metadata
(like instance-id); it might be nice for cloud-init to provide that
* Lars: wants to cleanly separate cloud-init's collection of metadata
from acting on that metadata. I want a tool to dump my cloud X's
metadata in a unified/standard format (trello card:
* Robert: might we look at integrating existing tooling for metadata
* Paul/Stephen: Want to discuss eventing / hotplug operations
* Lars interested in KVM CI supporting multiple distros
## cloud-init integration overview
* Integration tests design original doc: [https://goo.gl/qVhSrq]
* CI injects the tested cloud-init into an image, boots the image with
the provided test cloud-config content, and runs collect scripts afterward
* Harvested output comes back to the host CI server, where nose is run
to process the expected test output
* We obtain images and create customized lxc snapshots from
* Jenkins CI view -
* Robert (SUSE) - Are the tests consumable by others? They have an Image
Proofing App tool ([IPA]) which wraps custom unit tests and needs to
drive unit tests with custom configs or operations (like restarting or
rebooting instances). If cloud-init tests are available as a separate
consumable package, the IPA tool could source the tests and augment the
tests or images as needed [https://github.com/SUSE/ipa]
* Paul mentions that Azure has a test framework that they use for
cloud-init testing and how it interacts with their WAL agent.
* Smoser: Integration test wants: on LXC we can jump into an instance
out-of-band (sans SSH) to validate it, but cloud/KVM deployments need an
ssh-key setup for accessing the instance under test.
## Decreasing Boot Time Overview
* Boot stages: [in tree docs]
* TZ in systemd: [hacker news post]
* When did the locale-generation fix land? rharper: "now". Daily
cloud-images contain updated cloud-init and a fixed pre-generated locale.
Stable releases of Xenial and Zesty will follow on after an Ubuntu
* smoser: readurl … it would be good to have each url request logged, and
then the ability to show all the urls read and the times for each.
* Any existing method/function can be wrapped with util.log_time to
generate granular events which could be interpreted by cloud-init
* Lars: might want a function decorator to facilitate that more
* Want an optional mechanism to turn on deeper analysis (like strace
collection from all cloud-init actions)
* Don’t want to impact all cloud-init runs w/ analysis
* RE: The execve() analysis, would be nice to be able to optionally
group by positional arguments as well
* For module use slide: Ryan was going to collect the module import
flame graphs w/ snakefood to track whether we can distill python
modules to the minimal set required for cloud-init boot-stage
functionality. Fewer file imports == faster cloud-init
* Currently cloud-init busted in trunk :-(
* LP: #
* Do we track/count # of touches of the metadata service
* Azure: We want get_data analysis of datasource.get_data calls
* Robert: What is the total time of cloud-init during the boot?
(basically, what is systemd analyze blame overall?) - It is
significant, some things we can improve, some things we can’t. The
goal of this project is to give us the data, not to improve every tiny
performance problem in cloud-init we can.
* Smoser: Long-term we want to run trend analysis against previous CI
runs to watch for negative impacts to bringup
* Amazon: module trimming would be compelling, as some disks are remote
to instances and significant slowness can be introduced in the
init-local timeframe due to remote file loads
* Ryan/dpb: might look at lazy-loading modules only when needed, or
pruning the most significant module from each cloud-init stage to trim
it a bit
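The timing ideas above (wrapping functions with util.log_time, Lars's
decorator suggestion) could be sketched roughly like this; the decorator
name and printed event format are made up for illustration and are not
cloud-init API:

```python
import functools
import time


def timed(event_name):
    """Hypothetical decorator: record the wall-clock duration of a call.

    A stand-in for manually wrapping each call with util.log_time.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                # A real implementation would emit a reporting event
                # that `cloud-init analyze` style tooling could consume.
                print("%s took %.3fs" % (event_name, elapsed))
        return wrapper
    return decorator


@timed("datasource/get_data")
def get_data():
    # Placeholder for a datasource crawl.
    time.sleep(0.01)
    return True
```

Decorating once at definition time keeps the instrumentation out of every
call site, which is the ergonomic win Lars was after.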
## Cloud-init schema validation
* As the user's primary interface with cloud-init, look at how to
improve that user experience. Catch errors and issues earlier.
* Each module has a schema defined in it
* Path to get the schema validated in the modules: hope is to build in
unit testing (which will be run by CI) to exercise each key that was
added such that they are all tested.
* Lars: merging data - how are we validating the merging of a variety of
user-data, are you getting the data you expect? Suggestion: Document
merging behavior, and show a demo/example of how to test that.
* Potential subcommand for merge tool
* Amazon: Does this only look at yaml files? Currently "yes". See above
for our intent for a "merge" subcommand which will perform merges of
vendor_data with all user_data parts to show the coalesced result
* Lars: we want to see merging of vendor_data substrate/metadata etc. to
see if the mechanism is behaving the way we expect it to
* RobertS: image vendors would like to use a tool to provide known
vendor_data, user-data, etc to ensure expected behavior. That is an
evolution over just syntactic correctness.
* Robert: Not being able to overwrite certain keys might not be such a
bad thing
* Differing opinions on this around the room
* Robert: The web service could have a catalog of default vendor
configuration, so they could see the assembled file and check it all
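A hand-rolled sketch of the per-module schema idea discussed above
(NTP_SCHEMA and validate are illustrative names; cloud-init's actual
schema machinery differs):

```python
# Each config module declares a schema for the keys it accepts, and
# user-data is checked against it before the module runs.
NTP_SCHEMA = {
    "ntp": {
        "servers": list,
        "pools": list,
    }
}


def validate(config, schema):
    """Return a list of error strings for unknown keys or wrong types."""
    errors = []
    for key, value in config.items():
        if key not in schema:
            errors.append("unknown key: %s" % key)
        elif isinstance(schema[key], dict):
            # Nested schema: recurse and prefix errors with the key path.
            if not isinstance(value, dict):
                errors.append("%s: expected a mapping" % key)
            else:
                errors.extend(
                    "%s.%s" % (key, e)
                    for e in validate(value, schema[key]))
        elif not isinstance(value, schema[key]):
            errors.append(
                "%s: expected %s" % (key, schema[key].__name__))
    return errors
```

For example, `validate({"ntp": {"serverz": []}}, NTP_SCHEMA)` would flag
the typo'd key, which is exactly the class of user-data error worth
catching before boot rather than after.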
## Version Numbering
* Flags for the consumer of the product which indicate stability and
* 0.7.6 -> 0.7.9 are completely different products; a minor revision
bump doesn't convey that
* Looking for deliberate/explicit version numbering schema
* Proposed: usage of [semantic versioning]
* smoser: focused on backwards compatibility: therefore we should
bump the minor version for new features instead of the patch
version, as has been done in the past.
* Proposal: Roll over to 1.0.0 and start using semantic versioning
* [Given a version number MAJOR.MINOR.PATCH, increment the:]
* MAJOR version when you make incompatible API changes,
* MINOR version when you add functionality in a
backwards-compatible manner, and
* PATCH version when you make backwards-compatible bug fixes.
* Additional labels for pre-release and build metadata are
available as extensions to the MAJOR.MINOR.PATCH format.
* Lars: Having an established release schedule would also be very
helpful. Distributions can better package releases around a schedule.
Makes it easier to justify pulling in fixes.
* Lars: Would a development model using branches make sense as well?
E.g. having master freeze before a release, and a devel branch for
pushing new things in the meantime.
* Instead, hide new features behind command-line flags and formally
turn them on once tested (Lars likes this)
* Certain vendors carry large patch loads, how and where do we host
those patches long-term?
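The proposed MAJOR.MINOR.PATCH rules can be illustrated with a small
helper (`bump` is a hypothetical name for this sketch, not a cloud-init
tool):

```python
def bump(version, change):
    """Return the next MAJOR.MINOR.PATCH version for a kind of change.

    change is one of: 'incompatible', 'feature', 'bugfix', mirroring
    the semantic-versioning rules quoted above.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "incompatible":
        # Incompatible API change: bump MAJOR, reset the rest.
        return "%d.0.0" % (major + 1)
    if change == "feature":
        # Backwards-compatible functionality: bump MINOR.
        return "%d.%d.0" % (major, minor + 1)
    if change == "bugfix":
        # Backwards-compatible fix: bump PATCH only.
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown change kind: %s" % change)
```

Under this scheme the 0.7.6 -> 0.7.9 complaint goes away: a release with
new features would have been 0.8.0, not 0.7.7.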
## external scripts vs in-program
* ds-identify feels like duplication of datasource discovery; is there a
way to roll that logic into the Datasource so that there isn't that
duplication?
* The crux of the matter was the execution time of python loading all
cloud-init modules on slow-disk systems like raspi, etc., compared to
dash (for what is in effect simple reads of /proc, /sys, etc.).
* ds-identify is run as a generator and only enables a datasource to run
if the underlying substrate is compatible (discovered) for that
datasource
* [ACTION] RCJ: Side issue, can unit test add shellcheck for
## cloud-init QA: Provider tests
* Want to determine how best to integrate specific vendor/cloud image
testing up into upstream
* [ACTION] Softlayer will grab information on CI to present tomorrow
* Softlayer: Provision - create a template and then start an instance
* RobertS: If the tests are consumable he would run them on openSUSE
images and ensure they publish results. Integration tests running
against clouds should have a framework which supports test-result
validation for different distro/image vendor expectations.
* Looking for best practices for interacting with a given cloud:
* How best to boot a custom image
* [Action - Josh] - Put together an email with requests, and send to
cloud providers describing how to run tests and ask for best practices
or merge proposals for remote image testing
* KVM merge proposal:
* Full integration test run times: LXD 12 minutes, KVM haven’t looked at
total time cost yet (test merge proposal in flight^)
## cloud-init QA: Distro Tests, how to build your own CI
* Lars: There are webhooks in COPR that could trigger per-commit
builds for CI. Canonical’s CI does per-commit testing. Details in
## Python 3
* From a distribution perspective, SLES11 (old) on python 2.6.9 and SLES
11 is going EOL in March 2018. And python2 going EOL in 2 years 7
months, see pythonclock) meaning likely that clouds/distros might
start dropping support for py2
* SLES 15 will be python 3 next year with python2 in the legacy module
with 2 years of support
* At some point python2 support will be a "don’t care" for most distros
* Lars: RHEL6 may have a longer lifecycle than SLES, but its cloud-init
is pinned at an older version, so upstream cloud-init could drop 2.6
shortly
* Lars: RHEL7 still only has python 2.7, so RHEL still cares about
2.7 for a while (June 2024)
* smoser: python2.7 is probably present until RHEL7 has python3. Nobody
in cloud-init summit cares about 2.6 support.
* SLES: Today distro vendors have an increased QA support matrix to
validate python 2.6 versus python 3 support. SLES 11-12 separated the
distro into modules which have different update/backport policies. The
cloud module (Robert's group) has a CI exception/agreement for updates,
so cloud-init can be moved to new versions as needed.
* [AGREED] Python 2.6 support limited to ~18 months, 2.7 will continue
for RHEL7 for a while unless RHEL7 can pull in a python3 version.
* [ACTION - Lars] Determine whether python3 support will be introduced
in RHEL7 at some point, or only RHEL8
## Using LXD for rapid dev/testing
* Getting started:
* Images: [https://us.images.linuxcontainers.org/]
* Ubuntu Daily Images: [https://cloud-images.ubuntu.com/daily/]
* Stephane’s DebConf presentation
* Good overview of basics and some advanced features
* Scott: Would be really nice to have other OS images with cloud-init
already in them like how the Ubuntu daily images do. Makes running
cloud-init development with them much easier.
* It would be nice if distro vendors interested in cloud-init could
provide "official" LXD images for their distributions.
* How do we create an LXD image? [tarball including rootfs +
metadata or squashfs]
* RobertS: might be able to teach the kiwi build service to publish
lxd images. If there is an images endpoint, LXD could crawl it
* Can we serve images from our own endpoint? Yes, either by
implementing the [REST API] or by providing a simplestreams endpoint
* Lars: Thoughts about mocking/faking a metadata service for quick
testing
* Would love contributions of mock metadata services from cloud
providers
* Chad: Serve up an instance on a cloud, harvest metadata information,
and then use that data for serving up to tests.
* smoser/lars: use docker official images for rhel/centos testing
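As a sketch of the "tarball including rootfs + metadata" answer above:
an LXD image is a rootfs tarball paired with a metadata.yaml along these
lines (all field values here are illustrative):

```yaml
# metadata.yaml, packed into a tarball alongside the rootfs tarball
architecture: x86_64
creation_date: 1503619200      # unix epoch seconds
properties:
  description: Example distro image with cloud-init preinstalled
  os: exampledistro
  release: "42.1"
```

The pair can then be imported with something like
`lxc image import metadata.tar.gz rootfs.tar.gz --alias exampledistro/42.1`,
which is the shape a distro vendor's "official" LXD image publishing
would need to produce.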
## How to query metadata
* Might be nice for cloud-init to surface metadata, since it crawls it
for most datasources, so that other tools don't have to do that as well
* Datasources currently crawl and react and cache metadata in a pickled
object on the filesystem, we would like to query cloud-init for the
cached (or live) metadata and ultimately produce a unified JSON
structure on the filesystem to allow other tools to parse metadata.
* Originally there existed a 'cloud-init query' subcommand, never
implemented
* AndrewJ: potentially dump standard format keys and custom "raw"
content within the same structure
* smoser: might have security concerns about leaking sensitive
information if we dump it all in a single blob; maybe we'd like to
separate
* Why pickle? It keeps the class on disk, so cloud-init local can check
instance_id to validate whether cloud-init needs to be re-run
* [ACTION - blackboxsw] Path forward: cloud-init supports loading either
obj.pkl or a JSON object, and writes JSON instead of obj.pkl for new
releases. When writing JSON, remove the obj.pkl file
* AndrewJ: Leveraging datasource get_data logic to handle retries or
waits etc would be a big win for script writers so they don’t have to
bake in that logic to their scripts.
* [ACTION - blackboxsw] Datasource.crawl_metadata() branch
* [ACTION - blackboxsw] Design a schema specification for the unified
metadata keys that cloud-init's JSON object will contain
* Wants: Top-level cloud-type, some standard keys, a blob of
cloud-specific keys and a standard network config format
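The pickle-to-JSON path forward described above might look roughly like
this; the file names and JSON layout are assumptions for illustration,
not the agreed cloud-init design:

```python
import json
import os
import pickle


def load_instance_data(directory):
    """Prefer JSON instance data; fall back to a legacy obj.pkl."""
    json_path = os.path.join(directory, "instance-data.json")
    pkl_path = os.path.join(directory, "obj.pkl")
    if os.path.exists(json_path):
        with open(json_path) as fp:
            return json.load(fp)
    # Legacy path: earlier releases cached the datasource as a pickle.
    with open(pkl_path, "rb") as fp:
        return pickle.load(fp)


def write_instance_data(directory, data):
    """Write JSON instance data and drop any legacy pickle."""
    json_path = os.path.join(directory, "instance-data.json")
    pkl_path = os.path.join(directory, "obj.pkl")
    with open(json_path, "w") as fp:
        json.dump(data, fp, indent=1)
    # Per the action item: once JSON is written, remove obj.pkl.
    if os.path.exists(pkl_path):
        os.remove(pkl_path)
```

The JSON file, unlike the pickle, is parseable by external tools, which
is what makes the unified-metadata query story possible.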
# Main Sessions, Day 2 -- Friday, Aug 25, 2017
## Bug Triage & Squashing
## Device hotplug overview & feedback
* Ajorg: There is no hierarchy among BOOT/INSTANCE/ALWAYS. If manually
running cloud-init --file config.yaml --frequency always when the
module is per-boot or per-instance, a sane hierarchy would prevent the
module from running; the current scheme is too rudimentary to handle
that hierarchy as it's only sem...<freq>.
* GCE addresses this by documenting their interface and partitioning the
config file in a way that makes clear the pieces which are Google
managed; thus detection of changes by the user is possible and then
tooling can decide if it should still manage the config file
* In ssh authorized_keys files GCE adds dynamic metadata content
with comment tag #Added by google…
* For iptables Google tries to separate cloud changes from
user-driven changes with namespace scoping network definitions
* Need to take modules on a case-by-case basis to determine whether
idempotent runs will be the expected behavior
* Spawning a background hotplug service should be disabled by default
and 'opt-in' enabled by configuration, in case upgraded environments
may already be running hot-plug configuration
* Openstack adds disk and nic info to metadata
* AWS doesn’t surface updated disk info in metadata, but does add
dynamic network info
* AndrewJ: Computers are kittens vs. cattle
* Andrew: One place where hotplug is interesting for him is if it were
used to configure things all the time, not just when something was
hotplugged, from a risk/maintenance/code-path perspective.
* AWS has the [ec2-net-utils] tool (and [ubuntu port])
* SUSE project cloud-netconfig
## Network v2 yaml as primary format
* [Netplan config example]
* Having a common intermediary (which has a spec) to represent network
configuration makes unit testing easier as it’s a common spec that is
published and understood (even outside of cloud-init)
* Vmacs don't seem to be defined in the spec; does that preclude the
netplan spec from describing such features? Answer: It's a fluid spec
that is currently being extended; it's a merge proposal away.
* [Action - dpb] AndrewJ to follow up with Ryan & Pat about net-utils
use cases for netplan
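The "[Netplan config example]" link above did not survive in these
notes; a minimal network v2 (netplan) snippet of the kind discussed,
with made-up address and MAC values, looks like:

```yaml
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: "52:54:00:12:34:56"
      dhcp4: true
      nameservers:
        addresses: [8.8.8.8]
```

Having this one published, spec-backed intermediary is what makes the
unit-testing argument above work: every datasource renders to the same
format regardless of cloud.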
## Breakout JSON instance data storage for cloud-init part 1
* Bikeshedding of function names
* Question around non-utf8 encoding and handling of that
* How should we treat sensitive data (e.g. metadata)
* Option: treat it as a whitelist; assume not allowed and only show
allowed keys
* Option: if root, get it all; if not root, don't
* Option: document what is or isn't sensitive, to avoid the phone call
* Initial cut will separate all user-data into a file separate from
metadata, which will only be readable by root. Will iterate on a
path-based blacklist for known sensitive data and extract that out
of the metadata blob into a separate file
## How can cloud-init improve & feedback
* GCE: Have we made progress on shutdown related actions?
* Certain clouds have shutdown script needs (poor-man’s backups,
* No, this has not had progress. It’s in our backlog now.
* (For Canonical, not cloud-init) - the Ubuntu user is problematic in
* Speed of language (python) vs others (golang)
* Parallel execution of jobs; which would require defining/using a
* How can we get distros to maintain cloud-init better? Can’t seem to
get distros to all be on the same page with distro support behavior
* Finding source for cloud-init seems tough. Having something where
github contributions were allowed would be nice.
* [ACTION] Come up with a proposal/procedure/tool to close pull requests
and push to launchpad on the user’s behalf
* Andrew: Can CentOS cloud-init align w/ RHEL? What’s with CentOS EPEL
repos etc for cloud-init?
* RobertS: Canonical needs to support infrastructure that facilitates
distro support and contribution by separate interested party
* RobertS: Feels the CLA is a major hindrance for SUSE; only two
contributors from SUSE are allowed, so those two CLA signers have to
shepherd fixes in through their signed LP users. The legal dept is
concerned about adding more users to a contribution list who have
signed license rights away. Any time new contributors are added to the
list, they have to talk to lawyers for approval, as well as
higher-level management.
* Concern is around how broad the CLA is compared to other CLAs
* Feels like GPL in that it takes over other software developed
* Andrew: previous CLA incarnations having reference to "interpreted
according to British Law" kept US attorneys concerned. Changing to
more of an apache license reduced concern.
* "This Agreement will be governed by and construed in accordance
with the laws of England"
* Version numbering discussion is a big win for folks
* Improved testing and integration CI is really helpful (good for
surfacing systemd dependency trees etc)
* Balance between being upstream project and packaging for various
distributions. Up to the packager to know dependencies, etc. Therefore
should project carry lots of distribution specific things? Or be very
explicit about packaging (e.g. separate folders for each distro)
* E.g. Unexpected magic found in setup.py for people trying to make
contributions or taking a first look at the project.
* E.g. templated spec files
David Britton <david.britton@xxxxxxxxxxxxx>