
cloud-init team mailing list archive

cloud-init summit 2017 - notes


Hello Cloud-init --

Thanks to all who attended the cloud-init summit this past week.  Thank
you for participating.  I wanted to follow up on-list with the notes
from the event, included inline in the body of this email.  Please feel
free to reply with any questions.

(I've included everyone who attended on bcc, in case you are wondering
how you got this.)

One request I have for people is to join the cloud-init mailing list.
It's very low volume.  You can join with a Launchpad account here:


Here are the notes:

cloud-init Summit 2017 Notes


# Links

* **Hangouts link**: [https://g.co/hangout/google.com/cloud-init][0]
* [Shared Notes][1] [[https://goo.gl/tngepy][2]] (**This doc**)
* [Welcome Slides & Agenda][3] [[https://goo.gl/zgw4Ug][4]]
* [Pre Summit Bug List][5] [[https://goo.gl/QQfXE6][6]]
* [Cloud-init Trello Roadmap][7]
* [Cloud-init Trello Daily][9]
* [Copr repos][11]

[0]: https://g.co/hangout/google.com/cloud-init
[1]: https://goo.gl/tngepy
[2]: https://goo.gl/tngepy
[3]: https://goo.gl/zgw4Ug
[4]: https://goo.gl/zgw4Ug
[5]: https://goo.gl/QQfXE6
[6]: https://goo.gl/QQfXE6
[7]: https://trello.com/b/W1LTVjQG/cloud-init-roadmap
[8]: https://trello.com/b/W1LTVjQG/cloud-init-roadmap
[9]: https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin
[10]: https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin
[11]: https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/

# Main Sessions, Day 1 -- Thursday, Aug 24, 2017

## Attendees Info

* Steve Zarkos (Azure) [stevez]
* Daniel Sol (Azure PM, Azure Linux - azurelinuxagent)
* Stephen Mathews (Softlayer)
* Scott Moser (Canonical) [smoser] US/Eastern
* Sankar Tanguturi (VMware @stanguturi) - Wants to replace VMware's
  home-grown configuration engine with cloud-init
* Robert Schweikert (SUSE) - Technical team lead for the public cloud
  team; they carry lots of patches to cloud-init that they want to
  upstream. [robjo / [rjschwei@xxxxxxxx][12]]
* Andrew Jorgensen (AWS and Amazon Linux) [ajorg / ajorgens]
* Zach Marano (GCE) - GCE Guest OS Images Lead
  [[zmarano@xxxxxxxxxx][13], TZ=US/Seattle]
* Max Illfelder (GCE) - GCE Guest env author
  [[illfelder@xxxxxxxxxx][14], TZ=US/Seattle]
* Ryan Harper (Canonical) [rharper] [TZ=US/Chicago]
* Scott Easum (Softlayer)
* [Robert Jennings][15] (Canonical) cloud image delivery
  [rcj@irc/[launchpad][16], rcj4747@elsewhere TZ=US/Chicago]
* Matt Yeazel (AWS - Amazon Linux team) [yeazelm]
* David Britton (Canonical server team mgr) [dpb1] TZ=US/Mountain
* Josh Powers (Canonical server team eng) [powersj] TZ=US/Pacific
* Chad Smith (Canonical cloud-init eng) - blackboxsw TZ=US/Mountain
* Paul Meyer (Azure Linux) [paulmey@irc,github,microsoft.com]
* Lars Kellogg-Stedman (Red Hat) [larsks@(irc,github,twitter, etc.)]
* Ryan McCabe (RH) - rmccabe

[12]: mailto:rjschwei@xxxxxxxx
[13]: mailto:zmarano@xxxxxxxxxx
[14]: mailto:illfelder@xxxxxxxxxx
[15]: mailto:robert.jennings@xxxxxxxxxxxxx
[16]: https://pad.lv/~rcj
[17]: mailto:lars@xxxxxxxxxx

## Recent Features / Roadmap / Q&A

[David Britton]

* [https://goo.gl/zgw4Ug][18]
* Lars: version numbering (e.g. .1 releases are hard to sell)
* Lars: external scripts versus inside cloud-init… moving ds-identify
  knowledge back into cloud-init.
* Lars: slight concern about netplan as primary format w/ special
  handling logic
* Lars: There are webhooks in COPR that could trigger per-commit builds
  for CI. Canonical's CI does per-commit testing. Details in Josh's
  topic.
* Lars: OpenStack surfaces 3rd-party CI mechanisms, can we do this with
  cloud-init upstream CI? 
* Smoser: Lots of interest from the community in querying metadata
  (like instance-id); it might be nice for cloud-init to provide that
    * Lars: want to cleanly separate collection of data by cloud-init
      from acting on that metadata. I want a tool to dump my cloud X's
      metadata in a unified/standard format (trello card:
      [https://trello.com/c/AYaCdQyT][19])
    * Robert: might we look at integrating existing tooling for metadata
      parsing? E.g. [GCEmetadata][20] and [ec2metadata][21]
* Paul/Stephen: Want to discuss eventing / hotplug operations
* Lars interested in KVM CI supporting multiple distros

[18]: https://goo.gl/zgw4Ug
[19]: https://trello.com/c/AYaCdQyT
[20]: https://github.com/SUSE/Enceladus/tree/master/gcemetadata/
[21]: https://github.com/SUSE/Enceladus/tree/master/ec2utils/ec2metadata

## cloud-init integration overview

[Josh Powers]

* [https://goo.gl/vbGrjY][22]
* Integration tests design original doc: [https://goo.gl/qVhSrq][23]
* CI injects the tested cloud-init into an image, boots the image with
  provided test cloud-config content and runs collect scripts after
  cloud-init runs.
* Harvested output comes back to the host CI server, where nose is run
  to check it against expected test output
* We obtain images and create customized lxc snapshots from
  [cloud-images.ubuntu.com][24]
* Jenkins CI view: [https://jenkins.ubuntu.com/server/view/cloud-init/][25]
* Robert (SUSE) - Are the tests consumable by others? SUSE has an Image
  Proofing App tool ([IPA][26]) which wraps custom unit tests and needs
  to drive them with custom configs or operations (like restarting or
  rebooting instances). If cloud-init tests are available as a separate
  consumable package, the IPA tool could source the tests and augment
  them or the images as needed. [https://github.com/SUSE/ipa][27]
* Paul mentions that Azure has a test framework that they use for
  cloud-init testing and how it interacts with their WALinuxAgent.
* Smoser: Integration test wants: on LXC we can jump in out-of-band
  (sans SSH) to validate instances, but cloud/kvm deployments need an
  ssh-key setup for accessing the instance under test.
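The collect-then-verify flow described above can be sketched as follows. This is an illustrative structure only: the command names and the ntp check are hypothetical, and for the sake of a runnable example the commands execute locally rather than inside a booted instance.

```python
import subprocess

def collect(commands):
    """Run each named collect command and harvest its stdout.

    In the real CI the collect scripts run inside the instance under
    test after cloud-init finishes; here they run locally.
    """
    return {
        name: subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
        for name, cmd in commands.items()
    }

def verify_ntp(collected):
    """Example verifier asserting on harvested output (hypothetical check)."""
    assert "ntp" in collected["installed"], "ntp was not installed"

# Stand-in for e.g. a package-listing command run in the instance:
collected = collect({"installed": "echo ii  ntp  1:4.2.8"})
verify_ntp(collected)
```

The verify step is an ordinary assertion-based test, which is what lets a runner like nose process the harvested output on the host.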

[22]: https://goo.gl/vbGrjY
[23]: https://goo.gl/qVhSrq
[24]: http://cloud-images.ubuntu.com/
[25]: https://jenkins.ubuntu.com/server/view/cloud-init/
[26]: https://github.com/SUSE/ipa
[27]: https://github.com/SUSE/ipa

## Decreasing Boot Time Overview 

[Ryan Harper]

* [https://goo.gl/92ghBa][28]
* Boot stages: [in tree docs][29]
* TZ in systemd: [hacker news post][30]
* When did the locale-generation fix land? rharper: "now". Daily
  cloud-images contain updated cloud-init and a fixed pre-generated
  locale. Stable releases of Xenial and Zesty will follow after an
  Ubuntu SRU.
* smoser: readurl … it would be good to log each url request and then
  have the ability to show all the urls read and their times.
* An existing method/function can be wrapped with util.log_time to
  generate granular events which could be interpreted by cloud-init
    * Lars: might want a function decorator to facilitate that more
* Want an optional mechanism to turn on deeper analysis (like strace
  collection from all cloud-init actions)
    * Don’t want to impact all cloud-init runs w/ analysis
* RE: The execve() analysis, would be nice to be able to optionally
  group by positional arguments as well
* For the module use slide: Ryan was going to collect the module import
  flame graphs w/ snakefood to track whether we can distill python
  modules to the minimal set required for cloud-init boot-stage
  functionality. Fewer file imports == faster cloud-init
* Currently cloud-init is busted in trunk :-(
    * LP: #[1712676][32]
* Do we track/count the number of touches of the metadata service?
* Azure: We want analyze timing of datasource.get_data calls
* Robert: What is the total time of cloud-init during boot? (basically,
  what does systemd-analyze blame show overall?) - It is significant;
  some things we can improve, some things we can't.  The goal of this
  project is to give us the data, not to improve every tiny performance
  problem in cloud-init.
* Smoser: Long-term we want to run analyze trend analysis against
  previous CI runs to watch for negative impacts to bringup
* Amazon: module trimming would be compelling, as some disks are remote
  to instances and significant slowness can be introduced in the
  init-local timeframe due to remote file loads
* Ryan/dpb: might look at lazy loading modules only when needed, or
  pruning the most significant module from each cloud-init stage to trim
  it a bit
[28]: https://goo.gl/92ghBa
[29]: http://cloudinit.readthedocs.io/en/latest/topics/boot.html
[30]: https://news.ycombinator.com/item?id=13697555
[31]: http://paste.ubuntu.com/25384304/
[32]: https://pad.lv/1712676

## Cloud-init schema validation

[Chad Smith]

* [https://goo.gl/jm7Tec][33]
* As the user's primary interface with cloud-init, look at how to
  improve that user experience. Catch errors and issues earlier.
* Each module has its schema defined within it
* Path to get the schema validated in the modules: the hope is to build
  in unit testing (which will be run by CI) to exercise each key that
  was added such that they are all tested.
* Lars: merging data - how are we validating the merging of a variety of
  user-data; are you getting the data you expect?  Suggestion: Document
  merging behavior, and show a demo/example of how to test that.
    * [https://cloudinit.readthedocs.io/en/latest/topics/merging.html][34] 
    * Potential subcommand for merge tool
* Amazon: Does this only look at yaml files? Currently "yes"; see above
  for our intent for a "merge" subcommand which will perform merges of
  vendor_data with all user_data parts to show the coalesced
  cloud-config object.
* Lars: we want to see merging of vendor_data, substrate metadata, etc.
  to see if the mechanism behaves the way we expect it to
* RobertS: image vendors would like a tool to which they can provide
  known vendor_data, user-data, etc., to ensure expected behavior. That
  is an evolution over just syntactic correctness.
* Robert: Not being able to overwrite certain keys might not be such a
  bad thing?
    * Differing opinions on this around the room
* Robert: The web service could have a catalog of default vendor
  configuration, so they could see the assembled file and check it all
  for correctness.  
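The merge discussion above (vendor_data coalesced with user_data parts) can be illustrated with a simple recursive merge. Note these exact semantics (nested dicts merge key-by-key, lists append, overlay scalars win) are one possible policy chosen for the sketch, not necessarily the behavior of cloud-init's configurable merger classes.

```python
def merge_cloud_config(base, overlay):
    """Recursively merge two cloud-config dicts (illustrative policy)."""
    merged = dict(base)
    for key, value in overlay.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_cloud_config(merged[key], value)  # merge subtrees
        elif key in merged and isinstance(merged[key], list) and isinstance(value, list):
            merged[key] = merged[key] + value  # append lists
        else:
            merged[key] = value  # overlay scalar wins
    return merged

# Hypothetical vendor_data and user_data fragments:
vendor = {"packages": ["curl"], "ntp": {"enabled": True}}
user = {"packages": ["git"], "ntp": {"pools": ["pool.ntp.org"]}}
merged = merge_cloud_config(vendor, user)
```

A "merge" subcommand as discussed would essentially expose this kind of coalesced result so users can see what cloud-init will actually act on.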

[33]: https://goo.gl/jm7Tec
[34]: https://cloudinit.readthedocs.io/en/latest/topics/merging.html

## Version Numbering

[Lars Kellogg-Stedman]

* Flags for the consumer of the product which indicate stability
* 0.7.6 -> 0.7.9 were completely different products; a minor revision
  bump doesn't convey that
* Looking for deliberate/explicit version numbering schema
* Proposed: usage of [semantic versioning][35]
    * smoser: focused on backwards compatibility; therefore we should be
      moving up the minor version for new features instead of the patch
      version, as it has been in the past.
* Proposal: Roll over to 1.0.0 and start using semantic versioning
    * [Given a version number MAJOR.MINOR.PATCH, increment the:][36]
	* MAJOR version when you make incompatible API changes,
	* MINOR version when you add functionality in a
	  backwards-compatible manner, and
	* PATCH version when you make backwards-compatible bug fixes.
	* Additional labels for pre-release and build metadata are
	  available as extensions to the MAJOR.MINOR.PATCH format.
* Lars: Having an established release schedule would also be very
  helpful. Distributions can better package releases around a schedule.
  Makes it easier to justify pulling in fixes.
* Lars: Would a development model using branches make sense to utilize
  as well? E.g. having master freeze before a release, and a devel
  branch for pushing new things in the meantime.
    * Instead, hide new features behind command line flags and turn them
      on formally once tested (Lars likes this)
* Certain vendors carry large patch loads, how and where do we host
  those patches long-term?
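The MAJOR.MINOR.PATCH rules quoted above are mechanical enough to sketch; a minimal illustration of the proposed bumping behavior:

```python
import re

SEMVER_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def bump(version, part):
    """Return `version` with the given part (major/minor/patch) incremented.

    Per semver: a major bump resets minor and patch, a minor bump
    resets patch.
    """
    m = SEMVER_RE.match(version)
    if not m:
        raise ValueError("not a MAJOR.MINOR.PATCH version: %r" % version)
    major, minor, patch = (int(g) for g in m.groups())
    if part == "major":
        return "%d.0.0" % (major + 1)
    if part == "minor":
        return "%d.%d.0" % (major, minor + 1)
    if part == "patch":
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown part: %r" % part)
```

Under the proposal above, rolling over from 0.7.9 would be a major bump to 1.0.0, with new backwards-compatible features bumping minor thereafter.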

[35]: http://semver.org/
[36]: http://semver.org/spec/v2.0.0.html

## external scripts vs in-program

[Lars Kellogg-Stedman]

* ds-identify feels like duplication of datasource discovery; is there a
  way to roll that logic into the Datasource so that there isn't that
  duplication?
* The crux of the matter was the execution time of python loading all
  cloud-init modules on slow disk systems like raspi, etc., compared to
  dash (for what is in effect simple reads of /proc, /sys, etc.).
* ds-identify is run as a generator and only enables a datasource to run
  if the underlying substrate is compatible (discovered) for that
  datasource
* [ACTION] RCJ: Side issue, can unit tests add shellcheck for
  ds-identify?
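The kind of cheap substrate check ds-identify performs in shell could be sketched in Python as below. The vendor-to-datasource mapping and the single DMI file are illustrative assumptions; the real ds-identify consults many more signals (kernel cmdline, /proc, config files).

```python
# Hypothetical mapping from DMI sys_vendor strings to datasource names;
# not ds-identify's actual detection table.
DMI_VENDOR_TO_DATASOURCE = {
    "Amazon EC2": "Ec2",
    "Microsoft Corporation": "Azure",
    "Google": "GCE",
}

def identify_datasource(sys_vendor):
    """Return a likely datasource name for a DMI sys_vendor string."""
    return DMI_VENDOR_TO_DATASOURCE.get(sys_vendor.strip(), "None")

def read_sys_vendor(path="/sys/class/dmi/id/sys_vendor"):
    """Read the DMI system vendor, returning '' when unavailable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""
```

The point of the shell implementation is exactly that such reads of /sys and /proc are cheap, avoiding python interpreter startup in early boot.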

## cloud-init QA: Provider tests


* [https://goo.gl/2BHACr][37] 
* Want to determine how best to integrate specific vendor/cloud image
  testing into upstream
* [ACTION] Softlayer will grab information on CI to present tomorrow
* Softlayer: Provision - create a template and then start an instance
* RobertS: If tests are consumable he would run them on openSUSE images
  and ensure they publish results. Integration tests running against
  clouds should have a framework which supports test result validation
  for different distro/image vendor expectations.
* Looking for best practices for interacting with a given cloud:
    * How best to boot a custom image 
* [Action - Josh] - Put together an email with requests, and send to
  cloud providers describing how to run tests and ask for best practices
  or merge proposals for remote image testing
* KVM merge proposal:
  [https://code.launchpad.net/~powersj/cloud-init/+git/cloud-init/+merge/327646][38]
* Full integration test run times: LXD 12 minutes; KVM total time cost
  not yet measured (test merge proposal in flight^)

[37]: https://goo.gl/2BHACr
[38]: https://code.launchpad.net/~powersj/cloud-init/+git/cloud-init/+merge/327646

## cloud-init QA: Distro Tests, how to build your own CI


* [https://goo.gl/dzQRHg][39] 
* Questions:
    * Lars: There are webhooks in COPR that could trigger per-commit
      builds for CI. Canonical’s CI does per-commit testing. Details in
      Josh’s topic

[39]: https://goo.gl/dzQRHg

## Python 3

[Robert Schweikert]

* [https://pythonclock.org/][40]
* [https://www.python.org/dev/peps/pep-0394/][41]
* From a distribution perspective: SLES 11 (old) is on python 2.6.9 and
  is going EOL in March 2018, and python2 goes EOL in 2 years 7 months
  (see pythonclock), meaning clouds/distros will likely start dropping
  support for py2
* SLES 15 will be python 3 next year with python2 in the legacy module
  with 2 years of support
* At some point python2 support will be a "don’t care" for most distros
  & clouds.
* Lars: RHEL6 may have a longer lifecycle than SLES, but its cloud-init
  is pinned at an older version, so upstream cloud-init could drop 2.6
  shortly
* Lars: RHEL7 still only has python 2.7, so RHEL still cares about 2.7
  for a while (June 2024)
* smoser: python2.7 is probably present until RHEL7 has python3. Nobody
  in cloud-init summit cares about 2.6 support.
* SLES: Today distro vendors have an increased QA support matrix to
  validate python 2.6 versus python 3 support. SLES 11-12 separated the
  distro into modules which have different update/backport policies. The
  cloud module (Robert's group) has a CI exception/agreement for updates
  so cloud-init can be moved to new versions as needed.
* [AGREED] Python 2.6 support limited to ~18 months, 2.7 will continue
  for RHEL7 for a while unless RHEL7 can pull in a python3 version.
* [ACTION - Lars] Determine whether python3 support will be introduced
  in RHEL7 at some point, or only RHEL8

[40]: https://pythonclock.org/
[41]: https://www.python.org/dev/peps/pep-0394/

## Using LXD for rapid dev/testing


* [https://goo.gl/3sJuX9][42]
* Getting started: [https://linuxcontainers.org/lxd/getting-started-cli/][43]
* Images: [https://us.images.linuxcontainers.org/][44] 
* Ubuntu Daily Images: [https://cloud-images.ubuntu.com/daily/][45] 
* [Stephane's DebConf presentation][46]
    * Good overview of basics and some advanced features
* Scott: Would be really nice to have other OS images with cloud-init
  already in them like how the Ubuntu daily images do. Makes running
  cloud-init development with them much easier.
* It would be nice if distro vendors interested in cloud-init could
  provide "official" LXD images for their distributions.
    * How do we create an LXD image? [tarball including rootfs +
      metadata or squashfs][47]
	* RobertS: might be able to teach the kiwi build service to publish
	  lxd images. If there is an images endpoint, LXD could crawl it
    * Can we serve images from our own endpoint? Yes, either by
      implementing the [REST API][48] or by providing a simplestreams
      endpoint
* Lars: Thoughts about mocking/faking a metadata service for quick
  testing
    * Would love contributions of mock metadata services from cloud
      vendors
* Chad: Serve up an instance on a cloud, harvest metadata information,
  and then use that data for serving up to tests.
* smoser/lars: use docker official images for rhel/centos testing
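A minimal mock metadata service of the kind discussed above can be built with only the standard library. The keys served here are made up for illustration; real cloud metadata services each have their own paths and formats.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flat key/value metadata for a fake instance.
METADATA = {"instance-id": "i-abc123", "local-hostname": "test-node"}

class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        key = self.path.strip("/")
        if key in METADATA:
            body, status = METADATA[key].encode(), 200
        else:
            body, status = b"not found", 404
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep request logging quiet during tests

def serve_in_background():
    """Start the mock metadata service on an ephemeral localhost port."""
    server = HTTPServer(("127.0.0.1", 0), MetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Chad's suggestion fits on top of this directly: harvest real metadata from a cloud instance once, then load that capture into `METADATA` for tests.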

[42]: https://goo.gl/3sJuX9
[43]: https://linuxcontainers.org/lxd/getting-started-cli/
[44]: https://us.images.linuxcontainers.org/
[45]: https://cloud-images.ubuntu.com/daily/
[46]: https://debconf17.debconf.org/talks/53/
[47]: https://github.com/lxc/lxd/blob/master/doc/image-handling.md
[48]: https://github.com/lxc/lxd/blob/master/doc/rest-api.md

## How to query metadata


*  [https://trello.com/c/AYaCdQyT/21-cloud-init-query-standardized-information][49]
* Might be nice for cloud-init to surface metadata since it crawls it
  for most data sources, so that other tools don't have to do that as
  well
* Datasources currently crawl, react to, and cache metadata in a pickled
  object on the filesystem; we would like to query cloud-init for the
  cached (or live) metadata and ultimately produce a unified JSON
  structure on the filesystem to allow other tools to parse metadata.
* Originally there existed a 'cloud-init query' subcommand, but it was
  not implemented
* AndrewJ: potentially dump standard format keys and custom "raw"
  content within the same structure
* smoser: might have security concerns about leaking sensitive
  information if we dump it all in a single blob; maybe we'd like to
  separate sensitive data out
* Why pickle? It keeps the class on disk, so cloud-init local can check
  instance_id to validate whether cloud-init needs to be re-run
* [ACTION - blackboxsw] Path forward: cloud-init supports loading either
  obj.pkl or a JSON object, and writes JSON instead of obj.pkl for new
  releases. When writing JSON, remove the obj.pkl file
* AndrewJ: Leveraging datasource get_data logic to handle retries or
  waits etc would be a big win for script writers so they don’t have to
  bake in that logic to their scripts.
* [ACTION - blackboxsw] Datasource.crawl_metadata() branch 
* [ACTION - blackboxsw] Design a schema specification for the unified
  metadata keys in cloud-init's JSON object
    * Wants: top-level cloud-type, some standard keys, a blob of
      cloud-specific keys, and the standard network config format
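The pickle-to-JSON migration in the actions above might look roughly like the following sketch. The file name `instance-data.json` is an assumption for illustration, not a name agreed at the summit.

```python
import json
import os
import pickle

def load_instance_data(run_dir):
    """Load cached instance data, preferring JSON over the legacy obj.pkl."""
    json_path = os.path.join(run_dir, "instance-data.json")  # hypothetical name
    pkl_path = os.path.join(run_dir, "obj.pkl")
    if os.path.exists(json_path):
        with open(json_path) as f:
            return json.load(f)
    with open(pkl_path, "rb") as f:
        return pickle.load(f)

def write_instance_data(run_dir, data):
    """Write JSON and drop the legacy pickle, per the proposed path forward."""
    with open(os.path.join(run_dir, "instance-data.json"), "w") as f:
        json.dump(data, f, indent=1)
    pkl_path = os.path.join(run_dir, "obj.pkl")
    if os.path.exists(pkl_path):
        os.remove(pkl_path)
```

Supporting both formats on load lets an upgraded cloud-init read a cache written by an older release, then converge on JSON at the next write.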

[49]: https://trello.com/c/AYaCdQyT/21-cloud-init-query-standardized-information

# Main Sessions, Day 2 -- Friday, Aug 25, 2017

## Bug Triage & Squashing


* [https://goo.gl/QQfXE6][50]

[50]: https://goo.gl/QQfXE6
[51]: https://goo.gl/QQfXE6

## Device hotplug overview & feedback


* [https://goo.gl/WsBPkk][52]

[52]: https://goo.gl/WsBPkk

* Ajorg: There is no hierarchy of BOOT/INSTANCE/ALWAYS: if manually
  running cloud-init --file config.yaml --frequency always but the
  module is per-boot or per-instance, a sane hierarchy would prevent the
  module from running; the current semaphore mechanism is too
  rudimentary to handle that hierarchy, as it's only sem...<freq>.
* GCE addresses this by documenting their interface and partitioning the
  config file in a way that makes clear the pieces which are Google
  managed; thus detection of changes by the user is possible and then
  tooling can decide if it should still manage the config file
    * In ssh authorized_keys files GCE adds dynamic metadata content
      with comment tag #Added by google…
    * For iptables, Google tries to separate cloud changes from
      user-driven changes by namespace-scoping network definitions with
      a "proto66" prefix
* Need to take modules on a case-by-case basis to determine whether
  idempotent runs will be the expected behavior
* Spawning a background hotplug service should be disabled by default
  and 'opt-in' enabled by configuration, since upgraded environments may
  already be running hot-plug configuration behavior.
* Openstack adds disk and nic info to metadata
* AWS doesn’t surface updated disk info in metadata, but does add
  dynamic network info
* AndrewJ: Computers are kittens  vs. cattle
* Andrew: One place where hotplug is interesting for him is if it were
  used to configure things all the time, not just when something was
  hotplugged, from a risk/maintenance/code-path perspective.  
* AWS has the [ec2-net-utils][53] tool (and [ubuntu port][54])
* SUSE project cloud-netconfig
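Ajorg's frequency-hierarchy point above can be sketched as a simple ordering. The ranking is a hypothetical illustration: cloud-init's semaphore files treat frequencies as opaque names with no such ordering.

```python
# Hypothetical ranking, most frequent first.  Cloud-init's actual
# semaphores (sem/<module>.<freq>) carry no ordering like this.
FREQUENCY_RANK = {"always": 2, "boot": 1, "instance": 0}

def effective_frequency(module_freq, requested_freq):
    """Pick the less frequent of the module's declared frequency and the
    requested one, so that a blanket `--frequency always` cannot force a
    per-instance module to re-run."""
    return min(module_freq, requested_freq, key=FREQUENCY_RANK.get)
```

With such a hierarchy, a per-instance module invoked with `--frequency always` would still resolve to per-instance behavior instead of re-running.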

[53]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#ec2-net-utils
[54]: https://github.com/ademaria/ubuntu-ec2net
[55]: https://github.com/SUSE/Enceladus/tree/master/cloud-netconfig

## Network v2 yaml as primary format


* [Netplan config example][56]
* Having a common intermediary (which has a spec) to represent network
  configuration makes unit testing easier, as it's a common spec that is
  published and understood (even outside of cloud-init)
* Vmacs don't seem to be defined in the spec; does that preclude the
  netplan spec from describing such features? Answer: It's a fluid spec
  that is currently being extended; it's a merge proposal away.
* [Action - dpb]  AndrewJ to follow up with Ryan & Pat about net-utils
  use cases for netplan
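For reference, a minimal network config v2 (netplan-style) YAML of the kind linked above; the interface names and addresses are made up for illustration:

```yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
    eth1:
      addresses: [192.168.1.10/24]
      gateway4: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8]
```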

[56]: https://git.launchpad.net/netplan/plain/doc/example-config

## Breakout JSON instance data storage for cloud-init part 1


* [https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/329589][57]
* Bikeshedding of function names
* Question around non-utf8 encoding and handling of that
* How should we treat sensitive data (e.g. metadata)
    * Option: treat it as a whitelist; assume not allowed and only show
      approved keys
    * Option: if root, get it all; if not root, don't
    * Option: document what is or isn't exposed, to avoid the support
      phone call
    * Initial cut will separate all user-data into a separate file from
      metadata, which will only be readable by root. Will iterate on a
      path-based blacklist for known sensitive data and extract that out
      of the metadata blob into a root-only file
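The path-based blacklist idea in the last bullet could work along these lines. The `SENSITIVE_PATHS` entries are hypothetical placeholders, since the real list was still to be designed at the time of these notes.

```python
# Hypothetical key paths treated as sensitive; not an agreed blacklist.
SENSITIVE_PATHS = {"ds/user-data", "ds/vendor-data", "security-credentials"}

def redact(data, path=""):
    """Split a metadata dict into (public, sensitive) halves by key path.

    Sensitive values are pulled out of the public tree and collected
    under their slash-joined path, for writing to a root-only file.
    """
    public, sensitive = {}, {}
    for key, value in data.items():
        keypath = "%s/%s" % (path, key) if path else key
        if keypath in SENSITIVE_PATHS:
            sensitive[keypath] = value
        elif isinstance(value, dict):
            pub_child, sens_child = redact(value, keypath)
            public[key] = pub_child
            sensitive.update(sens_child)
        else:
            public[key] = value
    return public, sensitive
```

The public half could then be world-readable while the sensitive half stays readable by root only, matching the initial cut described above.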

[57]: https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/329589

## How can cloud-init improve & feedback


* GCE: Have we made progress on shutdown related actions?
    * Certain clouds have shutdown script needs (poor-man’s backups,
      scale-up/scale-down, etc) 
    * No, this has not had progress.  It’s in our backlog now.
* (For Canonical, not cloud-init) - the Ubuntu user is problematic in
  many places.
* Speed of language (python) vs others (golang)
* Parallel execution of jobs; which would require defining/using a
  dependency chain
* How can we get distros to maintain cloud-init better? We can't seem to
  get distros to all be on the same page with distro support behavior
* Finding the source for cloud-init seems tough.  Having something where
  github contributions were allowed would be nice.
* [ACTION] Come up with a proposal/procedure/tool to close pull requests
  and push to launchpad on the user’s behalf
* Andrew: Can CentOS cloud-init align w/ RHEL? What’s with CentOS EPEL
  repos etc for cloud-init?
* RobertS: Canonical needs to support infrastructure that facilitates
  distro support and contribution by separate interested parties
* RobertS: Feels the CLA is a major hindrance for SUSE; only two
  contributors are allowed from SUSE, so two CLA signers have to
  shepherd fixes in through those signed LP users. The legal dept is
  concerned about adding more users to a contribution list who have
  signed license rights away. Any time new contributors are added to the
  list, they have to talk to lawyers as well as higher-level management
  for approval.
    * Concern is around how broad the CLA is compared to other CLAs
    * Feels like GPL in that it takes over other software developed
* Andrew: previous CLA incarnations having reference to "interpreted
  according to British Law" kept US attorneys concerned. Changing to
  more of an apache license reduced concern.
    * "This Agreement will be governed by and construed in accordance
      with the laws of England"
* Version numbering discussion is a big win for folks
* Improved testing and integration CI is really helpful (good for
  surfacing systemd dependency  trees etc)
* Balance between being an upstream project and packaging for various
  distributions. It is up to the packager to know dependencies, etc.
  Therefore should the project carry lots of distribution-specific
  things? Or be very explicit about packaging (e.g. separate folders for
  each distro)
    * E.g. unexpected magic found in setup.py for people trying to make
      contributions or taking a first look at the project.
    * E.g. templated spec files

David Britton <david.britton@xxxxxxxxxxxxx>