yahoo-eng-team team mailing list archive

[Bug 1657130] Re: get_data in DataSourceOpenStack.py can time out if metadata service is slow

 

This bug was fixed in the package cloud-init -
0.7.9-48-g1c795b9-0ubuntu1~16.04.1

---------------
cloud-init (0.7.9-48-g1c795b9-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/rules: install Z99-cloudinit-warnings.sh to /etc/profile.d
  * debian/patches/ds-identify-behavior-xenial.patch: adjust default
    behavior of ds-identify for SRU (LP: #1669675, #1660385).
  * New upstream snapshot.
    - Support warning if the used datasource is not in ds-identify's list
      (LP: #1669675).
    - DatasourceEc2: add warning message when not on AWS. (LP: #1660385)
    - Z99-cloudinit-warnings: Add profile.d script for showing warnings on
    - Z99-cloud-locale-test.sh: convert tabs to spaces, remove unnecessary
      execute bit in permissions.
    - (RedHat) net: correct errors in cloudinit/net/sysconfig.py
      [Lars Kellogg-Stedman]
    - ec2_utils: fix MetadataLeafDecoder that returned bytes on empty
    - Fix eni rendering of multiple IPs per interface [Ryan Harper]
      (LP: #1657940)
    - Add 3 ecdsa-sha2-nistp* ssh key types now that they are standardized
      [Lars Kellogg-Stedman]
    - EC2: Do not cache security credentials on disk [Andrew Jorgensen]
      (LP: #1638312)
    - OpenStack: Use timeout and retries from config in get_data.
      [Lars Kellogg-Stedman] (LP: #1657130)
    - Fixed Misc issues related to VMware customization. [Sankar Tanguturi]
    - (RedHat) Use dnf instead of yum when available [Lars Kellogg-Stedman]
    - Get early logging logged, including failures of cmdline url.
    - test / doc / build environment changes
      - Remove style checking during build and add latest style checks to
        tox [Joshua Powers]
      - code-style: make master pass pycodestyle (2.3.1) cleanly, currently
        [Joshua Powers]
      - Fix small typo and change iso-filename for consistency
      - tools/mock-meta: support python2 or python3 and ipv6 in both.
      - tests: remove executable bit on test_net, so it runs, and fix it.
      - tests: No longer monkey patch httpretty for python 3.4.2
      - reset httppretty for each test [Lars Kellogg-Stedman]
      - build: fix running Make on a branch with tags other than master
      - doc: Fix typos and clarify some aspects of the part-handler
        [Erik M. Bray]
      - doc: add some documentation on OpenStack datasource.
      - Fix minor docs typo: perserve > preserve [Jeremy Bicha]
      - validate-yaml: use python rather than explicitly python3

 -- Scott Moser <smoser@xxxxxxxxxx>  Mon, 06 Mar 2017 16:34:10 -0500

** Changed in: cloud-init (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1657130

Title:
  get_data in DataSourceOpenStack.py can time out if metadata service is
  slow

Status in cloud-init:
  Fix Committed
Status in cloud-init package in Ubuntu:
  Fix Released
Status in cloud-init source package in Xenial:
  Fix Released
Status in cloud-init source package in Yakkety:
  Fix Released

Bug description:
  === Begin SRU Template ===
  [Impact]
  On heavily loaded OpenStack metadata services, cloud-init may hit a timeout
  and not properly retry when waiting longer or retrying would allow it to
  succeed.

  cloud-init contained a setting to configure this, but it was not used in all
  cases. The change here makes get_data use the configured timeout and retries.

  [Test Case]
  1. Launch an instance on OpenStack.
  2. Verify inconsistent use of 'timeout' in /var/log/cloud-init.log
    $ grep http://169.254.169.254/openstack /var/log/cloud-init.log  | grep 0/ | head -n 2
    2017-03-03 16:51:23,824 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 10.0} configuration
    2017-03-03 16:51:24,384 - url_helper.py[DEBUG]: [0/6] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'headers': {'User-Agent': 'Cloud-Init/0.7.9'}, 'method': 'GET', 'timeout': 5.0} configuration

  3. enable proposed, update, upgrade
  4. clean
     rm -Rf /var/lib/cloud /var/log/cloud-init*
  5. reboot
  6. re-check step 2; expect to see that 'timeout' is consistent.
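
The check in steps 2 and 6 can also be scripted. The following is a minimal sketch, not part of cloud-init: the helper name openstack_timeouts is hypothetical, and it assumes the log line format shown in step 2. It collects the distinct 'timeout' values logged for first attempts ("[0/") against the OpenStack metadata URL.

```python
import re

# Matches the timeout value inside the request dict logged by url_helper.py.
TIMEOUT_RE = re.compile(r"'timeout': ([0-9.]+)")


def openstack_timeouts(lines, url="http://169.254.169.254/openstack"):
    """Return the set of timeout values seen on first attempts to `url`.

    Hypothetical helper for illustration; assumes the log format from step 2.
    """
    timeouts = set()
    for line in lines:
        # Only look at the first attempt of each request, as the grep does.
        if url in line and "[0/" in line:
            match = TIMEOUT_RE.search(line)
            if match:
                timeouts.add(float(match.group(1)))
    return timeouts


if __name__ == "__main__":
    with open("/var/log/cloud-init.log") as log:
        seen = openstack_timeouts(log)
    # More than one distinct value means the timeout is applied inconsistently.
    print(sorted(seen))
```

A result with more than one value corresponds to the inconsistent state shown in step 2; after the upgrade, a single value would be expected.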

  [Regression Potential]
  Low chance of regression. Boot times may be slower, but booting is more
  reliable on a non-performant metadata service.

  === End SRU Template ===

  cloud-init sometimes times out and fails to fetch metadata in the
  OpenStack environment when the Controller node is under high workload.

  The default timeout value is 5 seconds, which may be too small in some
  cases where the Controller node is too busy to respond to the metadata
  request from the instance in time.

  There is a 'timeout' configuration setting, as in...

    datasource:
      OpenStack:
        timeout: 30

  ...but this value is not used by the get_data method in
  cloudinit/sources/DataSourceOpenStack.py, because get_data is called
  from cloudinit/sources/__init__.py with no keyword arguments:

    LOG.debug("Seeing if we can get any data from %s", cls)
    s = cls(sys_cfg, distro, paths)
    if s.get_data():
        myrep.message = "found %s data from %s" % (mode, name)
        return (s, type_utils.obj_name(cls))
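
Because the caller passes no keyword arguments, one way to honor the setting is for the datasource to look up the values from its own config section. The following is only a sketch of that idea, not the actual cloud-init patch: read_url_params is a hypothetical helper, ds_cfg stands for the parsed datasource/OpenStack mapping, the 5 second default comes from the description above, and the retry default is an assumption.

```python
def read_url_params(ds_cfg, default_timeout=5.0, default_retries=5):
    """Return (timeout, retries) from a datasource config mapping.

    Hypothetical helper for illustration. Missing or unparsable values
    fall back to the defaults; negative values are clamped to zero.
    """
    try:
        timeout = max(0.0, float(ds_cfg.get("timeout", default_timeout)))
    except (TypeError, ValueError):
        timeout = default_timeout
    try:
        retries = max(0, int(ds_cfg.get("retries", default_retries)))
    except (TypeError, ValueError):
        retries = default_retries
    return timeout, retries


# With the config shown above, the configured timeout wins over the default.
timeout, retries = read_url_params({"timeout": 30})
print(timeout, retries)
```

A get_data implementation could then pass these two values to every metadata request it makes, rather than relying on arguments from the caller.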

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1657130/+subscriptions

