yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #80549
[Bug 1850642] Re: Cloud init unable to find the metadata service but can CURL it
*** This bug is a duplicate of bug 1821102 ***
https://bugs.launchpad.net/bugs/1821102
The logs do contain that bug, but I'm not sure that's the failure here.
Looking at the cloud-init logs, we can see the Ephemeral DHCP start and
obtain an lease, but the end point does not respond:
2019-10-30 13:28:41,516 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 136.156.90.74/255.255.254.0
2019-10-30 13:28:41,516 - __init__.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 136.156.90.74/23 brd 136.156.91.255
2019-10-30 13:28:41,516 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '136.156.90.74/23', 'broadcast', '136.156.91.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:28:41,519 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:28:41,522 - util.py[DEBUG]: Running command ['ip', 'route', 'show', '0.0.0.0/0'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:28:41,525 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '136.156.91.254', 'dev', 'eth0', 'src', '136.156.90.74'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:28:41,527 - util.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', 'default', 'via', '136.156.91.254', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:29:21,573 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 40.044 seconds
2019-10-30 13:29:21,574 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'headers': {'User-Agent': 'Cloud-Init/18.5'}, 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0} configuration
2019-10-30 13:29:31,608 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [10/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7fdc5e5dc210>, 'Connection to 169.254.169.254 timed out. (connect timeout=10.0)'))]
2019-10-30 13:29:31,608 - DataSourceOpenStack.py[DEBUG]: Giving up on OpenStack md from ['http://169.254.169.254/openstack'] after 10 seconds
2019-10-30 13:29:31,609 - util.py[DEBUG]: Crawl of metadata service took 50.079 seconds
Cloud-init did not find the OpenStack datasource at local time:
2019-10-30 13:29:31,689 - main.py[DEBUG]: No local datasource found
So we write out a fallback network config (dhcp on eth0).
2019-10-30 13:29:31,722 - stages.py[INFO]: Applying network configuration from fallback bringup=False: {'version': 1, 'config': [{'subnets': [{'type': 'dhcp'}], 'type': 'physical', 'name': 'eth0', 'mac_address': 'fa:16:3e:f4:b5:1d'}]}
2019-10-30 13:29:31,723 - __init__.py[DEBUG]: Selected renderer 'sysconfig' from priority list: None
2019-10-30 13:29:31,725 - util.py[DEBUG]: Writing to /etc/sysconfig/network-scripts/ifcfg-eth0 - wb: [644] 159 bytes
2019-10-30 13:29:31,726 - util.py[DEBUG]: Restoring selinux mode for /etc/sysconfig/network-scripts/ifcfg-eth0 (recursive=False)
2019-10-30 13:29:31,726 - util.py[DEBUG]: Restoring selinux mode for /etc/sysconfig/network-scripts/ifcfg-eth0 (recursive=False)
And then exit cloud-init-local.service:
2019-10-30 13:29:31,730 - util.py[DEBUG]: cloud-init mode 'init' took
52.389 seconds (52.39)
Now, the OS networking layer comes up:
Oct 30 13:29:31.623535 localhost.localdomain cloud-init[732]: 2019-10-30 13:29:31,619 - util.py[WARNING]: No active metadata service found
Oct 30 13:29:31.751295 localhost.localdomain systemd[1]: Started Initial cloud-init job (pre-networking).
Oct 30 13:29:31.754472 localhost.localdomain systemd[1]: Reached target Network (Pre).
Oct 30 13:29:31.756812 localhost.localdomain systemd[1]: Starting LSB: Bring up/down networking...
Oct 30 13:29:31.961050 localhost.localdomain network[866]: Bringing up loopback interface: [ OK ]
Oct 30 13:29:31.990878 localhost.localdomain network[866]: Bringing up interface eth0:
Oct 30 13:29:32.034411 localhost.localdomain dhclient[995]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 (xid=0xeceb89e)
Oct 30 13:29:32.055157 localhost.localdomain dhclient[995]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0xeceb89e)
Oct 30 13:29:32.055188 localhost.localdomain dhclient[995]: DHCPOFFER from 136.156.90.11
Oct 30 13:29:32.055796 localhost.localdomain dhclient[995]: DHCPACK from 136.156.90.11 (xid=0xeceb89e)
Oct 30 13:29:34.132189 localhost.localdomain NET[1054]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Oct 30 13:29:34.166284 host-136-156-90-74 dhclient[995]: bound to 136.156.90.74 -- renewal in 34571 seconds.
Oct 30 13:29:34.167466 host-136-156-90-74 network[866]: Determining IP information for eth0... done.
Oct 30 13:29:34.224637 host-136-156-90-74 network[866]: [ OK ]
Oct 30 13:29:34.245397 host-136-156-90-74 systemd[1]: Started LSB: Bring up/down networking.
Oct 30 13:29:34.249783 host-136-156-90-74 systemd[1]: Reached target Network.
And cloud-init.service starts up, and this time it is able to connect to
the IMDS:
2019-10-30 13:29:34,638 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStack'>
2019-10-30 13:29:34,638 - __init__.py[DEBUG]: Update datasource metadata and network config due to events: New instance first boot
2019-10-30 13:29:34,638 - util.py[DEBUG]: Running command ['systemd-detect-virt', '--quiet', '--container'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:29:34,641 - util.py[DEBUG]: Running command ['running-in-container'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:29:34,644 - util.py[DEBUG]: Running command ['lxc-is-container'] with allowed return codes [0] (shell=False, capture=True)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Reading from /proc/1/environ (quiet=False)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Read 0 bytes from /proc/1/environ
2019-10-30 13:29:34,648 - util.py[DEBUG]: Reading from /proc/self/status (quiet=False)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Read 1206 bytes from /proc/self/status
2019-10-30 13:29:34,648 - util.py[DEBUG]: querying dmi data /sys/class/dmi/id/product_name
2019-10-30 13:29:34,648 - util.py[DEBUG]: Reading from /sys/class/dmi/id/product_name (quiet=False)
2019-10-30 13:29:34,648 - util.py[DEBUG]: Read 18 bytes from /sys/class/dmi/id/product_name
2019-10-30 13:29:34,648 - util.py[DEBUG]: dmi data /sys/class/dmi/id/product_name returned OpenStack Compute
2019-10-30 13:29:34,654 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 0.006 seconds
2019-10-30 13:29:34,654 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'headers': {'User-Agent': 'Cloud-Init/18.5'}, 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0} configuration
2019-10-30 13:29:36,359 - url_helper.py[DEBUG]: Read from http://169.254.169.254/openstack (200, 94b) after 1 attempts
...
2019-10-30 13:29:37,717 - util.py[DEBUG]: Crawl of openstack metadata service took 1.358 seconds
2019-10-30 13:29:37,718 - __init__.py[WARNING]: Error persisting instance-data.json: 'utf8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
2019-10-30 13:29:37,718 - handlers.py[DEBUG]: finish: init-network/search-OpenStack: SUCCESS: found network data from DataSourceOpenStack
2019-10-30 13:29:37,718 - stages.py[INFO]: Loaded datasource DataSourceOpenStack - DataSourceOpenStack [net,ver=2]
So we know we're OpenStack, and as Scott pointed out, it suffers from
the py27 json bug.
For this bug, the warning is not a False positive, rather, there seems
to be an issue with connecting to the metadata service early in boot. I
suspect we'd need some debugging enabled to see why we can't connect to
the endpoint after a successful dhclient.
One interesting bit would be to compare the leases file from cloud-init
Ephemeral DHCP, with the system dhcp. Another thought is if the image
has any custom dhclient hooks that setup additional route information
needed to hit the metadata service.
We've seen that bug before,
https://bugs.launchpad.net/cloud-init/+bug/1821102
Which is not in 18.5, so possibly a dupe of that issue.
** This bug has been marked a duplicate of bug 1821102
Cloud-init should not setup ephemeral ipv4 if apply_network_config is False for OpenStack
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1850642
Title:
Cloud init unable to find the metadata service but can CURL it
Status in cloud-init:
New
Bug description:
In a tripleo Rocky deployment I am seeing the behaviour bellow.
In the bootup logs I see that the metadata service could not be
reached.
File "/usr/lib/python2.7/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 177, in _crawl_metadata
'No active metadata service found')
However the service is curlable from inside the VM and I am also seeing some requests in the metadata-service in Openstack. Furthemore I am getting SSH keys inserted so I believe this might be a false warning.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1850642/+subscriptions
References