yahoo-eng-team team mailing list archive
Message #79940
[Bug 1843634] Re: cloud-init misconfigure the network on SLES
OK, thanks for the logs. Could you re-attach those running via sudo (or
as root)? The default user on SLES does not have permissions to read
the journal.
What I see so far suggests that networking did not come up after
cloud-init-local.service completed and wrote out a network config:
2019-09-11 18:00:15,242 - stages.py[INFO]: Applying network
configuration from ds bringup=False: {'ethernets': {'eth0': {'set-name':
'eth0', 'match': {'macaddress': u'00:0d:3a:6e:6f:8f'}, 'dhcp4': True}},
'version': 2}
This results in the following files being written:
% cat test_azure_sles/etc/sysconfig/network/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=00:0d:3a:6e:6f:8f
NM_CONTROLLED=no
ONBOOT=yes
STARTMODE=auto
TYPE=Ethernet
USERCTL=no
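To make the mapping from the logged version 2 config to that file easier to follow, here is a minimal sketch. It is not cloud-init's actual sysconfig renderer; render_ifcfg and HEADER are hypothetical names, and the fixed keys are simply copied from the file above.

```python
# Illustrative only: NOT cloud-init's real renderer. render_ifcfg and HEADER
# are hypothetical names; the fixed keys mirror the ifcfg-eth0 file shown above.
HEADER = (
    "# Created by cloud-init on instance boot automatically, do not edit.\n"
    "#\n"
)


def render_ifcfg(name, cfg):
    """Render one version 2 'ethernets' entry as ifcfg-style text."""
    settings = {
        "BOOTPROTO": "dhcp" if cfg.get("dhcp4") else "none",
        "DEVICE": cfg.get("set-name", name),
        "NM_CONTROLLED": "no",
        "ONBOOT": "yes",
        "STARTMODE": "auto",
        "TYPE": "Ethernet",
        "USERCTL": "no",
    }
    mac = cfg.get("match", {}).get("macaddress")
    if mac:
        settings["HWADDR"] = mac
    # Keys are emitted in sorted order, matching the file in the report.
    body = "".join(f"{key}={value}\n" for key, value in sorted(settings.items()))
    return HEADER + body


# The network config from the stages.py log line above.
netcfg = {
    "ethernets": {
        "eth0": {
            "set-name": "eth0",
            "match": {"macaddress": "00:0d:3a:6e:6f:8f"},
            "dhcp4": True,
        }
    },
    "version": 2,
}

for name, cfg in netcfg["ethernets"].items():
    print(render_ifcfg(name, cfg))
```

Note that nothing in the input carries DNS information, so a renderer working only from this dictionary has no nameserver or search domain to write.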
Upstream cloud-init on SLES does not generate or update /etc/resolv.conf,
but the logs show that the cloud-init in this image does:
2019-09-11 18:00:15,246 - util.py[DEBUG]: Writing to /etc/sysconfig/network/ifcfg-eth0 - wb: [644] 191 bytes
2019-09-11 18:00:15,247 - util.py[DEBUG]: Reading from /etc/resolv.conf (quiet=False)
2019-09-11 18:00:15,247 - util.py[DEBUG]: Read 795 bytes from /etc/resolv.conf
2019-09-11 18:00:15,247 - util.py[DEBUG]: Writing to /etc/resolv.conf - wb: [644] 866 bytes
At first, I thought maybe it was missing this commit:
% git show b74ebca563a21332b29482c8029e7908f60225a4
commit b74ebca563a21332b29482c8029e7908f60225a4
Author: Robert Schweikert <rjschwei@xxxxxxxx>
Date: Wed Jan 23 22:35:32 2019 +0000
net/sysconfig: do not write a resolv.conf file with only the header.
Writing the file with no dns information may prevent distro tools
from writing a resolv.conf file with dns information obtained from
a dhcp server.
diff --git a/cloudinit/net/sysconfig.py b/cloudinit/net/sysconfig.py
index ae41f7b..fd8e501 100644
--- a/cloudinit/net/sysconfig.py
+++ b/cloudinit/net/sysconfig.py
@@ -557,6 +557,8 @@ class Renderer(renderer.Renderer):
content.add_nameserver(nameserver)
for searchdomain in network_state.dns_searchdomains:
content.add_search_domain(searchdomain)
+ if not str(content):
+ return None
header = _make_header(';')
content_str = str(content)
if not content_str.startswith(header):
@@ -666,7 +668,8 @@ class Renderer(renderer.Renderer):
dns_path = util.target_path(target, self.dns_path)
resolv_content = self._render_dns(network_state,
existing_dns_path=dns_path)
- util.write_file(dns_path, resolv_content, file_mode)
+ if resolv_content:
+ util.write_file(dns_path, resolv_content, file_mode)
if self.networkmanager_conf_path:
nm_conf_path = util.target_path(target,
self.networkmanager_conf_path)
diff --git a/tests/unittests/test_net.py b/tests/unittests/test_net.py
index d679e92..5313d2d 100644
--- a/tests/unittests/test_net.py
+++ b/tests/unittests/test_net.py
@@ -2098,6 +2098,10 @@ TYPE=Ethernet
USERCTL=no
"""
self.assertEqual(expected, found[nspath + 'ifcfg-interface0'])
+ # The configuration has no nameserver information make sure we
+ # do not write the resolv.conf file
+ respath = '/etc/resolv.conf'
+ self.assertNotIn(respath, found.keys())
def test_config_with_explicit_loopback(self):
ns = network_state.parse_net_config_data(CONFIG_V1_EXPLICIT_LOOPBACK)
@@ -2456,6 +2460,10 @@ TYPE=Ethernet
USERCTL=no
"""
self.assertEqual(expected, found[nspath + 'ifcfg-interface0'])
+ # The configuration has no nameserver information make sure we
+ # do not write the resolv.conf file
+ respath = '/etc/resolv.conf'
+ self.assertNotIn(respath, found.keys())
def test_config_with_explicit_loopback(self):
ns = network_state.parse_net_config_data(CONFIG_V1_EXPLICIT_LOOPBACK)
But, I believe that is in 19.1 (or likely patched in the distro version).
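The idea behind that commit can be sketched on its own, outside cloud-init. This is a simplified illustration, not the code from cloudinit/net/sysconfig.py; render_resolv_conf and write_if_needed are hypothetical helpers, and the writer callback stands in for a real file write.

```python
# Simplified illustration of the guard in the commit above. The function
# names are hypothetical and do not exist in cloud-init.

def render_resolv_conf(nameservers, search_domains, header):
    """Return resolv.conf text, or None when there is no DNS data at all."""
    lines = [f"nameserver {ns}" for ns in nameservers]
    if search_domains:
        lines.append("search " + " ".join(search_domains))
    if not lines:
        # Only the header would be written; skip it so distro tools can
        # still populate the file from DHCP.
        return None
    return header + "\n".join(lines) + "\n"


def write_if_needed(path, content, writer):
    """Call writer(path, content) only when there is content to write."""
    if content:
        writer(path, content)
        return True
    return False


# Usage: a dict stands in for the filesystem.
written = {}
header = "; Created by cloud-init on instance boot automatically, do not edit.\n;\n"
write_if_needed("/etc/resolv.conf", render_resolv_conf([], [], header),
                written.__setitem__)
```

With no nameservers or search domains, the dict stays empty, which is the behaviour the added unit test assertions in the diff check for.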
Later in the boot, we can see that networking didn't actually come up: the Azure datasource can't find a lease file and then goes into some sort of fallback mode that tries to bring up networking. It does bring the interface up, but not with DHCP, which is why you're missing DNS (it's provided via an option in the DHCP response).
2019-09-11 18:00:15,946 - azure.py[DEBUG]: Unable to find endpoint in dhclient logs. Falling back to check lease files
2019-09-11 18:00:15,946 - azure.py[DEBUG]: Looking for endpoint in lease file /var/lib/dhcp/dhclient.eth0.leases
2019-09-11 18:00:15,946 - handlers.py[DEBUG]: start: azure-ds/_get_value_from_leases_file: _get_value_from_leases_file
2019-09-11 18:00:15,946 - util.py[DEBUG]: Reading from /var/lib/dhcp/dhclient.eth0.leases (quiet=False)
2019-09-11 18:00:15,947 - azure.py[ERROR]: Failed to read /var/lib/dhcp/dhclient.eth0.leases: [Errno 2] No such file or directory: '/var/lib/dhcp/dhclient.eth0.leases'
2019-09-11 18:00:15,959 - handlers.py[DEBUG]: finish: azure-ds/_get_value_from_leases_file: SUCCESS: _get_value_from_leases_file
2019-09-11 18:00:15,959 - util.py[DEBUG]: Running command ['ifconfig'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:16,020 - azure.py[DEBUG]: ifconfig out: lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:140 (140.0 b) TX bytes:140 (140.0 b)
, err:
2019-09-11 18:00:16,020 - util.py[DEBUG]: Running command ['route', '-n'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:16,093 - azure.py[DEBUG]: route out: Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
, err:
2019-09-11 18:00:16,093 - util.py[DEBUG]: Running command ['ip', 'a'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:16,095 - azure.py[DEBUG]: ip out: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 00:0d:3a:6e:6f:8f brd ff:ff:ff:ff:ff:ff
, err:
2019-09-11 18:00:16,095 - util.py[DEBUG]: Running command ['ifup', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,824 - azure.py[DEBUG]: ifup out: eth0 up
, err:
2019-09-11 18:00:31,824 - util.py[DEBUG]: Running command ['ip', '-o', 'route', 'list'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,827 - azure.py[DEBUG]: ip out: default via 10.0.0.1 dev eth0 proto dhcp
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.17
168.63.129.16 via 10.0.0.1 dev eth0 proto dhcp
169.254.169.254 via 10.0.0.1 dev eth0 proto dhcp
, err:
2019-09-11 18:00:31,828 - util.py[DEBUG]: Running command ['ifconfig'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,830 - azure.py[DEBUG]: ifconfig out: eth0 Link encap:Ethernet HWaddr 00:0D:3A:6E:6F:8F
inet addr:10.0.0.17 Bcast:10.0.0.255 Mask:255.255.255.0
inet6 addr: fe80::20d:3aff:fe6e:6f8f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2709 (2.6 Kb) TX bytes:3373 (3.2 Kb)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:468 (468.0 b) TX bytes:468 (468.0 b)
, err:
2019-09-11 18:00:31,831 - util.py[DEBUG]: Running command ['route', '-n'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,834 - azure.py[DEBUG]: route out: Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.0.1 0.0.0.0 UG 0 0 0 eth0
10.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
168.63.129.16 10.0.0.1 255.255.255.255 UGH 0 0 0 eth0
169.254.169.254 10.0.0.1 255.255.255.255 UGH 0 0 0 eth0
, err:
2019-09-11 18:00:31,834 - util.py[DEBUG]: Running command ['ip', 'a'] with allowed return codes [0] (shell=False, capture=True)
2019-09-11 18:00:31,837 - azure.py[DEBUG]: ip out: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:0d:3a:6e:6f:8f brd ff:ff:ff:ff:ff:ff
inet 10.0.0.17/24 brd 10.0.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20d:3aff:fe6e:6f8f/64 scope link
valid_lft forever preferred_lft forever
, err:
2019-09-11 18:00:31,837 - azure.py[WARNING]: No lease found; using default endpoint
2019-09-11 18:00:31,837 - azure.py[DEBUG]: Azure endpoint found at 168.63.129.16
2019-09-11 18:00:31,837 - handlers.py[DEBUG]: finish: azure-ds/find_endpoint: SUCCESS: find_endpoint
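The fallback visible in that log (look in the lease file, and use the default endpoint if nothing is found) can be sketched roughly as follows. This is not the datasource's real code: the function names are hypothetical, and the lease line format (an "option unknown-245" entry holding the address as four colon-separated hex octets) is an assumption made for illustration.

```python
import re

# Default endpoint reported in the log above.
DEFAULT_ENDPOINT = "168.63.129.16"


def endpoint_from_lease_text(text):
    """Return a dotted-quad endpoint from lease text, or None if absent.

    ASSUMPTION: the endpoint appears as a line such as
        option unknown-245 a8:3f:81:10;
    """
    matches = re.findall(r"option\s+unknown-245\s+([0-9a-fA-F:]+);", text)
    if not matches:
        return None
    octets = matches[-1].split(":")  # use the most recent lease entry
    if len(octets) != 4:
        return None
    return ".".join(str(int(octet, 16)) for octet in octets)


def find_endpoint(lease_path):
    """Read the lease file if it exists; otherwise use the default."""
    try:
        with open(lease_path) as fh:
            text = fh.read()
    except OSError:
        # Matches the "[Errno 2] No such file or directory" case in the log.
        text = ""
    return endpoint_from_lease_text(text) or DEFAULT_ENDPOINT
```

The default keeps the datasource working, but it hides the underlying problem: the lease file is missing because DHCP never ran on eth0 before this point.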
So, I'd also like to see the contents of:
/etc/resolv.conf
/etc/sysconfig/network/ifcfg-eth0
Please also run sudo cloud-init collect-logs so we can get the journal,
which should reveal why SLES's networking service didn't come online
before cloud-init.service started.
** Changed in: cloud-init
Status: New => Incomplete
** Also affects: cloud-init (Suse)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1843634
Title:
cloud-init misconfigure the network on SLES
Status in cloud-init:
Incomplete
Status in cloud-init package in Suse:
New
Bug description:
I reproduced the issue on an Azure VM with SLES12 SP4 and cloud-init 19.1.
The DNS is unreachable when cloud-init takes responsibility for configuring the network. No nameservers or search domains are added to /etc/resolv.conf, as follows:
; Created by cloud-init on instance boot automatically, do not edit.
;
### /etc/resolv.conf file autogenerated by netconfig!
#
# Before you change this file manually, consider to define the
# static DNS configuration using the following variables in the
# /etc/sysconfig/network/config file:
# NETCONFIG_DNS_STATIC_SEARCHLIST
# NETCONFIG_DNS_STATIC_SERVERS
# NETCONFIG_DNS_FORWARDER
# or disable DNS configuration updates via netconfig by setting:
# NETCONFIG_DNS_POLICY=''
#
# See also the netconfig(8) manual page and other documentation.
#
# Note: Manual change of this file disables netconfig too, but
# may get lost when this file contains comments or empty lines
# only, the netconfig settings are same with settings in this
# file and in case of a "netconfig update -f" call.
#
### Please remove (at least) this line when you modify the file!
I also attached "/etc/sysconfig/network/config" in the first
comment for your reference.
When I disable the network configuration in cloud-init and leave it to netconfig, /etc/resolv.conf is correctly populated with the search domain and the nameserver, and the DNS is reachable. Here are the contents of /etc/resolv.conf:
### /etc/resolv.conf file autogenerated by netconfig!
#
# Before you change this file manually, consider to define the
# static DNS configuration using the following variables in the
# /etc/sysconfig/network/config file:
# NETCONFIG_DNS_STATIC_SEARCHLIST
# NETCONFIG_DNS_STATIC_SERVERS
# NETCONFIG_DNS_FORWARDER
# or disable DNS configuration updates via netconfig by setting:
# NETCONFIG_DNS_POLICY=''
#
# See also the netconfig(8) manual page and other documentation.
#
# Note: Manual change of this file disables netconfig too, but
# may get lost when this file contains comments or empty lines
# only, the netconfig settings are same with settings in this
# file and in case of a "netconfig update -f" call.
#
### Please remove (at least) this line when you modify the file!
search xkf00b0rtzgejkug4xc2pcinre.xx.internal.cloudapp.net
nameserver 168.63.129.16
When I tried to populate the network config dictionary that's built by
DataSourceAzure with a default nameserver "168.63.129.16" and search
domain "xkf00b0rtzgejkug4xc2pcinre.xx.internal.cloudapp.net", the DNS
was reachable. But it's my understanding that cloud-init should be
able to figure out this nameserver and the search domain the same way
netconfig does.
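A quick way to tell the two resolv.conf variants above apart is to ignore comment lines and see what is left. The following is a minimal sketch; parse_resolv_conf is a hypothetical helper and handles only the "nameserver" and "search" keywords.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) found in resolv.conf text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            search = fields[1:]
    return nameservers, search


# Abbreviated versions of the two files quoted above.
broken = """; Created by cloud-init on instance boot automatically, do not edit.
;
### /etc/resolv.conf file autogenerated by netconfig!
#
"""
working = """### /etc/resolv.conf file autogenerated by netconfig!
#
search xkf00b0rtzgejkug4xc2pcinre.xx.internal.cloudapp.net
nameserver 168.63.129.16
"""

print(parse_resolv_conf(broken))
print(parse_resolv_conf(working))
```

The first file yields no entries at all, since it consists only of the cloud-init header and the netconfig comment block.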
Another issue is that the eth0 interface is not brought up automatically, even though the contents of the file "/etc/sysconfig/network/ifcfg-eth0" seem correct:
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=eth0
HWADDR=00:0d:3a:06:1e:04
NM_CONTROLLED=no
ONBOOT=yes
STARTMODE=auto
TYPE=Ethernet
USERCTL=no
I had to bring it up by executing "ifup eth0" from Azure.py code after
the network config is applied. This way I was able to ssh into the VM.
Here are also the contents of
"/etc/udev/rules.d/85-persistent-net-cloud-init.rules":
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0d:3a:6d:e4:53", NAME="eth0"
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1843634/+subscriptions