Message #80672
[Bug 1852504] [NEW] DHCP reserved ports that were unscheduled are advertised as DNS servers
Public bug reported:
We run 2 DHCP servers per network. After network outages, when hosts
come back online, the number of ACTIVE DHCP ports grows. This happened
again after more outages, and some networks now have 9-10 or more DHCP
ports, many in ACTIVE state, even though neutron.conf on neutron-server
only sets dhcp_agents_per_network = 2
It turns out these ports are reserved: their device_id is
"reserved_dhcp_port".
As you can see here:
https://github.com/openstack/neutron/blob/master/neutron/db/agentschedulers_db.py#L399
When a network is rescheduled to a new DHCP agent, the old port is not
deleted, nor is its status set to DOWN. It is only marked as reserved
and the port is updated.
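To make the behaviour described above concrete, here is a minimal toy model in Python. It is not neutron's code: the function name, the dictionary layout, and the original device_id value are assumptions made for illustration. Only the "reserved_dhcp_port" string comes from this report.

```python
RESERVED_DHCP_PORT = "reserved_dhcp_port"  # device_id value quoted in this report


def release_dhcp_port(port):
    """Toy model of unscheduling: only device_id changes, status is untouched."""
    updated = dict(port)
    updated["device_id"] = RESERVED_DHCP_PORT
    return updated


# Hypothetical port record; the original device_id is a placeholder.
port = {
    "ip_address": "10.128.144.16",
    "device_id": "dhcp-agent-placeholder",
    "status": "ACTIVE",
}
released = release_dhcp_port(port)
print(released["device_id"], released["status"])
```

In this model the released port still reports ACTIVE, which matches what the port listing in this report shows for 10.128.144.16.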
However, VMs on the network are now advertised every DHCP port on the
network as an internal DNS server, which leaves several stale entries
in /etc/resolv.conf in our case. The problem is that some of these DHCP
agents have been unscheduled, so those DNS servers don't actually
exist. In addition, the VMs do not query more than the first 3 entries.
Here is resolv.conf on a VM:
[root@arjunpmk-master ~]# vim /etc/resolv.conf
# Generated by NetworkManager
search mpt1.pf9.io
nameserver 10.128.144.16
nameserver 10.128.144.23
nameserver 10.128.144.15
# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
nameserver 10.128.144.7
nameserver 10.128.144.4
nameserver 10.128.144.8
nameserver 10.128.144.9
nameserver 10.128.144.17
nameserver 10.128.144.12
nameserver 10.128.144.45
nameserver 10.128.144.46
nameserver 10.128.144.51
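As a rough illustration of the 3-nameserver limit mentioned in the file's own comment, the sketch below splits the entries into the ones a glibc-style resolver would use and the ones it would ignore. The helper name and the constant name are made up for this example, and it assumes the limit is exactly 3.

```python
MAXNS = 3  # assumed resolver limit on nameserver entries

RESOLV_CONF = """\
# Generated by NetworkManager
search mpt1.pf9.io
nameserver 10.128.144.16
nameserver 10.128.144.23
nameserver 10.128.144.15
nameserver 10.128.144.7
nameserver 10.128.144.4
nameserver 10.128.144.8
nameserver 10.128.144.9
nameserver 10.128.144.17
nameserver 10.128.144.12
nameserver 10.128.144.45
nameserver 10.128.144.46
nameserver 10.128.144.51
"""


def split_nameservers(text, limit=MAXNS):
    """Return (used, ignored) nameserver addresses in file order."""
    servers = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers[:limit], servers[limit:]


used, ignored = split_nameservers(RESOLV_CONF)
print("used:", used)
print("ignored:", len(ignored), "entries")
```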
Here you can see all the DHCP ports for the network of this VM:
[root@df-us-mpt1-kvm arjun(admin)]# openstack port list --network ead88ed3-f1e0-4498-8c1e-6d091083ae33 --device-owner network:dhcp
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
| 02ff0f4c-f39d-4207-90b4-2a69585f4c8a | | fa:16:3e:a9:36:82 | ip_address='10.128.144.16', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| 0b612f86-ad06-4bce-a333-bc18f3e9e7b1 | | fa:16:3e:bb:d8:3d | ip_address='10.128.144.23', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN |
| 402338ac-2ca6-4312-a2df-a306fc589f10 | | fa:16:3e:a3:a8:57 | ip_address='10.128.144.15', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| 5d2edc73-4eff-44c0-8993-125636973384 | | fa:16:3e:6c:cd:2b | ip_address='10.128.144.7', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| 78241da3-9674-479a-8b45-a580c7f8b117 | | fa:16:3e:d0:9d:ef | ip_address='10.128.144.4', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| 7b41bf47-d4d4-434a-b704-4c67182ffcaa | | fa:16:3e:4c:cf:54 | ip_address='10.128.144.8', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| 96897190-1aa8-4c17-a7d1-c3744f1bf962 | | fa:16:3e:e8:55:29 | ip_address='10.128.144.45', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| af87dde6-fb46-4516-9569-e46496398b64 | | fa:16:3e:0e:61:14 | ip_address='10.128.144.9', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| c2a2112d-c6ef-4411-a415-1a453d74a838 | | fa:16:3e:d0:39:67 | ip_address='10.128.144.46', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN |
| c8298fbd-06e7-4488-a3e1-874e9341d4cf | | fa:16:3e:d6:3c:ac | ip_address='10.128.144.51', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN |
| d6f0206f-ae3c-4ebf-95cb-104dad786724 | | fa:16:3e:ab:ab:22 | ip_address='10.128.144.17', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | ACTIVE |
| e2be0f98-3333-4645-b58a-435e5513a4d3 | | fa:16:3e:b4:ba:c0 | ip_address='10.128.144.12', subnet_id='9757ae4a-ccfb-49b0-a9cc-53b8664631a6' | DOWN |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------+--------+
Take the first DNS server in the VM's resolv.conf (10.128.144.16): its status is ACTIVE, but it is actually a reserved port. The same is true of the 2nd nameserver entry. Luckily the 3rd entry is valid, but every DNS lookup still takes 10 seconds because the first two queries time out. VMs on other networks aren't so lucky: there, all 3 nameservers are reserved.
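The 10-second figure is consistent with a simple model in which the resolver tries the nameservers in order and waits a fixed timeout on each dead one. The sketch below assumes a 5-second per-server timeout (the usual glibc default; the report does not show any "options timeout" setting) and only models the first pass over the list. The function name is hypothetical.

```python
def delay_before_answer(reachable, timeout=5):
    """Seconds spent waiting on dead servers before the first live one answers.

    Returns None if no server in the list is reachable during this pass.
    """
    waited = 0
    for is_up in reachable:
        if is_up:
            return waited
        waited += timeout
    return None


# First two nameservers are reserved (dead), the third is valid.
print(delay_before_answer([False, False, True]))
```

With the first two entries dead and the third alive, this model gives 2 x 5 = 10 seconds per lookup.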
Expectation: Only DHCP ports that are actually scheduled (not reserved) should be advertised as DNS nameservers. I don't know whether this means marking the port as DOWN or deleting it when it is unscheduled.
Maybe the status also needs to be updated here?
https://github.com/openstack/neutron/blob/master/neutron/db/agentschedulers_db.py#L417
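The expectation above could be sketched as a filter on device_id. This is only an illustration of the desired outcome, not a proposed patch: the function name and the non-reserved device_id value are hypothetical, and the three ports are the ones whose state is described in this report.

```python
RESERVED_DHCP_PORT = "reserved_dhcp_port"  # device_id value quoted in this report


def advertised_dns_servers(ports):
    """Keep only the addresses of DHCP ports that are not reserved."""
    return [p["ip_address"] for p in ports if p["device_id"] != RESERVED_DHCP_PORT]


ports = [
    {"ip_address": "10.128.144.16", "device_id": RESERVED_DHCP_PORT, "status": "ACTIVE"},
    {"ip_address": "10.128.144.23", "device_id": RESERVED_DHCP_PORT, "status": "DOWN"},
    # Placeholder device_id for a port that is still scheduled to an agent.
    {"ip_address": "10.128.144.15", "device_id": "dhcp-agent-placeholder", "status": "ACTIVE"},
]
print(advertised_dns_servers(ports))
```

Filtering on device_id rather than status matters here, because a reserved port can still report ACTIVE.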
** Affects: neutron
Importance: Undecided
Status: New
** Tags: dns
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1852504