yahoo-eng-team team mailing list archive
Message #96376
[Bug 2122079] [NEW] DHCP agent incorrectly releasing leases due to missing client_id
Public bug reported:
Upon reload of the dhcp agent as port assignments are updated, the
existing leases for dnsmasq are examined against the valid host entries
to determine if any of the existing leases are invalid (no longer
defined) and should be expired.
As currently written, the neutron code performs its comparison using the
tuple (mac, ip, client_id). By default, clients set client_id (option 61)
in DHCPDISCOVER/DHCPREQUEST messages to their hardware type (0x01 for
Ethernet in this case) followed by their MAC address:
https://www.rfc-editor.org/rfc/rfc2132.html#section-9.14
By default, dnsmasq uses this client_id internally and also writes it to
the leases file.
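As a small illustration of the default described above, the following
Python sketch builds the client_id string from a MAC address. The
function name default_client_id is hypothetical and is not part of
neutron or dnsmasq; it only mirrors the "hardware type followed by MAC"
layout seen in the leases file.

```python
def default_client_id(mac: str, hw_type: int = 0x01) -> str:
    """Return the hardware type followed by the MAC, as colon-separated hex.

    Hypothetical helper for illustration only; hw_type 0x01 is Ethernet.
    """
    octets = [int(part, 16) for part in mac.split(":")]
    return ":".join(f"{byte:02x}" for byte in [hw_type, *octets])


print(default_client_id("fa:16:3e:e9:50:dd"))
```

For the first port in this report, the result matches the last field of
its entry in the leases file.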
When the neutron dhcp agent code compares the host and lease data, the
lease entry contains a client_id but the host entry does not (its
client-id is None), so the tuples do not match and neutron concludes
that the lease should be expired. Neutron then issues a command to NAK
the lease. When the client receives this NAK, it must start a new DORA
(Discover, Offer, Request, Acknowledge) exchange to reinitialize its
DHCP lease.
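The mismatch can be reproduced with a minimal Python sketch. This is not
the neutron implementation: parse_leases and parse_hosts are hypothetical
helpers, the host parser simply skips "set:" tags, and it assumes (as in
this report) that a host entry carries no client-id. The data is taken
from the example host/leases files in this report.

```python
LEASES = """\
1756993657 fa:16:3e:e9:50:dd 172.16.1.77 test 01:fa:16:3e:e9:50:dd
1756993640 fa:16:3e:d5:f8:1b 172.16.2.30 test2 01:fa:16:3e:d5:f8:1b
"""

HOSTS = """\
fa:16:3e:e9:50:dd,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test,172.16.1.77,set:port-5c9388b8-60f2-4e93-8a7a-954acf662bc5
fa:16:3e:d5:f8:1b,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test2,172.16.2.30,set:port-c23dcf52-1404-4a5a-b7ee-7cd9f7530837
"""


def parse_leases(text):
    """Return a set of (mac, ip, client_id) tuples from leases-file text."""
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        _expiry, mac, ip, _hostname, client_id = fields[:5]
        entries.add((mac, ip, client_id))
    return entries


def parse_hosts(text):
    """Return a set of (mac, ip, None) tuples from host-file text.

    Assumption: entries look like the ones in this report, so after
    dropping "set:" tags exactly mac, name and ip remain, with no client-id.
    """
    entries = set()
    for line in text.splitlines():
        fields = [f for f in line.split(",") if f and not f.startswith("set:")]
        if len(fields) != 3:
            continue
        mac, _name, ip = fields
        entries.add((mac, ip, None))
    return entries


# Leases with no exactly matching host tuple are treated as stale.
stale = parse_leases(LEASES) - parse_hosts(HOSTS)
for entry in sorted(stale):
    print(entry)
```

Because the third element differs (a client_id string versus None), both
leases end up in the stale set even though mac and ip match a valid host
entry.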
For an OS such as Windows, the impact is severe: upon receipt of the
NAK, the Windows DHCP client concludes it is no longer authorized to use
this IP and immediately releases it before performing a new DORA
exchange. All connectivity to/from the VM is lost, as all active
connections are dropped while the DORA exchange is ongoing and the VM
reacquires a lease for the same IP it had previously. At that point, all
connections must be reestablished.
Example host/leases files:
cat /var/lib/neutron/dhcp/bae80a38-1f4c-4b51-ab4b-1a0df7f79933/leases
1756993657 fa:16:3e:e9:50:dd 172.16.1.77 test 01:fa:16:3e:e9:50:dd
1756993640 fa:16:3e:d5:f8:1b 172.16.2.30 test2 01:fa:16:3e:d5:f8:1b
cat /var/lib/neutron/dhcp/bae80a38-1f4c-4b51-ab4b-1a0df7f79933/host
fa:16:3e:e9:50:dd,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test,172.16.1.77,set:port-5c9388b8-60f2-4e93-8a7a-954acf662bc5
fa:16:3e:d5:f8:1b,set:66e5668b6d354f38bd80ed7c2a2fb9fe,test2,172.16.2.30,set:port-c23dcf52-1404-4a5a-b7ee-7cd9f7530837
I am conflicted on the best/proper way to resolve this.
During local testing, passing the '--dhcp-ignore-clid' option to dnsmasq
makes it ignore the client-provided client-id and write a '*' to the
leases file for client-id instead. With that option set, the existing
neutron code works as expected when parsing the host/leases files:
existing leases are retained when the agent reloads upon port updates.
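A short Python sketch of why the '*' placeholder avoids the mismatch. It
assumes, consistent with the behavior observed in testing, that a '*' in
the client-id column is read as "no client-id" (None). The helper
lease_tuple is hypothetical and is not the neutron code.

```python
def lease_tuple(line):
    """Parse one leases-file line into (mac, ip, client_id).

    Assumption: '*' in the client-id column means no client-id (None).
    """
    _expiry, mac, ip, _hostname, client_id = line.split()[:5]
    return (mac, ip, None if client_id == "*" else client_id)


# Host entries from this report; none of them specifies a client-id.
hosts = {
    ("fa:16:3e:e9:50:dd", "172.16.1.77", None),
    ("fa:16:3e:d5:f8:1b", "172.16.2.30", None),
}

# Leases as written by default (client-provided client-id recorded).
default_leases = [
    "1756993657 fa:16:3e:e9:50:dd 172.16.1.77 test 01:fa:16:3e:e9:50:dd",
    "1756993640 fa:16:3e:d5:f8:1b 172.16.2.30 test2 01:fa:16:3e:d5:f8:1b",
]

# The same leases with '*' in place of the client-id.
ignore_clid_leases = [
    "1756993657 fa:16:3e:e9:50:dd 172.16.1.77 test *",
    "1756993640 fa:16:3e:d5:f8:1b 172.16.2.30 test2 *",
]

stale_default = {lease_tuple(l) for l in default_leases} - hosts
stale_ignored = {lease_tuple(l) for l in ignore_clid_leases} - hosts

print(len(stale_default), len(stale_ignored))
```

With the client-id recorded, both leases fall outside the host set; with
'*', every lease tuple matches a host tuple and nothing is marked stale.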
Given that the current neutron code explicitly checks client-id, it
seems undesirable to bypass this check in this manner.
However, since an individual client can send any value it likes for
client-id, the current neutron code does not seem to behave as expected
in that case. It seems that the code handles the comparison of
host/leases file entries properly only when extra_dhcp_opts are
specified on the port and the client uses the expected client-id.
** Affects: neutron
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/2122079
Title:
DHCP agent incorrectly releasing leases due to missing client_id
Status in neutron:
New