yahoo-eng-team team mailing list archive
Message #84680
[Bug 1906406] [NEW] [segments] dnsmasq can't delete lease for instance due to mismatch between client ip and local addr
Public bug reported:
Issue:
The Neutron DHCP agent bootstraps the DHCP leases file for a network
using all associated subnets[1]. In a multi-segment environment,
however, a DHCP agent can only service a single segment/subnet of a
given network.
The DHCP namespace is therefore configured with an interface that holds a
single IP address from the segment/subnet it services. When a VM on the
same network but on a different segment/subnet is deleted, the DHCP
release packet that dhcp_release would issue isn't sent because of a
mismatch between the client IP and the local address.
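To make the mismatch concrete, here is a simplified Python model of the condition. It is an assumption about the check, not dhcp_release.c's actual logic: a release for a client IP is treated as sendable only when some address on the namespace interface shares the client's subnet. The addresses are the ones on compute02's ns-0c51acd3-60 from the ip addr output later in this report.

```python
import ipaddress


def can_release(client_ip, local_addrs):
    """Simplified model: True if any local address shares a subnet with client_ip."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_interface(addr).network for addr in local_addrs)


# Addresses on compute02's ns-0c51acd3-60 (from the ip addr output in this report)
compute02_addrs = ["10.206.0.2/24", "169.254.169.254/16"]
print(can_release("10.206.0.53", compute02_addrs))  # True: same /24
print(can_release("10.106.0.98", compute02_addrs))  # False: no local 10.106.0.0/24 address
```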
Brian Haley recently patched dhcp_release.c to fix a similar issue:
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=d9f882bea2806799bf3d1f73937f5e72d0bfc650;hp=fef2f1c75eba56b7355cbe729e4362474d558aa4;ds=sidebyside
We can probably update dnsmasq-utils in the short term, but would making
the DHCP agent segment-aware be a better long-term solution?
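A rough sketch of what "segment-aware" could mean when bootstrapping the leases file: keep only the subnets bound to the segment the agent serves. The Subnet class, the subnets_for_agent() helper and the segment IDs below are hypothetical illustrations, not Neutron's API (real segment IDs are UUIDs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    cidr: str
    segment_id: str  # hypothetical IDs below; real Neutron segment IDs are UUIDs


def subnets_for_agent(subnets, agent_segment_id):
    """Hypothetical helper: keep only subnets on the segment this agent serves."""
    return [s for s in subnets if s.segment_id == agent_segment_id]


# The two subnets of rpn_multisegment from this report
subnets = [Subnet("10.106.0.0/24", "seg-vlan106"), Subnet("10.206.0.0/24", "seg-vlan206")]

# An agent on compute02 (serving VLAN 206) would bootstrap leases only for:
print([s.cidr for s in subnets_for_agent(subnets, "seg-vlan206")])  # ['10.206.0.0/24']
```

Filtering at bootstrap time would keep foreign-segment entries out of the leases file in the first place, so dhcp_release would never be asked to remove them.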
Here are the steps to reproduce:
-=-=-=-=-
Network: rpn_multisegment
Segment 1:
VLAN 106 10.106.0.0/24
Provider Mapping: physnet1:bond1
Segment 2:
VLAN 206 10.206.0.0/24
Provider Mapping: physnet2:bond1
Two VMs:
🌕OpenStack Lab % openstack server list
+--------------------------------------+---------+--------+------------------------------+-------------------+----------------+
| ID                                   | Name    | Status | Networks                     | Image             | Flavor         |
+--------------------------------------+---------+--------+------------------------------+-------------------+----------------+
| 40f94b68-7e38-45b6-855d-792399c2a9ff | vm-seg2 | ACTIVE | rpn_multisegment=10.206.0.53 | bionic-osa-master | osa-dev-8-8-60 |
| 34f8ff53-e505-4267-a13a-b881dfcec240 | vm-seg1 | ACTIVE | rpn_multisegment=10.106.0.98 | bionic-osa-master | osa-dev-8-8-60 |
+--------------------------------------+---------+--------+------------------------------+-------------------+----------------+
On compute01, the host file is populated with entries for every subnet
associated with the network:
root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
fa:16:3e:07:f7:af,host-10-206-0-2.openstacklocal,10.206.0.2
fa:16:3e:2c:da:6d,host-10-106-0-2.openstacklocal,10.106.0.2
fa:16:3e:46:7b:d1,host-10-106-0-98.openstacklocal,10.106.0.98
fa:16:3e:ce:b1:b5,host-10-206-0-53.openstacklocal,10.206.0.53
Same on compute02:
root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
fa:16:3e:07:f7:af,host-10-206-0-2.openstacklocal,10.206.0.2
fa:16:3e:2c:da:6d,host-10-106-0-2.openstacklocal,10.106.0.2
fa:16:3e:46:7b:d1,host-10-106-0-98.openstacklocal,10.106.0.98
fa:16:3e:ce:b1:b5,host-10-206-0-53.openstacklocal,10.206.0.53
The leases file, however, contains only the hosts that have actually
obtained a lease (as expected):
root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606916842 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916738 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606916738 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606916917 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916626 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
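The lease lines above use five whitespace-separated fields (expiry epoch, MAC, IP, hostname, client ID or `*`). A minimal parser, assuming exactly that layout, confirms that only a subset of compute02's host-file entries actually hold leases:

```python
from typing import NamedTuple


class Lease(NamedTuple):
    expiry: int     # lease expiry, seconds since the epoch
    mac: str
    ip: str
    hostname: str
    client_id: str  # '*' when no client identifier was recorded


def parse_leases(text: str) -> list[Lease]:
    """Parse five-field IPv4 lease lines; skip anything else (e.g. blank lines)."""
    leases = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        expiry, mac, ip, hostname, client_id = fields
        leases.append(Lease(int(expiry), mac, ip, hostname, client_id))
    return leases


# compute02's leases file before the agent restart (copied from this report)
compute02_leases = """\
1606916917 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916626 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
"""

# IPs listed in compute02's host file (copied from this report)
host_ips = {"10.206.0.2", "10.106.0.2", "10.106.0.98", "10.206.0.53"}
leased_ips = {lease.ip for lease in parse_leases(compute02_leases)}
print(sorted(host_ips - leased_ips))  # ['10.106.0.2', '10.106.0.98']
```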
Everything looks OK so far.
When neutron-dhcp-agent is restarted, however, the leases file is
bootstrapped again and contains entries for all subnets associated with
the network:
root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917246 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
This configuration becomes a problem when a VM is deleted and
dhcp_release is executed: the namespace on each host has only an IP from
its own segment, so it cannot delete a lease for what is essentially a
non-connected subnet:
root@lab-compute01:~# ip netns exec qdhcp-0e4fa560-1483-4ac5-be44-0542503f1e5a ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-5ccc6426-59@if102: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:2c:da:6d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-5ccc6426-59
       valid_lft forever preferred_lft forever
    inet 10.106.0.2/24 brd 10.106.0.255 scope global ns-5ccc6426-59
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe2c:da6d/64 scope link
       valid_lft forever preferred_lft forever
root@lab-compute02:~# ip netns exec qdhcp-0e4fa560-1483-4ac5-be44-0542503f1e5a ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-0c51acd3-60@if85: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:07:f7:af brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.206.0.2/24 brd 10.206.0.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe07:f7af/64 scope link
       valid_lft forever preferred_lft forever
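Applying a subnet-match check to the bootstrapped leases file shows which entries each namespace has no local address for. This is a simplified model, assuming a release needs a local address on the client's subnet; the lease IPs and interface addresses are copied from the output above.

```python
import ipaddress


def unreleasable(lease_ips, local_addrs):
    """Simplified model: lease IPs that share no subnet with any local address."""
    networks = [ipaddress.ip_interface(a).network for a in local_addrs]
    return [ip for ip in lease_ips
            if not any(ipaddress.ip_address(ip) in net for net in networks)]


# Bootstrapped leases (identical on both hosts after the agent restart)
leases = ["10.106.0.98", "10.106.0.2", "10.206.0.53", "10.206.0.2"]
# IPv4 addresses on each qdhcp namespace interface, from `ip addr`
compute01 = ["169.254.169.254/16", "10.106.0.2/24"]
compute02 = ["10.206.0.2/24", "169.254.169.254/16"]
print(unreleasable(leases, compute01))  # ['10.206.0.53', '10.206.0.2']
print(unreleasable(leases, compute02))  # ['10.106.0.98', '10.106.0.2']
```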
Example:
🌕OpenStack Lab % openstack server delete vm-seg1
lab-compute01:
Dec 01 13:58:12 lab-compute01 dnsmasq-dhcp[56028]: DHCPRELEASE(ns-5ccc6426-59) 10.106.0.98 fa:16:3e:46:7b:d1
Dec 01 13:58:13 lab-compute01 dnsmasq[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/addn_hosts - 3 addresses
Dec 01 13:58:13 lab-compute01 dnsmasq-dhcp[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
Dec 01 13:58:13 lab-compute01 dnsmasq-dhcp[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/opts
root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
lab-compute02:
Dec 01 13:58:13 lab-compute02 neutron-dhcp-agent[48564]: 2020-12-01 13:58:13.946 48564 WARNING neutron.agent.linux.dhcp [-] Could not release DHCP leases for these IP addresses after 3 tries: 10.106.0.98
Dec 01 13:58:14 lab-compute02 dnsmasq[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/addn_hosts - 3 addresses
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/opts
root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
As you can see, the lease for 10.106.0.98 was removed on compute01 but
not on compute02, because that segment/subnet is not configured on
ns-0c51acd3-60 in compute02's DHCP namespace, as it would be in an
ordinary provider network.
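The WARNING on compute02 suggests the agent releases, re-reads the leases file and retries a fixed number of times. The sketch below is a hypothetical model of that loop, not code from neutron.agent.linux.dhcp; release_with_verify() and fake_release() are illustrative names, and fake_release() simply encodes the subnet mismatch described above.

```python
import ipaddress


def release_with_verify(ips, release, read_leased_ips, max_tries=3):
    """Hypothetical release-then-verify loop; returns the IPs that stayed leased."""
    pending = set(ips)
    for _ in range(max_tries):
        for ip in pending:
            release(ip)
        pending &= read_leased_ips()  # keep only IPs still in the leases file
        if not pending:
            break
    return pending


# Simulated compute02 state: bootstrapped leases, one local subnet.
local_net = ipaddress.ip_network("10.206.0.0/24")
leases = {"10.106.0.98", "10.106.0.2", "10.206.0.53", "10.206.0.2"}


def fake_release(ip):
    # Models the mismatch: only releases on the local subnet take effect.
    if ipaddress.ip_address(ip) in local_net:
        leases.discard(ip)


stuck = release_with_verify({"10.106.0.98"}, fake_release, lambda: set(leases))
if stuck:
    print("Could not release DHCP leases for these IP addresses after 3 tries: "
          + ", ".join(sorted(stuck)))
```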
[1]
https://github.com/openstack/neutron/blob/5529b2f5cc6b451c771bc5134018e9dbd2cb6598/neutron/agent/linux/dhcp.py#L758
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1906406
Title:
[segments] dnsmasq can't delete lease for instance due to mismatch
between client ip and local addr
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1906406/+subscriptions