
yahoo-eng-team team mailing list archive

[Bug 1906406] [NEW] [segments] dnsmasq can't delete lease for instance due to mismatch between client ip and local addr

 

Public bug reported:

Issue:

The Neutron DHCP agent bootstraps the DHCP leases file for a network
using all associated subnets[1]. In a multi-segment environment,
however, a DHCP agent can only service a single segment/subnet of a
given network.

The DHCP namespace is therefore configured with an interface holding a
single IP address from the segment/subnet that the agent serves. When
a VM on the same network but in a different segment/subnet is deleted,
the DHCP release packet that dhcp_release would issue is never sent,
because the client IP does not match any local address on that
interface.
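A simplified Python model of why the release is skipped may help. It is an illustration under an assumption, not the dhcp_release.c source: it assumes the release is only sent when the client IP lies inside the subnet of one of the addresses configured on the namespace interface. The function name `can_release` is hypothetical, and the addresses come from the compute02 namespace output later in this report.

```python
import ipaddress
from typing import Iterable


def can_release(client_ip: str, interface_addrs: Iterable[str]) -> bool:
    """Hypothetical model of the local-address check described above.

    Returns True only if client_ip falls inside the subnet of at least one
    address (in CIDR form) configured on the DHCP namespace interface.
    """
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_interface(addr).network for addr in interface_addrs)


# Addresses on compute02's DHCP port: segment 2 plus the metadata address
compute02_addrs = ["10.206.0.2/24", "169.254.169.254/16"]

print(can_release("10.206.0.53", compute02_addrs))  # True: same segment
print(can_release("10.106.0.98", compute02_addrs))  # False: segment 1 address
```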

Brian Haley recently patched dhcp_release.c to fix a similar issue:

http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=d9f882bea2806799bf3d1f73937f5e72d0bfc650;hp=fef2f1c75eba56b7355cbe729e4362474d558aa4;ds=sidebyside

We can probably update dnsmasq-utils in the short term, but would
making the DHCP agent segment-aware be a better long-term solution?

Here are the steps to reproduce:

-=-=-=-=-

Network: rpn_multisegment

Segment 1:
VLAN 106 10.106.0.0/24
Provider Mapping: physnet1:bond1

Segment 2:
VLAN 206 10.206.0.0/24
Provider Mapping: physnet2:bond1

Two VMs:

🌕OpenStack Lab % openstack server list
+--------------------------------------+---------------------+---------+-----------------------------------------------+------------------------------+--------------------+
| ID                                   | Name                | Status  | Networks                                      | Image                        | Flavor             |
+--------------------------------------+---------------------+---------+-----------------------------------------------+------------------------------+--------------------+
| 40f94b68-7e38-45b6-855d-792399c2a9ff | vm-seg2             | ACTIVE  | rpn_multisegment=10.206.0.53                  | bionic-osa-master            | osa-dev-8-8-60     |
| 34f8ff53-e505-4267-a13a-b881dfcec240 | vm-seg1             | ACTIVE  | rpn_multisegment=10.106.0.98                  | bionic-osa-master            | osa-dev-8-8-60     |
+--------------------------------------+---------------------+---------+-----------------------------------------------+------------------------------+--------------------+

On compute01, the host file is populated with entries for every
subnet associated with the network:

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
fa:16:3e:07:f7:af,host-10-206-0-2.openstacklocal,10.206.0.2
fa:16:3e:2c:da:6d,host-10-106-0-2.openstacklocal,10.106.0.2
fa:16:3e:46:7b:d1,host-10-106-0-98.openstacklocal,10.106.0.98
fa:16:3e:ce:b1:b5,host-10-206-0-53.openstacklocal,10.206.0.53

Same on compute02:


root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
fa:16:3e:07:f7:af,host-10-206-0-2.openstacklocal,10.206.0.2
fa:16:3e:2c:da:6d,host-10-106-0-2.openstacklocal,10.106.0.2
fa:16:3e:46:7b:d1,host-10-106-0-98.openstacklocal,10.106.0.98
fa:16:3e:ce:b1:b5,host-10-206-0-53.openstacklocal,10.206.0.53
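Each host file line above has the form MAC,FQDN,IP. A minimal Python parser for lines of exactly that shape is sketched below; it assumes three comma-separated fields, as in this output, and rejects anything else rather than guessing at other layouts. The /24 prefix used to group addresses is taken from the segment definitions above.

```python
import ipaddress
from typing import List, NamedTuple


class HostEntry(NamedTuple):
    mac: str
    fqdn: str
    ip: str


def parse_host_file(text: str) -> List[HostEntry]:
    """Parse 'MAC,FQDN,IP' lines like those in the host file above."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(",")
        if len(fields) != 3:
            raise ValueError(f"unexpected host file line: {line!r}")
        entries.append(HostEntry(*fields))
    return entries


HOST_FILE = """\
fa:16:3e:07:f7:af,host-10-206-0-2.openstacklocal,10.206.0.2
fa:16:3e:2c:da:6d,host-10-106-0-2.openstacklocal,10.106.0.2
fa:16:3e:46:7b:d1,host-10-106-0-98.openstacklocal,10.106.0.98
fa:16:3e:ce:b1:b5,host-10-206-0-53.openstacklocal,10.206.0.53
"""

entries = parse_host_file(HOST_FILE)
# Group addresses by /24, matching the two segment subnets
subnets = sorted({str(ipaddress.ip_network(e.ip + "/24", strict=False)) for e in entries})
print(subnets)  # ['10.106.0.0/24', '10.206.0.0/24']
```

Both hosts yield the same result, which is the point of the observation: every agent receives entries for every segment's subnet.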

The leases file, however, contains only the hosts that have actually
obtained a lease, as expected:

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606916842 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916738 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606916738 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606916917 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916626 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
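The leases file uses dnsmasq's whitespace-separated lease format: expiry time (Unix timestamp), MAC, IP, hostname and client ID, with `*` marking an unknown value. The Python sketch below parses IPv4 lines like the ones shown here; it does not distinguish IPv6 lease lines or the `duid` line dnsmasq can also write, and the `Lease` type is only an illustration.

```python
from typing import List, NamedTuple, Optional


class Lease(NamedTuple):
    expiry: int               # lease expiry, Unix timestamp
    mac: str
    ip: str
    hostname: Optional[str]   # None when the file holds "*"
    client_id: Optional[str]  # None when the file holds "*"


def _opt(value: str) -> Optional[str]:
    # dnsmasq writes "*" for an unknown hostname or client ID
    return None if value == "*" else value


def parse_leases(text: str) -> List[Lease]:
    leases = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        expiry, mac, ip, hostname, client_id = fields
        leases.append(Lease(int(expiry), mac, ip, _opt(hostname), _opt(client_id)))
    return leases


COMPUTE02_LEASES = """\
1606916917 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 ff:b5:5e:67:ff:00:02:00:00:ab:11:9e:a5:86:fd:ae:2f:49:ad
1606916626 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
"""

for lease in parse_leases(COMPUTE02_LEASES):
    print(lease.ip, lease.hostname, lease.client_id is not None)
# 10.206.0.53 host-10-206-0-53 True
# 10.206.0.2 host-10-206-0-2 False
```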

Everything looks OK so far.

When the neutron-dhcp-agent is restarted, however, the leases file is
bootstrapped with entries for all subnets associated with the network:

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917246 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
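One way to read the "segment-aware" suggestion is to filter the bootstrapped entries down to the subnets local to the agent's segment. The Python sketch below is hypothetical: it is not Neutron's code, `bootstrap_leases` and its parameters are invented for illustration, and it only reproduces the line format seen above (expiry, MAC, IP, hostname, `*`). Passing `local_cidrs=None` mimics the observed behavior of writing every subnet.

```python
import ipaddress
from typing import Iterable, List, Optional, Sequence, Tuple

# (mac, ip, hostname) for the ports in the bootstrapped files above
PORTS = [
    ("fa:16:3e:46:7b:d1", "10.106.0.98", "host-10-106-0-98"),
    ("fa:16:3e:2c:da:6d", "10.106.0.2", "host-10-106-0-2"),
    ("fa:16:3e:ce:b1:b5", "10.206.0.53", "host-10-206-0-53"),
    ("fa:16:3e:07:f7:af", "10.206.0.2", "host-10-206-0-2"),
]


def bootstrap_leases(
    ports: Iterable[Tuple[str, str, str]],
    local_cidrs: Optional[Sequence[str]],
    expiry: int,
) -> List[str]:
    """Hypothetical lease-file bootstrap; None means 'all subnets' (observed behavior)."""
    nets = None if local_cidrs is None else [ipaddress.ip_network(c) for c in local_cidrs]
    lines = []
    for mac, ip, hostname in ports:
        # In the filtered case, skip addresses outside this agent's segment
        if nets is not None and not any(ipaddress.ip_address(ip) in n for n in nets):
            continue
        lines.append(f"{expiry} {mac} {ip} {hostname} *")
    return lines


print(len(bootstrap_leases(PORTS, None, 1606917254)))  # 4, as observed on both hosts
for line in bootstrap_leases(PORTS, ["10.206.0.0/24"], 1606917254):
    print(line)  # only the two segment 2 entries
```

With filtering, compute02 would never hold a lease for 10.106.0.98, so there would be nothing for dhcp_release to fail on.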

This becomes a problem when a VM is deleted and dhcp_release is
executed: the namespace on each host only has an IP address from its
own segment, so it cannot delete a lease for what is effectively a
non-connected subnet:

root@lab-compute01:~# ip netns exec qdhcp-0e4fa560-1483-4ac5-be44-0542503f1e5a ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-5ccc6426-59@if102: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:2c:da:6d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-5ccc6426-59
       valid_lft forever preferred_lft forever
    inet 10.106.0.2/24 brd 10.106.0.255 scope global ns-5ccc6426-59
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe2c:da6d/64 scope link
       valid_lft forever preferred_lft forever

root@lab-compute02:~# ip netns exec qdhcp-0e4fa560-1483-4ac5-be44-0542503f1e5a ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ns-0c51acd3-60@if85: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:07:f7:af brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.206.0.2/24 brd 10.206.0.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe07:f7af/64 scope link
       valid_lft forever preferred_lft forever
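To see which subnets a DHCP namespace actually covers, the IPv4 addresses can be pulled out of `ip addr` output like the above. A minimal Python sketch, assuming the multi-line text layout shown here (one `inet ADDR/PREFIX` per line); `local_networks` is a hypothetical helper name. If the installed iproute2 supports it, `ip -j addr` (JSON) is easier to parse.

```python
import ipaddress
import re
from typing import List

# Matches IPv4 "inet A.B.C.D/NN" lines; "inet6" lines are not matched
INET_RE = re.compile(r"^\s*inet\s+(\S+)", re.MULTILINE)


def local_networks(ip_addr_output: str) -> List[ipaddress.IPv4Network]:
    """Return the IPv4 networks configured in `ip addr` output."""
    return [ipaddress.ip_interface(a).network for a in INET_RE.findall(ip_addr_output)]


COMPUTE02_NS = """\
2: ns-0c51acd3-60@if85: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:07:f7:af brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.206.0.2/24 brd 10.206.0.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/16 brd 169.254.255.255 scope global ns-0c51acd3-60
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe07:f7af/64 scope link
       valid_lft forever preferred_lft forever
"""

nets = local_networks(COMPUTE02_NS)
print([str(n) for n in nets])  # ['10.206.0.0/24', '169.254.0.0/16']
vm_seg1 = ipaddress.ip_address("10.106.0.98")
print(any(vm_seg1 in n for n in nets))  # False: no segment 1 subnet here
```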

Example:

🌕OpenStack Lab % openstack server delete vm-seg1

lab-compute01:

Dec 01 13:58:12 lab-compute01 dnsmasq-dhcp[56028]: DHCPRELEASE(ns-5ccc6426-59) 10.106.0.98 fa:16:3e:46:7b:d1
Dec 01 13:58:13 lab-compute01 dnsmasq[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/addn_hosts - 3 addresses
Dec 01 13:58:13 lab-compute01 dnsmasq-dhcp[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
Dec 01 13:58:13 lab-compute01 dnsmasq-dhcp[56028]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/opts

root@lab-compute01:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917246 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917246 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917246 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

lab-compute02:

Dec 01 13:58:13 lab-compute02 neutron-dhcp-agent[48564]: 2020-12-01 13:58:13.946 48564 WARNING neutron.agent.linux.dhcp [-] Could not release DHCP leases for these IP addresses after 3 tries: 10.106.0.98
Dec 01 13:58:14 lab-compute02 dnsmasq[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/addn_hosts - 3 addresses
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/host
Dec 01 13:58:14 lab-compute02 dnsmasq-dhcp[589]: read /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/opts

root@lab-compute02:~# cat /var/lib/neutron/dhcp/0e4fa560-1483-4ac5-be44-0542503f1e5a/leases
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *

As you can see, the lease for 10.106.0.98 was not deleted on
compute02, because no address from that segment/subnet is configured on
ns-0c51acd3-60 in the DHCP namespace, as there would be in an ordinary
provider network.
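A quick way to spot this condition on a host is to list the leases whose IP falls outside every subnet configured in the local DHCP namespace, since those are entries dhcp_release cannot clean up there. The Python sketch below is a hypothetical diagnostic, not an existing Neutron tool; it takes the leases file text and the local CIDRs (as read from `ip addr`) as plain inputs.

```python
import ipaddress
from typing import List, Sequence


def unreleasable_ips(leases_text: str, local_cidrs: Sequence[str]) -> List[str]:
    """Return lease IPs not covered by any locally configured subnet."""
    nets = [ipaddress.ip_network(c) for c in local_cidrs]
    result = []
    for line in leases_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ip = ipaddress.ip_address(fields[2])  # third field is the leased IP
        if not any(ip in n for n in nets):
            result.append(fields[2])
    return result


# compute02 leases after "openstack server delete vm-seg1"
COMPUTE02_LEASES = """\
1606917254 fa:16:3e:46:7b:d1 10.106.0.98 host-10-106-0-98 *
1606917254 fa:16:3e:2c:da:6d 10.106.0.2 host-10-106-0-2 *
1606917254 fa:16:3e:ce:b1:b5 10.206.0.53 host-10-206-0-53 *
1606917254 fa:16:3e:07:f7:af 10.206.0.2 host-10-206-0-2 *
"""

print(unreleasable_ips(COMPUTE02_LEASES, ["10.206.0.0/24", "169.254.0.0/16"]))
# ['10.106.0.98', '10.106.0.2']
```

The list also includes 10.106.0.2, compute01's DHCP port address, which compute02 would be equally unable to release.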

[1]
https://github.com/openstack/neutron/blob/5529b2f5cc6b451c771bc5134018e9dbd2cb6598/neutron/agent/linux/dhcp.py#L758

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1906406

