yahoo-eng-team team mailing list archive

[Bug 1456321] [NEW] nova-network DHCP server not correct when creating multiple networks

 

Public bug reported:

This bug pertains to stable Kilo, package versions below:
ii  nova-api                             1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - API frontend
ii  nova-cert                            1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - certificate management
ii  nova-common                          1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - common files
ii  nova-compute                         1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - compute node base
ii  nova-compute-kvm                     1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - compute node (KVM)
ii  nova-compute-libvirt                 1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - compute node libvirt support
ii  nova-conductor                       1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - conductor service
ii  nova-consoleauth                     1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - Console Authenticator
ii  nova-network                         1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - Network manager
ii  nova-novncproxy                      1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - NoVNC proxy
ii  nova-scheduler                       1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute - virtual machine scheduler
ii  python-nova                          1:2015.1.0-0ubuntu1~cloud0            all          OpenStack Compute Python libraries
ii  python-nova-adminclient              0.1.8-0ubuntu2                        amd64        client for administering Openstack Nova
ii  python-novaclient                    1:2.22.0-0ubuntu1~cloud0              all          client library for OpenStack Compute API

This bug was originally reported at
https://github.com/bloomberg/chef-bcpc/issues/573. I triaged it there and
am distilling it down here.

In essence, we have a command in a Chef recipe that builds out a number
of fixed IP networks using nova-manage during initial setup of an
OpenStack cluster. The command looks like this:

nova-manage network create --label fixed --fixed_range_v4=1.104.0.0/16 \
    --num_networks=16 --multi_host=T --network_size=128 --vlan_start=1000 \
    --bridge_interface=p2p2
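
For reference, here is a rough sketch of the address blocks that command asks for. It uses the standard-library ipaddress module rather than Nova's own code, and carve_subnets is a hypothetical helper that assumes network_size is a power of two.

```python
import ipaddress
from itertools import islice

def carve_subnets(fixed_range, num_networks, network_size):
    """Return the first num_networks blocks of network_size addresses.

    Hypothetical helper, not Nova code; assumes network_size is a
    power of two so it maps onto a single prefix length.
    """
    base = ipaddress.ip_network(fixed_range)
    # 128 addresses need 7 host bits, which gives a /25 prefix.
    prefix = base.max_prefixlen - (network_size.bit_length() - 1)
    return list(islice(base.subnets(new_prefix=prefix), num_networks))

subnets = carve_subnets("1.104.0.0/16", 16, 128)
for subnet in subnets:
    print(subnet)
```

With the values from the command above this yields sixteen /25 blocks, starting at 1.104.0.0/25 and 1.104.0.128/25.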

As per the original bug report on GitHub, every subnet after the first
was being created with its DHCP server address set to the gateway of the
first subnet. I dug in and found the problem code in
nova/network/manager.py: if dhcp_server is not provided, the first
iteration of the loop over the subnets sets it to that subnet's gateway
IP. Since dhcp_server is scoped at method level, the value persists for
the rest of the loop, so every subnet created gets the same DHCP server
IP. A little print debugging confirmed that this was indeed the case.
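
The pattern can be reproduced outside Nova with a small stand-alone sketch. This is not the code from manager.py: buggy_dhcp_servers is a hypothetical function, and it assumes the gateway is the first address after the network address.

```python
import ipaddress

def buggy_dhcp_servers(subnets, dhcp_server=None):
    """Mimic the faulty pattern: dhcp_server outlives the first iteration."""
    assigned = []
    for subnet in subnets:
        # Assumption: the gateway is the first address after the network address.
        gateway = subnet.network_address + 1
        if not dhcp_server:
            # Only true on the first pass; later subnets inherit this value.
            dhcp_server = gateway
        assigned.append((subnet, gateway, dhcp_server))
    return assigned

subnets = list(ipaddress.ip_network("1.104.0.0/23").subnets(new_prefix=25))
buggy = buggy_dhcp_servers(subnets)
for subnet, gateway, dhcp in buggy:
    print(subnet, "gateway", gateway, "dhcp_server", dhcp)
```

Each subnet gets its own gateway, but the DHCP server column stays at the first subnet's gateway for all four.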

This breaks things pretty badly for us. Launching an instance in any
network other than the first one fails, because Nova tries to start
dnsmasq and bind it to an IP that is already bound by another dnsmasq
instance.

I patched manager.py as follows, so that the DHCP server IP is held in a
variable that is reassigned on every iteration of the loop. A DHCP server
specified manually still overrides the one inferred from the gateway
address. Tests pass after this change (sorry that it is not in Gerrit
already, but I haven't been able to sit down and get git review working
yet):

diff --git a/nova/network/manager.py b/nova/network/manager.py
index 3e8e8b1..832fd1b 100644
--- a/nova/network/manager.py
+++ b/nova/network/manager.py
@@ -1351,13 +1351,15 @@ class NetworkManager(manager.Manager):
                 else:
                     net.gateway = current
                     current += 1
-                if not dhcp_server:
-                    dhcp_server = net.gateway
+                if dhcp_server:
+                    subnet_dhcp_server = dhcp_server
+                else:
+                    subnet_dhcp_server = net.gateway
                 net.dhcp_start = current
                 current += 1
-                if str(net.dhcp_start) == dhcp_server:
+                if str(net.dhcp_start) == subnet_dhcp_server:
                     net.dhcp_start = current
-                net.dhcp_server = dhcp_server
+                net.dhcp_server = subnet_dhcp_server
                 extra_reserved.append(str(net.dhcp_server))
                 extra_reserved.append(str(net.gateway))
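
The intent of the patch can be illustrated with a simplified stand-alone sketch. It is not the Nova code itself: fixed_dhcp_servers is a hypothetical function, the gateway is assumed to be the first address after the network address, and the override address 1.104.0.5 is made up for the example.

```python
import ipaddress

def fixed_dhcp_servers(subnets, dhcp_server=None):
    """Pick the DHCP server per subnet without touching the argument."""
    assigned = []
    for subnet in subnets:
        # Assumption: the gateway is the first address after the network address.
        gateway = subnet.network_address + 1
        # A manually supplied server wins; otherwise use this subnet's gateway.
        subnet_dhcp_server = dhcp_server if dhcp_server else gateway
        assigned.append((subnet, gateway, subnet_dhcp_server))
    return assigned

subnets = list(ipaddress.ip_network("1.104.0.0/23").subnets(new_prefix=25))

inferred = fixed_dhcp_servers(subnets)
override = fixed_dhcp_servers(subnets, ipaddress.ip_address("1.104.0.5"))

for subnet, gateway, dhcp in inferred:
    print(subnet, "gateway", gateway, "dhcp_server", dhcp)
```

Without an explicit server, each subnet's DHCP server follows its own gateway; with one, every subnet uses the supplied address, which matches the override behaviour described above.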

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1456321

Title:
  nova-network DHCP server not correct when creating multiple networks

Status in OpenStack Compute (Nova):
  New


