
yahoo-eng-team team mailing list archive

[Bug 1849479] Re: neutron l2 to dhcp lost when migrating in stable/stein 14.0.2

 

I'm going to close this as Stein has been EOL for quite a while. If this
is happening on a newer, supported release please open a new bug.

** Changed in: neutron
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1849479

Title:
  neutron l2 to dhcp lost when migrating in stable/stein 14.0.2

Status in neutron:
  Invalid

Bug description:
  Info about the environment:

  3x controller nodes
  50+ compute nodes

  all in stable stein, neutron is 14.0.2 using OVS 2.11.0

  neutron settings:
    - max_l3_agents_per_router = 3
    - dhcp_agents_per_network = 2
    - router_distributed = true
    - interface_driver = openvswitch
    - l3_ha = true

  l3 agent:
    - agent_mode = dvr

  ml2:
    - type_drivers = flat,vlan,vxlan
    - tenant_network_types = vxlan
    - mechanism_drivers = openvswitch,l2population
    - extension_drivers = port_security,dns
    - external_network_type = vlan

  tenants may have multiple external networks
  instances may have multiple interfaces

  tests have been performed on 10 instances launched in a tenant network
  connected to a router in an external network. all instances have
  floating ips assigned. these instances had only 1 interface. this
  particular testing tenant has rbac entries for 4 external networks, of
  which only 1 is used.

  migrations have been done via cli with admin:
  openstack server migrate --live <new_host> <instance_uuid>
  have also tested using evacuate with same results
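
  A minimal sketch of how the simultaneous migrations could be scripted,
  assuming the same CLI call as above. The helper names, the `runner`
  parameter and the thread-based fan-out are illustrative assumptions,
  not the reporter's actual tooling.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_migrate_command(new_host, instance_uuid):
    """Return the argv for one live migration, mirroring the CLI call above."""
    return ["openstack", "server", "migrate", "--live", new_host, instance_uuid]


def migrate_all(new_host, instance_uuids, runner=subprocess.run):
    """Start all migrations at roughly the same time; return their exit codes.

    `runner` is a hypothetical injection point so the function can be
    exercised without a real cloud; it defaults to subprocess.run.
    """
    commands = [build_migrate_command(new_host, u) for u in instance_uuids]
    if not commands:
        return []
    # One worker per instance so that no migration waits for another to finish.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        results = list(pool.map(lambda cmd: runner(cmd, check=False), commands))
    return [r.returncode for r in results]
```

  Calling `migrate_all("<new_host>", uuids)` with 10+ UUIDs would
  approximate the failing case; 1-4 UUIDs would approximate the case
  reported as working.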

  expected behavior:
  when _multiple_ (in the range of 10+) instances are migrated simultaneously from one compute host to another, they should come up with only a minor network service drop. all l2 connectivity should be resumed.

  what actually happens:
  instances are migrated, some errors pop up in neutron/nova, and then the instances come up with a minor network service drop. However, L2 toward the dhcp-servers is totally severed in OVS. As expected, the migrated instances start trying to renew their lease half-way through the current lease and drop the IP at the end of it. An easy test is to try renewing the lease on an instance, or to send icmp to any dhcp-server in that vxlan L2.
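
  To illustrate why the breakage only becomes visible some time after
  the migration, here is a rough sketch of the lease timeline described
  above. It assumes the common DHCP client behavior of starting renewal
  at half the lease time (the default T1 in RFC 2131); actual clients
  may use different timers. The function name and the example values are
  hypothetical.

```python
from datetime import datetime, timedelta


def lease_milestones(lease_start, lease_seconds):
    """Return (renewal_start, lease_expiry) for a single DHCP lease.

    Assumption: renewal begins at 50% of the lease time. If the DHCP
    server is unreachable from then on, the address is dropped at expiry.
    """
    renewal_start = lease_start + timedelta(seconds=lease_seconds * 0.5)
    lease_expiry = lease_start + timedelta(seconds=lease_seconds)
    return renewal_start, lease_expiry


# Hypothetical example: a 24h lease obtained at noon.
start = datetime(2019, 10, 23, 12, 0, 0)
renewal_start, lease_expiry = lease_milestones(start, 86400)
print(renewal_start)  # 2019-10-24 00:00:00
print(lease_expiry)   # 2019-10-24 12:00:00
```

  Under these assumptions an instance keeps its address for up to a full
  lease period after losing L2 to the dhcp-servers, so a test that only
  checks connectivity right after the migration would not catch the
  problem.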

  current workaround:
  once the instance is migrated the l2 to dhcp-servers can be re-established by restarting neutron-openvswitch-agent on the destination host.

  how to test:
  create instances (10+), migrate and then try to ping neutron dhcp-server in the vxlan (tenant created network) or simply renew dhcp-leases.

  error messages:

  Exception during message handling: TooManyExternalNetworks: More than
  one external network exists. TooManyExternalNetworks: More than one
  external network exists.

  other oddities:
  when migrating a small number of instances, i.e. 1-4, the migrations are successful and L2 with the dhcp-servers is not lost.

  when looking through debug logs i can't really find anything of
  relevance. no other large errors/warnings occur other than the one
  above.

  i will perform more tests when migrations are successful and/or
  neutron-openvswitch-agent is restarted and see if L2 to the
  dhcp-servers survives 24h.

  This occurs in a 14.0.0 regression bug which should be fixed in 14.0.2
  (this bug report is for 14.0.2), but the fix possibly does not work
  with this combination of settings(?).

  Please let me know if any versions of api/services are required for
  this, or any configurations or other info.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1849479/+subscriptions


