Message #80467
[Bug 1849479] [NEW] neutron l2 to dhcp lost when migrating in stable/stein 14.0.2
Public bug reported:
Info about the environment:
3x controller nodes
50+ compute nodes
all in stable stein, neutron is 14.0.2 using OVS 2.11.0
neutron settings:
- max_l3_agents_per_router = 3
- dhcp_agents_per_network = 2
- router_distributed = true
- interface_driver = openvswitch
- l3_ha = true
l3 agent:
- agent_mode = dvr
ml2:
- type_drivers = flat,vlan,vxlan
- tenant_network_types = vxlan
- mechanism_drivers = openvswitch,l2population
- extension_drivers = port_security,dns
- external_network_type = vlan
tenants may have multiple external networks
instances may have multiple interfaces
tests have been performed on 10 instances launched in a tenant network
connected to a router in an external network. all instances have
floating ips assigned. these instances had only 1 interface. this
particular testing tenant has rbac policies for 4 external networks, of
which only 1 is used.
migrations have been done via cli with admin:
openstack server migrate --live <new_host> <instance_uuid>
have also tested using evacuate with same results
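To start several of these migrations at roughly the same time, the command above can be wrapped in a small loop. This is only a sketch and not necessarily how the migrations were launched for this report: the function name `migrate_all` and the `OPENSTACK` override variable are made up for illustration, and the host name and instance UUIDs are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: fire off live migrations for several instances concurrently.
# Assumption: "migrate_all" and the OPENSTACK variable are hypothetical
# helpers; the CLI call itself is the one quoted in the report.

migrate_all() {
    local dest_host="$1"
    shift
    local uuid
    for uuid in "$@"; do
        # Run each CLI call in the background so the requests overlap.
        "${OPENSTACK:-openstack}" server migrate --live "$dest_host" "$uuid" &
    done
    # Block until every background CLI call has returned.
    wait
}
```

Usage would be something like `migrate_all <new_host> <instance_uuid_1> <instance_uuid_2> ...` with admin credentials loaded. Note that the loop only waits for the CLI calls to return, not for the migrations themselves to complete.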
expected behavior:
when _multiple_ (in the range of 10+) instances are migrated simultaneously from one compute host to another, they should come up with only a minor network service drop. all l2 connectivity should be resumed.
what actually happens:
instances are migrated, some errors pop up in neutron/nova and then the instances come up with a minor network service drop. However, L2 toward the dhcp-servers is totally severed in OVS. The migrated instances will, as expected, try to renew their lease half-way through the current lease and, at the end of it, drop the IP. An easy test is to try renewing the lease on an instance, or to send icmp to any dhcp-server in that vxlan L2.
current workaround:
once the instance is migrated the l2 to dhcp-servers can be re-established by restarting neutron-openvswitch-agent on the destination host.
how to test:
create instances (10+), migrate and then try to ping neutron dhcp-server in the vxlan (tenant created network) or simply renew dhcp-leases.
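The ping check could be scripted roughly as follows and run from inside a migrated instance. This is a sketch under assumptions: the function name `check_dhcp_l2` and the `PING` override variable are hypothetical, the `ping` options are those of Linux iputils, and the dhcp-server addresses have to be looked up beforehand (for example with `openstack port list --network <network> --device-owner network:dhcp`).

```shell
#!/usr/bin/env bash
# Sketch: ping each neutron dhcp-server address and report whether L2
# connectivity is still there.
# Assumption: "check_dhcp_l2" and the PING variable are hypothetical
# helpers; the IP addresses passed in are supplied by the caller.

check_dhcp_l2() {
    local ip failed=0
    for ip in "$@"; do
        # 3 echo requests, 2 second reply timeout (iputils ping options).
        if "${PING:-ping}" -c 3 -W 2 "$ip" >/dev/null 2>&1; then
            echo "OK   $ip"
        else
            echo "LOST $ip"
            failed=1
        fi
    done
    # Non-zero exit status if any dhcp-server was unreachable.
    return "$failed"
}
```

Called as `check_dhcp_l2 <dhcp_ip_1> <dhcp_ip_2>`, it prints one line per address and returns a non-zero status if any dhcp-server did not answer.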
error messages:
Exception during message handling: TooManyExternalNetworks: More than
one external network exists. TooManyExternalNetworks: More than one
external network exists.
other oddities:
when migrating a small number of instances, i.e. 1-4, the migrations succeed and L2 to the dhcp-servers is not lost.
when looking through the debug logs i can't really find anything of
relevance. no other large errors/warnings occur other than the one
above.
i will perform more tests after migrations have succeeded and/or
neutron-openvswitch-agent has been restarted, to see if L2 to the
dhcp-servers survives 24h.
This looks like a 14.0.0 regression bug which should have been fixed in
14.0.2 (this bug report is for 14.0.2), but the fix possibly does not
work with this combination of settings(?).
Please let me know if versions of any api/services are required for
this, or any configurations or other info.
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1849479
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1849479/+subscriptions