yahoo-eng-team team mailing list archive
Message #60053
[Bug 1611308] Re: L2pop add_fdb_entries concurrency issue
[Expired for neutron because there has been no activity for 60 days.]
** Changed in: neutron
Status: Incomplete => Expired
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1611308
Title:
L2pop add_fdb_entries concurrency issue
Status in neutron:
Expired
Bug description:
This was observed during live migration in a large-scale environment
that uses ovs-agent with l2pop.
The observed issue is:
If multiple VMs live-migrate at the same time, some hosts end up with stale unicast information in table 20, which still points the VM to the old host.
After checking the code, there appears to be a potential issue in [1]
when it is called concurrently.
Assume there are 3 hosts: A, B and C. The VMs are being migrated from A
to B and C. The VMs are in the same neutron network, and host B doesn't
have any port of that neutron network before the migration.
The scenario might be:
1) VM1 migrates from host A to host B.
2) When the port of VM1 is up on host B, neutron server is informed, and all the fdb_entries of that neutron network are generated and sent to host B. The code at [2] is hit. Let's assume the neutron network has lots of ports in it, so the call at [2] is expected to take a long time.
3) In the middle of 2), another VM, VM2, migrates from host A to host C.
4) Let's assume host C already has ports in the neutron network of VM2. So the code does not hit [2] and goes straight to [3]. [3] is a lightweight fanout RPC request. The ovs-agent on host B might receive this request while still processing 2).
5) 4) finishes, but 2) is still ongoing.
At this point, host B has the new unicast information for VM2.
However, the information sent in 2) is stale, and still places VM2 on
host A.
6) When 2) finishes, the stale information for VM2 might overwrite the
new information for VM2, which leads to the reported issue.
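The interleaving in steps 2) to 6) can be reproduced with a small
standalone sketch. This is not neutron code: the dict standing in for
table 20 on host B, the function names and the VM3 entry are all
hypothetical, and two events are used only to force the ordering
described above so that the outcome is deterministic.

```python
import threading


def simulate_race():
    """Replay steps 2) to 6) with two threads and a forced ordering."""
    # Hypothetical stand-in for the unicast flows in table 20 on host B:
    # maps a VM to the host its flow points at.
    table_20 = {}
    # Hypothetical server-side view of where each VM's port lives.
    server_ports = {"VM1": "B", "VM2": "A", "VM3": "A"}

    snapshot_taken = threading.Event()
    fanout_applied = threading.Event()

    def full_sync():
        # Step 2: the full set of entries is built while VM2 is still on A.
        snapshot = dict(server_ports)
        snapshot_taken.set()
        # Stand-in for the long processing time on a large network:
        # the fanout update lands before this call finishes.
        fanout_applied.wait()
        # Step 6: the stale VM2 entry overwrites the newer one.
        table_20.update(snapshot)

    def fanout_update():
        # Steps 3 to 5: VM2 moves to host C and host B is told right away.
        snapshot_taken.wait()
        server_ports["VM2"] = "C"
        table_20["VM2"] = "C"
        fanout_applied.set()

    threads = [
        threading.Thread(target=full_sync),
        threading.Thread(target=fanout_update),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table_20, server_ports


if __name__ == "__main__":
    table, server = simulate_race()
    print("host B table 20:", table)
    print("server view:    ", server)
```

In this sketch host B ends with VM2 pointing at host A while the
server-side view has VM2 on host C, which matches the stale entry
described in the report.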
[1] https://github.com/openstack/neutron/blob/fd401fe0a052a7103cb19d7385a1c702de05577f/neutron/plugins/ml2/drivers/l2pop/rpc_manager/l2population_rpc.py#L38
[2] https://github.com/openstack/neutron/blob/fd401fe0a052a7103cb19d7385a1c702de05577f/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L240
[3] https://github.com/openstack/neutron/blob/fd401fe0a052a7103cb19d7385a1c702de05577f/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L247
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1611308/+subscriptions