yahoo-eng-team team mailing list archive
Message #28915
[Bug 1426427] Re: Improve tunnel_sync server side rpc to handle race conditions
** Description changed:
- We have a concern that we may end up with db conflict errors due to
- multiple parallel requests incoming.
+ We have a concern that we may have race conditions with the following
+ code snippet:
- Consider two threads (A and B), each receiving tunnel_sync with host set to HOST1 and HOST2. The race scenario is:
- A checks whether tunnel exists and receives nothing.
- B checks whether tunnel exists and receives nothing.
- A adds endpoint with HOST1.
- B adds endpoint with HOST2.
+ if host:
+     host_endpoint = driver.obj.get_endpoint_by_host(host)
+     ip_endpoint = driver.obj.get_endpoint_by_ip(tunnel_ip)
- Now we have two endpoints for the same IP address with different hosts (I guess that is not what we would expect).
- I think the only way to avoid it is check for tunnel existence under the same transaction that will update it, if present. Probably meaning, making add_endpoint aware of potential tunnel existence.
+     if (ip_endpoint and ip_endpoint.host is None
+             and host_endpoint is None):
+         driver.obj.delete_endpoint(ip_endpoint.ip_address)
+     elif (ip_endpoint and ip_endpoint.host != host):
+         msg = (_("Tunnel IP %(ip)s in use with host %(host)s"),
+                {'ip': ip_endpoint.ip_address,
+                 'host': ip_endpoint.host})
+         raise exc.InvalidInput(error_message=msg)
+     elif (host_endpoint and host_endpoint.ip_address != tunnel_ip):
+         # Notify all other listening agents to delete stale tunnels
+         self._notifier.tunnel_delete(rpc_context,
+             host_endpoint.ip_address, tunnel_type)
+         driver.obj.delete_endpoint(host_endpoint.ip_address)
+
+ Consider two threads (A and B).
+
+ Thread A hits the following use case:
+ the host passed by the agent is not found in the DB, but the passed
+ tunnel_ip is. The endpoint is deleted from the DB and re-added with
+ (tunnel_ip, host); this is the upgrade case.
+
+ Thread B hits the following use case:
+ neither the passed host nor the passed tunnel_ip is found in the DB,
+ so it is a new endpoint.
+
+ Both threads will do the following in the end:
+
+     tunnel = driver.obj.add_endpoint(tunnel_ip, host)
+     tunnels = driver.obj.get_endpoints()
+     entry = {'tunnels': tunnels}
+     # Notify all other listening agents
+     self._notifier.tunnel_update(rpc_context, tunnel.ip_address,
+                                  tunnel_type)
+     # Return the list of tunnels IP's to the agent
+     return entry
+
+
+ Since Thread A first deletes the endpoint and then re-adds it, there
+ is a window in which Thread B's get_endpoints call does not see that
+ endpoint at all.
+
+ One way to overcome this problem would be to introduce an
+ update_endpoint method in the type drivers, instead of doing
+ delete_endpoint followed by add_endpoint.
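As a rough illustration of the proposed direction (this is not Neutron code; the class and data layout below are made up for the sketch, with a lock standing in for a single DB transaction), an update_endpoint that rebinds the row in place closes the window in which concurrent readers see no endpoint:

```python
import threading

class EndpointStore:
    """In-memory stand-in for a type driver's endpoint table.

    update_endpoint rebinds an existing tunnel_ip to a new host in one
    atomic step, so a concurrent get_endpoints never observes the
    endpoint as deleted, unlike the delete_endpoint + add_endpoint
    sequence described above.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._host_by_ip = {}  # tunnel_ip -> host (None before upgrade)

    def add_endpoint(self, tunnel_ip, host):
        with self._lock:
            self._host_by_ip[tunnel_ip] = host

    def update_endpoint(self, tunnel_ip, host):
        # Upsert instead of delete + add: the row never disappears,
        # so there is no window for a reader to miss it.
        with self._lock:
            self._host_by_ip[tunnel_ip] = host

    def get_endpoints(self):
        with self._lock:
            return [{'ip_address': ip, 'host': h}
                    for ip, h in self._host_by_ip.items()]

store = EndpointStore()
store.add_endpoint('10.0.0.1', None)        # pre-upgrade row without a host
store.update_endpoint('10.0.0.1', 'HOST1')  # upgrade case, rebound in place
print(store.get_endpoints())
# -> [{'ip_address': '10.0.0.1', 'host': 'HOST1'}]
```

In the real driver the lock would be a single DB transaction doing an in-place UPDATE, which gives the same property: readers see either the old row or the new one, never an absent endpoint.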
** Changed in: neutron
Assignee: (unassigned) => Romil Gupta (romilg)
** Changed in: neutron
Status: Invalid => New
** Tags added: ml2
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1426427
Title:
Improve tunnel_sync server side rpc to handle race conditions
Status in OpenStack Neutron (virtual network service):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1426427/+subscriptions