yahoo-eng-team team mailing list archive
Message #39125
[Bug 1499488] [NEW] Race condition puts ovs agent in resync
Public bug reported:
The following code is from
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent.treat_devices_added_or_updated():
    devices_details_list = (
        self.plugin_rpc.get_devices_details_list_and_failed_devices(
            self.context,
            devices,
            self.agent_id,
            self.conf.host))
    if devices_details_list.get('failed_devices'):
        #TODO(rossella_s) handle better the resync in next patches,
        # this is just to preserve the current behavior
        raise DeviceListRetrievalError(devices=devices)
    devices = devices_details_list.get('devices')
    vif_by_id = self.int_br.get_vifs_by_ids(
        [vif['device'] for vif in devices])
The race condition occurs between
get_devices_details_list_and_failed_devices() and get_vifs_by_ids(). If
a VM is deleted in that window, its OVS port goes away and
get_vifs_by_ids() raises an exception. That exception bumps us out to
the exception handler in rpc_loop and puts the agent in resync, so the
next rpc_loop iteration rescans ALL ports. On a highly scaled system
this resync can take many minutes, during which all new plug requests
time out.
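To make the failure mode concrete, here is a minimal Python sketch that models the race with an in-memory bridge. Everything in it is hypothetical: FakeBridge, PortNotFound, rpc_loop_iteration() and the delete_during_window parameter are illustrative names, and get_ports_attributes() is reduced to a simplified signature rather than the real ovs_lib one. It only shows that a port vanishing between the two calls raises, and that a catch-all handler turns that into a full resync.

```python
class PortNotFound(Exception):
    """Hypothetical: raised when a requested OVS port no longer exists."""


class FakeBridge:
    """Hypothetical in-memory stand-in for the integration bridge (int_br)."""

    def __init__(self, ports):
        # Maps iface-id -> port name; stands in for the OVSDB Interface table.
        self.ports = dict(ports)

    def get_ports_attributes(self, port_ids, if_exists=False):
        # Simplified signature, not the real ovs_lib method.
        rows = []
        for port_id in port_ids:
            if port_id not in self.ports:
                if if_exists:
                    continue  # silently skip ports that vanished
                raise PortNotFound(port_id)
            rows.append({"iface-id": port_id, "name": self.ports[port_id]})
        return rows

    def get_vifs_by_ids(self, port_ids):
        # Models the behaviour described in the report: if_exists is NOT passed.
        rows = self.get_ports_attributes(port_ids)
        return {row["iface-id"]: row["name"] for row in rows}


def rpc_loop_iteration(bridge, devices, delete_during_window=None):
    """Return True if the next iteration would need a full resync."""
    try:
        # Step 1: device details fetched from the server (simulated).
        details = list(devices)
        # Race window: a VM is deleted and its OVS port disappears.
        if delete_during_window is not None:
            bridge.ports.pop(delete_during_window, None)
        # Step 2: look up the VIFs on the bridge.
        bridge.get_vifs_by_ids(details)
        return False
    except Exception:
        # Catch-all handler: any exception here puts the agent into resync.
        return True


if __name__ == "__main__":
    bridge = FakeBridge({"port-a": "tap-a", "port-b": "tap-b"})
    print(rpc_loop_iteration(bridge, ["port-a", "port-b"]))  # False
    bridge = FakeBridge({"port-a": "tap-a", "port-b": "tap-b"})
    print(rpc_loop_iteration(bridge, ["port-a", "port-b"],
                             delete_during_window="port-b"))  # True
```

A single deleted port is enough to flip the whole iteration into resync, which is why the cost scales with the total number of ports rather than with the one port that went away.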
get_vifs_by_ids() was added in this patch:
https://review.openstack.org/#/c/186734/
The exception is raised for the missing port because the new
get_vifs_by_ids() method does not pass if_exists=True in its call to
get_ports_attributes(). A grep within that file shows that every other
call to get_ports_attributes() passes if_exists=True.
I believe the fix is simply to start passing if_exists=True in
get_vifs_by_ids().
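The proposed fix can be sketched with the same kind of model. This is not the actual neutron patch: FakeBridge, PortNotFound and process_devices() are hypothetical, and get_ports_attributes() again uses a simplified signature. With if_exists=True, a port deleted during the race window is skipped, get_vifs_by_ids() returns None for it, and the caller can set that device aside instead of falling into resync.

```python
class PortNotFound(Exception):
    """Hypothetical: raised when a requested OVS port no longer exists."""


class FakeBridge:
    """Hypothetical in-memory stand-in for the integration bridge (int_br)."""

    def __init__(self, ports):
        self.ports = dict(ports)  # iface-id -> port name

    def get_ports_attributes(self, port_ids, if_exists=False):
        # Simplified signature, not the real ovs_lib method.
        rows = []
        for port_id in port_ids:
            if port_id not in self.ports:
                if if_exists:
                    continue  # tolerate ports deleted in the meantime
                raise PortNotFound(port_id)
            rows.append({"iface-id": port_id, "name": self.ports[port_id]})
        return rows

    def get_vifs_by_ids(self, port_ids):
        # Proposed change: pass if_exists=True so a vanished port is
        # skipped instead of raising.
        rows = self.get_ports_attributes(port_ids, if_exists=True)
        by_id = {row["iface-id"]: row["name"] for row in rows}
        # Missing ports map to None so the caller decides what to do.
        return {port_id: by_id.get(port_id) for port_id in port_ids}


def process_devices(bridge, devices):
    """Hypothetical caller: split devices without triggering a resync."""
    vif_by_id = bridge.get_vifs_by_ids(devices)
    processed = [d for d in devices if vif_by_id[d] is not None]
    skipped = [d for d in devices if vif_by_id[d] is None]
    return processed, skipped


if __name__ == "__main__":
    # "port-b" was deleted during the race window.
    bridge = FakeBridge({"port-a": "tap-a"})
    print(process_devices(bridge, ["port-a", "port-b"]))
    # (['port-a'], ['port-b'])
```

Returning None for the missing port keeps the failure local to that one device; how the agent should then treat the skipped device is left open by this report.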
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1499488
Title:
Race condition puts ovs agent in resync
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1499488/+subscriptions