
yahoo-eng-team team mailing list archive

[Bug 1992109] [NEW] Possible race condition when port unplugged from ovs

 

Public bug reported:

A race condition may occur when nova unplugs a port from the
integration bridge (e.g. when shelving an instance).

On such an action, openvswitch sends 2 events to neutron:

2022-10-07 01:00:31.790 214734 DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["650d47f8-05c2-41ed-aaa2-701432203f49","old",null,47,null],["","new","tap216e7eb3-bc",-1,["map",[["attached-mac","fa:16:3e:54:bc:55"],["iface-id","216e7eb3-bc4e-44d4-a743-d628e9187924"],["iface-status","active"],["vm-id","e28807f6-826a-49b4-84ee-223be559885e"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /opt/openstack/neutron/lib/python3.6/site-packages/neutron/agent/common/async_process.py:262
2022-10-07 01:00:32.179 214734 DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["650d47f8-05c2-41ed-aaa2-701432203f49","delete","tap216e7eb3-bc",-1,["map",[["attached-mac","fa:16:3e:54:bc:55"],["iface-id","216e7eb3-bc4e-44d4-a743-d628e9187924"],["iface-status","active"],["vm-id","e28807f6-826a-49b4-84ee-223be559885e"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /opt/openstack/neutron/lib/python3.6/site-packages/neutron/agent/common/async_process.py:262
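
To make the two events easier to read, here is a minimal sketch that decodes one JSON line of the monitor output into per-row dictionaries. The helper name parse_monitor_output is hypothetical and is not part of neutron; the sample payload is copied from the first log line above.

```python
import json


def parse_monitor_output(raw):
    """Decode one JSON line of `ovsdb-client monitor` output into row dicts.

    Hypothetical helper for illustration only, not neutron code.
    """
    payload = json.loads(raw)
    headings = payload["headings"]
    rows = []
    for values in payload["data"]:
        row = dict(zip(headings, values))
        ext = row.get("external_ids")
        # OVSDB encodes a map column as ["map", [[key, value], ...]].
        if isinstance(ext, list) and len(ext) == 2 and ext[0] == "map":
            row["external_ids"] = dict(ext[1])
        rows.append(row)
    return rows


# Payload of the first event from the log above.
UPDATE_EVENT = json.dumps({
    "data": [
        ["650d47f8-05c2-41ed-aaa2-701432203f49", "old", None, 47, None],
        ["", "new", "tap216e7eb3-bc", -1,
         ["map", [["attached-mac", "fa:16:3e:54:bc:55"],
                  ["iface-id", "216e7eb3-bc4e-44d4-a743-d628e9187924"],
                  ["iface-status", "active"],
                  ["vm-id", "e28807f6-826a-49b4-84ee-223be559885e"]]]],
    ],
    "headings": ["row", "action", "name", "ofport", "external_ids"],
})

for row in parse_monitor_output(UPDATE_EVENT):
    print(row["action"], row["name"], row["ofport"])
```

Read this way, the first event is an update in which ofport goes from 47 ("old" row) to -1 ("new" row), and the second event is the "delete" of the same interface.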


Now imagine that the second event is delayed. The neutron agent's iteration will then treat the first event as a port update and will:
- check the ofport --> -1
- put the port in skipped_devices
- set the port DOWN in the DB through an RPC call
- remove the port from "current" ports
- do nothing else, so the port is still configured: openflow rules etc. stay in place

Then, on the next iteration, if the "delete" event is received, the agent will:
- try to figure out whether this port is configured by looking in "current"
- find that it is not, so it does nothing
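
The two iterations described above can be replayed with a toy model. This is only a sketch of the bookkeeping as I describe it in this report, not the actual neutron agent code; the class ToyAgent and its attributes (current, skipped_devices, flows, db_status) are hypothetical names chosen for illustration.

```python
class ToyAgent:
    """Hypothetical stand-in for the agent's port bookkeeping (not neutron code)."""

    def __init__(self):
        self.current = set()          # ports the agent considers configured
        self.skipped_devices = set()  # ports skipped during processing
        self.flows = {}               # port name -> stand-in for openflow rules
        self.db_status = {}           # port name -> status reported to the DB

    def plug(self, port):
        self.current.add(port)
        self.flows[port] = "flows-for-" + port
        self.db_status[port] = "ACTIVE"

    def handle_update(self, port, ofport):
        if ofport == -1:
            self.skipped_devices.add(port)
            self.db_status[port] = "DOWN"  # stands in for the RPC call
            self.current.discard(port)
            # Nothing else happens here: the flows are left untouched.

    def handle_delete(self, port):
        if port not in self.current:
            return  # port is unknown to the agent, so no cleanup is done
        self.flows.pop(port, None)
        self.current.discard(port)


PORT = "tap216e7eb3-bc"
agent = ToyAgent()
agent.plug(PORT)

agent.handle_update(PORT, ofport=-1)  # iteration 1: only the update is seen
agent.handle_delete(PORT)             # iteration 2: the delayed delete arrives

print("status in DB:", agent.db_status[PORT])
print("flows left over:", PORT in agent.flows)
```

In this model the delete handler finds nothing in "current" and returns early, so the entry in flows survives, which mirrors the leftover openflow rules.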

As a result, the port stays configured on the compute node, and some
openflow rules are left over.

Note that I am running neutron Stein with openvswitch 2.11.4.

I also checked that the 2 events are received on neutron Victoria with
openvswitch 2.15.0.

Note also that the race condition is very rare and difficult to
reproduce, because the port needs to be removed from br-int while still
being present in the ovs db with ofport=-1.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1992109

