
yahoo-eng-team team mailing list archive

[Bug 1848311] [NEW] trunk + subports not working

 

Public bug reported:

Since upgrading from Rocky to Stein we are experiencing problems both
when live migrating VMs with trunk ports and when creating new trunk
ports. The live migration of the VM itself eventually completes, but
the trunk ports remain in status "BUILD" or "DOWN". The corresponding
subports and/or the parent port are mostly in status "DOWN" too. It
looks like not all of the required ports get moved from hypervisor
host A to host B. With the ports in these states, the VM is not
reachable from the network at all.

Most of the time, when the migration is about to finish, we see
timeout messages like these in the neutron-openvswitch-agent log:

2019-10-14 12:28:56.559 20071 ERROR neutron_lib.rpc [-] Timeout in RPC method trunk.update_subport_bindings. Waiting for 114 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 58a64b2c975143a4bbfd07ab3b10e871
2019-10-14 12:28:56.560 20071 WARNING neutron_lib.rpc [-] Increasing timeout for trunk.update_subport_bindings calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 58a64b2c975143a4bbfd07ab3b10e871
2019-10-14 12:28:56.562 20071 ERROR neutron_lib.rpc [-] Timeout in RPC method trunk.update_subport_bindings. Waiting for 56 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c1e5f0b50f044c8ea1f40f3e2e959fc0
2019-10-14 12:29:53.021 20071 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Got messaging error while processing trunk bridge tbr-e4685a7d-2: Timed out waiting for a reply to message ID c1e5f0b50f044c8ea1f40f3e2e959fc0: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c1e5f0b50f044c8ea1f40f3e2e959fc0
2019-10-14 12:30:24.896 20071 ERROR neutron_lib.rpc [req-85c86e08-52a3-4199-a1af-915f4847e9fc cd9715e9b4714bc6b4d77f15f12ba5a9 1e205eb2989a4beb9ef5947abff00b35 - - -] Timeout in RPC method trunk.update_trunk_status. Waiting for 75 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 8093a1090d47426380434e65559875e5
2019-10-14 12:30:24.896 20071 WARNING neutron_lib.rpc [req-85c86e08-52a3-4199-a1af-915f4847e9fc cd9715e9b4714bc6b4d77f15f12ba5a9 1e205eb2989a4beb9ef5947abff00b35 - - -] Increasing timeout for trunk.update_trunk_status calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 8093a1090d47426380434e65559875e5
2019-10-14 12:30:50.133 20071 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Got messaging error while processing trunk bridge tbr-b56178af-8: Timed out waiting for a reply to message ID 58a64b2c975143a4bbfd07ab3b10e871: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 58a64b2c975143a4bbfd07ab3b10e871
2019-10-14 12:31:39.851 20071 ERROR neutron.services.trunk.drivers.openvswitch.agent.driver [req-85c86e08-52a3-4199-a1af-915f4847e9fc cd9715e9b4714bc6b4d77f15f12ba5a9 1e205eb2989a4beb9ef5947abff00b35 - - -] Error on event deleted for subports [SubPort(port_id=c048169f-a005-44a3-88e3-03a34d778bb5,segmentation_id=843,segmentation_type='vlan',trunk_id=b56178af-8d6f-4660-ac3b-cc469c3de4ce)]: Timed out waiting for a reply to message ID 8093a1090d47426380434e65559875e5: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 8093a1090d47426380434e65559875e5
2019-10-14 12:35:26.906 20071 ERROR neutron_lib.rpc [req-e7ac3037-3598-4003-90db-d59985cf5326 cd9715e9b4714bc6b4d77f15f12ba5a9 1e205eb2989a4beb9ef5947abff00b35 - - -] Timeout in RPC method trunk.update_subport_bindings. Waiting for 53 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID b16fcf72d3284439a06f8383a6d04566
2019-10-14 12:35:26.907 20071 WARNING neutron_lib.rpc [req-e7ac3037-3598-4003-90db-d59985cf5326 cd9715e9b4714bc6b4d77f15f12ba5a9 1e205eb2989a4beb9ef5947abff00b35 - - -] Increasing timeout for trunk.update_subport_bindings calls to 480 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID b16fcf72d3284439a06f8383a6d04566
2019-10-14 12:36:20.366 20071 ERROR neutron.services.trunk.drivers.openvswitch.agent.driver [req-e7ac3037-3598-4003-90db-d59985cf5326 cd9715e9b4714bc6b4d77f15f12ba5a9 1e205eb2989a4beb9ef5947abff00b35 - - -] Error on event created for subports [SubPort(port_id=c048169f-a005-44a3-88e3-03a34d778bb5,segmentation_id=843,segmentation_type='vlan',trunk_id=b56178af-8d6f-4660-ac3b-cc469c3de4ce)]: Timed out waiting for a reply to message ID b16fcf72d3284439a06f8383a6d04566: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID b16fcf72d3284439a06f8383a
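
To see at a glance which RPC methods time out and how far their timeouts escalate, the agent log can be summarized by matching the two message shapes quoted above: the "Timeout in RPC method ..." ERROR lines and the "Increasing timeout for ... calls to N seconds" WARNING lines. This is a minimal sketch under that assumption. The script name, the `summarize` helper and the command-line usage are hypothetical, and lines whose wording differs from the excerpt will simply not be counted.

```python
import re
import sys
from collections import defaultdict

# ERROR lines, e.g.
# "Timeout in RPC method trunk.update_subport_bindings. Waiting for 114 seconds ..."
TIMEOUT_RE = re.compile(r"Timeout in RPC method (?P<method>[\w.]+)\. Waiting")

# WARNING lines, e.g.
# "Increasing timeout for trunk.update_subport_bindings calls to 240 seconds."
ESCALATION_RE = re.compile(
    r"Increasing timeout for (?P<method>[\w.]+) calls to (?P<secs>\d+) seconds"
)


def summarize(lines):
    """Return (timeout count per RPC method, escalated timeouts per RPC method)."""
    timeouts = defaultdict(int)
    escalations = defaultdict(list)
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group("method")] += 1
        match = ESCALATION_RE.search(line)
        if match:
            escalations[match.group("method")].append(int(match.group("secs")))
    return dict(timeouts), dict(escalations)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_rpc.py neutron-openvswitch-agent.log
    with open(sys.argv[1]) as log:
        counts, escalated = summarize(log)
    for method, n in sorted(counts.items()):
        print(f"{method}: {n} timeout(s), escalated to {escalated.get(method, [])}")
```

Applied to the excerpt above, this counts three timeouts for trunk.update_subport_bindings (escalated to 240 s, then 480 s) and one for trunk.update_trunk_status (escalated to 240 s).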

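The escalation in these messages fits a per-method timeout that doubles after each MessagingTimeout, with a random wait before the retry. For trunk.update_subport_bindings the log shows 240 s and then 480 s, with waits of 114 s and 53 s beforehand, each below the timeout in force at that moment. The sketch below is a toy model of that pattern, not neutron_lib's code. The `BackoffModel` class is hypothetical. It assumes a base rpc_response_timeout of 60 s (the oslo.messaging default) and a cap of 10× that value, and the report confirms neither for this deployment. From a 60 s base the model reaches 240 s only on the second timeout, so the first logged value implies either an earlier escalation outside the excerpt or a different base.

```python
import random

# Assumed values, not taken from the bug report: oslo.messaging's default
# rpc_response_timeout (60 s) and a cap of 10x that on escalated timeouts.
DEFAULT_RPC_RESPONSE_TIMEOUT = 60
MAX_TIMEOUT = 10 * DEFAULT_RPC_RESPONSE_TIMEOUT


class BackoffModel:
    """Toy model of per-RPC-method timeout doubling after MessagingTimeout."""

    def __init__(self, base=DEFAULT_RPC_RESPONSE_TIMEOUT, cap=MAX_TIMEOUT):
        self.base = base
        self.cap = cap
        self.timeouts = {}  # RPC method name -> current timeout in seconds

    def current_timeout(self, method):
        # Methods that have never timed out keep the base timeout.
        return self.timeouts.get(method, self.base)

    def on_timeout(self, method, rng=random):
        """Handle one timeout; return (random wait before retry, new timeout)."""
        old = self.current_timeout(method)
        wait = rng.uniform(0, old)    # jittered sleep, bounded by the old timeout
        new = min(old * 2, self.cap)  # double, but never exceed the cap
        self.timeouts[method] = new   # kept until the model is recreated
        return wait, new


if __name__ == "__main__":
    model = BackoffModel()
    for _ in range(4):
        wait, timeout = model.on_timeout("trunk.update_subport_bindings")
        print(f"retry after {wait:.0f} s, timeout now {timeout} s")
    # The printed timeouts are 120, 240, 480 and 600 s; the waits are random.
```

The WARNING lines state that an escalated timeout stays in effect until the agent is restarted. A slow period on the server can therefore leave later calls to the same method waiting much longer than the configured value.
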

Our OpenStack/Neutron setup is as follows:

- 3 controller nodes behind HAProxy
- Ubuntu 18.04 Installation with Ubuntu Cloud Archive Repositories (Stein) (Python 3)
- Neutron ML2 Plugin with OVS Setup
- Provider Networks
- Package Version neutron-common: 2:14.0.2-0ubuntu1~cloud0 
- Package Version neutron-plugin-ml2: 2:14.0.2-0ubuntu1~cloud0
- Package Version neutron-server: 2:14.0.2-0ubuntu1~cloud0
- Package Version neutron-openvswitch-agent: 2:14.0.2-0ubuntu1~cloud0
- Package Version neutron-dhcp-agent: 2:14.0.2-0ubuntu1~cloud0
- Package Version openvswitch-common: 2.11.0-0ubuntu2~cloud0
- Package Version openvswitch-switch: 2.11.0-0ubuntu2~cloud0

The port/trunk setup is as follows:

- trunk port belonging to project p1
- parent port belonging to project p1, subnet s1
- subnet s1 belongs to project p1, network n1 
- network n1 belongs to project admin and has provider:segmentation_id = 700
- subport belonging to project p1, subnet s2
- subnet s2 belongs to project p1, network n2
- network n2 belongs to project p1, and has provider:segmentation_id = 843
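
To make the ownership and VLAN relationships explicit, the topology above can be written down as plain data. The `Network`, `Port` and `Trunk` classes and the two helper functions are hypothetical illustrations, not Neutron's object model or API. The subport's segmentation type and ID (vlan, 843) come from the `SubPort(...)` entry in the agent log. Everything else comes from the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Network:
    name: str
    project: str
    segmentation_id: int  # provider:segmentation_id of the network


@dataclass(frozen=True)
class Port:
    name: str
    project: str
    subnet: str
    network: Network


@dataclass
class Trunk:
    project: str
    parent: Port
    # (port, segmentation_type, segmentation_id), as in the agent's SubPort(...) log entry
    subports: List[Tuple[Port, str, int]] = field(default_factory=list)


def reported_topology() -> Trunk:
    """The trunk described in the bug report, as plain data."""
    n1 = Network("n1", "admin", 700)
    n2 = Network("n2", "p1", 843)
    parent = Port("parent", "p1", "s1", n1)
    subport = Port("subport", "p1", "s2", n2)
    return Trunk("p1", parent, [(subport, "vlan", 843)])


def cross_project_ports(trunk: Trunk) -> List[str]:
    """Names of ports whose network is owned by a different project."""
    ports = [trunk.parent] + [port for port, _, _ in trunk.subports]
    return [p.name for p in ports if p.network.project != p.project]


def vlan_tag_matches_provider(trunk: Trunk) -> Dict[str, bool]:
    """Per VLAN subport: does its trunk tag equal its network's provider segmentation ID?"""
    return {
        port.name: seg_id == port.network.segmentation_id
        for port, seg_type, seg_id in trunk.subports
        if seg_type == "vlan"
    }


if __name__ == "__main__":
    trunk = reported_topology()
    print(cross_project_ports(trunk))        # ['parent']
    print(vlan_tag_matches_provider(trunk))  # {'subport': True}
```

The two checks surface two details of this setup. The parent port (project p1) sits on network n1, which is owned by project admin. The subport's VLAN tag equals n2's provider segmentation ID. The report does not establish whether either detail is related to the failure.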

** Affects: neutron
     Importance: Undecided
         Status: New

** Summary changed:

- trunk + subports not working after live migration
+ trunk + subports not working

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1848311
