
yahoo-eng-team team mailing list archive

[Bug 1969592] Re: [OVN] Frequent DB leader changes causes 'VIF creation failed' on nova side

 

There have been many fixes related to the reported issue in neutron and OVS since this bug report. Some that I quickly caught are:
- https://patchwork.ozlabs.org/project/openvswitch/patch/20220819230810.2626573-1-i.maximets@xxxxxxx/
- https://review.opendev.org/c/openstack/ovsdbapp/+/856200
- https://review.opendev.org/c/openstack/ovsdbapp/+/862524
- https://review.opendev.org/c/openstack/neutron/+/857775
- https://review.opendev.org/c/openstack/neutron/+/871825

One issue that I know is still open, but which happens very rarely:
https://bugzilla.redhat.com/show_bug.cgi?id=2214289

Closing this based on the above and Comments #2 and #3. If the issue is
still seen with python-ovs>=2.17 and the above fixes included, please feel
free to reopen it along with neutron and nova debug logs.
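
A quick way to confirm the installed python-ovs version before re-testing
(a minimal sketch; it assumes python-ovs is installed as the "ovs"
distribution, which is how it is published on PyPI):

    # check_python_ovs.py - print the installed python-ovs version and warn
    # if it is older than 2.17
    from importlib.metadata import PackageNotFoundError, version

    try:
        ovs_version = version("ovs")  # python-ovs ships as the "ovs" distribution
    except PackageNotFoundError:
        raise SystemExit("python-ovs is not installed in this environment")

    print(f"python-ovs version: {ovs_version}")
    major, minor = (int(x) for x in ovs_version.split(".")[:2])
    if (major, minor) < (2, 17):
        print("python-ovs is older than 2.17; upgrade before re-testing")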


** Bug watch added: Red Hat Bugzilla #2214289
   https://bugzilla.redhat.com/show_bug.cgi?id=2214289

** Changed in: neutron
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1969592

Title:
  [OVN] Frequent DB leader changes causes 'VIF creation failed' on nova
  side

Status in neutron:
  Fix Released
Status in OpenStack Compute (nova):
  Incomplete

Bug description:
  Hello guys.

  As I found, in 2021 there was a commit to OVS [in OVS v2.16, or 2.15 with that backport] that changed the behavior of OVN DB snapshotting:
  now, before the leader creates a snapshot, it transfers leadership to another node.
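
  One way to see which node currently leads the SB cluster (and to catch
  these transfers) is to query the raft status on each control node. A
  minimal sketch, assuming the OVN SB control socket path used by many
  deployments (adjust /var/run/ovn/ovnsb_db.ctl for yours):

    # sb_raft_role.py - report the local ovsdb-server raft role/term/leader
    import subprocess

    CTL_SOCKET = "/var/run/ovn/ovnsb_db.ctl"  # assumed path, deployment specific

    out = subprocess.run(
        ["ovs-appctl", "-t", CTL_SOCKET, "cluster/status", "OVN_Southbound"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        # 'Role:', 'Term:' and 'Leader:' show who currently leads the cluster
        if line.startswith(("Role:", "Term:", "Leader:")):
            print(line)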

  We've run tests with rally and tempest, and it looks like there is now a problem in the interaction between nova and neutron.
  For example, a simple rally test like 'create {network,router,subnet} -> add interface to router' looks okay even with 256 concurrent tests/threads. But something like 'neutron.create_subnet -> nova.boot_server -> nova.attach_interface' will fail whenever a leadership transfer happens.
  Since transfers happen so often [every 10m + rand(10)], we get a lot of errors.

  This problem can be observed on all versions that include OVS 2.16 [or
  the backport] or higher :)

  
  Some traces from the logs [neutron, nova, ovn-sb-db]:

  CONTROL-NODES:

  
  ctl01-ovn-sb-db.log:2022-04-19T12:30:03.089Z|01002|raft|INFO|Transferring leadership to write a snapshot.
  ctl01-ovn-sb-db.log:2022-04-19T12:30:03.099Z|01003|raft|INFO|server 1c5f is leader for term 42

  ctl03-ovn-sb-db.log:2022-04-19T12:30:03.090Z|00938|raft|INFO|received leadership transfer from 1f46 in term 41
  ctl03-ovn-sb-db.log:2022-04-19T12:30:03.092Z|00940|raft|INFO|term 42: elected leader by 2+ of 3 servers
  ctl03-ovn-sb-db.log:2022-04-19T12:30:10.941Z|00941|jsonrpc|WARN|tcp:xx.yy.zz.26:41882: send error: Connection reset by peer
  ctl03-ovn-sb-db.log:2022-04-19T12:30:27.324Z|00943|jsonrpc|WARN|tcp:xx.yy.zz.26:41896: send error: Connection reset by peer
  ctl03-ovn-sb-db.log:2022-04-19T12:30:27.325Z|00945|jsonrpc|WARN|tcp:xx.yy.zz.26:41880: send error: Connection reset by peer
  ctl03-ovn-sb-db.log:2022-04-19T12:30:27.325Z|00947|jsonrpc|WARN|tcp:xx.yy.zz.26:41892: send error: Connection reset by peer
  ctl03-ovn-sb-db.log:2022-04-19T12:30:27.327Z|00949|jsonrpc|WARN|tcp:xx.yy.zz.26:41884: send error: Connection reset by peer
  ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00951|jsonrpc|WARN|tcp:xx.yy.zz.25:40260: send error: Connection timed out
  ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00953|jsonrpc|WARN|tcp:xx.yy.zz.25:40264: send error: Connection timed out
  ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00955|jsonrpc|WARN|tcp:xx.yy.zz.24:37440: send error: Connection timed out
  ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00957|jsonrpc|WARN|tcp:xx.yy.zz.24:37442: send error: Connection timed out
  ctl03-ovn-sb-db.log:2022-04-19T12:31:49.245Z|00959|jsonrpc|WARN|tcp:xx.yy.zz.24:37446: send error: Connection timed out
  ctl03-ovn-sb-db.log:2022-04-19T12:32:01.533Z|01001|jsonrpc|WARN|tcp:xx.yy.zz.67:57586: send error: Connection timed out

  
  2022-04-19 12:30:08.898 27 INFO neutron.db.ovn_revision_numbers_db [req-7fcfdd74-482d-46b2-9f76-07190669d76d ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 - default default] Successfully bumped revision number for resource be178a9a-26d7-4bf0-a4e8-d206a6965205 (type: ports) to 1

  2022-04-19 12:30:09.644 27 INFO neutron.db.ovn_revision_numbers_db
  [req-a8278418-3ad9-450c-89bb-e7a5c1c0a06d
  a9864cd890224c079051b3f56021be64 72db34087b9b401d842b66643b647e16 -
  default default] Successfully bumped revision number for resource
  be178a9a-26d7-4bf0-a4e8-d206a6965205 (type: ports) to 2

  
  2022-04-19 12:30:10.235 27 INFO neutron.wsgi [req-571b53cc-ca04-46f7-89f9-fdf8e5931f4c a9864cd890224c079051b3f56021be64 72db34087b9b401d842b66643b647e16 - default default] xx.yy.zz.68,xx.yy.zz.26 "GET /v2.0/ports?tenant_id=9d3ae9a7b121488285203b0fdeabc3a3&device_id=7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 HTTP/1.1" status: 200  len: 1081 time: 0.0363808

  2022-04-19 12:30:10.395 25 INFO
  nova.api.openstack.compute.server_external_events
  [req-d144458a-ca21-4e92-b5be-4f589e118d8c
  4688fc44551b45c6a761c9c0351e76a8 72db34087b9b401d842b66643b647e16 -
  default default] Creating event network-
  changed:be178a9a-26d7-4bf0-a4e8-d206a6965205 for instance
  7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 on cmp-ucs-01-05.poc.example.com

  
  2022-04-19 12:30:10.402 27 INFO neutron.notifiers.nova [-] Nova event response: {'name': 'network-changed', 'server_uuid': '7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34', 'tag': 'be178a9a-26d7-4bf0-a4e8-d206a6965205', 'status': 'completed', 'code': 200}

  2022-04-19 12:33:01.497 34 INFO neutron.db.ovn_revision_numbers_db
  [req-82f3d4fe-1276-4918-86d3-55cc42ca2816 - - - - -] Successfully
  bumped revision number for resource
  be178a9a-26d7-4bf0-a4e8-d206a6965205 (type: ports) to 3

  COMPUTE-NODES:

  ovs-vswitchd.log:2022-04-19T12:30:10.743Z|08415|bridge|INFO|bridge br-int: added interface tapbe178a9a-26 on port 874
  ovn-controller.log:2022-04-19T12:30:11.251Z|02356|binding|INFO|Claiming lport be178a9a-26d7-4bf0-a4e8-d206a6965205 for this chassis.
  ovn-controller.log:2022-04-19T12:30:11.251Z|02357|binding|INFO|be178a9a-26d7-4bf0-a4e8-d206a6965205: Claiming fa:16:3e:8f:3b:47 1.80.1.254

  nova-compute.log:2022-04-19 12:30:10.672 8 INFO os_vif
  [req-2c8b5f82-ce4a-4aa7-9f0a-d346ee210ffb
  ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 -
  default default] Successfully plugged vif
  VIFOpenVSwitch(active=False,address=fa:16:3e:8f:3b:47,bridge_name='br-
  int',has_traffic_filtering=True,id=be178a9a-26d7-4bf0-a4e8-d206a6965205,network=Network(5dd2512b-56f8-4e04-8623-971a633a76aa),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapbe178a9a-26')

  ovs-vswitchd.log:2022-04-19T12:35:11.302Z|08448|bridge|INFO|bridge br-int: deleted interface tapbe178a9a-26 on port 874
  ovn-controller.log:2022-04-19T12:35:11.307Z|02380|binding|INFO|Releasing lport be178a9a-26d7-4bf0-a4e8-d206a6965205 from this chassis.

  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [req-2c8b5f82-ce4a-4aa7-9f0a-d346ee210ffb ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 - default default] [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Failed to
   allocate network(s): nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Traceback (most recent call last):
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 7260, in _create_guest_with_network
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     post_xml_callback=post_xml_callback)
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     next(self.gen)
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 479, in wait_for_instance_event
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     actual_event = event.wait()
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     result = hub.switch()
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     return self.greenlet.switch()
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] eventlet.timeout.Timeout: 300 seconds
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] 
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] During handling of the above exception, another exception occurred:
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] 
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Traceback (most recent call last):
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 2402, in _build_and_run_instance
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     accel_info=accel_info)
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4225, in spawn
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     cleanup_instance_disks=created_disks)
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 7283, in _create_guest_with_network
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]     raise exception.VirtualInterfaceCreateException()
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
  2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] 
  2022-04-19 12:35:11.844 8 ERROR nova.compute.manager [req-2c8b5f82-ce4a-4aa7-9f0a-d346ee210ffb ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 - default default] [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Build of 
  instance 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 aborted: Failed to allocate the network(s), not rescheduling.: nova.exception.BuildAbortException: Build of instance 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 aborted: Failed to allocate the networ
  k(s), not rescheduling.
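
  The traceback above is nova giving up after waiting 300 seconds for the
  network-vif-plugged event from neutron. Roughly, the waiting pattern is
  the following (a simplified sketch for illustration, not nova's actual
  code; the event name and the 300 s timeout are taken from the logs above):

    # Simplified wait-for-event pattern: block on an eventlet Event with a
    # 300 s timeout and treat a timeout as a failed VIF creation.
    from eventlet import event, timeout

    vif_plugged = event.Event()  # neutron's notification would send() this

    try:
        with timeout.Timeout(300):
            vif_plugged.wait()   # blocks in hub.switch(), as in the traceback
        print("VIF plugged in time")
    except timeout.Timeout:
        # Nothing sent the event in time, which is what happens here when
        # the SB leader changes; nova raises VirtualInterfaceCreateException.
        print("Virtual Interface creation failed (no event within 300 s)")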

  
  CONTROL:

  2022-04-19 12:35:11.330 25 INFO
  neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver
  [req-2b4d10b1-3a7a-4683-97df-990259705b46 - - - - -] OVN reports
  status down for port: be178a9a-26d7-4bf0-a4e8-d206a6965205

  2022-04-19 12:35:12.524 23 WARNING neutron.notifiers.nova [-] Nova
  event: {'server_uuid': '7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34', 'name':
  'network-vif-deleted', 'tag': 'be178a9a-26d7-4bf0-a4e8-d206a6965205',
  'status': 'failed', 'code': 422} returned with failed status

  /var/log/kolla/neutron/neutron-server.log:2022-04-19 12:35:13.016 25
  WARNING neutron.notifiers.nova [-] Nova event: {'server_uuid':
  '7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34', 'name': 'network-vif-plugged',
  'status': 'failed', 'tag': 'be178a9a-26d7-4bf0-a4e8-d206a6965205',
  'code': 422} returned with failed status

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1969592/+subscriptions


