[Bug 1969592] Re: [OVN] Frequent DB leader changes causes 'VIF creation failed' on nova side
There have been many fixes related to the reported issue in neutron and OVS since this bug report; some that I quickly caught are:
- https://patchwork.ozlabs.org/project/openvswitch/patch/20220819230810.2626573-1-i.maximets@xxxxxxx/
- https://review.opendev.org/c/openstack/ovsdbapp/+/856200
- https://review.opendev.org/c/openstack/ovsdbapp/+/862524
- https://review.opendev.org/c/openstack/neutron/+/857775
- https://review.opendev.org/c/openstack/neutron/+/871825
One issue that I know is still open, but which happens very rarely:
https://bugzilla.redhat.com/show_bug.cgi?id=2214289
Closing this based on the above and comments #2 and #3. If the issue is still
seen with python-ovs >= 2.17 and the above fixes included, please feel free to
reopen it along with neutron and nova debug logs.
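For reference, one quick way to check the installed versions (a minimal sketch; it assumes the packages are importable from the environment running neutron-server):

    # Minimal sketch: print the python-ovs and ovsdbapp versions, so you can
    # confirm python-ovs >= 2.17 before reopening this bug.
    from importlib.metadata import version
    import ovs.version

    print("python-ovs:", ovs.version.VERSION)
    print("ovsdbapp:", version("ovsdbapp"))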
** Bug watch added: Red Hat Bugzilla #2214289
https://bugzilla.redhat.com/show_bug.cgi?id=2214289
** Changed in: neutron
Status: New => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1969592
Title:
[OVN] Frequent DB leader changes causes 'VIF creation failed' on nova
side
Status in neutron:
Fix Released
Status in OpenStack Compute (nova):
Incomplete
Bug description:
Hello guys.
As I found, in 2021 there was a commit to OVS [since OVS v2.16, or 2.15 with that backport] that changed behavior during OVN DB snapshotting.
Now, before the leader creates a snapshot, it transfers leadership to another node.
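To see which server is the leader at any given moment, something like the following can be used (just a sketch; the control socket path is an assumption based on a typical kolla/OVN layout and will differ between deployments):

    # Sketch: query the RAFT status of the OVN southbound DB to see the
    # current leader; run it around a snapshot to observe the transfer.
    # The control socket path is deployment-specific (an assumption here).
    import subprocess

    out = subprocess.run(
        ["ovs-appctl", "-t", "/var/run/ovn/ovnsb_db.ctl",
         "cluster/status", "OVN_Southbound"],
        capture_output=True, text=True, check=True)
    print(out.stdout)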
We've run tests with rally and tempest, and it looks like there is now a problem in the interaction between nova and neutron.
For example, a simple rally test like 'create {network,router,subnet} -> add interface to router' looks okay even with 256 concurrent identical tests/threads. But something like 'neutron.create_subnet -> nova.boot_server -> nova.attach_interface' will fail at the moment the leadership transfer happens.
Since it happens so often [every 10m + rand(10)], we get a lot of errors.
This problem can be observed on all versions where OVS 2.16 [or the
backport] or higher is included :)
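For reference, a rough sketch of the failing sequence using openstacksdk (this is not the actual rally scenario code; cloud name, image, flavor and CIDR are placeholders):

    # Rough sketch of the failing sequence: create a subnet, boot a server,
    # then attach an interface. All names/IDs below are placeholders.
    import openstack

    conn = openstack.connect(cloud="mycloud")

    net = conn.network.create_network(name="test-net")
    conn.network.create_subnet(network_id=net.id, ip_version=4,
                               cidr="192.0.2.0/24", name="test-subnet")

    image = conn.image.find_image("cirros")
    flavor = conn.compute.find_flavor("m1.tiny")
    server = conn.compute.create_server(
        name="test-vm", image_id=image.id, flavor_id=flavor.id,
        networks=[{"uuid": net.id}])
    # Boot fails here when the leadership transfer hits and the
    # network-vif-plugged event never reaches nova.
    server = conn.compute.wait_for_server(server)

    # Attach a second interface on the same network.
    conn.compute.create_server_interface(server, net_id=net.id)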
Some tracing from logs [neutron, nova, ovn-sb-db]:
CONTROL-NODES:
ctl01-ovn-sb-db.log:2022-04-19T12:30:03.089Z|01002|raft|INFO|Transferring leadership to write a snapshot.
ctl01-ovn-sb-db.log:2022-04-19T12:30:03.099Z|01003|raft|INFO|server 1c5f is leader for term 42
ctl03-ovn-sb-db.log:2022-04-19T12:30:03.090Z|00938|raft|INFO|received leadership transfer from 1f46 in term 41
ctl03-ovn-sb-db.log:2022-04-19T12:30:03.092Z|00940|raft|INFO|term 42: elected leader by 2+ of 3 servers
ctl03-ovn-sb-db.log:2022-04-19T12:30:10.941Z|00941|jsonrpc|WARN|tcp:xx.yy.zz.26:41882: send error: Connection reset by peer
ctl03-ovn-sb-db.log:2022-04-19T12:30:27.324Z|00943|jsonrpc|WARN|tcp:xx.yy.zz.26:41896: send error: Connection reset by peer
ctl03-ovn-sb-db.log:2022-04-19T12:30:27.325Z|00945|jsonrpc|WARN|tcp:xx.yy.zz.26:41880: send error: Connection reset by peer
ctl03-ovn-sb-db.log:2022-04-19T12:30:27.325Z|00947|jsonrpc|WARN|tcp:xx.yy.zz.26:41892: send error: Connection reset by peer
ctl03-ovn-sb-db.log:2022-04-19T12:30:27.327Z|00949|jsonrpc|WARN|tcp:xx.yy.zz.26:41884: send error: Connection reset by peer
ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00951|jsonrpc|WARN|tcp:xx.yy.zz.25:40260: send error: Connection timed out
ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00953|jsonrpc|WARN|tcp:xx.yy.zz.25:40264: send error: Connection timed out
ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00955|jsonrpc|WARN|tcp:xx.yy.zz.24:37440: send error: Connection timed out
ctl03-ovn-sb-db.log:2022-04-19T12:31:49.244Z|00957|jsonrpc|WARN|tcp:xx.yy.zz.24:37442: send error: Connection timed out
ctl03-ovn-sb-db.log:2022-04-19T12:31:49.245Z|00959|jsonrpc|WARN|tcp:xx.yy.zz.24:37446: send error: Connection timed out
ctl03-ovn-sb-db.log:2022-04-19T12:32:01.533Z|01001|jsonrpc|WARN|tcp:xx.yy.zz.67:57586: send error: Connection timed out
2022-04-19 12:30:08.898 27 INFO neutron.db.ovn_revision_numbers_db [req-7fcfdd74-482d-46b2-9f76-07190669d76d ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 - default default] Successfully bumped revision number for resource be178a9a-26d7-4bf0-a4e8-d206a6965205 (type: ports) to 1
2022-04-19 12:30:09.644 27 INFO neutron.db.ovn_revision_numbers_db [req-a8278418-3ad9-450c-89bb-e7a5c1c0a06d a9864cd890224c079051b3f56021be64 72db34087b9b401d842b66643b647e16 - default default] Successfully bumped revision number for resource be178a9a-26d7-4bf0-a4e8-d206a6965205 (type: ports) to 2
2022-04-19 12:30:10.235 27 INFO neutron.wsgi [req-571b53cc-ca04-46f7-89f9-fdf8e5931f4c a9864cd890224c079051b3f56021be64 72db34087b9b401d842b66643b647e16 - default default] xx.yy.zz.68,xx.yy.zz.26 "GET /v2.0/ports?tenant_id=9d3ae9a7b121488285203b0fdeabc3a3&device_id=7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 HTTP/1.1" status: 200 len: 1081 time: 0.0363808
2022-04-19 12:30:10.395 25 INFO nova.api.openstack.compute.server_external_events [req-d144458a-ca21-4e92-b5be-4f589e118d8c 4688fc44551b45c6a761c9c0351e76a8 72db34087b9b401d842b66643b647e16 - default default] Creating event network-changed:be178a9a-26d7-4bf0-a4e8-d206a6965205 for instance 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 on cmp-ucs-01-05.poc.example.com
2022-04-19 12:30:10.402 27 INFO neutron.notifiers.nova [-] Nova event response: {'name': 'network-changed', 'server_uuid': '7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34', 'tag': 'be178a9a-26d7-4bf0-a4e8-d206a6965205', 'status': 'completed', 'code': 200}
2022-04-19 12:33:01.497 34 INFO neutron.db.ovn_revision_numbers_db [req-82f3d4fe-1276-4918-86d3-55cc42ca2816 - - - - -] Successfully bumped revision number for resource be178a9a-26d7-4bf0-a4e8-d206a6965205 (type: ports) to 3
COMPUTE-NODES:
ovs-vswitchd.log:2022-04-19T12:30:10.743Z|08415|bridge|INFO|bridge br-int: added interface tapbe178a9a-26 on port 874
ovn-controller.log:2022-04-19T12:30:11.251Z|02356|binding|INFO|Claiming lport be178a9a-26d7-4bf0-a4e8-d206a6965205 for this chassis.
ovn-controller.log:2022-04-19T12:30:11.251Z|02357|binding|INFO|be178a9a-26d7-4bf0-a4e8-d206a6965205: Claiming fa:16:3e:8f:3b:47 1.80.1.254
nova-compute.log:2022-04-19 12:30:10.672 8 INFO os_vif [req-2c8b5f82-ce4a-4aa7-9f0a-d346ee210ffb ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 - default default] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:8f:3b:47,bridge_name='br-int',has_traffic_filtering=True,id=be178a9a-26d7-4bf0-a4e8-d206a6965205,network=Network(5dd2512b-56f8-4e04-8623-971a633a76aa),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapbe178a9a-26')
ovs-vswitchd.log:2022-04-19T12:35:11.302Z|08448|bridge|INFO|bridge br-int: deleted interface tapbe178a9a-26 on port 874
ovn-controller.log:2022-04-19T12:35:11.307Z|02380|binding|INFO|Releasing lport be178a9a-26d7-4bf0-a4e8-d206a6965205 from this chassis.
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [req-2c8b5f82-ce4a-4aa7-9f0a-d346ee210ffb ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 - default default] [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Failed to allocate network(s): nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Traceback (most recent call last):
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 7260, in _create_guest_with_network
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] post_xml_callback=post_xml_callback)
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] next(self.gen)
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 479, in wait_for_instance_event
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] actual_event = event.wait()
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/event.py", line 125, in wait
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] result = hub.switch()
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] return self.greenlet.switch()
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] eventlet.timeout.Timeout: 300 seconds
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] During handling of the above exception, another exception occurred:
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Traceback (most recent call last):
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 2402, in _build_and_run_instance
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] accel_info=accel_info)
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4225, in spawn
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] cleanup_instance_disks=created_disks)
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 7283, in _create_guest_with_network
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] raise exception.VirtualInterfaceCreateException()
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2022-04-19 12:35:11.842 8 ERROR nova.compute.manager [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34]
2022-04-19 12:35:11.844 8 ERROR nova.compute.manager [req-2c8b5f82-ce4a-4aa7-9f0a-d346ee210ffb ff1516be452b4b939314bf3864a63f35 9d3ae9a7b121488285203b0fdeabc3a3 - default default] [instance: 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34] Build of instance 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 aborted: Failed to allocate the network(s), not rescheduling.: nova.exception.BuildAbortException: Build of instance 7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34 aborted: Failed to allocate the network(s), not rescheduling.
CONTROL:
2022-04-19 12:35:11.330 25 INFO neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver [req-2b4d10b1-3a7a-4683-97df-990259705b46 - - - - -] OVN reports status down for port: be178a9a-26d7-4bf0-a4e8-d206a6965205
2022-04-19 12:35:12.524 23 WARNING neutron.notifiers.nova [-] Nova event: {'server_uuid': '7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34', 'name': 'network-vif-deleted', 'tag': 'be178a9a-26d7-4bf0-a4e8-d206a6965205', 'status': 'failed', 'code': 422} returned with failed status
/var/log/kolla/neutron/neutron-server.log:2022-04-19 12:35:13.016 25 WARNING neutron.notifiers.nova [-] Nova event: {'server_uuid': '7560fbb7-3ec7-41ef-b7a5-5e955ca4ff34', 'name': 'network-vif-plugged', 'status': 'failed', 'tag': 'be178a9a-26d7-4bf0-a4e8-d206a6965205', 'code': 422} returned with failed status
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1969592/+subscriptions