[Bug 1951720] [NEW] Virtual interface creation failed
Public bug reported:
Hi,
I have a testing stack of OpenStack Wallaby (deployed with kolla-ansible, using kolla source images)
and have found what is probably a weird bug in nova/neutron.
I have a test heat template that starts about 6 instances, each with a
bunch of network interfaces and a security group - nothing special.
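For illustration, here is a minimal sketch of the kind of stack involved (the network name, image and flavor are placeholders, not my actual template), with one of the roughly six servers shown:

    heat_template_version: 2018-08-31

    resources:
      secgroup:
        type: OS::Neutron::SecurityGroup
        properties:
          rules:
            - protocol: tcp
              port_range_min: 22
              port_range_max: 22

      port_0:
        type: OS::Neutron::Port
        properties:
          network: private              # placeholder network name
          security_groups:
            - { get_resource: secgroup }

      server_0:
        type: OS::Nova::Server
        properties:
          image: cirros                 # placeholder image
          flavor: m1.tiny               # placeholder flavor
          networks:
            - port: { get_resource: port_0 }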
The testing OpenStack environment is clean and freshly installed, and
tempest passes. So what is going on?
1. Sometimes the heat stack is created successfully, without errors.
2. Sometimes the heat stack is created successfully - BUT with errors in nova-compute, so the retry
mechanism works.
Errors in nova-compute:
7d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Took 0.10 seconds to destroy the instance on the hypervisor.
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [req-5b21ebad-ffb7-46b0-8c37-fd665d01013e 64fe2842ff8c6302c0450bee25600a10e54f2b9793e9c8776f956c993a7a7ee8 0960461696f64f82ba108f8397bf508c - e01e19b257d241eaae5bb137aedc6fcc e01e19b2
57d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Failed to allocate network(s): nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last):
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6930, in _create_guest_with_network
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] guest = self._create_guest(
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3.9/contextlib.py", line 124, in __exit__
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] next(self.gen)
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 479, in wait_for_instance_event
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] actual_event = event.wait()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] result = hub.switch()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] return self.greenlet.switch()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] eventlet.timeout.Timeout: 300 seconds
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] During handling of the above exception, another exception occurred:
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last):
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2366, in _build_and_run_instance
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] self.driver.spawn(context, instance, image_meta,
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3885, in spawn
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] self._create_guest_with_network(
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6956, in _create_guest_with_network
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] raise exception.VirtualInterfaceCreateException()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]
So, it looks like nova didn't receive the network-vif-plugged event
from neutron within the 300-second timeout.
I then patched the nova code with the review below, so I could see the
timing of this event.
https://review.opendev.org/c/openstack/nova/+/817030
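The idea of the patch is just to time the event wait. A rough, self-contained sketch of that idea follows (an illustration only, not the actual diff in review 817030; the function name and the demo are mine):

    import time
    import eventlet
    from eventlet import event

    def wait_for_vif_plugged(evt, timeout=300):
        # Wait for neutron's network-vif-plugged external event,
        # raising eventlet.timeout.Timeout after `timeout` seconds,
        # and log how long the wait actually took.
        start = time.monotonic()
        with eventlet.timeout.Timeout(timeout):
            result = evt.wait()
        print('VIF plugging completed in %d seconds'
              % int(time.monotonic() - start))
        return result

    # Demo: simulate neutron delivering the event after 2 seconds.
    evt = event.Event()
    eventlet.spawn_after(2, evt.send, 'network-vif-plugged')
    wait_for_vif_plugged(evt)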
Logs here:
2021-11-09 14:07:07.251 9 DEBUG nova.compute.manager [req-ff47242a-ec1e-4f3e-8989-dc17b9b66702 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 280 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:08:49.726 9 DEBUG nova.compute.manager [req-6f3db32a-0c79-4d70-b61f-018e5bb6047e 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 8 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:08:53.898 9 DEBUG nova.compute.manager [req-10eb5d9e-e534-4f41-b7b8-019455cd7df2 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 10 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:09:29.747 9 DEBUG nova.compute.manager [req-6f6b3769-46a9-4188-8db3-fdc7a1f33c5d 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 46 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:09:31.846 9 DEBUG nova.compute.manager [req-dd2d6dd5-5197-4c21-8f7f-48c58b2aa397 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 43 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:09:36.003 9 DEBUG nova.compute.manager [req-2e836d62-d368-45a9-bcbd-6162b126d556 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 48 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
Once again, this OpenStack is clean and freshly installed, with no instances running, so why does VIF plugging take anywhere from 8 seconds to 280 seconds (and potentially more)?
It looks like there is some race condition, but I don't know where to
find it.
When I bump the vif plugging timeout in nova-compute as shown below,
everything works well, but it's only a workaround.
vif_plugging_timeout = 3600
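For reference, this workaround lives in nova.conf on the compute nodes; vif_plugging_is_fatal is the companion option that controls whether hitting the timeout aborts the boot:

    [DEFAULT]
    # Workaround only: wait up to an hour for neutron's
    # network-vif-plugged event instead of the default 300 seconds.
    vif_plugging_timeout = 3600
    # Default is true; if set to false, a timeout is only logged
    # and the boot continues without the event.
    vif_plugging_is_fatal = true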
The last thing I see is that it has something to do with security groups: if
the noop driver is used for security groups, I can't reproduce the issue, so it's
"fixed".
The testing environment uses OVS (not OVN) with the openvswitch firewall driver.
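For reference, the firewall driver is selected in the neutron OVS agent configuration (typically /etc/neutron/plugins/ml2/openvswitch_agent.ini); switching to noop disables security group filtering entirely, which is why it only "fixes" the symptom:

    [securitygroup]
    firewall_driver = openvswitch
    # With the noop driver the issue is not reproducible, at the
    # cost of disabling security group enforcement:
    # firewall_driver = noop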
Thank you very much,
Michal Arbet (kekvo)
** Affects: nova
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/nova/+bug/1951720