
yahoo-eng-team team mailing list archive

[Bug 1951720] [NEW] Virtual interface creation failed

 

Public bug reported:

Hi,

I have a testing stack of OpenStack Wallaby (deployed with kolla-ansible using
kolla source images) and have found what is probably a weird bug in nova/neutron.

I have a test heat template that starts about 6 instances, each with a
bunch of network interfaces and a security group - nothing special. The
test OpenStack environment is clean and freshly installed, and tempest
passes. So what is going on?

1. Sometimes the heat stack is created successfully without errors.
2. Sometimes the heat stack is created successfully - BUT with errors in
   nova-compute, so the retry mechanism works.

Errors in nova-compute:

7d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Took 0.10 seconds to destroy the instance on the hypervisor.
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [req-5b21ebad-ffb7-46b0-8c37-fd665d01013e 64fe2842ff8c6302c0450bee25600a10e54f2b9793e9c8776f956c993a7a7ee8 0960461696f64f82ba108f8397bf508c - e01e19b257d241eaae5bb137aedc6fcc e01e19b2
57d241eaae5bb137aedc6fcc] [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Failed to allocate network(s): nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last):
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6930, in _create_guest_with_network
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    guest = self._create_guest(
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3.9/contextlib.py", line 124, in __exit__
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    next(self.gen)
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 479, in wait_for_instance_event
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    actual_event = event.wait()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3/dist-packages/eventlet/event.py", line 125, in wait
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    result = hub.switch()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    return self.greenlet.switch()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] eventlet.timeout.Timeout: 300 seconds
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] During handling of the above exception, another exception occurred:
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] 
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] Traceback (most recent call last):
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2366, in _build_and_run_instance
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    self.driver.spawn(context, instance, image_meta,
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 3885, in spawn
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    self._create_guest_with_network(
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 6956, in _create_guest_with_network
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]    raise exception.VirtualInterfaceCreateException()
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2021-11-03 14:56:42.453 7 ERROR nova.compute.manager [instance: 69b8dd7e-a9db-44fb-ab41-832942cb9e7e]


So it looks like nova did not receive the message from neutron that the
VIF was plugged within 300 seconds.
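
For context, here is a minimal sketch of the pattern visible in the traceback
(simplified names of my own, not nova's real internals): nova arms an eventlet
timeout and waits for neutron's "network-vif-plugged" external event; if the
event does not arrive within vif_plugging_timeout (default 300 seconds), the
Timeout fires and nova raises VirtualInterfaceCreateException.

from eventlet import event
from eventlet.timeout import Timeout

VIF_PLUGGING_TIMEOUT = 300  # nova's default, in seconds


def wait_for_vif_plugged(vif_plugged: event.Event) -> None:
    """Block until neutron reports the VIF as plugged, or time out."""
    try:
        with Timeout(VIF_PLUGGING_TIMEOUT):
            # Released by the external-event handler when neutron sends
            # "network-vif-plugged" for this port.
            vif_plugged.wait()
    except Timeout:
        # In nova this path ends in VirtualInterfaceCreateException
        # ("Virtual Interface creation failed").
        raise RuntimeError("Virtual Interface creation failed")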

I then patched the nova code with the review below so I could see how
long this event takes.

https://review.opendev.org/c/openstack/nova/+/817030


Logs here:

2021-11-09 14:07:07.251 9 DEBUG nova.compute.manager [req-ff47242a-ec1e-4f3e-8989-dc17b9b66702 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 280 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:08:49.726 9 DEBUG nova.compute.manager [req-6f3db32a-0c79-4d70-b61f-018e5bb6047e 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 8 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:08:53.898 9 DEBUG nova.compute.manager [req-10eb5d9e-e534-4f41-b7b8-019455cd7df2 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 10 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:09:29.747 9 DEBUG nova.compute.manager [req-6f6b3769-46a9-4188-8db3-fdc7a1f33c5d 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 46 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:09:31.846 9 DEBUG nova.compute.manager [req-dd2d6dd5-5197-4c21-8f7f-48c58b2aa397 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 43 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490
2021-11-09 14:09:36.003 9 DEBUG nova.compute.manager [req-2e836d62-d368-45a9-bcbd-6162b126d556 612fb894e80a454881a7ffd6ff3565e7 3ea2f30b9a7445ccb6151d27d65e9c8b - default default] VIF plugging completed in 48 seconds wait_for_instance_event /usr/lib/python3/dist-packages/nova/compute/manager.py:490



Once again, this OpenStack is clean and freshly installed with no instances running, so why does VIF plugging take anywhere from 8 seconds to 280 seconds (and potentially more) to complete?

It looks like there is some race condition, but I don't know where to
find it.

When I bump the timeout for VIF plugging in nova-compute as shown below,
everything works well, but it's only a workaround.

vif_plugging_timeout = 3600
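
For reference, a minimal nova.conf sketch of the workaround on the compute
nodes (both options live in [DEFAULT]; vif_plugging_is_fatal is shown only
for context and left at its default):

[DEFAULT]
# Wait up to an hour for neutron's network-vif-plugged event instead of 300 s.
vif_plugging_timeout = 3600
# Default: treat a missing event as fatal, which is what raises
# VirtualInterfaceCreateException and triggers the reschedule.
vif_plugging_is_fatal = true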

The last thing I see is that it has something to do with the security
groups: if the noop driver is used for security groups, I can't reproduce
the issue, so it's "fixed".


The testing environment uses OVS (not OVN), with the openvswitch firewall driver.
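
For reference, the security-group setting in question lives in the OVS agent
configuration (openvswitch_agent.ini in this deployment), roughly:

[securitygroup]
# Driver in use when the problem occurs; flows are programmed per port.
firewall_driver = openvswitch
# Switching to the noop driver (no security-group enforcement) makes the
# issue unreproducible, which is of course not a real fix.
# firewall_driver = noop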

Thank you very much,
Michal Arbet (kekvo)

** Affects: nova
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1951720
