yahoo-eng-team team mailing list archive

[Bug 1744361] Re: test_server_security_groups failed to reboot with "Domain not found: no domain with matching uuid" because of missing vif event from linuxbridge agent

 

Reviewed:  https://review.openstack.org/540168
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=236bb54493385bcfe2034e8b0fd7a387fa66635a
Submitter: Zuul
Branch:    master

commit 236bb54493385bcfe2034e8b0fd7a387fa66635a
Author: melanie witt <melwittt@xxxxxxxxx>
Date:   Thu Feb 1 22:27:57 2018 +0000

    Don't wait for vif plug events during _hard_reboot
    
    Originally, in change Id188d48609f3d22d14e16c7f6114291d547a8986 we
    added a re-initialization of volumes, encryptors, and vifs to hard
    reboot. When creating the libvirt domain and network, we were waiting
    for vif plug events from neutron when we replugged the vifs. Then, we
    started seeing timeouts in the linuxbridge gate job because compute
    was timing out waiting for plug events from neutron during a hard
    reboot.
    
    It turns out that the behavior of neutron plug events depends on what
    vif type we're using and we're also using a stale network info_cache
    throughout the hard reboot code path, so we can't be 100% sure we know
    which vifs to wait for plug events from anyway. We coincidentally get
    some info_cache refreshes from network-changed events from neutron,
    but we shouldn't rely on that.
    
    Ideally, we could do something like wait for an unplug event after we
    unplug the vif, then refresh the network_info cache, then wait for the
    plug event. BUT, in the case of the os-vif linuxbridge unplug method,
    it is a no-op, so I don't think we could expect to get an unplug
    event for it (and we don't see any network-vif-unplugged events sent
    in the q-svc log for the linuxbridge job during a hard reboot).
    
    Closes-Bug: #1744361
    
    Change-Id: Ib0cf5d55750f13d0499a570f14024dca551ed4d4
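
For illustration, here is a minimal self-contained sketch of the pattern
the commit message describes: register waiters for the expected neutron
plug events, do the work, then block on the events, with an opt-out that
skips the wait entirely on the hard reboot path. The names
(wait_for_instance_event, vifs_already_plugged) are modeled on nova's
libvirt driver, but this is a simplified sketch under those assumptions,
not the actual diff:

    # Simplified sketch, not nova code: "register waiters, do the work,
    # then block for the events", with an opt-out for hard reboot.
    import contextlib
    import threading

    @contextlib.contextmanager
    def wait_for_instance_event(waiters, event_names, deadline=300):
        # Waiters must be registered before the work that triggers the
        # events, or an early event is dropped instead of dispatched.
        registered = dict((n, threading.Event()) for n in event_names)
        waiters.update(registered)
        yield  # plug vifs, define and launch the libvirt domain here
        for name, event in registered.items():
            if not event.wait(deadline):
                raise RuntimeError('Timeout waiting for %s' % name)

    def create_domain_and_network(waiters, vif_ids,
                                  vifs_already_plugged=False):
        # The fix: during hard reboot we cannot be sure which plug
        # events (if any) neutron will send, so wait for none of them.
        events = ([] if vifs_already_plugged else
                  ['network-vif-plugged-%s' % v for v in vif_ids])
        with wait_for_instance_event(waiters, events, deadline=1):
            pass  # guest creation would happen here

    waiters = {}
    # Hard reboot path after the fix: no events awaited, no timeout.
    create_domain_and_network(waiters, ['567525a4'],
                              vifs_already_plugged=True)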


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1744361

Title:
  test_server_security_groups failed to reboot with "Domain not found:
  no domain with matching uuid" because of missing vif event from
  linuxbridge agent

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Confirmed

Bug description:
  This happened on master (Queens) in the neutron-tempest-linuxbridge
  job:

  http://logs.openstack.org/56/534456/1/check/neutron-tempest-linuxbridge/7693ca5/logs/testr_results.html.gz

  From tempest log:

  Traceback (most recent call last):
    File "tempest/api/compute/security_groups/test_security_groups.py", line 105, in test_server_security_groups
      'ACTIVE')
    File "tempest/common/waiters.py", line 96, in wait_for_server_status
      raise lib_exc.TimeoutException(message)
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: (SecurityGroupsTestJSON:test_server_security_groups) Server dd2ec3b0-f909-4104-ae2a-9b878d936ed4 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: HARD_REBOOT. Current task state: reboot_started_hard.

  In nova-compute log, we see this:

  Jan 19 10:38:37.869397 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: WARNING nova.virt.libvirt.driver [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Timeout waiting for vif plugging callback for instance with vm_state active and task_state reboot_started_hard.: Timeout: 300 seconds
  Jan 19 10:38:37.880789 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: DEBUG nova.compute.manager [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Checking state {{(pid=28907) _get_power_state /opt/stack/new/nova/nova/compute/manager.py:1183}}
  Jan 19 10:38:37.885265 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR nova.compute.manager [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Cannot reboot instance: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023): libvirtError: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023)
  Jan 19 10:38:37.981484 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: DEBUG nova.compute.manager [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Instance has been destroyed from under us while trying to set it to ERROR {{(pid=28907) _set_instance_obj_error_state /opt/stack/new/nova/nova/compute/manager.py:586}}
  Jan 19 10:38:38.150498 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] Exception during message handling: libvirtError: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023)
  Jan 19 10:38:38.150813 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  Jan 19 10:38:38.151125 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
  Jan 19 10:38:38.151388 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
  Jan 19 10:38:38.151668 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
  Jan 19 10:38:38.151939 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
  Jan 19 10:38:38.152205 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
  Jan 19 10:38:38.152491 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
  Jan 19 10:38:38.152705 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/exception_wrapper.py", line 76, in wrapped
  Jan 19 10:38:38.152962 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
  Jan 19 10:38:38.153268 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  Jan 19 10:38:38.153481 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     self.force_reraise()
  Jan 19 10:38:38.153710 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  Jan 19 10:38:38.153942 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
  Jan 19 10:38:38.154238 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/exception_wrapper.py", line 67, in wrapped
  Jan 19 10:38:38.154512 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
  Jan 19 10:38:38.154794 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/compute/manager.py", line 186, in decorated_function
  Jan 19 10:38:38.155086 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
  Jan 19 10:38:38.155354 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  Jan 19 10:38:38.155598 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     self.force_reraise()
  Jan 19 10:38:38.155850 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  Jan 19 10:38:38.156133 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
  Jan 19 10:38:38.156369 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/compute/manager.py", line 156, in decorated_function
  Jan 19 10:38:38.156603 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
  Jan 19 10:38:38.156856 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/compute/utils.py", line 944, in decorated_function
  Jan 19 10:38:38.157092 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
  Jan 19 10:38:38.157335 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/compute/manager.py", line 214, in decorated_function
  Jan 19 10:38:38.157592 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     kwargs['instance'], e, sys.exc_info())
  Jan 19 10:38:38.157845 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  Jan 19 10:38:38.158114 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     self.force_reraise()
  Jan 19 10:38:38.158316 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  Jan 19 10:38:38.158512 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
  Jan 19 10:38:38.158710 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/compute/manager.py", line 202, in decorated_function
  Jan 19 10:38:38.158908 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
  Jan 19 10:38:38.159229 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/compute/manager.py", line 3213, in reboot_instance
  Jan 19 10:38:38.159465 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     self._set_instance_obj_error_state(context, instance)
  Jan 19 10:38:38.159721 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  Jan 19 10:38:38.161021 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     self.force_reraise()
  Jan 19 10:38:38.161349 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  Jan 19 10:38:38.161588 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
  Jan 19 10:38:38.161807 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/compute/manager.py", line 3188, in reboot_instance
  Jan 19 10:38:38.162017 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     bad_volumes_callback=bad_volumes_callback)
  Jan 19 10:38:38.162329 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2478, in reboot
  Jan 19 10:38:38.162660 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     block_device_info)
  Jan 19 10:38:38.162986 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2583, in _hard_reboot
  Jan 19 10:38:38.163312 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     block_device_info=block_device_info)
  Jan 19 10:38:38.163642 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5298, in _create_domain_and_network
  Jan 19 10:38:38.163904 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     destroy_disks_on_failure)
  Jan 19 10:38:38.164127 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5222, in _cleanup_failed_start
  Jan 19 10:38:38.164343 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     if guest and guest.is_active():
  Jan 19 10:38:38.164583 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 553, in is_active
  Jan 19 10:38:38.164891 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     return self._domain.isActive()
  Jan 19 10:38:38.165124 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
  Jan 19 10:38:38.165331 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     result = proxy_call(self._autowrap, f, *args, **kwargs)
  Jan 19 10:38:38.165548 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
  Jan 19 10:38:38.165759 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     rv = execute(f, *args, **kwargs)
  Jan 19 10:38:38.166402 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
  Jan 19 10:38:38.166610 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     six.reraise(c, e, tb)
  Jan 19 10:38:38.166812 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
  Jan 19 10:38:38.167027 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     rv = meth(*args, **kwargs)
  Jan 19 10:38:38.167254 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1368, in isActive
  Jan 19 10:38:38.167456 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server     if ret == -1: raise libvirtError ('virDomainIsActive() failed', dom=self)
  Jan 19 10:38:38.167663 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server libvirtError: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023)

  Although nova claims it never received the plugged event, the q-svc
  log shows that it was in fact sent:

  Jan 19 10:33:35.836759 ubuntu-xenial-rax-iad-0002010825 neutron-server[20644]: DEBUG neutron.notifiers.nova [-] Sending events: [{'status': 'completed', 'tag': u'567525a4-4b16-4004-91d5-545086d01227', 'name': 'network-vif-plugged', 'server_uuid': u'dd2ec3b0-f909-4104-ae2a-9b878d936ed4'}] {{(pid=20733) send_events /opt/stack/new/neutron/neutron/notifiers/nova.py:241}}
  Jan 19 10:33:35.838577 ubuntu-xenial-rax-iad-0002010825 neutron-server[20644]: DEBUG novaclient.v2.client [-] REQ: curl -g -i --cacert "/opt/stack/data/ca-bundle.pem" -X POST https://104.239.173.102/compute/v2.1/os-server-external-events -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-OpenStack-Nova-API-Version: 2.1" -H "X-Auth-Token: {SHA1}ba0cdcef7f7e42ba834fc341acb0aca1b904075b" -d '{"events": [{"status": "completed", "tag": "567525a4-4b16-4004-91d5-545086d01227", "name": "network-vif-plugged", "server_uuid": "dd2ec3b0-f909-4104-ae2a-9b878d936ed4"}]}' {{(pid=20733) _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:372}}

  The nova-compute log contains the following when the event was
  received:

  Jan 19 10:33:35.986648 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: DEBUG nova.compute.manager [None req-4bcf94cf-2048-4e99-94e6-646519900469 service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] No waiting events found dispatching network-vif-plugged-567525a4-4b16-4004-91d5-545086d01227 {{(pid=28907) pop_instance_event /opt/stack/new/nova/nova/compute/manager.py:363}}
  Jan 19 10:33:35.987219 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: WARNING nova.compute.manager [None req-4bcf94cf-2048-4e99-94e6-646519900469 service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Received unexpected event network-vif-plugged-567525a4-4b16-4004-91d5-545086d01227 for instance with vm_state active and task_state rebooting_hard.

  It looks like nova ignored the event because of the instance's task
  state, and never received a new one?
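
  To make the suspected race concrete, here is a minimal self-contained
  sketch (assumed names, not nova's actual pop_instance_event
  implementation) of a dispatcher that drops events arriving while no
  waiter is registered, so a waiter registered afterwards can only time
  out:

    import threading

    waiters = {}  # event name -> threading.Event
    lock = threading.Lock()

    def pop_instance_event(name):
        # Dispatch an incoming external event to its waiter, if any.
        with lock:
            event = waiters.pop(name, None)
        if event is None:
            # Mirrors "No waiting events found dispatching ..." above.
            print('dropped: %s' % name)
            return
        event.set()

    def wait_for_event(name, timeout):
        with lock:
            event = waiters.setdefault(name, threading.Event())
        return event.wait(timeout)

    name = 'network-vif-plugged-567525a4'
    pop_instance_event(name)          # arrives before anyone is waiting
    print(wait_for_event(name, 0.1))  # False: the later waiter times out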

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1744361/+subscriptions

