[Bug 1744361] Re: test_server_security_groups failed to reboot with "Domain not found: no domain with matching uuid" because of missing vif event from linuxbridge agent
Reviewed: https://review.openstack.org/540168
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=236bb54493385bcfe2034e8b0fd7a387fa66635a
Submitter: Zuul
Branch: master
commit 236bb54493385bcfe2034e8b0fd7a387fa66635a
Author: melanie witt <melwittt@xxxxxxxxx>
Date: Thu Feb 1 22:27:57 2018 +0000
Don't wait for vif plug events during _hard_reboot
Originally, in change Id188d48609f3d22d14e16c7f6114291d547a8986 we
added a re-initialization of volumes, encryptors, and vifs to hard
reboot. When creating the libvirt domain and network, we were waiting
for vif plug events from neutron when we replugged the vifs. Then, we
started seeing timeouts in the linuxbridge gate job because compute
was timing out waiting for plug events from neutron during a hard
reboot.
It turns out that the behavior of neutron plug events depends on what
vif type we're using and we're also using a stale network info_cache
throughout the hard reboot code path, so we can't be 100% sure we know
which vifs to wait for plug events from anyway. We coincidentally get
some info_cache refreshes from network-changed events from neutron,
but we shouldn't rely on that.
Ideally, we could do something like wait for an unplug event after we
unplug the vif, then refresh the network_info cache, then wait for the
plug event. BUT, in the case of the os-vif linuxbridge unplug method,
it is a no-op, so I don't think we could expect to get an unplug
event for it (and we don't see any network-vif-unplugged events sent
in the q-svc log for the linuxbridge job during a hard reboot).
Closes-Bug: #1744361
Change-Id: Ib0cf5d55750f13d0499a570f14024dca551ed4d4
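In rough terms, the fix means hard reboot re-creates the domain without registering waiters for network-vif-plugged events at all. A minimal Python sketch of that behaviour change, with illustrative names rather than nova's actual internals:

# Hypothetical sketch: before the fix, hard reboot blocked on neutron plug
# events; after the fix, it re-plugs vifs and boots the domain without waiting.
class FakeLibvirtDriver:
    def __init__(self, virt, neutron_events):
        self.virt = virt                      # stand-in for the libvirt connection
        self.neutron_events = neutron_events  # stand-in for the external-event queue

    def _create_domain_and_network(self, instance, network_info,
                                   vifs_already_plugged=False, timeout=300):
        if not vifs_already_plugged:
            # Old behaviour: block until neutron reports every vif as plugged,
            # or give up after `timeout` seconds (this is where the gate hung).
            self.neutron_events.wait_for_plug(network_info, timeout=timeout)
        self.virt.define_and_start(instance, network_info)

    def _hard_reboot(self, instance, network_info):
        self.virt.destroy(instance)
        # Fix: skip the event wait, because the network info_cache may be stale
        # and some vif types never send plug events on a replug anyway.
        self._create_domain_and_network(instance, network_info,
                                        vifs_already_plugged=True)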
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1744361
Title:
test_server_security_groups failed to reboot with "Domain not found:
no domain with matching uuid" because of missing vif event from
linuxbridge agent
Status in neutron:
Confirmed
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
Confirmed
Bug description:
This happened in master (Queens) in the tempest-linuxbridge job:
http://logs.openstack.org/56/534456/1/check/neutron-tempest-linuxbridge/7693ca5/logs/testr_results.html.gz
From the tempest log:
Traceback (most recent call last):
File "tempest/api/compute/security_groups/test_security_groups.py", line 105, in test_server_security_groups
'ACTIVE')
File "tempest/common/waiters.py", line 96, in wait_for_server_status
raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (SecurityGroupsTestJSON:test_server_security_groups) Server dd2ec3b0-f909-4104-ae2a-9b878d936ed4 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: HARD_REBOOT. Current task state: reboot_started_hard.
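For reference, the waiter that raised this timeout is essentially a poll loop on the server status. A simplified, hypothetical equivalent (not the actual tempest code; client.show_server is assumed to return the usual compute API body with 'status' and 'OS-EXT-STS:task_state' keys):

import time

class TimeoutException(Exception):
    pass

def wait_for_server_status(client, server_id, status, timeout=196, interval=2):
    # Poll until the server reaches `status` with no task_state, else time out.
    start = time.time()
    while True:
        server = client.show_server(server_id)['server']
        if server['status'] == status and server.get('OS-EXT-STS:task_state') is None:
            return server
        if time.time() - start > timeout:
            raise TimeoutException(
                'Server %s failed to reach %s within %ss '
                '(current status: %s, task state: %s)'
                % (server_id, status, timeout, server['status'],
                   server.get('OS-EXT-STS:task_state')))
        time.sleep(interval)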
In the nova-compute log, we see this:
Jan 19 10:38:37.869397 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: WARNING nova.virt.libvirt.driver [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Timeout waiting for vif plugging callback for instance with vm_state active and task_state reboot_started_hard.: Timeout: 300 seconds
Jan 19 10:38:37.880789 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: DEBUG nova.compute.manager [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Checking state {{(pid=28907) _get_power_state /opt/stack/new/nova/nova/compute/manager.py:1183}}
Jan 19 10:38:37.885265 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR nova.compute.manager [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Cannot reboot instance: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023): libvirtError: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023)
Jan 19 10:38:37.981484 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: DEBUG nova.compute.manager [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Instance has been destroyed from under us while trying to set it to ERROR {{(pid=28907) _set_instance_obj_error_state /opt/stack/new/nova/nova/compute/manager.py:586}}
Jan 19 10:38:38.150498 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server [None req-838e0ebe-afe5-4cca-abcc-4ba116889cba service nova] Exception during message handling: libvirtError: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023)
Jan 19 10:38:38.150813 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jan 19 10:38:38.151125 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
Jan 19 10:38:38.151388 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Jan 19 10:38:38.151668 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
Jan 19 10:38:38.151939 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Jan 19 10:38:38.152205 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
Jan 19 10:38:38.152491 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Jan 19 10:38:38.152705 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/exception_wrapper.py", line 76, in wrapped
Jan 19 10:38:38.152962 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
Jan 19 10:38:38.153268 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 19 10:38:38.153481 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server self.force_reraise()
Jan 19 10:38:38.153710 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 19 10:38:38.153942 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Jan 19 10:38:38.154238 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/exception_wrapper.py", line 67, in wrapped
Jan 19 10:38:38.154512 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
Jan 19 10:38:38.154794 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 186, in decorated_function
Jan 19 10:38:38.155086 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance)
Jan 19 10:38:38.155354 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 19 10:38:38.155598 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server self.force_reraise()
Jan 19 10:38:38.155850 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 19 10:38:38.156133 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Jan 19 10:38:38.156369 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 156, in decorated_function
Jan 19 10:38:38.156603 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Jan 19 10:38:38.156856 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/utils.py", line 944, in decorated_function
Jan 19 10:38:38.157092 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Jan 19 10:38:38.157335 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 214, in decorated_function
Jan 19 10:38:38.157592 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
Jan 19 10:38:38.157845 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 19 10:38:38.158114 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server self.force_reraise()
Jan 19 10:38:38.158316 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 19 10:38:38.158512 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Jan 19 10:38:38.158710 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 202, in decorated_function
Jan 19 10:38:38.158908 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
Jan 19 10:38:38.159229 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 3213, in reboot_instance
Jan 19 10:38:38.159465 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server self._set_instance_obj_error_state(context, instance)
Jan 19 10:38:38.159721 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Jan 19 10:38:38.161021 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server self.force_reraise()
Jan 19 10:38:38.161349 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Jan 19 10:38:38.161588 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Jan 19 10:38:38.161807 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 3188, in reboot_instance
Jan 19 10:38:38.162017 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server bad_volumes_callback=bad_volumes_callback)
Jan 19 10:38:38.162329 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2478, in reboot
Jan 19 10:38:38.162660 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server block_device_info)
Jan 19 10:38:38.162986 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2583, in _hard_reboot
Jan 19 10:38:38.163312 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server block_device_info=block_device_info)
Jan 19 10:38:38.163642 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5298, in _create_domain_and_network
Jan 19 10:38:38.163904 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server destroy_disks_on_failure)
Jan 19 10:38:38.164127 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5222, in _cleanup_failed_start
Jan 19 10:38:38.164343 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server if guest and guest.is_active():
Jan 19 10:38:38.164583 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 553, in is_active
Jan 19 10:38:38.164891 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server return self._domain.isActive()
Jan 19 10:38:38.165124 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
Jan 19 10:38:38.165331 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server result = proxy_call(self._autowrap, f, *args, **kwargs)
Jan 19 10:38:38.165548 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
Jan 19 10:38:38.165759 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server rv = execute(f, *args, **kwargs)
Jan 19 10:38:38.166402 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
Jan 19 10:38:38.166610 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server six.reraise(c, e, tb)
Jan 19 10:38:38.166812 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
Jan 19 10:38:38.167027 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server rv = meth(*args, **kwargs)
Jan 19 10:38:38.167254 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1368, in isActive
Jan 19 10:38:38.167456 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server if ret == -1: raise libvirtError ('virDomainIsActive() failed', dom=self)
Jan 19 10:38:38.167663 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: ERROR oslo_messaging.rpc.server libvirtError: Domain not found: no domain with matching uuid 'dd2ec3b0-f909-4104-ae2a-9b878d936ed4' (instance-00000023)
Although nova claims it didn't receive the plugged event, it was in fact sent, as seen
in the q-svc log:
Jan 19 10:33:35.836759 ubuntu-xenial-rax-iad-0002010825 neutron-server[20644]: DEBUG neutron.notifiers.nova [-] Sending events: [{'status': 'completed', 'tag': u'567525a4-4b16-4004-91d5-545086d01227', 'name': 'network-vif-plugged', 'server_uuid': u'dd2ec3b0-f909-4104-ae2a-9b878d936ed4'}] {{(pid=20733) send_events /opt/stack/new/neutron/neutron/notifiers/nova.py:241}}
Jan 19 10:33:35.838577 ubuntu-xenial-rax-iad-0002010825 neutron-server[20644]: DEBUG novaclient.v2.client [-] REQ: curl -g -i --cacert "/opt/stack/data/ca-bundle.pem" -X POST https://104.239.173.102/compute/v2.1/os-server-external-events -H "User-Agent: python-novaclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-OpenStack-Nova-API-Version: 2.1" -H "X-Auth-Token: {SHA1}ba0cdcef7f7e42ba834fc341acb0aca1b904075b" -d '{"events": [{"status": "completed", "tag": "567525a4-4b16-4004-91d5-545086d01227", "name": "network-vif-plugged", "server_uuid": "dd2ec3b0-f909-4104-ae2a-9b878d936ed4"}]}' {{(pid=20733) _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:372}}
The nova-compute log contains this message when the event was received:
Jan 19 10:33:35.986648 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: DEBUG nova.compute.manager [None req-4bcf94cf-2048-4e99-94e6-646519900469 service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] No waiting events found dispatching network-vif-plugged-567525a4-4b16-4004-91d5-545086d01227 {{(pid=28907) pop_instance_event /opt/stack/new/nova/nova/compute/manager.py:363}}
Jan 19 10:33:35.987219 ubuntu-xenial-rax-iad-0002010825 nova-compute[28907]: WARNING nova.compute.manager [None req-4bcf94cf-2048-4e99-94e6-646519900469 service nova] [instance: dd2ec3b0-f909-4104-ae2a-9b878d936ed4] Received unexpected event network-vif-plugged-567525a4-4b16-4004-91d5-545086d01227 for instance with vm_state active and task_state rebooting_hard.
It looks like nova ignored the event because of the instance state at the time, and
never got a new one?
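That would match how the external-event dispatch works in broad strokes: an event is only delivered to a waiter that was registered before the event arrived; otherwise it is logged as unexpected and dropped. A rough, hypothetical sketch of that behaviour (not the real nova code):

import threading

class InstanceEvents:
    # Simplified stand-in for per-instance external-event tracking.
    def __init__(self):
        self._waiters = {}   # (instance_uuid, event_key) -> threading.Event
        self._lock = threading.Lock()

    def prepare_for_event(self, instance_uuid, event_key):
        # Called before an operation that expects an event (e.g. plugging a vif).
        waiter = threading.Event()
        with self._lock:
            self._waiters[(instance_uuid, event_key)] = waiter
        return waiter

    def pop_event(self, instance_uuid, event_key):
        # Called when neutron delivers the event via os-server-external-events.
        with self._lock:
            waiter = self._waiters.pop((instance_uuid, event_key), None)
        if waiter is None:
            # Nothing was waiting: the "Received unexpected event" case above.
            # The event is dropped, so a waiter registered later will time out.
            return False
        waiter.set()
        return True

Under that reading, the plug event that arrived at 10:33:35 found no registered waiter and was dropped, which would be consistent with the 300-second wait started during the hard reboot timing out at 10:38:37.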
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1744361/+subscriptions