yahoo-eng-team team mailing list archive
Message #60637
[Bug 1651650] Re: XenAPI: server rescue test sometime failed with timeout waiting for vif plugging
Reviewed: https://review.openstack.org/413469
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2207dcf560413b213a8fb3737bb4b0923dcd96e0
Submitter: Jenkins
Branch: master
commit 2207dcf560413b213a8fb3737bb4b0923dcd96e0
Author: Huan Xie <huan.xie@xxxxxxxxxx>
Date: Tue Dec 20 23:26:49 2016 -0800
XenAPI: Fix vif plug problem during VM rescue/unrescue
During VM rescue tests, we found that the nova xenserver driver failed
because it timed out waiting for the vif-plug event from neutron. When
checking the nova and neutron logs, we found several mistakes in the
nova driver:
(1) After several rounds of rescuing/unrescuing, it waits for the
vif-plug event, but it shouldn't wait for such an event.
(2) The neutron log shows that the port status sometimes changes
during rescuing/unrescuing, which also shouldn't happen.
(3) The nova code deletes and re-creates the tap device, which is
used by neutron security groups, each time a VM boots. This
delete/re-create action causes a port status change that
shouldn't occur.
(4) Adding or deleting security groups on a VM's port triggers a
port status change, e.g. from ACTIVE to BUILDING. In the rescue
scenario, we also depend on the VIF's status to decide whether
to wait for the vif-plug event, which is not appropriate.
This patch fixes the above problems. Another patch re-enables the
excluded rescue tests in order to test this fix:
https://review.openstack.org/#/c/416197/
Closes-Bug: #1651650
Change-Id: I32c66733330bc9877caea7e2a2290c02b3906708
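The decision discussed in points (1) and (4) of the commit message can be illustrated with a minimal sketch. This is not the code from the patch: the function name `events_to_wait_for`, the `rescue` flag, and the dict layout of a VIF are hypothetical, chosen only to show the idea that a rescue boot should not wait for any vif-plug event, while a normal boot waits only for VIFs that are not yet active.

```python
def events_to_wait_for(vifs, rescue=False):
    """Return the (event_name, vif_id) pairs a driver might wait for.

    Hypothetical helper for illustration only; not nova's actual API.
    Assumes each VIF is a dict with an 'id' and an optional 'active' flag.
    """
    if rescue:
        # During rescue the port is expected to be active already,
        # so no network-vif-plugged event should be awaited.
        return []
    # On a normal boot, wait only for VIFs explicitly reported inactive.
    return [('network-vif-plugged', vif['id'])
            for vif in vifs
            if vif.get('active', True) is False]


vifs = [{'id': 'port-a', 'active': False}, {'id': 'port-b', 'active': True}]
normal_boot = events_to_wait_for(vifs)
rescue_boot = events_to_wait_for(vifs, rescue=True)
```

With this shape, the rescue path never depends on a port status that security group changes may have flipped from ACTIVE to BUILDING.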
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1651650
Title:
XenAPI: server rescue test sometime failed with timeout waiting for
vif plugging
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Observed several failures in the Citrix XenServer CI for this test case:
tempest.api.compute.servers.test_server_rescue
The logs show a timeout waiting for the vif:
$ grep 'Timeout waiting for vif plugging callbac' screen-n-cpu.txt.gz
2016-12-20 10:58:52.036 4528 WARNING nova.virt.xenapi.vmops [req-ff027cef-59be-4326-95e1-065f68077d63 tempest-ServerRescueTestJSON-1293983176 tempest-ServerRescueTestJSON-1293983176] [instance: 28b094ee-c571-4083-b72b-5ea78f1f4291] Timeout waiting for vif plugging callback
For rescue, it seems nova shouldn't wait for this event, as the port should already be active when rescuing starts.
But we observed the following.
The neutron service reported a 2nd vif-plugged event:
2016-12-20 10:52:31.689 712 DEBUG neutron.notifiers.nova [-] Sending events: [{'status': 'completed', 'tag': u'52d79a78-7205-4e69-8005-76a3cebbf267', 'name': 'network-vif-plugged', 'server_uuid': u'28b094ee-c571-4083-b72b-5ea78f1f4291'}] send_events /opt/stack/new/neutron/neutron/notifiers/nova.py:248
2016-12-20 10:53:45.179 712 DEBUG neutron.notifiers.nova [-] Sending events: [{'status': 'completed', 'tag': u'52d79a78-7205-4e69-8005-76a3cebbf267', 'name': 'network-vif-plugged', 'server_uuid': u'28b094ee-c571-4083-b72b-5ea78f1f4291'}] send_events /opt/stack/new/neutron/neutron/notifiers/nova.py:248
And nova starts waiting for this event only after the 2nd event has been sent out, so it never catches the 2nd event:
2016-12-20 10:53:46.326 4528 DEBUG nova.virt.xenapi.vmops [req-ff027cef-59be-4326-95e1-065f68077d63 tempest-ServerRescueTestJSON-1293983176 tempest-ServerRescueTestJSON-1293983176] wait for instance event:[('network-vif-plugged', u'52d79a78-7205-4e69-8005-76a3cebbf267')] _spawn /opt/stack/new/nova/nova/virt/xenapi/vmops.py:599
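The ordering problem in the logs above (event sent at 10:53:45, waiter registered at 10:53:46) can be reproduced with a toy model. This is a sketch under stated assumptions, not nova's event handling: `EventWaiter`, `prepare`, and `deliver` are hypothetical names, and the model assumes an event delivered with no registered waiter is simply dropped.

```python
import threading


class EventWaiter:
    """Toy external-event registry (hypothetical, not nova's API)."""

    def __init__(self):
        self._pending = {}
        self._lock = threading.Lock()

    def prepare(self, key):
        # Register interest in an event and return an object to wait on.
        with self._lock:
            return self._pending.setdefault(key, threading.Event())

    def deliver(self, key):
        # Wake the waiter for this key; drop the event if nobody registered.
        with self._lock:
            event = self._pending.pop(key, None)
        if event is None:
            return False
        event.set()
        return True


def late_registration(waiter, key, timeout=0.05):
    # The event arrives before the waiter exists, so the wait times out.
    waiter.deliver(key)
    return waiter.prepare(key).wait(timeout)


def early_registration(waiter, key, timeout=0.05):
    # The waiter is registered first, so the event is caught.
    event = waiter.prepare(key)
    waiter.deliver(key)
    return event.wait(timeout)


key = ('network-vif-plugged', '52d79a78-7205-4e69-8005-76a3cebbf267')
missed = late_registration(EventWaiter(), key)
caught = early_registration(EventWaiter(), key)
```

In this model `missed` is False (the wait times out) and `caught` is True, which mirrors why a waiter registered after the event was sent ends in "Timeout waiting for vif plugging callback".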
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1651650/+subscriptions