yahoo-eng-team team mailing list archive
Message #21523
[Bug 1329546] Re: Upon rebuild instances might never get to Active state
** Changed in: neutron/icehouse
Importance: Undecided => High
** Changed in: neutron/icehouse
Assignee: (unassigned) => Ihar Hrachyshka (ihar-hrachyshka)
** Also affects: neutron/havana
Importance: Undecided
Status: New
** Changed in: neutron/havana
Importance: Undecided => High
** Changed in: neutron/havana
Assignee: (unassigned) => Ihar Hrachyshka (ihar-hrachyshka)
** Changed in: neutron/havana
Status: New => In Progress
** Changed in: neutron/havana
Milestone: None => 2013.2.4
** Tags removed: icehouse-backport-potential in-stable-icehouse
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1329546
Title:
Upon rebuild instances might never get to Active state
Status in OpenStack Neutron (virtual network service):
Fix Released
Status in neutron havana series:
In Progress
Status in neutron icehouse series:
Fix Released
Bug description:
VMware mine sweeper for Neutron (*) recently showed a 100% failure
rate on tempest.api.compute.v3.servers.test_server_actions.
Logs for two instances of these failures are available at [1] and [2].
The failure manifested as an instance unable to go active after a rebuild.
A bit of instrumentation and log analysis revealed no obvious error on the neutron side, and also showed that the instance was actually in the "running" state even though its task state was "rebuilding/spawning".
N-API logs [3] revealed that the instance spawn was timing out on a
missed notification from neutron regarding VIF plug; however, the same
log showed that such a notification was received [4].
It turns out that, after rebuild, the instance network cache still had
'active': False for the instance's VIF, even though the status for the
corresponding port was 'ACTIVE'. This happened because after the
network-vif-plugged event was received, nothing triggered a refresh of
the instance network info. For this reason, the VM, after a rebuild,
kept waiting for an event which obviously was never sent from neutron.
Although this manifested only on mine sweeper, it appears to be a nova bug; it shows up in VMware mine sweeper only because of the way the plugin synchronizes with the backend for reporting the operational status of a port.
A simple solution for this problem would be to reload the instance network info cache when network-vif-plugged events are received by nova. (But as the reporter knows nothing about nova, this might be a very bad idea as well.)
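To make the stale-cache behaviour concrete, here is a toy sketch of the idea, not nova code. Every name in it (FakeNeutron, Instance, refresh_network_cache, pending_vif_events, on_vif_plugged) is hypothetical and invented for illustration; the only assumption taken from the description above is that a rebuild waits for a plug event for each VIF the cache marks as 'active': False.

```python
# Toy model only: none of these names are nova or neutron APIs.
from dataclasses import dataclass, field


@dataclass
class FakeNeutron:
    """Stands in for the backend's view of port status."""
    port_status: dict = field(default_factory=dict)  # port_id -> "ACTIVE" / "DOWN"


@dataclass
class Instance:
    """Holds a cached copy of VIF state, like the instance network info cache."""
    vifs: dict = field(default_factory=dict)  # port_id -> {"active": bool}

    def refresh_network_cache(self, neutron):
        # Rebuild the cached 'active' flag from the current port status.
        for port_id in self.vifs:
            status = neutron.port_status.get(port_id)
            self.vifs[port_id]["active"] = (status == "ACTIVE")

    def pending_vif_events(self):
        # A spawn waits for a plug event for every VIF the cache says is inactive.
        return [pid for pid, vif in self.vifs.items() if not vif["active"]]


def on_vif_plugged(instance, neutron, refresh_cache):
    """Handle a network-vif-plugged event, optionally reloading the cache."""
    if refresh_cache:
        instance.refresh_network_cache(neutron)


neutron = FakeNeutron(port_status={"port-1": "ACTIVE"})

# Without a refresh the cache stays stale, so a later rebuild would keep waiting.
stale = Instance(vifs={"port-1": {"active": False}})
on_vif_plugged(stale, neutron, refresh_cache=False)
print(stale.pending_vif_events())

# With the proposed refresh the cache matches the port status.
fresh = Instance(vifs={"port-1": {"active": False}})
on_vif_plugged(fresh, neutron, refresh_cache=True)
print(fresh.pending_vif_events())
```

In this model the first instance still lists port-1 as pending although the port is ACTIVE, while the second has nothing left to wait for. Whether a refresh at this point is safe in nova itself is exactly the open question raised above.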
[1] http://208.91.1.172/logs/neutron/98278/2/413209/testr_results.html
[2] http://208.91.1.172/logs/neutron/73234/34/413213/testr_results.html
[3] http://208.91.1.172/logs/neutron/73234/34/413213/logs/screen-n-cpu.txt.gz?level=WARNING#_2014-06-06_01_46_36_219
[4] http://208.91.1.172/logs/neutron/73234/34/413213/logs/screen-n-cpu.txt.gz?level=DEBUG#_2014-06-06_01_41_31_767
(*) runs libvirt/KVM + NSX
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1329546/+subscriptions