yahoo-eng-team team mailing list archive


[Bug 1329546] Re: Upon rebuild instances might never get to Active state

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Alan Pevec <1329546@xxxxxxxxxxxxxxxxxx>
Date: Wed, 17 Sep 2014 15:00:50 -0000
Reply-to: Bug 1329546 <1329546@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Changed in: neutron/icehouse
   Importance: Undecided => High

** Changed in: neutron/icehouse
     Assignee: (unassigned) => Ihar Hrachyshka (ihar-hrachyshka)

** Also affects: neutron/havana
   Importance: Undecided
       Status: New

** Changed in: neutron/havana
   Importance: Undecided => High

** Changed in: neutron/havana
     Assignee: (unassigned) => Ihar Hrachyshka (ihar-hrachyshka)

** Changed in: neutron/havana
       Status: New => In Progress

** Changed in: neutron/havana
    Milestone: None => 2013.2.4

** Tags removed: icehouse-backport-potential in-stable-icehouse

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1329546

Title:
  Upon rebuild instances might never get to Active state

Status in OpenStack Neutron (virtual network service):
  Fix Released
Status in neutron havana series:
  In Progress
Status in neutron icehouse series:
  Fix Released

Bug description:
  VMware minesweeper for Neutron (*) recently showed a 100% failure
  rate on tempest.api.compute.v3.servers.test_server_actions.

  Logs for two of these failures are available at [1] and [2].
  The failure manifested as an instance unable to go active after a rebuild.
  Some instrumentation and log analysis revealed no obvious error on the Neutron side. They also showed that the instance was actually in the "running" state even though its task state was "rebuilding/spawning".

  Nova compute (n-cpu) logs [3] revealed that the instance spawn was
  timing out waiting for a notification from Neutron about the VIF
  plug. However, the same log showed that this notification had been
  received [4].

  It turns out that, after the rebuild, the instance network info cache
  still had 'active': False for the instance's VIF, even though the
  status of the corresponding port was 'ACTIVE'. This happened because,
  after the network-vif-plugged event was received, nothing triggered a
  refresh of the instance network info. As a result, after a rebuild
  the VM kept waiting for an event that Neutron, obviously, never sent.
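
  The failure mode can be illustrated with a minimal Python sketch. It assumes, as the description implies, that the compute side waits only for VIFs whose cached 'active' flag is False, and that Neutron emits network-vif-plugged only when a port's status transitions to ACTIVE. `Port`, `events_to_wait_for` and `spawn_would_complete` are hypothetical names, not Nova or Neutron APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """Hypothetical stand-in for a Neutron port.

    Emits a network-vif-plugged event only when its status transitions to ACTIVE.
    """
    id: str
    status: str = "DOWN"
    emitted: list = field(default_factory=list)

    def set_status(self, new_status: str) -> None:
        # Notify only on a real transition; an already-ACTIVE port stays silent.
        if self.status != "ACTIVE" and new_status == "ACTIVE":
            self.emitted.append(("network-vif-plugged", self.id))
        self.status = new_status


def events_to_wait_for(network_info):
    # Wait only for VIFs that the *cached* network info reports as inactive.
    return {("network-vif-plugged", vif["id"])
            for vif in network_info if vif.get("active", True) is False}


def spawn_would_complete(network_info, received_events):
    # Spawn finishes only once every awaited event has arrived.
    return events_to_wait_for(network_info) <= set(received_events)


# Initial boot: cache says inactive, port goes ACTIVE, event is delivered.
port = Port("vif-1")
cache = [{"id": "vif-1", "active": False}]
port.set_status("ACTIVE")
print(spawn_would_complete(cache, port.emitted))   # True

# Rebuild: nothing refreshed the cache and the port is still ACTIVE,
# so no new event is emitted and the wait can only end in a timeout.
events_before = len(port.emitted)
port.set_status("ACTIVE")
print(spawn_would_complete(cache, port.emitted[events_before:]))  # False
```

  The stale 'active': False entry is what keeps the rebuild waiting: the cache asks for an event that the port's unchanged status will never produce.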

  Although this manifested only on minesweeper, it appears to be a Nova bug. It shows up in VMware minesweeper only because of the way the plugin synchronizes with the backend to report the operational status of a port.
  A simple solution would be to reload the instance network info cache when Nova receives network-vif-plugged events. (But as the reporter knows nothing about Nova, this might be a very bad idea as well.)
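
  The suggested fix can be sketched the same way, under the same assumptions. `refresh_network_info` and `handle_external_event` are hypothetical helpers; a plain dict stands in for the authoritative port status that Nova would actually have to fetch from Neutron. This is an illustration of the idea, not Nova's actual event handling.

```python
def events_to_wait_for(network_info):
    # Same rule as before: wait only for VIFs cached as inactive.
    return {("network-vif-plugged", vif["id"])
            for vif in network_info if vif.get("active", True) is False}


def refresh_network_info(network_info, port_status):
    # Hypothetical helper: rebuild each cached VIF's 'active' flag from the
    # authoritative port status (in practice this would come from Neutron).
    return [dict(vif, active=(port_status[vif["id"]] == "ACTIVE"))
            for vif in network_info]


def handle_external_event(cache, port_status, event):
    # On network-vif-plugged, reload the cache in place instead of leaving it stale.
    name, _vif_id = event
    if name == "network-vif-plugged":
        cache[:] = refresh_network_info(cache, port_status)
    return cache


cache = [{"id": "vif-1", "active": False}]
port_status = {"vif-1": "ACTIVE"}
handle_external_event(cache, port_status, ("network-vif-plugged", "vif-1"))
print(cache)                       # [{'id': 'vif-1', 'active': True}]
print(events_to_wait_for(cache))   # set()
```

  Assigning to `cache[:]` updates the list in place, so any other holder of the same cache sees the refreshed state. Whether reloading on every event is acceptable inside Nova is exactly the open question the reporter raises.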

  [1] http://208.91.1.172/logs/neutron/98278/2/413209/testr_results.html
  [2] http://208.91.1.172/logs/neutron/73234/34/413213/testr_results.html
  [3] http://208.91.1.172/logs/neutron/73234/34/413213/logs/screen-n-cpu.txt.gz?level=WARNING#_2014-06-06_01_46_36_219
  [4] http://208.91.1.172/logs/neutron/73234/34/413213/logs/screen-n-cpu.txt.gz?level=DEBUG#_2014-06-06_01_41_31_767

  (*) runs libvirt/KVM + NSX

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1329546/+subscriptions

References

[Bug 1329546] [NEW] Upon rebuild instances might never get to Active state
From: Salvatore Orlando, 2014-06-12