
yahoo-eng-team team mailing list archive

[Bug 1329546] Re: Upon rebuild instances might never get to Active state

 

Contrary to what is claimed in the bug description, the actual root
cause is a different one, and it lies in neutron.

For events like rebuilding or rebooting an instance, a VIF disappears and reappears rather quickly.
In this case the OVS agent loop starts processing the VIF, and then skips processing when it realizes the VIF is no longer on the integration bridge.

However, the agent keeps the VIF in the set of 'current' VIFs. This means
that when the VIF is plugged again it is not seen as new and is never
processed, hence the problem.
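
The sequence described here can be illustrated with a minimal, self-contained sketch. It is not the actual OVS agent code: the class `ToyAgent`, its methods, the `forget_skipped` flag and the port name `vif-1` are all hypothetical, and the logic only models the general pattern of diffing the ports seen on the bridge against a set of already-known ports.

```python
class ToyAgent:
    """Toy model (hypothetical) of an agent loop that diffs bridge ports
    against the set of VIFs it already knows about."""

    def __init__(self, forget_skipped):
        self.current = set()    # VIFs the agent believes it has handled
        self.processed = []     # VIFs that were actually wired
        self.forget_skipped = forget_skipped

    def scan(self, on_bridge):
        """First half of an iteration: detect VIFs new on the bridge."""
        added = set(on_bridge) - self.current
        self.current = set(on_bridge)
        return added

    def process(self, added, on_bridge):
        """Second half: wire each added VIF, skipping any that vanished."""
        for vif in sorted(added):
            if vif not in on_bridge:
                # The VIF was unplugged between scan and processing.
                if self.forget_skipped:
                    # Assumed remedy: drop it so a re-plug looks new again.
                    self.current.discard(vif)
                continue
            self.processed.append(vif)


def rebuild_scenario(forget_skipped):
    """Replay a quick unplug/re-plug and return the VIFs that got wired."""
    agent = ToyAgent(forget_skipped)
    added = agent.scan({"vif-1"})    # VIF appears on the bridge
    agent.process(added, set())      # ...but is gone before it is processed
    added = agent.scan({"vif-1"})    # VIF is plugged again
    agent.process(added, {"vif-1"})
    return agent.processed
```

In this toy model, `rebuild_scenario(False)` never wires the VIF because the skipped port stays in `current`, while `rebuild_scenario(True)` wires it on the second pass. Whether the real patch takes this exact approach is not stated here.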

Removing nova from affected projects. A patch will follow soon.



** No longer affects: nova

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1329546

Title:
  Upon rebuild instances might never get to Active state

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  VMware mine sweeper for Neutron (*) recently showed a 100% failure
  rate on tempest.api.compute.v3.servers.test_server_actions

  Logs for two instances of these failures are available at [1] and [2].
  The failure manifested as an instance unable to go active after a rebuild.
  A bit of instrumentation and log analysis revealed no obvious error on the neutron side, and also showed that the instance was actually in "running" state even though its task state was "rebuilding/spawning".

  N-API logs [3] revealed that the instance spawn was timing out on a
  missed notification from neutron regarding VIF plug. However, the same
  log showed that such a notification was received [4].

  It turns out that, after rebuild, the instance network cache still had
  'active': False for the instance's VIF, even though the status for the
  corresponding port was 'ACTIVE'. This happened because after the
  network-vif-plugged event was received, nothing triggered a refresh of
  the instance network info. For this reason, the VM, after a rebuild,
  kept waiting for an event which obviously was never sent from neutron.

  While this manifested only on mine sweeper, it appears to be a nova bug; it shows up in VMware minesweeper only because of the way the plugin synchronizes with the backend to report the operational status of a port.
  A simple solution for this problem would be to reload the instance network info cache when network-vif-plugged events are received by nova. (But as the reporter knows nothing about nova, this might be a very bad idea as well.)

  [1] http://208.91.1.172/logs/neutron/98278/2/413209/testr_results.html
  [2] http://208.91.1.172/logs/neutron/73234/34/413213/testr_results.html
  [3] http://208.91.1.172/logs/neutron/73234/34/413213/logs/screen-n-cpu.txt.gz?level=WARNING#_2014-06-06_01_46_36_219
  [4] http://208.91.1.172/logs/neutron/73234/34/413213/logs/screen-n-cpu.txt.gz?level=DEBUG#_2014-06-06_01_41_31_767

  (*) runs libvirt/KVM + NSX

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1329546/+subscriptions

