yahoo-eng-team team mailing list archive

[Bug 2003377] [NEW] network-vif-plugged event timeouts during resize-confirm can result in VMs entering error state with a mix of the old and new flavor

 

Public bug reported:

If a network-vif-plugged event times out during resize confirm, the VM will enter an error state.
If the VM is not using NUMA, a hard reboot should be enough to fix that.
If it has a NUMA topology, the instance_numa_topology and the flavor can disagree on the number of vCPUs requested, depending on when the failure happens.

In this case the VM can try to boot on the dest host with the instance
NUMA topology for the new flavor but the flavor.vcpus from the old
flavor.
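
The mismatch described above can be illustrated with a minimal sketch. It does not use nova's object model: Flavor, NUMACell, numa_vcpu_count and is_consistent are hypothetical stand-ins introduced only for this example, and it assumes the guest NUMA topology can be reduced to a list of cells that each carry a set of guest CPU ids.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Flavor:
    """Hypothetical stand-in for the flavor embedded in the instance."""
    name: str
    vcpus: int


@dataclass
class NUMACell:
    """Hypothetical stand-in for one guest NUMA cell."""
    cpuset: Set[int]


def numa_vcpu_count(cells: List[NUMACell]) -> int:
    """Count the guest CPUs described by the NUMA topology."""
    return sum(len(cell.cpuset) for cell in cells)


def is_consistent(flavor: Flavor, cells: List[NUMACell]) -> bool:
    """True when the flavor and the NUMA topology agree on the vCPU count."""
    return flavor.vcpus == numa_vcpu_count(cells)


# Assumed example values: a resize from a 4 vCPU flavor to an 8 vCPU flavor.
old_flavor = Flavor(name="old", vcpus=4)
new_flavor = Flavor(name="new", vcpus=8)

# NUMA topology already computed for the new flavor on the dest host.
new_topology = [NUMACell({0, 1, 2, 3}), NUMACell({4, 5, 6, 7})]

# The broken state: old flavor.vcpus paired with the new topology.
print(is_consistent(old_flavor, new_topology))
# The state a successful confirm should leave behind.
print(is_consistent(new_flavor, new_topology))
```

A check along these lines is the kind of test a recovery tool could run before deciding whether the embedded flavor needs to be fixed up; how nova would actually do this is not specified in this report.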


Ideally, if we have such a failure, the VM should either revert to verify_resize or you should be able to do resize_confirm again to try and finish the resize.
Alternatively, we could provide a nova-manage command to help fix the embedded flavor and/or the flavor in the request spec, and reconcile those with the instance NUMA topology.

The intent would be to ensure it's possible to recover the VM, either with
a second confirm or by using the nova-manage command and then hard
rebooting the instance.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2003377

Title:
  network-vif-plugged event timeouts during resize-confirm can result
  in VMs entering error state with a mix of the old and new flavor

Status in OpenStack Compute (nova):
  New

Bug description:
  If a network-vif-plugged event times out during resize confirm, the VM will enter an error state.
  If the VM is not using NUMA, a hard reboot should be enough to fix that.
  If it has a NUMA topology, the instance_numa_topology and the flavor can disagree on the number of vCPUs requested, depending on when the failure happens.

  In this case the VM can try to boot on the dest host with the instance
  NUMA topology for the new flavor but the flavor.vcpus from the old
  flavor.

  
  Ideally, if we have such a failure, the VM should either revert to verify_resize or you should be able to do resize_confirm again to try and finish the resize.
  Alternatively, we could provide a nova-manage command to help fix the embedded flavor and/or the flavor in the request spec, and reconcile those with the instance NUMA topology.

  The intent would be to ensure it's possible to recover the VM, either
  with a second confirm or by using the nova-manage command and then
  hard rebooting the instance.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2003377/+subscriptions