yahoo-eng-team team mailing list archive

[Bug 1735407] Re: [Nova] Evacuation doesn't respect anti-affinity rules

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1735407@xxxxxxxxxxxxxxxxxx>
Date: Wed, 07 Feb 2018 03:46:46 -0000
Reply-to: Bug 1735407 <1735407@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Reviewed:  https://review.openstack.org/525242
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=edeeaf9102eccb78f1a2555c7e18c3d706f07639
Submitter: Zuul
Branch:    master

commit edeeaf9102eccb78f1a2555c7e18c3d706f07639
Author: Balazs Gibizer <balazs.gibizer@xxxxxxxxxxxx>
Date:   Mon Dec 4 16:18:30 2017 +0100

    Add late server group policy check to rebuild
    
    The affinity and anti-affinity server group policies are enforced by
    the scheduler, but two parallel scheduling requests can still result
    in a policy violation. During instance boot, a late policy check is
    performed in the compute manager to prevent this. This check was
    missing for rebuild, so two parallel evacuate commands could violate
    the server group policy. This patch adds the late policy check to
    rebuild to prevent that. When a violation is detected during boot,
    the instance is re-scheduled. The rebuild action, however, has no
    re-scheduling implementation, so in this case the rebuild fails and
    the user needs to retry the evacuation. This is still better than
    allowing a parallel evacuation to break the server group affinity
    policy.
    
    To make the late policy check possible in the compute/manager, the
    rebuild_instance compute RPC call was extended with a request_spec
    parameter.
    
    Co-Authored-By: Richard Zsarnoczai <richard.zsarnoczai@xxxxxxxxxxxx>
    
    Change-Id: I752617066bb2167b49239ab9d17b0c89754a3e12
    Closes-Bug: #1735407
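
The late check described in the commit message can be illustrated with a
minimal sketch. This is not the nova implementation: the names
GroupPolicyViolation, late_anti_affinity_check, evacuate and the host_of
mapping are hypothetical, and the sketch is single-threaded, so it does not
model the per-group serialization a real compute service would need.

```python
class GroupPolicyViolation(Exception):
    """Raised when a placement would break the server group policy."""


def late_anti_affinity_check(target_host, instance_id, group_members, host_of):
    """Re-validate anti-affinity on the target host just before building.

    group_members: instance IDs that belong to the server group.
    host_of: mapping of instance ID -> host it occupies (None if unplaced).
    """
    for member in group_members:
        if member == instance_id:
            continue
        if host_of.get(member) == target_host:
            raise GroupPolicyViolation(
                "%s cannot land on %s: group member %s is already there"
                % (instance_id, target_host, member))


def evacuate(instance_id, target_host, group_members, host_of):
    """Hypothetical evacuation step: check late, then record the new host.

    There is no re-scheduling here, mirroring the rebuild path: a violation
    simply fails and the caller has to retry with another host.
    """
    late_anti_affinity_check(target_host, instance_id, group_members, host_of)
    host_of[instance_id] = target_host
```

If two parallel scheduling decisions both picked node-11, the first
evacuate("vm-1", "node-11", ...) call would succeed and the second call for
vm-2 would raise GroupPolicyViolation instead of co-locating the two VMs.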


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1735407

Title:
  [Nova] Evacuation doesn't respect anti-affinity rules

Status in Mirantis OpenStack:
  Won't Fix
Status in Mirantis OpenStack 9.x series:
  Won't Fix
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  --- Environment ---
  MOS: 9.2
  Nova: 13.1.1-7~u14.04+mos20
  3 compute nodes

  --- Steps to reproduce ---
  1. Create a new server group:
  nova server-group-create anti anti-affinity

  2. Launch 2 VMs in this server group:
  nova boot --image TestVM --flavor m1.tiny --nic net-id=889e4e01-9b38-4007-829d-b69d53269874 --hint group=def58398-4a00-4066-a2aa-13f1b6e7e327 vm-1
  nova boot --image TestVM --flavor m1.tiny --nic net-id=889e4e01-9b38-4007-829d-b69d53269874 --hint group=def58398-4a00-4066-a2aa-13f1b6e7e327 vm-2

  3. Stop nova-compute on the nodes where these 2 VMs are running:
  nova show vm-1 | grep "hypervisor"
  OS-EXT-SRV-ATTR:hypervisor_hostname  | node-12.domain.tld
  nova show vm-2 | grep "hypervisor"
  OS-EXT-SRV-ATTR:hypervisor_hostname  | node-13.domain.tld
  [root@node-12 ~]$ service nova-compute stop
  nova-compute stop/waiting
  [root@node-13 ~]$ service nova-compute stop
  nova-compute stop/waiting

  4. Evacuate both VMs almost at once:
  nova evacuate vm-1
  nova evacuate vm-2

  5. Check where these 2 VMs are running:
  nova show vm-1 | grep "hypervisor"
  nova show vm-2 | grep "hypervisor"

  --- Actual behavior ---
  Both VMs have been evacuated to the same node:
  [root@node-11 ~]$ virsh list
   Id    Name                           State
  ----------------------------------------------------
   2     instance-00000001              running
   3     instance-00000002              running

  --- Expected behavior ---
  According to the anti-affinity rule, only one VM should be evacuated.
  The other should fail to evacuate with an appropriate error message.
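
The check in step 5 can be automated with a small helper that reads captured
"nova show" output and reports hosts holding more than one group member. This
is only a sketch: hypervisor_of and shared_hosts are hypothetical names, and
the parser assumes the "OS-EXT-SRV-ATTR:hypervisor_hostname | <host>" row
layout shown in step 3.

```python
def hypervisor_of(nova_show_output):
    """Return the hypervisor hostname found in `nova show` text, or None."""
    for line in nova_show_output.splitlines():
        if "hypervisor_hostname" in line:
            # Split on the table separator and drop empty border cells.
            cells = [c.strip() for c in line.split("|") if c.strip()]
            if len(cells) >= 2:
                return cells[1]
    return None


def shared_hosts(outputs_by_vm):
    """Map each host that holds more than one VM to the VMs placed on it."""
    by_host = {}
    for vm, text in outputs_by_vm.items():
        host = hypervisor_of(text)
        if host is not None:
            by_host.setdefault(host, []).append(vm)
    return {h: sorted(vms) for h, vms in by_host.items() if len(vms) > 1}
```

An empty result from shared_hosts means the anti-affinity rule held; a
non-empty result, such as node-11 mapped to both VMs, reproduces the bug.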

To manage notifications about this bug go to:
https://bugs.launchpad.net/mos/+bug/1735407/+subscriptions