yahoo-eng-team team mailing list archive

[Bug 1821755] Re: live migration break the anti-affinity policy of server group simultaneously

 

*** This bug is a duplicate of bug 1526642 ***
    https://bugs.launchpad.net/bugs/1526642

I believe this is a long-standing known issue. The same race exists for
server build and evacuate (evacuate was fixed later, in Rocky I think).
The compute service has a late affinity check that detects the race in
the scheduler; for server create it triggers a reschedule to another
host, and for evacuate it fails the operation. There is no such late
affinity check for other move operations like live migration, cold
migration (resize) or unshelve.
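
To make the idea of a late affinity check concrete, here is a minimal
sketch in plain Python. It is not nova's code: the function, the
exception and the group_hosts mapping are all hypothetical names chosen
for illustration. The point is only that the destination re-validates
the group policy after the scheduler has already picked a host.

```python
class AntiAffinityViolation(Exception):
    """Raised when the chosen host already runs another group member."""


def late_anti_affinity_check(instance_id, dest_host, group_hosts):
    """Re-validate anti-affinity on the destination after scheduling.

    group_hosts maps each group member's instance ID to the host it is
    currently on (or already being moved to). Hypothetical structure,
    not a nova object.
    """
    conflicts = sorted(
        member for member, host in group_hosts.items()
        if member != instance_id and host == dest_host
    )
    if conflicts:
        raise AntiAffinityViolation(
            "%s cannot land on %s: already used by %s"
            % (instance_id, dest_host, ", ".join(conflicts)))


# vm2 reached node3 first, so vm1's late check on node3 must fail.
group_hosts = {"vm1": "node1", "vm2": "node3"}
try:
    late_anti_affinity_check("vm1", "node3", group_hosts)
    result = "ok"
except AntiAffinityViolation:
    result = "rejected"
```

A caller would then either reschedule (as server create does) or fail
the operation (as evacuate does) when the check raises.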

I believe StarlingX's nova fork has some server group checks in the live
migration task though, so maybe those fixes could be 'upstreamed' to
nova:

https://github.com/starlingx-staging/stx-nova/blob/3155137b8a0f00cfdc534e428037e1a06e98b871/nova/conductor/tasks/live_migrate.py#L88

Looking at that StarlingX code, they basically check whether the server
being live migrated is in an anti-affinity group and, if so, use an
external lock to restrict scheduling to one live migration at a time.
That might be OK in a small edge deployment with 1-2 compute nodes but
would be pretty severe in a large public cloud with lots of concurrent
live migrations. Granted, the lock covers only the scheduling portion of
the live migration task, not the actual live migration of the guest
itself once a target host is selected. I'm also not sure that external
lock would be sufficient if you have multiple nova-conductors running on
different hosts, unless you were using a distributed lock manager like
etcd, which nova upstream does not use (I'm not sure whether
oslo.concurrency can be configured for etcd under the covers).
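
As a rough illustration of the "serialize scheduling per group" approach,
the sketch below uses a per-group threading.Lock from the standard
library. This is an assumption-laden stand-in, not the StarlingX code:
pick_destination, the placements dict and the lock registry are
hypothetical, and an in-process lock only serializes threads in one
process. As far as I know, oslo.concurrency external locks are
file-based, so they would likewise only cover conductors on the same
host.

```python
import threading
from collections import defaultdict

# One in-process lock per server group (hypothetical registry). This is
# a stand-in for an external or distributed lock.
_group_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def _lock_for(group_id):
    # Guard the registry so two threads never create two locks for the
    # same group.
    with _registry_guard:
        return _group_locks[group_id]


def pick_destination(instance_id, group_id, placements, all_hosts):
    """Choose a host with no other group member and claim it atomically.

    placements maps instance ID -> host and is updated under the lock,
    so the next caller for the same group sees this claim.
    """
    with _lock_for(group_id):
        used = {h for i, h in placements.items() if i != instance_id}
        # Exclude the current host too: a live migration must move.
        candidates = [h for h in all_hosts
                      if h not in used and h != placements[instance_id]]
        if not candidates:
            return None  # no valid host; the caller fails the migration
        placements[instance_id] = candidates[0]
        return candidates[0]


placements = {"vm1": "node1", "vm2": "node2"}
hosts = ["node1", "node2", "node3"]
threads = [
    threading.Thread(target=pick_destination,
                     args=(vm, "group-a", placements, hosts))
    for vm in ("vm1", "vm2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread wins the lock takes node3, and the other one sees that
claim and picks a different host, so the two VMs never end up together.
The cost is exactly the one described above: every live migration in the
group waits its turn for scheduling.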

Long-term this should all be resolved with placement when we can model
affinity and anti-affinity in the placement service.

** Tags added: starlingx

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** This bug has been marked a duplicate of bug 1526642
   Simultaneous live migrations break anti-affinity policy

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821755

Title:
  live migration break the anti-affinity policy of server group
  simultaneously

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Description
  ===========
  If we live migrate two instances in the same server group simultaneously, the instances can end up violating the group's anti-affinity policy.

  Steps to reproduce
  ==================
  An OpenStack environment with three compute nodes (node1, node2 and node3). Create two VMs (vm1, vm2) in a server group with the anti-affinity policy.
  Finally, live migrate both VMs simultaneously.

  Before live migration, the VMs are placed as follows:
  node1  ->  vm1
  node2  ->  vm2
  node3

  * nova live-migration vm1
  * nova live-migration vm2

  Expected result
  ===============
  Live migration of vm1 and vm2 should fail.

  Actual result
  =============
  node1
  node2
  node3  ->  vm1,vm2

  Environment
  ===========
  master branch of openstack

  As described above, live migration does not take other in-progress
  live migrations into account and simply selects a host through the
  scheduler filters. As a result, both VMs are migrated to the same host.
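
The race in the scenario above can be reproduced with a small,
deterministic simulation. This is only a sketch: anti_affinity_filter is
a hypothetical stand-in for the scheduler's anti-affinity filtering, and
the shared snapshot models both requests being scheduled before either
migration has completed.

```python
def anti_affinity_filter(instance_id, snapshot, all_hosts):
    """Return hosts that, per the snapshot, hold no other group member."""
    used = {host for member, host in snapshot.items()
            if member != instance_id}
    # The current host is excluded because the instance has to move.
    return [h for h in all_hosts
            if h not in used and h != snapshot[instance_id]]


hosts = ["node1", "node2", "node3"]
placements = {"vm1": "node1", "vm2": "node2"}

# Both requests are scheduled before either migration has finished, so
# each one filters against the same pre-migration view of the group.
snapshot = dict(placements)
dest_vm1 = anti_affinity_filter("vm1", snapshot, hosts)[0]
dest_vm2 = anti_affinity_filter("vm2", snapshot, hosts)[0]

placements["vm1"] = dest_vm1
placements["vm2"] = dest_vm2
# Both VMs are now on node3, matching the "Actual result" section.
```

Each request is valid on its own view of the group; the violation only
appears once both placements are applied.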

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1821755/+subscriptions

