
yahoo-eng-team team mailing list archive

[Bug 1647584] Re: Instance with anti-affinity server group booted failed in concurrent scenario

 

There are no strong guarantees about scheduling, especially when you
are hitting failure domains. So while this is not ideal, I think this is
more of a wishlist item: making this work when you are in a fully
packed scenario (more guests than computes trying policy-based
placement).

Marking Opinion / Wishlist. It is fine to push fixes for this bug if you
have one.

** Changed in: nova
       Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1647584

Title:
  Instance with anti-affinity server group booted failed in concurrent
  scenario

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Description
  ===========
  Assume the following scenario:
  1. The compute resources are sufficient.
  2. Instances are booted concurrently.
  3. The number of instances is greater than the number of compute nodes.
  4. The instances are booted with the same anti-affinity server group.
  5. There is more than one controller node.

  In the above scenario, more instances fail to boot than expected. When
  booting concurrently, two or more instances can be scheduled to the
  same compute node even though anti-affinity is specified. After
  'instance_claim', the compute node checks the anti-affinity policy
  without a lock, so two or more instances may be checked at the same
  time and all of them fail because they see each other on the host.
  These instances are then rescheduled, and in the next scheduling pass
  the previous compute node is ignored.
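  The race described above can be sketched outside of nova with two
  threads. This is an illustrative simulation only (the host dict,
  barrier, and instance names are stand-ins, not nova code): each boot
  records its resource claim first, and only afterwards runs the
  anti-affinity check without any group-wide lock, so both instances see
  each other on the host and both fail instead of just one.

  ```python
  import threading

  hosts = {"compute1": []}   # host -> instances claimed on it (stand-in state)
  failed = []                # instances that fail the late anti-affinity check
  lock = threading.Lock()    # protects the shared dict only, NOT the whole check
  barrier = threading.Barrier(2)

  def boot(instance, host):
      # 1) the resource claim lands first (analogous to 'instance_claim')
      with lock:
          hosts[host].append(instance)
      # force both claims to be recorded before either check runs,
      # which is the interleaving the bug report describes
      barrier.wait()
      # 2) the anti-affinity check runs afterwards; since there is no
      #    group-wide lock, each instance sees the other already claimed
      #    on the same host, so BOTH fail and are rescheduled
      with lock:
          if any(i != instance for i in hosts[host]):
              failed.append(instance)

  threads = [threading.Thread(target=boot, args=(name, "compute1"))
             for name in ("lt_test01", "lt_test02")]
  for t in threads:
      t.start()
  for t in threads:
      t.join()

  print("rescheduled:", sorted(failed))  # both instances, not just one
  ```

  A serialized (locked) check would let the first instance keep the host
  and fail only the second; the unlocked late check fails both, which
  matches the "more failures than expected" behaviour reported here.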

  Steps to reproduce
  ==================
  1. Assume 3 compute nodes and 2 or more controller nodes.
  2. Create an anti-affinity server group.
  3. Construct a bash script that boots instances with the anti-affinity group concurrently:
     nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test01 &
     nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test02 &
     nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test03 &
     nova boot --flavor 1 --image cirros --nic net-id=b0406792-26a8-4f26-843e-3b2231dbd4da --hint group=eaa8694e-8c83-47f2-8c02-93657c08d2bd lt_test04
  4. Execute the bash script.

  Expected result
  ===============
  3 instances boot successfully.

  Actual result
  =============
  2 instances boot successfully.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1647584/+subscriptions

