
yahoo-eng-team team mailing list archive

[Bug 1488986] Re: nova scheduler for race condition

 

So, it's pretty hard to explain in one small comment how the model
behaves, but please consider that we have a 'sort of' two-phase commit
when booting an instance.

When a request comes in, you're right: the hosts are elected
iteratively, one instance at a time, by decrementing the resource usage
of the elected node in HostState.consume_from_instance(). That means
that when you ask for 10 instances of the same type, the corresponding
HostState(s) are decremented before the next filters call, which should
be a good way of ensuring consistency. Only once all 10 instances have
been placed does the scheduler give the answer back to the *conductor*,
which calls the respective compute managers (i.e. your step #3 has been
incorrect since Juno).
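
As a rough illustration of that iterative election, here is a minimal
sketch. It is not nova's code: the HostState fields, the consume()
method, the RAM-only check and the "most free RAM wins" weighing are
all simplifying assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical, simplified in-memory view of one compute node."""
    name: str
    free_ram_mb: int

    def consume(self, ram_mb):
        # Stand-in for HostState.consume_from_instance(): only the
        # scheduler's in-memory copy is decremented, not the DB.
        self.free_ram_mb -= ram_mb


def select_destinations(host_states, ram_mb, num_instances):
    """Pick one host per instance, consuming resources between picks."""
    selected = []
    for _ in range(num_instances):
        # "Filter" step: keep hosts that still have room (assumed rule).
        candidates = [h for h in host_states if h.free_ram_mb >= ram_mb]
        if not candidates:
            raise RuntimeError("No valid host found")
        # "Weigh" step: prefer the host with the most free RAM (assumed rule).
        best = max(candidates, key=lambda h: h.free_ram_mb)
        # Decrement before the next iteration so the next filter pass
        # sees the instance that was just placed.
        best.consume(ram_mb)
        selected.append(best.name)
    # Only now would the full answer go back to the conductor.
    return selected
```

Because the decrement happens inside the loop, a single request for
several instances cannot overcommit a host against the scheduler's own
view. It says nothing about what another scheduler process sees.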

Now, that HostState model is kept in memory and only refreshed when a
new request comes in. That means that if you have two schedulers
running separately (or two concurrent requests coming in), then yes,
you could have race conditions.

That's not really a problem in general if your cloud is sized large
enough. The request goes to the compute manager, which uses a context
manager called "instance_claim()" to ensure that its *OWN* internal
representation is correct (and that method is thread-safe in that
context). If the scheduler decision was incorrect, the claim raises an
error, which is caught by the compute manager, which in turn calls the
conductor again to ask for a reschedule (excluding the wrong host).
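
The claim-then-reschedule flow can be sketched roughly as follows. This
is only an illustration under stated assumptions: ComputeNode,
ClaimFailed and boot_with_retries() are hypothetical names, the claim
tracks RAM only, and the conductor round trip is collapsed into a plain
loop.

```python
import threading
from contextlib import contextmanager


class ClaimFailed(Exception):
    """Hypothetical error: the host cannot actually fit the instance."""


class ComputeNode:
    """Hypothetical compute-side view, authoritative for its own host."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb
        self._lock = threading.Lock()

    @contextmanager
    def instance_claim(self, ram_mb):
        # Check and decrement under a lock so concurrent claims on the
        # same node cannot both succeed on the same free resources.
        with self._lock:
            if ram_mb > self.free_ram_mb:
                raise ClaimFailed(self.name)
            self.free_ram_mb -= ram_mb
        try:
            yield
        except Exception:
            # The spawn failed: give the resources back.
            with self._lock:
                self.free_ram_mb += ram_mb
            raise


def boot_with_retries(nodes, scheduler_view, ram_mb, max_attempts=3):
    """Try hosts from a possibly stale scheduler view until a claim holds."""
    excluded = set()
    for _ in range(max_attempts):
        candidates = [name for name, free in scheduler_view.items()
                      if name not in excluded and free >= ram_mb]
        if not candidates:
            break
        host = max(candidates, key=scheduler_view.get)
        try:
            with nodes[host].instance_claim(ram_mb):
                pass  # the instance would be spawned here
            return host
        except ClaimFailed:
            # Reschedule, excluding the host the scheduler got wrong.
            excluded.add(host)
    raise RuntimeError("No valid host found")
```

In this sketch a stale scheduler view costs one extra attempt rather
than an overcommitted host, which is the trade-off described above.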

So, see, when we have races, we have retries (that's the 2PC I
mentioned). That's not perfect, in particular when the cloud is a bit
full, and that's why we're working on resolving it through several
possible approaches:

https://review.openstack.org/#/c/192260/7/doc/source/scheduler_evolution.rst,cm

To be honest, I don't see clear actionable items in your bug report.
I'd rather suggest that you join the scheduler meetings, held every
Monday at 1400 UTC, if you wish to help us and contribute.


** Changed in: nova
       Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1488986

Title:
  nova scheduler for race condition

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  a) The nova compute service updates the compute node's info by running update_available_resource every CONF.update_resources_interval (60s by default).
  b) For every scheduler request:
  1. select_destinations is called and gets all HostStates (if the compute node is newer than the local HostState info based on updated_at, the HostStates are updated with the compute info from the DB).
  2. The scheduler checks one by one whether the host resources can meet the instance requirements, updating the HostState resources iteratively; if yes, it sends a build_and_run_instance cast RPC to the corresponding compute node.
  3. The compute service accepts the AMQP message, consumes the instance requirement, and writes the new compute info into the DB.
  4. The compute service tries to spawn the instance; if that fails, it rolls back step 3.

  My question:
  If the user sets CONF.update_resources_interval to 1s, the compute node service updates compute info in the DB every second.
  Consider this case: the user sends multiple nova boot requests. The first boot request reaches step 2 while the compute node service runs the periodic task update_available_resource at the same time. The second boot request then reaches step 1 while the first request has still not reached step 3, so the second boot request gets a set of HostStates without the first instance's consumption, and the scheduler service schedules a host for it without considering the first instance's consumption. The same repeats for the following requests.

  Can this race condition occur?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1488986/+subscriptions

