yahoo-eng-team team mailing list archive

[Bug 1488986] [NEW] nova scheduler for race condition

 

Public bug reported:

a) The nova compute service updates the compute-node info by running update_available_resource every CONF.update_resources_interval (60s by default).
b) For every scheduler request:
1. select_destinations is called and gets all HostStates (if the compute-node record is newer than the local HostState info, based on updated_at, the HostState is updated with the compute info from the DB).
2. The scheduler checks, one host at a time, whether the host's resources can meet the instance requirements, updating the HostState resources iteratively; if they can, it sends a build_and_run_instance cast RPC to the corresponding compute node.
3. The compute service accepts the AMQP message, consumes the instance requirements, and writes the new compute info into the DB.
4. The compute service tries to spawn the instance; if that fails, it rolls back step 3.
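
To make steps 1 and 2 concrete, here is a minimal sketch of the refresh-then-claim logic as I understand it. The classes below are simplified, hypothetical stand-ins, not the actual nova code; only the updated_at comparison and the local resource claim are modelled, and RAM is the only resource.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ComputeNodeRecord:
    """Hypothetical stand-in for the compute-node row stored in the DB."""
    free_ram_mb: int
    updated_at: datetime


@dataclass
class HostState:
    """Hypothetical stand-in for the scheduler's in-memory view of a host."""
    free_ram_mb: int
    updated_at: datetime

    def update_from_compute_node(self, record: ComputeNodeRecord) -> None:
        # Step 1: refresh only when the DB record is newer than the local view.
        if record.updated_at > self.updated_at:
            self.free_ram_mb = record.free_ram_mb
            self.updated_at = record.updated_at

    def consume_from_instance(self, ram_mb: int, now: datetime) -> None:
        # Step 2: claim resources locally so later requests see the reduction.
        self.free_ram_mb -= ram_mb
        self.updated_at = now


t0 = datetime(2015, 8, 26, 12, 0, 0)
host = HostState(free_ram_mb=4096, updated_at=t0)
host.consume_from_instance(1024, now=t0 + timedelta(seconds=5))

# A DB record older than the local claim is ignored.
host.update_from_compute_node(ComputeNodeRecord(4096, t0 + timedelta(seconds=1)))
free_after_stale_record = host.free_ram_mb

# A DB record newer than the local claim replaces the local view.
host.update_from_compute_node(ComputeNodeRecord(4096, t0 + timedelta(seconds=6)))
free_after_newer_record = host.free_ram_mb
```

In this simplified model the local claim survives only as long as no newer compute-node record arrives.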

My question:
If the user sets CONF.update_resources_interval to 1s, the compute node service updates the compute info in the DB every 1s.
Consider this case: the user sends multiple nova boot requests. The first boot request reaches step 2 while the compute node service runs the periodic task update_available_resource. The second boot request then reaches step 1 before the first request has reached step 3, so the second request gets a set of HostStates that does not include the first instance's consumption, and the scheduler service selects a host for it without considering that consumption. The same thing repeats for the following requests.
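
The interleaving I have in mind can be replayed sequentially with a toy model. This is only a sketch under my own assumptions: the schedule and periodic_update functions, the dict-based state, and the timestamps are all hypothetical, and it does not show what the real nova code does.

```python
def schedule(host_state, db_record, ram_mb, now):
    """Hypothetical scheduler pass: refresh from the DB if newer, then claim locally."""
    if db_record["updated_at"] > host_state["updated_at"]:
        host_state.update(db_record)
    if host_state["free_ram_mb"] < ram_mb:
        return False
    host_state["free_ram_mb"] -= ram_mb
    host_state["updated_at"] = now
    return True


def periodic_update(db_record, claimed_ram_mb, total_ram_mb, now):
    """Hypothetical periodic task: rewrite the DB row from the claims
    the compute node already knows about."""
    db_record["free_ram_mb"] = total_ram_mb - claimed_ram_mb
    db_record["updated_at"] = now


total = 2048
db = {"free_ram_mb": total, "updated_at": 0.0}
host = {"free_ram_mb": total, "updated_at": 0.0}

# t=0.1: the first request passes step 2; the cast has not reached the compute node yet.
first = schedule(host, db, 2048, now=0.1)

# t=0.5: the periodic task runs before step 3, so the row still shows no consumption.
periodic_update(db, claimed_ram_mb=0, total_ram_mb=total, now=0.5)

# t=0.6: the second request sees a newer row and discards the local claim.
second = schedule(host, db, 2048, now=0.6)

print(first, second)
```

In this toy model both requests are placed on a host that only has room for one of them.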

Can this race condition occur?

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1488986

Title:
  nova scheduler for race condition

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1488986/+subscriptions

