
openstack team mailing list archive

Re: Scheduler issues in Folsom

 

On Wed, Oct 31, 2012 at 12:21 AM, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:
> Hi All,
>
> I'm having what I consider serious issues with the scheduler in
> Folsom.  It seems to relate to the introduction of threading in the
> scheduler.
How many scheduler instances do you have?
>
> For a number of local reasons we prefer to have instances start on the
> compute node with the least amount of free RAM that is still enough to
> satisfy the request, which is the reverse of the default policy of
> scheduling on the system with the most free RAM.  I'm fairly certain
> the same behavior would be seen with that policy as well, or with any
> other policy that results in a "best" choice for scheduling the next
> instance.
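If I follow, you've flipped the RAM weighting so the scheduler stacks instances rather than spreading them. For anyone else reading, that presumably looks something like the snippet below in nova.conf; I'm going from the Folsom option names as I remember them, so treat this as a sketch and double-check it against your tree:

    # Filter scheduler weighted to fill the fullest node that still fits the
    # request (fill-first) instead of the default spread-first policy.
    # Option names recalled from memory -- verify before relying on them.
    compute_scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
    least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
    compute_fill_first_cost_fn_weight=1.0    # default -1.0 spreads; a positive value stacks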
>
> We have workloads that start hundreds of instances of the same image,
> and there are plans to scale this to thousands.  What I'm seeing is
> something like this:
>
> * user submits an API request for 300 instances
> * scheduler puts them all on one node
> * the retry scheduler kicks in at some point for the 276 that don't fit
> * those 276 are all scheduled on the next "best" node
> * the retry cycle repeats with the 252 that don't fit there
>
> I'm not clear exactly where the RetryScheduler inserts itself (I
> should probably read it), but the first compute node is very overloaded
> handling start-up requests, which results in a fair number of instances
> entering "ERROR" state rather than rescheduling (so not all 276
> actually make it to the next round), and the whole process is painfully
> slow.  In the end we are lucky to see 50% of the requested instances
> actually make it into Active state (and then only because we increased
> scheduler_max_attempts).
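As I understand it, the retry path you're hitting is the RetryFilter plus the scheduler_max_attempts flag: a failed spawn sends the request back to the scheduler, the RetryFilter drops hosts that have already been tried, and scheduler_max_attempts caps how many passes a request gets. Roughly the relevant defaults, again from memory rather than the Folsom source, so please verify:

    # Retry handling in the Folsom filter scheduler (sketch, names from memory).
    scheduler_max_attempts=3
    scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter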
>
> Is that really how it's supposed to work?  With the introduction of
> the RetryScheduler as a fix for the scheduling race condition I think
> it is, but it is a pretty bad solution for me, unless I'm missing
> something, am I?  Wouldn't be the first time...
>
> For now I'm working around this by using the ChanceScheduler
> (compute_scheduler_driver=nova.scheduler.chance.ChanceScheduler) so
> the scheduler threads don't pick a "best" node.  This is orders of
> magnitude faster and consistently successful in my tests.  It is not
> ideal for us, as we have a small minority of compute nodes with twice
> the memory capacity of our standard nodes and would prefer to keep
> those available for some of our extra-large memory flavors, and we'd
> also like to minimize memory fragmentation on the standard-sized nodes
> for similar reasons.
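For anyone following the thread, that workaround is a single line in nova.conf (together with the raised scheduler_max_attempts mentioned above; you didn't say what value you raised it to):

    # Pick a random compute host rather than a single "best" one, so the
    # concurrent scheduler threads don't all pile onto the same node.
    compute_scheduler_driver=nova.scheduler.chance.ChanceScheduler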
>
> -Jon



-- 
Regards
Huang Zhiteng

