Re: Scheduler issues in folsom
On Wed, Oct 31, 2012 at 6:55 AM, Vishvananda Ishaya
<vishvananda@xxxxxxxxx> wrote:
> The retry scheduler is NOT meant to be a workaround for this. It sounds like
> the ram filter is not working properly somehow. Have you changed the setting
> for ram_allocation_ratio? It defaults to 1.5 allowing overallocation, but in
> your case you may want 1.0.
>
> I would be using the following two config options to achieve what you want:
> compute_fill_first_cost_fn_weight=1.0
> ram_allocation_ratio=1.0
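For anyone following along, on Folsom these would normally live in the
[DEFAULT] section of nova.conf on the host running nova-scheduler (a
sketch of placement only, using exactly the flags Vish lists; restart
nova-scheduler after changing them):

    [DEFAULT]
    # pack instances onto the fullest host that still fits, instead of spreading
    compute_fill_first_cost_fn_weight=1.0
    # do not overcommit RAM (the default of 1.5 allows 50% overallocation)
    ram_allocation_ratio=1.0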
I'd suggest the same ratio too. But besides memory overcommitment, I
suspect this issue is also related to how KVM does memory allocation
(it doesn't actually allocate the guest's entire memory at boot). I've
seen a compute node report more free memory than it should (e.g. a 4 GB
node with two 1 GB instances running still reports >3 GB free) because
the libvirt driver calculates free memory simply from /proc/meminfo,
which doesn't reflect how much memory the guests are intended to use.
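As a rough illustration of that gap (not the actual libvirt driver code;
the numbers are hypothetical), compare what /proc/meminfo says with what
the node has actually promised to guests:

    # What the driver effectively reports: MemFree from /proc/meminfo. KVM
    # guests touch their pages lazily, so this stays high even when most of
    # the node's RAM is already committed to instances.
    def meminfo_free_mb():
        with open('/proc/meminfo') as f:
            fields = dict(line.split(':', 1) for line in f)
        return int(fields['MemFree'].split()[0]) // 1024

    # What the scheduler actually needs: capacity minus RAM promised to guests.
    def committed_free_mb(total_mb, guest_ram_mbs):
        return total_mb - sum(guest_ram_mbs)

    # Hypothetical 4 GB node with two idle 1 GB guests:
    #   meminfo_free_mb()                       -> often 3000+ MB "free"
    #   committed_free_mb(4096, [1024, 1024])   -> 2048 MB actually available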
>
> If you are using the settings above, then the scheduler should be using up the
> resources on the node it schedules to until it fills up the available ram and
> then moving on to the next node. If this is not occurring then you have uncovered
> some sort of bug.
>
> Vish
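To make the behavior Vish describes concrete (fill one host until its
RAM runs out, then move to the next), here is a simplified stand-alone
sketch of the filter-then-weigh step; it is not the Folsom scheduler
code itself, and the host dictionaries are hypothetical:

    # Simplified sketch of a RamFilter check plus fill-first weighting.
    def pick_host(hosts, requested_mb, ram_allocation_ratio=1.0,
                  fill_first_weight=1.0):
        # Usable RAM is tracked capacity times the allocation ratio, minus
        # what instances have already claimed -- not what /proc/meminfo says.
        def free_mb(host):
            return host['total_mb'] * ram_allocation_ratio - host['used_mb']

        candidates = [h for h in hosts if free_mb(h) >= requested_mb]
        if not candidates:
            raise RuntimeError('no valid host found')

        # Least-cost weighting: cost = weight * free RAM, and the lowest cost
        # wins, so a positive fill-first weight keeps packing the fullest host
        # that still fits before moving on to the next one.
        return min(candidates, key=lambda h: fill_first_weight * free_mb(h))

Run repeatedly while updating used_mb, this should hand out hosts one at
a time, which is exactly what Jon reports is not happening once multiple
scheduler threads are involved.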
> On Oct 30, 2012, at 9:21 AM, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:
>
>> Hi All,
>>
>> I'm having what I consider serious issues with the scheduler in
>> Folsom. It seems to relate to the introduction of threading in the
>> scheduler.
>>
>> For a number of local reasons we prefer to have instances start on
>> the compute node with the least amount of free RAM that is still
>> enough to satisfy the request, which is the reverse of the default
>> policy of scheduling on the system with the most free RAM. I'm fairly
>> certain the same behavior would be seen with that policy as well, and
>> with any other policy that results in a "best" choice for scheduling
>> the next instance.
>>
>> We have workloads that start hundreds of instances of the same image,
>> and there are plans to scale this to thousands. What I'm seeing is
>> something like this:
>>
>> * user submits API request for 300 instances
>> * scheduler puts them all on one node
>> * retry scheduling kicks in at some point for the 276 that don't fit
>> * those 276 are all scheduled on the next "best" node
>> * retry cycle repeats with the 252 that don't fit there
>>
>> I'm not clear exactly where the RetryScheduler inserts itself (I
>> should probably read it), but the first compute node is very
>> overloaded handling start-up requests, which results in a fair number
>> of instances entering "ERROR" state rather than rescheduling (so not
>> all 276 actually make it to the next round), and the whole process is
>> painfully slow. In the end we are lucky to see 50% of the requested
>> instances actually make it into Active state (and then only because
>> we increased scheduler_max_attempts).
>>
>> Is that really how it's supposed to work? With the introduction of
>> the RetryScheduler as a fix for the scheduling race condition I think
>> it is, but it is a pretty bad solution for me, unless I'm missing
>> something. Am I? Wouldn't be the first time...
>>
>> For now I'm working around this by using the ChanceScheduler
>> (compute_scheduler_driver=nova.scheduler.chance.ChanceScheduler) so
>> the scheduler threads don't pick a "best" node. This is orders of
>> magnitude faster and consistently successful in my tests. It is not
>> ideal for us, as we have a small minority of compute nodes with twice
>> the memory capacity of our standard nodes and would prefer to keep
>> those available for some of our extra-large-memory flavors, and we'd
>> also like to minimize memory fragmentation on the standard-sized
>> nodes for similar reasons.
>>
>> -Jon
>>
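For comparison, the chance driver does little more than pick a host at
random, which is why concurrent scheduler threads stop piling onto a
single "best" node (again a simplified sketch, not the actual driver
code):

    import random

    # No cost function and no RAM weighting: any host is fair game, so
    # parallel requests scatter. The trade-off Jon mentions is that large-
    # memory nodes get filled like any other and fragmentation isn't managed.
    def chance_pick(hosts):
        return random.choice(hosts)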
--
Regards
Huang Zhiteng