openstack team mailing list archive
Message #18061
Re: Scheduler issues in folsom
Hi Jonathan,
If I understand correctly, that bug is about multiple scheduler
instances (processes) doing scheduling at the same time. When a compute
node finds itself unable to fulfil a create_instance request, it
resends the request back to the scheduler (the scheduler_max_attempts
limit is there to avoid endless retries). From your description, I only
see one scheduler. And you are right: even if the memory accounting has
some issue, cpu_allocation_ratio should have stopped the scheduler from
putting more vCPUs than pCPUs on a node. What OpenStack package are you
using?
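
Roughly, the CoreFilter logic amounts to something like this (a
simplified Python sketch of the idea, not the actual Folsom code; the
function and variable names are made up for illustration):

def core_filter_passes(host_pcpus, vcpus_used, vcpus_requested,
                       cpu_allocation_ratio=1.0):
    # A host only passes when the vCPUs already allocated plus the
    # vCPUs requested by the new instance stay within
    # physical cores * cpu_allocation_ratio.
    vcpu_limit = host_pcpus * cpu_allocation_ratio
    return vcpus_used + vcpus_requested <= vcpu_limit

# With your settings: 24 cores * 1.0 = 24 vCPUs, so no more than 24
# one-vCPU (m1.tiny) instances should ever land on a single node.
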
On Wed, Oct 31, 2012 at 11:41 PM, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:
> Hi All
>
> While the RetryScheduler may not have been designed specifically to
> fix this issue https://bugs.launchpad.net/nova/+bug/1011852 suggests
> that it is meant to fix it, well if "it" is a scheduler race condition
> which is my suspicion.
>
> This is my current scheduler config which gives the failure mode I describe:
>
> scheduler_available_filters=nova.scheduler.filters.standard_filters
> scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter,RetryFilter
> scheduler_max_attempts=30
> least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
> compute_fill_first_cost_fn_weight=1.0
> cpu_allocation_ratio=1.0
> ram_allocation_ratio=1.0
>
> I'm running the scheduler and api server on a single controller host
> and it's pretty consistent about scheduling more than a hundred
> instances per node at first, then iteratively rescheduling them
> elsewhere, when presented with either a single API request to start
> many instances
> (using euca2ools) or a shell loop around nova boot to generate one api
> request per server.
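>
> For illustration, that loop is essentially something like this (the
> image id and instance names are just placeholders):
>
>   # placeholder image id and instance names
>   for i in $(seq 1 200); do
>       nova boot --flavor m1.tiny --image <image-id> test-$i
>   done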
>
> The cpu_allocation_ratio should limit the scheduler to 24 instances
> per compute node (24 physical cores x a ratio of 1.0 = 24 vCPUs, with
> each m1.tiny using 1 vCPU) regardless of how it's calculating memory,
> so while I talked a lot about memory allocation as a motivation, cpu
> is actually the limiting factor more often in my deployment, and it
> certainly should be here.
>
> And yet after attempting to launch 200 m1.tiny instances:
>
> root@nimbus-0:~# nova-manage service describe_resource nova-23
> 2012-10-31 11:17:56
> HOST       PROJECT                               cpu  mem(mb)  hdd
> nova-23    (total)                                24    48295  882
> nova-23    (used_now)                            107    56832   30
> nova-23    (used_max)                            107    56320   30
> nova-23    98333a1a28e746fa8c629c83a818ad57      106    54272    0
> nova-23    3008a142e9524f7295b06ea811908f93        1     2048   30
>
> Eventually those bleed off to other systems, though not entirely:
>
> 2012-10-31 11:29:41
> HOST       PROJECT                               cpu  mem(mb)  hdd
> nova-23    (total)                                24    48295  882
> nova-23    (used_now)                             43    24064   30
> nova-23    (used_max)                             43    23552   30
> nova-23    98333a1a28e746fa8c629c83a818ad57       42    21504    0
> nova-23    3008a142e9524f7295b06ea811908f93        1     2048   30
>
> At this point, 12 minutes later, out of 200 instances 168 are active,
> 22 are errored, and 10 are still "building". Notably, only 23 actual
> VMs are running on "nova-23":
>
> root@nova-23:~# virsh list|grep instance |wc -l
> 23
>
> So that's what I see; perhaps my assumptions about why I'm seeing it
> are incorrect.
>
> Thanks,
> -Jon
--
Regards
Huang Zhiteng