openstack team mailing list archive
Message #18041
Re: Scheduler issues in folsom
Hi All,
While the RetryScheduler may not have been designed specifically to
fix this issue, https://bugs.launchpad.net/nova/+bug/1011852 suggests
that it is meant to fix it, assuming "it" is a scheduler race
condition, which is my suspicion.
This is my current scheduler config, which produces the failure mode I describe:
scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter,RetryFilter
scheduler_max_attempts=30
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0
ram_allocation_ratio=1.0
I'm running the scheduler and API server on a single controller host,
and it's pretty consistent about scheduling more than a hundred
instances per node at first and then iteratively rescheduling them
elsewhere, whether it's presented with a single API request to start
many instances (using euca2ools) or a shell loop around nova boot
that generates one API request per server.
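The shell-loop version is nothing fancier than roughly this (the
image id below is a placeholder, not the exact value I used):

for i in $(seq 1 200); do
    # one nova boot call, i.e. one API request, per instance
    nova boot --flavor m1.tiny --image <image-id> test-$i
done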
The cpu_allocation_ratio should limit the scheduler to 24 instances
per compute node regardless of how it's calculating memory, so while
I talked a lot about memory allocation as a motivation, in my
deployment it is more often CPU that is actually the limiting factor,
and it certainly should be here.
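To spell out the arithmetic I expect the CoreFilter to apply (m1.tiny
being a 1-vCPU flavor):

    vCPU limit per host = physical cores * cpu_allocation_ratio
                        = 24 * 1.0
                        = 24
    => at most 24 m1.tiny instances should land on nova-23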
And yet after attempting to launch 200 m1.tiny instances:
root@nimbus-0:~# nova-manage service describe_resource nova-23
2012-10-31 11:17:56
HOST     PROJECT                            cpu    mem(mb)    hdd
nova-23  (total)                             24      48295    882
nova-23  (used_now)                         107      56832     30
nova-23  (used_max)                         107      56320     30
nova-23  98333a1a28e746fa8c629c83a818ad57   106      54272      0
nova-23  3008a142e9524f7295b06ea811908f93     1       2048     30
Eventually those bleed off to other systems, though not entirely:
2012-10-31 11:29:41
HOST     PROJECT                            cpu    mem(mb)    hdd
nova-23  (total)                             24      48295    882
nova-23  (used_now)                          43      24064     30
nova-23  (used_max)                          43      23552     30
nova-23  98333a1a28e746fa8c629c83a818ad57    42      21504      0
nova-23  3008a142e9524f7295b06ea811908f93     1       2048     30
At this point, 12 minutes later, 168 of the 200 instances are active,
22 are errored, and 10 are still "building". Notably, only 23 actual
VMs are running on "nova-23":
root@nova-23:~# virsh list|grep instance |wc -l
23
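(For anyone reproducing this, something like the following against
nova list gives that state breakdown; the counts shown are the ones
quoted above:)

nova list | grep -c ACTIVE   # 168
nova list | grep -c ERROR    # 22
nova list | grep -c BUILD    # 10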
So that's what I see; perhaps my assumptions about why I'm seeing it
are incorrect.
Thanks,
-Jon