openstack team mailing list archive

Re: quota question


On Fri, Jul 20, 2012 at 4:38 AM, Eoghan Glynn <eglynn@xxxxxxxxxx> wrote:

> Hi Narayan,
> I had the idea previously of applying a "weighting function" to the
> resource usage being allocated from the quota, as opposed to simply
> counting raw instances.
> The notion I had in mind was more related to image usage in glance,
> where the image "footprint" can vary very widely. However I think it
> could be useful for some nova resources also.
> Now for some resource types, for example say volumes, usage can be
> controlled along multiple axes (i.e. number of volumes and total size),
> so that gives more flexibility.
> But if I'm hearing you correctly, you'd want to apply a lower weighting
> to instances that are scheduled onto one of the higher-memory compute
> nodes, and vice versa a higher weighting to instances that happen to
> be run on lower-memory nodes.

> Does that sum it up, or have I misunderstood?

I think you've got it. I hadn't really asked with a particular
solution in mind; I was mainly looking for ideas.

I think that weighting would help. Effectively we need to discount
memory usage on the bigmem nodes, or something like that.
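
As a rough illustration of the weighting idea, here is a minimal Python
sketch. It is not nova code: the Instance class, the node class labels,
and the weight values are all assumptions made up for the example. The
only point is that memory on bigmem nodes is charged against the quota
at a discount.

```python
from dataclasses import dataclass

# Assumed weights per node class (hypothetical values): memory used on
# bigmem ("mem") nodes counts for a quarter of its raw size.
MEMORY_WEIGHT = {"idp": 1.0, "mem": 0.25}


@dataclass
class Instance:
    node_class: str  # hypothetical label, e.g. "idp" or "mem"
    memory_gb: int


def weighted_memory_usage(instances, weights=MEMORY_WEIGHT):
    """Sum memory, scaling each instance by the weight of its node class.

    Node classes without an entry in `weights` are charged at full weight.
    """
    return sum(weights.get(i.node_class, 1.0) * i.memory_gb for i in instances)


def within_quota(instances, quota_gb, weights=MEMORY_WEIGHT):
    """Return True if the weighted usage fits inside the quota."""
    return weighted_memory_usage(instances, weights) <= quota_gb
```

With these assumed weights, a 400 GB instance on a bigmem node is
charged the same as a 100 GB instance on a regular node.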

The harder part is that we need to be able to specify
independent/orthogonal quota constraints on different flavors. It
would be really useful to be able to say basically, you can have 2TB
of memory from this flavor, and 4TB of memory from that flavor. This
would allow saying something like "you can have up to 3 1TB instances,
and independently have up to 3TB of small instances as well."
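
A minimal sketch of what independent per-flavor limits could look like,
again in plain Python rather than anything nova provides. The quota
table, the memory sizes, and the helper names are hypothetical; flavors
are grouped by the part of the name before the dot.

```python
from collections import defaultdict

# Hypothetical per-family memory limits in GB. Each family is checked
# on its own, so using up one limit does not affect the other.
FAMILY_QUOTA_GB = {"mem": 3 * 1024, "idp": 3 * 1024}


def family_of(flavor_name):
    """Return the flavor family, e.g. "mem" for "mem.100"."""
    return flavor_name.split(".", 1)[0]


def usage_by_family(instances):
    """Total memory per family for (flavor_name, memory_gb) pairs."""
    usage = defaultdict(int)
    for flavor_name, memory_gb in instances:
        usage[family_of(flavor_name)] += memory_gb
    return dict(usage)


def can_launch(instances, flavor_name, memory_gb, quotas=FAMILY_QUOTA_GB):
    """Check the new instance against its own family's limit only."""
    family = family_of(flavor_name)
    if family not in quotas:
        return False  # assumption: families without a quota are denied
    used = usage_by_family(instances).get(family, 0)
    return used + memory_gb <= quotas[family]
```

In this sketch, a tenant already running three 1TB bigmem instances is
refused a fourth, but can still start instances from the other family.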

> BTW what kind of nova-scheduler config are you using?

We're using the filter scheduler. We've defined a bunch of custom
flavors, in addition to the stock ones, that allow us to fill up all
of our node types. So for each node type, we define flavors for the
complete node (minus a GB of memory for the hypervisor), and for 3/4,
1/2, 1/4, 1/8, 1/16, and 1/32 of the node. We've used a machine type
prefix for each one. The compute nodes are IBM iDataPlex, so we have
idp.{100,75,50,25,12,6,3}. We've done this for each machine type, so
we have idp.*, mem.*, gpu.*, etc. Each machine type has a unique
hostname prefix (cc for the idp nodes, cm for the bigmem nodes, cg for
gpu nodes, etc.), and the filter scheduler is set up to route requests
for these custom flavors only to nodes with the appropriate hostname
prefix. This isn't an ideal solution, but it minimizes the risk of
fragmentation. (With the default flavors, we'd see a lot of cases
where there was idle capacity left on the nodes that wasn't usable
because the ratio was wrong for the default flavors.)
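
The naming and routing scheme above can be sketched roughly as follows.
This is standalone Python and does not use the nova filter API: the
function names and the example hostnames are hypothetical, and letting
stock flavors pass on any host is an assumption. Only the flavor
prefixes, the percentage suffixes, and the hostname prefixes come from
the description above.

```python
# Flavor family -> hostname prefix, as described above.
PREFIX_FOR_FAMILY = {"idp": "cc", "mem": "cm", "gpu": "cg"}

# Percentage suffixes from idp.{100,75,50,25,12,6,3}.
PERCENTS = (100, 75, 50, 25, 12, 6, 3)


def flavor_names(family):
    """List the custom flavor names for one machine type."""
    return [f"{family}.{p}" for p in PERCENTS]


def host_passes(hostname, flavor_name):
    """Return True if a request for this flavor may land on this host."""
    family = flavor_name.split(".", 1)[0]
    prefix = PREFIX_FOR_FAMILY.get(family)
    if prefix is None:
        # Assumption: flavors outside the custom families are unrestricted.
        return True
    return hostname.startswith(prefix)


def candidate_hosts(hostnames, flavor_name):
    """Keep only the hosts that pass the prefix check."""
    return [h for h in hostnames if host_passes(h, flavor_name)]
```

For example, candidate_hosts(["cc01", "cm01", "cg01"], "mem.50") keeps
only "cm01".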

So far, this scheduling scheme has worked pretty well, aside from
leaving some instances in a weird state when you try to start a bunch
(20-50) at a time. I haven't had time to track that down yet.
