openstack team mailing list archive
Message #21267
Re: AggregateInstanceExtraSpecs very slow?
On 26/02/2013, at 2:15 PM, Chris Behrens <cbehrens@xxxxxxxxxxxx> wrote:
>
> On Feb 25, 2013, at 6:39 PM, Joe Gordon <jogo@xxxxxxxxxxxxxxxx> wrote:
>
>>
>> It looks like the scheduler issues are related to the rabbitmq issues. "host 'qh2-rcc77' ... is disabled or has not been heard from in a while"
>>
>> What does 'nova host-list' say? Are the clocks all synced up?
>
> Good things to check. It feels like something is spinning way too much within this filter, though. This can also cause the above message. The scheduler pulls all of the records before it starts filtering… and if there's a huge delay somewhere, it can start seeing a bunch of hosts as disabled.
>
> The filter doesn't look like a problem.. unless there's a large amount of aggregate metadata… and/or a large amount of key/values for the instance_type's extra specs. There *is* a DB call in the filter. If that's blocking for an extended period of time, the whole process is blocked… But I suspect by the '100% cpu' comment, that this is not the case… So the only thing I can think of is that it returns a tremendous amount of metadata.
>
> Adding some extra logging in the filter could be useful.
>
> - Chris
Thanks Chris. I have 2 aggregates and 2 keys defined, and each of the 80 hosts has either one or the other. At the moment every flavour has either one or the other too, so I don't think it's too much data.
I've tracked it down to this call:
    metadata = db.aggregate_metadata_get_by_host(context, host_state.host)
It's taking forever to complete. I'm just having a look into that code to see why. There is a nested for loop in there, so my guess is that it has something to do with that, although there is hardly any data in our aggregates tables, so I can't see it taking that long.
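To confirm that this one call is where the time goes, a timing wrapper along these lines could be put around it. This is only a sketch under assumptions: `timed`, `fake_metadata_lookup` and `lookup_with_timing` are made-up names, the stand-in function just sleeps in place of the real DB call, and it assumes Python 3 for `time.monotonic` (on Python 2, `time.time()` would be the rough substitute).

```python
import logging
import time
from contextlib import contextmanager

LOG = logging.getLogger(__name__)


@contextmanager
def timed(label, log=LOG):
    """Log how long the wrapped block took, even if it raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        log.warning("%s took %.3f s", label, elapsed)


def fake_metadata_lookup(host):
    """Hypothetical stand-in for the real DB call; it only sleeps briefly."""
    time.sleep(0.01)
    return {"example_key": {"example_value"}}


def lookup_with_timing(host):
    """Run the lookup inside the timer so each call leaves one log line."""
    with timed("aggregate_metadata_get_by_host(%s)" % host):
        return fake_metadata_lookup(host)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    lookup_with_timing("qh2-rcc77")
```

In the filter itself, the `with timed(...)` block would wrap the real call instead of the stand-in. One log line per host makes it easy to see whether every call is slow or only some of them.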
Cheers,
Sam