openstack team mailing list archive
-
openstack team
-
Mailing list archive
-
Message #21271
Re: AggregateInstanceExtraSpecs very slow?
Replying from my phone, so I can't look, but I wonder if we have an index missing.
On Feb 25, 2013, at 8:54 PM, Sam Morrison <sorrison@xxxxxxxxx> wrote:
> On Tue, Feb 26, 2013 at 3:15 PM, Sam Morrison <sorrison@xxxxxxxxx> wrote:
>>
>> On 26/02/2013, at 2:15 PM, Chris Behrens <cbehrens@xxxxxxxxxxxx> wrote:
>>
>>>
>>> On Feb 25, 2013, at 6:39 PM, Joe Gordon <jogo@xxxxxxxxxxxxxxxx> wrote:
>>>
>>>>
>>>> It looks like the scheduler issues are related to the rabbitmq issues. "host 'qh2-rcc77' ... is disabled or has not been heard from in a while"
>>>>
>>>> What does 'nova host-list' say? the clocks must all be synced up?
>>>
>>> Good things to check. It feels like something is spinning way too much within this filter, though. This can also cause the above message. The scheduler pulls all of the records before it starts filtering… and if there's a huge delay somewhere, it can start seeing a bunch of hosts as disabled.
>>>
>>> The filter doesn't look like a problem.. unless there's a large amount of aggregate metadata… and/or a large amount of key/values for the instance_type's extra specs. There *is* a DB call in the filter. If that's blocking for an extended period of time, the whole process is blocked… But I suspect by the '100% cpu' comment, that this is not the case… So the only thing I can think of is that it returns a tremendous amount of metadata.
>>>
>>> Adding some extra logging in the filter could be useful.
>>>
>>> - Chris
>>
>> Thanks Chris, I have 2 aggregates and 2 keys defined and each of the 80 hosts has either one or the other. At the moment every flavour has either one or the other too so I don't think it's too much data.
>>
>> I've tracked it down to this call:
>>
>> metadata = db.aggregate_metadata_get_by_host(context, host_state.host)
>
> More debugging has got it down to a query
>
> In db.api.aggregate_metadata_get_by_host:
>
> query = model_query(context, models.Aggregate).join(
> "_hosts").filter(models.AggregateHost.host == host).join(
> "_metadata")
> ......
> rows = query.all()
>
> With query debug on this resolves to:
>
> SELECT aggregates.created_at AS aggregates_created_at,
> aggregates.updated_at AS aggregates_updated_at, aggregates.deleted_at
> AS aggregates_deleted_at, aggregates.deleted AS aggregates_deleted,
> aggregates.id AS aggregates_id, aggregates.name AS aggregates_name,
> aggregates.availability_zone AS aggregates_availability_zone,
> aggregate_hosts_1.created_at AS aggregate_hosts_1_created_at,
> aggregate_hosts_1.updated_at AS aggregate_hosts_1_updated_at,
> aggregate_hosts_1.deleted_at AS aggregate_hosts_1_deleted_at,
> aggregate_hosts_1.deleted AS aggregate_hosts_1_deleted,
> aggregate_hosts_1.id AS aggregate_hosts_1_id, aggregate_hosts_1.host
> AS aggregate_hosts_1_host, aggregate_hosts_1.aggregate_id AS
> aggregate_hosts_1_aggregate_id FROM aggregates INNER JOIN
> aggregate_hosts AS aggregate_hosts_2 ON aggregates.id =
> aggregate_hosts_2.aggregate_id AND aggregate_hosts_2.deleted = 0 AND
> aggregates.deleted = 0 INNER JOIN aggregate_hosts ON
> aggregate_hosts.aggregate_id = aggregates.id AND
> aggregate_hosts.deleted = 0 AND aggregates.deleted = 0 INNER JOIN
> aggregate_metadata AS aggregate_metadata_1 ON aggregates.id =
> aggregate_metadata_1.aggregate_id AND aggregate_metadata_1.deleted = 0
> AND aggregates.deleted = 0 INNER JOIN aggregate_metadata ON
> aggregate_metadata.aggregate_id = aggregates.id AND
> aggregate_metadata.deleted = 0 AND aggregates.deleted = 0 LEFT OUTER
> JOIN aggregate_hosts AS aggregate_hosts_3 ON aggregates.id =
> aggregate_hosts_3.aggregate_id AND aggregate_hosts_3.deleted = 0 AND
> aggregates.deleted = 0 LEFT OUTER JOIN aggregate_hosts AS
> aggregate_hosts_1 ON aggregate_hosts_1.aggregate_id = aggregates.id
> AND aggregate_hosts_1.deleted = 0 AND aggregates.deleted = 0 WHERE
> aggregates.deleted = 0 AND aggregate_hosts.host = 'qh2-rcc34';
>
> Which in our case returns 328509 rows in set (25.97 sec)
>
> Seems a bit off considering there are 80 rows in aggregate_hosts, 2
> rows in aggregates and 2 rows in aggregate_metadata
>
> In the code rows is only equal to 1 so it seems to be doing something
> inside to code to do this? Don't know too much how sqlalchemy works.
>
> Seems like a bug to me? or maybe our database has something wrong in it?
>
> Cheers,
> Sam
Follow ups
References