yahoo-eng-team team mailing list archive
Message #78599
[Bug 1830234] [NEW] InstanceGroup.get_hosts() uses inefficient DB queries
Public bug reported:
The InstanceGroup.get_hosts() method is inefficient when pulling
instances out of the database here:
https://github.com/openstack/nova/blob/c7e9e667426a6d88d396a59cb40d30763a3265f9/nova/objects/instance_group.py#L500
because by default that query joins on the following tables:
https://github.com/openstack/nova/blob/c7e9e667426a6d88d396a59cb40d30763a3265f9/nova/db/sqlalchemy/api.py#L2098
    if columns_to_join is None:
        columns_to_join_new = ['info_cache', 'security_groups']
        manual_joins = ['metadata', 'system_metadata']
It then turns around and uses only the instance.host value:
https://github.com/openstack/nova/blob/c7e9e667426a6d88d396a59cb40d30763a3265f9/nova/objects/instance_group.py#L502
    return list(set([instance.host for instance in instances
                     if instance.host]))
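To illustrate how little of each loaded instance is actually consumed, here is a minimal standalone sketch of the final step. The Instance namedtuple and its field values are hypothetical stand-ins for the real nova objects; only the list/set expression is taken from the code linked above.

```python
from collections import namedtuple

# Hypothetical stand-in for a loaded nova Instance; the real object carries
# many more fields, including the joined ones listed above.
Instance = namedtuple(
    "Instance",
    ["uuid", "host", "info_cache", "security_groups",
     "metadata", "system_metadata"],
)


def hosts_from_instances(instances):
    """Same expression as get_hosts(): only instance.host is ever read."""
    return list(set([instance.host for instance in instances
                     if instance.host]))


instances = [
    Instance("uuid-1", "compute1", {}, [], {}, {}),
    Instance("uuid-2", "compute1", {}, [], {}, {}),
    Instance("uuid-3", "compute2", {}, [], {}, {}),
    Instance("uuid-4", None, {}, [], {}, {}),  # not yet scheduled to a host
]

hosts = sorted(hosts_from_instances(instances))
print(hosts)
```

Every field other than host is loaded and then discarded, which is the cost the joins add.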
We should:
1. Avoid those unnecessary joins by passing expected_attrs=[], which
is a simple backportable fix.
2. Write a new DB API method that gets the set of distinct
instances.host values for the list of instance uuids where the
instances.host value is not None, so I think:
    SELECT host FROM instances WHERE host IS NOT NULL AND deleted = 0 AND
    uuid IN ($instance_uuids) GROUP BY instances.host;
That way we let the DB query do the work and we don't have to load up
all of the additional instances fields we don't care about.
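A rough sketch of what such a query could look like, using the standard-library sqlite3 module and an in-memory table rather than nova's real SQLAlchemy models. The function name get_distinct_hosts, the reduced three-column schema, and the sample rows are all assumptions made for illustration; they are not part of nova's DB API.

```python
import sqlite3


def get_distinct_hosts(conn, instance_uuids):
    """Return distinct non-NULL hosts for the given non-deleted instances.

    Hypothetical helper: nova's real DB API would use SQLAlchemy models.
    """
    if not instance_uuids:
        # An empty IN () list is not valid SQL, so short-circuit.
        return []
    placeholders = ", ".join("?" for _ in instance_uuids)
    query = (
        "SELECT host FROM instances "
        "WHERE host IS NOT NULL AND deleted = 0 "
        "AND uuid IN (%s) GROUP BY host" % placeholders
    )
    return [row[0] for row in conn.execute(query, list(instance_uuids))]


# Simplified stand-in for the instances table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    "uuid TEXT PRIMARY KEY, host TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO instances (uuid, host, deleted) VALUES (?, ?, ?)",
    [
        ("uuid-1", "compute1", 0),
        ("uuid-2", "compute1", 0),  # same host, collapsed by GROUP BY
        ("uuid-3", "compute2", 0),
        ("uuid-4", None, 0),        # no host yet, filtered out
        ("uuid-5", "compute3", 1),  # soft-deleted, filtered out
        ("uuid-6", "compute4", 0),  # not in the requested uuid list
    ],
)

group_uuids = ["uuid-1", "uuid-2", "uuid-3", "uuid-4", "uuid-5"]
hosts = sorted(get_distinct_hosts(conn, group_uuids))
print(hosts)
```

Only the host column crosses the wire, and de-duplication happens in the database instead of in Python.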
The DB API optimization would likely need to be a remotable method
because InstanceGroup.get_hosts() is called from the late affinity check
method ComputeManager._validate_instance_group_policy() in the
nova-compute service (and because InstanceGroup.get_hosts() is remotable
itself).
** Affects: nova
Importance: Medium
Assignee: Matt Riedemann (mriedem)
Status: Triaged
** Tags: db performance
https://bugs.launchpad.net/bugs/1830234