yahoo-eng-team team mailing list archive

[Bug 1746863] [NEW] database query via lazy-load in ServerGroup(Anti|)AffinityFilter

 

Public bug reported:

I happened upon this while hacking on my WIP CellDatabases fixture
patch. Some of the nova/tests/functional/test_server_group.py tests
started failing with multiple cells, and I found that it's because
there's a database query, objects.InstanceList.get_by_filters, for all
instances that are members of the server group, every time the
ServerGroup[Anti|]AffinityFilter runs. The query for instances doesn't
check all cells, so it fails to return any hosts that group members are
currently on.

This makes the ServerGroup[Anti|]AffinityFilter a no-op for multiple
cells. Affinity *is*, however, ultimately checked via the late-affinity
check in compute, so affinity is not totally broken for multiple cells.
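
To illustrate why the filter degrades to a no-op, here is a minimal toy
sketch. It is not nova code: the CELLS dict, hosts_for_members and
anti_affinity_passes are hypothetical names, and the untargeted query is
modeled (as an assumption) as a lookup that reaches none of the cell
databases holding the group's instances.

```python
# Toy model, not nova code: each cell has its own instance table.
CELLS = {
    "cell1": [{"uuid": "a", "host": "host1", "deleted": False}],
    "cell2": [{"uuid": "b", "host": "host2", "deleted": False}],
}


def hosts_for_members(members, cell_names):
    """Return hosts of non-deleted group members found in the given cells."""
    hosts = set()
    for name in cell_names:
        for inst in CELLS[name]:
            if (inst["uuid"] in members and not inst["deleted"]
                    and inst["host"]):
                hosts.add(inst["host"])
    return hosts


def anti_affinity_passes(candidate_host, group_hosts):
    """A candidate passes anti-affinity if no group member is on it."""
    return candidate_host not in group_hosts


members = {"a", "b"}

# Assumption: the untargeted query sees none of the cell databases.
untargeted_hosts = hosts_for_members(members, [])
# Querying every cell finds the hosts the members are really on.
all_cell_hosts = hosts_for_members(members, list(CELLS))

# With an empty host list every candidate passes, so nothing is filtered.
passes_untargeted = anti_affinity_passes("host1", untargeted_hosts)
passes_all_cells = anti_affinity_passes("host1", all_cell_hosts)
```

With the empty result, host1 is accepted even though member "a" already
runs there; with the all-cells result it is rejected.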

Aside from that, I would expect the database query to noticeably degrade
performance of scheduling if the ServerGroup[Anti|]AffinityFilter is in
enabled_filters, for both the single cell and multiple cell cases.

To fix this, I expect we'll need to pre-load
RequestSpec.instance_group.hosts before we schedule each instance -- and
make sure we query all cells for the instances. I'm not sure what
special consideration we might need for multi-create.
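
A rough sketch of the pre-load idea, again as a toy model rather than
nova code. The names load_group_hosts, anti_affinity_filter, CELLS and
QUERIES are hypothetical, and the sketch ignores multi-create entirely.
The point is only that the host list is gathered from every cell once
per request, and the filter then works on the already-loaded list.

```python
# Toy data, not nova code: one list of instance records per cell.
CELLS = {
    "cell1": [{"uuid": "a", "host": "host1"}],
    "cell2": [{"uuid": "b", "host": "host2"}],
}

QUERIES = []  # records each simulated database round trip


def load_group_hosts(members, cells):
    """Collect hosts of group members from every cell, one query per cell."""
    hosts = set()
    for cell_name, instances in cells.items():
        QUERIES.append(cell_name)
        hosts.update(inst["host"] for inst in instances
                     if inst["uuid"] in members and inst["host"])
    return hosts


def anti_affinity_filter(candidates, group_hosts):
    """Keep only candidates that host no member of the group."""
    return [host for host in candidates if host not in group_hosts]


# Pre-load once, before filtering any candidate host.
group_hosts = load_group_hosts({"a", "b"}, CELLS)
allowed = anti_affinity_filter(["host1", "host2", "host3"], group_hosts)
print(allowed)
```

In this toy run the number of simulated queries equals the number of
cells, regardless of how many candidate hosts the filter examines.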


This is the code that lazy-loads instance_group.hosts, which in turn calls InstanceGroup.get_hosts, which calls InstanceList.get_by_filters without targeting any cells:

nova/scheduler/filters/affinity_filter.py:

        group_hosts = (spec_obj.instance_group.hosts
                       if spec_obj.instance_group else [])

nova/objects/instance_group.py:

    def obj_load_attr(self, attrname):
        ...  
        self.hosts = self.get_hosts()
        self.obj_reset_changes(['hosts'])

    ...

    @base.remotable
    def get_hosts(self, exclude=None):
        """Get a list of hosts for non-deleted instances in the group
        This method allows you to get a list of the hosts where instances in
        this group are currently running.  There's also an option to exclude
        certain instance UUIDs from this calculation.
        """
        filter_uuids = self.members
        if exclude:
            filter_uuids = set(filter_uuids) - set(exclude)
        filters = {'uuid': filter_uuids, 'deleted': False}
        instances = objects.InstanceList.get_by_filters(self._context,
                                                        filters=filters)
        return list(set([instance.host for instance in instances
                         if instance.host]))
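
For readers unfamiliar with lazy-loading, the following toy sketch shows
how a plain-looking attribute access can hide a query. It is a generic
Python illustration built on __getattr__, not the nova object machinery;
LazyGroup and fake_query are hypothetical names.

```python
class LazyGroup:
    """Toy object whose 'hosts' attribute is loaded on first access."""

    def __init__(self, loader):
        self._loader = loader

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. 'hosts' is not set yet.
        if name == "hosts":
            value = self._loader()
            # Store the value so later lookups do not call the loader again.
            self.hosts = value
            return value
        raise AttributeError(name)


calls = []


def fake_query():
    """Stand-in for a database query; records that it was invoked."""
    calls.append("query")
    return ["host1"]


group = LazyGroup(fake_query)
calls_before_access = len(calls)
first = group.hosts   # looks like a field read, but triggers the loader
second = group.hosts  # served from the stored value
```

The read of group.hosts is where the query happens, which is why it is
easy to miss when reading the filter code.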

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: cells performance scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1746863

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1746863/+subscriptions

