← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~wallyworld/launchpad/person-picker-results-order into lp:launchpad

 

Ian Booth has proposed merging lp:~wallyworld/launchpad/person-picker-results-order into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #720239 in Launchpad itself: "Show exact matches first in the person-picker"
  https://bugs.launchpad.net/launchpad/+bug/720239

For more details, see:
https://code.launchpad.net/~wallyworld/launchpad/person-picker-results-order/+merge/63057

The change ensures that the person picker results are sensibly ordered and also introduces extra matching options when searching.

== Implementation ==

The core issue was that the query to do the search on Person was constructed in such a way that even though a ranking value was calculated for each match, this ordering was discarded in the final washup. Due to bugs in Storm at the time the work was done, plus implementation decisions at the time, the query used a number of nested selects which contributed to the problem.

The query was totally restructured to allow the ranking value for each match type to be used for the final ordering of results. A SQL WITH construct was used to allow this to be done simply - one query to get the rankings and person ids, a second query to assemble the results. Also, the union of queries used to do the matching was simplified by using SQL CASE construct in places. Finally, the searching for public and private teams was simplified and a union is only used when required to match against the team participation table.

Additional matching was provided: partial matches on displayname and name. So the matches done in order of importance are:
1. name (lp id) match
2. name prefix match
3. case insensitive displayname prefix match
4. irc nickname match
5. case insensitive email prefix match
6. fti match

To keep performance, it is necessary to index lower(displayname) on Person. This index is/will be created manually on qastaging, staging, prod by stub. I have included in this branch the db patch for the index creation, but since the index is being created "live" there's no need to land this in db-devel.

Another performance improvement was made - the email address and irc nick records required for the person result set are bulk loaded and added to the property cache. Without this change, one query per person per property (email, nick) was made to get the info to display. Note that the ircnicknames field on Person was an old style SQLMultiJoin so this was converted to be private and a new public cached property introduced. This field is readonly so the change done is the simplist approach and works ok.

With the structure of the query, a couple of options were explored. Profiling was done on qas and the resulting solution the result of that. The queries look something like:

WITH MatchingPerson AS
  ( SELECT id, MAX(rank) AS rank, FALSE AS is_private_team
   FROM
     ( SELECT Person.id, 
      (CASE WHEN person.name='team' THEN 100 
            WHEN LOWER(person.name) LIKE 'team' || '%%' THEN 75 
            WHEN LOWER(person.displayname) LIKE 'team' || '%%' THEN 50 
            ELSE rank(fti, ftq('team')) 
      END) AS rank
      FROM Person
      WHERE LOWER(Person.name) LIKE 'team' || '%%'
        OR LOWER(Person.displayname) LIKE 'team' || '%%'
        OR Person.fti @@ ftq('team')
      UNION ALL SELECT Person.id, 25 AS rank
      FROM Person, IrcID
      WHERE Person.id = IrcID.person
        AND IrcID.nickname = 'team'
      UNION ALL SELECT Person.id, 10 AS rank
      FROM Person, EmailAddress
      WHERE Person.id = EmailAddress.person
        AND LOWER(EmailAddress.email) LIKE 'team' || '%%'
        AND status IN (2, 4) ) AS person_match
   GROUP BY id, is_private_team )
SELECT Person.account,
       Person.creation_comment,
       Person.creation_rationale,
       Person.datecreated,
       Person.defaultmembershipperiod,
       Person.defaultrenewalperiod,
       Person.displayname,
       Person.hide_email_addresses,
       Person.homepage_content,
       Person.icon,
       Person.id,
       Person.logo,
       Person.mailing_list_auto_subscribe_policy,
       Person.merged,
       Person.mugshot,
       Person.name,
       Person.personal_standing,
       Person.personal_standing_reason,
       Person.registrant,
       Person.renewal_policy,
       Person.subscriptionpolicy,
       Person.teamdescription,
       Person.teamowner,
       Person.verbose_bugnotifications,
       Person.visibility
FROM MatchingPerson,
     Person
LEFT JOIN EmailAddress ON EmailAddress.person = Person.id
WHERE (Person.id = MatchingPerson.id)
  AND (Person.visibility = 1
       AND Person.merged IS NULL
       AND (NOT (Person.teamowner IS NULL)
            OR EmailAddress.status = 4)
       OR NOT (Person.teamowner IS NULL)
       AND Person.visibility = 30)
  AND TRUE
ORDER BY rank DESC, Person.displayname, Person.name

== Tests ==

Tests for the new ordering were added to lib/registry/doc/vocabularies.txt. In addition, one existing test was changed to account for the new ordering and another deleted since it's no longer required. The other existing tests in vocabularies.txt ensure the correct results for private teams etc are still being produced. An existing test in vocabulary-json.txt was also modified to account for the new ordering.

== Lint ==

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/canonical/launchpad/doc/vocabulary-json.txt
  lib/lp/registry/vocabularies.py
  lib/lp/registry/doc/vocabularies.txt
  lib/lp/registry/model/person.py

./lib/canonical/launchpad/doc/vocabulary-json.txt
       1: narrative uses a moin header.
     162: want exceeds 78 characters.
./lib/lp/registry/vocabularies.py
     663: E261 at least two spaces before inline comment



-- 
https://code.launchpad.net/~wallyworld/launchpad/person-picker-results-order/+merge/63057
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wallyworld/launchpad/person-picker-results-order into lp:launchpad.
=== added file 'database/schema/patch-2208-99-0.sql'
--- database/schema/patch-2208-99-0.sql	1970-01-01 00:00:00 +0000
+++ database/schema/patch-2208-99-0.sql	2011-06-01 00:47:33 +0000
@@ -0,0 +1,5 @@
+SET client_min_messages=ERROR;
+
+CREATE INDEX person__displayname__idx ON Person(lower(displayname));
+
+INSERT INTO LaunchpadDatabaseRevision VALUES (2208, 99, 0);

=== modified file 'lib/canonical/launchpad/doc/vocabulary-json.txt'
--- lib/canonical/launchpad/doc/vocabulary-json.txt	2011-05-27 01:06:07 +0000
+++ lib/canonical/launchpad/doc/vocabulary-json.txt	2011-06-01 00:47:33 +0000
@@ -84,10 +84,10 @@
     {
         "entries": [
             {
-                "api_uri": "/~commercial-admins",
+                "api_uri": "/~admins",
                 "css": "sprite team",
-                "title": "Commercial Subscription Admins (commercial-admins)",
-                "value": "commercial-admins"
+                "title": "Launchpad Administrators (admins)",
+                "value": "admins"
             }
         ],
         "total_size": 6

=== modified file 'lib/lp/registry/doc/vocabularies.txt'
--- lib/lp/registry/doc/vocabularies.txt	2011-05-27 19:53:20 +0000
+++ lib/lp/registry/doc/vocabularies.txt	2011-06-01 00:47:33 +0000
@@ -890,34 +890,54 @@
 
 Since there are so many people and teams a vocabulary that includes them
 all is not very useful when displaying in the user interface.  So we
-limit the number of results.  The results are ordered by displayname and
+limit the number of results.  The results are ordered by rank, displayname and
 the first set of those are the ones returned
 
     >>> login(ANONYMOUS)
     >>> [person.displayname for person in vocab.search('team')]
-    [u'HWDB Team', u'Hoary Gnome Team', u'No Team Memberships',
-     u'Other Team', u'Simple Team', u'Ubuntu Gnome Team',
-     u'Ubuntu Security Team', u'Ubuntu Team', u'Warty Gnome Team',
-     u'Warty Security Team', u'testing Spanish team']
+    [u'HWDB Team', u'No Team Memberships', u'Simple Team', u'Ubuntu Team',
+     u'testing Spanish team', u'Hoary Gnome Team', u'Other Team',
+     u'Ubuntu Gnome Team', u'Ubuntu Security Team', u'Warty Gnome Team',
+     u'Warty Security Team']
+
+If a match is done against irc nick, that is ranked higher than a fti match.
+
+    >>> from lp.registry.interfaces.irc import IIrcIDSet
+    >>> ircid_set = getUtility(IIrcIDSet)
+
+    >>> irc_person = factory.makePerson(name='ircperson')
+    >>> irc_id = ircid_set.new(
+    ...     irc_person, 'irc.freenode.net', 'team')
+    >>> [person.displayname for person in vocab.search('team')]
+    [u'Ircperson', u'HWDB Team', u'No Team Memberships', u'Simple Team',
+     u'Ubuntu Team', u'testing Spanish team', u'Hoary Gnome Team',
+     u'Other Team', u'Ubuntu Gnome Team', u'Ubuntu Security Team',
+     u'Warty Gnome Team', u'Warty Security Team']
+
+A match on launchpad name ranks higher than irc nickname:
+    >>> lifeless2 = factory.makePerson(name='anotherlifeless')
+    >>> irc_id = ircid_set.new(
+    ...     lifeless2, 'irc.freenode.net', 'lifeless')
+    >>> [person.displayname for person in vocab.search('lifeless')]
+    [u'Robert Collins', u'Anotherlifeless']
+
+A match on displayname ranks higher than email address:
+    >>> lifeless3 = factory.makePerson(name='nolife', displayname='RobertC')
+    >>> [person.displayname for person in vocab.search('robertc')]
+    [u'RobertC', u'Robert Collins']
+
+But even a partial match on name ranks higher:
+    >>> lifeless3 = factory.makePerson(name='robertc2', displayname='RobertC')
+    >>> [person.name for person in vocab.search('robertc')]
+    [u'robertc2', u'nolife', u'lifeless']
 
     >>> login(ANONYMOUS)
     >>> vocab.LIMIT
     100
 
-    >>> vocab.LIMIT = 4
-
-    # The limit gets applied in multiple subselects, so the result
-    # can actually be less than the limit.
-
-    >>> [person.displayname for person in vocab.search('team')]
-    [u'HWDB Team', u'No Team Memberships', u'testing Spanish team']
-
 Search for names with '%' and '?' is supported.
 
-    >>> from lp.registry.interfaces.irc import IIrcIDSet
-
     >>> symbolic_person = factory.makePerson(name='symbolic')
-    >>> ircid_set = getUtility(IIrcIDSet)
     >>> irc_id = ircid_set.new(
     ...     symbolic_person, 'irc.freenode.net', '%percent')
     >>> irc_id =ircid_set.new(
@@ -947,10 +967,11 @@
     []
 
 Almost all teams have the word 'team' as part of their names, so a
-search for 'team' should give us some of them:
+search for 'team' should give us some of them. There's also the ircperson
+created earlier with an icr nickname of 'team':
 
     >>> sorted(person.name for person in vocab.search('team'))
-    [u'hwdb-team', u'name18', u'name20', u'name21',
+    [u'hwdb-team', u'ircperson', u'name18', u'name20', u'name21',
      u'no-team-memberships', u'otherteam',
      u'simple-team', u'testing-spanish-team', u'ubuntu-security',
      u'ubuntu-team', u'warty-gnome']

=== modified file 'lib/lp/registry/model/person.py'
--- lib/lp/registry/model/person.py	2011-05-30 20:26:48 +0000
+++ lib/lp/registry/model/person.py	2011-06-01 00:47:33 +0000
@@ -597,7 +597,7 @@
     verbose_bugnotifications = BoolCol(notNull=True, default=True)
 
     signedcocs = SQLMultipleJoin('SignedCodeOfConduct', joinColumn='owner')
-    ircnicknames = SQLMultipleJoin('IrcID', joinColumn='person')
+    _ircnicknames = SQLMultipleJoin('IrcID', joinColumn='person')
     jabberids = SQLMultipleJoin('JabberID', joinColumn='person')
 
     entitlements = SQLMultipleJoin('Entitlement', joinColumn='person')
@@ -613,6 +613,10 @@
     personal_standing_reason = StringCol(default=None)
 
     @cachedproperty
+    def ircnicknames(self):
+        return list(self._ircnicknames)
+
+    @cachedproperty
     def languages(self):
         """See `IPerson`."""
         results = Store.of(self).find(

=== modified file 'lib/lp/registry/vocabularies.py'
--- lib/lp/registry/vocabularies.py	2011-05-28 04:09:11 +0000
+++ lib/lp/registry/vocabularies.py	2011-06-01 00:47:33 +0000
@@ -75,6 +75,8 @@
     Or,
     Select,
     SQL,
+    Union,
+    With,
     )
 from storm.info import ClassAlias
 from zope.component import getUtility
@@ -169,7 +171,10 @@
 from lp.registry.model.karma import KarmaCategory
 from lp.registry.model.mailinglist import MailingList
 from lp.registry.model.milestone import Milestone
-from lp.registry.model.person import Person, IrcID
+from lp.registry.model.person import (
+    IrcID,
+    Person,
+    )
 from lp.registry.model.pillar import PillarName
 from lp.registry.model.product import Product
 from lp.registry.model.productrelease import ProductRelease
@@ -179,7 +184,10 @@
 from lp.registry.model.teammembership import TeamParticipation
 from lp.services.database import bulk
 from lp.services.features import getFeatureFlag
-from lp.services.propertycache import cachedproperty
+from lp.services.propertycache import (
+    cachedproperty,
+    get_property_cache
+    )
 
 
 class BasePersonVocabulary:
@@ -519,7 +527,6 @@
         """Return the people/teams whose fti or email address match :text:"""
 
         private_query, private_tables = self._privateTeamQueryAndTables()
-        exact_match = None
 
         # Short circuit if there is no search text - all valid people and
         # teams have been requested.
@@ -540,6 +547,8 @@
                     self.extra_clause
                     )
                 )
+            result.config(distinct=True)
+            result.order_by(Person.displayname, Person.name)
         else:
             # Do a full search based on the text given.
 
@@ -547,17 +556,15 @@
             # The public person and team searches do not need to join with the
             # TeamParticipation table, which is very expensive.  The search
             # for private teams does need that table but the number of private
-            # teams is very small so the cost is not great.
-
-            # First search for public persons and teams that match the text.
-            public_tables = [
-                Person,
-                LeftJoin(EmailAddress, EmailAddress.person == Person.id),
-                ]
-
-            # Create an inner query that will match public persons and teams
-            # that have the search text in the fti, at the start of the email
-            # address, or as their full IRC nickname.
+            # teams is very small so the cost is not great. However, if the
+            # person is a logged in administrator, we don't need to join to
+            # the TeamParticipation table and can construct a more efficient
+            # query (since in this case we are searching all private teams).
+
+            # Create a query that will match public persons and teams that
+            # have the search text in the fti, at the start of their email
+            # address, as their full IRC nickname, or at the start of their
+            # displayname.
             # Since we may be eliminating results with the limit to improve
             # performance, we sort by the rank, so that we will always get
             # the best results. The fti rank will be between 0 and 1.
@@ -565,103 +572,133 @@
             # ILIKE doesn't hit the indexes.
             # The '%%' is necessary because storm variable substitution
             # converts it to '%'.
-            public_inner_textual_select = SQL("""
-                SELECT id FROM (
-                    SELECT Person.id, 100 AS rank
+
+            # This is the SQL that will give us the IDs of the people we want
+            # in the result.
+            matching_person_sql = SQL("""
+                SELECT id, MAX(rank) AS rank, false as is_private_team
+                FROM (
+                    SELECT Person.id,
+                    (case
+                        when person.name=? then 100
+                        when lower(person.name) like ? || '%%' then 75
+                        when lower(person.displayname) like ? || '%%' then 50
+                        else rank(fti, ftq(?))
+                    end) as rank
                     FROM Person
-                    WHERE name = ?
+                    WHERE lower(Person.name) LIKE ? || '%%'
+                    or lower(Person.displayname) LIKE ? || '%%'
+                    or Person.fti @@ ftq(?)
                     UNION ALL
-                    SELECT Person.id, rank(fti, ftq(?))
-                    FROM Person
-                    WHERE Person.fti @@ ftq(?)
+                    SELECT Person.id, 25 AS rank
+                    FROM Person, IrcID
+                    WHERE Person.id = IrcID.person
+                        AND IrcID.nickname = ?
                     UNION ALL
                     SELECT Person.id, 10 AS rank
-                    FROM Person, IrcId
-                    WHERE IrcId.person = Person.id
-                        AND lower(IrcId.nickname) = ?
-                    UNION ALL
-                    SELECT Person.id, 1 AS rank
                     FROM Person, EmailAddress
-                    WHERE EmailAddress.person = Person.id
-                        AND lower(email) LIKE ? || '%%'
-                        AND EmailAddress.status IN (?, ?)
-                    ) AS public_subquery
-                ORDER BY rank DESC
-                LIMIT ?
-                """, (text, text, text, text, text,
-                      EmailAddressStatus.VALIDATED.value,
-                      EmailAddressStatus.PREFERRED.value,
-                      self.LIMIT))
-
-            public_result = self.store.using(*public_tables).find(
-                Person,
-                And(
-                    Person.id.is_in(public_inner_textual_select),
-                    Person.visibility == PersonVisibility.PUBLIC,
-                    Person.merged == None,
-                    Or(# A valid person-or-team is either a team...
-                       # Note: 'Not' due to Bug 244768.
-                       Not(Person.teamowner == None),
-                       # Or a person who has a preferred email address.
-                       EmailAddress.status == EmailAddressStatus.PREFERRED),
-                    ))
-            # The public query doesn't need to be ordered as it will be done
-            # at the end.
-            public_result.order_by()
-
-            # Next search for the private teams.
-            private_query, private_tables = self._privateTeamQueryAndTables()
-            private_tables = [Person] + private_tables
-
-            # Searching for private teams that match can be easier since we
-            # are only interested in teams.  Teams can have email addresses
-            # but we're electing to ignore them here.
-            private_result = self.store.using(*private_tables).find(
-                Person,
-                And(
-                    SQL('Person.fti @@ ftq(?)', [text]),
-                    private_query,
-                    )
+                    WHERE Person.id = EmailAddress.person
+                        AND LOWER(EmailAddress.email) LIKE ? || '%%'
+                        AND status IN (?, ?)
+                ) AS person_match
+                GROUP BY id, is_private_team
+            """, (text, text, text, text, text, text, text, text, text,
+                  EmailAddressStatus.VALIDATED.value,
+                  EmailAddressStatus.PREFERRED.value))
+
+            # Do we need to search for private teams.
+            if private_tables:
+                private_tables = [Person] + private_tables
+                private_ranking_sql = SQL("""
+                    (case
+                        when person.name=? then 100
+                        when lower(person.name) like ? || '%%' then 75
+                        when lower(person.displayname) like ? || '%%' then 50
+                        else rank(fti, ftq(?))
+                    end) as rank
+                """, (text, text, text, text))
+
+                # Searching for private teams that match can be easier since
+                # we are only interested in teams.  Teams can have email
+                # addresses but we're electing to ignore them here.
+                private_result_select = Select(
+                    tables=private_tables,
+                    columns=(Person.id, private_ranking_sql,
+                                SQL("true as is_private_team")),
+                    where=And(
+                        SQL("""
+                            lower(Person.name) LIKE ? || '%%'
+                            OR lower(Person.displayname) LIKE ? || '%%'
+                            OR Person.fti @@ ftq(?)
+                            """, [text, text, text]),
+                        private_query))
+                matching_person_sql = Union(matching_person_sql,
+                          private_result_select, all=True)
+
+            # The tables for public persons and teams that match the text.
+            public_tables = [
+                SQL("MatchingPerson"),
+                Person,
+                LeftJoin(EmailAddress, EmailAddress.person == Person.id),
+                ]
+
+            # If private_tables is empty, we are searching for all private
+            # teams. We can simply append the private query component to the
+            # public query. Otherwise, for efficiency as stated earlier, we
+            # need to do a separate query to join to the TeamParticipation
+            # table.
+            private_teams_query = private_query
+            if private_tables:
+                private_teams_query = SQL("is_private_team")
+
+            # We just select the required ids since we will use
+            # IPersonSet.getPrecachedPersonsFromIDs to load the results
+            matching_with = With("MatchingPerson", matching_person_sql)
+            result = self.store.with_(
+                matching_with).using(*public_tables).find(
+                Person,
+                And(
+                    SQL("Person.id = MatchingPerson.id"),
+                    Or(
+                        And(# A public person or team
+                            Person.visibility == PersonVisibility.PUBLIC,
+                            Person.merged == None,
+                            Or(# A valid person-or-team is either a team...
+                                # Note: 'Not' due to Bug 244768.
+                                Not(Person.teamowner == None),
+                                # Or a person who has preferred email address.
+                                EmailAddress.status ==
+                                    EmailAddressStatus.PREFERRED)),
+                        # Or a private team
+                        private_teams_query),
+                    self.extra_clause),
                 )
-
-            private_result.order_by(SQL('rank(fti, ftq(?)) DESC', [text]))
-            private_result.config(limit=self.LIMIT)
-
-            combined_result = public_result.union(private_result)
-            # Eliminate default ordering.
-            combined_result.order_by()
-            # XXX: BradCrittenden 2009-04-26 bug=217644: The use of Alias and
-            # _get_select() is a work-around for .count() not working
-            # with the 'distinct' option.
-            subselect = Alias(combined_result._get_select(), 'Person')
-            exact_match = (Person.name == text)
-            result = self.store.using(subselect).find(
-                (Person, exact_match),
-                self.extra_clause)
-        # XXX: BradCrittenden 2009-05-07 bug=373228: A bug in Storm prevents
-        # setting the 'distinct' and 'limit' options in a single call to
-        # .config().  The work-around is to split them up.  Note the limit has
-        # to be after the call to 'order_by' for this work-around to be
-        # effective.
-        result.config(distinct=True)
-        if exact_match is not None:
-            # A DISTINCT requires that the sort parameters appear in the
-            # select, but it will break the vocabulary if it returns a list of
-            # tuples instead of a list of Person objects, so we create
-            # another subselect to sort after the DISTINCT is done.
-            distinct_subselect = Alias(result._get_select(), 'Person')
-            result = self.store.using(distinct_subselect).find(Person)
+            # Better ranked matches go first.
             result.order_by(
-                Desc(exact_match), Person.displayname, Person.name)
-        else:
-            result.order_by(Person.displayname, Person.name)
+                SQL("rank desc"), Person.displayname, Person.name)
         result.config(limit=self.LIMIT)
 
-        # We will be displaying the person's irc nic(s) in the description
-        # so we need to bulk load them in one query for performance.
+        # We will be displaying the person's irc nick(s) and emails in the
+        # description so we need to bulk load them for performance, otherwise
+        # we get one query per person per attribute.
         def pre_iter_hook(rows):
             persons = set(obj for obj in rows)
-            bulk.load_referencing(IrcID, persons, ['personID'])
+            # The emails.
+            emails = bulk.load_referencing(
+                EmailAddress, persons, ['personID'])
+            email_by_person = dict((email.personID, email)
+                for email in emails
+                if email.status == EmailAddressStatus.PREFERRED)
+
+            # The irc nicks.
+            nicks = bulk.load_referencing(IrcID, persons, ['personID'])
+            nicks_by_person = dict((nick.personID, nicks)
+                for nick in nicks)
+
+            for person in persons:
+                cache = get_property_cache(person)
+                cache.preferredemail = email_by_person.get(person.id, None)
+                cache.ircnicknames = nicks_by_person.get(person.id, None)
 
         return DecoratedResultSet(result, pre_iter_hook=pre_iter_hook)
 
@@ -714,7 +751,11 @@
                         self.extra_clause)
             result = self.store.using(*tables).find(Person, query)
         else:
-            name_match_query = SQL("Person.fti @@ ftq(%s)" % quote(text))
+            name_match_query = SQL("""
+                lower(Person.name) LIKE ? || '%%'
+                OR lower(Person.displayname) LIKE ? || '%%'
+                OR Person.fti @@ ftq(?)
+                """, [text, text, text]),
 
             email_storm_query = self.store.find(
                 EmailAddress.personID,
@@ -732,11 +773,8 @@
                     Or(name_match_query,
                        EmailAddress.person != None)))
 
-        # XXX: BradCrittenden 2009-05-07 bug=373228: A bug in Storm prevents
-        # setting the 'distinct' and 'limit' options in a single call to
-        # .config().  The work-around is to split them up.  Note the limit has
-        # to be after the call to 'order_by' for this work-around to be
-        # effective.
+        # To get the correct results we need to do distinct first, then order
+        # by, then limit.
         result.config(distinct=True)
         result.order_by(Person.displayname, Person.name)
         result.config(limit=self.LIMIT)