launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #10311
[Merge] lp:~adeuring/launchpad/bug-1020443-2 into lp:launchpad
Abel Deuring has proposed merging lp:~adeuring/launchpad/bug-1020443-2 into lp:launchpad with lp:~adeuring/launchpad/bug-1020443 as a prerequisite.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~adeuring/launchpad/bug-1020443-2/+merge/116876
This branch fixes numerous test failures which were triggered by the
changes from lp:~adeuring/launchpad/bug-1020443 (already approved).
In this branch, I dropped the questionble feature that the symbols
"&", "|", "!" are treaded as logical operators in full text search
queries.
This change had more side effects han I expected. DUring my work on
the fix I noticed two interesting bugs:
- the ranking of search results was sometimes wrong
- the words used in full text searches were sometimes stemmed twice
-- and the stemming algorithm from the Postgres text search
implementation is not idempotent, for certain words. Two examples
I noticed in the tests:
stemmed("extension") -> "extens"
stemmed("extens") -> "exten"
stemmed("cheese") -> "chees"
stemmed("chees") -> "chee"
When a text is indexed after a change, the indexer stores the stem of
the words in the FTI; when a full text search expression is processed,
the procedure to_tsquery('searchtext') parses the search text into
tokens, and if a token is treated as a word, it is stemmed. Finally,
the stemmed search words are looked up in an FTI column.
So, if we have the word "cheese" in an indexed text, "chees" is stored
in the FTI. During a regular search for "cheese", "cheese" is passed
(via the procedure ftq()) to to_tsquery(), and to_tsquery returns a
tsquery object containing the word "chees".
The broken searches call the function nl_phrase_search(), which issue
qn SQL query like "SELECT ftq('cheese')", which returns a string
representation of a tsquery. nl_phrase_search() extracts the different
stemmed words from the result, builds all subsets having one element
less than the original set of stemmd words, creates AND expressions
for the words of these subsets and finnaly returns an OR expression
of the AND expressions. In other words: It builds a query where at most
one word from the original search term may be missing in the searched
texts.
The string returned by nl_phrase_search() was often passed again to
ftq(). This has two side effects:
- ftq() calls to_tsquery() again, but now with the stemmed words,
leading to mappings like "cheese" -> "chees" -> "chee" -- and
the search will not return data having the original word "cheese".
And if "chee" instead if "chees" is used in the "sort by rank"
expression, we get the wrong sort order.
- ftq() now converts the symbols "&|!" to " ". (See the MP for
lp:~adeuring/launchpad/bug-1020443 for the reason of this change)
This means that the '|' is converted into an implicit AND, hence
the feature is dropped to find texts where one word from the search
string may be mssing.
notes about the changes in detail:
lib/lp/answers/doc/faq-vocabulary.txt:
ranking bug.
lib/lp/answers/doc/faq.txt:
'|' no longer treated equivalent to "OR"
lib/lp/answers/model/faq.py:
Avoid "double calls" of ftq(). fti_search is the result of an
nl_phrase_search() call; these results can be directly used
as tsquery expressions. If a query string is used like
'query&string'::tsquery
the words are not stemmed. (See chapter 8.11.2 of the Postgres 9.1
documentation.)
test: ./bin/test -vvt lib/lp/answers/doc/faq.txt
./bin/test -vvt lib/lp/answers/stories/this-is-a-faq.txt
lib/lp/answers/model/question.py
class QuestionSearch is the base class a number of other classes.
Most of them store search phrases provided by a user in
self.search_text, with one exception: class SimilarQuestionsSearch.
The class stores the result of nl_phrase_search() in self.search_text.
I added the flag self.nl_phrase_used to keep track of this difference.
The methods getConstraints() and getOrderByClause() now generate
SQL expressions that are useful both for "plain" search texts and
for texts processed by for nl_phrase_search().
test: ./bin/test -vvt lib/lp/answers/doc/questiontarget.txt
(no explicit additions or changes in this doc test, but the changes
described above fix failures.)
lib/lp/bugs/model/bugtasksearch.py
Correct treatment of BugTaskSearchParams.fast_searchtext, different
from BugTaskSearchParams.searchtext. The latter property stores
search text as provided by a user, the former stores the result of
an nl_phrase_search() call. fast_searchtext is now treated as
described above. I had to change lib/lp/bugs/doc/bugtask-search.txt
The first change in this file (line 138) showed an improper use of
fast_searchtext: That property is supposed to contain only stemmed
words, which are always lower case. With my changes to
bugtasksearch.py, the example fails. I added a few lines to explain
the difference between the the BugTaskSearchParams properties.
The change in lib/lp/bugs/doc/bugtask-find-similar.txt -- one more
search result -- is caused because the search word "cheese" is no
longer stemmed twice before being used in an SQL expression like
"table.fti @@ 'chees'::tsquery.
test: ./bin/test -vvt lib/lp/bugs/doc/bugtask-search.txt
./bin/test -vvt lib/lp/bugs/doc/bugtask-find-similar.txt
./bin/test -vvt lp.bugs.model.tests.test_bugtasksearch.*test_fast_fulltext_search
lib/lp/registry/browser/product.py
Product.search_results() builds a query where a number of words are
OR combined. The method used the no longer working operator '|';
now changed to 'OR'.
test: ./bin/test -vvt lib/lp/registry/stories/product/xx-product-add.txt
(yes -- testing this change only in a story is not strictly sufficient --
but the feature "search for any word" should have had it's own test before.
I can add a test if it's really necessary.)
lib/lp/registry/model/person.py
The methods find(text,...), findPerson(text,...), findTeam(text,...) of
class PersonSet called "text = text.lower()" before doing any further
processing. This breaks ftq() when the query includes the operators
AND, OR, NOT: They are only treated as operators if they are upper case.
Since these methods also use things like
EmailAddress.email.lower().startswith(search_text)
where search_text should of course be lower case, the code looks now a
bit more complicated. I added a few tests to test_personset.py to
check that the variants "plain search text" and "lower case search text"
are properly used.
test: ./bin/test -vvt test_personset.*test_find
./bin/test -vvt lib/lp/registry/doc/vocabularies.txt
(the changed methods are used in "vocabulary filtering".)
lib/lp/registry/vocabularies.py
Vocabularies have a method search(text, ...), where text is used in
SQL expressions like "Product.name LIKE 'text'" as well as
"Product.fti @@ ftq('text')".
These methods used, like Person.findSomething(...), the lower
case variant for both expressions. I changed this for the FTI query.
test: ./bin/test registry -vvt test_person_vocabularies.*test_search_accepts_or_expressions
./bin/test registry -vvt test_product_vocabularies.*test_search_with_or_expression
./bin/test registry -vvt test_projectgroup_vocabulary
Lint: Lots of messages about moin headers in doc tests and related
doc test complaints, but none of them are related to my changes. (OK,
except that I added a moin header for consistency's sake to
lib/lp/bugs/doc/bugtask-search.txt.)
--
https://code.launchpad.net/~adeuring/launchpad/bug-1020443-2/+merge/116876
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~adeuring/launchpad/bug-1020443-2 into lp:launchpad.
=== added file 'database/schema/patch-2209-24-3.sql'
--- database/schema/patch-2209-24-3.sql 1970-01-01 00:00:00 +0000
+++ database/schema/patch-2209-24-3.sql 2012-07-26 14:34:45 +0000
@@ -0,0 +1,124 @@
+-- Copyright 2012 Canonical Ltd. This software is licensed under the
+-- GNU Affero General Public License version 3 (see the file LICENSE).
+
+SET client_min_messages=ERROR;
+
+CREATE OR REPLACE FUNCTION _ftq(text) RETURNS text
+ LANGUAGE plpythonu IMMUTABLE STRICT
+ AS $_$
+ import re
+
+ # I think this method would be more robust if we used a real
+ # tokenizer and parser to generate the query string, but we need
+ # something suitable for use as a stored procedure which currently
+ # means no external dependancies.
+
+ # Convert to Unicode
+ query = args[0].decode('utf8')
+ ## plpy.debug('1 query is %s' % repr(query))
+
+ # Replace tsquery operators with ' '.
+ query = re.sub('[|&!]', ' ', query)
+
+ # Normalize whitespace
+ query = re.sub("(?u)\s+"," ", query)
+
+ # Convert AND, OR, NOT to tsearch2 punctuation
+ query = re.sub(r"(?u)\bAND\b", "&", query)
+ query = re.sub(r"(?u)\bOR\b", "|", query)
+ query = re.sub(r"(?u)\bNOT\b", " !", query)
+ ## plpy.debug('2 query is %s' % repr(query))
+
+ # Deal with unwanted punctuation.
+ # ':' is used in queries to specify a weight of a word.
+ # '\' is treated differently in to_tsvector() and to_tsquery().
+ punctuation = r'[:\\]'
+ query = re.sub(r"(?u)%s+" % (punctuation,), " ", query)
+ ## plpy.debug('3 query is %s' % repr(query))
+
+ # Now that we have handle case sensitive booleans, convert to lowercase
+ query = query.lower()
+
+ # Remove unpartnered bracket on the left and right
+ query = re.sub(r"(?ux) ^ ( [^(]* ) \)", r"(\1)", query)
+ query = re.sub(r"(?ux) \( ( [^)]* ) $", r"(\1)", query)
+
+ # Remove spurious brackets
+ query = re.sub(r"(?u)\(([^\&\|]*?)\)", r" \1 ", query)
+ ## plpy.debug('5 query is %s' % repr(query))
+
+ # Insert & between tokens without an existing boolean operator
+ # ( not proceeded by (|&!
+ query = re.sub(r"(?u)(?<![\(\|\&\!])\s*\(", "&(", query)
+ ## plpy.debug('6 query is %s' % repr(query))
+ # ) not followed by )|&
+ query = re.sub(r"(?u)\)(?!\s*(\)|\||\&|\s*$))", ")&", query)
+ ## plpy.debug('6.1 query is %s' % repr(query))
+ # Whitespace not proceded by (|&! not followed by &|
+ query = re.sub(r"(?u)(?<![\(\|\&\!\s])\s+(?![\&\|\s])", "&", query)
+ ## plpy.debug('7 query is %s' % repr(query))
+
+ # Detect and repair syntax errors - we are lenient because
+ # this input is generally from users.
+
+ # Fix unbalanced brackets
+ openings = query.count("(")
+ closings = query.count(")")
+ if openings > closings:
+ query = query + " ) "*(openings-closings)
+ elif closings > openings:
+ query = " ( "*(closings-openings) + query
+ ## plpy.debug('8 query is %s' % repr(query))
+
+ # Strip ' character that do not have letters on both sides
+ query = re.sub(r"(?u)((?<!\w)'|'(?!\w))", "", query)
+
+ # Brackets containing nothing but whitespace and booleans, recursive
+ last = ""
+ while last != query:
+ last = query
+ query = re.sub(r"(?u)\([\s\&\|\!]*\)", "", query)
+ ## plpy.debug('9 query is %s' % repr(query))
+
+ # An & or | following a (
+ query = re.sub(r"(?u)(?<=\()[\&\|\s]+", "", query)
+ ## plpy.debug('10 query is %s' % repr(query))
+
+ # An &, | or ! immediatly before a )
+ query = re.sub(r"(?u)[\&\|\!\s]*[\&\|\!]+\s*(?=\))", "", query)
+ ## plpy.debug('11 query is %s' % repr(query))
+
+ # An &,| or ! followed by another boolean.
+ query = re.sub(r"(?ux) \s* ( [\&\|\!] ) [\s\&\|]+", r"\1", query)
+ ## plpy.debug('12 query is %s' % repr(query))
+
+ # Leading & or |
+ query = re.sub(r"(?u)^[\s\&\|]+", "", query)
+ ## plpy.debug('13 query is %s' % repr(query))
+
+ # Trailing &, | or !
+ query = re.sub(r"(?u)[\&\|\!\s]+$", "", query)
+ ## plpy.debug('14 query is %s' % repr(query))
+
+ # If we have nothing but whitespace and tsearch2 operators,
+ # return NULL.
+ if re.search(r"(?u)^[\&\|\!\s\(\)]*$", query) is not None:
+ return None
+
+ # Convert back to UTF-8
+ query = query.encode('utf8')
+ ## plpy.debug('15 query is %s' % repr(query))
+
+ return query or None
+ $_$;
+
+CREATE OR REPLACE FUNCTION ftq(text) RETURNS pg_catalog.tsquery
+ LANGUAGE plpythonu IMMUTABLE STRICT
+ AS $_$
+ p = plpy.prepare(
+ "SELECT to_tsquery('default', _ftq($1)) AS x", ["text"])
+ query = plpy.execute(p, args, 1)[0]["x"]
+ return query or None
+ $_$;
+
+INSERT INTO LaunchpadDatabaseRevision VALUES (2209, 24, 3);
=== modified file 'lib/lp/answers/doc/faq-vocabulary.txt'
--- lib/lp/answers/doc/faq-vocabulary.txt 2011-12-24 17:49:30 +0000
+++ lib/lp/answers/doc/faq-vocabulary.txt 2012-07-26 14:34:45 +0000
@@ -82,7 +82,5 @@
2
>>> for term in terms:
... print term.title
+ How do I install Extensions?
How do I troubleshoot problems with extensions/themes?
- How do I install Extensions?
-
-
=== modified file 'lib/lp/answers/doc/faq.txt'
--- lib/lp/answers/doc/faq.txt 2011-12-24 17:49:30 +0000
+++ lib/lp/answers/doc/faq.txt 2012-07-26 14:34:45 +0000
@@ -232,7 +232,7 @@
>>> from lp.registry.interfaces.person import IPersonSet
>>> foo_bar = getUtility(IPersonSet).getByEmail('foo.bar@xxxxxxxxxxxxx')
>>> for faq in faqset.searchFAQs(
- ... search_text='java | flash', owner=foo_bar):
+ ... search_text='java OR flash', owner=foo_bar):
... print '%s (%s)' % (faq.title, faq.target.displayname)
How do I install plugins (Shockwave, QuickTime, etc.)? (Mozilla Firefox)
How can I play MP3/Divx/DVDs/Quicktime/Realmedia files
=== modified file 'lib/lp/answers/model/faq.py'
--- lib/lp/answers/model/faq.py 2011-12-30 06:14:56 +0000
+++ lib/lp/answers/model/faq.py 2012-07-26 14:34:45 +0000
@@ -138,7 +138,7 @@
return FAQ.select(
'%s AND FAQ.fti @@ %s' % (target_constraint, quote(fti_search)),
orderBy=[
- SQLConstant("-rank(FAQ.fti, ftq(%s))" % quote(fti_search)),
+ SQLConstant("-rank(FAQ.fti, %s::tsquery)" % quote(fti_search)),
"-FAQ.date_created"])
@staticmethod
=== modified file 'lib/lp/answers/model/question.py'
--- lib/lp/answers/model/question.py 2011-12-30 06:14:56 +0000
+++ lib/lp/answers/model/question.py 2012-07-26 14:34:45 +0000
@@ -864,6 +864,7 @@
product=None, distribution=None, sourcepackagename=None,
project=None):
self.search_text = search_text
+ self.nl_phrase_used = False
if zope_isinstance(status, DBItem):
self.status = [status]
@@ -944,8 +945,12 @@
constraints = self.getTargetConstraints()
if self.search_text is not None:
- constraints.append(
- 'Question.fti @@ ftq(%s)' % quote(self.search_text))
+ if self.nl_phrase_used:
+ constraints.append(
+ 'Question.fti @@ %s' % quote(self.search_text))
+ else:
+ constraints.append(
+ 'Question.fti @@ ftq(%s)' % quote(self.search_text))
if self.status:
constraints.append('Question.status IN %s' % sqlvalues(
@@ -1009,10 +1014,16 @@
elif sort is QuestionSort.RELEVANCY:
if self.search_text:
# SQLConstant is a workaround for bug 53455
- return [SQLConstant(
- "-rank(Question.fti, ftq(%s))" % quote(
- self.search_text)),
- "-Question.datecreated"]
+ if self.nl_phrase_used:
+ return [SQLConstant(
+ "-rank(Question.fti, %s::tsquery)" % quote(
+ self.search_text)),
+ "-Question.datecreated"]
+ else:
+ return [SQLConstant(
+ "-rank(Question.fti, ftq(%s))" % quote(
+ self.search_text)),
+ "-Question.datecreated"]
else:
return "-Question.datecreated"
elif sort is QuestionSort.RECENT_OWNER_ACTIVITY:
@@ -1113,6 +1124,7 @@
# similarity search algorithm.
self.search_text = nl_phrase_search(
title, Question, " AND ".join(self.getTargetConstraints()))
+ self.nl_phrase_used = True
class QuestionPersonSearch(QuestionSearch):
=== modified file 'lib/lp/answers/stories/this-is-a-faq.txt'
--- lib/lp/answers/stories/this-is-a-faq.txt 2011-12-23 23:44:59 +0000
+++ lib/lp/answers/stories/this-is-a-faq.txt 2012-07-26 14:34:45 +0000
@@ -66,8 +66,8 @@
... print radio, label, link
>>> printFAQOptions(user_browser.contents)
(*) No existing FAQs are relevant
+ ( ) 8: How do I install Extensions?
( ) 9: How do I troubleshoot problems with extensions/themes?
- ( ) 8: How do I install Extensions?
>>> print user_browser.getLink('How do I troubleshoot problems').url
http://answers.launchpad.dev/firefox/+faq/9
@@ -157,8 +157,8 @@
>>> printFAQOptions(user_browser.contents)
( ) No existing FAQs are relevant
(*) 10: How do I install plugins (Shockwave, QuickTime, etc.)?
+ ( ) 8: How do I install Extensions?
( ) 9: How do I troubleshoot problems with extensions/themes?
- ( ) 8: How do I install Extensions?
He changes the message and click 'Link to FAQ'.
=== modified file 'lib/lp/bugs/doc/bugtask-find-similar.txt'
--- lib/lp/bugs/doc/bugtask-find-similar.txt 2012-04-13 21:04:08 +0000
+++ lib/lp/bugs/doc/bugtask-find-similar.txt 2012-07-26 14:34:45 +0000
@@ -142,6 +142,7 @@
... distribution=test_distro)
>>> for bugtask in similar_bugs:
... print bugtask.bug.title
+ Nothing to do with cheese or sandwiches
This cheese sandwich should show up
>>> similar_bugs = getUtility(IBugTaskSet).findSimilar(
=== modified file 'lib/lp/bugs/doc/bugtask-search.txt'
--- lib/lp/bugs/doc/bugtask-search.txt 2012-05-14 07:30:13 +0000
+++ lib/lp/bugs/doc/bugtask-search.txt 2012-07-26 14:34:45 +0000
@@ -138,7 +138,7 @@
For example, there are no bugs with the word 'Fnord' in Firefox.
- >>> text_search = BugTaskSearchParams(user=None, fast_searchtext=u'Fnord')
+ >>> text_search = BugTaskSearchParams(user=None, searchtext=u'Fnord')
>>> found_bugtasks = firefox.searchTasks(text_search)
>>> found_bugtasks.count()
0
@@ -156,6 +156,63 @@
... print "#%s" % bugtask.bug.id
#4
+=== BugTaskSearchParams' parameters searchtext and fast_searchtext ===
+
+Normally, the parameter searchtext should be used. The alternative
+parameter fast_searchtext requires a syntactically correct tsquery
+expression containing stemmed words.
+
+A simple phrase can be passed as searchtext, but not as fast_searchtext,
+see below.
+
+ >>> good_search = BugTaskSearchParams(
+ ... user=None, searchtext=u'happens pretty often')
+ >>> found_bugtasks = firefox.searchTasks(good_search)
+ >>> for bugtask in found_bugtasks:
+ ... print "#%s" % bugtask.bug.id
+ #4
+
+The unstemmed word "happens" does not yield any results when used
+as fast_textsearch.
+
+ >>> bad_search = BugTaskSearchParams(
+ ... user=None, fast_searchtext=u'happens')
+ >>> found_bugtasks = firefox.searchTasks(bad_search)
+ >>> print found_bugtasks.count()
+ 0
+
+If the stem of "happens" is used, we get results.
+
+ >>> good_search = BugTaskSearchParams(
+ ... user=None, fast_searchtext=u'happen')
+ >>> found_bugtasks = firefox.searchTasks(good_search)
+ >>> for bugtask in found_bugtasks:
+ ... print "#%s" % bugtask.bug.id
+ #4
+ #6
+
+Stemmed words may be combined into a valid tsquery expression.
+
+ >>> good_search = BugTaskSearchParams(
+ ... user=None, fast_searchtext=u'happen&pretti&often')
+ >>> found_bugtasks = firefox.searchTasks(good_search)
+ >>> for bugtask in found_bugtasks:
+ ... print "#%s" % bugtask.bug.id
+ #4
+
+Passing invalid tsquery expressions as fast_searchtext raises an exception.
+
+ >>> bad_search = BugTaskSearchParams(
+ ... user=None, fast_searchtext=u'happens pretty often')
+ >>> list(firefox.searchTasks(bad_search))
+ Traceback (most recent call last):
+ ...
+ ProgrammingError: syntax error in tsquery: "happens pretty often"
+ ...
+
+ >>> import transaction
+ >>> transaction.abort()
+
== Searching by bug reporter ==
@@ -340,7 +397,6 @@
bugs are found for him:
>>> from lp.registry.interfaces.person import IPersonSet
- >>> import transaction
>>> transaction.abort()
>>> no_priv = getUtility(IPersonSet).getByName('no-priv')
=== modified file 'lib/lp/bugs/model/bugtasksearch.py'
--- lib/lp/bugs/model/bugtasksearch.py 2012-07-16 01:24:26 +0000
+++ lib/lp/bugs/model/bugtasksearch.py 2012-07-26 14:34:45 +0000
@@ -862,18 +862,21 @@
assert params.searchtext is None, (
'Cannot use searchtext at the same time as fast_searchtext.')
searchtext = params.fast_searchtext
+ fti_expression = "?::tsquery"
else:
assert params.fast_searchtext is None, (
'Cannot use fast_searchtext at the same time as searchtext.')
searchtext = params.searchtext
+ fti_expression = "ftq(?)"
if params.orderby is None:
# Unordered search results aren't useful, so sort by relevance
# instead.
params.orderby = [
- SQL("-rank(BugTaskFlat.fti, ftq(?))", params=(searchtext,))]
+ SQL("-rank(BugTaskFlat.fti, %s)" % fti_expression,
+ params=(searchtext,))]
- return SQL("BugTaskFlat.fti @@ ftq(?)", params=(searchtext,))
+ return SQL("BugTaskFlat.fti @@ %s" % fti_expression, params=(searchtext,))
def _build_status_clause(col, status):
=== modified file 'lib/lp/bugs/model/tests/test_bugtasksearch.py'
--- lib/lp/bugs/model/tests/test_bugtasksearch.py 2012-07-24 04:05:21 +0000
+++ lib/lp/bugs/model/tests/test_bugtasksearch.py 2012-07-26 14:34:45 +0000
@@ -285,9 +285,11 @@
def test_fast_fulltext_search(self):
# Fast full text searches find text indexed by Bug.fti...
+ # Note that a valid tsquery expression with stemmed words must
+ # be specified.
self.setUpFullTextSearchTests()
params = self.getBugTaskSearchParams(
- user=None, fast_searchtext=u'one title')
+ user=None, fast_searchtext=u'one&titl')
self.assertSearchFinds(params, self.bugtasks[:1])
def test_tags(self):
@@ -750,7 +752,7 @@
# Someone without permission to see deactiveated projects does
# not see bugtasks for deactivated projects.
bugtask_set = getUtility(IBugTaskSet)
- param = BugTaskSearchParams(user=None, fast_searchtext=u'Monkeys')
+ param = BugTaskSearchParams(user=None, searchtext=u'Monkeys')
results = bugtask_set.search(param, _noprejoins=True)
self.assertEqual([self.active_bugtask], list(results))
=== modified file 'lib/lp/registry/browser/product.py'
--- lib/lp/registry/browser/product.py 2012-07-13 08:29:56 +0000
+++ lib/lp/registry/browser/product.py 2012-07-26 14:34:45 +0000
@@ -205,7 +205,7 @@
)
-OR = '|'
+OR = ' OR '
SPACE = ' '
=== modified file 'lib/lp/registry/doc/vocabularies.txt'
--- lib/lp/registry/doc/vocabularies.txt 2012-07-06 19:36:03 +0000
+++ lib/lp/registry/doc/vocabularies.txt 2012-07-26 14:34:45 +0000
@@ -684,6 +684,11 @@
>>> checked_count == len(INACTIVE_ACCOUNT_STATUSES)
True
+It is possible to search for alternative names.
+
+ >>> [person.name for person in vocab.search('matsubara OR salgado')]
+ [u'matsubara', u'salgado']
+
AdminMergeablePerson
--------------------
@@ -738,6 +743,11 @@
>>> fooperson in vocab
False
+It is possible to search for alternative names.
+
+ >>> [(p.name) for p in vocab.search('matsubara OR salgado')]
+ [u'matsubara', u'salgado']
+
ValidPersonOrTeam
.................
@@ -1050,7 +1060,7 @@
[(u'testing Spanish team', u'Carlos Perell\xf3 Mar\xedn')]
>>> sorted((team.displayname, team.teamowner.displayname)
- ... for team in vocab.search('spanish | ubuntu'))
+ ... for team in vocab.search('spanish OR ubuntu'))
[(u'Mirror Administrators', u'Mark Shuttleworth'),
(u'Ubuntu Gnome Team', u'Mark Shuttleworth'),
(u'Ubuntu Security Team', u'Colin Watson'),
=== modified file 'lib/lp/registry/model/person.py'
--- lib/lp/registry/model/person.py 2012-07-16 20:42:12 +0000
+++ lib/lp/registry/model/person.py 2012-07-26 14:34:45 +0000
@@ -3483,7 +3483,8 @@
return EmptyResultSet()
orderBy = Person._sortingColumnsForSetOperations
- text = ensure_unicode(text).lower()
+ text = ensure_unicode(text)
+ lower_case_text = text.lower()
# Teams may not have email addresses, so we need to either use a LEFT
# OUTER JOIN or do a UNION between four queries. Using a UNION makes
# it a lot faster than with a LEFT OUTER JOIN.
@@ -3493,7 +3494,7 @@
EmailAddress.person == Person.id,
Person.account == Account.id,
Not(Account.status.is_in(INACTIVE_ACCOUNT_STATUSES)),
- EmailAddress.email.lower().startswith(text))
+ EmailAddress.email.lower().startswith(lower_case_text))
store = IStore(Person)
@@ -3516,7 +3517,7 @@
results = results.union(store.find(
Person, person_name_query)).order_by()
- team_email_query = self._teamEmailQuery(text)
+ team_email_query = self._teamEmailQuery(lower_case_text)
results = results.union(
store.find(Person, team_email_query)).order_by()
team_name_query = self._teamNameQuery(text)
@@ -3530,7 +3531,7 @@
must_have_email=False, created_after=None, created_before=None):
"""See `IPersonSet`."""
orderBy = Person._sortingColumnsForSetOperations
- text = ensure_unicode(text).lower()
+ text = ensure_unicode(text)
store = IStore(Person)
base_query = And(
Person.teamowner == None,
@@ -3570,7 +3571,7 @@
email_query = And(
base_query,
EmailAddress.person == Person.id,
- EmailAddress.email.lower().startswith(ensure_unicode(text)))
+ EmailAddress.email.lower().startswith(text.lower()))
name_query = And(
base_query,
@@ -3583,11 +3584,11 @@
def findTeam(self, text=""):
"""See `IPersonSet`."""
orderBy = Person._sortingColumnsForSetOperations
- text = ensure_unicode(text).lower()
+ text = ensure_unicode(text)
# Teams may not have email addresses, so we need to either use a LEFT
# OUTER JOIN or do a UNION between two queries. Using a UNION makes
# it a lot faster than with a LEFT OUTER JOIN.
- email_query = self._teamEmailQuery(text)
+ email_query = self._teamEmailQuery(text.lower())
store = IStore(Person)
email_results = store.find(Person, email_query).order_by()
name_query = self._teamNameQuery(text)
=== modified file 'lib/lp/registry/tests/test_person_vocabularies.py'
--- lib/lp/registry/tests/test_person_vocabularies.py 2012-07-20 03:15:04 +0000
+++ lib/lp/registry/tests/test_person_vocabularies.py 2012-07-26 14:34:45 +0000
@@ -25,6 +25,7 @@
from lp.services.webapp.vocabulary import FilteredVocabularyBase
from lp.testing import (
login_person,
+ person_logged_in,
StormStatementRecorder,
TestCaseWithFactory,
)
@@ -188,6 +189,22 @@
name="fredteam", email="fredteam@xxxxxxx")
self._team_filter_tests([team])
+ def test_search_accepts_or_expressions(self):
+ person = self.factory.makePerson(name='baz')
+ team = self.factory.makeTeam(name='blah')
+ result = list(self.searchVocabulary(None, 'baz OR blah'))
+ self.assertEqual([person, team], result)
+ private_team_one = self.factory.makeTeam(
+ name='private-eye', visibility=PersonVisibility.PRIVATE,
+ owner=person)
+ private_team_two = self.factory.makeTeam(
+ name='paranoid', visibility=PersonVisibility.PRIVATE,
+ owner=person)
+ with person_logged_in(person):
+ result = list(
+ self.searchVocabulary(None, 'paranoid OR private-eye'))
+ self.assertEqual([private_team_one, private_team_two], result)
+
class TestValidPersonOrTeamPreloading(VocabularyTestBase,
TestCaseWithFactory):
@@ -373,13 +390,20 @@
def test_unvalidated_emails_ignored(self):
person = self.factory.makePerson()
- unvalidated_email = self.factory.makeEmail(
+ self.factory.makeEmail(
'fnord@xxxxxxxxxxx',
person,
email_status=EmailAddressStatus.NEW)
search = self.searchVocabulary(None, 'fnord@xxxxxxxxxxx')
self.assertEqual([], [s for s in search])
+ def test_search_accepts_or_expressions(self):
+ team_one = self.factory.makeTeam(name='baz')
+ team_two = self.factory.makeTeam(name='blah')
+ result = list(self.searchVocabulary(None, 'baz OR blah'))
+ self.assertEqual([team_one, team_two], result)
+
+
class TestNewPillarGranteeVocabulary(VocabularyTestBase,
TestCaseWithFactory):
"""Test that the NewPillarGranteeVocabulary behaves as expected."""
=== modified file 'lib/lp/registry/tests/test_personset.py'
--- lib/lp/registry/tests/test_personset.py 2012-06-07 21:38:31 +0000
+++ lib/lp/registry/tests/test_personset.py 2012-07-26 14:34:45 +0000
@@ -117,13 +117,13 @@
def test_getByEmail_ignores_unvalidated_emails(self):
person = self.factory.makePerson()
- unvalidated_email = self.factory.makeEmail(
+ self.factory.makeEmail(
'fnord@xxxxxxxxxxx',
person,
email_status=EmailAddressStatus.NEW)
found = self.person_set.getByEmail('fnord@xxxxxxxxxxx')
self.assertTrue(found is None)
-
+
def test_getPrecachedPersonsFromIDs(self):
# The getPrecachedPersonsFromIDs() method should only make one
# query to load all the extraneous data. Accessing the
@@ -183,6 +183,57 @@
self.person_set.getByOpenIDIdentifier(
u'http://not.launchpad.dev/+id/%s' % identifier))
+ def test_find__accepts_queries_with_or_operator(self):
+ # PersonSet.find() allows to search for OR combined terms.
+ person_one = self.factory.makePerson(name='baz')
+ person_two = self.factory.makeTeam(name='blah')
+ result = list(self.person_set.find('baz OR blah'))
+ self.assertEqual([person_one, person_two], result)
+
+ def test_findPerson__accepts_queries_with_or_operator(self):
+ # PersonSet.findPerson() allows to search for OR combined terms.
+ person_one = self.factory.makePerson(
+ name='baz', email='one@xxxxxxxxxxx')
+ person_two = self.factory.makePerson(
+ name='blah', email='two@xxxxxxxxxxx')
+ result = list(self.person_set.findPerson('baz OR blah'))
+ self.assertEqual([person_one, person_two], result)
+ # Note that these OR searches do not work for email addresses.
+ result = list(self.person_set.findPerson(
+ 'one@xxxxxxxxxxx OR two@xxxxxxxxxxx'))
+ self.assertEqual([], result)
+
+ def test_findPerson__case_insensitive_email_address_search(self):
+ # A search for email addresses is case insensitve.
+ person_one = self.factory.makePerson(
+ name='baz', email='ONE@xxxxxxxxxxx')
+ person_two = self.factory.makePerson(
+ name='blah', email='two@xxxxxxxxxxx')
+ result = list(self.person_set.findPerson('one@xxxxxxxxxxx'))
+ self.assertEqual([person_one], result)
+ result = list(self.person_set.findPerson('TWO@xxxxxxxxxxx'))
+ self.assertEqual([person_two], result)
+
+ def test_findTeam__accepts_queries_with_or_operator(self):
+ # PersonSet.findTeam() allows to search for OR combined terms.
+ team_one = self.factory.makeTeam(name='baz', email='ONE@xxxxxxxxxxx')
+ team_two = self.factory.makeTeam(name='blah', email='TWO@xxxxxxxxxxx')
+ result = list(self.person_set.findTeam('baz OR blah'))
+ self.assertEqual([team_one, team_two], result)
+ # Note that these OR searches do not work for email addresses.
+ result = list(self.person_set.findTeam(
+ 'one@xxxxxxxxxxx OR two@xxxxxxxxxxx'))
+ self.assertEqual([], result)
+
+ def test_findTeam__case_insensitive_email_address_search(self):
+ # A search for email addresses is case insensitve.
+ team_one = self.factory.makeTeam(name='baz', email='ONE@xxxxxxxxxxx')
+ team_two = self.factory.makeTeam(name='blah', email='TWO@xxxxxxxxxxx')
+ result = list(self.person_set.findTeam('one@xxxxxxxxxxx'))
+ self.assertEqual([team_one], result)
+ result = list(self.person_set.findTeam('TWO@xxxxxxxxxxx'))
+ self.assertEqual([team_two], result)
+
class TestPersonSetMergeMailingListSubscriptions(TestCaseWithFactory):
=== modified file 'lib/lp/registry/tests/test_product_vocabularies.py'
--- lib/lp/registry/tests/test_product_vocabularies.py 2012-01-01 02:58:52 +0000
+++ lib/lp/registry/tests/test_product_vocabularies.py 2012-07-26 14:34:45 +0000
@@ -66,6 +66,16 @@
self.assertEqual(
[quux_product, bar_product], list(result))
+ def test_search_with_or_expression(self):
+ # Searches for either of two or more names are possible.
+ blah_product = self.factory.makeProduct(
+ name='blah', displayname='Blah', summary='Blah blather')
+ baz_product = self.factory.makeProduct(
+ name='baz', displayname='Baz')
+ result = self.vocabulary.search('blah OR baz')
+ self.assertEqual(
+ [blah_product, baz_product], list(result))
+
def test_exact_match_is_first(self):
# When the flag is enabled, an exact name match always wins.
the_quux_product = self.factory.makeProduct(
=== added file 'lib/lp/registry/tests/test_projectgroup_vocabulary.py'
--- lib/lp/registry/tests/test_projectgroup_vocabulary.py 1970-01-01 00:00:00 +0000
+++ lib/lp/registry/tests/test_projectgroup_vocabulary.py 2012-07-26 14:34:45 +0000
@@ -0,0 +1,26 @@
+# Copyright 2012 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test the ProjectGroup vocabulary."""
+
+__metaclass__ = type
+
+from lp.registry.vocabularies import ProjectGroupVocabulary
+from lp.testing import TestCaseWithFactory
+from lp.testing.layers import DatabaseFunctionalLayer
+
+
+class TestProjectGroupVocabulary(TestCaseWithFactory):
+ """Test that the ProjectGroupVocabulary behaves as expected."""
+ layer = DatabaseFunctionalLayer
+
+ def test_search_with_or_expression(self):
+ # Searches for either of two or more names are possible.
+ blah_group = self.factory.makeProject(
+ name='blah', displayname='Blah', summary='Blah blather')
+ baz_group = self.factory.makeProject(
+ name='baz', displayname='Baz')
+ vocabulary = ProjectGroupVocabulary()
+ result = vocabulary.search('blah OR baz')
+ self.assertEqual(
+ [blah_group, baz_group], list(result))
=== modified file 'lib/lp/registry/vocabularies.py'
--- lib/lp/registry/vocabularies.py 2012-07-20 03:15:04 +0000
+++ lib/lp/registry/vocabularies.py 2012-07-26 14:34:45 +0000
@@ -304,7 +304,8 @@
if query is None or an empty string.
"""
if query:
- query = ensure_unicode(query).lower()
+ query = ensure_unicode(query)
+ like_query = query.lower()
like_query = "'%%' || %s || '%%'" % quote_like(query)
fti_query = quote(query)
sql = "active = 't' AND (name LIKE %s OR fti @@ ftq(%s))" % (
@@ -354,7 +355,8 @@
if query is None or an empty string.
"""
if query:
- query = ensure_unicode(query).lower()
+ query = ensure_unicode(query)
+ like_query = query.lower()
like_query = "'%%' || %s || '%%'" % quote_like(query)
fti_query = quote(query)
sql = "active = 't' AND (name LIKE %s OR fti @@ ftq(%s))" % (
@@ -435,7 +437,7 @@
if not text:
return self.emptySelectResults()
- return self._select(ensure_unicode(text).lower())
+ return self._select(ensure_unicode(text))
class PersonAccountToMergeVocabulary(
@@ -469,7 +471,7 @@
if not text:
return self.emptySelectResults()
- text = ensure_unicode(text).lower()
+ text = ensure_unicode(text)
return self._select(text)
@@ -649,14 +651,15 @@
FROM (
SELECT Person.id,
(case
- when person.name=? then 100
- when person.name like ? || '%%' then 0.6
- when lower(person.displayname) like ? || '%%' then 0.5
+ when person.name=lower(?) then 100
+ when person.name like lower(?) || '%%' then 0.6
+ when lower(person.displayname) like lower(?)
+ || '%%' then 0.5
else rank(fti, ftq(?))
end) as rank
FROM Person
- WHERE Person.name LIKE ? || '%%'
- or lower(Person.displayname) LIKE ? || '%%'
+ WHERE Person.name LIKE lower(?) || '%%'
+ or lower(Person.displayname) LIKE lower(?) || '%%'
or Person.fti @@ ftq(?)
UNION ALL
SELECT Person.id, 0.8 AS rank
@@ -667,7 +670,7 @@
SELECT Person.id, 0.4 AS rank
FROM Person, EmailAddress
WHERE Person.id = EmailAddress.person
- AND LOWER(EmailAddress.email) LIKE ? || '%%'
+ AND LOWER(EmailAddress.email) LIKE lower(?) || '%%'
AND status IN (?, ?)
) AS person_match
GROUP BY id, is_private_team
@@ -680,9 +683,10 @@
private_tables = [Person] + private_tables
private_ranking_sql = SQL("""
(case
- when person.name=? then 100
- when person.name like ? || '%%' then 0.6
- when lower(person.displayname) like ? || '%%' then 0.5
+ when person.name=lower(?) then 100
+ when person.name like lower(?) || '%%' then 0.6
+ when lower(person.displayname) like lower(?)
+ || '%%' then 0.5
else rank(fti, ftq(?))
end) as rank
""", (text, text, text, text))
@@ -696,8 +700,8 @@
SQL("true as is_private_team")),
where=And(
SQL("""
- Person.name LIKE ? || '%%'
- OR lower(Person.displayname) LIKE ? || '%%'
+ Person.name LIKE lower(?) || '%%'
+ OR lower(Person.displayname) LIKE lower(?) || '%%'
OR Person.fti @@ ftq(?)
""", [text, text, text]),
private_query))
@@ -792,7 +796,7 @@
else:
return self.emptySelectResults()
- text = ensure_unicode(text).lower()
+ text = ensure_unicode(text)
return self._doSearch(text=text, vocab_filter=vocab_filter)
def searchForTerms(self, query=None, vocab_filter=None):
@@ -838,8 +842,8 @@
result = self.store.using(*tables).find(Person, query)
else:
name_match_query = SQL("""
- Person.name LIKE ? || '%%'
- OR lower(Person.displayname) LIKE ? || '%%'
+ Person.name LIKE lower(?) || '%%'
+ OR lower(Person.displayname) LIKE lower(?) || '%%'
OR Person.fti @@ ftq(?)
""", [text, text, text]),
=== modified file 'lib/lp/services/database/doc/textsearching.txt'
--- lib/lp/services/database/doc/textsearching.txt 2012-06-26 09:40:38 +0000
+++ lib/lp/services/database/doc/textsearching.txt 2012-07-26 14:34:45 +0000
@@ -172,23 +172,16 @@
>>> ftq('hi AND mom')
hi&mom <=> 'hi' & 'mom'
- >>> ftq('hi & mom')
- hi&mom <=> 'hi' & 'mom'
-
>>> ftq('hi OR mom')
hi|mom <=> 'hi' | 'mom'
- >>> ftq('hi | mom')
- hi|mom <=> 'hi' | 'mom'
-
- >>> ftq('hi & -dad')
+ >>> ftq('hi AND NOT dad')
hi&!dad <=> 'hi' & !'dad'
-
Brackets are allowed to specify precidence
- >>> ftq('(HI OR HELLO) & mom')
+ >>> ftq('(HI OR HELLO) AND mom')
(hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
>>> ftq('Hi(Mom)')
@@ -203,19 +196,16 @@
>>> ftq('foo(bar OR baz)') # Bug #32071
foo&(bar|baz) <=> 'foo' & ( 'bar' | 'baz' )
- >>> ftq('foo (bar OR baz)')
- foo&(bar|baz) <=> 'foo' & ( 'bar' | 'baz' )
-
We also support negation
- >>> ftq('!Hi')
+ >>> ftq('NOT Hi')
!hi <=> !'hi'
- >>> ftq('-(Hi & Mom)')
+ >>> ftq('NOT(Hi AND Mom)')
!(hi&mom) <=> !( 'hi' & 'mom' )
- >>> ftq('Foo & ! Bar')
+ >>> ftq('Foo AND NOT Bar')
foo&!bar <=> 'foo' & !'bar'
@@ -224,7 +214,7 @@
>>> ftq('Hi Mom')
hi&mom <=> 'hi' & 'mom'
- >>> ftq('Hi -mom')
+ >>> ftq('Hi NOT mom')
hi&!mom <=> 'hi' & !'mom'
>>> ftq('hi (mom OR mum)')
@@ -233,18 +223,34 @@
>>> ftq('(hi OR hello) mom')
(hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
- >>> ftq('(hi OR hello) -mom')
+ >>> ftq('(hi OR hello) NOT mom')
(hi|hello)&!mom <=> ( 'hi' | 'hello' ) & !'mom'
>>> ftq('(hi ho OR hoe) work go')
(hi&ho|hoe)&work&go <=> ( 'hi' & 'ho' | 'hoe' ) & 'work' & 'go'
-If a single '-' precedes a word, it is converted into the '!' operator.
-Note also that a trailing '-' is dropped by to_tsquery().
-
- >>> ftq('-foo bar-')
- !foo&bar- <=> !'foo' & 'bar'
+'-' symbols are treated by the Postgres FTI parser context sensitive.
+If they precede a word, they are removed.
+
+ >>> print search_same('foo -bar')
+ FTI data: 'bar':2 'foo':1
+ query: 'foo' & 'bar'
+ match: True
+
+If a '-' precedes a number, it is retained.
+
+ >>> print search_same('123 -456')
+ FTI data: '-456':2 '123':1
+ query: '123' & '-456'
+ match: True
+
+Trailing '-' are always ignored.
+
+ >>> print search_same('bar- 123-')
+ FTI data: '123':2 'bar':1
+ query: 'bar' & '123'
+ match: True
Repeated '-' are simply ignored by to_tsquery().
@@ -259,6 +265,12 @@
query: 'foo-bar' & 'foo' & 'bar'
match: True
+A '-' surrounded by numbers is treated as the sign of the right-hand number.
+
+ >>> print search_same('123-456')
+ FTI data: '-456':2 '123':1
+ query: '123' & '-456'
+ match: True
Punctuation is handled consistently. If a string containing punctuation
appears in an FTI, it can also be passed to ftq(),and a search for this
@@ -342,11 +354,36 @@
>>> print search('some text <div>whatever</div>', 'div')
FTI data: 'text':2 'whatev':3 query: 'div' match: False
-Treatment of characters that are used as operators in to_tsquery():
+The symbols '&', '|' and '!' are treated as operators by to_tsquery();
+to_tsvector() treats them as whitespace. ftq() converts the words 'AND',
+'OR', 'NOT' are into these operators expected by to_tsquery(), and it
+replaces the symbols '&', '|' and '!' with spaces. This avoids
+surprising search results when the operator symbols appear accidentally
+in search terms, e.g., by using a plain copy of a source code line as
+the search term.
>>> ftq('cool!')
cool <=> 'cool'
+ >>> print search_same('Shell scripts usually start with #!/bin/sh.')
+ FTI data: '/bin/sh':6 'script':2 'shell':1 'start':4 'usual':3
+ query: 'shell' & 'script' & 'usual' & 'start' & '/bin/sh'
+ match: True
+
+ >>> print search_same('int foo = (bar & ! baz) | bla;')
+ FTI data: 'bar':3 'baz':4 'bla':5 'foo':2 'int':1
+ query: 'int' & 'foo' & 'bar' & 'baz' & 'bla'
+ match: True
+
+Queries containing only punctuation symbols yield an empty ts_query
+object. Note that _ftq() first replaces the '!' with a ' '; later on,
+_ftq() joins the two remaining terms '?' and '.' with the "AND"
+operator '&'. Finally, to_tsquery() detects the AND combination of
+two symbols that are not tokenized and returns null.
+
+ >>> ftq('?!.') # Bug 1020443
+ ?&. <=> None
+
Email addresses are retained as a whole, both by to_tsvector() and by
ftq().
@@ -430,11 +467,17 @@
>>> ftq("administrate")
administrate <=> 'administr'
+Note that stemming is not always idempotent:
+
+ >>> ftq('extension')
+ extension <=> 'extens'
+ >>> ftq('extens')
+ extens <=> 'exten'
Dud queries are 'repaired', such as doubled operators, trailing operators
or invalid leading operators
- >>> ftq('hi & OR mom')
+ >>> ftq('hi AND OR mom')
hi&mom <=> 'hi' & 'mom'
>>> ftq('(hi OR OR hello) AND mom')
@@ -443,7 +486,7 @@
>>> ftq('(hi OR AND hello) AND mom')
(hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
- >>> ftq('(hi OR -AND hello) AND mom')
+ >>> ftq('(hi OR NOT AND hello) AND mom')
(hi|!hello)&mom <=> ( 'hi' | !'hello' ) & 'mom'
>>> ftq('(hi OR - AND hello) AND mom')
@@ -452,13 +495,13 @@
>>> ftq('hi AND mom AND')
hi&mom <=> 'hi' & 'mom'
- >>> ftq('& hi & mom')
+ >>> ftq('AND hi AND mom')
hi&mom <=> 'hi' & 'mom'
- >>> ftq('(& hi | hello) AND mom')
+ >>> ftq('(AND hi OR hello) AND mom')
(hi|hello)&mom <=> ( 'hi' | 'hello' ) & 'mom'
- >>> ftq('() hi mom ( ) ((! |((&)))) :-)')
+ >>> ftq('() hi mom ( ) ((NOT OR((AND)))) :-)')
(hi&mom&-) <=> 'hi' & 'mom'
>>> ftq("(hi mom")
@@ -502,10 +545,10 @@
Bug #160236
- >>> ftq("foo&&bar-baz")
+ >>> ftq("foo AND AND bar-baz")
foo&bar-baz <=> 'foo' & 'bar-baz' & 'bar' & 'baz'
- >>> ftq("foo||bar.baz")
+ >>> ftq("foo OR OR bar.baz")
foo|bar.baz <=> 'foo' | 'bar.baz'
Follow ups