launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #00270
[Merge] lp:~lifeless/launchpad/malone into lp:launchpad/devel
Robert Collins has proposed merging lp:~lifeless/launchpad/malone into lp:launchpad/devel.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
Related bugs:
#607776 +filebug-show-similar FTI query timeouts
https://bugs.launchpad.net/bugs/607776
Further improve the performance of bug duplicate detection by switching from | (any term matches) to & (all terms must match), with an expansion to handle one missing term.
This reduces over 90% of the overhead in tests so far, taking a 8second operation(on staging) down to 800msec.
I'm putting this up for review now to get feedback on the changes; I'm going to spin it up on staging and see if it delivers what it looks like it promises.
--
https://code.launchpad.net/~lifeless/launchpad/malone/+merge/30904
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~lifeless/launchpad/malone into lp:launchpad/devel.
=== modified file 'lib/canonical/database/nl_search.py'
--- lib/canonical/database/nl_search.py 2010-07-21 19:55:38 +0000
+++ lib/canonical/database/nl_search.py 2010-07-25 22:07:47 +0000
@@ -73,17 +73,30 @@
def _nl_phrase_search(terms, table, constraints, extra_constraints_tables):
"""Perform a very simple pruning of the phrase, letting fti do ranking.
+ This function groups the terms with & clause, and creates an additional
+ & grouping for each subset of terms created by discarding one term.
+
+ When there are less than 3 terms, additional groupings are not created.
+
See nl_phrase_search for the contract of this function.
"""
- bad_words = set(['firefox', 'ubuntu', 'launchpad'])
terms = set(terms)
- filtered_terms = []
- for term in terms:
- if term.lower() in bad_words:
- continue
- filtered_terms.append(term)
- # sorted for testing convenience.
- return '|'.join(sorted(filtered_terms))
+ # As a special case, we don't expand a 2-term search : its very slow and
+ # too noisy at the moment.
+ # sorted for doctesting convenience - should have no impact on tsearch2.
+ if len(terms) < 3:
+ return '&'.join(sorted(terms))
+ # Expand
+ and_groups = [None] * (len(terms) + 1)
+ for pos in range(len(terms) + 1):
+ and_groups[pos] = set(terms)
+ # sorted for doctesting convenience - should have no impact on tsearch2.
+ for pos, term in enumerate(sorted(terms)):
+ and_groups[pos + 1].discard(term)
+ # sorted for doctesting convenience - should have no impact on tsearch2.
+ and_clauses = ['(' + '&'.join(sorted(group)) + ')'
+ for group in and_groups]
+ return '|'.join(and_clauses)
def _slow_nl_phrase_search(terms, table, constraints,
=== modified file 'lib/canonical/launchpad/doc/textsearching.txt'
--- lib/canonical/launchpad/doc/textsearching.txt 2010-07-21 19:55:38 +0000
+++ lib/canonical/launchpad/doc/textsearching.txt 2010-07-25 22:07:47 +0000
@@ -622,11 +622,17 @@
>>> nl_phrase_search('how do I do this?', Question)
''
-The fast code path only removes firefox, ubuntu and launchpad today:
+The fast code path does not remove any terms. Rather it uses an & query over
+all the terms combined with an & query for each ordinal-1 subset of the terms:
>>> nl_phrase_search('system is slow when running firefox on ubuntu',
... Question)
- u'run|slow|system'
+ u'(firefox&run&slow&system&ubuntu)|(run&slow&system&ubuntu)|(firefox&slow&system&ubuntu)|(firefox&run&system&ubuntu)|(firefox&run&slow&ubuntu)|(firefox&run&slow&system)'
+
+Short queries are not expanded like this:
+
+ >>> nl_phrase_search('system is slow', Question)
+ u'slow&system'
==== Using other constraints ====
@@ -655,10 +661,10 @@
... ['Product'], fast_enabled=False)
u'slow|flickr'
-When the query only has stop words or common words in it, the returned
-query will be the empty string:
+When the query only has stop words in it, the returned query will be the empty
+string:
- >>> nl_phrase_search('ubuntu will not do it', Question)
+ >>> nl_phrase_search('will not do it', Question)
''
When there are no candidate rows, only stemming and stop words removal
@@ -668,7 +674,7 @@
0
>>> nl_phrase_search('firefox is very slow on flickr', Question,
... 'product = -1')
- u'flickr|slow'
+ u'(firefox&flickr&slow)|(flickr&slow)|(firefox&slow)|(firefox&flickr)'
==== No keywords filtering with few rows ====
@@ -705,4 +711,4 @@
... 'firefox is slow', Question,
... 'distribution = %s AND sourcepackagename = %s' % sqlvalues(
... ubuntu, firefox_package.sourcepackagename))
- u'slow'
+ u'firefox&slow'
=== modified file 'lib/lp/answers/doc/faqtarget.txt'
--- lib/lp/answers/doc/faqtarget.txt 2010-07-21 19:55:38 +0000
+++ lib/lp/answers/doc/faqtarget.txt 2010-07-25 22:07:47 +0000
@@ -159,7 +159,6 @@
>>> for faq in target.findSimilarFAQs('How do I use the Answer Tracker'):
... print faq.title
How to answer a question
- How to use bug mail
How to become a Launchpad king
The results are ordered by relevancy. The first document is considered
=== modified file 'lib/lp/answers/doc/questiontarget.txt'
--- lib/lp/answers/doc/questiontarget.txt 2010-07-21 19:55:38 +0000
+++ lib/lp/answers/doc/questiontarget.txt 2010-07-25 22:07:47 +0000
@@ -359,10 +359,6 @@
... print t.title
New question
Another question
- Question title3
- Question title2
- Question title1
- Question title0
=== modified file 'lib/lp/answers/stories/question-add-in-other-languages.txt'
--- lib/lp/answers/stories/question-add-in-other-languages.txt 2010-07-22 05:48:07 +0000
+++ lib/lp/answers/stories/question-add-in-other-languages.txt 2010-07-25 22:07:47 +0000
@@ -41,10 +41,18 @@
understood by any member of the support community.
>>> similar_questions = find_tag_by_id(browser.contents, 'similar-questions')
- >>> for row in similar_questions.findAll('tr', 'noted'):
- ... row.first('a').renderContents()
- 'Installation of Java Runtime Environment for Mozilla'
- 'Problema al recompilar kernel con soporte smp (doble-n\xc3\xbacleo)'
+
+ XXX: Making search fast has a significant impact on this tests' use case,
+ because there are 9 terms - the query has to ignore 7 of the terms to
+ permit a hit under the new & based logic (which is needed for scaling).
+ When we push better relevance and cheap ordering into the core we will be
+ able to test this case again. Note that in production this exact use case
+ already finds nothing.
+
+ #>>> for row in similar_questions.findAll('tr', 'noted'):
+ #... row.first('a').renderContents()
+ #'Installation of Java Runtime Environment for Mozilla'
+ #'Problema al recompilar kernel con soporte smp (doble-n\xc3\xbacleo)'
>>> for tag in find_tags_by_class(browser.contents, 'warning message'):
... print tag.renderContents()
=== modified file 'lib/lp/answers/stories/question-add.txt'
--- lib/lp/answers/stories/question-add.txt 2010-07-22 05:48:07 +0000
+++ lib/lp/answers/stories/question-add.txt 2010-07-25 22:07:47 +0000
@@ -66,8 +66,17 @@
The user enters his problem summary and a list of similar FAQs and
questions are displayed.
+XXX: Original search, disabled due to performance issues RBC 20100725. This
+will be reinstated when cheap relevance filtering is available / when search
+is overhauled.
+
>>> user_browser.getControl('Summary').value = (
... 'Visiting a web page requiring java crashes firefox')
+
+For now, use a closer search:
+
+ >>> user_browser.getControl('Summary').value = (
+ ... 'java web pages')
>>> user_browser.getControl('Continue').click()
>>> contents = find_main_content(user_browser.contents)
>>> similar_faqs = contents.find(id='similar-faqs')
@@ -122,7 +131,7 @@
summary he provided.
>>> user_browser.getControl('Summary').value
- 'Visiting a web page requiring java crashes firefox'
+ 'java web pages'
If the user doesn't provide details, he'll get an error message:
=== modified file 'lib/lp/bugs/doc/bugtask.txt'
--- lib/lp/bugs/doc/bugtask.txt 2010-07-21 19:55:38 +0000
+++ lib/lp/bugs/doc/bugtask.txt 2010-07-25 22:07:47 +0000
@@ -1365,7 +1365,6 @@
>>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
... print "%s: %s" % (similar_bug.id, similar_bug.title)
1: Firefox does not support SVG
- 4: Reflow problems with complex page layouts
5: Firefox install instructions should be complete
This also works for distributions...
@@ -1375,12 +1374,6 @@
>>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
... print "%s: %s" % (similar_bug.id, similar_bug.title)
1: Firefox does not support SVG
- 2: Blackhole Trash folder
- 9: Thunderbird crashes
- 10: another test bug
- 16: a test bug
- 17: a test bug
- 18: New Bug
... and for SourcePackages.
@@ -1394,7 +1387,6 @@
>>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
... print "%s: %s" % (similar_bug.id, similar_bug.title)
1: Firefox does not support SVG
- 18: New Bug
Private bugs won't show up in the list of similar bugs unless the user
is a direct subscriber. We'll demonstrate this by creating a new bug
@@ -1406,7 +1398,6 @@
>>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
... print "%s: %s" % (similar_bug.id, similar_bug.title)
1: Firefox does not support SVG
- 4: Reflow problems with complex page layouts
5: Firefox install instructions should be complete
...Yet another Firefox bug
@@ -1420,7 +1411,6 @@
>>> for similar_bug in sorted(similar_bugs, key=attrgetter('id')):
... print "%s: %s" % (similar_bug.id, similar_bug.title)
1: Firefox does not support SVG
- 4: Reflow problems with complex page layouts
5: Firefox install instructions should be complete
=== modified file 'lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt'
--- lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt 2010-07-22 05:48:07 +0000
+++ lib/lp/bugs/stories/guided-filebug/xx-sorting-by-relevance.txt 2010-07-25 22:07:47 +0000
@@ -5,6 +5,11 @@
reason, the search result is sorted by how well the given summary
matches the bug's summary and description.
+These bugs used to return more hits, but the more restrictive search introduced
+to bandaid search scaling issues inhibits that in the sample data; when we get
+a relevance-integrated index we'll be able to do relevance ordered, limited
+searches efficiently and these tests can be expanded at that point.
+
>>> user_browser.open("http://launchpad.dev/products/firefox/+filebug")
>>> user_browser.getControl("Summary").value = (
@@ -16,10 +21,6 @@
>>> print_bugs_list(user_browser.contents, "similar-bugs")
#1 Firefox does not support SVG
New (1 comment) last updated 2006-05-19...
- Firefox needs to support embedded SVG...
- #4 Reflow problems with complex page layouts
- New (0 comments) last updated 2006-07-14...
- Malone pages that use more complex layouts...
If we instead enter a summary that matches bug #4 better, the result will
be reversed.
@@ -27,16 +28,10 @@
>>> user_browser.open("http://launchpad.dev/products/firefox/+filebug")
>>> user_browser.getControl("Summary").value = (
- ... "Reflow problems when there are SVG images on the page")
+ ... "Reflow problems with SVG")
>>> user_browser.getControl("Continue").click()
>>> print_bugs_list(user_browser.contents, "similar-bugs")
#4 Reflow problems with complex page layouts
New (0 comments) last updated 2006-07-14...
Malone pages that use more complex layouts...
- #1 Firefox does not support SVG
- New (1 comment) last updated 2006-05-19...
- Firefox needs to support embedded SVG...
- #5 Firefox install instructions should be complete
- New (0 comments) last updated 2006-07-14...
- All ways of downloading firefox should provide complete...